The “sendfile” system call is a way to send file contents directly out to a network socket. This saves time in userspace, since the process doesn’t have to copy buffer contents around, and was one of the reasons I upgraded kernel.org’s Apache to version 2.x at the end of 2003 (version 1.x doesn’t have sendfile support). A few weeks ago, one of the other kernel.org admins discovered that files larger than 2G were not being delivered by Apache.
I had a lot of fun tracking down the issue. The “amount to send” argument in the sendfile call is a “size_t”, which is basically an “unsigned long”. A 2G limit didn’t make sense: even with 32 bits, an unsigned value should allow up to 4G. On top of that, the kernel.org servers are both 64-bit, so there “size_t” is a full 64 bits. After writing a quick test, I was able to verify that there was, indeed, a 31-bit limit on both 64-bit and 32-bit kernels. Peter Anvin took it from there and tracked down the origin of the problem: filesystem operations greater than 31 bits in offset were being rejected deep in the kernel. He suggested truncating the request instead of returning a failure.
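The arithmetic behind that surprise is easy to check. The following is a small Python sketch, not the quick test mentioned above; the helper name “limit_in_gib” is made up for illustration, and “ctypes.c_size_t” is used only to report how wide “size_t” is on whatever machine runs it.

```python
import ctypes

GIB = 2 ** 30  # bytes in one "G" as used in this post


def limit_in_gib(bits):
    """Return how many G (2**30 bytes) a count with `bits` usable bits spans."""
    return (2 ** bits) // GIB


# Width of size_t on the machine running this sketch.
size_t_bits = ctypes.sizeof(ctypes.c_size_t) * 8

print("size_t is", size_t_bits, "bits wide here")
print("31 usable bits ->", limit_in_gib(31), "G")  # the limit that was observed
print("32 usable bits ->", limit_in_gib(32), "G")  # what a 32-bit size_t allows
```

A 31-bit limit is what you would expect if the count were being treated as a signed 32-bit value somewhere along the way, whatever the width of “size_t” itself.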
It seems that Linus decided to limit the size of filesystem calls to guard against security problems (signed vs. unsigned overflows) in the various filesystem drivers while people using the Linux kernel migrate from 32-bit to 64-bit systems. Personally, I don’t agree with this, but from a practical standpoint, it hardly makes a difference. Instead of sending all 4G out the pipe in one call and then returning to user space, the call just returns twice, sending 2G each time.
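In practice, a caller copes with a capped call by looping until everything has gone out. Here is a minimal Python sketch of that pattern using “os.sendfile”; the function name “sendfile_all” and its parameters are hypothetical, it assumes blocking file descriptors, and it is not how Apache does it, just an illustration of handling a call that sends less than it was asked to.

```python
import os


def sendfile_all(out_fd, in_fd, count, offset=0):
    """Send `count` bytes from in_fd (starting at `offset`) to out_fd.

    A single os.sendfile() call may send fewer bytes than requested
    (for example, when the kernel caps the size of one call), so keep
    calling until everything is sent. Returns the total bytes sent.
    """
    sent_total = 0
    while sent_total < count:
        sent = os.sendfile(out_fd, in_fd, offset + sent_total, count - sent_total)
        if sent == 0:
            # End of file reached before `count` bytes were available.
            break
        sent_total += sent
    return sent_total
```

With a 4G file and a 2G-per-call cap, this loop would simply go around twice, which matches the behavior described above.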
© 2006, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.