Who wants to support largefile?

This post is inspired by a post by Eric Sandeen, whose blog I read last night after discovering we share an interest in making software build in parallel.

A little background for those who don’t know the issue I’m going to talk about. Classically, inode numbers and file offsets were 32-bit values, but as you might guess this no longer holds: files bigger than 2GB (the highest offset that a signed 32-bit value can represent) are quite common (just think of DVD images or, even better, Blu-ray discs, which can hold a huge 50GB), and modern filesystems (as Eric points out: XFS, btrfs and ext4) have or might have 64-bit inode numbers. Since changing the size of the types would have broken ABI compatibility, GNU libc, as well as other libraries, added support for the so-called “largefile” mode. In largefile mode, the standard file operations use 64-bit types. This is implemented by replacing calls like open() or stat() with 64-bit variants, called open64() and stat64(). Other operating systems, like FreeBSD, broke ABI compatibility and only have 64-bit interfaces. On systems that are natively 64-bit, like AMD64, the 64-bit interface is the default, so the separate largefile interface is not needed.
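
To put a number on that limit, here is a quick sketch in shell arithmetic. It assumes bash (or any shell with 64-bit arithmetic), and the 50GB image size is a made-up figure counted in decimal gigabytes, just for illustration.

```shell
# Largest offset a signed 32-bit value can hold: 2^31 - 1 bytes.
max_offset=$(( (1 << 31) - 1 ))
echo "max 32-bit offset: ${max_offset} bytes"   # prints 2147483647

# Hypothetical 50GB disc image, in decimal gigabytes.
image_size=$(( 50 * 1000 * 1000 * 1000 ))
if [ "${image_size}" -gt "${max_offset}" ]; then
    echo "a ${image_size}-byte file needs 64-bit offsets"
fi
```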

Now, since the two interfaces are, well, different interfaces, the only moment when they can be switched is at build time: you need to pass some compiler defines so that the calls are replaced at build time, making use of either the old interface or the new largefile one. Most packages you can think of are probably using largefile already: some conditionally, some unconditionally because they need it, and some unconditionally, needed or not, just to be safe. The problem is that not all software can deal with largefile properly as it is.
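
As a rough illustration of those compiler defines, the sketch below asks the system for its largefile flags through getconf LFS_CFLAGS. On a 32-bit glibc system I would expect this to report -D_FILE_OFFSET_BITS=64 (possibly with other defines), and nothing on a natively 64-bit one, but that is an assumption about your system, not something the script checks. The configure line in the comment is a hypothetical invocation, not taken from any specific package.

```shell
# Ask the system which flags enable largefile mode. getconf may not know
# LFS_CFLAGS everywhere, so errors are tolerated and treated as "none".
lfs_cflags=$(getconf LFS_CFLAGS 2>/dev/null || true)

if [ -n "${lfs_cflags}" ]; then
    echo "largefile flags: ${lfs_cflags}"
else
    echo "no extra flags reported (typical on natively 64-bit systems)"
fi

# Hypothetical build invocation with the flags appended:
#   ./configure CPPFLAGS="${CPPFLAGS} ${lfs_cflags}"
```

Packages that enable largefile "conditionally" usually do something equivalent to this in their build system, which is why the switch can only happen at build time.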

The usual way to discover that a package does not support largefile is watching it fail on a >2GB file. That’s not so nice, since it means you fix the problem only once it has become a problem; it would be much better to identify it earlier, so that it can be solved before it bites. But Eric’s post gave me an idea; I asked him for the script (which you can find attached to this post; update: I finally was able to make lighttpd serve the script, and for once Typo was innocent) and used the same logic to identify packages using the 32-bit interfaces, running scanelf on each package after Portage installs it.

This is not yet a complete test, since I’m forcing it to work only on x86 systems (I wanted to exclude AMD64) and it only checks the stat symbols; it should check open, read, write and all the other symbols too. More importantly, this is not going to work with the scanelf that Portage installs right now (0.1.18), since I had to fix it a bit to properly handle regexp matching and matching of multiple symbols. So if you want to try this you’ll probably have to wait until I release version 0.1.19. At any rate, the code in the bashrc file is just the following, for now:

post_src_install() {
    # List files in the install image that reference the 32-bit stat symbols.
    scanelf -q -F "#s%F" -R -s '-__xstat,-__lxstat,-__fxstat' "${D}" > "${T}"/flameeyes-scanelf-stat64.log
    # A non-empty log means at least one file still uses the old interface.
    if [[ -s "${T}"/flameeyes-scanelf-stat64.log ]]; then
        ewarn "Flameeyes QA Warning! Missing largefile support"
        cat "${T}"/flameeyes-scanelf-stat64.log >&2
    fi
}
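
Extending the check beyond stat could start from something like the sketch below, which only builds the symbol argument and does not run scanelf. The list of names is my own guess at a few 32-bit entry points that have a 64-bit counterpart in glibc; it is incomplete and hypothetical, and since the fixed scanelf matches by regexp, short names like open may need anchoring to avoid matching open64 or fopen. I have not verified that here.

```shell
# Hypothetical, incomplete list of 32-bit entry points with a 64-bit
# counterpart (open64, lseek64, ...). Adjust before relying on it.
legacy_syms="__xstat __lxstat __fxstat open creat lseek fopen readdir"

# Prefix each name with '-' as in the stat-only check above, and join
# the names with commas to form the argument for scanelf -s.
sym_arg=""
for sym in ${legacy_syms}; do
    sym_arg="${sym_arg:+${sym_arg},}-${sym}"
done
echo "${sym_arg}"
```

The resulting string would replace the hard-coded '-__xstat,-__lxstat,-__fxstat' in the scanelf call.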

Please don’t rush submitting bugs for these things though; these are useful to know and they should probably be fixed, but please send the patches upstream rather than directly to Gentoo, for now.