Another good reason to use 64-bit installations: Large File Support headaches

A couple of months ago I wrote about why I made my router a 64-bit install, listing a series of reasons why 64-bit hardened systems are safer to manage than 32-bit ones, mostly because of the feature set of the CPUs themselves. What I didn’t write about that time, though, is the fact that 64-bit installs also don’t require you to deal with the curse of large file support (LFS).

It was over two years ago I last wrote about this and at the time my motivation was mostly drained by a widely known troll, insisting that I got my explanation wrong. Just for the sake of not wanting to repeat the same pantomime, I’d like to thank Lars for actually getting me a copy of Advanced Programming in the Unix Environment so that I can actually reference said troll with the pages where the diagrams he referred to are: 106 to 108. And there is nothing there to corroborate his views against mine.

But now, let’s take a few steps back and let’s look at what I’m talking about altogether.

What is large file support? It is a set of interfaces designed to work around the limits imposed by the original design of the POSIX API for file support on 32-bit systems. The original implementations of functions like open(), stat(), fseeko() and so on were designed using 32-bit data types, either signed or unsigned depending on the use case. This has the unfortunate effect of limiting a number of attributes to that boundary; the most obvious problem is the size of the files themselves: you cannot use open() to open a descriptor to a file that is bigger than 2GB, as then the offsets would overflow. The inability of some of your software to process files bigger than 2GB isn’t, though, that much of a problem (after all, not all software can work with such files within reasonable resource constraints), but that’s not the worst problem you have to consider.

Because of this limit on file size, the new set of interfaces has always been called “large file”, but the name itself is a bit of a misnomer; this new set of interfaces, with extended 64-bit parameters and data fields, is required for operating on large file systems as well. I might not have expressed it in the most comprehensible of terms two years ago, so let’s hear it from scratch again.

In a filesystem, the files’ data and meta-data are tied to structures called inodes; each inode has an individual number; this number is listed within the content of a directory to link the directory to the files it contains. The number of files that can be created on a filesystem is limited by the number of unique inode numbers that the filesystem is able to cope with (you need at least one inode per file); you can check the status with df -i. This amount is in turn tied both to the size of the data field itself, and to the data structure used to look up the location of the inode on the filesystem. Because of this, the ext3 filesystem does not even reach the 32-bit size limit. On the other hand, both XFS and ext4, using more modern data structures, can reach that limit just fine… and they are actually designed to overcome it altogether.
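To make the 32-bit boundary concrete, here is a minimal sketch in POSIX shell that checks whether a given inode number would still fit in an old-style 32-bit field. The helper names (inode_fits_32bit, inode_of, check_path) are made up for this illustration; the only external tools assumed are ls -i and awk.

```shell
# Largest value an unsigned 32-bit inode field can hold: 2^32 - 1.
INO32_MAX=4294967295

# inode_fits_32bit NUMBER: succeed if NUMBER fits in a 32-bit inode field.
# (Hypothetical helper, not part of any existing tool.)
inode_fits_32bit() {
    [ "$1" -le "$INO32_MAX" ]
}

# inode_of PATH: print the inode number of PATH, as reported by ls -i.
inode_of() {
    ls -di -- "$1" | awk '{ print $1 }'
}

# check_path PATH: say whether a 32-bit st_ino could represent PATH's inode.
check_path() {
    if inode_fits_32bit "$(inode_of "$1")"; then
        echo "$1: inode fits in 32 bits"
    else
        echo "$1: inode needs a 64-bit st_ino"
    fi
}
```

Running check_path on a few files of a large XFS volume is a quick way to see whether that filesystem is already handing out inode numbers that the old structure cannot hold.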

Now, the fact that they are designed to support a 64-bit inode number field does not mean that they always will; for what it’s worth, XFS is designed to support block sizes over 4KiB, up to 64KiB, but the Linux kernel does not support that feature. On the other hand, as I said, the support is there to be used in the future. Unfortunately this cannot be feasibly done until we know for sure that the userland software will work with such a filesystem. It is one thing to be unable to open a huge file, it is another to be unable to interact in any way with files within a huge filesystem. Which is why both Eric and I, in the previous post, focused first of all on testing what software was still using the old stat() calls, whose data structure has a 32-bit inode number field. It’s not about the single file size, it’s a matter of huge filesystem support.

Now, let’s wander back to why I wanted to come back to this topic. With my current line of work I discovered at least one package in Gentoo (bsdiff) that was supposed to have LFS support, but didn’t because of a simple mistake (append-lfs-flags acts on CPPFLAGS but that variable wasn’t used in the build at all). I thought a bit about it, and there are so many ways to sneak in a mistake that would cause a package to lose LFS support even if it was added at first. For instance, for a package based on autotools that uses AC_SYS_LARGEFILE to look for the proper largefile support, it is easy to forget to include config.h before any other system library header, and when that happens, the largefile support is lost.
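The config.h ordering mistake can be caught with a rough heuristic. The sketch below is only an illustration in POSIX shell: first_include and config_h_first are hypothetical helper names, and the check only looks at the first #include directive in each file, so it will miss config.h being pulled in indirectly through another header.

```shell
# first_include FILE: print the first #include directive found in FILE.
first_include() {
    awk '/^[ \t]*#[ \t]*include/ { print; exit }' "$1"
}

# config_h_first FILE: succeed if FILE has no includes at all, or if its
# first include is config.h; fail if some other header comes first.
config_h_first() {
    case "$(first_include "$1")" in
        ''|*'"config.h"'*|*'<config.h>'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use: list the C sources of a tree that deserve a closer look.
# find . -name '*.c' | while read -r f; do
#     config_h_first "$f" || echo "check include order: $f"
# done
```

A hit is not proof of a bug, since a source file that never touches the file interfaces is unaffected, but it narrows down where to look.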

To make it easier to identify packages that might have problems, I’ve decided to implement a tool for this in my Ruby-Elf project called verify-lfs.rb which checks for the presence of non-LFS symbols, as well as a mix of both LFS and non-LFS interfaces. The code is available on Gitorious, although I have yet to write a man page, and I have to add a recursive scan option as well.
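The idea behind that kind of check can be sketched in a few lines of shell. This is not the Ruby-Elf code, just an illustration of the classification step: classify_symbols is a hypothetical function that reads symbol names on standard input, and LFS_BASES is a short, deliberately incomplete list of glibc interfaces that have a 64-suffixed twin.

```shell
# Base names of interfaces that have a 64-bit twin (illustrative, incomplete).
LFS_BASES='open fopen lseek fseeko ftello mmap readdir __xstat __lxstat __fxstat'

# classify_symbols: read one symbol name per line on stdin and print
# "legacy", "lfs", "mixed" or "none" depending on which variants appear.
classify_symbols() {
    awk -v bases="$LFS_BASES" '
        BEGIN {
            n = split(bases, b, " ")
            for (i = 1; i <= n; i++) {
                legacy_names[b[i]] = 1       # e.g. open
                lfs_names[b[i] "64"] = 1     # e.g. open64
            }
        }
        {
            name = $1
            sub(/@.*/, "", name)             # drop any @VERSION suffix
            if (name in legacy_names) legacy = 1
            if (name in lfs_names)    lfs = 1
        }
        END {
            if (legacy && lfs) print "mixed"
            else if (legacy)   print "legacy"
            else if (lfs)      print "lfs"
            else               print "none"
        }'
}
```

On a 32-bit binary it could be fed with something like nm -D --undefined-only ./binary | awk '{ print $NF }' | classify_symbols. A "mixed" result is the interesting one, since it suggests that only part of the code was built with the largefile defines. On 64-bit binaries the plain names are the correct ones, so the result means nothing there.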

Finally, as the title suggests, if you are using a 64-bit Linux system you don’t have to even think about this at all: modern 64-bit architectures define the original ABI as 64-bit already, making all the largefile support headaches irrelevant. The same goes for FreeBSD as well, as they implemented the LFS interface as their only interface with version 5, avoiding the whole mess of conditionality.

I’m seriously scared of what I could see if I were to run my script over the (32-bit) tinderbox. Sigh.

Who wants to support largefile?

This post is inspired by a post of Eric Sandeen, whose blog I read last night after discovering we share an interest in making software build in parallel.

A little background for those who don’t know the issue I’m going to talk about. Classically, inode numbers and offsets were 32-bit values, but as you might guess this can no longer hold nowadays: files bigger than 2GB (the highest offset that a signed 32-bit value can represent) are quite common (just think of DVD images or, even better, of Blu-ray discs: 50GB is huge), and modern filesystems (as Eric points out: XFS, btrfs and ext4) have or might have 64-bit inode numbers. Since changing the size of types would have broken ABI compatibility, GNU libc, as well as other libraries, added support for the so-called “largefile” mode. In largefile mode, the standard file operations have types with 64-bit size. The way this is implemented is by replacing calls like open() or stat() with 64-bit variants, called open64() and stat64(). Other operating systems like FreeBSD broke ABI compatibility and only have 64-bit interfaces. On new systems that are natively 64-bit, like AMD64, the new 64-bit interface is enabled by default, so the 64-bit specific interface is not needed.

Now, since the two interfaces are, well, different interfaces, the only moment when they can be switched is at build time. Indeed, you need to pass some defines to the compiler so that it replaces the calls at build time, and thus makes use of either the old or the new largefile interface. Most packages you can think of are probably using largefiles already: some conditionally, some unconditionally as needed, and some unconditionally, needed or not, just to be safe. The problem is that not all software can deal with largefile properly as it is.
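As a rough illustration of what “passing some defines” means with GNU libc, here is a small shell sketch. lfs_cppflags is a hypothetical helper written for this example; the two macros it prints, _FILE_OFFSET_BITS=64 and _LARGEFILE_SOURCE, are the usual glibc feature macros, while the choice of keying on the userland word size is an assumption of the sketch.

```shell
# lfs_cppflags BITS: print the preprocessor defines that select the 64-bit
# file interfaces on glibc for a BITS-wide userland. A 64-bit userland
# needs nothing, so the output is empty in that case.
lfs_cppflags() {
    case "$1" in
        32) echo "-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE" ;;
        64) echo "" ;;
        *)  echo "unknown word size: $1" >&2; return 1 ;;
    esac
}

# Example use, assuming getconf LONG_BIT reports the userland word size:
# CPPFLAGS="$CPPFLAGS $(lfs_cppflags "$(getconf LONG_BIT)")"
```

The defines only help if the build system actually passes CPPFLAGS to the compiler, and if they are in effect before the first system header is read.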

The usual way to discover that a package does not support largefile is watching it fail on a >2GB file. This is not so nice, since it means you only get to fix the problem once it has already bitten someone, while it would be much better to identify it earlier and solve it before it becomes a true problem. But Eric’s post has given me an idea; I asked him for the script (which you can find attached to this post, if Typo is not going to do some funny thing; update: I finally was able to make lighttpd serve the script, and for once Typo was innocent) and I used the same logic to identify packages using 32-bit interfaces, running scanelf after portage installs them.

This is not yet a complete test, since I’m forcing it to work only on x86 systems (I wanted to exclude AMD64), and it only checks the stat symbols; it should check open, read, write and all the other symbols too. More importantly, this is not going to work with the scanelf that you got installed by portage right now (0.1.18), since I had to fix it a bit to properly handle regexp matching and multiple symbols matching. So if you want to try this you’ll probably have to wait till I release a 0.1.19 version. At any rate, the code in the bashrc file is just the following, for now:

post_src_install() {
    # Log every installed ELF file that still references the 32-bit stat family.
    scanelf -q -F "#s%F" -R -s '-__xstat,-__lxstat,-__fxstat' "${D}" > "${T}"/flameeyes-scanelf-stat64.log
    # A non-empty log means at least one file matched.
    if [[ -s "${T}"/flameeyes-scanelf-stat64.log ]]; then
        ewarn "Flameeyes QA Warning! Missing largefile support"
        cat "${T}"/flameeyes-scanelf-stat64.log >&2
    fi
}

Please don’t rush submitting bugs for these things though; these are useful to know and they should probably be fixed, but please send the patches upstream rather than directly to Gentoo, for now.