A few more reasons why FatELF is not useful

Note that the tone of this blog post reflects the Flameeyes of 2010, and does not represent the way I would phrase it in 2020 as I edit typos away. Unfortunately WordPress still has no way to skip the social sharing of the link after editing. Sorry, Ryan!

It seems like people keep expecting Ryan Gordon’s FatELF to solve all their problems. Today I was told I was being illogical for writing that it has no benefits. Well, I’d like to reiterate that I’m quite sure of what I’m saying here! And even though this is likely going to be a pointless exercise, I’ll try to convey once again why I think that even just discussing it is a waste of time and resources.

I have to say, most of the people claiming that FatELF is useful seem to be expert mirror-climbers: they have changed their minds so many times about how it should be used, where it should be used, and which benefits it has, that this post will jump from point to point quite confusingly. I’m sorry about that.

First of all, let me try to make this clear: FatELF is going to do nothing to make cross-arch development easier. If you want easier cross-arch or cross-OS development, you go with interpreted or byte-compiled languages such as Ruby, Python, Java, C#/.NET, or whatever else. FatELF focuses on ELF files, which are produced by the C, C++ and Fortran compilers and the like. I can’t speak for Fortran as it’s a language I do not know, but C and C++ data types are very much specific to the architecture, the operating system, the compiler, heck, even the version of the libraries, both system and third-party! You cannot solve those problems with FatELF: whatever benefits it has can only appear after the build. But at any rate, let’s proceed.

FatELF supposedly makes builds easier, but it doesn’t. If you really think so, you have never ever tried building something as an Apple Universal Binary. Support in autoconf, and most likely in any other build system along those lines, simply sucks. The problem is that whatever result you get from a test on one architecture might not hold on the other. And Apple’s Universal Binaries only encompass one operating system, developed without thinking too much about compatibility with others and kept under the same tight control, where the tests for APIs are going to be almost identical for all the arches. (You might not know this, but Linux’s syscall numbers are not the same across architectures; the reason is that they were actually designed to partly maintain compatibility with the proprietary (and non-proprietary) operating systems that originated on each architecture and were mainstream at the time. So for instance on IA-64 the syscall numbers are compatible with HP-UX, while on SPARC they are, for the most part, compatible with Solaris.)

This does not even come close to considering the mess that is the toolchain. Of course you could have a single toolchain patched to emit code for a number of architectures, but is that going to work at all? Given that I have actually worked as a consultant building cross-toolchains for embedded architectures, I can tell you that it’s difficult enough to get one working. Add the need for patches for specific architectures, and you might start to get part of the picture. While binutils already, theoretically, supports a “multitarget” build that adds support for all the architectures it has code for in a single build, doing the same for gcc is going to be a huge mess. Now you could (as I suggested) write a cc frontend that takes care of compiling the code for multiple architectures at the same time, but as I said above it’s not easy to ensure the tests are actually meaningful between architectures, let alone operating systems.

FatELF cannot share data sections. One common mistake when thinking about FatELF is assuming that it only requires duplication of the executable sections (.text), but that’s not the case. Data sections (.data, .bss, .rodata) depend on the data types, which as I said above are architecture-dependent, operating system-dependent, and even library-dependent. They are part of the ABI; each ELF you build for a given target arch right now is going to have its own ABI, so the data sections are not shared. The best I can think of to reduce this problem is to make use of -fdata-sections and then merge sections with identical content; it’s feasible, but I’m sure that in the best of cases it’s going to create problems with caching of nearby objects, and at worst it’s going to cause misalignment of data to be read in the same pass. D’uh!

Just so you know how variable the data sections are: even though they could just use #ifdef, both the Linux kernel and the GNU C Library (and most likely uClibc as well, even though I don’t have it around to double-check) install different sets of headers per architecture; this should be an indication of how different the interfaces are between them.

Another important note here: as far as I could tell from the specifications that Ryan provided (I really can’t be arsed to look them up again right now), FatELF files are not interleaved, mixing the sections of the various objects; they are merged into a sort-of archive, with the loader/kernel deciding which part of it will be loaded as a normal ELF. The reason for this decision likely lies in one teeny-weeny detail: ELF files were designed to be mapped straight from disk into data structures in memory; for this reason, ELF has classes and data orders. For instance x86-64 uses ELF of class 64 and order LSB (least-significant byte first, or little-endian) while PPC uses ELF of class 32 and order MSB (most-significant byte first, or big-endian). It’s not just a matter of the content of .text: it’s also pervasive in the index structures within the ELF file, and it is so for performance reasons. Having to swap all the data, or deal with different sizes, is not something you want to do in the kernel loader.

When do you distribute a FatELF? This is one tricky question, because both Ryan and various supporters change their opinion here more often than they change socks. Ryan said that he was working on getting an Ubuntu built entirely of FatELF objects, implying that distributions would be using FatELF for their packaging. Then he said that it wasn’t something for packagers but for Independent Software Vendors (ISVs). Today I was told that FatELF simplifies distribution by providing a single executable that can be “executed in place” rather than having to provide two of them.

Let’s be clear: distributors are very unlikely to provide FatELF binaries in their packages. Even though it might sound tempting to implement multilib with them, it’s going to be a mess just the same; it might reduce the size of the whole archive of binary packages, because you share the non-ELF files between architectures, but it’ll increase the traffic needed to download them, and while disk space is getting cheaper and cheaper, network bandwidth is still a rare commodity. Even more so, users won’t like having stuff installed for architectures they don’t use, and will likely ask for a way to clean it up, at which point they’ll wonder why they are downloading it at all. Please note that while Apple did their best to convince people to use Universal Binaries, a number of tools were produced just to strip the alternative architectures out of the executable files.

Today I was told, as I said, that it is easier to distribute one executable file rather than two. But when do you actually find yourself doing that? Quite rarely; ISVs provide pre-defined packaging, usually in the form of binary packages for a particular distribution (which already makes it a multiple-file download). Most complex software will be in an archive anyway, because you don’t just build everything into the executable but rather ship separate files: icons, images, … Even Java software, which actually uses archives as its main object format (JAR files are ZIP files), ships in archives installing separate data files, multiple JARs, wrapper scripts and so on and so forth. I was also told that you could strip the extra architectures at install time, but if you do so, you might as well decide which of multiple files to install, making it moot to use a fat binary at all.

All in all, I have yet to see one use case that is actually solved better by FatELF than by a few wrapper scripts and an archive. Sure, you can create some straw-man arguments where FatELF works and scripts don’t, such as the “execute in place” idea above, but really, tell me: when was the last time you needed that? Please also remember that while “changes happen all the time”, we’re talking about changing, in a particularly invasive way, a number of layers:

  • the kernel;
  • the loader;
  • the C library (separated from the loader!);
  • the compiler, which almost has to be rewritten;
  • the linker, obviously;
  • the tools to handle the files;
  • all the libraries that change API among architectures.

Even if all of this became stock, there’s a huge marginal cost here, and it’s not going to happen anytime soon. And even if it did, how much time would it take before it got mainstream enough to be used by ISVs? There are some that still support RHEL 3.

There are “smaller benefits”, as, again, I was told before, and those are not nothing. Maybe that’s the case, but the question is “is it worth it?”. Once it’s in the kernel it’s not going to take much work at runtime, but is it worth the marginal cost of implementing all that stuff and maintaining it? I most definitely don’t think so. I guess the only reason why Apple coped with that is that they had most of the logic already developed and lying around from when they transitioned from M68K to PowerPC.

I’m sorry to burst your bubble, but FatELF was an ill-conceived idea that is not going to gain traction for a very good reason: it makes no sense! All of the use cases I have read up to now are straw men that resemble either what OS X does or what Windows does. But Linux is neither. Now, can we move on, please?

The why and how of RPATH

This post is brought to you by a conversation with Fabio, which actually reminded me of an older conversation I had with someone else (exactly whom, right now, escapes me) about the ability to inject RPATH values into already-built binaries. I’m sorry to have forgotten who asked me about that; I hope he or she won’t take it badly.

But before I get to the core of the problem, let me try to give a brief introduction to what we’re talking about here, because jumping straight to talking about injection is going to be too much for most of my readers, I’m sure. This whole topic also reconnects with multiple smaller topics I discussed in the past on my blog, so you’ll see a few links here and there for that.

First of all, what the heck is RPATH? When using dynamic linking (shared libraries), the operating system needs to know where to look for the libraries an executable uses; to do so, each operating system has one or more PATH variables, plus possibly configuration files, that are used to look up the libraries. On Windows, for instance, the same PATH variable that is used to find commands is used to load libraries, and libraries are looked for, first of all, in the same directory as the executable. On Unix, commands and libraries use distinct paths, by design, and the executable’s directory is not searched; this is also because the two directories are quite distinct (/usr/bin vs /usr/lib, for example). The GNU/Linux loader (and here the name is proper, as the loader comes out of GLIBC — almost identical behaviour is expected from uClibc, FreeBSD and other OSes, but I only know that much for sure) differs extensively from the Sun loader; I say this because I’ll introduce the Sun system later.

In your average GNU/Linux system, including Gentoo, the paths to look up libraries in are defined in the /etc/ld.so.conf file; prepended to that list is the LD_LIBRARY_PATH environment variable. (There is a second variable, LIBRARY_PATH, that tells the gcc frontend to the linker where to look for libraries to link to at build time, rather than to load — update: thanks to Fabio who pointed out I got the wrong variable; LDPATH is used by the env-update script to set the proper search paths in the ld.so.conf file.) All executables will look in these paths both for the libraries they link to directly and for non-absolute dlopen() calls. But what happens with private libraries — libraries that are shared only among a small number of executables coming from the same package?
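
If you want to see that search order in action, the glibc loader can be asked to show its work; a minimal sketch, run against a hypothetical ./myapp binary:

    # print the directories the loader walks while resolving myapp's libraries
    LD_DEBUG=libs ./myapp 2>&1 | grep 'search path'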

The obvious choice is to install them normally, in the same path as the general libraries; this does not require playing with the search paths at all, but it causes two problems: the build-time linker will still find them at link time, which might not be what you want, and it increases the number of files present in a single directory (which means accessing its contents slows down, little by little). The common alternative approach is installing them in a sub-directory specific to the package (automake already provides a pkglib installation class for this type of libraries). I already discussed and advocated this solution so that internal libraries are not “shown” to the rest of the software on the system.

Of course, adding the internal library directories to the global search path also means slowing down library loading, as more directories have to be searched when looking for the libraries. To solve this issue, runpath first and rpath now are used. DT_RPATH is a .dynamic attribute that provides further search paths for the object it is listed in (it can be used both for executables and for shared objects); the list is inserted in-between the LD_LIBRARY_PATH environment variable and the search paths from the /etc/ld.so.conf file. In those paths there are two special cases: $ORIGIN is expanded to the path of the directory where the object is found, while both empty and . values in the list refer to the current working directory (the so-called insecure rpath).
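
As a hedged example, with hypothetical names, this is how you would link a binary with an rpath relative to its own location and then verify the resulting entry:

    # link against a hypothetical libmine using an rpath relative to the
    # binary's own location (note the quoting: $ORIGIN must reach the linker
    # verbatim), then inspect the resulting .dynamic entry
    gcc -o myapp main.o -L../lib -lmine -Wl,-rpath,'$ORIGIN/../lib'
    readelf -d myapp | grep -iE 'rpath|runpath'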

Now, while rpath is not a panacea, and it also slightly slows down the loading of the executable, it has its decent uses, especially in not requiring further symlinks to switch among almost-equivalent libraries whose ABIs are not that stable. It also gets very useful when you want to build software you’re developing so that it loads your own libraries rather than the system ones, without relying on wrapper scripts like libtool does.

To create an RPATH entry, you simply tell the linker (not the compiler!) about it; so for instance you could add to your ./configure call the string LDFLAGS=-Wl,-rpath,/my/software/base/my/lib/.libs to build and run against a specific version of your library. But what about an already-linked binary? The idea of using RPATH also makes for nicer handling of binary packages and their dependencies, so there is an obvious advantage in having the RPATH editable after the final link took place… unfortunately this isn’t as easy. While there is a tool called chrpath that allows you to change an already-present RPATH, and especially to delete one (it comes in handy to resolve insecure rpath problems), it has two limitations: the new RPATH cannot be longer than the previous one, and you cannot add a new one from scratch.
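
For reference, a short sketch of what chrpath can and cannot do, on a hypothetical myapp binary:

    chrpath -l myapp                 # list the current DT_RPATH, if any
    chrpath -r /opt/myapp/lib myapp  # replace it (only if not longer than the old value)
    chrpath -d myapp                 # delete it, handy against insecure rpaths
    # what it cannot do: add an RPATH to a binary that was linked without one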

The reason is that the .dynamic entries are fixed-size; you can remove an RPATH by setting its type to NULL, so that the dynamic loader will skip over it; you can edit an already-present RPATH by changing the string it points to in the string table; but you can extend neither .dynamic nor the string table itself. This reduces the usefulness of RPATH for binary packages to almost nothing. Is there anything we can do to improve on this? Well, yes.

Ali Bahrami at Sun already solved this problem in June 2007, which means just over three years ago. They implemented it with a very simple trick that could be handled entirely in the build-time linker, without having to change even a bit of the dynamic loader: they added padding!

The only new definition they had to add was DT_SUNW_STRPAD, a new entry in the .dynamic table that gives you the size of the padding space at the end of the string table; together with that, they added a number of extra DT_NULL entries to the same .dynamic table. Since all the entries in .dynamic are fixed-size, a DT_NULL can become a DT_RPATH without a problem. Even if some broken software might expect the DT_NULL entries to be at the end of the table (which would be wrong anyway), you just need to keep them at the end. All ELF software should ignore the .dynamic entries it doesn’t understand, at least as long as they are in the reserved ranges.
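
Just to illustrate the mechanics (this is a rough sketch of the idea, not Sun’s actual code; the function, its parameters and the in-memory layout are all assumptions), a tool with write access to the mapped .dynamic array and dynamic string table could do something like this:

    #include <elf.h>
    #include <stddef.h>
    #include <string.h>

    /* dyn/dyn_count: the mapped .dynamic array and its entry count;
     * strtab/strtab_used/strtab_pad: the dynamic string table, its used size
     * and the trailing padding a DT_SUNW_STRPAD-style entry would report.
     * Returns 0 on success, -1 if there is no room. Hypothetical helper. */
    int inject_rpath(Elf64_Dyn *dyn, size_t dyn_count,
                     char *strtab, size_t strtab_used, size_t strtab_pad,
                     const char *rpath)
    {
        size_t len = strlen(rpath) + 1;
        if (len > strtab_pad)
            return -1;              /* not enough padding in the string table */

        for (size_t i = 0; i + 1 < dyn_count; i++) {
            /* the first DT_NULL terminates the table; the spare ones added as
             * padding follow it, so at least one must remain as terminator */
            if (dyn[i].d_tag == DT_NULL && dyn[i + 1].d_tag == DT_NULL) {
                memcpy(strtab + strtab_used, rpath, len);
                dyn[i].d_tag = DT_RPATH;
                dyn[i].d_un.d_val = strtab_used; /* offset of the new string */
                return 0;
            }
        }
        return -1;                  /* no spare DT_NULL entries to repurpose */
    }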

Unfortunately, as far as I know, there is no implementation of this idea in the GNU toolchain (GNU binutils is where ld lives). It shouldn’t be hard to implement, as I said; it’s just a matter of emitting a defined amount of zeros at the end of the string table, and adding a new .dynamic tag with its size… the same rpath command from OpenSolaris would probably be usable on Linux after that. I considered porting this myself, but I have no direct need for it; if you develop proprietary software for Linux, or need this feature for deployment, though, you can contact me for a quote on implementing it. Or you can do it yourself, or find someone else.

More details about symbol collisions

You might remember that I was quite upset by the number of packages that block each other because they install the same files (or files with the same name). It might not be obvious – and actually it’s totally non-obvious – why I would insist on these being solved whenever possible, and why I’m still wishing we had a proper way to switch between alternative implementations without using blockers, so I guess I’ll explain it here, while I also discuss again the problem of symbol collisions, which is one very nebulous problem in itself.

I have to say, first, that the same problem I have with blockers is also present with conflicting USE requirements; the bottom line is the same: there are packages that you cannot have merged at the same time, and that’s a problem for me.

The problem can probably be solved just as well by changing the script I use for gathering ELF symbol information to check for collisions, but since that requires quite a bit of work just to work around this trouble, I’m fine with accepting a subset of packages, and ranting about the fact that we have no way (yet) to solve the problem of colliding packages and incompatible USE requirements.

The script, basically, roams the filesystem to gather all the symbols that are exported by the various ELF files, both executables and libraries, saves them into a PostgreSQL database, and then a separate script combines that data to generate a list of possibly colliding symbols. The whole point of this is to find possible hard-to-diagnose bugs due to unwanted symbol interposing: an internal function or datum replaced with another from a different ELF object (shared library or executable) that uses the same name.
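
A much-simplified sketch of the same idea, without the PostgreSQL backend (the real script also whitelists known interposers, plugins and so on):

    # list the defined, exported symbols of the installed libraries and print
    # the names that appear in more than one object
    for lib in /usr/lib/*.so*; do
        [ -h "$lib" ] && continue   # skip symlinks so each library counts once
        nm -D --defined-only "$lib" 2>/dev/null | awk '{ print $NF }'
    done | sort | uniq -d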

This kind of work is, as you can guess by now, hindered by the inability to keep all the packages merged at once, because then I cannot compare the symbols between them; at the same time, it’s also hindered by those bad packages that install ELF files in uncommon and wrong paths (like /usr/share) — running the script over the whole filesystem would solve that problem, but the amount of time wasted running it that way would definitely be a problem.

On a different note, I have to point out that not all the cases where two objects export the same symbol are mistakes. Sometimes you’re willingly interposing symbols, like we do with the sandbox, and like google-perftools does with malloc and mmap; some other times you’re implementing a specific interface, which might be that of a plugin, but might also be required by a library, like the libwrap interface for the allow/deny symbols. In some cases, plugins will not interfere with one another because they get loaded with the RTLD_LOCAL option, which tells the runtime loader that their symbols are not to be exported.
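
For the plugin case, a minimal sketch of why RTLD_LOCAL avoids the interference; the plugin path and entry point below are hypothetical:

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_LOCAL (the default for dlopen) keeps the plugin's symbols out
         * of the global namespace: two plugins exporting the same name won't
         * interpose on each other */
        void *plugin = dlopen("/usr/lib/myapp/plugins/output_oss.so",
                              RTLD_NOW | RTLD_LOCAL);
        if (plugin == NULL) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* the symbol has to be looked up through the handle; it is not
         * visible to the rest of the process or to other plugins */
        int (*init)(void) = (int (*)(void))dlsym(plugin, "plugin_init");
        return init ? init() : 1;
    }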

This makes the whole work pretty cumbersome: some packages will have to be ignored altogether, others will require specific rules, others will have to be fixed upstream, and some are actually issues with bundled libraries. The whole work is long and tedious, and will not bring huge improvements all around, but it’s a nice polishing action that any upstream should really consider to show that they care about the quality of their code.

But if I were to wait for all of them to start actually making use of hidden symbols and the like, I’d probably be waiting forever. For now, I guess I’ll have to start by making use of these ideas in the projects I contribute to, like lscube. From there, they might end up getting used more often.

Garbage-collecting sections is not for production

Some time ago I wrote about using --gc-sections to avoid having unused functions creep into the final code. Today instead I’d like to show how it can become quite a problem if used indiscriminately.

I’m still using, at least for some projects, the diagnostics of --gc-sections to identify stuff that is unused but still kept around. Today I noticed one bad interaction between it and pulseaudio:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.2/../../../../x86_64-pc-linux-gnu/bin/ld: Removing unused section '.data.__padsp_disabled__' in file 'pulseaudio-main.o'

The __padsp_disabled__ symbol is declared in main.c to avoid pulseaudio’s access to OSS devices being wrapped by the padsp script. When I first saw this, I thought the problem was a missing #ifdef directive: if I didn’t ask for the wrapper, it might still have declared the (unused) symbol. That was not the case.

Looking at the code, I found what the problem was: the symbol is (obviously) never used by pulseaudio itself; it is, rather, checked through dlsym() by the DSP wrapper library. For this reason, as far as the compiler and linker are concerned, the symbol looks pretty much unused, and when asking explicitly for unused sections to be dropped, it is. Since the symbol is looked up by the runtime linker, neither building nor executing pulseaudio will show any problem. Indeed, the only problem would appear when running pulseaudio as a child of padsp, and using the OSS output module (so not on most Linux systems).

This shows how just using -fdata-sections -ffunction-sections -Wl,--gc-sections is not safe at all and why you shouldn’t get excited about GCC and ld optimisations without understanding how they work in detail.

In particular, even I thought that it would be easier to work around than it actually seems to be: while GCC provides a used attribute that allows declaring a variable (or a function) as used even though the compiler can’t tell that by itself (it’s often used together with inline hand-written assembly the compiler doesn’t inspect), this does not propagate to the linker, so it won’t save the section from being discarded. The only solution I can think of is adding one instruction that sets the variable to itself, but that’s probably going to be optimised away. Or giving gcc a way to state explicitly that the section is used.
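
To make the situation concrete, here is a hedged sketch of the pattern; the names are modelled on the padsp case, and the wrapper-side helper is hypothetical:

    #define _GNU_SOURCE
    #include <dlfcn.h>

    /* program side: nothing in the code ever reads or writes this, it only
     * has to exist in the dynamic symbol table so the wrapper can find it;
     * "used" stops the compiler from discarding it, but with -fdata-sections
     * the linker still sees an unreferenced .data.__padsp_disabled__ section
     * and --gc-sections happily drops it */
    __attribute__((used)) int __padsp_disabled__ = 7;

    /* wrapper side, roughly: if the marker is present in the process, stay
     * out of the way and don't wrap the OSS device access */
    int padsp_is_disabled(void)
    {
        return dlsym(RTLD_DEFAULT, "__padsp_disabled__") != NULL;
    }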

What distributions want

Or: a 101 lesson on how to ensure that your software package is available to distribution users (which, incidentally, are the Linux users; while the combined Linux market share is already pretty low when compared with Windows and OS X, the market share of not-really-distributions like Slackware or Linux From Scratch is probably so trivial that you don’t have to care about them most of the time. That, and their users are usually not that scared of installing stuff by themselves).

I’m posting this quickie because I’d like to tell one other thing to Ryan… yes, you are a drama queen. And you proved it with your latest rant, and it really upsets me that instead of trying to understand the problem, your solution seems to be closing yourself even more inside your little world. It upsets me because I can see you are a capable programmer and I’d prefer your capacity to be used for something that people can benefit from, rather than wasted on pointless stuff, like FatELF.

Instead of trying to understand the technical points that I and others made, and telling us why you think they are not good enough, you’re just closing yourself off further. By saying that “lots of people talked about it” you’re just showing what you’re looking for: fame and glory, without actual substantial results to back them up. Just a hint: the people who matter aren’t those who keep saying “FatELF will make distributions useless, will make it possible to develop cross-platform software, will solve world hunger”; the people who matter are those who review FatELF on its technical side, and most of us have already deemed it pointless; I already explained what I think about it.

Any ISV that thinks FatELF will solve their cross-distribution or cross-architecture problems has no idea what an ELF file is; they don’t really understand the whole situation at all. I’m pretty sure there are such ISVs out there… but I wouldn’t really look forward to them deciding what to put inside the kernel and the other projects.

Do you want your software (your games) to be available to as many people as possible? Start working with the freaking distributions! You don’t need to have mastered all the possible package managers, you don’t even need to know about any of them directly, but you’ve got to listen when packagers ask for some changes. If a packager asks you to unbundle a library, or to allow selecting between the bundled and the system library, do it: they have their reasons, and they know how to deal with eventual incompatibilities. If a packager asks you to either change your installation structure or at least make it flexible, that’s because, with very few exceptions, distributions are fine with following the FHS.

Take a look at “Distributions-friendly packages”: part 1, part 2 and part 3.

But no, Ryan’s solution here is again to take cheap shots at distributions and packagers, without actually noticing that, after more than ten years, distributions are not going away.

Oh, and to the first commenter who will try to say again that FatELF is the solution: can you please tell me how it is going to ensure that the people writing the code understand the difference between little endian and big endian? Or that the size of a pointer is not always 32 bits? Count that as a captcha: if you cannot give me an answer to those two questions, your comment supporting FatELF as The Solution will be deleted.

ELF should rather be on a diet

I was first linked to the FatELF project in late October by our very own solar; I wanted to write some commentary about it but I couldn’t find the time; today the news is that the author gave up on it after both Linux kernel and GLIBC developers dissed his idea. The post where he noted his intention to discontinue the project is one drama queen of a post regarding the idea of contributing to other projects… I say that because, well, it’s always going to end this way if you think up an idea, don’t discuss it before implementing it, and then feel angry about the rejection when it comes. I’m pretty sure that nothing in this rejection was personal, and I can tell you that what I would have written after reading about it the first time would have been “nice proof of concept, but it’s not going to fly”.

Let’s first introduce the idea behind the project: to copy Apple’s “Universal Binaries”, the technique that allowed programs to run both on PPC-based Macs and on the new Intel-based Macs when they decided to make the transition, this time applying the same principle to the ELF files that are used on basically all modern UNIX and Unix-like systems (Linux, *BSD, Solaris). There is a strange list of benefits on the project’s homepage; I say strange because they really seem like straw arguments for creating FatELF, since I have rarely seen them apply in the real world.

Let’s be clear: when Ulrich Drepper (who’s definitely not the most charming developer in our community) says this:

Yes. It is a “solution” which adds costs in many, many places for a problem that doesn’t exist. I don’t see why people even spend a second thinking about this.

I’m not agreeing with the idea that nobody should have spent a second thinking about it; toying with ideas, even silly ideas like this one (because as you’ll soon see, this is a silly idea), is always worth it: it gives you an idea of how stuff works, they might actually lead somewhere, or they might simply give you a sense of proportion as to why they don’t work. But there are things to consider when doing stuff like this, and the first is that if there is a status quo, it might be worth discussing the reasons for that status quo before going at full sprint and spending a huge amount of time implementing something, as the chance that it’s just not going to work is quite high.

To give an example of another status quo-fiddling idea, you might remember Michael Meeks’s direct bindings for ELF files; the idea was definitely interesting, and it proved quite fast as well, but it didn’t lead anywhere; Michael, and others including me, “wasted” time testing it out, and it was later blocked by Drepper with enough reasons and is no longer worked on. Let me qualify that “wasted” though: it was wasted only from the point of view of that particular feature, which led nowhere, but that particular work was what actually made me learn how the two linkers work together, and got me interested in problems of visibility and copy-on-write, as well as finding out one xine bug that would have been absolute voodoo to me if I hadn’t spent time learning about symbol resolution before.

Back to FatELF now: why do I think the idea is silly? Why am I agreeing with Drepper that it’s a solution with costs too high for results nobody asked for? Well, the first point to make is when Apple made its first step toward universal binaries: if you think the idea sprouted during the PPC to Intel transition, you’re wrong. As Wikipedia notes, Apple’s first fat binary implementation dates back to 1994, during the M68K to PPC transition. Replicating the same procedure for an architecture change wasn’t extremely difficult for them to begin with, even though it wasn’t OS X that was used during that particular transition. The other fact is that the first Intel transition was – for their good or bad – a temporary one. As you have probably noted, they are now transitioning from i386 software to x86-64 software (after my post on PIE you can probably guess why that’s definitely important to them).

But it goes much further than that: Apple has a long history of allowing users to port the content of their computer from one to the next with each upgrade, and at the same time they have a lot of third parties providing software; since third parties started upgrading to universal binaries before Intel Macs were released to users, if users kept up to date with the releases, once they got their new Intel Mac they would just have had to copy the content from the old to the new system and be done with it. This is definitely due to the target audience of Apple.

There is another thing to know about Apple and OS X, which you might not know about if you’ve never used a Mac: applications are distributed in bundles, which are nothing more than a directory structure inside which the actual binary is hidden; inside the bundle you find all the resources that are needed for the program to run (translations, pictures, help files, generic data files, and so on). To copy an application you only have to copy the bundle; to remove almost any application you just shove the bundle into the trash can. This forces distribution to happen in bundles as well, which is why Universal Binaries were so important to Apple: the same bundle had to work for all people, so that it could still be copied identically between one computer and the other and work, no matter the architecture. This is also why, comparing the sizes of bundles built Universal, PPC-only and Intel-only, the first is not as big as the sizes of the other two combined: all the external resources are shared.

So back to Linux, to see how this applies: with a single notable exception, all the Linux distributions out there use a more or less standard Filesystem Hierarchy Standard compatible layout (some use an LSB-compatible layout; the two are not one and the same, but the whole idea is definitely similar). In such a setup there are no bundles, and the executable code is already separated from the data that is not architecture-dependent (/usr/share) and thus shareable. So the only parts that cannot be shared, and that FatELF would allow to be shared, are the executable code parts, like /bin and /lib.

Now let’s start with understanding where the whole idea would be applied: first of all, Linux distributions, by their own design, have a central repository for software, which OS X does not have; and that central repository can be set up at installation time to get the correct version of the software, without asking users to know their architecture by themselves. The idea of using fat binaries to reduce the size of that repository is moot: the shareable data is already, for most distributions I know, shared in -noarch (architecture-independent) packages; the only thing you’d be able to spare would be the metadata of the packages, which I’m quite sure for most “big” applications is not going to be that important. And on the other hand, the space you’d be saving on the repository side is going to be wasted by the users on their hard drives (which are definitely going to be disproportionally smaller) and by the bandwidth used to push the data around (hey, if even Google is trying to reduce download sizes, FatELF is going not only against the status quo but also against the technical trend!).

And while I’m quite sure people are going to say that, once again, disk space is cheap nowadays, and thus throwing more disks at the problem is going to fix it, there is one place where it’s quite difficult to throw more space at it: CDs and DVDs, which is actually one of the things that FatELF proposes to make easier, probably in light of users not knowing whether their architecture is x86, amd64 or whatever else. Well, this has already been tackled by projects such as SysRescueCD, which provide two kernels and a single userland for the two architectures, given that x86-64 can run x86 code.

The benefits listed on FatELF’s page also seem to focus somewhat on the transition between one arch and another, like what’s now happening between x86 and x86-64; sure, it looks like a big transition, and quite a few players in the market are striving to do their best to make the thing as smooth as possible, but either we start thinking of the new x86-64 as the arch, and keep x86 as legacy, or we’re going to get stuck in a transition state forever. Universal Binaries played a fundamental role for Apple in what was a temporary transition, and one they actually completed quite fast: Snow Leopard no longer supports PPC systems, and everybody expects the next iteration (10.7) to drop support for 32-bit Intel processors entirely to make the best use of the new 64-bit capabilities. Sure, there could be better handling of transitioning between architectures in Linux as well, especially for people migrating from one system to the other, but given the way distributions work, it’s much easier for a new install to pick up the home directories set up on the older system, import the configuration, and then install the same packages that were installed on the previous one.

After all, FatELF is a trade-off: you trade bigger binaries for almost-universal compatibility. But is space the only problem at stake here? Not at all; to support something like FatELF you need changes at a high number of layers; the project page itself shows that changes were needed in the Linux kernel, the C library (glibc only, but Linux supports uClibc as well), binutils, gdb, elfutils and so on. For interpreted-language bindings you also have to count in changing the way Ruby, Python, Java and the others load their libraries, since they now hardcode the architecture information in the path.

Now, let’s get to the only benefit on that page really worth discussing:

A download that is largely data and not executable code, such as a large video game, doesn’t need to use disproportionate amounts of disk space and bandwidth to supply builds for multiple architectures. Just supply one, with a slightly larger binary with the otherwise unchanged hundreds of megabytes of data.

You might or might not know that icculus.org, where the FatELF project is hosted, is the home of the Linux ports of Quake and other similar games, so this is likely the only real problem that has, up to now, actually come up: having big packages for multiple arches that consist mostly of shareable data. As said before, distributions already have architecture-independent packages most of the time; it’s also not uncommon for games to separate the data from the engine itself, since the engine is much more likely to change than the data (and at the same time, if you use the source version you still need the same data as the binary version). The easiest solution is thus to detach the engine from the data and have the two downloaded separately; I wonder what the issue is with that.

On the other hand, there is a much easier way to handle all this: ship multiple separate ELF binaries in the same binary package, then add a simple sh script that calls the right one for the current host. This is quite easy to do, and requires no change in any of the previously-noted layers. Of course, there is another point made on the FatELF project page, that this does not work with libraries… but it’s really not that much of an issue, since the script can also set LD_LIBRARY_PATH to point to the correct path for the current architecture as well. Again, this would solve the same exact problem for vendors without requiring any change at all in the layers of the operating system. It’s transparent, it’s easy, it’s perfectly feasible.
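
A minimal sketch of such a wrapper, assuming a hypothetical package layout with per-architecture bin/ and lib/ subdirectories:

    #!/bin/sh
    # pick the right pre-built binary for the running host; the layout and
    # names are hypothetical: bin/amd64/mygame, bin/x86/mygame, lib/amd64, lib/x86
    base="$(dirname "$0")"
    case "$(uname -m)" in
        x86_64) arch=amd64 ;;
        i?86)   arch=x86 ;;
        *)      echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
    esac
    # point the loader at the private, architecture-specific libraries, then
    # replace this script with the matching binary
    export LD_LIBRARY_PATH="$base/lib/$arch${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    exec "$base/bin/$arch/mygame" "$@"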

I can already hear people complaining: “but a single FatELF file would be smaller than multiple ELF files!”. Not really. What you could share between the different ELF objects, in theory, is still metadata only (and I’m not convinced from the project page alone that that’s what it’s going to do; it seems to me like it’s a sheer bundling of files together): SONAME, NEEDED entries and the like. That is, unless you also start bundling different operating systems together – which is what the project also seems to hint at – because in that case you also have no guarantee that the metadata is going to be the same: the same code will require different libraries depending on the operating system it’s built for.

Generally, an ELF file is composed of executable code, data, metadata related to the ELF file itself, and then metadata related to the executable code (symbol tables, debugging information) and metadata related to the data (relocations). You can barely share the file’s metadata between architectures, and you definitely cannot share it between operating systems, as stated above (different SONAME rules, different NEEDED entries).

You could share string data, since that actually is the same between different architectures and operating systems most of the time, but that’s not really a good enough reason; you cannot share constant data, because there are different orderings, different sizes and different paddings across architectures, even between two as alike as x86 and x86-64 (which is why it’s basically impossible to have inter-ABI calls!).

You cannot share debugging information either (which might well be the biggest part of an ELF file) because it’s tied to the offsets of the executable code, and the same applies to the symbol tables.

So, bottom line: since there are quite a few straw-man benefits on the FatELF project page, here is a list of problems caused by that approach:

  • it introduces a non-trivial amount of new code at various layers of the system (kernel, loader, linker, compiler, debugger, language interpreters, …); it doesn’t matter that a lot of that code has already been published by now, it has to be maintained long-term as well, and this introduces a huge amount of complexity;
  • it would dramatically increase the download size of packages in the optimistic case (a single architecture throughout a household or organisation), since each package would comprise multiple architectures at once;
  • it would use up more space on disk, since each executable and library would be duplicated entirely multiple times; note that at the time Universal Binaries started popping up on systems, more than one piece of software was released to strip the other architecture out of them, to reduce the space wasted on already-ported or won’t-be-ported systems; while FatELF obviously comes with such utilities itself, I’m pretty sure most tech-savvy users would then simply decide to strip off the architectures that are useless to them;
  • it would require non-trivial cross-compilation on build servers, which right now all the distributions, as far as I know, tend to avoid.

In general, distributions are definitely never going to want to use this; free software projects would probably employ their time better by making sure their software is easily available in distributions (which often means they should talk to distributors to make sure their software has a usable build system and runtime configuration); proprietary software vendors might be interested in something like that – if they are insane or know nothing about ELF, that is – but even then the whole stack of changes needed is way out of proportion to the advantages.

So I’m sorry if Ryan feels bad about contributing to other projects now because people turned down his idea, but maybe he should try for once to get out of his little world and see how things work when other projects are involved: discussing stuff first, asking around, and proposing. People would have turned him down with probably most of the same arguments I used here today, without him having to spend time writing unused (and unusable) code.

Some details about our old friends the .la files

Today, I’m going to write about the so-called “libtool archives” that you might have read about in posts like What about those .la files? or Again about .la files (or why should they be killed off sooner rather than later) (anybody picking up the faint and vague citation here is enough of a TV geek).

Before starting, I’m going to say that I haven’t been reading any public Gentoo mailing list lately, which means that if I bring up points that have been brought up already, I don’t care. I’m just writing this for the sake of it, and because Jorge asked me for some clarification about what I wrote some longish time ago. Indeed, the first post is almost exactly one year old. I do know that there has been more discussion about whether these files are needed for ebuild-provided stuff, so I’m going to try to be as clear as possible.

The first problem is to identify what these files are: they are simple text files, and they provide metadata about a library, or a pair of static and shared libraries; this metadata includes some obvious and non-obvious stuff, like the names of the two types of libraries, the formal name (soname) of the shared library that can be used with dlopen(), and a few more things, including the directory the library is to be found in. The one piece of data that creates a problem for us, though, is the list of dependency libraries that need to be linked in when linking against this library, but I’ll come back to that later. Just please note that it’s there.
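
Just to give an idea of what we’re talking about, here is a trimmed-down, purely illustrative libtool archive for a hypothetical libfoo:

    # libfoo.la - a libtool library file (trimmed down, illustrative only)
    dlname='libfoo.so.1'
    library_names='libfoo.so.1.2.3 libfoo.so.1 libfoo.so'
    old_library='libfoo.a'
    # the piece of data discussed below: what to pull in when linking
    # against this library
    dependency_libs=' -L/usr/lib /usr/lib/libbar.la -lz'
    libdir='/usr/lib'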

Let’s go on with what these files are used for: libtool generates them, and libtool consumes them; they are used when linking with libtool to set the proper runpath if needed (by checking the library’s directory), to choose between the static and shared library version, and to bring in further dependency libraries. This latter point is controversial and is our main issue here: older operating systems had no way to define dependencies between libraries of any kind, and even nowadays on Linux we have no way to define the dependencies of static libraries (archives). So the dependency information is needed when linking statically; unfortunately libtool does not ignore the dependencies for shared objects, which can manage their dependencies just fine (through the DT_NEEDED tag present in the .dynamic section), and it pollutes the link line, causing problems that can be solved by --as-needed.

Another use for libtool archive files is to know how to load or preload modules (plugins). The idea is that when the operating system provides no way to dynamically load and link further shared objects (that is, a dlopen()-like interface), libtool can simulate that facility by linking the static modules together, using the libltdl library instead. Please note, though, that you don’t need the libtool archives to use libltdl if the operating system does provide dynamic loading and linking.

This is all fine and dandy in theory, but what about practice? Do we need the .la files installed by ebuilds? And, slightly related (as we’ll see later), do we need the static libraries? Unsurprisingly, the answer comes from the developer’s mantra: there is no answer, it depends.

To simplify my discussion here, I’m going to reduce the scope to the two official (or semi-official) operating systems supported by Gentoo: Linux and FreeBSD. The situation is likely to be different for some of the operating systems supported by Gentoo/Prefix, and while I don’t want to diminish their importance, the picture would become much more complicated to explain by adding them in.

Both Linux and FreeBSD use ELF (which stands exactly for “Executable and Linkable Format”) as their primary executable and linkable format; they both support shared objects (libraries), the dlopen() interface, the DT_NEEDED tag, and they both use ar flat archives for static libraries. Most importantly, they use (for now, since FreeBSD is working toward changing this) the same toolchain for compiling and linking: GCC and GNU binutils.

In this situation, the libltdl “fake dlopen()” is sincerely a huge waste of time, and almost nobody uses it; which means that most people won’t want to use the .la files to open plugins (with the exception of KDE 3, that is), which makes installing the libtool archives of, say, PulseAudio’s plugins, pointless. Since most software is likely not to use libltdl in the first place, like xine or PAM to cite two that I somehow maintain, their plugins also don’t need the libtool archive files to be installed. I have already reported some rogue PAM modules that install pointless .la files (even worse, in the root filesystem). The rule of thumb here is that if the application loads its plugins with the standard dlopen() instead of libltdl (or a hacked libltdl, as is the case for KDE 3), the libtool archives for these plugins are futile; this, as far as I know, includes glib’s GModule support (and you can see by running ls /usr/lib*/gnome-*/*.la that there are some installed for probably no good reason).

But this only describes what to do with the libtool archive files for plugins (modules), not with those for libraries; with libraries the situation is a bit more complicated, but not too much, since the rule is even simpler: you can drop all the libtool archives for libraries that are only ever shared (no static archive provided), and for those that have no dependencies other than the C library itself (after making sure they don’t simply forget to link in their dependencies). In those cases, the static library is enough to be listed by itself, and you don’t need any extra file to tell you what else needs to be linked in. This already takes care of quite a few libtool files: grepping for the string “dependency_libs=''” (which is present in the archives of libraries that have no further dependencies beyond the C library) finds 62 files on my system.
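
A quick, hedged way to list those candidates on a live system:

    # find the libtool archives whose dependency list is empty, and which are
    # therefore not telling the linker anything it couldn't figure out alone
    find /usr/lib* -name '*.la' | xargs grep -l "dependency_libs=''"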

There is another issue that was brought up last year: libraries whose official discovery method is a -config script, or pkg-config; these could do without providing dependencies for the static variant in the .la file, since they provide them themselves. Unfortunately this has two nasty issues: the first is that most likely somebody is not using the correct scripts to find the stuff; I read a blog post a week or two ago by a developer disgruntled because pkg-config was used for a library that didn’t provide it, who suggested not using pkg-config at all (which is quite silly, actually). The other problem is that while pkg-config does provide a --static parameter to use different dependency lists between shared and static linking of a library (to avoid polluting the link line), I know of no way to tell autoconf to use that option during discovery at all. There is also something to be said for binutils just implementing an extension to static archives that can provide the needed dependency data, but that’s beside the point now, I guess.
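
For the record, this is the difference I’m talking about, shown with a hypothetical foo module:

    pkg-config --libs foo           # shared link line, e.g. -lfoo
    pkg-config --libs --static foo  # static link line, e.g. -lfoo -lbar -lz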

So let’s sidestep this issue for now and return to the three known cases where we can assert with relative certainty that the libtool archives are unneeded: non-ltdl-loaded plugins (xine, PAM, GModule, …), libraries with no dependencies other than the C library, and libraries that only install shared objects. While the first two are pretty obvious, there is something else to say about the last one.

By Gentoo policy we’re supposed to always install both the static and shared versions of a library; unless, that is, upstream doesn’t think so. The reason for this policy is that static linking is preferred for some mission-critical software that might not allow the system to boot up if the library somehow breaks (think bash and libreadline), and because sometimes, well, you just have to leave the user the option of statically linking stuff. There have been proposals for a USE flag to allow enabling/disabling the build of static libraries, but that’s nowhere in use yet; one of the problems was linking up the static USE flag with the static-libs USE flag of its dependencies; EAPI 2 USE dependencies can solve that just fine. There are, though, a few cases where you might be better off not providing a static library at all, even if upstream doesn’t say anything outright about it, since most likely they never cared.

This is the case, for instance, of libraries that use, in turn, the dlopen() interface to load their plugins (static linking with those can produce nasty results); for instance you won’t find a static library for Linux-PAM. There are a few more cases where having static libraries is not recommended at all, and we might actually decide to take them out entirely, with due caution. In those cases you can remove the libtool archive file as well, since shared objects do take care of themselves.

Now, case in point, Peter took lots of flames for changing libpcre; there are mixed flames related, on one side, to him removing the libtool archive, and on the other to him removing the static library. I haven’t been part of the flames in any way, because I’m still minding my own health first of all (is there any point in having a sick me not working on Gentoo?), yet here is my opinion: Peter made one mistake, and that was to unconditionally remove the static library. Funnily enough, what most people probably shouted at him for is the removal of the libtool archive, which is just nothing useful since, as you can guess, the library has no further dependencies besides the C library (it’s different for libpcreposix, though).

My suggestion at this point is for someone to actually finish cleaning up the script that I think I posted to the mailing lists some time ago, and that can almost certainly be recreated quite quickly, which takes care of fixing the libtool archive files on the system without requiring a full rebuild of everything. Or, even better, to get a post-processing task for Portage that replaces the direct references to libtool archives in newly-installed libtool archives with generic library references (so that /usr/lib/libpcre.la would become -lpcre straight away, and so on for all libraries; the result would be that breakage from future libtool archive removals wouldn’t exist at all).

Security considerations: scanning for bundled libraries

My fight against bundled libraries might soon transcend the implementation limits of my ruby-elf script.

The script I’ve been using to find bundled libraries up to now was not originally designed with that in mind; the idea was to identify colliding symbols between different object files, so as to identify failure cases like xine’s AAC decoder, hopefully before they become a nuisance to users, as happened with PHP. Unfortunately the amount of data generated by the script because of bundled libraries makes them tremendously difficult to deal with in advance, so it can currently only be used as a post-mortem.

But as a security tool, I already stated it’s not enough, because it only checks for symbols that are exported by shared objects (and, often mistakenly, by executables). To actually go deeper, one would have to look at one of two options: the .symtab entries in the ELF files (which are stripped out before installing), or the data that the compiler emits for each output file in the form of DWARF sections with the -g flags. The former can be done with the same tools I’ve been using up to now; the latter you can list with pfunct from dev-util/dwarves. Trust me, though: if the current database of optimistically suppressed symbols is difficult to deal with, doing a search over DWARF functions is likely to be unmanageable, at least with the same algorithm that I’m using for the collision detection script.
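
As a quick sketch of the two options, assuming an unstripped build of a hypothetical libfoo:

    nm --defined-only /usr/lib/libfoo.so.1 | grep ' [Tt] '   # functions from .symtab
    pfunct /usr/lib/libfoo.so.1                              # functions from the DWARF data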

Being able to identify bundled libraries in that kind of output is going to be tremendously tricky; if my collision detection script already finds collisions between executables like those from the MySQL (well, before Jorge’s fixes at least) and Samba packages, because they don’t use internal shared libraries, running it against the internal symbol list is going to be even worse, because it would then find equally-named internal functions (usage() anybody?), statically-linked libraries (including system support libraries) and so on.

So there is little hope of tackling the issue this way; which makes the idea of finding all the bundled libraries in a system beforehand an inhuman task; on the other hand, that doesn’t mean I have to give up on the idea. We can still make use of that data to do some kind of post-mortem, once again, with some tricks though. When it comes to vulnerabilities, you usually have a function, or a series of functions, that are involved; depending on how central the functions are to a library, there will be more or fewer applications using the vulnerable code path; while it’s not extremely difficult to track them down when the function is a direct API (just look for software having external references to that symbol), it’s quite another story with internal convenience functions, since they are called indirectly. For this reason, while some advisories do report the problematic symbols, most of the time the thing is just ignored.

We can, though, use that particular piece of information to track down extra vulnerable software that bundles the code. I’ve been doing that on request for Robert a couple of times with the data produced by the collision detection scripts, but unfortunately it doesn’t help much, because it too is only able to check externally-defined API, just like a search for its use would. How to solve the problem? Well, I could just not strip the files and read the data from .symtab to see whether the function is defined, and this might actually be what I’m going to do soonish; unfortunately this creates a couple of issues that need to be taken care of.

The first is that the debug data is not exactly small; the second is that the chroots volume is under RAID1, so space is a concern: it’s already 100GB big with just 10% of it free, and if I am not to strip data, it’s going to require even more space; I can probably split some of the data out of the volume into a throwable chroots volume that I don’t have to keep on RAID1. If I split out the debug data with the splitdebug feature, it would be quite easy to deal with.

Unfortunately this brings me to the second problem, or rather the second set of problems: ruby-elf does not currently support the debuglink facilities, but that’s easy to implement; after all it’s just a note section with the name of the debug file. The second problem is nastier, and relates to the fact that the debuglink section created by Portage lists the basename of the file with the debug information, which is basically the same name as the original with a .debug suffix. The reason why this is not simply left implied is that if you look up the debuglink for libfoo.so you’ll see the real name might be libfoo.so.2.4.6.debug; on the other hand it’s far from trivial, since it still leaves something implied: the path to find the file in. By default all tools will look at the same path as the executable file, prepending /usr/lib/debug to it. All well as long as there are no symlinks in the path, but if there are (like on multilib AMD64 systems), it starts to be a problem: accessing a shared object via /usr/lib/libfoo.so will try a read of /usr/lib/debug/usr/lib/libfoo.so.2.4.6.debug, which will not exist (it would be /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug). I have to check whether it’s feasible to use a fully canonicalised path for the debuglink; on the other hand, that would assume that the root for the file is the same as the root of the system, which might not be the case. The third option is to use a debugroot-relative path, so that the debuglink would look like usr/lib64/libfoo.so.2.4.6.debug; unfortunately I have no clue how gdb and company would take a debuglink like that, and I’d have to check.
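
For reference, this is the usual split-debug workflow, sketched on the hypothetical libfoo.so.2.4.6 from above; note how the debuglink stores just the file name, not the path:

    objcopy --only-keep-debug libfoo.so.2.4.6 libfoo.so.2.4.6.debug
    objcopy --strip-debug libfoo.so.2.4.6
    objcopy --add-gnu-debuglink=libfoo.so.2.4.6.debug libfoo.so.2.4.6
    readelf --string-dump=.gnu_debuglink libfoo.so.2.4.6   # shows the bare file name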

The problems do not stop here, though. Since packages collide with one another when they try to install files with the same name (even when they are not really alternatives), I cannot rely on having all the packages installed in the tinderbox at once, which makes analysing the symbol collisions dataset even harder. So I should at least scan the data before the merge on the livefs is done, load it into a database indexed on a per-package, per-slot basis, and then search that data to identify the problems. Not an easy or quick solution.
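
One way this could be hooked up, assuming an /etc/portage/bashrc phase hook is acceptable; the output format is made up, and a real implementation would feed a proper database rather than a flat file in the temporary directory:

# hypothetical hook: dump the defined functions of every shared object in
# the image before it is merged to the livefs, tagged by package and slot
post_src_install() {
    find "${D}" -type f -name '*.so*' | while read -r obj; do
        readelf -sW "${obj}" 2>/dev/null | \
            awk -v tag="${CATEGORY}/${PN}:${SLOT}" -v obj="${obj#${D}}" \
                '$4 == "FUNC" && $7 != "UND" { print tag "\t" obj "\t" $8 }'
    done > "${T}/defined-symbols.txt"
}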

And it wouldn’t be a complete solution either, to be honest: the .symtab method will not show the symbols that are never emitted, such as inlined functions; while we do want the unused symbols to be cut out, we still need the names of static inlined functions, because if a vulnerability is found in one of them, it still has to be flagged. I should check whether DWARF data is at least emitted for those, but I wouldn’t be surprised if it wasn’t either. And it does not cope at all with renamed symbols, or with copied code… So there is still a long way to go before we can actually reassure users that all security issues are tracked down when found (and this is not limited to Gentoo, remember; Gentoo is the base tool I use to tackle the task, but the same problems affect basically every distribution out there).
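
Checking whether the DWARF data keeps track of the inlined copies would be something along these lines (same made-up names as before):

# inlined copies, if recorded at all, appear as DW_TAG_inlined_subroutine
# entries referring back to the original declaration
readelf --debug-dump=info /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug | \
    grep -A3 'DW_TAG_inlined_subroutine'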

Plugins should not go in /usr/lib

Although most of the software in Portage already follows the correct rule of not putting its plugins in /usr/lib or equivalent directories, during my collision detection analysis I did find a few packages that instead pollute the standard library path with their plugins.

In general, I see way too much pollution in the library path, and that is quite bad, since the loader has to iterate through all of it whenever it needs to find a library that is not in the cache, or whenever environment variables force it to look for libraries in a different order. The linker (which is quite slow already) also needs to look there to find the libraries to link to, which means more and more work every time, just to scan everything.

For this reason, the number of files in /usr/lib that don’t need to be there should really be reduced drastically. This means, for instance, not installing shell scripts there, like quite a bit of software does, and especially not installing plugins, since those get picked up by the ldconfig utility and written into a cache file that should really be kept as small as possible.
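
Two quick checks give an idea of the scale of the problem on any given system; nothing scientific, just a rough measure:

# how many entries the loader cache has to carry around
ldconfig -p | wc -l
# and how much in the library path is not a shared object to begin with
file /usr/lib64/* 2>/dev/null | grep -v 'ELF\|symbolic link\|directory'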

It’s not just that, though; there are more problems with software installing plugins. For instance, I noticed that in the PAM modules directory /lib/security there is one plugin with a fully-versioned name (that is, with a final .0.0, which PAM modules don’t need) and another that uses the lib prefix (also unneeded); both were likely built with libtool by somebody who doesn’t know how to properly build plugins.

I really should write yet another test to make sure that packages get installed following the usual layout; I just don’t know how, for now. It would be a nice way to identify “rogue” packages that install outside of the standard-defined directories.
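
A very rough first cut of such a test, limited to the PAM case above, could be as simple as this; a proper QA check would obviously need to know about each plugin directory and its naming rules:

# flag PAM modules that are named like ordinary libraries: either carrying
# a version suffix or the lib prefix, neither of which a module needs
find /lib/security /lib64/security -maxdepth 1 \
     \( -name 'lib*' -o -name '*.so.*' \) 2>/dev/null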

In general, whenever you want to install plugins, you should do so by putting them in a separate directory inside /usr/lib, usually pkglibdir when using automake (which defaults to /usr/lib/PACKAGE_NAME). This also allows a much simpler approach to plugin loading. And if you’re not really allowing plug-ins to be added beyond the ones compiled at build time, then you should probably not use plugins at all but rather build them in.

I guess there is more work for my tinderbox now…

Knowing your loader, or why you shouldn’t misuse env.d files

I’m not sure why it happens, but almost every time I’m working on something, I find something different that diverts my attention, providing me with more insight into what is going wrong with software all over. This is why my TODO list keeps growing rather than ever stopping for a while.

Today, while working on my dependency checker script (which for now does not even exist, but for which I added a few more functions to Ruby-Elf; you can check them out if you wish), I started noticing some interesting problems with the library search path of my tinderbox chroot.

The problem is that I have 47 entries in the /etc/ld.so.conf file, which means 47 paths that need to be scanned, plus two more entries created by the env.d files as LD_LIBRARY_PATH; and yet I still find, through the Portage bash trigger, bundled libraries that my collision detection script misses. This made me realise there is more to it than I had been seeing up to now, and got me to dig deeper.
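
Counting is easy enough, since on Gentoo the file is generated by env-update from the env.d entries:

# directories the loader will scan when the cache does not help
grep -c '^/' /etc/ld.so.conf
# env.d files feeding paths into it, and those abusing LD_LIBRARY_PATH
grep -l 'LDPATH' /etc/env.d/* | wc -l
grep -l 'LD_LIBRARY_PATH' /etc/env.d/* | wc -l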

So I checked out some of the entries in the ld.so.conf file. Some of them are due to multiple slots of the same library being installed for different library versions, like QCA does. This is one decent way to handle the problem: put the two libraries, differing just by soname, in different directories, and pass the proper -L flag to get one or the other. A much saner alternative would be to ensure that two libraries with different APIs get different names, just like glib 2.0 gets a libglib-2.0.so name, but that’s a different point altogether.
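
To put it in commands, the two approaches compare more or less like this; the directory and library names are illustrative, not the exact ones the ebuilds use:

# same library name in two slots, kept apart by directory and chosen
# with -L at build time
gcc -o player player.c -L/usr/lib64/qca-2 -lqca
# versus baking the API version into the name, so both generations can
# coexist in the default search path
gcc -o viewer viewer.c -lglib-2.0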

Some other entries are more worrisome. For instance, the cuda packages install all their libraries in /opt/cuda/lib, but the profiler also installs Qt4 libraries there, causing them to appear in the system library list, albeit with lower priority than the copies installed by the Qt ebuilds. Similar things happen with the binary packages of Firefox and Thunderbird, since I have the two of them, together with xulrunner, adding their library paths to the system library path.

Instead of doing this, a few packages install their libraries outside the library path and then use a launcher script, whose main use is setting the LD_LIBRARY_PATH variable to make sure the loader can find their libraries. This is what, for instance, /usr/bin/kurso does. Indeed, if you start the kurso (closed-source) executable directly you’re presented with a failure:

# /opt/kurso/bin/kurso3
/opt/kurso/bin/kurso3: symbol lookup error: /opt/kurso/bin/kurso3: undefined symbol: initPAnsiStrings

The symbol in question is defined in the libborqt library as provided by Kurso itself, which is loaded through dlopen() (and thus appears in neither ldd nor scanelf -n output, just to remind you of that problem). I could now write a bit about the amount of software written using Kylix and the number of copies of libborqt on the system, each with its own copy of libpng and zlib, but that’s beside the point for now. The point is that the program indeed does not work when started directly, and thus a wrapper file is created.
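
You can verify that easily enough: neither tool has anything to report about libborqt, even though the failure above is clearly caused by it.

# NEEDED entries only list the libraries named at link time;
# anything pulled in through dlopen() is invisible here
scanelf -n /opt/kurso/bin/kurso3
ldd /opt/kurso/bin/kurso3 | grep -i borqt    # no output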

These wrappers are a necessary evil for proprietary closed-source software, but they are quite obnoxious when I see them used for Free Software that just gets built improperly; that’s the case for both the binary packages of Firefox and friends and the source packages, since my current /usr/bin/firefox contains this:

#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"

This has a minor evil in it (it overwrites my current LD_LIBRARY_PATH settings) and a nastier one: every program it spawns will inherit the same value, unless it is unset later on. In general this is not a tremendously bright idea, since we do have a method, the use of DT_RUNPATH in ELF files, that allows us to tell an executable to find its libraries in a directory that is not in the path. This is tremendously useful, and should really be used much more, since setting LD_LIBRARY_PATH also means skipping over the loader cache for all the processes involved, rather than just for the (direct) dependencies of a single binary. You can see that this has been a problem for Sun too.
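
For the curious, getting the same effect without a wrapper is a matter of a couple of linker flags at build time; this is a sketch, not Mozilla’s actual build system:

# bake the private directory into the binary as DT_RUNPATH;
# --enable-new-dtags asks for DT_RUNPATH rather than the older DT_RPATH
gcc -o firefox-bin $OBJECTS \
    -Wl,-rpath,/usr/lib64/mozilla-firefox -Wl,--enable-new-dtags
# and verify it took
readelf -d firefox-bin | grep RUNPATH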

For what it’s worth, the OpenSolaris linker has, for two years now, provided special padding space to allow changing the runpath of executable files you don’t have access to (like the kurso binary above). It’s too bad that we have to deal with software lacking such padding, because it would be quite useful here too.
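
On Linux the closest we get is chrpath, which can rewrite an existing DT_RPATH in place but cannot add one, nor make it longer than the original string; that is exactly the limitation the reserved padding works around. The target path here is hypothetical:

# list the current rpath, if any, then try to point it elsewhere;
# this only works if the binary was linked with an rpath to begin with
chrpath -l /opt/kurso/bin/kurso3
chrpath -r /opt/kurso/lib /opt/kurso/bin/kurso3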

Is this all? Not really, no, since you also get env.d entries for packages like quagga, which seem to install some libraries in a sub-directory of the standard library directory and then add that sub-directory to the set of scanned directories. I haven’t investigated it much yet; this blog post was intended just as a warning call, and I’ll probably work further on this to make it behave better. But in general, if your package installs internal “core” libraries that you rightly don’t want in the standard library path (to avoid polluting the search path), you should not add their directory to ld.so.conf through environment files, but should rather use the -rpath option at link time to tell the executables where to find them.

For developers: if your package does something funny with its libraries and you’re not sure you’re handling it properly, may I just ask you to please tell me about it? I’d be glad to look into it, and eventually post a full explanation of how it works on the blog so that it can be used as a reference in the future. I also have a few very nasty issues to blog about regarding Firefox and xulrunner, but I’ll save those for another time.