ELF should rather be on a diet

I was first linked to the FatELF project in late October by our very own solar; I wanted to write some commentary about it but couldn’t find the time; today the news is that the author gave up on it after both Linux kernel and glibc developers dismissed his idea. The post where he announced his intention to discontinue the project is one drama-queen of a post regarding the idea of contributing to other projects… I say that because, well, it’s always going to end this way if you come up with an idea, don’t discuss it before implementing it, and then feel angry when the rejection comes. I’m pretty sure none of the rejection was personal, and I can tell you that what I would have written after reading about it the first time would have been “Nice Proof of Concept, but it’s not going to fly”.

Let’s first introduce the idea behind the project: to copy Apple’s “Universal Binaries”, the technique that allowed programs to run both on PPC-based Macs as well as the new Intel-based Macs when they decided to make the transition, this time applying the same principle to the ELF files that are used on basically all modern UNIX and Unix-like systems (Linux, *BSD, Solaris). There is a strange list of benefits on the project’s homepage; I say strange because they really seem like straw-man arguments for creating FatELF, since I have rarely seen them apply in the real world.

Let’s be clear, when Ulrich Drepper (who’s definitely not the most charming developer in our community) says this:

Yes. It is a “solution” which adds costs in many, many places for a problem that doesn’t exist. I don’t see why people even spend a second thinking about this.

I don’t agree that nobody should have spent a second thinking about the idea; toying with ideas, even silly ideas like this one (because as you’ll soon see, this is a silly idea), is always worth it: it gives you an idea of how stuff works; they might actually lead somewhere, or they might simply give you a sense of proportion as to why they don’t work. But there are things to consider when doing stuff like this, and the first is that if there is a status quo, it might be worth discussing the reasons for that status quo before going into a full sprint and spending a huge amount of time implementing something, as the chance that it’s just not going to work is quite high.

To give an example of another status quo-fiddling idea, you might remember Michael Meeks’s direct bindings for ELF files; the idea was definitely interesting, and it proved quite fast as well, but it didn’t lead anywhere; Michael, and others including me, “wasted” time testing it out, even though it was later blocked by Drepper with enough reasons, and it’s no longer worked on. Let me qualify that “wasted” though: it was wasted only from the point of view of that particular feature, which led nowhere; but that particular work was what actually made me learn how the two linkers work together, and got me interested in problems of visibility and copy-on-write, as well as finding one xine bug that would have been absolute voodoo to me if I hadn’t spent time learning about symbol resolution first.

Back to FatELF now: why do I think the idea is silly? Why do I agree with Drepper that it’s a solution with too high a cost for unrequested results? Well, the first point to make is when Apple took the first step toward universal binaries; if you think the idea sprouted during the PPC to Intel transition, you’re wrong. As Wikipedia notes, Apple’s first fat binary implementation dates back to 1994, during the M68K to PPC transition. Replicating the same procedure for an architecture change wasn’t extremely difficult for them to begin with, even though it wasn’t OS X that was used during that particular transition. The other fact is that the first Intel transition was – for their good or bad – a temporary one. As you have probably noted, they are now transitioning from i386 software to x86-64 software (after my post on PIE you can probably guess why that’s definitely important to them).

But it goes much further than that: Apple has a long history of allowing users to carry the content of their computer from one machine to the next with each update, and at the same time they have a lot of third parties providing software; since third parties started upgrading to universal binaries before Intel Macs were released to users, if users kept up to date with the releases, once they got their new Intel Mac they just had to copy the content from the old system to the new one and be done with it. This is definitely due to the target audience of Apple.

There is another thing to know about Apple and OS X, which you might not know about if you’ve never used a Mac: applications are distributed in bundles, which are nothing more than a directory structure inside which the actual binary is hidden; inside the bundle you find all the resources that are needed for the program to run (translations, pictures, help files, generic data files, and so on). To copy an application you only have to copy the bundle; to remove almost any application you just shove the bundle in the trash can. This forces distribution to happen in bundles as well, which is why Universal Binaries were so important to Apple: the same bundle had to work for everybody, so that it could be copied identically from one computer to another and work no matter the architecture. This is also why, comparing the sizes of bundles built Universal, PPC-only and Intel-only, the first is not as big as the other two combined: all the external resources are shared.
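As a rough sketch (Foo.app and the file names below are illustrative, not taken from any real application), a bundle is just a directory tree, with only one small piece of it being architecture-specific:

```
Foo.app/
  Contents/
    Info.plist        metadata: bundle identifier, executable name
    MacOS/
      Foo             the actual (possibly Universal) binary
    Resources/
      Foo.icns        icons, pictures, help files…
      en.lproj/       translations, shared by all architectures
```

Everything under Resources/ is identical no matter the CPU; only Contents/MacOS/Foo grows when built Universal, which is why the Universal bundle is smaller than the two single-architecture bundles put together.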

So back to Linux, to see how this applies: with a single notable exception, all the Linux distributions out there use a more or less standard Filesystem Hierarchy Standard-compatible layout (some use an LSB-compatible layout; the two are not one and the same, but the whole idea is definitely similar). In such a setup there are no bundles, and the executable code is already separated from the data that is not architecture-dependent (/usr/share) and thus shareable. So the only parts that cannot be shared between architectures – the parts FatELF would allow to merge – are the executable code paths, like /bin and /lib.

Now let’s start with understanding where the whole idea is going to be applied: first of all, Linux distributions, by their own design, have a central repository for software, which OS X does not; and that central repository can be set up at installation time to fetch the correct version of the software, without requiring the user to know the architecture at all. The idea of using fat binaries to reduce the size of that repository is moot: the shareable data is already, for most distributions I know, shipped in noarch (architecture-independent) packages; the only thing you’d be able to save would be the metadata of packages, which I’m quite sure for most “big” applications is not going to be that significant. And on the other hand, the space you’d be saving on the repository side is going to be wasted by users on their hard drives (which are definitely going to be disproportionally smaller) and by the bandwidth used to push the data around (hey, if even Google is trying to reduce download sizes, FatELF is going not only against the status quo but also against the technical trend!).

And while I’m quite sure people are going to say that, once again, disk space is cheap nowadays, and thus throwing more disks at the problem is going to fix it, there is one place where it’s quite difficult to throw more space at it: CDs and DVDs, which is actually one of the things that FatELF proposes to make easier, probably in light of users not knowing whether their architecture is x86, amd64 or whatever else. Well, this has already been tackled by projects such as SysRescueCD that provide two kernels and a single userland for the two architectures, given that x86-64 can run x86 code.

The benefits listed on FatELF’s page also seem to focus somewhat on the transition between one arch and the other, like the one now happening between x86 and x86-64; sure, it looks like a big transition, and quite a few players in the market are striving to make the thing as smooth as possible, but either we start thinking of the new x86-64 as the arch, and keep x86 as legacy, or we’re going to be stuck in a transition state forever. Universal Binaries played a fundamental role for Apple in what was a temporary transition, and one they actually completed quite fast: Snow Leopard no longer supports PPC systems, and everybody expects the next iteration (10.7) to drop support for 32-bit Intel processors entirely to make the best use of the new 64-bit capabilities. Sure, there could be some better handling of transitioning between architectures in Linux as well, especially for people migrating from one system to the other; but given the way distributions work, it’s much easier for a new install to pick up the home directories set up on the older system, import the configuration, and then install the same packages that were installed on the previous one.

After all, FatELF is a trade-off: you trade bigger binaries for almost-universal compatibility. But is space the only problem at stake here? Not at all; to support something like FatELF you need changes at a high number of layers; the project page itself shows that changes were needed in the Linux kernel, the C library (glibc only, but Linux supports uClibc as well), binutils, gdb, elfutils and so on. For interpreted-language bindings you also have to count changes to the way Ruby, Python, Java and the others load their libraries, since they currently hardcode the architecture information in the path.

Now, let’s get to the only really defensible benefit on that page:

A download that is largely data and not executable code, such as a large video game, doesn’t need to use disproportionate amounts of disk space and bandwidth to supply builds for multiple architectures. Just supply one, with a slightly larger binary with the otherwise unchanged hundreds of megabytes of data.

You might or might not know that icculus.org, where the FatELF project is hosted, is the home of the Linux port of Quake and other similar games, so this is likely the only real problem that has, up to now, actually come up: having big packages for multiple arches that consist mostly of shareable data. As said before, distributions already have architecture-independent packages most of the time; it’s also not uncommon for games to separate the data from the engine itself, since the engine is much more likely to change than the data (and at the same time, if you use the source version you still need the same data as the binary version). The easiest solution is thus to detach the engine from the data and get the two downloaded separately; I wonder what the issue is with that.

On the other hand, there is a much easier way to handle all this: ship multiple separate ELF binaries in the same binary package, then add a simple sh script that calls the right one for the current host. This is quite easy to do, and requires no change at any of the previously-noted layers. Of course, there is another point made on the FatELF project page that this does not work with libraries… but it’s really not that much of an issue, since the script can also set LD_LIBRARY_PATH to point to the correct path for the current architecture. Again, this would solve the same exact problem for vendors without requiring any change at all in the layers of the operating system. It’s transparent, it’s easy, it’s perfectly feasible.
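A minimal sketch of such a launcher, shown as a shell function for clarity; the bin-&lt;arch&gt;/ and lib-&lt;arch&gt;/ layout and the foo name are hypothetical, made up for illustration:

```shell
#!/bin/sh
# dispatch_foo DIR [ARGS...]: run the copy of "foo" built for the
# current host architecture, out of a package unpacked under DIR.
# Assumed (hypothetical) layout:
#   DIR/bin-i686/foo  DIR/bin-x86_64/foo  DIR/lib-i686/  DIR/lib-x86_64/
dispatch_foo() {
    here=$1; shift
    arch=$(uname -m)

    # Normalize the aliases uname -m may report for 32-bit x86.
    case "$arch" in
        i?86) arch=i686 ;;
    esac

    if [ ! -x "$here/bin-$arch/foo" ]; then
        echo "no build of foo for architecture '$arch'" >&2
        return 1
    fi

    # Point the dynamic linker at the matching private libraries,
    # then run the right binary, passing any extra arguments along.
    LD_LIBRARY_PATH="$here/lib-$arch${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$here/bin-$arch/foo" "$@"
}
```

In a real package this would simply be the top level of the launcher script, with `exec` in place of the plain call; no kernel, linker or compiler change is involved.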

I can already hear people complaining, “but a single FatELF file would be smaller than multiple ELF files!”. Not really. What you can share between the different ELF objects, in theory, is still metadata only (and the project page alone doesn’t convince me that that’s what it does; it seems to me like it’s sheer bundling of files together): SONAME, NEEDED entries and the like. And that’s before you start bundling different operating systems together – which is what the project also seems to hint at – because in that case you have no guarantee that the metadata is going to be the same: the same code will require different libraries depending on the operating system it’s built for.

Generally, an ELF file is composed of executable code, data, metadata related to the ELF file itself, metadata related to the executable code (symbol tables, debugging information) and metadata related to the data (relocations). You can barely share the file’s metadata between architectures, and you definitely cannot share it between operating systems, as stated above (different SONAME rules, different NEEDED entries).
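Even the fixed-size ELF header is architecture-specific: one byte states whether the file is 32- or 64-bit, and the e_machine field names the CPU. A quick sketch using plain od(1), with /bin/sh as an example target (the offsets come from the ELF specification; the script assumes the header’s multi-byte fields are little-endian, which holds for x86 and x86-64):

```shell
#!/bin/sh
# Print the architecture fields encoded in an ELF file's header.
elf_arch() {
    # Byte 4 (0-based) is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')
    # Bytes 18-19 are e_machine (little-endian); the low byte is enough
    # to tell the common cases apart: 3 = EM_386, 62 = EM_X86_64.
    machine=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' ')
    case "$machine" in
        3)  echo "x86, class $class" ;;
        62) echo "x86-64, class $class" ;;
        *)  echo "machine $machine, class $class" ;;
    esac
}

elf_arch /bin/sh
```

A FatELF container has to carry one full copy of this header, and of everything that hangs off it, per architecture; the header is precisely what cannot be merged.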

You could share string data, since that’s actually the same between different architectures and operating systems most of the time, but that’s not really much of a gain; you cannot share constant data, because of different byte ordering, different sizes and different padding across architectures, even between two as alike as x86 and x86-64 (which is why it’s basically impossible to have inter-ABI calls!).

You cannot share debugging information either (which might well be the biggest part of an ELF file) because it’s tied to the offsets of the executable code, and the same applies to the symbol tables.

So, bottom line: since there are quite a few straw-man benefits on the FatELF project page, here is a list of problems caused by that approach:

  • it introduces a non-trivial amount of new code at various layers of the system (kernel, loader, linker, compiler, debugger, language interpreters, …); it doesn’t matter that a lot of that code is already published by now, it has to be maintained long-term as well, and this introduces a huge amount of complexity;
  • it would dramatically increase the download size of packages even in the optimistic case (a single architecture throughout a household or organisation), since each package would comprise multiple architectures at once;
  • it would use up more space on disk, since each executable and library would be duplicated entirely, multiple times; note that at the time Universal Binaries started popping up, more than one utility was released to strip the other architecture out of them and recover the space otherwise wasted on already-ported or never-to-be-ported software; while FatELF obviously comes with such utilities itself, I’m pretty sure most tech-savvy users would simply decide to strip off the architectures that are useless to them;
  • it would require non-trivial cross-compilation on the build servers, which right now all the distributions, as far as I know, tend to avoid.

In general, distributions are definitely never going to want to use this; free software projects would probably employ their time better by making sure their software is easily available in distributions (which often means talking to distributors to make sure the software has a usable build system and runtime configuration); proprietary software vendors might be interested in something like this – if they are insane or know nothing about ELF, that is – but even then the whole stack of changes needed is way disproportionate to the advantages.

So I’m sorry if Ryan feels bad about contributing to other projects now that people have turned down his idea, but maybe he should try, for once, to get out of his little world and see how things work when other projects are involved: discuss stuff first, ask around, propose. People would have turned him down with probably most of the same arguments I used here today, without him having to spend time writing unused (and unusable) code.

14 thoughts on “ELF should rather be on a diet”

  1. A couple points… First, not all apps on OS X are application bundles. Many of them are installed via a .pkg or .mpkg package file. The downside to these is that they are almost impossible to uninstall. Sometimes an uninstallation script (which may or may not work) is installed with the program; sometimes one is provided with the setup program (which people may not keep around); and there are some uninstall utilities that attempt to read the package installation receipt and remove everything, and which may or may not break your machine. I know you know this, but for someone not familiar with OS X, it’s important to understand that app bundles (and by extension FatELF) can’t be used in all circumstances anyway. Also, it’s worth noting that the Steam Dedicated Server provided by Valve for running dedicated Half-Life 2, Team Fortress 2, Left 4 Dead, et al. servers does exactly what you suggest — it contains multiple binaries, and it switches which one it loads based upon your architecture.

  2. The idea is not to have the distribution ship all of its libs as FatELF on the installed system – the installed system already knows what arch it’s going to be on and can use the package manager to pick just what it needs. Instead the idea is to use FatELF for those situations where we can’t know what arch we have ahead of time. A distribution could easily support FatELF without actually using it for its own libraries or binaries – instead third parties could provide the FatELF binaries. Now, what you say about having a script set LD_LIBRARY_PATH and pick the correct arch could have some merit; however, the smart thing would be for the _distribution_ to provide a standard method of doing this – even a standard name for the script itself that vendors could just call.

  3. Good post, thank you! I have to disagree with one point though: “first of all, Linux distributions, by their own design, have a central repository for software”. That’s one of the things I dislike the most when talking about Linux. You are actually very much right in what you’re saying: every distro has its software repository, but that can be a big problem once you need to install third-party software — less so for the user, but more for the one distributing that software. Many people have complained about how hard it is to distribute binary software on a distro-neutral basis. It’s just a fact that distributing binaries across distributions is extremely hard on Linux, mainly because the software is not built around this and it’s more an afterthought than anything else. I’m not saying that FatELF would solve these issues, but I think one needs to think about these people too. Having most software open source is a great thing, but it’s not always possible. Look, for instance, at what lengths the Matlab guys go to in order to be able to distribute their software on Linux: they almost come with their own set of system libraries. Ask yourselves why!

  4. You’re looking at the problem with the eyes of a distribution packager. From that point of view Ryan’s idea indeed doesn’t make any sense. However, try to put yourself into the role of an independent game developer. You want a “download here” button on your web site. If that link leads users to a site with dozens of download links to cover all possible combinations of distributions and architectures, they would be scared off. Even if you only offer i386 and x86_64 you’d need four to six links (2x rpm [+1 noarch] and 2x deb [+1 noarch]). Pointing people to a repo and including instructions on how to enable it in various distros is more than clumsy. The openSUSE 1-click framework isn’t widely supported, unfortunately; even if it was, you’d still need to be root to install that game. So unfortunately the most user-friendly way in this case is to provide a self-extracting executable. How to make that work with multiple architectures? *shudder*, right, a shell script: http://megastep.org/makeself/ that extracts, selects and executes an installer binary (http://icculus.org/loki_set… that is appended to the shell script. Of course web servers get that wrong, deliver text, corrupt the payload, newer shell script tools break the script part by enforcing weird POSIX rules etc. etc. Ryan tried to solve at least that part with a self-extracting zip archive with a binary stub: http://icculus.org/mojosetup/. Of course that one only works on one architecture. Place #1 where FatELF could help. Now consider the game developer who is scared off by all of this and just offers a zip file with binaries for all architectures included. Or the game can be run directly from CD, or the installer installs all architectures. You need some way to determine which binary to run now. Solution, of course: shell script. Like this one: http://svn.icculus.org/quak… Think back a few years: the x86_64 line wouldn’t be there, or would probably match for ‘amd64’. Many games are ‘fire and forget’ type of software, yet the user still wants to run them some years after release. The shell script would break today, and the novice user doesn’t know that he needs to use e.g. linux32/setarch to make it work. Also, again, shell scripts tend to break in other creative ways on newer distros. So leaving the decision of which binary to run to the kernel/linker/etc. would be the safer choice. FatELF maybe is not the solution, and those installers aren’t nice either. But the problem it tries to solve does exist – not only for proprietary software, also for free software like ioquake3.

  5. Another such solution was multi-architecture binaries by NeXT. They ported NeXTstep from their 68K hardware to x86, SPARC and PA-RISC (I think there was an 88K port in house as well). The last time there was a major effort to support multiple architectures in an ongoing fashion was in the 1990s with PowerPC and x86. Solaris, Windows NT, BeOS, OpenStep as it moved to Mac OS X, etc. all made some effort, but a standard PowerPC reference platform took too long to evolve and the effort fell apart. Small devices aren’t going to have the storage for multiple architectures, and aside from the 32-bit to 64-bit x86 transition, there isn’t much need for multiple-architecture support on the larger boxes. That leaves boutique applications as the main beneficiary of such a solution, and that community has the skills to manage multiple binary files for different architectures. The other thing to point out is that applications these days rarely involve a single binary file. Most are larger packages with a wide range of executables, libraries and support files. A slight reduction in the number of these files isn’t a big win.

  6. I don’t see a huge problem here. Just provide an installer as a simple x86 file which detects the system it’s running on and fetches the correct data. You can expect distributions to be configured to allow x86 binaries to run on x86_64 systems. Otherwise the user made a poor choice, or is a power user not wanting that feature; for the latter group, just allow the specific download as well, hidden on a second page most users won’t access.

  7. Perhaps it doesn’t solve distribution problems, but IMO FatELF would be very nice in solving the multilib problem: no more /usr/lib, /usr/lib32 and /usr/lib64 but just /lib and *fat* libraries, letting the dynamic linker choose which code to load

  8. Davide, what do you think is easier between having three directories with separate files and *having to modify a f-huge stack of software* to support FatELF? The linkers *already* know how to choose the right library out of a series of load paths; they do that without any need for changing kernel, linker, compiler, … Do you think it’d be easier to build? Not at all: there is *nothing* in the FatELF changes stack that makes it *easier* to develop, compile, test or debug multi-architecture software. And there is no need for a script to *download* the right binary: ship all the available binaries together, as .foo-x86-linux, .foo-x86_64-linux, .foo-sparc64-solaris, and then add a very simple, very short and very impossible-to-get-wrong foo script that calls the right one based on the output of uname -m and uname -s. Easy as pie, without having to maintain huge, complex pieces of crap-software around.

  9. Ludwig is right. Linux will never have a place for serious game developers just because of these little things that nobody needs… (well, nobody but the final dumb user – not your default/power *nix user). Sorry, but you are biased. FatELF or something similar is indeed highly needed for serious business.

http://www.phoronix.com/sca…

“… Anyway, without a compelling Linux gamer customer base, it is hard to imagine many commercial game developers supporting Linux ports of their games.”

http://www.phoronix.com/sca…

“Q: Which part of Linux / X.Org is most troublesome?

The hardest thing about distributing a proprietary driver for Linux is to build a binary that will run across as many Linux distributions as possible. The challenges with this are: … 3) Being very careful about library and symbol dependencies in any of the binaries we distribute. The classic newbie mistakes here are things like:

a) Compiling/linking something on a new distro against a fairly recent version of glibc, and then trying to run that binary on a different distro, with a slightly older version of glibc. Classic errors are things like:

undefined reference to `regexec@@GLIBC_2.3.4’
undefined reference to `__ctype_b’

b) Linking against the C++ runtime library (libstdc++) but then a different distro having a different version of the libstdc++.

libstdc++.so.6: cannot open shared object file: No such file or directory

We avoid these problems by a) explicitly linking against a very old glibc, and b) avoiding use of the C++ runtime. However, it requires careful attention. #1 and #2 are our own fault for trying to produce a binary-only Linux driver (I consider this the price we pay for leveraging so much of the core kernel module from NVIDIA’s cross-platform code base). However, I’ve seen #3 bite many others just trying to provide an application on Linux. Many people eventually give up and just certify their commercial application for a small controlled list of Linux distributions that all have the same glibc and libstdc++ versions, or rebuild their application for different targeted distributions. Providing a more consistent runtime environment for applications I think is an area for improvement in the Linux user space. To be fair, my experiences with the problems in #3 are a bit stale, so there might be less churn in DSO interfaces these days. And I should note that the actual Linux kernel ABI to user space is quite robust — the point is mostly just about the DSOs that applications link against. I should also note that Ulrich Drepper’s DSO How To [16] is an excellent resource for anyone trying to portably use Linux DSOs (in addition to those of us trying to produce portable Linux DSOs).”

Think about it before saying that it is a silly idea. THAT IS A PROBLEM NEEDING A SOLUTION. I doubt you can come up with something better than FatELF. You need to get out of your little secure world, man.

  10. Seriously, I’m accepting all comments by default, but I guess I should start putting up a sanity filter. *FatELF does not make it any easier to build binaries that work across distributions.* Repeat after me. *FatELF only makes it somewhat easier _for the end user_ to run a binary across _hardware architectures_.* Really, the next person who says “FatELF is the solution for third-party developers” needs first *to have a clue how the stuff works*.

  11. Man, just read: “The hardest thing about *distributing* a proprietary driver for Linux is to build a binary that will run across *as many Linux distributions as possible*.” Not building. Distributing. Easy for the end user. Take off your developer glasses and act as a stupid user wanting to download something from the product website. Plain as that.

  12. And do you think the reason why vendors don’t distribute stuff is that users have no idea how to install it? Not the fact that it’s already *difficult to handle two operating systems at design time*? Really, stupid users should keep doing what they are able to do: complain, and not propose solutions. Having them support solutions with no technical basis because “they sound cool” and “the proposer is a cool guy” is just wasting lots of people’s time.
