Me and a RaspberryPi: Cross-linking

You can probably guess that my target is not building packages for an embedded device on the device itself. I usually have good luck with cross-compilation, but I usually also use very nasty hacks to get it to work. This time around I’m trying to use as few hacks as humanly possible. And linking is becoming trouble.

So first of all you need to get your cross compiler, but that’s easy:

# crossdev armv6j-hardfloat-linux-gnueabi

And everything will be taken care of for you. The whole toolchain, prefixed with armv6j-hardfloat-linux-gnueabi- will then be available, including a suitable armv6j-hardfloat-linux-gnueabi-pkg-config so that you don’t have to deal with it manually.

Now you have two options on how to proceed: you may only need a single configuration for all your cross-compiled targets, or you may want to customize the configuration. The former case is the easiest: a root will be set up by crossdev in /usr/armv6j-hardfloat-linux-gnueabi so you can just configure it like a chroot and run armv6j-hardfloat-linux-gnueabi-emerge --root=$path to build the packages.

But I considered that a bad hack so I wanted to do something different: I wanted to have a self-contained root and configuration. The newest GCC theoretically allows this in a very simple fashion: you just need to already have the basic components (glibc, for one) in the root, and then you can use the --sysroot flag to switch away from the crossdev-installed headers and libraries. Unfortunately, while the compiler behaves perfectly with this switch, the same can’t be said of the link editor.

Indeed, while even ld supports the --sysroot parameter, it will ignore it when searching for libraries, making it impossible to find the libraries that are not installed in /usr/armv6j-hardfloat-linux-gnueabi. The classical solution to this is to use -L$ROOT/usr/lib -L$ROOT/lib so that the link editor is forced to search those paths as well; unfortunately this can cause problems due to the presence of .la files, and even more so due to the presence of ldscripts in /usr/lib.

You might remember a much older post of mine that discussed the names of shared libraries. The unversioned libfoo.so name is used by the link editor to find which library to link to when you ask for -lfoo. For most libraries this alias is provided by a symlink, but for a number of reasons (which honestly are not that clear to me), for libraries that are installed in /lib an ldscript is created instead. This ldscript provides a non-root-relative path to the shared object, so for instance $ROOT/usr/lib/libz.so will point to /lib/libz.so.1, and that’s not going to fly very well unless sysroot gets respected; luckily for us, this part does actually work.
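To make the dependency on sysroot concrete, here is a minimal Python model of the path expansion. It is only a sketch of the behaviour described above, not what ld actually executes; the function name and the honor_sysroot switch are made up for illustration.

```python
import posixpath

def resolve_ldscript_entry(entry, sysroot, honor_sysroot=True):
    """Model how a path named inside an ldscript ends up being opened.

    Assumption for illustration: when the link editor honors the sysroot,
    absolute entries are re-rooted under it; otherwise they are used as-is
    and therefore point into the build host's own filesystem.
    """
    if honor_sysroot and posixpath.isabs(entry):
        return posixpath.join(sysroot, entry.lstrip("/"))
    return entry

ROOT = "/home/flame/rasppy/root"
# The entry that $ROOT/usr/lib/libz.so carries in the example above.
print(resolve_ldscript_entry("/lib/libz.so.1", ROOT))
print(resolve_ldscript_entry("/lib/libz.so.1", ROOT, honor_sysroot=False))
```

The second call shows the failure mode: without sysroot expansion the link editor would try to pull in the host's /lib/libz.so.1, which is built for the wrong architecture.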

It seems that at least the BFD link editor coming from binutils 2.23 has trouble with the implementation of --sysroot for search paths only (it works fine when expanding the paths for the ldscripts). What about the gold link editor, which should be the shiniest out there? Well, it looks like --sysroot there, while technically supported, is implemented in an even worse way: it does not use it when expanding the paths in the ldscripts, which is a 2009 fix from Gentoo on the BFD link editor side.

In the end, my current solution involves setting this in the make.conf in the root:

CFLAGS="-O2 -pipe -march=armv6j -mfpu=vfp -mfloat-abi=hard --sysroot=/home/flame/rasppy/root"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1,--as-needed --sysroot=/home/flame/rasppy/root -Wl,--sysroot=/home/flame/rasppy/root -L=/usr/lib"

PKG_CONFIG_SYSROOT_DIR=/home/flame/rasppy/root
PKG_CONFIG_ALLOW_SYSTEM_LIBS=1
PKG_CONFIG_LIBDIR=/home/flame/rasppy/root/usr/lib/pkgconfig
PKG_CONFIG_PATH=/home/flame/rasppy/root/usr/lib/pkgconfig:/home/flame/rasppy/root/usr/share/pkgconfig

And then use this command as emerge:

% sudo armv6j-hardfloat-linux-gnueabi-emerge --root=/home/flame/rasppy/root --config-root=/home/flame/rasppy/root

And it seems to work decently enough. Some packages, of course, fail to cross-compile at all, but that’s a story for a different time.

Library SONAME bumps and .la files: some visual clues

Before going on with the post, I’ll give users who’re confused by the post’s title some pointers on how to decipher it: I discussed .la files extensively before, and you can find a description of SONAMEs in another post of mine.

Long- and medium-time Gentoo users most likely remember what happened when libpng was bumped last year, and will probably worry now that I’m telling them that libpng 1.5 is almost ready to be unmasked (I’m building the reverse dependencies in the tinderbox as we speak to see what breaks). Since I’ve seen it through with the tinderbox, I can already tell you that it’s going to hurt: a revdep-rebuild call will ask you to rebuild oh-so-many packages due to .la files that I, myself, will probably take the chance to move to the hardened compiler and run an emerge -e world just for kicks.

But why is it this bad? Well, mostly it is the “viral propagation” of dependencies in .la files, which by itself is the reason why .la files are so bad. Since libgtk links to libcairo, and libcairo to libpng, any other library linking with libgtk will be provided with a -lpng entry to link to libpng, no matter whether it uses it or not. Unfortunately, --as-needed does not apply to libtool archives, so they end up overlinking, and only the link editor can drop the unused libraries.
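The propagation can be modelled in a few lines of Python. This is a toy sketch, not libtool's code: the flattened_deps helper and the "plugin" library are hypothetical, and real .la files carry paths and flags as well as bare names.

```python
def flattened_deps(direct, la_deps):
    """List what a new .la file would record as dependencies.

    direct  -- libraries the new library itself links against
    la_deps -- mapping of already-installed libraries to the dependency
               lists recorded in their own .la files
    Each direct dependency drags in everything its .la file lists.
    """
    result = []
    for dep in direct:
        for name in [dep] + la_deps.get(dep, []):
            if name not in result:
                result.append(name)
    return result

la_deps = {}
la_deps["cairo"] = flattened_deps(["png"], la_deps)     # cairo really uses png
la_deps["gtk"] = flattened_deps(["cairo"], la_deps)     # gtk only asked for cairo
la_deps["plugin"] = flattened_deps(["gtk"], la_deps)    # plugin only asked for gtk
print(la_deps["plugin"])
```

The hypothetical plugin never mentioned libpng, yet its recorded dependency list includes it, which is exactly what makes revdep-rebuild flag it after a libpng bump.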

For the sake of example, Evolution does not use libpng directly (the graphic files are managed through GTK’s pixbuf interface), but all of its plugins’ .la files will refer to libpng, which in turn means that revdep-rebuild will pick it up to rebuild it. D’oh!

So what about the visual clue? Well, I’ve decided to use the data from the gold based tinderbox to provide a graph of how many ELF objects actually link to the most common libraries, and how many libtool archives reference them. The data wasn’t easy to extract, mostly because at first glance, the .la files seemed to be dwarfed by the actually linked objects… until I remembered that ELF executables can’t have a corresponding .la file.

Library linking histogram

I’m sorry if some browsers fail to display the image properly; please upgrade to a decent, modern browser as it’s a simple SVG file. The gnuplot script and the raw data file are also available if you wish to look at them.

The graph corroborates what I’ve been saying before, that the bump of libraries such as libexpat and libpng is only a problem because of overlinking and .la files. Indeed you can see that there are about 500 .la files listing either of the two libraries, when there are fewer than a hundred shared objects referencing them. And for zlib it’s even worse: while there are definitely more shared objects using it (348), there are four times as many .la files listing it as one of the dependencies, for no good reason at all.
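For the .la side of such a count, a rough Python sketch could look like the following. This is not the script that produced the graph; the function names are made up, and the sample file contents are invented to show the two common spellings of a dependency (-lfoo and a path to libfoo.la).

```python
import re

def la_references(la_text, lib):
    """True if the dependency_libs line names lib as -l<lib> or lib<lib>.la."""
    match = re.search(r"^dependency_libs='([^']*)'", la_text, re.MULTILINE)
    if match is None:
        return False
    return any(
        token == "-l" + lib or token.endswith("/lib" + lib + ".la")
        for token in match.group(1).split()
    )

def count_references(la_texts, lib):
    """Count how many .la file bodies list lib among their dependencies."""
    return sum(1 for text in la_texts if la_references(text, lib))

# Invented sample contents, standing in for files read from /usr/lib.
samples = [
    "# libfoo.la - a libtool library file\ndependency_libs=' -lpng -lz'\n",
    "dependency_libs=' /usr/lib/libz.la -lm'\n",
    "dependency_libs=''\n",
]
print(count_references(samples, "z"), count_references(samples, "png"))
```

Comparing that number against the count of shared objects whose NEEDED entries actually name the library gives the two bars of the histogram.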

A different story applies to GLib and GTK+ themselves: the number of shared objects using them is higher than the number of .la files that list them among their dependencies. I guess the reason here is that a number of their users are built with non-libtool-based build systems, and another good amount of .la files are removed by the less lazy Gentoo packagers (XFCE should be entirely .la free nowadays, and yes, it links to GTK+).

Now it is true that the number of .la files and ELF files is not proportional to the number of packages installing them (for instance Evolution installs 24 .la files and 69 ELF objects), so you can’t really say much about the number of packages you’d have to rebuild when one of the three “virulent” libraries (libpng, libexpat, libz) is bumped, but it should still be clear that marking five hundred files as broken simply because they list a library that is gone, without their respective binary actually having anything to do with said library, is not the best approach we can have.

Dropping the .la file for libcairo (which is where libgtk picks it up) should probably make it much more resilient to the libpng bumps, which have proven to be the nastiest ones. I hope somebody will step up to do so, sooner or later.

Gold readiness obstacle #6: more versioning trouble

And let’s keep talking about gold and the issues I’m encountering trying to build Gentoo with it. While waiting to see if glibc will ever get to implement the base versioning, or if Ian would like to implement default versions, I’ve found a third problem with the versioning support.

The problem, which I’ve reported as Sourceware bug #12893 as of today, is displayed by the libdebian-installer package: somehow the link editor is reporting two duplicated symbols… in the same source file. At first I thought it was some nasty bug with the link editor’s internals; what I found instead was a curious setup in the source code itself.

What happens is this: one source file defines a function (with an additional alias, let’s ignore that for now); then it also uses the already described .symver directive to provide an alias which has a version:

int di_system_prebaseconfig_append (const char *udeb, const char *format, ...)
{ /* doesn't care about its body */ }

__asm__ (".symver di_system_prebaseconfig_append,di_system_prebaseconfig_append@LIBDI_4.0");

Do note that the original symbol is not marked static, which means it is also exposed as a base version. This by itself would be just fine, if it wasn’t that said source file is compiled into a translation unit, which is (indirectly, but I’ll simplify) used to produce a shared library. Said shared library uses a link editor version script to set the symbol/version pair of various symbols, like it is often done by properly-designed shared objects.

What becomes a problem is that the symbol’s name is also listed in the version script, which means the link editor will take the unversioned (base) symbol and label it with the designated version (LIBDI_4.0)… causing two symbols with the same name and version to be created. It should appear obvious that something’s wrong with the logic of this whole situation.
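A toy Python model shows where the clash comes from. It only mimics the labelling step as I described it; the function names are invented and no link editor works on plain tuples like this.

```python
def apply_version_script(symbols, script):
    """Give every unversioned symbol the version the script assigns to its name.

    symbols -- list of (name, version) pairs, version None meaning "base"
    script  -- mapping of symbol name to version, as in a version script
    """
    return [
        (name, version if version is not None else script.get(name))
        for name, version in symbols
    ]

def duplicates(symbols):
    """Return the (name, version) pairs that occur more than once."""
    seen, dups = set(), []
    for pair in symbols:
        if pair in seen and pair not in dups:
            dups.append(pair)
        seen.add(pair)
    return dups

symbols = [
    ("di_system_prebaseconfig_append", None),         # the non-static definition
    ("di_system_prebaseconfig_append", "LIBDI_4.0"),  # the .symver alias
]
script = {"di_system_prebaseconfig_append": "LIBDI_4.0"}

print(duplicates(apply_version_script(symbols, script)))
```

Before the version script is applied the two entries are distinct; afterwards they are identical, which is what gold complains about.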

Why does this not cause a problem with the old bfd link editor? I’ve got no idea, although I can possibly speculate on the fact that the two symbols not only have the same name and version, but also the same address, which is likely to give the link editor enough clue to simply disregard the duplicate. On the other hand, the solution of this should probably be applied to the libdebian-installer package, the design of which I know nothing about; it might have been intended to support both shared and static linking, but it would look quite strange even in such a case.

At any rate, I’ll have to wait for Ian to express his opinion, and in the meantime I’ll be catching up with a few more buggy packages. I guess I’m not in any hurry, given that libtirpc is not fixed after all as it still reports missing symbols, but it does so only on glibc 2.14, which means that the main tinderbox won’t be able to be of much help for a little while.

Gold readiness obstacle #5: libtool (part 2, OpenMP)

After some digression on the issues with the latest glibc version, it’s time for the fifth instalment of gold readiness for the Gentoo tree, which is completing the libtool issues that I noted a couple of days ago in part 1.

As I said, this relates to a known issue with libtool and OpenMP, to the point that there is a Gentoo bug open and the upstream libtool package is already fixed to deal with this; it just hasn’t trickled down to a release and from there into Gentoo. It might be a good idea, though, to just apply the fix with elibtoolize as I’ve done for the configure fix for gold.

What is the problem? Well, the first problem is with the design of OpenMP language extensions, and with some other flags that implicitly enable those extensions. While these are flags that relate to the GCC frontend (gcc command), they not only change the semantics of the compiled code, they also change the semantics of the linking step, by adding a link to the OpenMP implementation library. This means that the frontend needs to know about this both at compile time as well as link time (where the frontend converts it to the proper linking flags for the link editor to pick up OpenMP).

Unfortunately, when using libtool, its script is parsing and mangling the options first (it’s for this reason that libtool had to be patched to get --as-needed working as intended). Up to now, it didn’t know that -fopenmp should have been passed to the linking frontend of gcc just the same.
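As a rough illustration of the mangling, here is a Python toy. It is not libtool's logic (libtool is a shell script with far longer pattern lists); the link_flags function and the pass-through sets are assumptions made up for the example.

```python
def link_flags(user_flags, passthrough):
    """Keep -l/-L flags plus any flag explicitly known to matter at link time."""
    return [
        flag for flag in user_flags
        if flag.startswith(("-l", "-L")) or flag in passthrough
    ]

flags = ["-O2", "-fopenmp", "-lfoo"]

# A script that does not know -fopenmp affects linking drops it...
old = link_flags(flags, passthrough={"-pthread"})
# ...while one that has been taught about it forwards it to the gcc frontend.
new = link_flags(flags, passthrough={"-pthread", "-fopenmp"})

print(old)
print(new)
```

With the first result the gcc frontend never learns that it should add the OpenMP implementation library, so the link is underlinked.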

Okay, in truth this is not much of an issue for gold only; it is just the same issue when using ld/bfd. But the switch to a link editor that has stricter underlinking rejection makes the issue much more apparent, in particular because, while libtool is usually involved in building the libraries, there is no reason (beside the slight chance of using static linking through libtool archives) to force its usage for building final executables, which means that a single -fopenmp at the final linking point would be quite enough.

Are you kidding me? Or, why we’ll wait glibc 2.14 for a while

A couple of days ago I noted the move to glibc 2.14 of my tinderbox with the hope to quickly find and fix the packages that depend on the now-removed RPC interface. I didn’t expect this kind of apocalypse, but I’m almost wanting to cry, thinking about the mess this version seems to create.

First of all, it doesn’t seem like it’s just Ruby being hit by memory corruption issues, which makes it likely that the new memcpy() interface noted in the ChangeLog is to blame. I haven’t had time to debug this yet though.

A new scary situation arose as well: wget exits with a segmentation fault when trying to resolve any hostname that is not in /etc/hosts, which in the case of the tinderbox means anything that is not localhost or Yamato (as that’s where the Squid proxy that caches requests for the fetched Gentoo data lives). I’m not sure of the cause yet, as the fault happens not within the executable’s code but directly in libresolv, which would point at a bug in glibc itself.

As far as RPC is concerned, I’m surprised that there are so many packages depending on it, and of the widest variety: multimedia, scientific, network analysis tools, and so on. Now, I was optimistic in my previous post, expecting that most, if not all, of the packages using RPC would be fixed by relying on libtirpc. Ooh boy, how wrong I was.

See, the issue is this: libtirpc itself does not build on glibc-2.14, as it relies on one of the NIS/YP headers that has also been removed. Even worse, the latest version (0.2.2) of libtirpc, which I hoped would solve the issue, does not work on any system at all, since a change by our very own Mike (vapier), which was merged upstream just before the 0.2.2 release, causes the build to produce a library that lacks a couple of symbols: the source file where they are defined was not added, and even when you add it, a couple more symbols turn out to be missing. And this release has been out for over a month without any sign of a 0.2.3 coming (the upstream repository is still broken at the time of writing).

Are you freaking kidding me?

Oh and for those who wonder, the issue with base versioning that, as I’ve told, is holding up implementing base version support in gold, is still not fixed. This means that packages, fuse included, that wanted to keep binary compatibility with their original unversioned symbols are still not getting any compatibility, even with this version. In my personal opinion it would be a good time to drop the code for that in fuse, but upstream prefers waiting for the new 3.0 version, which is going to get tricky.

With all this considered, it really looks like a very badly broken release, and one that makes me wonder if it wasn’t too inconsiderate to reject the idea of moving to the eglibc patchset/fork like Debian and Ubuntu seem to have done.

Gold readiness obstacle #4: libtool (part 1)

In my current series of posts about gold, this time I’m presenting you with a two-parter that shows how GNU libtool is causing further problems with this new link editor. The reason why I split this into two parts is that it hits two different issues: one is a “minor inconvenience” due to its design, and the other is a known bug in it.

You might remember me distinguishing two schools of build configuration systems: one relying on tests being compiled and executed, and the other relying on knowing intimate details about the workings of the tools to be used with it. Autotools are for the most part designed to fall squarely into the former category, with a number of advantages, but also disadvantages, the most obvious of which is the slowness of the ./configure process.

This is true of both autoconf and autoconf macros; on the other hand, libtool works largely as a repository of knowledge, knowing rules about various operating systems, link editors and compilers. It shouldn’t surprise anybody, given that it already spends way more time than one would like discovering how to build shared libraries; if it were a pure guessing game, it would probably be unbearable to use.

Unfortunately this makes libtool vulnerable to the main issue of knowledge repository systems, with a bad twist: not only does it become outdated as new operating systems, link editors and compilers are released, but due to the autotools philosophy of not requiring the tools themselves at build time, simply updating the system copy of libtool is not enough. For details about this statement see my previous post on libtool, from which I still haven’t had time to distil documentation for my guide.

What this boils down to is simple yet scary: even though libtool properly implemented support for gold starting from version 2.2.7 – which actually only means that it now knows that gold supports anonymous versions in linker scripts – any package whose autotools had been built with older versions wouldn’t support gold out of the box. Luckily for us, this isn’t an excessively invasive issue: anonymous versioning, as far as I can tell, is only used when using libtool to export and hide symbols which for good or bad is not used that often.

To make sure that projects know about gold support for the feature, you’re left with two solutions. The former is the obvious one of rebuilding autotools; while this is often necessary for other reasons, it isn’t that good an idea, because it wastes time. I have other notes about the rebuilding of autotools but I’ll skip over them for now. In Gentoo we have already a method to take care of this, consisting of the elibtoolize function. This function applies a number of patches over an already-generated autotools build system, to fix common issues (mostly due to libtool) without having to rebuild autotools altogether.

My first encounter with this interface was due to my early Gentoo/FreeBSD work. One of the things libtool “knows” about is that FreeBSD does not like having two-part sonames, which turned many minor version bumps, which should have kept the same, or a compatible, ABI, into link-breaking updates. Since we didn’t look for binary compatibility with the original FreeBSD I went out of my way to patch libtool so that it would use the Linux naming scheme instead. Given that I already knew how to set it up, it wasn’t that difficult to add one extra patch for when gold is used as the link editor of choice.

Unfortunately, having worked with that before also means I know the bad side of elibtoolize: too many packages do not use it at all. This is partly because developers do not know about the need for it (it applies, among others, the --as-needed compatibility patch), and partly because they think that it causes autotools to be rebuilt, rather than preventing the requirement for it.

Waiting for other obstacles to be solved, this one is probably going to be bothering us for months, if not years, to come. I don’t expect a resolution anytime soon, given that the idea of having a comprehensive, versioned system for autopatching packages, which was requested/proposed many times, never came to fruition. Maybe I’ll try pushing for it for the next GSoC (I note that I only participate in even-year editions).

Gold readiness obstacle #3: side-by-side selection

One question I have been asked by developers and power users alike, is “how do I safely test gold?”, and slightly down the FAQ list “is there an ebuild for it?”. The answers to these questions are quite difficult to give, and it’s virtually impossible to just give a single answer to them. But I’ll try.

The second question is the one that has to be answered first. As a link editor, gold is not built standalone; it is still built as part of GNU binutils, just like the bfd-based link editor, even though it is mostly independent from the rest of the utilities. For a (little) while, Gentoo used to have a USE flag that could turn on building/installing gold, but then it was dropped, mostly because it was clearly too soon to have it even available. Right now, there is only one way to build a binutils package that uses gold as default link editor: the EXTRA_ECONF variable has to be set to --enable-gold=default (without the =default part it would build and install gold, but it wouldn’t be the preferred link editor, ld).

I could give more hints about how to do this, but I’m not going to do so; if you do not know how to set the variable, you really shouldn’t be playing with gold right now, it is not safe at this level just yet.

With the configure switch I give above, you both build gold and set it as the preferred link editor, not the only one though, as the old bfd editor is also built and installed. So what’s the problem? Doesn’t that mean that it is possible to have both and switch between them at runtime? Unfortunately, not so much.

Unfortunately GCC does not use a variable to choose which linker to use; it only looks for three executable names (real-ld, collect-ld and ld) in a series of paths, starting with the ones listed in the COMPILER_PATH environment variable. On an unmodified Gentoo system, what the compiler will execute is something like this (taken from my laptop): /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/../../../../x86_64-pc-linux-gnu/bin/ld, which is a symlink created directly by the binutils package. Of course you can use custom COMPILER_PATH values to play with gold, but that’s definitely not something you wish to keep doing for a very long time.
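The lookup can be sketched in Python as follows. This is a simplified model, not GCC's code: the helper names are invented, and trying each name across all directories before moving to the next name is an assumption about the search order; the built-in directories GCC appends after COMPILER_PATH are left out.

```python
import os
import posixpath

# The three names mentioned above, in the order they are tried here.
LINKER_NAMES = ("real-ld", "collect-ld", "ld")

def compiler_path_dirs(environ=os.environ):
    """Split COMPILER_PATH into its non-empty directory entries."""
    return [d for d in environ.get("COMPILER_PATH", "").split(":") if d]

def find_linker(search_dirs, exists=os.path.exists):
    """Return the first candidate that exists, or None if there is none."""
    for name in LINKER_NAMES:
        for directory in search_dirs:
            candidate = posixpath.join(directory, name)
            if exists(candidate):
                return candidate
    return None

# Pretend filesystem: a hypothetical gold-only directory ahead of the usual one.
present = {"/opt/gold/ld", "/usr/x86_64-pc-linux-gnu/bin/ld"}
dirs = ["/opt/gold", "/usr/x86_64-pc-linux-gnu/bin"]
print(find_linker(dirs, exists=present.__contains__))
```

Putting a directory that contains only a gold ld ahead of the others is the COMPILER_PATH trick mentioned above; /opt/gold is a made-up location.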

The perfect solution would be to have a tool to select different linker versions, which is what binutils-config would be supposed to do. Unfortunately it never worked as it was properly intended, and the couple of people who tried implementing eselect binutils before have shown that it’s not a trivial task to implement slotting of multiple binutils versions properly. The same goes for eselect compiler. I think I have already proposed implementing new, complex eselect tools as part of GSoC, but it seems like nobody picked it up. I’m afraid that this could still be an option next year.

Gold readiness obstacle #2: base versioning

This is the second post in the series analysing the obstacles we face if we want to actually make use of gold as system link editor at some point in the future. As I said for the previous one, please make your interest in the topic explicit, as it is a draining exercise to me, due to a huge lack of interest by many other developers.

I have already noted up in part #1 that I have submitted a patch for gold and it wasn’t merged, which ticked me off a bit. In this post I’ll explain what that patch was about. This is particularly interesting to me, because, while it is in a very commonly used package, this problem wouldn’t be an “obstacle” as much as it is, in my view, if it wasn’t that I was doing paid work to look into it.

I have already written yesterday, and a number of times before, how you use ELF symbol versioning, so I won’t go back to the topic right now. What I’ll repeat here is that there are two main reasons to use symbol versioning: preventing symbol collisions, as it is used by the Berkeley DB slots I wrote about yesterday, or preserving binary compatibility when making incompatible changes to functions’ ABIs but wanting to keep the same library ABI (and thus, soname).

For the former task, you can use the same blanket version information for all the symbols, as I noted, while for the latter you need a more surgical approach. What you usually do, when you stabilize the interface the first time around, is mark all the functions with the same version string. When one of those functions needs to be replaced, then, you use source-level symbol versioning to provide a new “default” version of the symbol, together with an explicitly-versioned copy of the symbol that abides by the previously-used ABI. For more details about this, you can see the Binutils documentation that shows the example I’m going to pick up here.

     __asm__(".symver original_foo,foo@");
     __asm__(".symver old_foo,foo@VERS_1.1");
     __asm__(".symver old_foo1,foo@VERS_1.2");
     __asm__(".symver new_foo,foo@@VERS_2.0");

The code above is taken verbatim from the latest version of GNU ld (bfd) documentation. What it translates to, is this:

  • the original, replaced/deprecated interface of the foo() function is implemented with the (hidden) symbol original_foo;
  • two further, replaced/deprecated versions of foo() are implemented as old_foo and old_foo1;
  • finally, new_foo implements the most current version of foo().

How does this work in practice? Well, first of all the headers should only declare the newest interface of foo() – that is new_foo – so that new programs can only use that. When linking a new binary, the link editor will know to satisfy foo() references lacking a version with that version, not because the version string is “higher” (the version string has no meaning for link editor and runtime loaders, it’s just a string), but because it is marked as the “default” version (see the double at symbol in the directive). The other interfaces don’t have to be in the headers, and they will be ignored by the link editor, as if they weren’t there. Software built against a previous version of the library, where the default version for foo() was VERS_1.2 or VERS_1.1, would still reference those versions; the runtime loader (ld.so) would then look those up, rather than VERS_2.0.
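The selection rule is simple enough to model in Python. This is only a sketch of the rule as described, working on the version strings from the .symver directives; the helper names are invented and nothing here touches real ELF data.

```python
def parse_symver(spec):
    """Split 'foo@@VERS_2.0' or 'foo@VERS_1.1' into (name, version, is_default)."""
    if "@@" in spec:
        name, version = spec.split("@@", 1)
        return name, version, True
    name, version = spec.split("@", 1)
    return name, version, False

def link_time_version(name, specs):
    """Version that an unversioned reference to name is bound to when linking."""
    for spec in specs:
        symbol, version, is_default = parse_symver(spec)
        if symbol == name and is_default:
            return version
    return None

# The four aliases from the Binutils example.
specs = ["foo@", "foo@VERS_1.1", "foo@VERS_1.2", "foo@@VERS_2.0"]
print(link_time_version("foo", specs))

# The string itself carries no ordering: only the double "@" matters.
print(link_time_version("foo", ["foo@VERS_9.9", "foo@@VERS_1.1"]))
```

In the second call the "lower" string wins, because it is the one marked as default.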

Lovely, isn’t it? You can improve your interface, solve age-old issues, without having to break the ABI, with the sole “little” downside of increasing the size of the library itself… and relying on a feature only available, as far as I know, on GLIBC and maybe FreeBSD (you can achieve the same effect on Windows, but their approach is massively different; anyway, let’s ignore that for now). Before somebody says that you actually double the size of the code, I’d like to point out that most of the time, the old function can be expressed as a call to the new function, with properly adapted parameters, unless you’re really changing the function to something entirely different.

For those wondering, using this approach with C++ is very complex and I’d probably say impossible: the ABI for C++ libraries includes the vtable for classes; when adding a new function to a class you change the vtable, increasing its size, and causing the ABI to change. It is for this reason that Trolltech used D-pointers in Qt for a long time, and why KDE had many problems introducing new features and fixing old bugs within a major release cycle.

Now let’s go back to our story of gold, fuse, and symbols.

The fuse library is designed to stay as binary compatible as possible with its predecessors, at least when built for GLIBC (it has special rules to not version interfaces when building for uClibc for instance). This is because it is designed to allow for proprietary filesystem providers; for instance the Mac version is used by Parallels to provide their shared folders support. Unfortunately it seems like this wasn’t a requirement in their original implementation, which was built without version information for symbols. This happens quite often actually.

The Binutils example code above fortunately shows exactly how to deal with that: you declare a symbol with no version information. This is called the “base version”, and can only be referenced as the sole version in a linker script, or by omitting the version specifier in a .symver directive. This works with the GNU assembler (as) and with the BFD link editor, but when creating a library with base-versioned symbols with gold, you get an error:

libtool: link: i686-pc-linux-gnu-gcc -shared  .libs/fuse.o .libs/fuse_kern_chan.o .libs/fuse_loop.o .libs/fuse_loop_mt.o .libs/fuse_lowlevel.o .libs/fuse_mt.o .libs/fuse_opt.o .libs/fuse_session.o .libs/fuse_signals.o .libs/cuse_lowlevel.o .libs/helper.o .libs/subdir.o .libs/iconv.o .libs/mount.o .libs/mount_util.o   -lrt -ldl -Wl,--as-needed  -pthread -Wl,--version-script -Wl,./fuse_versionscript -Wl,-O1 -Wl,--hash-style=gnu   -pthread -Wl,-soname -Wl,libfuse.so.2 -o .libs/libfuse.so.2.8.5
/usr/lib/gcc/i686-pc-linux-gnu/4.6.0/../../../../i686-pc-linux-gnu/bin/ld: error: symbol __fuse_exited has undefined version 

It might not be as quick to be said but this message simply means that gold does not support linking objects containing base-versioned symbols. Is it just a missing feature? Not really. I mean, the feature itself is missing, and indeed is simple to implement, to the point I have implemented it, and you can find the patch for it in Sourceware bug #12261 which is still pending.

The problem here is that even though GNU bfd/ld implements that feature, it is a pointless feature to implement, right now. The problem lies not in the link editor but in the runtime loader (ld.so). As you can see from the testcases provided by Ian in the bug linked above, GLIBC does not do what it is expected to.

What you expect is that, when the loader finds an undefined (requested) symbol without attached version information, it would look for a symbol with the corresponding name with base version (no version attached to the definition), and failing that it would look for the one in the default version. What actually happens is that the loader simply picks the first symbol it finds with the same name, without caring about the version if it wasn’t specified by the consumer. It is just sheer luck if it finds the one that was intended to be found.
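The difference between the two behaviours can be put side by side in a small Python model. It follows only the description above, not glibc's actual lookup code; the function names and the order of the sample definitions are made up for illustration.

```python
def expected_lookup(name, definitions):
    """Prefer the base (unversioned) definition, then the default version."""
    candidates = [d for d in definitions if d[0] == name]
    for candidate in candidates:
        if candidate[1] == "":          # empty version string = base version
            return candidate
    for candidate in candidates:
        if candidate[2]:                # marked as default with "@@"
            return candidate
    return None

def observed_lookup(name, definitions):
    """Take whichever definition with a matching name comes first."""
    for definition in definitions:
        if definition[0] == name:
            return definition
    return None

# (name, version, is_default), in the order the loader happens to meet them.
definitions = [
    ("foo", "VERS_1.1", False),
    ("foo", "", False),
    ("foo", "VERS_2.0", True),
]
print(expected_lookup("foo", definitions))
print(observed_lookup("foo", definitions))
```

With this ordering an old, unversioned consumer ends up calling the VERS_1.1 implementation instead of the base one it was built against; reorder the list and it might work by accident.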

What’s the moral? Well, we have one advertised feature that never worked but that a few projects, such as fuse, wanted to rely upon… I don’t disagree with Ian that this should be fixed in GLIBC first, and that for now gold is just exposing code that doesn’t work. Unfortunately Ian’s requests about the feature went unanswered – and due to Drepper just dumping the list of bug numbers without description in the NEWS files I can’t tell if it was addressed in the new 2.14 version – which means we still have no clue whether this is a functionality that will ever be useful or not. I’ll have to ask again whether the fuse project would agree to just dropping the symbols for now, since they cannot be useful with current glibc versions.

Again, expressing your interest on the topic helps me judge how much weight to put on it outside of my dayjob. Thanks in advance.

Gold readiness obstacle #1: Berkeley DB

I have already said that I’m working on getting gold tested a couple of years after its introduction. The situation with this linker is a bit difficult to assess; Google is set on making heavy use of it, and it is supposedly faster at linking Chrome, among others (even though it uses an inconsiderate amount of RAM to achieve that; it’s the usual approach of Google software I guess: you can always throw more RAM, you can’t always throw more time!). On the other hand I can tell for sure that no distribution tried to build their whole package set with it yet, simply by looking at the kind of packages that fail to build for one reason or another.

I’ll leave the failures that are important to other, non-Gentoo-based distributions for the next few posts; today, the target is a failure that limits itself to Gentoo systems, because it involves a workaround we implemented a long time ago, which is now going to bite our ass until we either solve it at once, or find an alternative workaround. But let’s start with the original problem.

The Berkeley DB library (berkdb) – which is now maintained by Oracle, for the record – is a very common library used for storing data in plain files. There are a number of different “generations” of API, one of which is provided by the FreeBSD C library as well (db1); and the very generic API structure (dbm) is also implemented by the GNU-project gdbm library. The use of BerkDB was much more prominent in day-to-day life a couple of years ago for any Linux user; nowadays, the storage format and library of preference is SQLite (to the point that even BerkDB itself provides an SQLite-based interface to its own storage format). But even so, it is very difficult to do without BerkDB: LibreOffice, Postfix, Evolution, Squid, Perl, … they all require BerkDB for this or that feature.

Unfortunately the most recent generation of APIs for Berkeley DB still varies widely, and the format is not always compatible between minor versions (so from 4.4 to 4.5, and so on). For these reasons, Gentoo has been allowing side-by-side installation of multiple Berkeley DB versions at the same time, so-called slotting. By allowing non-rebuilt software to still use the old version (and the old files), as well as allowing access to the utilities of the previous format, you usually make sysadmins’ work easier. Unfortunately, since the functions present in more than one minor version have the exact same name, Gentoo users and developers ended up hitting ELF symbol collisions when programs and libraries linked against different Berkeley DB versions.

Turns out that GLIBC is actually designed with this in mind, and includes symbol versioning to solve the issue: a particular string is assigned to each symbol, so that you can have multiple libraries providing ABI-incompatible symbols with the same name – usually there is a need for the API to be at least partially compatible, but I don’t want to go into too much detail now – without clashes and collisions. To provide versioning you have three main options: inline with the C sources, through the use of a version script, or, with GNU ld/bfd, through the --default-symver option, which sets the version string of each symbol to the soname of the library it is exported from. This was a godsend for Gentoo at the time because it allowed avoiding collisions without having to edit anything in the build system: you just had to add the flag to the linker’s flags in the ebuild and voilà.

If you’re now wondering whether GNU gold supports this option, you’re on the right track. The answer is “no, not right now”: at the moment it chokes on the option, which results in Berkeley DB’s configure reporting that the compiler is unable to create executables. Whether it will support said option in the future is still to be seen. The last time I tried to implement a bfd/ld feature in gold – namely support for emitting explicitly unversioned symbols, which is needed to build FUSE – the results were disappointing, although I understand there is a problem with implementing a build feature that cannot work at runtime right now.

So unless gold gains the same option, we need to find another solution or ignore the existence of gold for a while longer. An alternative that I have been told about already would be to replace the current --default-symver option with a --version-script option pointing to an explicit version script that sets the version. Unfortunately, this is easier said than done, at least for the versions we have in tree right now. A similar blanket-version approach would be no issue if it were introduced with a new slot of the package, as the version would have to be different either way, but it wouldn’t keep binary compatibility with the older versions.

The problem is that BerkDB isn’t installing a single library, but a number of them instead; and since --default-symver uses the library’s soname when creating the versions for its symbols, each library would need a different version. Implementing this same method through standard version scripts would be a world of pain, and probably not worth the price. For now, I decided to simply mask BerkDB on the container that is testing gold, forcing as many packages as possible to use gdbm instead, which does not have the same problem.
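To give an idea of what the version-script route would involve, here is a rough sketch that renders one version script per library, with the version node named after the soname to mimic what --default-symver produces. The sonames and symbol lists are purely illustrative, and I have not tested the output against a real linker: whether ld accepts a soname as a node name, and whether the result is binary compatible with the existing libraries, are both assumptions.

```python
def version_script(soname, symbols):
    """Render a version script with a single node named after the soname."""
    lines = [f"{soname} {{", "  global:"]
    lines += [f"    {sym};" for sym in sorted(symbols)]
    lines += ["};"]
    return "\n".join(lines) + "\n"


# Illustrative sonames and exported symbols only; a real list would have
# to be extracted from each library that the package installs.
libraries = {
    "libdb-4.8.so": ["db_env_create", "db_create"],
    "libdb_cxx-4.8.so": ["db_create"],
}

# One script per library, since each library needs its own version string.
scripts = {
    soname: version_script(soname, symbols)
    for soname, symbols in libraries.items()
}

print(scripts["libdb-4.8.so"])
```

Each generated script would then have to be passed to the link of the matching library, for every library and every slot, which is where the pain comes from.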

I’m glad we decided not to go the same route with expat. Even though the immediate fallout at the time was out of scale (at the time it was a dream even to think about using --as-needed; .la files are a joke in comparison!), it saved us the headache of reaching the point where we have to decide whether to forgo modern tools or break binary compatibility again.

At any rate this is just the tip of the iceberg when it comes to gold and real-world software. I’ll write more about this in the next days as I find time. For now, I wouldn’t mind if you noted your interest in testing gold… comments, flattrs (on the blog, post or, even better, tinderbox since that’s what is doing the work!) and other tokens are definitely appreciated. At least it would tell me I’m not wrong in insisting on spending time reporting and solving the gold bugs.

Surviving without libtool archives

The sheer amount of articles regarding libtool archives in my blog’s archive is probably enough to tell that I have a pretty bad relationship with them. But even though I have been quiet, lately, I haven’t stopped working on them.

In particular, my current “daily job”, or at least one of them, has among its tasks that of getting more reliable cross-compilation support with Gentoo’s ebuilds, and dropping the cursed .la files is part of the plan. With them around, often enough, -L/usr/lib64 is added to the linking line, which causes all sorts of trouble once it’s resolved to a fully qualified path such as /usr/lib64/libfoo.so, which obviously cannot be linked against.

After some doubt on how to proceed (waiting for Gentoo proper to remove all of them, given the kind of obstruction I find coming from developers, was not really an option), it was decided to try building the whole chroot used for cross-compilation, and the cross-compiled roots, with all the .la files removed except for the one that is known to be required (libltdl.la).

For those wondering, it was achieved with this “simple” setting in make.conf:

INSTALL_MASK="
/usr/lib/lib[0-9]*.la
/usr/lib/lib[a-k]*.la
/usr/lib/lib[m-z]*.la
/usr/lib/libl[0-9]*.la
/usr/lib/libl[a-s]*.la
/usr/lib/libl[u-z]*.la
/usr/lib/liblt[0-9]*.la
/usr/lib/liblt[a-c]*.la
/usr/lib/liblt[e-z]*.la
/usr/lib/libltd[0-9]*.la
/usr/lib/libltd[a-k]*.la
/usr/lib/libltd[m-z]*.la
/usr/lib/libltdl[0-9]*.la
/usr/lib/libltdl[a-z]*.la
"
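If you want to convince yourself that the patterns above keep only libltdl.la, the following sketch checks a few file names against them. It assumes that the matching behaves like plain shell-style globbing, which is what Python’s fnmatchcase models; the file names are just examples.

```python
from fnmatch import fnmatchcase

# The same patterns as in the make.conf setting above.
INSTALL_MASK = """
/usr/lib/lib[0-9]*.la
/usr/lib/lib[a-k]*.la
/usr/lib/lib[m-z]*.la
/usr/lib/libl[0-9]*.la
/usr/lib/libl[a-s]*.la
/usr/lib/libl[u-z]*.la
/usr/lib/liblt[0-9]*.la
/usr/lib/liblt[a-c]*.la
/usr/lib/liblt[e-z]*.la
/usr/lib/libltd[0-9]*.la
/usr/lib/libltd[a-k]*.la
/usr/lib/libltd[m-z]*.la
/usr/lib/libltdl[0-9]*.la
/usr/lib/libltdl[a-z]*.la
""".split()


def is_masked(path):
    """Return True if any of the patterns matches the given path."""
    return any(fnmatchcase(path, pattern) for pattern in INSTALL_MASK)


# Example file names; only libltdl.la should survive.
for name in ("libfoo.la", "liblzma.la", "libltdlc.la", "libltdl.la"):
    path = f"/usr/lib/{name}"
    print(path, "masked" if is_masked(path) else "kept")
```

Each group of patterns excludes one more letter of the name, so that the only .la file starting with “lib” that slips through every pattern is libltdl.la itself.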

At this point, my opinion has always been that there would be no problem with building further packages even though the libtool archives were gone… turns out I was mostly, but not entirely, right. Indeed if the system were built using the standard GNU ld linker, there would have been no package failure; on the other hand, all of this is built using gold – which is much stricter about underlinking – and that makes a huge difference.

Not only does the use of libtool by itself make it mostly pointless to use --as-needed, but it also makes --no-add-needed (the feature that makes the linker strict in terms of underlinking) much less effective: if you only link to libfoo.la, which in turn lists -lbar, and you use symbols from both, the libtool archive provides both to the linker, hiding the fact that you didn’t express your dependencies properly.
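Here is a toy model of that effect, assuming a strict linker that only resolves symbols from the libraries named on the link line. The libraries and symbols are invented, and the expansion is a simplification of what libtool does with the dependency_libs entry of a .la file.

```python
# Symbols defined by each (hypothetical) library.
PROVIDES = {
    "foo": {"foo_init"},
    "bar": {"bar_run"},
}

# What libfoo.la would list in its dependency_libs entry.
LA_DEPENDENCY_LIBS = {"foo": ["bar"]}


def link_line(requested, use_la_files):
    """Return the libraries that actually reach the linker."""
    libs = list(requested)
    if use_la_files:
        for lib in requested:
            for dep in LA_DEPENDENCY_LIBS.get(lib, []):
                if dep not in libs:
                    libs.append(dep)
    return libs


def undefined_symbols(used, libs):
    """Symbols a strict linker could not resolve from the listed libraries."""
    provided = set().union(*(PROVIDES[lib] for lib in libs))
    return sorted(set(used) - provided)


# The program uses symbols from both libraries but only asks for libfoo.
used = ["foo_init", "bar_run"]

print(undefined_symbols(used, link_line(["foo"], use_la_files=True)))   # []
print(undefined_symbols(used, link_line(["foo"], use_la_files=False)))  # ['bar_run']
```

With the .la file in place the missing -lbar goes unnoticed; without it the strict linker reports the undefined symbol, which is exactly the kind of failure that shows up once the archives are gone.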

But out of a whole operating system built without .la files, how many packages required fixes? The answer is two: libsoup (which was actually already fixed in the second-to-last release, so the fix was simply updating the version used) and tpm-tools (which, similarly to trousers and opencryptoki, has a quite bogus build system).

I’m not saying that they would be the only packages suffering from these issues; in particular, since this system is not building anything statically, it is likely to encounter far fewer complications. But it is more than likely that, with minimal effort, we’d suffer fewer problems with linking, rebuilds, and dependencies if we were to drop those files entirely and switch to fixing the few packages that fail.