Gold readiness obstacle #6: more versioning trouble

And let’s keep talking about gold and the issues I’m encountering trying to build Gentoo with it. Waiting to see if glibc will ever get to implement the base versioning or if Ian would like to implement default versions I’ve found a third problem with the versioning support.

The problem, which I’ve reported as Sourceware bug #12893 as of today, is displayed by the libdebian-installer package: somehow the link editor is reporting two duplicated symbols… in the same source file. At first I thought it was some nasty bug with the link editor’s internals; what I found instead was a curious setting fort the source code itself.

What happens is this: one source file defines a function (with an additional alias, let’s ignore that for now); then it also uses the already described .symver directive to provide an alias which has a version:

int di_system_prebaseconfig_append (const char *udeb, const char *format, ...)
{ /* doesn't care about its body */ }

__asm__ (".symver di_system_prebaseconfig_append,di_system_prebaseconfig_append@LIBDI_4.0");

Do note that the original symbol is not marked static, which means it is also exposed as a base version. This by itself would be just fine, if it wasn’t that said source file is compiled into a translation unit, which is (indirectly, but I’ll simplify) used to produce a shared library. Said shared library uses a link editor version script to set the symbol/version pair of various symbols, like it is often done by properly-designed shared objects.

What becomes a problem is that the symbol’s name is also listed in the version script; which means the link editor will take the unversioned (base) symbol, and label it with the designed version (LIBDI_4.0)… causing two symbols with same name and version to be created. It should appear obvious that something’s wrong with the logic of this whole situation.

Why does this not cause a problem with the old bfd link editor? I’ve got no idea, although I can possibly speculate on the fact that the two symbols not only have the same name and version, but also the same address, which is likely to give the link editor enough clue to simply disregard the duplicate. On the other hand, the solution of this should probably be applied to the libdebian-installer package, the design of which I know nothing about; it might have been intended to support both shared and static linking, but it would look quite strange even in such a case.

At any rate, I’ll have to wait for Ian to express his opinion, and in the mean time I’ll be catching up with a few more buggy packages. I guess I don’t have any hurry, given that libtirpc is not fixed after all as it still reports missing symbols, but it does so only on glibc 2.14, which means that the main tinderbox won’t be able to be of much help for a little while.

Gold readiness obstacle #4: libtool (part 1)

In my current series of posts about gold this time I’m presenting you with a two-parter that shows how GNU libtool is causing further problem with this new link editor. The reason why I split this into two part is because it hits two different issues with it: one is a “minor inconvenience” due to its design, and the other is a known bug due to it.

You might remember me distinguishing into two schools of build configuration systems one relying on tests being compiled and executed, and the other relying on knowing intimate details about the working of the tools to be used with it. Autotools for the vast part are designed to fall squarely into the former category, with a number of advantages, but also of disadvantages, the most obvious of which is the slowness of the ./configure process.

This is true of moth autoconf and autoconf macro; on the other hand, libtool works vastly as a repository of knowledge, knowing rules about various operating systems, link editors and compilers. It shouldn’t surprise anybody, given that it already spends way more time than one would like, to discover how to build shared libraries; if it was a pure guessing game, it would probably make it unbearable to use.

Unfortunately this makes libtool vulnerable to the main issue of knowledge repository systems, with a bad twist: not only it becomes outdated as operating systems, link editors and compilers are released, but due to the autotools philosophy of not requiring the tools themselves at build time, simply updating the system copy of libtool is not enough. For details about this statement see my previous post on libtool from which I still haven’t had time to distil documentation for my guide.

What this boils down to is simple yet scary: even though libtool properly implemented support for gold starting from version 2.2.7 – which actually only means that it now knows that gold supports anonymous versions in linker scripts – any package whose autotools had been built with older versions wouldn’t support gold out of the box. Luckily for us, this isn’t an excessively invasive issue: anonymous versioning, as far as I can tell, is only used when using libtool to export and hide symbols which for good or bad is not used that often.

To make sure that projects know about gold support for the feature, you’re left with two solutions. The former is the obvious one of rebuilding autotools; while this is often necessary for other reasons, it isn’t that good an idea, because it wastes time. I have other notes about the rebuilding of autotools but I’ll skip over them for now. In Gentoo we have already a method to take care of this, consisting of the elibtoolize function. This function applies a number of patches over an already-generated autotools build system, to fix common issues (mostly due to libtool) without having to rebuild autotools altogether.

My first encounter with this interface was due to my early Gentoo/FreeBSD work. One of the things libtool “knows” about, is that FreeBSD does not like having two-part sonames which made many minor version bump, which should have kept the same, or a compatible, ABI into a link-breaking update. Since we didn’t look for binary compatibility with the original FreeBSD I went out of my way to patch libtool so that it would use the Linux naming scheme instead. Given that I already knew how to set it up, it wasn’t that difficult to add one extra patch when gold is used as the link editor of choice.

Unfortunately, having worked with that before also means I know the bad side of elibtoolize: too many packages do not use it at all. This is partly because developers do not know about the need for it (it applies, among others, the --as-needed compatibility patch), and partly because they think that it causes autotools to be rebuilt, rather than preventing the requirement for it.

Waiting for other obstacles to be solved, this one is probably going to be bothering us for months, if not years, to come. I don’t expect a resolution anytime soon, given that the idea of having a comprehensive, versioned system for autopatching packages, which was requested/proposed many times, never went into fruition. Maybe I’ll try pushing for it for the next GSoC (I note that I only partecipate into even years editions).

Gold readiness obstacle #3: side-by-side selection

One question I have been asked by developers and power users alike, is “how do I safely test gold?”, and slightly down the FAQ list “is there an ebuild for it?”. The answers to these questions are quite difficult to give, and it’s virtually impossible to just give a single answer to them. But I’ll try.

The second question is the one that has to be answered first. As a link editor, gold is not built standalone; it is still built as part of GNU binutils, just like the bfd-based link editor, while it is mostly standalone from the rest of the utilities. For a (little) while, Gentoo used to have an USE flag that could turn on building/installing gold, but then it was dropped, mostly because it was clearly too soon to have it even available. Right now, there is only one version to build a binutils package that uses gold as default link editor: the EXTRA_ECONF variable has to be set to --enable-gold=default (without the =default part it would build and install gold, but it wouldn’t be the preferred link editor — ld).

I could give more hints about how to do this, but I’m not going to do so; if you do not know how to set the variable, you really shouldn’t be playing with gold right now, it is not safe at this level just yet.

With the configure switch I give above, you both build gold and set it as the preferred link editor, not the only one though, as the old bfd editor is also built and installed. So what’s the problem? Doesn’t that mean that it is possible to have both and switch then at runtime? Unfortunately, not so much.

Unfortunately GCC does not use a variable to choose which linker to use, it only looks for three executable names (real-ld, collect-ld and ld) in a series of path, starting with the ones listed in COMPILER_PATH environment variable. On an unmodified Gentoo system, what the compiler will execute is something like this (taken from my laptop): /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/../../../../x86_64-pc-linux-gnu/bin/ld, which is a symlink created directly by the binutils package. Of course you can use custom COMPILER_PATH values to play with gold, but that’s definitely not something you wish to keep doing for a very long time.

The perfect solution would be to have a tool to select different linker versions, which is what binutils-config would be supposed to do. Unfortunately it never worked as it was properly intended, and the couple of people who tried implementing eselect binutils before has shown that it’s not a trivial task to implement multiple binutils versions slotting properly. the same goes for eselect compiler. I think I have already proposed implementing new, complex eselect tools as part of GSoC, but it seems like nobody picked it up. I’m afraid that this could still be an option next year.

Gold readiness obstacle #2: base versioning

This is the second post in the series analysing the obstacles we face if we want to actually make use of gold as system link editor at some point in time in the future. As I said for the previous one, please make your interest in the topic explicit, as it is a draining exercise to me, due to huge lack of interest by many other developers.

I have already noted up in part #1 that I have submitted a patch for gold and it wasn’t merged, which ticked me off a bit. In this post I’ll explain what that patch was about. This is particularly interesting to me, because, while it is in a very commonly used package, this problem wouldn’t be an “obstacle” as much as it is, in my view, if it wasn’t that I was doing paid work to look into it.

I have already written yesterday, and a number of times before, how you use ELF symbol versioning, so I won’t go back to the topic right now. What I’ll repeat here is that there are two main reasons to use symbol versioning: preventing symbol collisions, as it is used by the Berkeley DB slots I wrote about yesterday, or preserving binary compatibility when making incompatible change to functions’ ABIs but wanting to keep the same library ABI (and thus, soname).

For the former task, you can use the same blanked version information for all the symbols, as I noted, while for the latter you need a more surgical approach. What you usually do, when you stabilized the interface the first time around, is marking with the same version string all the functions. When one of those functions need to be replaced, then, you use source-level symbol versioning to provide a new “default” version of the symbol, together with an explicitly-versioned copy of the symbol that abides to the previously-used ABI. For more details about this, you can see the Binutils documentation that shows the example I’m going to pick up here.

     __asm__(".symver original_foo,foo@");
     __asm__(".symver old_foo,foo@VERS_1.1");
     __asm__(".symver old_foo1,foo@VERS_1.2");
     __asm__(".symver new_foo,foo@@VERS_2.0");

The code above is taken verbatim from the latest version of GNU ld (bfd) documentation. What it translates to, is this:

  • the original, replaced/deprecated interface of the foo() function is implemented with the (hidden) symbol original_foo;
  • two further, replaced/deprecated versions of foo() are implemented as old_foo and old_foo1;
  • finally, new_foo implements the most current version of foo().

How does this work in practice? Well, first of all the headers should only declare the newest interface of foo() – that is new_foo – so that new programs only can use that. When linking a new binary, the link editor will know to satisfy foo() references lacking a version with that version, not because the version string is “higher” (the version string has no meaning for link editor and runtime loaders, it’s just a string); but because it is marked as the “default” version (see the double at symbol in the directive. The other interfaces don’t have to be in the headers, and they will be ignored by the link editor, like they weren’t there. Software built against a previous version of the library, where the default version for foo() was VERS_1.2 or VERS_1.1, would still reference those versions; the runtime loader (ld.so) would then look those up, rather than VERS_2.0.

Lovely, isn’t it? You can improve your interface, solve age-old issues, without having to break the ABI, with the sole “little” downside of increasing the size of the library itself… and relying on a feature only available, for what I know, on GLIBC and maybe FreeBSD (you can achieve the same effect on Windows, but their approach is massively different, anyway let’s ignore that for now). Before somebody says that you actually double the size of the code, I’d like to point out that most of the time, the old function can be expressed as a call to the new function, with properly adapted parameters, unless you’re really changing the function to something entirely different.

For those wondering, using this approach with C++ is very complex and I’d probably say impossible: the ABI for C++ libraries includes the vtable for classes; when adding a new function to a class you change the vtable, increasing its size, and causing the ABI to change. It is for this reason that Trolltech used D-pointers in Qt for a long time, and why KDE had many problems introducing new features and fixing old bugs within a major release cycle.

Now let’s go back to our story of gold, fuse, and symbols.

The fuse library is designed to keep as binary compatible as possible with its predecessors, at least when built for GLIBC (it has special rules to not version interfaces when building for uClibc for instance). This is because it is designed to allow proprietary filesystem providers — for instance the Mac version is used by Parallels to provide their shared folders support. Unfortunately it seems like this wasn’t a requirement in their original implementation, which was built wihtout version information for symbols. This happens quite often actually.

The Binutils example code above fortunately shows exactly how to deal with that: you declare a symbol with no version information. This is called the “base version”, and can only be referenced as the sole version in a linker script, or by omitting the version specifier in a .symver directive. This works with the GNU assembler (as) and with the BFD link editor, but when creating a library with a base-versioned symbols with gold, you get an error:

libtool: link: i686-pc-linux-gnu-gcc -shared  .libs/fuse.o .libs/fuse_kern_chan.o .libs/fuse_loop.o .libs/fuse_loop_mt.o .libs/fuse_lowlevel.o .libs/fuse_mt.o .libs/fuse_opt.o .libs/fuse_session.o .libs/fuse_signals.o .libs/cuse_lowlevel.o .libs/helper.o .libs/subdir.o .libs/iconv.o .libs/mount.o .libs/mount_util.o   -lrt -ldl -Wl,--as-needed  -pthread -Wl,--version-script -Wl,./fuse_versionscript -Wl,-O1 -Wl,--hash-style=gnu   -pthread -Wl,-soname -Wl,libfuse.so.2 -o .libs/libfuse.so.2.8.5
/usr/lib/gcc/i686-pc-linux-gnu/4.6.0/../../../../i686-pc-linux-gnu/bin/ld: error: symbol __fuse_exited has undefined version 

It might not be as quick to be said but this message simply means that gold does not support linking objects containing base-versioned symbols. Is it just a missing feature? Not really. I mean, the feature itself is missing, and indeed is simple to implement, to the point I have implemented it, and you can find the patch for it in Sourceware bug #12261 which is still pending.

The problem here is that even though GNU bfd/ld implements that feature, it is a pointless feature to implement, right now. The problem lies not in the link editor but in the runtime loader (ld.so). As you can see from the testcases provided by Ian in the bug linked above, GLIBC does not do what it is expected to.

What you expect is that, when the loader finds an undefined (requested) symbol, without an attached version information, it would look for a symbol with the corresponding name with base version (no version attached to the definition), and failing that it would look for the one in the default version. What actually happens is that the loader simply picks the first symbol it finds with the same name, without caring about the version if it wasn’t specified in the customer. It is just sheer luck if it finds the one that was intended to be found.

What’s the morale? Well, we have one advertised feature that never worked but that a few projects, such as fuse, wanted to rely upon… I don’t disagree with Ian that this should be fixed in GLIBC first, and that for now gold is just exposing code that doesn’t work. Unfortunately Ian’s requests about the feature went unanswered – and due to Drepper just dumping the list of bug numbers without description in the NEWS files I can’t tell if it was addressed in the new 2.14 version – which means we still have no clue whether this is a functionality that will ever be useful or not. I’ll have to try again if fuse project would agree at just dumping the symbols for now, since they cannot be useful with current glibc versions.

Again, expressing your interest on the topic helps me judge how much weight to put on it outside of my dayjob. Thanks in advance.

Gold readiness obstacle #1: Berkeley DB

I have already said that I’m working on getting gold tested a couple of years after its introduction. The situation with this linker is a bit difficult to assess; Google is set on making heavy use of it, and is supposedly faster to link Chrome (even though it uses an inconsiderate amount of RAM to achieve that — it’s the usual approach of Google software I guess: you can always throw more RAM, you can’t always throw more time!), among others. On the other hand I can tell for sure that no distribution tried to build their whole package set with it yet, simply by looking at the kind of packages that fail to build for one reason or another.

I’ll leave the failures that are important to other, non-Gentoo-based distributions for the next few posts; today, the target is a failure that limits itself to Gentoo systems, because it involves a workaround we implemented a long time ago, which is now going to bite our ass until we either solve it at once, or find an alternative workaround. But let’s start with the original problem.

The Berkeley DB library (berkdb) – which is now maintained by Oracle, for the record – is a very common library used for storing data in plain files. There are a number of different “generations” of API, one of which is provided by the FreeBSD C library as well (db1); and the very generic API structure (dbm) is also implemented by the GNU-project gdbm library. The use of BerkDB was much more prominent in day-to-day life a couple of years ago for any Linux user; nowadays, the storage format and library of preference is SQLite (to the point that even BerkDB itself provides an SQLite-based interface to its own storage format. But even so, it is very difficult to do without BerkDB: LibreOffice, Postfix, Evolution, Squid, Perl, … they all require BerkDB for this or that feature.

Unfortunately the most recent generation of APIs for Berkeley DB is still varying widely, and the format is not always compatible between minor version changes (so from 4.4 to 4.5, and so on). For these reasons, Gentoo has been allowing side-by-side installation of multiple Berkeley DB versions at the same time, so-called slotting. By allowing non-rebuilt software to still use the old version (and the old files), as well as allowing access to the utilities of the previous format, you make sysadmins’ work easier, usually. Unfortunately, since the functions present on more than one minor version have the same exact name, Gentoo users and developers ended up hitting ELF symbol collisions when programs and libraries linked different Berkeley DB versions.

Turns out that GLIBC is actually designed keeping this in mind, and includes symbol versioning to solve the issue: a particular string is assigned t each symbol, so that you can have multiple libraries providing ABI-incompatible symbols with the same name – usually there is a need for the API to be at least partially compatible, but I don’t want to go in too many details now – without clashes and collisions. To provide versioning you have three main option: inline with the C sources, through the use of a version script, or, with GNU ld/bfd, through the --default-symver option, which sets the version string of each symbol to the soname of the library it is exported from. This was a godsend for Gentoo at the time because it allowed avoiding collisions without having to edit anything in the build system: you just had to add the flag to the linker’s flags in the ebuild and voilà.

If you’re now wondering whether GNU gold supports this option, you’re on the right track. The answer is “no, not right now”, right now it chokes on such an option, which results in Berkeley DB reporting the compiler to be unable to create executables. Whether it will support said option or not in the future is still to be seen. Last time I tried to implement a bfd/ld feature in gold – namely support for emitting explicitly unversioned symbols, which is needed to build FUSE – the results have been disappointing although I understand there is a problem with implementing a build feature that cannot work at runtime right now.

So unless gold gains the same option, we need to find another solution or ignore the existence of gold for a while longer. An alternative that I have been told about already would be to replace the current --default-symver option with a --version-script option pointing to an explicit version script to set the version. Unfortunately, this is not as easy done as it is said, at least for the versions we have in tree right now. A similar blanket-version approach would make no issue if it was introduced with a new slot of the package, as the version would have to be different either way, but it wouldn’t work to keep binary compatibility with the older versions.

The problem is that BerkDB isn’t installing a single library, but a number of them instead; and since --default-symver uses the library’s soname when creating the versions for its symbols, it means that for each library, you’d need a different version. Implementing this same method through use of standard versioning scripts would be a world of pain, and probably not worth the prize. For now, I decided to simply mask BerkDB on the container that is testing gold, forcing as many packages as possible to use gdbm instead, which does not have the same problem.

I’m glad we decided not to go the same route with expat, even though the immediate fallout at the time was out of scale (at the time it was a dream even to think about using --as-needed.la files are a joke in comparison!), it saved us the headache of reaching the point where we decide whether to forgo modern tools, or break binary compatibility again.

At any rate this is just the tip of the iceberg, about gold and real-world software. I’ll write more about this in the next days as I find time. For now, I wouldn’t mind if you noted your interest on testing gold… comments, flattrs (on the blog, post or, even better, tinderbox since that’s what is doing the work!) and other tokens are definitely appreciated. At least it would tell me I’m not wrong in insisting spending time reporting and solving the gold bugs.

Surviving without libtool archives

The sheer amount of articles regarding libtool archives in my blog’s archive is probably enough to tell that I have a pretty bad relationship with them. But even though I have been quiet, lately, I haven’t stopped working on them.

In particular, my current “daily job”, or at least one of them, has among its task the one of getting more reliable cross-compilation support with Gentoo’s ebuilds, and dropping the cursed .la files is part of the plan. With them around, often enough, -L/usr/lib64 is added to the linking line, which causes all sorts of troubles once it’s resolved to a fully qualified path such as /usr/lib64/libfoo.so which cannot obviously be linked against.

After a bit of doubts on how to proceed (waiting for Gentoo proper to remove all of them, given the kind of obstruction I find coming from developers, was not really an option), it was decided to try building the whole chroot used for cross-compilation, and the cross-compiled roots by removing all the .la files with the exclusion of the one that is known to be required (libltdl.la).

For those wondering, it was achieved with this “simple” setting in make.conf:

INSTALL_MASK="
/usr/lib/lib[0-9]*.la
/usr/lib/lib[a-k]*.la
/usr/lib/lib[m-z]*.la
/usr/lib/libl[0-9]*.la
/usr/lib/libl[a-s]*.la
/usr/lib/libl[u-z]*.la
/usr/lib/liblt[0-9]*.la
/usr/lib/liblt[a-c]*.la
/usr/lib/liblt[e-z]*.la
/usr/lib/libltd[0-9]*.la
/usr/lib/libltd[a-k]*.la
/usr/lib/libltd[m-z]*.la
/usr/lib/libltdl[0-9]*.la
/usr/lib/libltdl[a-z]*.la
"

At this point, my opinion has always been that there would be no problem with building further packages even though the libtool archives were gone… turns out I was mostly, but not entirely, right. Indeed if the system were built using the standard GNU ld linker, there would have been no package failure; on the other hand, all of this is built using gold ­– which is much stricter about underlinking – and that makes a huge difference.

Not only the use of libtool by itself would make it mostly pointless to use --as-needed, but it also makes --no-add-needed (the feature that makes the linker strict in terms of underlinking) much less effective: if you only link to libfoo.la, which in turn lists -lbar, and you use symbols from both, the libtool archive would provide both to the linker, hiding the fact that you didn’t express your dependencies properly.

But out of a whole operating system built without .la files, how many packages did require fixes? The answer is two: libsoup (which was actually already fixed in the second-to-last release, so the fix was simply updating the version used) and tpm-tools (that, similarly to trousers and opencryptoki has a quite bogus build system).

I’m not saying that they would be the only packages suffering from these issues, and in particular, with the fact that this system is not building anything statically, it is likely to encounter much fewer complications, but it is more than likely that with minimal effort we’d suffer fewer problems with linking, rebuilds, and dependencies if we were to drop those files entirely, and switch to fixing the few packages failing.

Gold rush — The troubles of using gold for testing

In my previous post I have listed a few good things that come with the stricter behaviour implemented by the gold linker, and why I’d like to use this for testing. You also know that the current released versions are definitely unreliable and unusable for testing (or any other) purposes.

Well, version 2.20 (which is the latest “stable” release of binutils) is quite ancient by now, and indeed, 2.21 is coming out soon, so I actually went on to test the status of gold in that version: it works, sort-of. Indeed, with the current working version of gold, that, among other things, can – and I’d venture to say should – be built separately, you can build a good deal of packages. But by no mean all.

It’s not just a matter of what fails because of the underlinking prevention of which, I hope, I have conveyed the positive effects and the reasons why we should enforce it more strongly, but rather of the fact that we know for a fact of at least one bug where gold fails to execute properly when correctly using the documented syntax.

The above bug is a split-up of the issues that fuse itself has when built with gold: from one side there was a bug in fuse (the same symbol listed in two version names within the linker script), and that was fixed upstream; on the other hand, the following syntax, described by the binutils manual itself:

__asm__(".symver original_foo,foo@");

This is supposed to declare the symbol foo in the base version; it is not replaceable with a linker script either since there is no way to define the base version in there if any other version is defined. On the other hand, gold has gone up to now without implementing the feature. Willingly. And it’s actually not an entirely bad reason why it has been doing so: the reference implementation for this stuff (GLIBC) Is broken in this regard.

Indeed, it seems like, instead of actually using the base version first when a symbol without version is provided, GLIBC prefers another symbol, not even the default symbol — most likely, it depends on the order of the symbols in the ELF file. The result is, at any rate, pretty bad, because the binary compatibility between version 2.0 and all the following ones, for what concerns FUSE, is not really present: binary compatibility is ensured only since they started providing versioned symbols, so from version 2.2 onward.

Now, how to move forward from here depends vastly on whether version 2.21 will have the bug fixed or not; if not, then I think the only way out to solve this would be to install a separate gold package, installing a binutils-config configuration file to select it as default linker — this one is a nice trick, whose idea I lifted directly from the chromiumos-overlay binutils ebuild.

Indeed, using a separate ebuild would also help if we were to find some other problem with gold, failing to build or producing bad results for a package; I’m pretty sure that if my tinderbox would run with gold, it would be the big deployment it would need to be called “usable”. Yes I know there are other projects using it, but given there are still relatively big issues like the one I pictured above, I expect that it wasn’t really polished around enough yet. This would probably take care of it.

It’s not all gold that shines — Why underlinking is a bad thing

I have written a few days ago a rant about the gold linker, that really messed up the 64-bit tinderbox to the point I have to recreate it from scratch entirely. Some people asked me why I was interested in using gold, given I’m the first one who was sceptic about it — but not the only one as it happens.

Well, there are two reasons for me to be interested in this; the most obvious one is that I’m being paid to care (but I won’t/can’t go in further details, let’s just say that I have a contract that requires me to work at close contact with it), the other is that gold implements what I’ll be calling “underlinking protection” similar to what is done by the Apple linker, and similar to what would be achieved with --no-copy-dt-needed-entries on the good old ld (but in the case of ld, the softer --as-needed is actually going to make it quite moot.

Let me explain what the issue is here with a simple, real-world example to be found in git (this is one of the patches I worked on to get gold running). Since I’m really getting quite used to draw UML diagrams lately, I’m going to start with a diagram of the current situation. Please do note that I’m actually simplifying a lot since there are a number of components in GIT, of course, and also both libcrypto and libssl implement a number of interfaces. Instead, I have reduced the whole situation at a total of three interfaces, two libraries and one component. But it should do.

Example package diagram for GIT and OpenSSL

Yes I like pastel colours on diagrams.

So in this diagram we got two real packages (git and OpenSSL), that for what we’re concerned install respectively one application (imap-send) and two libraries (libssl and libcrypto). Of the two libraries, the former exports only SSL functions, while the latter exports HMAC methods and the EVP interface (which is a higher-level interface for multiple crypto and hash algorithms) — before somebody comments on this, yes I know they both export a lot more than that, I sincerely don’t care right now, I’m simplifying enormously to make this bearable.

As a specific note, I’m going to define the <<access>> and <<import>> terms in this way:

  • <<access>> means the first package is requiring symbols from the second, so it uses the second; this dependency is finalised during compile phase;
  • <<import>> means the first package includes a NEEDED entry for the second, so it links to the second, and thus it is finalised during link phase.

Now that we have an idea of the situation, let’s analyse the dependencies between the parties:

  • first of all we can rule out problems between the two libraries: the SSL functions access the EVP interface, but then again, libssl imports libcrypto, solving the dependency correctly;
  • then there is the dependency between imap-send and libssl: since it access the SSL interface, the component imports libssl, and that’s also correct for the dependencies;
  • on the other hand, while imap-send also accesses HMAC and EVP interfaces, it doesn’t import the libcrypto library; here is our problem.

In the case of the Apple linker and gold, this is an error condition: you cannot make use of the transitive import to produce a stable binary, so the link is rejected; you have to balance the two, by importing (linking in) both libssl and libcrypto. To be honest, in a situation like this, though, there is little to gain beside properness by restoring the balance — after all, the two libraries always come together and are linked one with the other.

There are, though, a number of situations where this does really matter a lot. Dependencies finalised during the compile phase cannot be swapped out at runtime if the binary interface changes, which is why we have the link phase and why we have sonames to express the binary interface compatibility. After seeing a real-world example, let’s go back to an abstract example.

Package diagram of the first abstract example

In this situation we have some given component (can be either an executable, another library, a plugin, it doesn’t really matter to what I’m going to talk about), that accesses interfaces from two different libraries: libfoo, and a dependency of that, libbar, but only links to the former — this is the minimal example of the GIT/OpenSSL interaction above. Just like in the previous real-world case, this situation, as is, is stable: since libfoo.so.1 links to libbar.so.1, the interfaces that the component access are to be found.

Now let’s assume that libbar gets updated to a new version, which is no longer ABI compatible (libbar.so.2); for the sake of argument, libfoo.so.1 still works, either because it was designed to support both source interfaces, or because only ABI changed, but API didn’t (for instance, when 32-bit parameters are replaced with 64-bit ones, you break ABI, but not API), or because there is some source-level compatibility interface. Or, to put it very very bluntly, you only need to rebuild the packages for libbar.so.2 to be usedreminds you of something?

Package diagram of the second abstract example

If you were to run revdep-rebuild (or any other similar approach), it would only rebuild libfoo.so.1 to link to the new dependency; at that point, you’d have libfoo.so.1 updated to both access the new interface and import the new library, but the original component would still be accessing the interfaces from libbar.so.1!

At this point, it depends on how libbar is designed what would happen; if the library was designed with symbol-versioning in mind from the start, or if it avoid re-using the same names between functions with the same ABI, then the original component would fail to load (or execute) because of missing symbols – of course with due caveats – but if they didn’t ensure this, and actually two interfaces with different ABIs shared the same name, you’d get the same effect of a symbol collision: crash, corruption, or generally unpredictable results.

Keeping all I explained in this post in mind, I’m pretty sure you can see the advantage in gold not allowing underlinking to begin with, and why I’m so scared of the softer --as-needed approach. Even if we’re not going to allow gold to be used as default linker anytime soon, we definitely have a number of good reasons why we should at least start using it to run the tinderbox — after all, even if it’s just triggered by gold, fixing underlinking will make it nicer for good old ld users.

Tomorrow, I’ll probably post something about the technical issues regarding testing gold, since there are at least a few.

Risky linkers

I don’t usually advertise much of what I do as a job for a living, and I’d like to keep it this way. It’s usually easy to guess when you look at what my commits are though; so for instance, when you see me committing new versions of net-proxy/squidclamav you can guess I’m working for proxy for the usual customer of mine. When you see me committing a version bump for sys-block/dd_rescue you guess I have a broken harddisk to recover data from, and so on.

Interpreting the stuff I’m committing around the tree for my latest job is probably quite tricky and will likely lead nowhere beside the need to get things cross-compiling correctly but that’s not something new, I have done it before. It might lead you a bit more when you think that I’m currently looking at working on getting packages to play nice with the gold linker. Yes the linker I criticised in many places.

Beside the fact that using the gold linker is required for what I’m working on, I have another interest in trying it out: it has neither the soft --as-needed behaviour, nor it seems to shut up if a symbol can be resolved in an indirect dependency (which is a behaviour shared with the Apple linker). This makes it particularly interesting for testing proper linking.

Now, I know a few people wondered why the gold USE flag was dropped from the tree ebuilds altogether; well it turns out that it’s quite disruptive still; this is what happens on the main tinderbox after rebuilding gold with gold itself:

tinderbox ~ # ld -v
Segmentation fault

Unfortunately I didn’t pay enough attention to this when I first tried it, so the result is that the 64-bit tinderbox is currently out of order and needs to be re-created from scratch. This is twice unfortunate because I didn’t back it up after setting it up, so it’s going to take quite a while before I can resume the 64-bit builds (for the 32-bit builds it’s going to take just a moment as I can simply download a safe binutils binary package).

Now, I’m sure there are working versions of gold – as I’m actually working with one on a separate container – but if I may, I still am a little concerned that, over two years after gold became the news, we still lack a version of it that is sufficiently stable to provide us with a daily-basis usable linker. Not that it says anything; I’m pretty sure that with the exception of developers, integrators, and Gentoo users, there is little interest in having a fast linker, so the fact that it doesn’t get much attention to be implemented in a timely fashion really doesn’t concern me as much. On the other hand, it shows that sometimes, even though rewriting everything from scratch in a different language can gain you a lot in better design, you still lose a lot in either flexibility or knowledge base that is still very important, and would regress if missed.

Anyway the bottom line for this is that I won’t be able to use the stricter gold for testing linking on the tinderbox, just like I cannot use --no-undefined or sneak -Werror-implicit-function-declaration into the compiler’s flags, even though all of those are pretty good devices to identify issues. Sigh!

Linkers and names

I received a question by mail the other day from Randy, about sonames and in particular their versioning which I already discussed about a couple of times. On the other hand, his concerns are good enough to warrant for another blog post on the matter, hoping not to bore my readers too much with it.

First of all, let me repeat that there are a number of different names that a shared library “owns”, on Unix and Unix-like systems based on ELF:

  • the actual name of the filename;
  • the name the link editor (or build-time linker if you prefer, ld) uses to link to it;
  • the name the loader (or dynamic linker if you prefer, ld.so) uses to find it.

The obvious way to correlate all of them together is to use symbolic links; and that’s why your /usr/lib directory is full of those. But how that happen?

When linking, the link editor (ld) can take a library as parameter in two different ways: either with full path to the library (such as /usr/lib/libcrypto.so.0.9.8) or via a link-library parameter (-lcrypto); in the first case, the linker know exactly which files it has to link against; in the latter, though, it has to go look for it.

The paths where the linker search for the library is worth discussing separately, and all in all the library search path list is not something that is important to the aim of this article. What we do care about is the basename the linker is going to look out for; well, the answer is easy, it applies the same substitution you’d have with sed and s/^-l(.+)$/lib1.so/ — it prepends lib and adds .so to the end of the library name. This is why all the libraries you can link to have a .so suffix-only variant, beside the full name and the version-suffix symlink.

Obviously, when the link-editor adds the dependencies on the libraries to the output, it has to use some kind of name; the full path is not a viable option (you might be linking to a non-installed library that will be installed later), so you can either use the library base name (PE – the format used by Windows – did this, but Microsoft seems to have changed idea when .NET came around), or find some other value to link (no pun intended) it to, which is what we’re concerned with here, since that’s what Unix (and Linux) do.

Each library declares its canonical name (the soname, standing for Shared Object Name); the linker will copy that value from the tag called DT_SONAME to a DT_NEEDED entry tag of the output file. Now, these two tags are really what’s important in this discussion; as you saw, the two are respectively consumed and produced by the link editor. The loader, instead, only cares about DT_NEEDED: it uses that tag to know the filename of the libraries to look for in its library search path (that does not necessary have to be the same as the one for the link editor).

Actually, it’s not all true; there are a number of libraries that don’t declare a soname at all, in some cases that’s correct, for instance plugins don’t need a soname at all since they are never linked against, but a good many times, it’s actually a bug in the build system itself. But that’s a topic for another day.

The loader, for good or bad, does not check that the value in DT_SONAME of the loaded library corresponds to the DT_NEEDED tag it encountered — this is why symlinking libraries with incompatible ABIs seem to work at first: the loader does not fail to load them. This means that it really doesn’t matter what a DT_NEEDED tag points to, as long as, within the library path, there is a file (or symlink) with that name. This lets you do some cool but braindead thinks, which I’d rather not write of myself.

To ensure that the loader is able to find the library, then, there is need to have the symlinks in place that correspond to the name it’s going to look for. Now this can be done by two pieces of software, at two different times. The first is, well, libtool, which takes care of the task both at build-time and install-time, and makes both the link used by the loader, and that used by the link editor to find the library. A number of other build systems imitate that behaviour as well.

The second component that does so is ldconfig, but only on GNU systems; by default it recreates all the symlinks to sonames in the library search path (of the loader); with various options, it processes particular libraries or paths. The result is pretty nifty, but not portable. I discovered this myself a long time ago when working on Gentoo/FreeBSD, as the preplib command from Portage failed on that architecture.

With hindsight, I wish I didn’t ask for preplib to die, but rather rewrite it to use scanelf to do the job rather than ldconfig.

What probably confused Randy, is that many documents out there assumes the libtool/Linux semantic of soname and versioning. I call it libtool/Linux because libtool changes its soname semantic depending on the operating system it’s working on, as again I found out while working on Gentoo/FreeBSD. In Linux, it produces a file with a three-components name, and sets DT_SONAME to just the first component; on FreeBSD (by default) it creates a file with a two-components name, and sets DT_SONAME to the full name.

In Gentoo/FreeBSD, we hack libtool to use the same naming scheme as Linux, since that is what library developers tend to think about when setting the version components. Without this hack, each minor version of glib would have a new soname, requiring all the linked software to be re-built, even if the ABI didn’t change, or changed in a compatible way.

I admit I haven’t even started looking up the reasoning for either the version scheme; they are probably mostly historical; in the case of FreeBSD, having the soname and the full name to match is probably needed to avoid the need for symlinks (as ldconfig does not create them as I said above). In the case of Linux, I’m not really sure why so much complexity. Without digressing too much, the components passed to -version-info don’t match directly the actual components used in either the Linux or FreeBSD semantics; they are both related by way of not-too-simple arithmetic functions.

At any rate, neither the link editor nor the loader have a real concept of “ABI version”, which is why they don’t enforce it at all. So you can have libraries that don’t even update their ABI version at all, or others that use the full version as ABI, and bump it even if there aren’t incompatible changes.

So to answer Randy’s final question, this happens because each project is free to decide what its soname is, and thus its soversion. In the case of OpenSSL, they use a three-components version that is the same of the file name, and of the version of the released package, as can be seen by looking at the following scanelf -S output from the tinderbox :

tinderbox ~ # scanelf -S /usr/lib/libcrypto.so.*
 TYPE   SONAME FILE 
ET_DYN libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 
ET_DYN libcrypto.so.1.0.0 /usr/lib/libcrypto.so.1.0.0 

Any document that implies explicitly that the soname has to consist of the library name and its ABI version, is simply wrong. That is a practice that definitely should be followed, but there are no technical restrictions in there, they are only practical restrictions.

Yes, it is a sorry state of the affairs.