Redundant symbols

So I’ve decided to dust off my link collision script and see what the situation is nowadays. I’ve made sure that all the suppression files use non-capturing groups in their regular expressions (which should improve the performance of the matching), made the script more resilient to issues within the files (the metasploit ELF files are barely valid), and run it through.

Well, it turns out that the situation is bleaker than ever. Besides the obvious number of symbols with too-common names, there are still a lot of libraries and programs exporting the default bison/flex symbols, the same way I found them in 2008:

Symbol yylineno@ (64-bit UNIX - System V AMD x86-64) present 59 times
Symbol yyparse@ (64-bit UNIX - System V AMD x86-64) present 53 times
Symbol yylex@ (64-bit UNIX - System V AMD x86-64) present 49 times
Symbol yy_flush_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_bytes@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_string@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_create_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
Symbol yy_delete_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
[...]

Note that for a symbol to be listed in this output at least one library has to export it; and indeed these symbols are present in quite a long list of libraries. I’m not going to track down each and every one of them, but I’ll keep an eye on that list, so that if problems arise they can easily be tracked down to this kind of collision.

Action Item: I guess my next post is going to be a quick way to handle building flex/bison sources without exposing these symbols, for both programs and libraries.
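
Just to give a rough idea of the direction (mylib_yy being a placeholder prefix): both tools can at least rename their yy* entry points, which keeps the names from colliding even before you hide the symbols outright:

flex  -P mylib_yy -o lexer.c  lexer.l     # or %option prefix="mylib_yy" in the .l file
bison -p mylib_yy -o parser.c parser.y    # or %name-prefix "mylib_yy" in the .y file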

But this is not the only issue. I’ve mentioned a long time ago that a single installed system already brings in a huge number of redundant hashing functions; on the tinderbox, as it was when I scanned it, there were 57 md5_init functions (and this without counting different function names!). Some of this I’m sure boils down to gnulib making it available, and to the fact that, unlike the BSD libraries, GLIBC does not have public hashing functions; using libcrypto is not an option for many people.

Action item: I’m not very big on benchmarks myself; I never understood the proper way to go about getting real data rather than being fooled by the scheduler. Somebody who’s more adept at that might want to gather a bunch of libraries providing MD5/SHA1/SHA256 hashing interfaces, and produce some graphs that can let us know whether it’s time to switch to libgcrypt, or nettle, or whatever else provides us with good performance as well as a widely-compatible license.
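
To give an idea of what I mean, here is the kind of very rough, not at all scheduler-proof sketch I would start from, using nettle as one of the candidate libraries; md5bench.c is a made-up name, and the build line assumes dev-libs/nettle is installed:

/* md5bench.c: hash a fixed buffer repeatedly and print the throughput.
   Build with: gcc -O2 md5bench.c -o md5bench -lnettle -lrt */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <nettle/md5.h>

#define BUF_SIZE (1 << 20)  /* 1 MiB per update */
#define ROUNDS   512

int main(void)
{
    static uint8_t buf[BUF_SIZE];
    uint8_t digest[MD5_DIGEST_SIZE];
    struct md5_ctx ctx;
    struct timespec start, end;
    double secs;
    int i;

    memset(buf, 0xa5, sizeof(buf));
    clock_gettime(CLOCK_MONOTONIC, &start);

    md5_init(&ctx);
    for (i = 0; i < ROUNDS; i++)
        md5_update(&ctx, sizeof(buf), buf);
    md5_digest(&ctx, sizeof(digest), digest);

    clock_gettime(CLOCK_MONOTONIC, &end);
    secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f MiB/s (first digest byte: %02x)\n", ROUNDS / secs, digest[0]);
    return 0;
}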

The presence of duplicate memory-management symbols such as malloc and company is not that big of a deal, at first sight. After all, we have a bunch of wrappers that use interposing to account for memory usage, plus another bunch that provide alternative allocation strategies which should be faster depending on the way you use your memory. The whole thing is not bad by itself, but when one of graphviz’s libraries (libgvpr) exposes malloc, something sounds wrong. Indeed, when, even after updating my suppression filter to ignore the duplicates coming from gperftools and TBB, I still get 40 copies of realloc(), something sounds extremely wrong:

Symbol realloc@ (64-bit UNIX - System V AMD x86-64) present 40 times
  libgvpr
  /mnt/tbamd64/bin/ksh
  /mnt/tbamd64/bin/tcsh
  /mnt/tbamd64/usr/bin/gtk-gnutella
  /mnt/tbamd64/usr/bin/makefb
  /mnt/tbamd64/usr/bin/matbuild
  /mnt/tbamd64/usr/bin/matprune
  /mnt/tbamd64/usr/bin/matsolve
  /mnt/tbamd64/usr/bin/polyselect
  /mnt/tbamd64/usr/bin/procrels
  /mnt/tbamd64/usr/bin/sieve
  /mnt/tbamd64/usr/bin/sqrt
  /mnt/tbamd64/usr/lib64/chromium-browser/chrome
  /mnt/tbamd64/usr/lib64/chromium-browser/chromedriver
  /mnt/tbamd64/usr/lib64/chromium-browser/libppGoogleNaClPluginChrome.so
  /mnt/tbamd64/usr/lib64/chromium-browser/nacl_helper
  /mnt/tbamd64/usr/lib64/firefox/firefox
  /mnt/tbamd64/usr/lib64/firefox/firefox-bin
  /mnt/tbamd64/usr/lib64/firefox/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/firefox/plugin-container
  /mnt/tbamd64/usr/lib64/firefox/webapprt-stub
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/OpenFOAM/OpenFOAM-1.6/lib/libhoard.so
  /mnt/tbamd64/usr/lib64/thunderbird/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/thunderbird/plugin-container
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird-bin

Now it is true that it’s possible, depending on the usage patterns, to achieve a much better allocation strategy than the default coming from GLIBC; on the other hand, I’m also pretty sure that GLIBC’s own allocator has improved a lot in the past few years, so I’d rather use the standard allocator than a custom one that is five or more years old. Again, this could use some work.

In the list above, Thunderbird and Firefox for sure use (and, for whatever reason, re-expose) jemalloc; I have no idea whether libhoard in OpenFOAM is another memory-management library (and whether OpenFOAM is bundling it or not), and Mercury is so messed up that I don’t want to ask myself what it’s doing there. There is also a bunch of standalone programs in the list.

Action item: go through the standalone programs exposing the memory interfaces. Some of them are likely to bundle one of the already-present memory libraries, so just make them use the system copy (so that improvements in the library trickle down to the program); for those that use custom strategies, consider making them optional, as I’d expect most of them not to be very useful to begin with.
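
As a starting point, a quick check along these lines (using one of the paths from the list above) tells you whether a program actually defines the allocator symbols itself rather than importing them from the C library:

nm -D --defined-only /mnt/tbamd64/usr/bin/gtk-gnutella | grep -E ' (malloc|realloc|free)$'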

There is another set of functions, similar to the memory-management ones, which is usually brought in by gnulib; these are convenience wrappers that do error checking on top of the standard functions: xmalloc and friends. A quick check shows that these are exposed a bit too often:

Symbol xmemdup@ (64-bit UNIX - System V AMD x86-64) present 37 times
  liblftp-tasks
  libparted
  libpromises
  librec
  /mnt/tbamd64/usr/bin/csv2rec
  /mnt/tbamd64/usr/bin/dgawk
  /mnt/tbamd64/usr/bin/ekg2
  /mnt/tbamd64/usr/bin/gawk
  /mnt/tbamd64/usr/bin/gccxml_cc1plus
  /mnt/tbamd64/usr/bin/gdb
  /mnt/tbamd64/usr/bin/pgawk
  /mnt/tbamd64/usr/bin/rec2csv
  /mnt/tbamd64/usr/bin/recdel
  /mnt/tbamd64/usr/bin/recfix
  /mnt/tbamd64/usr/bin/recfmt
  /mnt/tbamd64/usr/bin/recinf
  /mnt/tbamd64/usr/bin/recins
  /mnt/tbamd64/usr/bin/recsel
  /mnt/tbamd64/usr/bin/recset
  /mnt/tbamd64/usr/lib64/lftp/4.4.2/liblftp-network.so
  /mnt/tbamd64/usr/lib64/libgettextlib-0.18.2.so
  /mnt/tbamd64/usr/lib64/man-db/libman-2.6.3.so
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/lto1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/lto1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/cc1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/gnat1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/lto1

In this case they are exposed even by the GCC tools themselves! While this brings me back to my complaint that gnulib should really be a dynamically linked libgnucompat, there is little we can do about these symbols in programs; they should, however, not creep into system libraries (man-db keeps them in its private library, which is marginally better).

Action item: check the libraries exposing the gnulib symbols, and make them expose only their proper interface, rather than every single symbol they come up with.
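
To make the goal concrete, this is the kind of link editor version script I have in mind, with mylib_ as a placeholder for the library’s actual prefix; it would be passed with -Wl,--version-script=mylib.map, and libtool users can get much the same effect with -export-symbols-regex:

{
  global:
    mylib_*;    /* the library's own interface */
  local:
    *;          /* xmalloc, xmemdup and the rest of gnulib stay private */
};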

I suppose this is already quite a bit of data for a single blog post. If you want a copy of the symbols’ index to start working on some of the action items I listed, just contact me and I’ll send it to you; it’s a bit too big to just publish as is.

Dealing with insecure runpaths

Somehow, lately I’ve had a few more inquiries about runpaths than usual, all for different packages, which makes it quite a bit more interesting. Even though not all the questions regarded Gentoo’s “insecure runpaths” handling, I think it’s well worth writing a bit more about it. Before going into details, though, I’ll point you to my previous post for a description of what runpaths (and rpath) are, instead of repeating it all here.

So, what is an insecure runpath? Basically, any runpath set to a directory over which an attacker may have control is insecure, because it allows the attacker to drop in an arbitrary shared object, which can then run arbitrary code. The definition of attacker, as I already discussed, is flexible depending on the situation.

Since runpaths are, as the name suggests, paths on the filesystem, there are two main starting points that would cause a path to become insecure: runpaths derived from the current working directory, and runpaths derived from any world-writable directory. The ability of an attacker to place the correct object in the correct path varies considerably, but it’s a good rule of thumb to consider both of them just as bad. What happens often enough for packages built in Gentoo is that the runpath includes the build directory, which could be either /var/tmp or /tmp (if you, for instance, build in tmpfs; please tell me you’re not using /dev/shm for that!).

Depending on how your system is set up, this might not be as insecure as it sounds at first: on my laptop, for instance, /var/tmp/portage is always present, and always owned by the portage user, which makes it less vulnerable to a random user wanting to drop a particularly nasty shared object in there… on the other hand, assuming that it’s secure enough is the wrong move, full stop.

Before I start panicking people: Portage is smarter than you think, and it has been stripping insecure runpaths by default for a while. This means that you don’t have to fret about fixing the issues the moment you see the warning, but you really should look into fixing them for good. I would also argue that, since Portage already strips the runpaths at install time, you shouldn’t just use chrpath to remove the insecure paths after the build; you should either fix the build properly or leave it be. This opinion might be a bit too personal, as I don’t know whether pkgcore or paludis support the same kind of fixing.

So let’s go back to the two sources of insecurity in runpaths. When the current directory is the base of the insecure path, which means having an rpath of either . or ../something, the cause is usually bad logic in the way the package sets its rpath: instead of the current working directory, the developers most likely wanted to refer to where the executable itself is installed, which is expressed by the literal string $ORIGIN. Another common situation is for that literal to be treated as a variable name, so that $ORIGIN/../mylibs becomes something like /../mylibs, which is also wrong.
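
For reference, this is how the literal is meant to be passed on the link line (myprog and libmine are placeholders); the single quotes keep the shell from expanding it, and in a Makefile you would write $$ORIGIN instead:

gcc main.o -o myprog -L../mylibs -lmine -Wl,-rpath,'$ORIGIN/../mylibs'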

But a much more common situation arises when our build directory is injected into the runpath. Something along the lines of /var/tmp/portage/pkg-category/package-0.0.0/image/usr/lib64/mypkgslibs — more rarely it would point to the build directory rather than the image directory. In many cases this happens because the upstream build system does not know about, or mishandles, the DESTDIR variable.

The DESTDIR variable is commonly used to install a software package at a given offset: binary distributions will then generate a binary package out of it, while source distributions like Gentoo merge the installed copy into the live filesystem after recording which files have been installed. In either case, the understanding behind this variable is that the final location of the executables will not include it. Unfortunately not all build systems support it, so in some cases we end up doing something a bit more hackish and replace /usr with ${D}/usr in the definition of the install prefix. The prefix, though, is commonly used to identify where the executables will be in the end, so it is quite possible for a build system to have, among its parameters, -rpath ${prefix}/lib/mylibs, which would then inject ${D} into the runpath.
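
To put the two approaches side by side, with the usual autotools calls as found in a typical ebuild:

./configure --prefix=/usr
make
make DESTDIR="${D}" install        # the offset exists only at install time

./configure --prefix="${D}"/usr    # the hackish fallback: anything derived from
                                   # ${prefix}, such as -rpath ${prefix}/lib/mylibs,
                                   # now drags the temporary directory along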

As you can see, for most common situations it’s a matter of getting upstream to fix their build system. In other cases, the problem is that the ebuild is installing files without going through the build system’s install phase, which, at least with libtool, would often re-link the object files to make sure the rpath is handled correctly.

Besides this, there isn’t much more I can add, I’m afraid.

USEless flag: static-libs

You might remember that in my graphical post I tried to make it clear that in some cases static libraries are not only unnecessary, but also unusable. It so happens that not everybody has that clear yet, so let me try to explain a bit more where this USE flag is entirely unwarranted.

Language bindings. While it is technically possible to have static extensions built into Ruby and Python, such setups are very convoluted and aren’t really solid enough to be used for Gentoo packaging; those who want to go down that route generally use custom build systems, because they require building the whole interpreter and the binding libraries together.

This means that, to give an idea, these archives shouldn’t have been there to begin with:

/usr/lib/python2.7/site-packages/beagle/beagle.a
/usr/lib/python2.7/site-packages/pygetdata.a
/usr/lib/python2.7/site-packages/numpy/core/lib/libnpymath.a
/usr/lib/python2.7/site-packages/_mathgl.a
/usr/lib/python3.2/site-packages/numpy/core/lib/libnpymath.a

Note: there are no Ruby files in that list, but only because the tinderbox is currently unable to build any Ruby version, due to GLIBC 2.14 issues that are as yet unresolved upstream.

So if you’re building language bindings, adding a static-libs USE flag is just pointless: remove it and disable static libraries altogether, so that you don’t have to build source files twice (PIC and non-PIC).

Runtime-loaded plugins. It is unfortunate how little developers know about the limitations of runtime-loaded plugins (i.e., all plugins; I’m only spelling it out because buildtime-included plugins are really built-ins). One of these limitations is the fact that you only need the .so file to load them. So these certainly smell funny:

/usr/lib/gstreamer-0.10/libgstmimic.a
/usr/lib/gstreamer-0.10/libgstvp8.a
/usr/lib/libvisual-0.4/morph/morph_alphablend.a
/usr/lib/libvisual-0.4/morph/morph_flash.a
/usr/lib/libvisual-0.4/morph/morph_tentacle.a
/usr/lib/libvisual-0.4/morph/morph_slide.a
/usr/lib/libvisual-0.4/actor/actor_lv_scope.a
/usr/lib/libvisual-0.4/actor/actor_jakdaw.a
/usr/lib/libvisual-0.4/actor/actor_oinksie.a
/usr/lib/libvisual-0.4/actor/actor_bumpscope.a
/usr/lib/libvisual-0.4/actor/actor_gforce.a
/usr/lib/libvisual-0.4/actor/actor_infinite.a
/usr/lib/libvisual-0.4/actor/actor_lv_analyzer.a
/usr/lib/libvisual-0.4/actor/actor_corona.a
/usr/lib/libvisual-0.4/actor/actor_JESS.a

It is true that libtool’s libltdl theoretically allows you to decide at build time whether to build the code in or use it as a plugin, but even then, only one of the two versions is ever needed. So if your package only installs one of them, simply remove the USE flag and disable static libs altogether; pretty please.

Also note that there are packages that allow you to use either plugins or builtins: Apache, for instance, lets you choose whether to build the modules in or build plugins to load at runtime. But this only works for first-party modules, as building third-party modules into Apache requires having their sources available when building Apache itself.

Libraries relying heavily on plugins. Similarly, if you’re packaging a library that includes a plugin system, it is almost certain that the plugins link back to the library. This is actually required to avoid underlinking and to ensure that no undefined symbols are present in the final link of the plugin (well, there are exceptions, but let’s not dig into those right now).

Why don’t you want static libraries in this case? Well, if you were to build a frontend against the static library, it would then load the plugins, and the dynamic library with them, not only doubling the loaded code, but also causing symbol collisions between the two copies, which is fine only until the two versions get out of sync (i.e. you update the library package without rebuilding the frontend).

This only works if you can convert all the plugins into builtins on the fly, but I know of no project that allows you to do that.

Almost everything else. Don’t get me wrong, I know that sometimes we do need static libraries. But it’s also true that in most cases we don’t. I don’t know why people insist on adding a static-libs USE flag by default. My personal preference would be to have a static-libs USE flag only when it is actually being used.

This means that unless there are reverse dependencies building against the static libraries, the package should not have the static-libs USE flag at all. Unfortunately this requires the user to enable it explicitly when they want to build something statically (and I admit it has happened to me a time or two)… but let’s be real: most people will not do that.

Indeed, if it weren’t for PostgreSQL and lvm2/udev (the latter of which I would have avoided with USE=-static on lvm2), I don’t think I would have noticed the DWARF-4 binutils bug that had been lingering on my system for over a month!

Gold readiness obstacle #6: more versioning trouble

And let’s keep talking about gold and the issues I’m encountering while trying to build Gentoo with it. While waiting to see whether glibc will ever implement the base versioning, or whether Ian would like to implement default versions, I’ve found a third problem with the versioning support.

The problem, which I’ve reported as Sourceware bug #12893 as of today, shows up with the libdebian-installer package: somehow the link editor is reporting two duplicated symbols… in the same source file. At first I thought it was some nasty bug in the link editor’s internals; what I found instead was a curious setup in the source code itself.

What happens is this: one source file defines a function (with an additional alias, let’s ignore that for now); then it also uses the already described .symver directive to provide an alias which has a version:

int di_system_prebaseconfig_append (const char *udeb, const char *format, ...)
{ /* doesn't care about its body */ }

__asm__ (".symver di_system_prebaseconfig_append,di_system_prebaseconfig_append@LIBDI_4.0");

Do note that the original symbol is not marked static, which means it is also exposed as a base (unversioned) version. This by itself would be just fine, if it weren’t for the fact that said source file is compiled into a translation unit which is (indirectly, but I’ll simplify) used to produce a shared library. Said shared library uses a link editor version script to set the symbol/version pairs of various symbols, as is often done by properly-designed shared objects.

What becomes a problem is that the symbol’s name is also listed in the version script; this means the link editor will take the unversioned (base) symbol and label it with the designated version (LIBDI_4.0)… causing two symbols with the same name and version to be created. It should be obvious that something’s wrong with the logic of this whole situation.

Why does this not cause a problem with the old bfd link editor? I’ve got no idea, although I can speculate that, since the two symbols not only have the same name and version, but also the same address, the link editor has enough of a clue to simply disregard the duplicate. On the other hand, the fix for this should probably be applied to the libdebian-installer package, the design of which I know nothing about; it might have been intended to support both shared and static linking, but it would look quite strange even in such a case.
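
For what it’s worth, the usual way to express this intent, as far as I can tell, is to give the implementation a different internal name (the _impl suffix here is just made up) and let .symver attach the default version with @@, keeping the plain name out of the version script altogether:

int di_system_prebaseconfig_append_impl (const char *udeb, const char *format, ...)
{ /* same body as before */ }

__asm__ (".symver di_system_prebaseconfig_append_impl,di_system_prebaseconfig_append@@LIBDI_4.0");

With @@ the versioned symbol becomes the default one, so there is no unversioned base definition left for the version script to label a second time.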

At any rate, I’ll have to wait for Ian to express his opinion, and in the meantime I’ll be catching up with a few more buggy packages. I guess I’m in no hurry, given that libtirpc is not fixed after all, as it still reports missing symbols; it does so only on glibc 2.14 though, which means that the main tinderbox won’t be able to be of much help for a little while.

Gold readiness obstacle #5: libtool (part 2, OpenMP)

After some digression on the issues with the latest glibc version, it’s time for the fifth instalment of gold readiness for the Gentoo tree, which completes the libtool issues that I noted a couple of days ago in part 1.

As I said, this relates to a known issue with libtool and OpenMP, to the point that there is a Gentoo bug open and the upstream libtool package is already fixed to deal with it; the fix just hasn’t trickled down to a release and from there into Gentoo. I guess it might be a good idea to just apply it with elibtoolize, as I’ve done for the configure fix for gold.

What is the problem? Well, the first problem is with the design of the OpenMP language extensions, and of some other flags that implicitly enable those extensions. While these flags relate to the GCC frontend (the gcc command), they not only change the semantics of the compiled code, they also change the semantics of the linking step, by adding a link to the OpenMP implementation library. This means that the frontend needs to know about this both at compile time and at link time (where the frontend converts it to the proper flags for the link editor to pick up OpenMP).

Unfortunately, when using libtool, its script parses and mangles the options first (it’s for this reason that libtool had to be patched to get --as-needed working as intended). Up to now, it didn’t know that -fopenmp should be passed through to the linking frontend of gcc just the same.

Okay, in truth this is not an issue for gold only; it is just the same when using ld/bfd. But the switch to a link editor with stricter underlinking rejection makes the issue much more apparent, in particular because, while libtool is usually involved in building the libraries, there is no reason (beside the slight chance of using static linking through libtool archives) to force its usage when building final executables, which means that a single -fopenmp at the final linking point would be quite enough.
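
Outside of libtool the gist is simply that the flag has to appear on both command lines; par.c here is just a hypothetical source file using the OpenMP pragmas:

gcc -fopenmp -c par.c        # compile step: enables the #pragma omp handling
gcc -fopenmp par.o -o par    # link step: without -fopenmp here, the GOMP_*
                             # references to libgomp remain undefined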

Linkers and names

I received a question by mail the other day from Randy, about sonames and in particular their versioning, which I’ve already discussed a couple of times. His concerns, though, are good enough to warrant another blog post on the matter, hoping not to bore my readers too much with it.

First of all, let me repeat that there are a number of different names that a shared library “owns”, on Unix and Unix-like systems based on ELF:

  • the actual filename on disk;
  • the name the link editor (or build-time linker if you prefer, ld) uses to link to it;
  • the name the loader (or dynamic linker if you prefer, ld.so) uses to find it.

The obvious way to correlate all of them is to use symbolic links, and that’s why your /usr/lib directory is full of them. But how does that happen?

When linking, the link editor (ld) can take a library as a parameter in two different ways: either the full path to the library (such as /usr/lib/libcrypto.so.0.9.8) or a link-library parameter (-lcrypto); in the first case, the linker knows exactly which file it has to link against; in the latter, though, it has to go look for it.

The paths where the linker searches for the library are worth discussing separately, and all in all the library search path list is not important to the aim of this article. What we do care about is the basename the linker is going to look for; the answer is easy: it applies the same substitution you’d get with sed and s/^-l(.+)$/lib\1.so/, that is, it prepends lib and appends .so to the library name. This is why all the libraries you can link to have a .so suffix-only variant, besides the full name and the version-suffixed symlink.

Obviously, when the link editor adds the dependencies on the libraries to the output, it has to use some kind of name; the full path is not a viable option (you might be linking to a non-installed library that will be installed later), so you can either use the library’s base name (PE, the format used by Windows, did this, but Microsoft seems to have changed its mind when .NET came around), or find some other value to link (no pun intended) it to, which is what we’re concerned with here, since that’s what Unix (and Linux) do.

Each library declares its canonical name (the soname, standing for Shared Object Name); the linker will copy that value from the tag called DT_SONAME into a DT_NEEDED entry tag of the output file. Now, these two tags are really what’s important in this discussion; as you saw, the two are respectively consumed and produced by the link editor. The loader, instead, only cares about DT_NEEDED: it uses that tag to know the filename of the libraries to look for in its library search path (which does not necessarily have to be the same as the one used by the link editor).
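
A quick, hypothetical example (libfoo is made up) shows where the two tags come from and which names end up being looked up:

gcc -fPIC -c foo.c
gcc -shared -Wl,-soname,libfoo.so.1 -o libfoo.so.1.0.0 foo.o
ln -sf libfoo.so.1.0.0 libfoo.so.1    # the name the loader resolves (DT_NEEDED)
ln -sf libfoo.so.1.0.0 libfoo.so      # the name the link editor resolves for -lfoo

gcc main.o -o main -L. -lfoo
readelf -d libfoo.so.1.0.0 | grep SONAME    # Library soname: [libfoo.so.1]
readelf -d main | grep NEEDED               # Shared library: [libfoo.so.1]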

Actually, that’s not entirely true; there are a number of libraries that don’t declare a soname at all. In some cases that’s correct (plugins, for instance, don’t need a soname at all, since they are never linked against), but a good many times it’s actually a bug in the build system itself. But that’s a topic for another day.

The loader, for good or bad, does not check that the value in DT_SONAME of the loaded library corresponds to the DT_NEEDED tag it encountered; this is why symlinking libraries with incompatible ABIs seems to work at first: the loader does not fail to load them. This means that it really doesn’t matter what a DT_NEEDED tag points to, as long as, within the library path, there is a file (or symlink) with that name. This lets you do some cool but braindead things, which I’d rather not write about myself.

To ensure that the loader is able to find the library, then, there need to be symlinks in place corresponding to the name it is going to look for. Now, this can be done by two pieces of software, at two different times. The first is, well, libtool, which takes care of the task both at build time and at install time, and creates both the link used by the loader and the one used by the link editor to find the library. A number of other build systems imitate that behaviour as well.

The second component that does so is ldconfig, but only on GNU systems; by default it recreates all the symlinks to sonames in the library search path (of the loader); with various options, it processes particular libraries or paths. The result is pretty nifty, but not portable. I discovered this myself a long time ago when working on Gentoo/FreeBSD, as the preplib command from Portage failed on that architecture.

With hindsight, I wish I hadn’t asked for preplib to die, but rather had it rewritten to use scanelf for the job instead of ldconfig.

What probably confused Randy is that many documents out there assume the libtool/Linux semantics of soname and versioning. I call it libtool/Linux because libtool changes its soname semantics depending on the operating system it’s working on, as, again, I found out while working on Gentoo/FreeBSD. On Linux, it produces a file with a three-component version in its name, and sets DT_SONAME to the name with just the first component; on FreeBSD (by default) it creates a file with a two-component version, and sets DT_SONAME to the full name.

In Gentoo/FreeBSD, we hack libtool to use the same naming scheme as Linux, since that is what library developers tend to think about when setting the version components. Without this hack, each minor version of glib would have a new soname, requiring all the linked software to be re-built, even if the ABI didn’t change, or changed in a compatible way.

I admit I haven’t even started looking up the reasoning behind either version scheme; it is probably mostly historical. In the case of FreeBSD, having the soname and the full name match is probably meant to avoid the need for symlinks (as ldconfig does not create them, as I said above). In the case of Linux, I’m not really sure why there is so much complexity. Without digressing too much, the components passed to -version-info don’t map directly to the actual components used in either the Linux or FreeBSD semantics; both are derived through not-too-simple arithmetic functions.
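
If I recall the Linux arithmetic correctly, it goes roughly like this (libfoo, again, is made up): with -version-info current:revision:age the file gets named libfoo.so.(current-age).(age).(revision), and the soname stops at the first of those components:

libtool --mode=link gcc -o libfoo.la foo.lo -rpath /usr/lib -version-info 3:2:1
# on Linux this installs libfoo.so.2.1.2 with DT_SONAME libfoo.so.2 (2 = current - age)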

At any rate, neither the link editor nor the loader has a real concept of “ABI version”, which is why they don’t enforce it at all. So you can have libraries that never update their ABI version at all, and others that use the full release version as the ABI version, bumping it even when there are no incompatible changes.

So, to answer Randy’s final question: this happens because each project is free to decide what its soname is, and thus its soversion. In the case of OpenSSL, they use a three-component version that is the same as the file name and as the version of the released package, as can be seen in the following scanelf -S output from the tinderbox:

tinderbox ~ # scanelf -S /usr/lib/libcrypto.so.*
 TYPE   SONAME FILE 
ET_DYN libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 
ET_DYN libcrypto.so.1.0.0 /usr/lib/libcrypto.so.1.0.0 

Any document stating that the soname has to consist of the library name and its ABI version is simply wrong. That is a practice that definitely should be followed, but there is no technical restriction behind it, only a practical one.

Yes, it is a sorry state of affairs.

Are we done with LDFLAGS?

Quite some weeks ago, Markos started asking for more thorough testing of ebuilds to check whether they respect LDFLAGS; he then described why that is important, which can be summarised as “if LDFLAGS are ignored, setting --as-needed by default is pointless”. He’s right on this, of course, but there are a few more tricks to consider.

You might remember that I described an alternative approach to --as-needed that I call “forced --as-needed”, implemented by editing the GCC specifications. The idea is that when the flag is forced in through this method, packages ignoring LDFLAGS and packages using an unpatched libtool cannot simply drop it. This was one of the things I had in mind when I suggested that approach, but alas, as it happens, I wasn’t listened to.

Right now there are 768 bugs reported blocking the tracker, of which 404 are open; in total, 635 were reported by me (mostly, but not only, through the tinderbox). And I’m still opening more on a daily basis.

But let’s say that these are all the problems there are, and that in two weeks’ time no package reports ignored LDFLAGS at all. Will that mean that all packages will properly build with --as-needed at that point? Not really.

Unfortunately, the LDFLAGS-ignored warnings have both false positives (prebuilt binary packages, packages linking with non-GNU linkers) and partial false negatives. To understand that, you have to understand how the current warning works. A few binutils versions ago, a new option was introduced, called --hash-style; this option changes the way the symbol table is hashed for faster loading by the runtime linker/loader (ld.so); the classic hashing algorithm (SysV) is not particularly good for the very long, similar symbols that are way too common when using C++, so there has been some work to replace it with a better GNU-specific algorithm. I followed most of the related development closely at the time, since Michael Meeks (of OpenOffice.org fame) actually came asking Gentoo for some help testing things out; it was that work that got me interested in linkers and the ELF format.

At any rate, while the original hash table was created in the .hash section, the new hash table, being incompatible, lives in .gnu.hash. The original patch simply replaced one with the other, but what actually landed in binutils was slightly different (it allows choosing between the SysV, GNU, or both hash tables), and the default has been to enable both hash tables, so that older systems can load .hash and the new ones can load .gnu.hash; win-win. On the other hand, on Gentoo (and many other distributions) where excessively old GLIBC versions are not supported at all, there is enough of a reason to use --hash-style=gnu and disable the generation of the SysV table entirely.

Now, the Portage warning derives from this situation: if you add -Wl,--hash-style=gnu to your LDFLAGS, Portage will check the generated ELF files and warn if it finds the SysV .hash section. Unfortunately this does not work for non-Linux profiles (as far as I know, FreeBSD does not support the GNU hashing style yet; uClibc does), and it will obviously report all the prebuilt binaries coming from proprietary packages. In those cases, you don’t want to strip .hash, because it might be the only hash table preventing ld.so from doing a linear search.
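
If you want to check a file by hand, the section headers tell the same story the Portage check relies on (the library path here is just a placeholder):

readelf -S /usr/lib64/libfoo.so | grep -i 'hash'    # look for .hash (SysV) and .gnu.hash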

So, what is the problem with this test? Well, let’s note one big difference between the --as-needed and --hash-style flags: the former is positional, the latter is not. That means that --as-needed, to work as intended, needs to be added before the actual libraries are listed, while --hash-style can come at any point in the command line. Unfortunately this means that if any package has a linking line such as

$(CC) $(LIBS) $(OBJECTS) $(LDFLAGS)

it won’t hit the LDFLAGS warning from Portage, but (basic) --as-needed would still fail; on the other hand, my forced --as-needed method would work just fine. So there is definitely enough work to do here for the next… years?
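
For the record, the ordering that keeps a positional flag like --as-needed happy lists the objects before the libraries they pull in:

$(CC) $(LDFLAGS) $(OBJECTS) $(LIBS)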

Depend on what you use

To this day, we still get --as-needed failures for packages in Gentoo, both for new packages and for bumps. To this day, checking the list of reverse dependencies of libpng is not enough to ensure that all packages build fine with libpng-1.4 (as Samuli found out the hard way). One common problem in both cases is missing dependencies, which in large part are caused by transitive dependencies.

Transitive dependencies are those caused by indirect linking; since I don’t want to bore you all by repeating myself, you can read about them in this post and this one and finally another one; yes, I wrote a lot about the matter.

How do transitive, indirect dependencies cause trouble with both --as-needed and upgrade verification? Well, it depends on a number of factors, actually:

  • the build might work on the developers’ systems because the libraries linked against indirectly bring in the actually needed libraries, either through DT_NEEDED or through libtool archives, but the former libraries aren’t used directly, so --as-needed breaks the link (a misguided link);
  • the build might work properly because some of the used (and linked-to) libraries optionally use (and link to) the otherwise missing libraries; this works only as long as they are not built without that support: for instance you might use OpenSSL and Curl separately, then link to Curl only, expecting it to bring in OpenSSL… but Curl might be built against GnuTLS or NSS, or neither;
  • the build might work depending on the versions of the libraries used, because one of the linked libraries might replace one library with another, dropping a needed library from the final link.

The same rules generally apply to the DEPEND and RDEPEND variables; relying on another package to bring in your own dependencies is a bad idea: even if you use GTK+, that doesn’t mean you can avoid listing libpng as a dependency if you use it directly. On the other hand, listing libpng just because it happens to show up in the final link (especially when not using --as-needed), even though you don’t use it directly, is also a bad idea which you definitely should avoid.

By ignoring transitive dependencies, you invalidate the dependency tree, which means we cannot rely on it when we’re trying to avoid huge fuckups when an important package changes API and ABI. This is why I (wrongly) snapped back at Samuli for closing the libpng-1.4 tracker before I had the chance to run it through the tinderbox.

Bottom line: please always depend on what you use directly, both when linking and in ebuilds. Please!

Thanks to Odi for letting me know that I used (consistently) the wrong word in the article. This goes to show that I either should stop writing drafts at 3am or I should proofread them before posting.

Plugins and static libraries don’t mix well

There are a few interesting things that, I’m afraid, most developers happen to ignore, either by choice or because they really don’t know about them. One of these is the fact that static libraries and plugins usually don’t mix well. Although I have to warn you: that’s not an absolute, and the two can easily be designed to work fine together.

The main problem, though, is to ponder whether it is useful to use static libraries and plugins together, and then to find out whether it is safe to do so. Let’s start from the basics. What are static libraries used for? Mostly two reasons: performance, and not having to depend on the dynamic version of the same library being on the system. If the performance of the library is a problem, it’s much more likely that the culprit is the plugin system rather than the dynamic nature of the library; I have said something about this in the past, although I didn’t go into much detail and I haven’t had time to continue the discussion yet.

As far as dependencies are concerned, the plugins usually need a way to access the functions provided by the main library, which means there is an ABI dependency between the two. Now, the plugins might not link against the library directly, in order to support static library usage, but that also means that if the internal ABI changes in any way between versions, you’re screwed.

What does this mean? That in most cases, when you have plugins, you don’t want static libraries around; it also means that you don’t need the .la files, and so there is quite a bit of cleanup to do.

More to the point, if you’re building a plugin, you don’t want to build the static version at all, since the plugin will be loaded through the dlopen() interface from the dynamic version of the library (the module). Since upstreams don’t always remember to disable static archive building in their build system, ebuild authors should take care of disabling it, either with --disable-static or by patching the build system (if you don’t want to stop all static lib building). And this is not my idea but proper development procedure (and no, it is not up for discussion: if it’s a plugin, and it’s not possible to make it a builtin, you shouldn’t install the archive file! Full stop!).
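
In ebuild terms, this is the kind of minimal change I mean, assuming an autotools-based package whose installable objects are all plugins:

src_configure() {
        # the plugins are only ever dlopen()ed; no consumer can link to them,
        # so the static archives serve no purpose
        econf --disable-static
}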

Now, you can see where this brings us again: more .la files that are often currently installed and are not useful at all. Like .la files for PAM modules (libpam does not load them through the .la so they are not useful — and this is definitely the word of the PAM maintainer! And for PAM-related packages, that word is The Word). Let’s try to continue this way, shall we? From the leaves.

The end of the mono-debugger saga

So, after starting to inspect the issue and finding the problem last night, I finally have a tentative patch that makes mdb work fine.

Indeed, I simply implemented some extended debuglink file support in the Bfd managed wrapper, which now finds sections and symbols in the debuglinked file whenever they are not found in the original file. This solves my problem, although it might not be complete yet, since I wrote it in 20 minutes. I’ve attached the version for trunk to my bug report and I’ll add my backport to 2.4.2 to my overlay today. After a bit of testing, I hope to get it into the main tree too.

Speaking of testing, the mono-debugger ebuild had a test restriction with no bug referenced; I’m quite sure that the tests that fail are the ones that should have told us that mono-debugger wouldn’t have worked on a default Gentoo install at all. I’ll probably have to add some logic to warn the user about split-debug setups (please note that our default of stripping debug information from files does not strip the symbol table of libpthread.so, otherwise gdb wouldn’t work at all either; it also lets mdb work fine, so it’s only a problem with split-debug setups).

After the debugger finally started to work, I also found another problem: mono itself does not seem to load libraries requested through DllImport via the standard dlopen() interface; instead it looks for them in particular directories, which don’t cover all the possible locations. This became a problem because the current default version of libedit in Gentoo does not have a soname, and it caused mono to find a libedit.so that was not a library at all (but rather an ldscript). But that’s a problem for another day, and my solution is just to use a newer libedit version that works fine.

Now I’ll go back to my tinderbox, and in the next few days you’ll probably see a few more posts about different topics than Mono… even though I have a few patches to post there as well.