Autotools Mythbuster: who’s afraid of libtool?

This is a follow-up on my last post for autotools introduction. I’m trying to keep these posts bite sized both because it seems to work nicely, and because this way I can avoid leaving the posts rotting in the drafts set.

So after creating a simple autotools build system in the previous now you might want to know how to build a library — this is where the first part of complexity kicks in. The complexity is not, though, into using libtool, but into making a proper library. So the question is “do you really want to use libtool?”

Let’s start from a fundamental rule: if you’re not going to install a library, you don’t want to use libtool. Some projects that only ever deal with programs still use libtool because that way they can rely on .la files for static linking. My suggestion is (very simply) not to rely on them as much as you can. Doing it this way means that you no longer have to care about using libtool for non-library-providing projects.

But in the case you are building said library, using libtool is important. Even if the library is internal only, trying to build it without libtool is just going to be a big headache for the packager that looks into your project (trust me I’ve seen said projects). Before entering the details on how you use libtool, though, let’s look into something else: what you need to make sure you think about, in your library.

First of all, make sure to have an unique prefix to your public symbols, be them constants, variables or functions. You might also want to have one for symbols that you use within your library on different translation units — my suggestion in this example is going to be that symbols starting with foo_ are public, while symbols starting with foo__ are private to the library. You’ll soon see why this is important.

Reducing the amount of symbols that you expose is not only a good performance consideration, but it also means that you avoid the off-chance to have symbol collisions which is a big problem to debug. So do pay attention.

There is another thing that you should consider when building a shared library and that’s the way the library’s ABI is versioned but it’s a topic that, in and by itself, takes more time to discuss than I want to spend in this post. I’ll leave that up to my full guide.

Once you got these details sorted out, you should start by slightly change the configure.ac file from the previous post so that it initializes libtool as well:

AC_INIT([myproject], [123], [flameeyes@flameeyes.eu], [https://flameeyes.blog/tag/autotools-mythbuster/])
AM_INIT_AUTOMAKE([foreign no-dist-gz dist-xz])
LT_INIT

AC_PROG_CC

AC_OUTPUT([Makefile])

Now it is possible to provide a few options to LT_INIT for instance to disable by default the generation of static archives. My personal recommendation is not to touch those options in most cases. Packagers will disable static linking when it makes sense, and if the user does not know much about static and dynamic linking, they are better off getting everything by default on a manual install.

On the Makefile.am side, the changes are very simple. Libraries built with libtool have a different class than programs and static archives, so you declare them as lib_LTLIBRARIES with a .la extension (at build time this is unavoidable). The only real difference between _LTLIBRARIES and _PROGRAMS is that the former gets its additional links from _LIBADD rather than _LDADD like the latter.

bin_PROGRAMS = fooutil1 fooutil2 fooutil3
lib_LTLIBRARIES = libfoo.la

libfoo_la_SOURCES = lib/foo1.c lib/foo2.c lib/foo3.c
libfoo_la_LIBADD = -lz
libfoo_la_LDFLAGS = -export-symbols-regex '^foo_[^_]'

fooutil1_LDADD = libfoo.la
fooutil2_LDADD = libfoo.la
fooutil3_LDADD = libfoo.la -ldl

pkginclude_HEADERS = lib/foo1.h lib/foo2.h lib/foo3.h

The _HEADERS variable is used to define which header files to install and where. In this case, it goes into ${prefix}/include/${PACKAGE}, as I declared it a pkginclude install.

The use of -export-symbols-regex ­– further documented in the guide – ensures that only the symbols that we want to have publicly available are exported and does so in an easy way.

This is about it for now — one thing that I haven’t added in the previous post, but which I’ll expand in the next iteration or the one after, is that the only command you need to regenerate autotools is autoreconf -fis and that still applies after introducing libtool support.

Redundant symbols

So I’ve decided to dust off my link collision script and see what the situation is nowadays. I’ve made sure that all the suppression file use non-capturing groups on the regular expressions – as that should improve the performance of the regexp matching – make it more resilient to issues within the files (metasploit ELF files are barely valid), and run it through.

Well, it turns out that the situation is bleaker than ever. Beside the obvious amount of symbols with a too-common name, there are still a lot of libraries and programs exporting default bison/flex symbols the same way I found them in 2008:

Symbol yylineno@ (64-bit UNIX - System V AMD x86-64) present 59 times
Symbol yyparse@ (64-bit UNIX - System V AMD x86-64) present 53 times
Symbol yylex@ (64-bit UNIX - System V AMD x86-64) present 49 times
Symbol yy_flush_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_bytes@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_string@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_create_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
Symbol yy_delete_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
[...]

Note that at least one library got to export them to be listed in this output; indeed these symbols are present in quite a long list of libraries. I’m not going to track down each and every of them though, but I guess I’ll keep an eye on that list so that if problems arise that can easily be tracked down to this kind of collisions.

Action Item: I guess my next post is going to be a quick way to handle building flex/bison sources without exposing these symbols, for both programs and libraries.

But this is not the only issue — I’ve already mentioned a long time ago that a single installed system already brings in a huge number of redundant hashing functions; on the tinderbox as it was when I scanned it, there were 57 md5_init functions (and this without counting different function names!). Some of this I’m sure boils down to gnulib making it available, and the fact that, unlike the BSD libraries, GLIBC does not have public hashing functions — using libcrypto is not an option for many people.

Action item: I’m not very big of benchmarks myself, never understood the proper way to go around getting the real data rather than being fooled by the scheduler. Somebody who’s more apt at that might want to gather a bunch of libraries providing MD5/SHA1/SHA256 hashing interfaces, and produce some graphs that can let us know whether it’s time to switch to libgcrypt, or nettle, or whatever else that provides us with good performance as well as with a widely-compatible license.

The presence of duplicates of memory-management symbols such as malloc and company is not that big of a deal, at first sight. After all, we have a bunch of wrappers that use interposing to account for memory usage, plus another bunch to provide alternative allocation strategies that should be faster depending on the way you use your memory. The whole thing is not bad by itself, but when you get one of graphviz’s libraries (libgvpr) to expose malloc something sounds wrong. Indeed, if even after updating my suppression filter to ignore the duplicates coming from gperftools and TBB, I get 40 copies of realloc() something sounds extremely wrong:

Symbol realloc@ (64-bit UNIX - System V AMD x86-64) present 40 times
  libgvpr
  /mnt/tbamd64/bin/ksh
  /mnt/tbamd64/bin/tcsh
  /mnt/tbamd64/usr/bin/gtk-gnutella
  /mnt/tbamd64/usr/bin/makefb
  /mnt/tbamd64/usr/bin/matbuild
  /mnt/tbamd64/usr/bin/matprune
  /mnt/tbamd64/usr/bin/matsolve
  /mnt/tbamd64/usr/bin/polyselect
  /mnt/tbamd64/usr/bin/procrels
  /mnt/tbamd64/usr/bin/sieve
  /mnt/tbamd64/usr/bin/sqrt
  /mnt/tbamd64/usr/lib64/chromium-browser/chrome
  /mnt/tbamd64/usr/lib64/chromium-browser/chromedriver
  /mnt/tbamd64/usr/lib64/chromium-browser/libppGoogleNaClPluginChrome.so
  /mnt/tbamd64/usr/lib64/chromium-browser/nacl_helper
  /mnt/tbamd64/usr/lib64/firefox/firefox
  /mnt/tbamd64/usr/lib64/firefox/firefox-bin
  /mnt/tbamd64/usr/lib64/firefox/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/firefox/plugin-container
  /mnt/tbamd64/usr/lib64/firefox/webapprt-stub
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/OpenFOAM/OpenFOAM-1.6/lib/libhoard.so
  /mnt/tbamd64/usr/lib64/thunderbird/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/thunderbird/plugin-container
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird-bin

Now it is true that it’s possible, depending on the usage patterns, to achieve a much better allocation strategy than the default coming from GLIBC — on the other hand, I’m also pretty sure that GLIBC’s own allocation improved a lot in the past few years so I’d rather use the standard allocation than a custom one that is five or more years old. Again this could use some working around.

In the list above, Thunderbird and Firefox for sure use (and for whatever reason re-expose) jemalloc; I have no idea if libhoard in OpenFOAM is another memory management library (and whether OpenFOAM is bundling it or not), and Mercury is so messed up that I don’t want to ask myself what it’s doing there. There are though a bunch of standalone programs listed as well.

Action item: go through the standalone programs exposing the memory interfaces — some of them are likely to bundle one of the already-present memory libraries, so just make them use the system copy of it (so that improvements in the library trickle down to the program), for those that use custom strategies, consider making them optional, as I’d expect most not to be very useful to begin with.

There is another set of functions that are similar to the memory management functions, which is usually brought in by gnulib; these are convenience wrappers that do error checking over the standard functions — they are xmalloc and friends. A quick check, shows that these are exposed a bit too often:

Symbol xmemdup@ (64-bit UNIX - System V AMD x86-64) present 37 times
  liblftp-tasks
  libparted
  libpromises
  librec
  /mnt/tbamd64/usr/bin/csv2rec
  /mnt/tbamd64/usr/bin/dgawk
  /mnt/tbamd64/usr/bin/ekg2
  /mnt/tbamd64/usr/bin/gawk
  /mnt/tbamd64/usr/bin/gccxml_cc1plus
  /mnt/tbamd64/usr/bin/gdb
  /mnt/tbamd64/usr/bin/pgawk
  /mnt/tbamd64/usr/bin/rec2csv
  /mnt/tbamd64/usr/bin/recdel
  /mnt/tbamd64/usr/bin/recfix
  /mnt/tbamd64/usr/bin/recfmt
  /mnt/tbamd64/usr/bin/recinf
  /mnt/tbamd64/usr/bin/recins
  /mnt/tbamd64/usr/bin/recsel
  /mnt/tbamd64/usr/bin/recset
  /mnt/tbamd64/usr/lib64/lftp/4.4.2/liblftp-network.so
  /mnt/tbamd64/usr/lib64/libgettextlib-0.18.2.so
  /mnt/tbamd64/usr/lib64/man-db/libman-2.6.3.so
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/lto1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/lto1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/cc1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/gnat1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/lto1

In this case they are exposed even by the GCC tools themselves! While this brings me again to complain that gnulib show now actually be libgnucompat and be dynamically linked, there is little we can do about these in programs — but the symbols should not creep in system libraries (mandb has the symbols in its private library which is marginally better).

Action item: check the libraries exposing the gnulib symbols, and make them expose only their proper interface, rather than every single symbol they come up with.

I suppose that this is already quite a bit of data for a single blog post — if you want a copy of the symbols’ index to start working on some of the action items I listed, just contact me and I’ll send it to you, it’s a big too big to just have it published as is.

Multiple SSL implementations

If you run ~arch you probably noticed the a few days ago that all Telepathy-based applications failed to connect to Google Talk. The reason for this was a change in GnuTLS 3.1.7 that made it stricter while checking security parameters’ compliance, and in particular required 816-bit primes on default connections, whereas the GTalk servers provided only 768. The change has since been reverted, and version 3.1.8 connects to GTalk by default once again.

Since I hit this the first time I tried to set up KTP on KDE 4.10, I was quite appalled by the incompatibility, and was tempted to stay with Pidgin… but at the end, I found a way around this. The telepathy-gabble package was not doing anything to choose the TLS/SSL backend but deep inside it (both telepathy-gabble and telepathy-salut bundle the wocky XMPP library — I haven’t checked whether it’s the same code, and thus if it has to be unbundled), it’s possible to select between GnuTLS (the default) or OpenSSL. I’ve changed it to OpenSSL and everything went fine. Now this is exposed as an USE flag.

But this made me wonder: does it matter on runtime which backend one’s using? To be obviously honest, one of the reasons why people use GnuTLS over OpenSSL is the licensing concerns, as OpenSSL’s license is, by itself, incompatible with GPL. But the moment when you can ignore the licensing issues, does it make any difference to choose one or the other? It’s a hard question to answer, especially the moment you consider that we’re talking about crypto code, which tends to be written in such a way to be optimized for execution as well as memory. Without going into details of which one is faster in execution, I would assume that OpenSSL’s code is faster simply due to its age and spread (GnuTLS is not too bad, I have some more reserves for libgcrypt but nevermind that now, since it’s no longer used). Furthermore, Nikos himself noted that sometimes better algorithm has been discarded, before, because of the FSF copyright assignment shenanigan which I commented on a couple of months ago.

But more importantly than this, my current line of thought is wondering whether it’s better to have everything GnuTLS or everything OpenSSL — and if it has any difference to have mixed libraries. The size of the libraries themselves is not too different:

        exec         data       rodata        relro          bss     overhead    allocated   filename
      964847        47608       554299       105360        15256       208805      1896175   /usr/lib/libcrypto.so
      241354        25304        97696        12456          240        48248       425298   /usr/lib/libssl.so
       93066         1368        57934         2680           24        14640       169712   /usr/lib/libnettle.so
      753626         8396       232037        30880         2088        64893      1091920   /usr/lib/libgnutls.so

OpenSSL’s two libraries are around 2.21MB of allocated memory, whereas GnuTLS is 1.21 (more like 1.63 when adding GMP, which OpenSSL can optionally use). So in general, GnuTLS uses less memory, and it also has much higher shared-to-private ratios than OpenSSL, which is a good thing, as it means it creates smaller private memory areas. But what about loading both of them together?

On my laptop, after changing the two telepathy backends to use OpenSSL, there is no running process using GnuTLS at all. There are still libraries and programs making use of GnuTLS (and most of them without an OpenSSL backend, at least as far as the ebuild go) — even a dependency of the two backends (libsoup), which still does not load GnuTLS at all here.

While this does mean that I can get rid of GnuTLS on my system (NetworkManager, GStreamer, VLC, and even the gabble backend use it), it means that I don’t have to keep loaded into memory that library at all time. While 1.2MB is not that big a save, it’s still a drop in the ocean of the memory usage.

So, while sometimes what I call “software biodiversity” is a good thing, other times it only means we end up with a bunch of libraries all doing the same thing, all loaded at the same time for different software… oh well, it’s not like it’s the first time this happens…

Multi-level bundling, with a twist

I spent half my Saturday afternoon working on Blender, to get the new version (2.64a) in Portage. This is never an easy task but in this case it was a bit more tedious because thanks to the new release of libav (version 9) I had to make a few more modifications … making sure it would still work with the old libav (0.8).

FFmpeg support is not guaranteed — If you care you can submit a patch. I’m one of the libav developers thus that’s what I work, and test, with.

Funnily enough, while I was doing that work, a new bug for blender was reported in Gentoo, so I looked into it and found out that it was actually caused by one of the bundled dependencies — luckily, one that was already available as its own ebuild, so I just decided to get rid of it. The interesting part was that it wasn’t listed in the “still bundled libraries” list that the ebuild’s own diagnostic prints… since it was actually a bundled library of the bundled libmv!

So you reach the point where you get one package (Blender) bundling a library (libmv) bundling a bunch of libraries, multi-level.

Looking into it I found out that not only the dependency that was causing the bug was bundled (ldl) but there were at least two more that, I knew for sure, were available in Gentoo (glog and gflags). Which meant I could shave some more code out of the package, by adding a few more dependencies… which is always a good thing in my book (and I know that my book is not the same as many others’).

While looking for other libraries to unbundle, I found another one, mostly because its name (eltopo) was funny — it has a website and from there you can find the sources — neither are linked in the Blender package. When I looked at the sources, I was dismayed to see that there was no real build system but just an half-broken Makefile building two completely different PIC-enabled static archives, for debug and release. Not really something that distributions could get much interest in packaging.

So I set up at build my usual autotools-based build system (which no matter what people say it’s extremely fast, if you know how to do it), fix the package to build with gcc 4.7 correctly (how did it work for Blender? I assume they patched it somehow but they don’t write down what they do!), and .. uh where’s the license file?

Turns out that while the homepage says that the library is “public domain”, there is no license statement anywhere in the source code, making it in all effects the exact opposite: a proprietary software. I’ve opened an issue for it and hopefully upstream will fix that one up so I can send him my fixes and package it in Gentoo.

Interestingly enough, the libmv software that Blender packages, is much better in its way of bundling libraries. While they don’t seem to give you an easy way to disable the bundled copies (which might or might not be Blender’s build system fault), they make it clear where each library come from, and they have scripts to “re-bundle” said libraries. When they make changes, they also keep a log of them so that you can identify what changed and either ignore, patch or send it upstream. If all projects bundling stuff did it that way, it would be a much easier job to unbundle…

In the mean time, if you have some free time and feel like doing something to improve the bundled libraries situation in Gentoo Linux, or you care about Blender and you’d like to have a better Gentoo experience with it, we could use some ebuilds for ceres-solver and SSBA as well as fast-C (this last one has no buildsystem at all, unfortunately) all used by libmv, or maybe carve libredcode (for which I don’t even have an URL at hand), recastnavigation (which has no releases) which are instead used directly by Blender.

P.S.: don’t expect to see me around this Sunday, I’m actually going to see the Shuttle, and so I won’t be back till late, most likely, or at least I hope so. You’ll probably see a photo set on Monday on my Flickr page if you want to have a treat.

USEless flag: static-libs

You might remember that in my graphical post I tried making it clear that in some cases static libraries are not only unnecessary, but also unusable. It so happens that not everybody has that clear yet, so let me try to explain a bit more where this USE flag is unwarranted entirely.

Language bindings. While it is technically possible to have static extensions built in in Ruby and Python, said situations are very convoluted and aren’t really solid enough to be used for Gentoo packaging; those who want to go with that route are generally using custom build systems, because they require to build in the whole interpreter and binding libraries together.

This means that, to give an idea, these archives shouldn’t have been there to begin with:

/usr/lib/python2.7/site-packages/beagle/beagle.a
/usr/lib/python2.7/site-packages/pygetdata.a
/usr/lib/python2.7/site-packages/numpy/core/lib/libnpymath.a
/usr/lib/python2.7/site-packages/_mathgl.a
/usr/lib/python3.2/site-packages/numpy/core/lib/libnpymath.a

Note: there are no Ruby files in that list, but just because the tinderbox is currently unable to build any Ruby version due to GLIBC 2.14 issues that are yet unresolved upstream.

So if you’re building language bindings, adding a static-libs USE flag is just pointless: remove it and disable static libraries altogether, so that you don’t have to build source files twice (PIC and non-PIC).

Runtime-loaded plugins. It is unfortunate how little developers know about the limitations of runtime-loaded plugins (i.e., all plugins; I’m just making it clear, because buildtime-included plugins are rather built-ins). One of these is the fact that you only need the .so file to load them. So these are certainly smelling funny:

/usr/lib/gstreamer-0.10/libgstmimic.a
/usr/lib/gstreamer-0.10/libgstvp8.a
/usr/lib/libvisual-0.4/morph/morph_alphablend.a
/usr/lib/libvisual-0.4/morph/morph_flash.a
/usr/lib/libvisual-0.4/morph/morph_tentacle.a
/usr/lib/libvisual-0.4/morph/morph_slide.a
/usr/lib/libvisual-0.4/actor/actor_lv_scope.a
/usr/lib/libvisual-0.4/actor/actor_jakdaw.a
/usr/lib/libvisual-0.4/actor/actor_oinksie.a
/usr/lib/libvisual-0.4/actor/actor_bumpscope.a
/usr/lib/libvisual-0.4/actor/actor_gforce.a
/usr/lib/libvisual-0.4/actor/actor_infinite.a
/usr/lib/libvisual-0.4/actor/actor_lv_analyzer.a
/usr/lib/libvisual-0.4/actor/actor_corona.a
/usr/lib/libvisual-0.4/actor/actor_JESS.a

It is true that libtool’s libltdl theoretically allows you to decide at build time whether to build the code in or use it as a plugin, but even then, only one version of the two is ever needed. So if your package only installs one of these, simply remove the USE flag and disable static libs altogether; pretty please.

Also note that while there are packages that allows you to use plugins or builtins, for instance Apache allows you to choose whether to build the modules in or build plugins to load at runtime. But this only works for first-party modules, as building third party modules within Apache requires to have their sources available when building Apache itself.

Libraries relying heavily on plugins. Similarly, if you’re packaging a library that includes a plugins’ system, it is almost certain that the plugins link back to the library. This is actually required to avoid underlinking and to ensure that no undefined symbols are present in the final link of the plugin (well, there are exceptions, but let’s not dig into those right now).

Why in this case you don’t want static libraries? Well, if you were to build a frontend against the static library, it would then load the plugin and the dynamic library as well, not only doubling the loaded code, but causing symbol collisions one with the other, which is fine only until the two versions get out of sync (i.e. you update the library package, without rebuilding the frontend).

This only works if you can convert all the plugins into builtins on the fly, but I know of no project that allows you to do that.

Almost everything else. Don’t get me wrong, I knwo that sometimes we do need static libraries. But it’s also true that in most cases we don’t. I don’t know why people insist on having static-libs an USE flag by default. My personal preference would be to have static-libs as an USE flag when it is actually being used.

This means that unless there are reverse dependencies building against the static libraries, the package should not have the static-libs USE flag at all. Unfortunately this requires the user to enable it forcefully when they want to build something statically (and I admit it happened to me before a time or two).. but let’s be real: most people will not do that.

Indeed, if it wasn’t for PostgreSQL and lvm2/udev (the latter of which I would have avoided with a USE=-static on lvm2), I don’t think I would have noticed the DWARF-4 binutils bug that I had lingering on my system for over a month!

Hide those symbols!

Last week I have written in passing about my old linking-collision script. Since then I restarted working on it and I have a few comments to give again.

First of all, you might have seen the 2008 GhostScript bug — this is a funny story; back in 2008 when I started working on finding and killing symbol collisions between libraries and programs, I filed a bug with GhostScript (the AFPL version), since it exported a symbol that was present, with the same name, in libXfont and libt1. I found that particularly critical since they aren’t libraries used in totally different applications, as they are all related to rendering data.

At the time, the upstream GS developer (who happens to be one of the Xiph developers, don’t I just love those guys?) asked me to provide him with a real-world crash. Since any synthetic testcase I could come up with would look contrived, I didn’t really want to spend time trying to come up with one. Instead I argued the semantic of the problem, explaining why, albeit theoretical at that point, the problem should have been solved. No avail, the bug was closed at the time with a threat that anyone reopening it would have its account removed.

Turns out in 2011 that there is a program that does link together both libgs and libt1: Evince. And it crashes when you try to render a DVI document (through libt1), containing an Encapsuled PostScript (EPS) image (rendered through GhostScript library). What a surprise! Even though the problem was known and one upstream developer (Henry Stiles) knows that the proper fix is using unique names for internal functions and data entries, the “solution” was limited to the one colliding symbol, leaving all the others to be found in the future to have problems. Oh well.

Interestingly, most packages don’t seem to care about their internal symbols, be them libraries or final binaries. On final binaries this is usually not much of a problem, as two binaries cannot collide with one another, but it doesn’t mean that the symbol couldn’t collide with another library — for this reason, the script now ignores symbols that collide only between executables, but keeps listing those colliding with at least one library.

Before moving on to how to hide those symbols, I’d like to point out that the Ruby-Elf project page has a Flattr button, while the sources are on Gitorious GitHub for those who are curious.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015 and turned down the service. So the project is now on GitHub. I also stopped using Flattr a long time ago.

You can now wonder how to hide the symbols; one way that I often suggest is to use the GCC-provided -fvisibility=hidden support — this is obviously not always an option as you might want to support older versions, or simply don’t want to start adding visibility markers to your library. Thankfully there are two other options you can make use of; one is to directly use the version script support from GNU ld (compatible with Sun’s, Apple’s and gold for what it’s worth); basically you can then declare something like:

{
  global:
    func1;
    func2;
    func3;
  local: *;
}

This way only the three named functions would be exported, and everything else will be hidden. While this option works quite nicely, it often sounds too cumbersome, mostly because version scripts are designed to allow setting multiple versions to the symbols as well. But that’s not the only option, at least if you’re using libtool.

In that case there are, once again, two separate options: one is to provide it with a list of exported symbols, similar to the one above, but with one-symbol-per-line (-export-symbols SYMBOL-FILE), the other is to provide a regular expression of symbols to export (-export-symbols-regex REGEX), so that you just have to name the symbols correctly to have them exported or not. This loses the advantage of multiple versions for symbols – but even that is a bit hairy so I won’t get there – but gains the advantage of working with generating Windows libraries as well, where you have to list the symbols to export.

I’d have to add here that hiding symbols for executables should also reduce their startup time, as the runtime loader (ld.so) doesn’t need to look up a long list of symbols when preparing the binary to be executed; the same goes for libraries. So in a utopia world where each library and program only exports its tiny, required list of symbols, the system should also be snappier. Think about it.

Reliable and unreliable detection of bundled libraries

Tim asked me for some help to identify bundled libraries, so I come back to a topic that I haven’t written about in quite a while, mostly because it seemed like a lost fight with too many upstream projects.

First of all let’s focus on what I’m talking about. With the term “bundled libraries” I’m referring to libraries whose sources are copied verbatim, or only slightly modified, within another project. This is usually done by projects with either a number of dependencies, or uncommon dependencies, under the impression it makes it easier for the users to build the project. I maintain that such notion is mostly obsolete as it’s not the place of most random users to actually build their own projects, leaving that job to distributions, which have the inverse problem with bundled libraries: they make packaging harder, and software much more vulnerable to security issues.

So how do you track down bundled libraries issues? Well one option is to check out each packages’ sources, but that’s not really applicable that easily, especially since the sources might be masked — I found more than one project taking a library originally in a number of source files, concatenating all of them together, and then building it in a single object file. The other options relate to analyse the result of the build, executables and libraries. This is what I’ve done with my scripts when I started looking at a different, but slightly related, problem (symbol collisions).

You can analyse the resulting binaries in many ways; the one I’ve been doing with the scripts I’m referring to above is a very shallow detection: it wasn’t designed for the task of looking for bundled libraries, it was designed to identify libraries and executables exporting the same symbols, that would cause trouble if they were ever loaded in the same address space. For those curious, this is because the dynamic loader needs to choose only one symbol for any name, so either the libraries or the executable would end up calling the wrong function or accessing the wrong data. While this can sound too theoretical, one such bug I reported back in 2008 was the root cause of a runtime crash of Evince in 2011.

At any rate this method is not optimal for the task of finding bundled libraries for two reasons: the first is that it only checks for identical symbol names, so it doesn’t take into account libraries that are bundled and have their method renamed – and yes I have seen that done – the second is that it only works with exported symbols, that are those that the dynamic loader can collide on. What is not included here are the so-called local symbols: static symbols, symbols declared with private ELF visibility, and those that are hidden by linker scripts at link time.

To inspect the binaries deeper, you need non-stripped copies; it doesn’t really require you to have them built with DWARF data in them (the debug data that is added when building with, for instance, -ggdb); it can work even just the complete symbol table kept in the .symtab section of final binaries (and usually stripped away). To get the list of all symbols present within a binary, be them data (constants and variables) or code (functions), you can simply use the nm --defined-only command (if you add --dynamic you end up with the same data I’m analysing with my script above, as it changes the table that it uses to look up symbols).

Unfortunately this still requires to find a way to match the symbols even with prefixed versions, and with little false positives; this is why I haven’t worked on a script to deal with this kind of data yet. Although while writing this I can think of a way to make it at least scan for a particular library against a list of executables, that is not really one of the best-performing options available. I guess the proper answer here would be to find a way to generate a regular expression for each library based on the list of symbols they include, and then grep for that over the symbols exported by the other binaries.

Note here: you can’t expect that all the symbols of a bundled library are present in the final binary; when linking against object files or static archives the linker applies logic akin to the --as-needed one and will not bring in object files if no symbol from that file is referenced by the code; if you use other techniques then you can even skip single functions and data symbols that are unused by your code. Bottom line is that even if the full sources of another library are bundled and built, the final executable might not have all of them on it.

If you don’t have access to a non-stripped binary, well then your only option is to run strings over the package and try to find matching strings for the library you’re looking for; if you’re lucky, the library provides version information you can still track down in raw strings. This unfortunately is the roughest option you have available and I’d not suggest doing it for anything you have sources for; it’s a last-resort for proprietary, pre-built software.

Finally, let me say a couple of words regarding identify the version of an exported version of a library, which I have found to be done way too many times. If this happens on an ET_DYN file, such as a shared object, you can easily dlopen() it and then call functions like zlibVersion() to get the actual version of the library embedded. This ends up being pretty important when trying to tell whether a package is bundling a vulnerable version of zlib or libpng, even though it is also not extremely reliable as it is.

Your symbol table is not your ABI

You’d say that for how much I have written on the topics related to shared libraries, ABIs, visibility and so on, I would be able to write a whole book about the subject. Unfortunately, between me not being a native speaker – thus requiring more editing – and the topic not being about any shiny web technology, there seem to be no publisher interested, again. At least this time it’s not that somebody else wrote it already. Heh.

When a wise man points at the moon, the fool looks at the finger.

Yesterday (well, today, but I’m going to post this in the morning or afternoon) I hit an interesting bug in the tinderbox: a game failing to build apparently because of a missing link to libesd (the client library of the now-obsolete ESounD. This seemed trivial to me, as it appeared that the new libgnome didn’t link to libesd any longer. I had to look a bit more to see how bad the situation was. Indeed, libgnome used to bring in libesd via pkg-config; then it started using Requires.private, so it was only brought in indirectly (transitively) through the NEEDED entries. Finally, with this release, it is not even in the NEEDED entries, as long as you’re using --as-needed.

What happened? Well, with the latest release, the old, long-time deprecated esd interfaces for sound support are finally gone, and all the gnome-sound API is based off Lennart’s libcanberra; this is the good part of the story. The other part of the story is that two functions that were tightly tied with the old ESounD interface (to load a sample into the ESounD session, and get a handler to the currently running esd session, respectively) are no longer functioning. Since GNOME is not supposed to break ABI (Application Binary Interface) between releases within version 2, the developers decided to keep the symbols around, and, quoting them, to keep linking to libesd to maintain binary compatibility.

Well, first off, the “binary compatibility” that they wish to keep by linking to libesd is rather compatibility with underlinked software that, in its own self, is a bad thing we shouldn’t be condoning at all. Software that uses esd should link against it, not rely on the fact that libgnome would bring it in.

On the other hand, they seem to assume that the ABI is little more than what the symbol tables provides. Of course there are the symbols exported that are part of the ABI, and their types; for the functions, also the type and order of their parameters. You also have to add the structures, the types of their content, their size and the order. All of this is part of the ABI, yet the only thing that the linker will enforce is the symbol table, which is why in my title I wanted to make it clear that the symbol table is not the only part of the ABI; or the fact that the ABI is not only what the linker can enforce — or the API what the compiler can enforce!

What is that I’m referring to? Well as I said there are two functions now that make no sense at all; they weren’t, though, removed, as that would have broken the part of the ABI visible by the linker. Instead, the functions now return the value -1 (i.e. there has been an error) without doing anything (almost; actually the sample loading is done through libcanberra, but it really doesn’t matter given that you’re trying to get the sample handle, that is not returned anyway). Even though this won’t cause a startup error, or a build-time error, it’s still breaking the ABI: software that was built relying on the two functions working will not work starting this version.

You’re not maintaining binary compatibility, you’re hiding your own ABI breakage under the rug!

This is not dissimilar to the infamous undefined symbol problem as it simply shuffles to runtime a mistake that could have been easily found at build-time, by making the build fail when the symbol was used, to run-time, by forcing an error condition where the code was unlikely to expect one.

In this particular case, only two packages directly express the need for esd to be enabled in libgnome2; they should be fixed or simply dropped out of Portage, as ESounD has been deprecated for many years now and thus make no sense to find a way to fix these packages, they are more than likely dead if they didn’t write a fallback already they have probably stopped developing it.

Linkers and names

I received a question by mail the other day from Randy, about sonames and in particular their versioning which I already discussed about a couple of times. On the other hand, his concerns are good enough to warrant for another blog post on the matter, hoping not to bore my readers too much with it.

First of all, let me repeat that there are a number of different names that a shared library “owns”, on Unix and Unix-like systems based on ELF:

  • the actual name of the filename;
  • the name the link editor (or build-time linker if you prefer, ld) uses to link to it;
  • the name the loader (or dynamic linker if you prefer, ld.so) uses to find it.

The obvious way to correlate all of them together is to use symbolic links; and that’s why your /usr/lib directory is full of those. But how that happen?

When linking, the link editor (ld) can take a library as parameter in two different ways: either with full path to the library (such as /usr/lib/libcrypto.so.0.9.8) or via a link-library parameter (-lcrypto); in the first case, the linker know exactly which files it has to link against; in the latter, though, it has to go look for it.

The paths where the linker search for the library is worth discussing separately, and all in all the library search path list is not something that is important to the aim of this article. What we do care about is the basename the linker is going to look out for; well, the answer is easy, it applies the same substitution you’d have with sed and s/^-l(.+)$/lib1.so/ — it prepends lib and adds .so to the end of the library name. This is why all the libraries you can link to have a .so suffix-only variant, beside the full name and the version-suffix symlink.

Obviously, when the link-editor adds the dependencies on the libraries to the output, it has to use some kind of name; the full path is not a viable option (you might be linking to a non-installed library that will be installed later), so you can either use the library base name (PE – the format used by Windows – did this, but Microsoft seems to have changed idea when .NET came around), or find some other value to link (no pun intended) it to, which is what we’re concerned with here, since that’s what Unix (and Linux) do.

Each library declares its canonical name (the soname, standing for Shared Object Name); the linker will copy that value from the tag called DT_SONAME to a DT_NEEDED entry tag of the output file. Now, these two tags are really what’s important in this discussion; as you saw, the two are respectively consumed and produced by the link editor. The loader, instead, only cares about DT_NEEDED: it uses that tag to know the filename of the libraries to look for in its library search path (that does not necessary have to be the same as the one for the link editor).

Actually, it’s not all true; there are a number of libraries that don’t declare a soname at all, in some cases that’s correct, for instance plugins don’t need a soname at all since they are never linked against, but a good many times, it’s actually a bug in the build system itself. But that’s a topic for another day.

The loader, for good or bad, does not check that the value in DT_SONAME of the loaded library corresponds to the DT_NEEDED tag it encountered — this is why symlinking libraries with incompatible ABIs seem to work at first: the loader does not fail to load them. This means that it really doesn’t matter what a DT_NEEDED tag points to, as long as, within the library path, there is a file (or symlink) with that name. This lets you do some cool but braindead thinks, which I’d rather not write of myself.

To ensure that the loader is able to find the library, then, there is need to have the symlinks in place that correspond to the name it’s going to look for. Now this can be done by two pieces of software, at two different times. The first is, well, libtool, which takes care of the task both at build-time and install-time, and makes both the link used by the loader, and that used by the link editor to find the library. A number of other build systems imitate that behaviour as well.

The second component that does so is ldconfig, but only on GNU systems; by default it recreates all the symlinks to sonames in the library search path (of the loader); with various options, it processes particular libraries or paths. The result is pretty nifty, but not portable. I discovered this myself a long time ago when working on Gentoo/FreeBSD, as the preplib command from Portage failed on that architecture.

With hindsight, I wish I didn’t ask for preplib to die, but rather rewrite it to use scanelf to do the job rather than ldconfig.

What probably confused Randy, is that many documents out there assumes the libtool/Linux semantic of soname and versioning. I call it libtool/Linux because libtool changes its soname semantic depending on the operating system it’s working on, as again I found out while working on Gentoo/FreeBSD. In Linux, it produces a file with a three-components name, and sets DT_SONAME to just the first component; on FreeBSD (by default) it creates a file with a two-components name, and sets DT_SONAME to the full name.

In Gentoo/FreeBSD, we hack libtool to use the same naming scheme as Linux, since that is what library developers tend to think about when setting the version components. Without this hack, each minor version of glib would have a new soname, requiring all the linked software to be re-built, even if the ABI didn’t change, or changed in a compatible way.

I admit I haven’t even started looking up the reasoning for either the version scheme; they are probably mostly historical; in the case of FreeBSD, having the soname and the full name to match is probably needed to avoid the need for symlinks (as ldconfig does not create them as I said above). In the case of Linux, I’m not really sure why so much complexity. Without digressing too much, the components passed to -version-info don’t match directly the actual components used in either the Linux or FreeBSD semantics; they are both related by way of not-too-simple arithmetic functions.

At any rate, neither the link editor nor the loader have a real concept of “ABI version”, which is why they don’t enforce it at all. So you can have libraries that don’t even update their ABI version at all, or others that use the full version as ABI, and bump it even if there aren’t incompatible changes.

So to answer Randy’s final question, this happens because each project is free to decide what its soname is, and thus its soversion. In the case of OpenSSL, they use a three-components version that is the same of the file name, and of the version of the released package, as can be seen by looking at the following scanelf -S output from the tinderbox :

tinderbox ~ # scanelf -S /usr/lib/libcrypto.so.*
 TYPE   SONAME FILE 
ET_DYN libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 
ET_DYN libcrypto.so.1.0.0 /usr/lib/libcrypto.so.1.0.0 

Any document that implies explicitly that the soname has to consist of the library name and its ABI version, is simply wrong. That is a practice that definitely should be followed, but there are no technical restrictions in there, they are only practical restrictions.

Yes, it is a sorry state of the affairs.

A shared object is (not) enough

In my immediately previous post I have thrown in a couple of nods to two particularly nasty issues related to shared object plugins; I have written extensively, or even excessively, about the issue so I’m not going to write more about the presentation of shared objects (or dynamic libraries if you prefer the Microsoft term for the same concept), and I’d rather go on with the two current problems at hand, which I’ll try to cover in a proper manner.

Shared objects as plugins

When building shared objects for plugin usage, like the case for NSS I noted, PAM plugins or extensions for languages like Ruby, Python, Perl, Java… you don’t need static libraries at all, so a shared object is enough!

While some of those system do support statically linking engine and plugins in an application, this rarely works out as intended; for instance FreeBSD (used to?) support statically linked PAM, but that worked only for the default modules, and if you configured your service authentication chain to use non-default modules you have a non-working setup. So the net result is definitely against having to support statically-linked PAM, or any other statically linked system.

Since you cannot link this stuff statically in, you can easily see that there is no need to install (nor build!) the static archive version of those things; this is usually done properly by both custom-tailored build system (as upstream likely tries to minimise the effort) and by the language-specific buildsystems (like the various incarnation of mkmf and rake-compiler in Ruby, distutils in Python, ant for Java, and so on so forth). On the other hand, especially with autotools-based build system, most people seem to forget that there is a nasty overhead in building both version, beside the waste of installing the extra file.

Indeed, since libtool will prefer building PIC objects for the shared objects (as it’s required for AMD64 and most of non-x86 architectures), and non-PIC objects for static archives (to reduce overhead); since you cannot build once for both (nor you can pre-process once, as you can have __PIC__ conditional code!) you end up having to call the compiler twice for each source file. To reduce this overhead you can usually default to disable static libraries (or disable it through ./configure invocation) or you can disable it altogether as instructed so that it only ever builds the shared object version, and not the static one. Unfortunately this does not stop libtool from installing a pointless .la file but that’s a different story.

While there is a safety check in Portage proper to check for .la and .a files in /lib, there is no such check for Ruby, Python, Perl, Java extensions. My tinderbox has an extra check for that and is usually able to find them; I also have a bug report template that tries explaining the maintainers involved why I report to them that a .la file is pointless and that they might want to fix the eventual static archives at the same time as well. Unfortunately, sometimes people decide it’s too much of a hassle to prepare the patch to send upstream and apply that in Gentoo, so you end up with ebuilds that avoid using make install to avoid installing the already built archives, or just delete them after install (works okay for .la files, since they are usually small and it’s definitely not trivial to avoid installing them), causing the double-build to still be performed.

Boot-critical programs and shared objects

Different page, correlated issue happens with boot-critical programs: things that need to be started before you mount all your local filesystems (maybe because they are needed to mount such filesystems, like lvm) need to have all their shared objects being available at that time. This becomes a problem when you end up needing libraries installed in /usr/lib and you split out /usr (similarly to what I said about /boot I don’t think the general case should be for splitting it out; sure there are cases where it’s needed for various reasons, but it shouldn’t be by default!), as they wouldn’t be able to run.

To solve this problem you either move the libraries (and all their dependencies) to /lib, or you have to statically link applications. The former creates a chain reaction that makes the whole point of splitting /usr mostly moot; the latter problem actually moves the problem down to the user: since Gentoo policy is to never force static linking on the user, as shared object linking has many many advantages, this is usually made conditional to the static USE flag; such a flag will build the software with static linking, and will thus require the dependent libraries to be available in static archive form (which is why rather than Portage features or whatever else, static libraries are usually made optional via an USE flag: it can be depended upon).

Mix the two! Shared object plugins for boot-critical programs!

And here is the reason why I merged the two problems, as they might seem just barely related: there is a case where you actually need a static archive to build a shared object plugin; that’s the case for PAM plugins that need libraries, such as pam_userdb uses Berkeley DB library for storage.

It’s not an easy case to solve, because of course you’d be looking to have the library available as static archive, but at the same time it has to be PIC to work properly… up to the 0.99 version, the solution was to build an internal copy of Berkeley DB within the PAM ebuild; without counting the additional problems with security, we ended up with a very complex ebuild, a lot more complex than it would be needed for PAM alone. I discarded that solution when I took over PAM, and split the Berkeley DB support in its own ebuild… doing the same thing as before. That ebuild has been, up to now, pretty much untouched, and the result is that we have a stale ebuild in tree using a stale version of Berkeley DB. I don’t like that situation at all.

Sincerely, after thinking about it I think the best solution at this point is simply to get rid of the stale ebuild, and decide that even though PAM is installed in /lib, it does not warrant total coverage: we won’t be moving, or building statically, things like PostgreSQL, LDAP, MySQL and so on so forth, and yes, those are all possible providers for the PAM modules. I guess we should just add one very simple statement: don’t use external-dependent modules for authenticating users and services that are boot-critical.

If I’ll still be a Gentoo developer next month, after I free myself up from my current work tasks, I’ll be merging back sys-auth/pam_userdb into sys-libs/pam, and then take care of getting the new PAM stable, and the old ebuild removed. This should solve quite a few of our problems and set a better precedent than what we have right now.