Me and a Raspberry Pi: Cross-linking

You can probably guess that my target is not building packages for an embedded device on the device itself. I usually have good luck with cross-compilation, but I also usually resort to very nasty hacks to get it to work. This time around I’m trying to use as few hacks as humanly possible. And linking is turning out to be the troublesome part.

So first of all you need to get your cross compiler, but that’s easy:

# crossdev armv6j-hardfloat-linux-gnueabi

And everything will be taken care of for you. The whole toolchain, prefixed with armv6j-hardfloat-linux-gnueabi- will then be available, including a suitable armv6j-hardfloat-linux-gnueabi-pkg-config so that you don’t have to deal with it manually.

Now you have two options on how to proceed: you may only need a single configuration for all your cross-compiled targets, or you may want to customise the configuration per target. The former case is the easiest: crossdev sets up a root in /usr/armv6j-hardfloat-linux-gnueabi, so you can configure it like a chroot and simply run armv6j-hardfloat-linux-gnueabi-emerge --root=$path to build the packages.

But I considered that a bad hack, so I wanted to do something different: I wanted a self-contained root and configuration. The newest GCC theoretically allows this in a very simple fashion: you just need to have the basic components (glibc, for one) already in the root, and then you can use the --sysroot flag to switch away from the crossdev-installed headers and libraries. Unfortunately, while the compiler behaves perfectly with this switch, the same can’t be said of the link editor.

Indeed, while ld accepts a --sysroot parameter as well, it ignores it for library search paths, making it impossible to find the libraries that are not installed in /usr/armv6j-hardfloat-linux-gnueabi. The classical solution is to add -L$ROOT/usr/lib -L$ROOT/lib so that the link editor is forced to search those paths as well — unfortunately this can cause problems due to the presence of .la files, and even more so due to the presence of ldscripts in /usr/lib.

You might remember an older post of mine that discussed the names of shared libraries. The unversioned libfoo.so link is what the link editor uses to find which library to link to when you ask for -lfoo. For most libraries this alias is provided by a symlink, but for a number of reasons (which honestly are not that clear to me), for libraries that are installed in /lib an ldscript is created instead. This ldscript refers to the shared object with a path that is not relative to the root, so for instance $ROOT/usr/lib/libz.so will point to /lib/libz.so.1 — that’s not going to fly very well unless sysroot gets respected there, but luckily for us, it actually is.
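
To make this more concrete: such an ldscript is just a short text file, and a rough sketch of the one for zlib looks like this (the exact comment and OUTPUT_FORMAT line depend on the toolchain and target, so take it as an illustration rather than a verbatim copy from my root):

% cat "$ROOT"/usr/lib/libz.so
/* GNU ld script — redirect the link editor to the real library in /lib */
OUTPUT_FORMAT ( elf32-littlearm )
GROUP ( /lib/libz.so.1 )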

What it looks like is that the BFD link editor from binutils 2.23, at least, has trouble with its --sysroot implementation for search paths only (it works fine when expanding the paths in the ldscripts) — what about the gold link editor, which is supposed to be the shiniest out there? Well, there --sysroot, while technically supported, is implemented in an even worse way: it is not used at all when expanding the paths in the ldscripts, which on the BFD side was a Gentoo fix dating back to 2009.

In the end, my current solution involves setting this in the make.conf inside the root:

CFLAGS="-O2 -pipe -march=armv6j -mfpu=vfp -mfloat-abi=hard --sysroot=/home/flame/rasppy/root"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1,--as-needed --sysroot=/home/flame/rasppy/root -Wl,--sysroot=/home/flame/rasppy/root -L=/usr/lib"

PKG_CONFIG_SYSROOT_DIR=/home/flame/rasppy/root
PKG_CONFIG_ALLOW_SYSTEM_LIBS=1
PKG_CONFIG_LIBDIR=/home/flame/rasppy/root/usr/lib/pkgconfig
PKG_CONFIG_PATH=/home/flame/rasppy/root/usr/lib/pkgconfig:/home/flame/rasppy/root/usr/share/pkgconfig

And then using this command in place of plain emerge:

% sudo armv6j-hardfloat-linux-gnueabi-emerge --root=/home/flame/rasppy/root --config-root=/home/flame/rasppy/root

And it seems to work decently enough. Some packages, of course, fail to cross-compile at all, but that’s a story for a different time.

Postmortem of a patch, or how do you find what changed?

Two days ago, Luca asked me to help him figure out what was going on with a patch for libav which he knew to be the right thing, but which was acting up in a fashion he didn’t understand: on his computer, it increased the size of the final shared object by 80KiB — while this number is certainly not outlandish for a library such as libavcodec, it does seem odd at first glance that a patch removing source code increases the size of the final executable code.

My first wild guess, which (spoiler alert) turned out to be right, was that removing branches from the functions let GCC optimize them further and decide to inline them. But how to actually be sure? It’s time to get the right tools for the job: dev-ruby/ruby-elf, dev-util/dwarves and sys-devel/binutils enter the battlefield.

We’ve built libav with and without the patch on my server, and then rbelf-size told us more or less the same story:

% rbelf-size --diff libav-{pre,post}/avconv
        exec         data       rodata        relro          bss     overhead    allocated   filename
     6286266       170112      2093445       138872      5741920       105740     14536355   libav-pre/avconv
      +19456           +0         -592           +0           +0           +0       +18864 

Yes, there’s a bug in the command, I noticed. So there is a total increase of around 20KiB; how is it split up? Given this is a build that includes debug info, it’s easy to find out through codiff:

% codiff -f libav-{pre,post}/avconv
[snip]

libavcodec/dsputil.c:
  avg_no_rnd_pixels8_9_c    | -163
  avg_no_rnd_pixels8_10_c   | -163
  avg_no_rnd_pixels8_8_c    | -158
  avg_h264_qpel16_mc03_10_c | +4338
  avg_h264_qpel16_mc01_10_c | +4336
  avg_h264_qpel16_mc11_10_c | +4330
  avg_h264_qpel16_mc31_10_c | +4330
  ff_dsputil_init           | +4390
 8 functions changed, 21724 bytes added, 484 bytes removed, diff: +21240

[snip]

If you wonder why it’s adding more code than the total we measured, it’s because the patch also deleted functions elsewhere, producing reductions in other parts of the library. Now we know that the three functions the patch deleted did remove some code, but five other functions gained more than 4KiB each. It’s time to find out why.

A common way to do this is to generate the assembly files (which GCC usually does not emit explicitly) and compare the two — due to the size of the dsputil translation unit, this turned out to be completely pointless, as just the changes in the jump labels cause the whole file to be rewritten. So we rely instead on objdump, which gives us a full disassembly of the executable section of the object file:

% objdump -d libav-pre/libavcodec/dsputil.o > dsputil-pre.s
% objdump -d libav-post/libavcodec/dsputil.o > dsputil-post.s
% diff -u dsputil-{pre,post}.s | diffstat
 unknown |245013 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 125163 insertions(+), 119850 deletions(-)

As you can see, diffing these two files directly is going to be pointless, first of all because of the size of the disassembled files, and secondly because each instruction is prefixed with its address offset, which means every single line will differ. So what to do? Well, first of all it’s useful to isolate one of the functions, to reduce the scope of the changes to check — I found out that there is a nice way to do so, relying on the way the function header appears in the disassembly:

% fgrep -A3 avg_h264_qpel16_mc03_10_c dsputil-pre.s
00000000000430f0 <avg_h264_qpel16_mc03_10_c>:
   430f0:       41 54                   push   %r12
   430f2:       49 89 fc                mov    %rdi,%r12
   430f5:       55                      push   %rbp
--
[snip]

While it takes a while to come up with the correct syntax, it’s a simple sed command that can get you the data you need:

% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/ p' dsputil-pre.s > dsputil-func-pre.s
% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/ p' dsputil-post.s > dsputil-func-post.s
% diff -u dsputil-func-{pre,post}.s | diffstat
 dsputil-func-post.s | 1430 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 1376 insertions(+), 54 deletions(-)

Okay, that’s much better — but it’s still a lot of code to sift through; can’t we reduce it further? Well, actually… yes. My original guess was that some function was inlined, so let’s check for that. If a function is not inlined, it has to be called, and the instruction for that, in this context, is callq. So let’s check whether there are changes in the calls that happen:

% diff -u =(fgrep callq dsputil-func-pre.s) =(fgrep callq dsputil-func-post.s)
--- /tmp/zsh-flamehIkyD2        2013-01-24 05:53:33.880785706 -0800
+++ /tmp/zsh-flamebZp6ts        2013-01-24 05:53:33.883785509 -0800
@@ -1,7 +1,6 @@
-       e8 fc 71 fc ff          callq  a390 
-       e8 e5 71 fc ff          callq  a390 
-       e8 c6 71 fc ff          callq  a390 
-       e8 a7 71 fc ff          callq  a390 
-       e8 cd 40 fc ff          callq  72e0 
-       e8 a3 40 fc ff          callq  72e0 
-       e8 00 00 00 00          callq  43261 
+       e8 00 00 00 00          callq  8e670 
+       e8 71 bc f7 ff          callq  a390 
+       e8 52 bc f7 ff          callq  a390 
+       e8 33 bc f7 ff          callq  a390 
+       e8 14 bc f7 ff          callq  a390 
+       e8 00 00 00 00          callq  8f8d3 

Yes, I do use zsh — on the other hand, now that I look at the code above, I notice there’s a bug: it does not respect $TMPDIR, as it should have used /tmp/.private/flame as the base path. Dang!

So the quick check shows that avg_pixels8_l2_10 is no longer called — but does that account for the whole size increase? Let’s see if the function itself changed:

% nm -S libav-{pre,post}/libavcodec/dsputil.o | fgrep avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10

The size is the same: 274 bytes. The increase is 4330 bytes, which is around 15 times the size of this single function — what does that mean, then? Well, a quick look around shows this piece of code:

        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        4c 89 e7                mov    %r12,%rdi
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 cd 40 fc ff          callq  72e0 
        48 8d b4 24 80 00 00    lea    0x80(%rsp),%rsi
        00 
        49 8d 7c 24 10          lea    0x10(%r12),%rdi
        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        48 89 ea                mov    %rbp,%rdx
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 a3 40 fc ff          callq  72e0 
        48 8b 84 24 b8 04 00    mov    0x4b8(%rsp),%rax
        00 
        64 48 33 04 25 28 00    xor    %fs:0x28,%rax
        00 00 
        75 0c                   jne    4325c 

This is just a fragment, but you can see that there are two calls to the function one after the other (the xor against %fs:0x28 and the jne that follow are the stack-protector check rather than part of a loop), which means the caller invokes the function multiple times. Knowing that this function is involved in 10-bit processing, it becomes likely that the function gets called twice per bit, or something along those lines — remove the call overhead (as the function is inlined) and you can see how twenty copies of that small function per caller account for the 4KiB.

So my guess was right, but incomplete: GCC not only inlined the function, but it also unrolled the loop, probably doing constant propagation in the process.

Is this it? Almost — the next step was to get some benchmark data for the code, which was mostly Luca’s work (and I have next to no information on how he did that, to be entirely honest); the results on my server have been inconclusive, as the 2% loss he originally registered was gone in further testing and would, in any case, be well within the margin of error of a non-dedicated system — no, we weren’t using full-blown profiling tools for that.

While we don’t have any sound numbers about it, what we’re worried about are cache-starved architectures, such as Intel Atom, where the unrolling and inlining can easily cause a performance loss rather than a gain — which is why all of us developers facepalm in front of people using -funroll-all-loops and similar flags. I guess we’ll have to find an Atom system to do this kind of run on…

Are -g options really safe?

Tonight feels like the night after a very long day. But it was just half a day, spent trying to find the end of a bug saga that started about a month ago for me.

It starts like this: postgresql-server started failing to build; the link editor – BFD-based ld – reported that one of the static archives installed by postgresql-base didn’t have a proper index, which should have been generated by ranlib. But simply running ranlib on said file didn’t solve the problem.

I originally blamed the build system of PostgreSQL, but when I launched an emerge -e world yesterday to rebuild everything with GCC 4.6, another package failed in the same way: lvm2, linking to /usr/lib64/libudev.a — since I know the udev build system very well, almost as if I had written it myself, I trusted that the archive was built correctly, so it was time to look for the real problem.

After poking around a bit, I found that binutils’s nm, objdump and, at that point, even ld refused to display information for some relocatable objects (ET_REL files). This would have made the issue very difficult to debug if not for two things: first, eu-nm could read the file just fine, and second, my own home-cooked nm.rb tool, which I wrote to test Ruby-Elf, reported issues with the file — but without exploding.

flame@yamato mytmpfs % nm dlopen.o
nm: dlopen.o: Bad value
flame@yamato mytmpfs % eu-nm -B dlopen.o 
0000000000000000 n .LC0
                 U _GLOBAL_OFFSET_TABLE_
                 U __stack_chk_fail
                 U dlclose
                 U dlerror
                 U dlopen
00000000000001a0 T dlopen_LTX_get_vtable
                 U dlsym
                 U lt__error_string
                 U lt__set_last_error
                 U lt__zalloc
0000000000000000 t vl_exit
00000000000000a0 t vm_close
0000000000000100 t vm_open
0000000000000040 t vm_sym
0000000000000000 d vtable
0000000000000000 n wt.1af52e75450527ed
0000000000000000 n wt.2e36542536402b38
0000000000000000 n wt.32ec40f73319dfa8
0000000000000000 n wt.442ae951f162d46e
0000000000000000 n wt.90e079bbb773abcb
0000000000000000 n wt.ac43b6ac10ce5688

I don’t have the original output from my tool since I have fixed it in the meantime, but the issues were related, as you can guess from that output, to the various wt. symbols at the end of the list. Where do they come from? What does the ‘n’ code they are marked with mean? And why is BFD failing to deal with them? I set out to find those answers with, well, more than a hunch of what the problem would turn out to be.

So what are those symbols? Google doesn’t help at all here, since searching for “wt”, even enclosed in double quotes, only turns up results for “weight”. Yes, I know it’s a common way to shorten that word, but what the heck, I’m looking for a specific string! The answer, actually, is simple: they are additional debug symbols added by -gdwarf-4, which asks for the latest DWARF format revision. This was implemented in GCC 4.6 and is supposed to reduce the size of the debug information while carrying more of it, which is generally positive.

Turns out that libbfd (the library that implements all the low-level access for nm, ld and the other utilities) doesn’t like those symbols; I’m not sure whether it’s the sections they are defined in, their type (which is set to STT_NONE), or something else, but it doesn’t like them at all. Interestingly enough, this does not happen with final executables and dynamic libraries, which makes it at least bearable: fewer than 40 packages had to be rebuilt on my system because they had broken static objects; unfortunately one of those was LibreOffice, d’oh!
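
If you want to check whether your own system is affected, a rough sketch like the following is enough to list the static archives that binutils’s nm chokes on (the path is just an example, and I’m relying on nm returning a non-zero exit status when it hits the error):

% for f in /usr/lib64/*.a; do nm "$f" >/dev/null 2>&1 || echo "$f"; done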

Now, let’s look back at the nm issue though: when I started writing Ruby-Elf, I decided not to reimplement the whole suite of ELF tools, since there are already quite a few implementations of those out there. But I did write an nm tool to debug my own issues — it also worked out quite nicely, because implementing the codes used by nm allowed me to reuse the same output in my elfgrep tool to show its results. This implementation was never ported to the tools framework I wrote for Ruby-Elf; it didn’t get installed and was just part of the repository for my own use.

But after noticing that my tool is more resilient than binutils’s version, and that it produces more interesting output than elfutils’s, I decided to rework it and make it available as rbelf-nm, writing a man page and documenting the codes used for the various symbol kinds. Before all this, I also rewrote the function that chooses the code. It used to rely on the binding type and then on section names to produce the right code; now it relies on the symbol type, the binding type, and the sections’ flags and type, making it as resilient as elfutils’s and as informative as binutils’s, at least for what I have encountered so far.

I also released a new version (1.0.6.1, don’t ask!) that includes the new tool; it is already on RubyGems and in Portage if you wish to use it. Please remember that the project has a Flattr page, so if you like the project, your support is definitely welcome.

Gold readiness obstacle #3: side-by-side selection

One question I have been asked by developers and power users alike is “how do I safely test gold?”, and slightly further down the FAQ list, “is there an ebuild for it?”. The answers to these questions are quite difficult to give, and it’s virtually impossible to give a single answer to them. But I’ll try.

The second question is the one that has to be answered first. As a link editor, gold is not built standalone; it is still built as part of GNU binutils, just like the BFD-based link editor, even though it is mostly independent from the rest of the utilities. For a (little) while, Gentoo used to have a USE flag that could turn on building and installing gold, but it was dropped, mostly because it was clearly too soon to have it even available. Right now, there is only one way to build a binutils package that uses gold as the default link editor: the EXTRA_ECONF variable has to be set to --enable-gold=default (without the =default part it would build and install gold, but the preferred link editor — ld — would still be the BFD one).

I could give more hints about how to do this, but I’m not going to; if you do not know how to set the variable, you really shouldn’t be playing with gold right now, as it is not safe enough at this stage just yet.

With the configure switch I gave above, you both build gold and set it as the preferred link editor — not the only one, though, as the old BFD editor is also built and installed. So what’s the problem? Doesn’t that mean it is possible to have both and switch between them at runtime? Unfortunately, not so much.

Unfortunately GCC does not use a variable to choose which linker to use; it only looks for three executable names (real-ld, collect-ld and ld) in a series of paths, starting with the ones listed in the COMPILER_PATH environment variable. On an unmodified Gentoo system, what the compiler will execute is something like this (taken from my laptop): /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/../../../../x86_64-pc-linux-gnu/bin/ld, which is a symlink created directly by the binutils package. Of course you can use custom COMPILER_PATH values to play with gold, but that’s definitely not something you wish to keep doing for very long.
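
Just to give an idea of what “playing with gold” looks like in practice, assuming your binutils build installed an ld.gold binary (adjust the paths to wherever it actually ended up), a throwaway directory with an ld symlink is enough to divert a single compile:

% mkdir -p "$HOME"/gold-bin
% ln -sf /usr/bin/ld.gold "$HOME"/gold-bin/ld
% COMPILER_PATH="$HOME"/gold-bin gcc -o conftest conftest.c
% gcc -B "$HOME"/gold-bin/ -o conftest conftest.c    # same effect through the -B option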

The perfect solution would be a tool to select different link editors, which is what binutils-config is supposed to do. Unfortunately it never worked as properly intended, and the couple of people who tried implementing eselect binutils before have shown that slotting multiple binutils versions properly is not a trivial task. The same goes for eselect compiler. I think I have already proposed implementing new, complex eselect tools as part of GSoC, but it seems nobody picked it up. I’m afraid this could still be an option next year.

Gold readiness obstacle #1: Berkeley DB

I have already said that I’m working on getting gold tested, a couple of years after its introduction. The situation with this link editor is a bit difficult to assess; Google is set on making heavy use of it, and it is supposedly faster at linking Chrome (even though it uses an inconsiderate amount of RAM to achieve that — it’s the usual approach of Google software, I guess: you can always throw in more RAM, you can’t always throw in more time!), among others. On the other hand, I can tell for sure that no distribution has tried to build its whole package set with it yet, simply by looking at the kind of packages that fail to build for one reason or another.

I’ll leave the failures that also matter to other, non-Gentoo-based distributions for the next few posts; today, the target is a failure limited to Gentoo systems, because it involves a workaround we implemented a long time ago, which is now going to bite us in the ass until we either solve it for good, or find an alternative workaround. But let’s start with the original problem.

The Berkeley DB library (BerkDB) – which is nowadays maintained by Oracle, for the record – is a very common library used for storing data in plain files. There are a number of different “generations” of its API, one of which (db1) is provided by the FreeBSD C library as well; the very generic dbm interface is also implemented by the GNU project’s gdbm library. The use of BerkDB was much more prominent in the day-to-day life of any Linux user a couple of years ago; nowadays, the storage format and library of preference is SQLite (to the point that even BerkDB itself provides an SQLite-based interface to its own storage format). But even so, it is very difficult to do without BerkDB: LibreOffice, Postfix, Evolution, Squid, Perl, … they all require BerkDB for this or that feature.

Unfortunately the most recent generation of the Berkeley DB API still varies widely, and the format is not always compatible between minor version changes (from 4.4 to 4.5, say). For these reasons, Gentoo has been allowing side-by-side installation of multiple Berkeley DB versions at the same time, so-called slotting. By allowing software that hasn’t been rebuilt to keep using the old version (and the old files), as well as keeping the utilities for the previous format available, you usually make sysadmins’ work easier. Unfortunately, since the functions present in more than one minor version have the exact same names, Gentoo users and developers ended up hitting ELF symbol collisions when programs and libraries linked different Berkeley DB versions.

Turns out that GLIBC is actually designed with this in mind, and includes symbol versioning to solve the issue: a particular string is assigned to each symbol, so that you can have multiple libraries providing ABI-incompatible symbols with the same name – usually the API needs to be at least partially compatible, but I don’t want to go into too many details now – without clashes and collisions. To provide versioning you have three main options: inline in the C sources, through a version script, or, with GNU ld/BFD, through the --default-symver option, which sets the version string of each symbol to the soname of the library it is exported from. This was a godsend for Gentoo at the time because it avoided collisions without having to edit anything in the build system: you just had to add the flag to the linker’s flags in the ebuild and voilà.
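
As a rough sketch of the difference (library, symbol and file names here are made up): the first command is the --default-symver approach our ebuilds use, the second is the explicit version-script alternative I discuss below.

% gcc -shared -Wl,-soname,libexample-4.8.so -Wl,--default-symver -o libexample-4.8.so example.o
% cat example.map
EXAMPLE_4.8 {
    global: *;
};
% gcc -shared -Wl,-soname,libexample-4.8.so -Wl,--version-script=example.map -o libexample-4.8.so example.o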

If you’re now wondering whether GNU gold supports this option, you’re on the right track. The answer is “no, not right now”: it chokes on the option, which results in Berkeley DB’s configure reporting that the compiler is unable to create executables. Whether it will support the option in the future remains to be seen. The last time I tried to implement a bfd/ld feature in gold – namely support for emitting explicitly unversioned symbols, which is needed to build FUSE – the results were disappointing, although I understand there is a problem with implementing a build-time feature that cannot work at runtime right now.

So unless gold gains the same option, we need to find another solution, or ignore the existence of gold for a while longer. An alternative I have already been told about would be to replace the current --default-symver option with a --version-script option pointing to an explicit version script. Unfortunately, this is not as easily done as said, at least for the versions we have in the tree right now. Such a blanket-version approach would pose no issue if it were introduced with a new slot of the package, as the version string would have to be different either way, but it wouldn’t work for keeping binary compatibility with the older versions.

The problem is that BerkDB doesn’t install a single library, but a number of them; and since --default-symver uses the library’s soname when creating the versions for its symbols, it means that each library needs a different version string. Implementing this same scheme through standard version scripts would be a world of pain, and probably not worth the effort. For now, I decided to simply mask BerkDB on the container that is testing gold, forcing as many packages as possible to use gdbm instead, which does not have the same problem.

I’m glad we decided not to go the same route with expat, even though the immediate fallout at the time was out of scale (at the time it was a dream even to think about using --as-needed — the .la files are a joke in comparison!): it saved us the headache of reaching the point where we’d have to decide whether to forgo modern tools, or break binary compatibility again.

At any rate this is just the tip of the iceberg of gold versus real-world software. I’ll write more about this in the coming days as I find time. For now, I wouldn’t mind if you noted your interest in testing gold… comments, flattrs (on the blog, the post or, even better, the tinderbox, since that’s what is doing the work!) and other tokens of appreciation are definitely welcome. At least they would tell me I’m not wrong in insisting on spending time reporting and fixing the gold bugs.

Are we done with LDFLAGS?

Quite a few weeks ago, Markos started asking for more thorough testing of ebuilds to check whether they honour LDFLAGS; he then described why that is important, which can be summarised as “if LDFLAGS are ignored, setting --as-needed by default is pointless”. He’s right on this, of course, but there are a few more tricks to consider.

You might remember that I described an alternative approach to --as-needed that I call “forced --as-needed”, implemented by editing the GCC specifications. The idea is that by forcing the flag in through this method, it gets applied even by packages that ignore LDFLAGS and by packages using an unpatched libtool. This was one of the things I had in mind when I suggested that approach, but alas, as it happens, I wasn’t listened to.
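
For reference, the gist of that approach is to inject the flag into GCC’s *link spec rather than through LDFLAGS; a minimal sketch, not the exact implementation, and assuming the usual -dumpspecs layout where the spec body follows the *link: line, would be:

% gcc -dumpspecs | sed -e '/^\*link:$/{n;s/^/--as-needed /;}' > /tmp/asneeded.specs
% gcc -specs=/tmp/asneeded.specs -o foo foo.c -lm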

Right now there are 768 bugs reported blocking the tracker, of which 404 are open; in total, 635 were reported by me (mostly, but not only, through the tinderbox). And I’m still opening more of those on a daily basis.

But let’s say that these are all the problems there are, and that in two weeks’ time no package is reported as ignoring LDFLAGS at all. Will that mean that all packages will build properly with --as-needed at that point? Not really.

Unfortunately, the ignored-LDFLAGS warning has both false positives (prebuilt binary packages, packages linking with non-GNU linkers) and partial false negatives. To understand why, you have to understand how the current warning works. A few binutils versions ago a new option was introduced, --hash-style; it changes the way the symbol table is hashed for faster loading by the runtime linker/loader (ld.so). The classic hashing algorithm (SysV) is not particularly good for the very long, similar symbols that are all too common with C++, so there has been some work to replace it with a better, GNU-specific algorithm. I followed most of the related development closely at the time, since Michael Meeks (of OpenOffice.org fame) actually came asking Gentoo for some help testing things out; it was that work that got me interested in linkers and the ELF format in the first place.

At any rate, while the original hash table was created in the .hash section, the new hash table, being incompatible, lives in .gnu.hash. The original patch simply replaced one with the other, but what actually landed in binutils was slightly different (it allows choosing between the SysV table, the GNU table, or both), and the default has been to emit both hash tables, so that older systems can load .hash and newer ones can load .gnu.hash; win-win. On the other hand, on Gentoo (and many other distributions) where excessively old GLIBC versions are not supported at all, there is reason enough to use --hash-style=gnu and disable the generation of the SysV table entirely.

Now, the Portage warning derives from this situation: if you add -Wl,--hash-style=gnu to your LDFLAGS, Portage will check the generated ELF files and warn if it finds the SysV .hash section. Unfortunately this does not work for non-Linux profiles (as far as I know, FreeBSD does not support the GNU hashing style yet; uClibc does), and it will obviously report all the prebuilt binaries coming from proprietary packages. In those cases you don’t want to strip .hash, because it might be the only hash table there, the one keeping ld.so from doing a linear search.
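
If you want to see what the warning actually looks at, it boils down to which of the two hash sections an ELF file carries; something like this (scanelf from pax-utils works just as well) shows it at a glance, and a file built with -Wl,--hash-style=gnu in its LDFLAGS should only list .gnu.hash:

% readelf -S /usr/bin/ls | grep -F .hash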

So, what is the problem with this test? Well, let’s note one big difference between the --as-needed and --hash-style flags: the former is positional, the latter is not. That means that --as-needed, to work as intended, needs to appear before the libraries are listed on the command line, while --hash-style can come at any point. Unfortunately this means that if a package has a linking line such as

$(CC) $(LIBS) $(OBJECTS) $(LDFLAGS)

it won’t trigger the LDFLAGS warning from Portage, yet (basic) --as-needed would still have no effect — OTOH, my forced --as-needed method would work just fine. So there is definitely enough work left to do here for the next… years?

Enabling --as-needed, whose task is it?

A “fellow” Gentoo developer today told me that we shouldn’t try to get --as-needed working because it’s a task for upstream (he actually used the word “distributors” to mean that neither Gentoo nor any other vendor should do that, silly him)… this developer will go unnamed, because I’ll also complain right away that he suggested I give up on Gentoo when I gave one particular reason (which I’ll repeat in a moment) why we should do it instead. Just so you know, if it were up to me, that particular developer would have his access revoked right now. Luckily for him, I have no such powers.

Anyway, let me try to put in proper writing why it should fall to Gentoo to enable that flag by default to protect our users.

The reason I gave above (it was a Twitter exchange, so I couldn’t articulate it completely) is that “Fedora does it already”. To be clear, both Fedora and SUSE/Novell do it already. I’m quite sure Debian doesn’t, and I guess Ubuntu doesn’t either, to keep in line with Debian. And the original hints we took to start with --as-needed came from AltLinux. This alone means there is quite a bit of consensus out there that it is a task for distributors to look at. And it should say a lot that the problems solved by --as-needed are marginal for binary distributions like the ones I named here; they all do it simply to reduce the time needed to rebuild their binary packages, rather than to avoid breaking users’ systems!

But I’m the first person to say that the phrase “$OtherDistribution does it, why don’t you?” is bogus and can actually cause more headaches than it solves — although most of the time it gets used when $OtherDistribution is Ubuntu or a relative of theirs. I seriously think we should take a few more hints from Fedora; not clone them, but they do have very strong system-level developers working on their distribution. That’s a different point altogether, though.

So what other reasons are there for us to provide --as-needed rather than upstream? Well, actually this is the wrong question; the one you should formulate is “why is it not upstream’s task to use --as-needed?”. While I have pushed --as-needed support into a few of my upstream packages before, by now I think it’s not the correct solution. It all boils down to who is in the better position to know whether it’s safe to enable --as-needed. There are a few things you should assess before enabling it:

  • does the linker support --as-needed? it’s easier said than done; the linker might understand the flag, but supporting it is another story; there are older versions of ld still in active use that will crash when using it, others with a too-greedy --as-needed that will drop libraries that are actually needed, and only recently was the softer implementation added; and while upstream could check for a particular ld version, what about backports?
  • do the libraries you link to link to all their needed dependencies? one of the original problems with --as-needed, when it was introduced to the tree, was that you’d have to rebuild one of the dependencies because it relied on transitive linking, which --as-needed disallows (especially in its original, un-softened form); how can a given package make sure that all its dependencies are fine before enabling --as-needed?
  • does the operating system support --as-needed at all? while Gentoo/FreeBSD uses modern binutils, and (most of) the libraries are built so that all the dependencies are linked in as --as-needed requires, using it is simply not possible (or wasn’t possible, at least…), because gcc will not allow linking the threading libraries in, for compatibility with pre-libmap.conf FreeBSD versions; this has changed recently for Gentoo, since we only support recent versions of the loader that don’t have that limitation — but then again, how can upstream know whether the compiler already has that fix or not?

Of course you can “solve” most of the doubts by running runtime tests; but is that what upstreams should do? Running similar tests from multiple packages requires sharing the knowledge, risks the tests getting out of sync with one another, and duplicates work… when instead the distributor can simply decide whether using --as-needed is part of their priorities or not. It definitely is for Fedora, SUSE, AltLinux… it should be for Gentoo as well, especially as a source-based distribution!
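
Just to show how little such a test can tell you, a minimal probe along these lines only proves that the flag is accepted, not that the linker’s implementation is sane nor that the whole dependency chain is linked properly:

% echo 'int main(void) { return 0; }' > conftest.c
% gcc -Wl,--as-needed -o conftest conftest.c && echo "flag accepted"
% rm -f conftest conftest.c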

Of course, you can find case-by-case situations where --as-needed will not work properly; PulseAudio is unfortunately one of those, and I haven’t had the time to debug binutils to see why the softer rules don’t work well in that case. But after years of working on this idea, I’m very sure that only a very low percentage of packages fails to work properly with it, and we should not be held hostage by a handful of packages out of over ten thousand!

But when you don’t care about the users’ experience, when you’re just lazy, or you think your packages are “special” and deserve to break every possible rule, you can afford to ignore --as-needed. Is that the kind of developer Gentoo should have, though? I don’t think so!

The why and how of RPATH

This post is brought to you by a conversation with Fabio, which actually reminded me of an older conversation I had with someone else (exactly whom, right now, escapes me) about the ability to inject RPATH values into already-built binaries. I’m sorry to have forgotten who asked me about that; I hope he or she won’t take it badly.

But before I get to the core of the problem, let me try to give a brief introduction to what we’re talking about here, because jumping straight into injection is going to be too much for most of my readers, I’m sure. This whole topic also reconnects with multiple smaller topics I have discussed in the past on my blog, so you’ll see a few links here and there.

First of all, what the heck is RPATH? When using dynamic linking (shared libraries), the operating system needs to know where to look for the libraries an executable uses; to do so, each operating system has one or more PATH-like variables, plus possibly configuration files, that are used to look up the libraries. On Windows, for instance, the same PATH variable that is used to find commands is used to load libraries, and libraries are looked for first of all in the same directory as the executable. On Unix, commands and libraries use distinct paths by design, and the executable’s directory is not searched; this is also because the two directories are quite distinct (/usr/bin vs /usr/lib, for example). The GNU/Linux loader (and here the name is proper, as the loader comes out of GLIBC — almost identical behaviour is expected from uClibc, FreeBSD and other OSes, but GLIBC’s is the one I know for sure) differs extensively from the Sun loader; I say this because I’ll introduce the Sun system later.

In your average GNU/Linux system, including Gentoo, the paths in which to look up libraries are defined in the /etc/ld.so.conf file; the LD_LIBRARY_PATH variable is prepended to that list. (There is a second variable, LIBRARY_PATH, that tells the gcc frontend to the linker where to look for libraries to link to at build time, rather than to load — update: thanks to Fabio who pointed out I had the wrong variable; LDPATH is used by the env-update script to set the proper search paths in the ld.so.conf file.) All executables will look in these paths both for the libraries they link to directly and for non-absolute dlopen() calls.
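
As a quick illustration (the program name is made up), this is all there is to the search order from the user’s point of view:

% cat /etc/ld.so.conf                                # system-wide search paths
% LD_LIBRARY_PATH=$HOME/testlibs ldd ./myprogram     # prepend a directory for a single run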

But what happens with private libraries — libraries that are shared only among a small number of executables coming from the same package? The obvious choice is to install them in the same path as the general libraries; this does not require playing with the search paths at all, but it causes two problems: the build-time linker will still find them at link time, which might not be what you want; and it increases the number of files present in a single directory (which means accessing its contents slows down, little by little). The common alternative is installing them in a sub-directory specific to the package (automake already provides a pkglib installation class for this type of libraries). I have already discussed and advocated this solution, so that internal libraries are not “shown” to the rest of the software on the system.

Of course, adding the internal library directories to the global search path also means slowing down library loading, as more directories are searched when looking libraries up. To solve this, rpath — and, more recently, runpath — is used. DT_RPATH is a .dynamic attribute that provides further search paths for the object it is listed in (it can be used both for executables and for shared objects); the list is inserted in between the LD_LIBRARY_PATH environment variable and the search paths from the /etc/ld.so.conf file (strictly speaking, that is where the newer DT_RUNPATH is searched; the older DT_RPATH is looked at even before LD_LIBRARY_PATH). Among those paths there are two special cases: $ORIGIN is expanded to the path of the directory where the object itself is found, while both empty and . entries in the list refer to the current working directory (the so-called insecure rpath).
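
Here is a hedged example of how a package could use this for a private library directory; names and paths are made up, and the $ORIGIN variant is quoted so that the shell does not expand it:

% gcc -o myapp main.o -L/usr/lib/myapp -lhelper -Wl,-rpath,/usr/lib/myapp
% gcc -o myapp main.o -L/usr/lib/myapp -lhelper -Wl,-rpath,'$ORIGIN/../lib/myapp'
% readelf -d myapp | grep -i -e rpath -e runpath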

Now, while rpath is not a panacea and it slightly slows down the loading of the executable, it has decent benefits, especially by not requiring further symlinks to switch among almost-equivalent libraries whose ABIs are not all that stable. It also gets very useful when you want to build software you’re developing so that it loads your own libraries rather than the system ones, without relying on wrapper scripts the way libtool does.

To create an RPATH entry, you simply tell the linker (not the compiler!) about it; for instance you could add to your ./configure call the string LDFLAGS=-Wl,-rpath,/my/software/base/my/lib/.libs to build and run against a specific version of your library. But what about an already-linked binary? The idea of using RPATHs also makes for nicer handling of binary packages and their dependencies, so there is an obvious advantage in having the RPATH editable after the final link has taken place… unfortunately this isn’t as easy. While there is a tool called chrpath that allows you to change an already-present RPATH, and especially to delete one (which comes in handy to resolve insecure-rpath problems), it has two limitations: the new RPATH cannot be longer than the previous one, and you cannot add one from scratch.
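
To make the limitation concrete, this is roughly how chrpath behaves (the file name is made up):

% chrpath -l myapp                   # list the current entry
% chrpath -r /opt/myapp/lib myapp    # replace: only works if the new path is no longer than the old one
% chrpath -d myapp                   # delete: always possible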

The reason is that the .dynamic entries are fixed-size; you can remove an RPATH by setting its type to NULL, so that the dynamic loader skips over it; you can edit an already-present RPATH by changing the string it points to in the string table; but you can extend neither .dynamic nor the string table itself. This reduces the usefulness of RPATH for binary packages to almost nothing. Is there anything we can do to improve on this? Well, yes.

Ali Bahrami at Sun already solved this problem in June 2007, which means just over three years ago. They implemented it with a very simple trick that could be handled entirely in the build-time link editor, without having to change even a bit of the dynamic loader: they added padding!

The only new definition they had to add was DT_SUNW_STRPAD, a new entry in the .dynamic table that tells you the size of the padding space at the end of the string table; together with that, they added a number of extra DT_NULL entries in the same .dynamic table. Since all the entries in .dynamic are fixed-size, a DT_NULL can become a DT_RPATH without a problem. Even if some broken software might expect the DT_NULL entries to sit at the end of the table (which would be a wrong assumption anyway), you just need to keep them at the end. All ELF software should ignore the .dynamic entries it doesn’t understand, at least as long as they are in the reserved OS-specific ranges.

Unfortunately, as far as I know, there is no implementation of this idea in the GNU toolchain (GNU binutils is where ld lives). It shouldn’t be hard to implement, as I said; it’s just a matter of emitting a defined amount of zeros at the end of the string table, and adding a new .dynamic tag with its size… the same rpath command from OpenSolaris would probably be usable on Linux after that. I considered porting this myself, but I have no direct need for it; if you develop proprietary software for Linux, or need this feature for deployment, though, you can contact me for a quote on implementing it. Or you can do it yourself, or find someone else.

The neverending fun of debugging a debugger

In my previous post I noted that I had found some issues with monosim, a piece of software implemented in Mono. Luckily upstream understood the problem and is working on it. In the meantime I have had my share of fun, because mono-debugger (mdb) does not seem to work properly for me. Since I also need Mono for a job task I’m working on, I decided to work on fixing the issue.

So, considering that my knowledge of Mono is above that of the average user, but still not that high, I decided to ask on #mono (on GIMPNet). With all due respect, the developers could really try to be friendlier, especially with a fellow Free Software enthusiast who is just looking for help to fix the issue himself:

 thread_db is a libc feature I think to do debugging
 Chances are, you are no an "interesting" Linux distro
 One of those with "Roll your own optimization flags" that tend to break libc
 miguel_ miguel
 miguel, yes using gentoo but libc and debugging with gdb are fine...
 I knew it ;-)
 Yup, most stuff will appear to work
 But it breaks things in subtle ways
 and I can debug the problem libc side if needed, I just need to understand what's happening mono-side
 You need to complain to the GDB maintainers on your distro
 All the source code is available, grep for the error message
 Perhaps libthread_db is not availabel on your system
 it is available, already ruled the simple part out :)
 and yes, I have been looking at the code, but I'm not really that an expert on the mono side so I'm having an hard time to follow exactly what is trying to do

As you can see, even though Miguel started right away with the snarky comments, I tried keeping it pretty lightweight; after all, Lennart does take his cheap shots at Gentoo, but I find him a pretty decent guy nonetheless…

Somebody else, instead, was able to piss me off with a single phrase:

 i thought the point with gentoo was that if you watch make output scrolling, you can call yourself a dev ;)

Now, maybe if Mr Shields didn’t go out of his way to piss off other developers without reason, he wouldn’t be badmouthed so much for his blog posts. And I’m not one of those badmouthing him, the Mono project or anything else related to it, up to now. I have actually already stated that I like the language and find the idea pretty useful, if with a few technical limitations.

Now, let’s get back to what the problem is: the not-very-descriptive error message I get from the Mono debugger (that thread_db, the debug library provided by glibc, couldn’t be initialised) is due to the fact that glibc tries to check whether the NPTL thread library is loaded first, and to do so it tries to reach the (static!) variable nptl_version. Since it’s a static variable, nm(1) won’t be able to see it, although I can’t seem to find it with pfunct either; to be precise, the check also verifies that the version matches, but the problem is that the variable isn’t found in the first place.

Debugging this is pretty difficult: the mono-debugger code does not throw an exception giving the particular reason why thread_db couldn’t be initialised, it simply states the obvious. From there, you have to backtrace through the code manually (manually at first because mono-debugger ignored all the user-provided CFLAGS, including my -ggdb to get debug information!), and the call sequence is C# → C (mono-debugger) → C (thread_db) → C (mono-debugger) → C# → C (internal libbfd). Indeed it jumps around between similarly-named functions and other fun stuff that really drove me crazy at first.

Right now I have cut to the chase of knowing that libbfd was unable to find the libpthread.so library. The reason for that is still unknown to me, but to reduce the amount of code actually involved, I decided to remove the internal libbfd copy in favour of the system one; while its ABI is not stable (and thus you end up rebuilding anything using libbfd at any binutils bump), the API doesn’t usually change tremendously, and there is usually enough time to fix things up if needed; indeed, moving from the internal copy to the system copy, the only API breakage is one struct member’s name, which I fixed with a bit of autotools mojo. The patches are not yet available but I’ll be submitting them soon; the difference with and without the bundled libbfd is quite nice:

flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 4944.144 KB
flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 2020.972 KB

The package also bundles an internal copy of libedit; I guess because it’s not often found in distributions, but we have it, and on Gentoo/FreeBSD it’s also part of the base system, so…

Now, there’s no doubt that this hasn’t yet brought me to the root of the problem, and it’s quite likely that the problem is Gentoo-specific, since everything seems to work fine both on my Fedora box and on other systems. But is it the right move for the Mono team to diss a (major, I’ll have to say) developer of a distribution that isn’t considering removing Mono from its repository?

Fool-proof…

Not in the (good) sense of making something that a fool won’t be able to break, but in the negative sense of something being proof that you are a fool.

When I check bugs that seem far-fetched or silly, or that I cannot reproduce after a few tries, I have one very easy way to close the bug as invalid, usually with a very low false-positive ratio: I check the linker flags (LDFLAGS). I don’t usually rule out bugs based on the compiler flags; it takes a lot of bogus stuff for me to ignore your bug for that, and I’m the first one to use excessive flags to test things, rarely finding them to behave extremely badly. But there is one thing I can see in LDFLAGS that does taint a bug as coming from a user without enough clue.

I don’t want to insult anybody, but I do think that if you make that mistake, you should seriously think again about what you are doing.

The tainting flag is --enable-new-dtags; yes, the same --enable-new-dtags I blogged about almost three years ago in relation to the stupid hacks around the “kdenewldflags” USE flag.

This is an interesting linker flag that does… nothing useful. Well, not exactly, okay. It does enable some newer tags in the .dynamic section, in particular DT_RUNPATH, which replaces DT_RPATH for security reasons. But it doesn’t make any sense when using the Gentoo version of binutils: we force the new dtags on already! Now, if you had vanilla binutils installed I could understand your use of the flag, but given that you shouldn’t be bothering us with bugs if you were using that (the vanilla USE flag is useful when reporting bugs upstream), it is a nice way to cut you out anyway.
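
If you want to verify that yourself, it takes a moment (the test program is obviously made up); you should see a RUNPATH entry even though --enable-new-dtags was never passed:

% echo 'int main(void) { return 0; }' > t.c
% gcc -Wl,-rpath,/opt/test -o t t.c
% readelf -d t | grep -e RPATH -e RUNPATH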

Why do I use that as a “PEBKAC flag”, given it doesn’t do anything bad in particular? Indeed, I have been told that the ld man page still said (and for most users still says) that new dtags are disabled by default, and I actually went ahead and added a patch to the latest masked binutils to change the man and info pages to reflect this. My reasoning is that most people have this flag in because it was spread around by people reading (without understanding) the above-noted KDE macro.

So almost all the people who have --enable-new-dtags in their LDFLAGS variable are people who didn’t understand one bit what LDFLAGS really is, and just copied a list of entries out of a forum post, an uninformed blog, or maybe the old Gentoo Wiki. I don’t really want to care: if you cannot get that the flag is pointless, you don’t get your non-obvious bugs serviced by me.

A similar thing goes for those who have -Wl,-O2 in there… considering that the linker has no optimisation levels besides the one.

Is this stated clearly enough?