TEXTRELs (Text Relocations) and their impact on hardening techniques

You might have seen the word TEXTREL thrown around in security or hardening circles, or in Gentoo Linux installation warnings, but one thing that is clear is that the documentation around this term is not very helpful in understanding why text relocations are a problem. So I’ve been asked to write something about it.

Let’s start by taking apart the terminology. TEXTREL is jargon for “text relocation”, which is once again more jargon, as “text” in this case means “the code portion of an executable file.” Indeed, in ELF files, the .text section is the one that contains all the actual machine code.

As for “relocation”, the term is related to dynamic loaders. It is the process of modifying the data loaded from a file to suit its placement within memory. This might also require some explanation.

When you build code into an executable, every named reference is translated into an address. This includes, among others, variables, functions, constants and labels — and also some unnamed references, such as branch destinations in statements like if and for.

These references fall into two main categories: relative and absolute references. This is the easiest part to explain: a relative reference takes some address as a “base” and then adds to or subtracts from it. Indeed, many architectures have a “base register” which is used for relative references. In the case of executable code, particularly with references to labels and branch destinations, relative references translate into relative jumps, which are relative to the current instruction pointer. An absolute reference is instead a fully qualified pointer to memory — well, at least to the address space of the running process.

While absolute addresses are an obvious concept, they are not very practical for a compiler to emit in many cases. For instance, when building shared objects, there is no way for the compiler to know which addresses to use, particularly because a single process can load multiple objects, and they all need to be loaded at different addresses. So instead of writing the actual, final (unknown) address to the file, what gets written by the compiler first – and by the link editor afterwards – is a placeholder. It might sound ironic, but an absolute reference is then emitted as a relative reference based upon the loading address of the object itself.

When the loader takes an object and loads it into memory, the object is mapped at a given “start” address. After that, the absolute references are inspected, and each relative placeholder is resolved to its final absolute address. This is the process of relocation. Different types of relocations (or displacements) exist, but they are not the topic of this post.

Relocations as described so far can apply to both data and code, but we single out code relocations as TEXTRELs. The reason for this is to be found in mitigation (or hardening) techniques, in particular what is called W^X, NX or PaX. The basic idea of this technique is to disallow modification of executable areas of memory, by forcing the mapped pages to be either writable or executable, but not both (W^X reads “writable xor executable”). This has a number of drawbacks, which are most clearly visible with JIT (Just-in-Time) compilation processes, including most JavaScript engines.

But besides the JIT problem, there is also the problem of relocations happening in the code section of an executable. Since the relocations need to be written to, it is not feasible (or at least not easy) to provide exclusively writable or exclusively executable access to those pages. Well, there are theoretical ways to produce that result, but they complicate memory management significantly, so the short version is that, generally speaking, TEXTRELs and W^X techniques don’t go well together.
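
To make this more concrete, the following (hypothetical) translation unit produces a text relocation when built into a shared object without -fPIC: the function has to materialize the absolute address of the global, and without position-independent code that address is hardcoded into the .text section, to be patched at load time. On 32-bit x86 the link editor will happily emit such a library, TEXTREL and all; on amd64 it will generally refuse outright and ask you to recompile with -fPIC. The file and symbol names here are, of course, made up.

/* textrel.c — a minimal sketch */
int counter;

int *counter_address(void) {
    /* taking the address of a global requires an absolute reference */
    return &counter;
}

% gcc -m32 -shared -o libtextrel.so textrel.c    # note: no -fPIC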

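To tell whether an object you have on disk carries text relocations, look for the TEXTREL entry in its dynamic section — either with readelf from binutils, or with scanelf from pax-utils, which is what Gentoo’s QA warnings are based on:

% readelf -d libtextrel.so | grep TEXTREL
% scanelf -qt libtextrel.so
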
This is further complicated by another mitigation strategy: ASLR, Address Space Layout Randomization. In particular, ASLR fully defeats prelinking as a strategy for dealing with TEXTRELs — theoretically, on a system that allows TEXTRELs but has enough address space to map every single shared object at a fixed address, it would not be necessary to relocate at runtime. For stronger ASLR you also want to make sure that the executables themselves are mapped at different addresses, so you use PIE, Position Independent Executables, to make sure they don’t depend on a single stable loading address.

Usage of PIE was for a long while limited to a few select hardened distributions, such as Gentoo Hardened, but it’s getting more common, as ASLR is a fairly effective mitigation strategy even for binary distributions, where function offsets would otherwise be known to an attacker.

At the same time, SELinux also implements protection against text relocations, so you no longer need a patched hardened kernel to get this protection.

Similarly, Android 6 now disallows the generation of shared objects with text relocations, although I have no idea whether binaries built to target this new SDK version gain any more protection at runtime, since it’s not really my area of expertise.

Future planning for Ruby-Elf

My work on Ruby-Elf tends to happen in “sprees” whenever I actually need something from it that wasn’t supported before — I guess this is true for many projects out there, but it seems to happen pretty regularly with my projects. The other day I prepared a new release after fixing the bug I found while doing the postmortem of a libav patch — and then I proceeded to give another run to my usual collisions check, after noting that I could improve the performance of the regular expressions …

But where is it headed, as it is? Well, I hope I’ll be able to have version 2.0 out before the end of 2013 — in this version, I want to make sure I get full support for archives, so that I can actually analyze static archives without having to extract them beforehand. I’ve got a branch with the code to get access to the archives themselves, but it can only extract a file before actually being able to read it. The key to supporting archives properly would be supporting in-memory IO objects, as well as offset-in-file objects.

I’ve also found an interesting gem called bindata, which seems to provide a decent way to decode binary data in Ruby without having to fully pre-decode it. This would probably be a killer feature for Ruby-Elf, as a lot of the time I’m forcibly decoding everything because it was extremely difficult to access it on the spot — so the first big change for Ruby-Elf 2 is going to be to delegate the task of decoding to bindata (or, failing that, another similar gem).

Another change that I plan is to drop the current version of the man pages. While DocBook is a decent way to deal with man pages, and standard enough to be around in most distributions, it’s one “strange” dependency for a Ruby package — and honestly the XML is a bit too verbose at times. For the beefiest man pages, the generated roff page is half the size of the source, which is the opposite of what anybody would expect.

So I’m quite decided that the next version of Ruby-Elf will use Markdown for the man pages — while it does not have the same amount of semantic tagging, and thus I might have to handle some styling in the synopsis manually, using something like md2man should be easy (I’m not going to use ronn because of the old issue with JRuby and rdiscount), and at the same time it gives me a public HTML version for free, thanks to GitHub’s conversion.

Finally, I really hope that by Ruby-Elf 2 I’ll be able to get at least the symbol demangler for the Itanium C++ ABI working — that is the one used by modern GCC; yes, it was originally specified for the Itanic. Working toward supporting the full DWARF specification is something that is in the back of my mind, but I’m not very convinced right now, because it’s huge. Also, if I were to implement it I would then have to rename the library to Dungeon.

Redundant symbols

So I’ve decided to dust off my link collision script and see what the situation is nowadays. I’ve made sure that all the suppression files use non-capturing groups in their regular expressions – as that should improve the performance of the regexp matching – made the script more resilient to issues within the files (metasploit ELF files are barely valid), and ran it through.

Well, it turns out that the situation is bleaker than ever. Besides the obvious number of symbols with too-common names, there are still a lot of libraries and programs exporting default bison/flex symbols, the same way I found them in 2008:

Symbol yylineno@ (64-bit UNIX - System V AMD x86-64) present 59 times
Symbol yyparse@ (64-bit UNIX - System V AMD x86-64) present 53 times
Symbol yylex@ (64-bit UNIX - System V AMD x86-64) present 49 times
Symbol yy_flush_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_bytes@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_string@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_create_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
Symbol yy_delete_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
[...]

Note that for these symbols to be listed in this output, at least one library has to export them; indeed they are present in quite a long list of libraries. I’m not going to track down each and every one of them, but I guess I’ll keep an eye on that list, so that if problems arise they can easily be tracked down to this kind of collision.

Action item: I guess my next post is going to be a quick way to handle building flex/bison sources without exposing these symbols, for both programs and libraries — the sketch below gives the gist of it.
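
In short, both tools can rename their generated symbols away from the default yy prefix; a minimal sketch, with a made-up mylang_ prefix:

% flex -P mylang_ -o lexer.c lexer.l
% bison -p mylang_ -o parser.c parser.y

The same can be requested from within the sources (%option prefix="mylang_" for flex, %name-prefix "mylang_" for bison); for libraries, hiding the symbols with a linker version script — like the one sketched later in this post — works just as well.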

But this is not the only issue — I already mentioned a long time ago that a single installed system brings in a huge number of redundant hashing functions; on the tinderbox, as it was when I scanned it, there were 57 md5_init functions (and this without counting different function names!). Some of this I’m sure boils down to gnulib making it available, and to the fact that, unlike the BSD libraries, GLIBC does not have public hashing functions — and using libcrypto is not an option for many people.

Action item: I’m not very big on benchmarks myself; I never understood the proper way to go about gathering real data rather than being fooled by the scheduler. Somebody who’s more apt at that might want to gather a bunch of libraries providing MD5/SHA1/SHA256 hashing interfaces, and produce some graphs that can let us know whether it’s time to switch to libgcrypt, or nettle, or whatever else provides us with good performance as well as a widely-compatible license.

The presence of duplicates of memory-management symbols such as malloc and company is not that big a deal, at first sight. After all, we have a bunch of wrappers that use interposing to account for memory usage, plus another bunch that provide alternative allocation strategies which should be faster depending on the way you use your memory. The whole thing is not bad by itself, but when one of graphviz’s libraries (libgvpr) exposes malloc, something sounds wrong. Indeed, when even after updating my suppression filter to ignore the duplicates coming from gperftools and TBB I still get 40 copies of realloc(), something sounds extremely wrong:

Symbol realloc@ (64-bit UNIX - System V AMD x86-64) present 40 times
  libgvpr
  /mnt/tbamd64/bin/ksh
  /mnt/tbamd64/bin/tcsh
  /mnt/tbamd64/usr/bin/gtk-gnutella
  /mnt/tbamd64/usr/bin/makefb
  /mnt/tbamd64/usr/bin/matbuild
  /mnt/tbamd64/usr/bin/matprune
  /mnt/tbamd64/usr/bin/matsolve
  /mnt/tbamd64/usr/bin/polyselect
  /mnt/tbamd64/usr/bin/procrels
  /mnt/tbamd64/usr/bin/sieve
  /mnt/tbamd64/usr/bin/sqrt
  /mnt/tbamd64/usr/lib64/chromium-browser/chrome
  /mnt/tbamd64/usr/lib64/chromium-browser/chromedriver
  /mnt/tbamd64/usr/lib64/chromium-browser/libppGoogleNaClPluginChrome.so
  /mnt/tbamd64/usr/lib64/chromium-browser/nacl_helper
  /mnt/tbamd64/usr/lib64/firefox/firefox
  /mnt/tbamd64/usr/lib64/firefox/firefox-bin
  /mnt/tbamd64/usr/lib64/firefox/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/firefox/plugin-container
  /mnt/tbamd64/usr/lib64/firefox/webapprt-stub
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libmcurses.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libcurs.so
  /mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libmcurses.so
  /mnt/tbamd64/usr/lib64/OpenFOAM/OpenFOAM-1.6/lib/libhoard.so
  /mnt/tbamd64/usr/lib64/thunderbird/mozilla-xremote-client
  /mnt/tbamd64/usr/lib64/thunderbird/plugin-container
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird
  /mnt/tbamd64/usr/lib64/thunderbird/thunderbird-bin

Now, it is true that it’s possible, depending on the usage patterns, to achieve a much better allocation strategy than the default coming from GLIBC — on the other hand, I’m also pretty sure that GLIBC’s own allocator has improved a lot in the past few years, so I’d rather use the standard allocator than a custom one that is five or more years old. Again, this could use some working around.

In the list above, Thunderbird and Firefox for sure use (and for whatever reason re-expose) jemalloc; I have no idea whether libhoard in OpenFOAM is another memory-management library (or whether OpenFOAM is bundling it), and Mercury is so messed up that I don’t want to ask myself what it’s doing there. There is also a bunch of standalone programs in the list.

Action item: go through the standalone programs exposing the memory interfaces — some of them are likely to bundle one of the already-present memory libraries, so just make them use the system copy of it (so that improvements in the library trickle down to the program); for those that use custom strategies, consider making them optional, as I’d expect most not to be very useful to begin with.

There is another set of functions, similar to the memory-management ones, which is usually brought in by gnulib; these are convenience wrappers that do error checking on top of the standard functions — xmalloc and friends. A quick check shows that these are exposed a bit too often:

Symbol xmemdup@ (64-bit UNIX - System V AMD x86-64) present 37 times
  liblftp-tasks
  libparted
  libpromises
  librec
  /mnt/tbamd64/usr/bin/csv2rec
  /mnt/tbamd64/usr/bin/dgawk
  /mnt/tbamd64/usr/bin/ekg2
  /mnt/tbamd64/usr/bin/gawk
  /mnt/tbamd64/usr/bin/gccxml_cc1plus
  /mnt/tbamd64/usr/bin/gdb
  /mnt/tbamd64/usr/bin/pgawk
  /mnt/tbamd64/usr/bin/rec2csv
  /mnt/tbamd64/usr/bin/recdel
  /mnt/tbamd64/usr/bin/recfix
  /mnt/tbamd64/usr/bin/recfmt
  /mnt/tbamd64/usr/bin/recinf
  /mnt/tbamd64/usr/bin/recins
  /mnt/tbamd64/usr/bin/recsel
  /mnt/tbamd64/usr/bin/recset
  /mnt/tbamd64/usr/lib64/lftp/4.4.2/liblftp-network.so
  /mnt/tbamd64/usr/lib64/libgettextlib-0.18.2.so
  /mnt/tbamd64/usr/lib64/man-db/libman-2.6.3.so
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/lto1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1obj
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1plus
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/f951
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/jc1
  /mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/lto1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/cc1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/gnat1
  /mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/lto1

In this case they are exposed even by the GCC tools themselves! While this brings me again to complain that gnulib should really be a dynamically-linked libgnucompat, there is little we can do about these symbols in programs — but they should not creep into system libraries (man-db has the symbols in its private library, which is marginally better).

Action item: check the libraries exposing the gnulib symbols, and make them expose only their proper interface, rather than every single symbol they come up with — a linker version script, sketched below, is usually all it takes.
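
For reference, this is roughly what such a version script looks like; libfoo and the foo_ prefix are, of course, placeholders for the real library and its interface:

/* libfoo.map — export the library's own interface, hide everything else */
{
    global:
        foo_*;
    local:
        *;
};

% gcc -shared -fPIC -Wl,--version-script=libfoo.map -o libfoo.so foo.c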

I suppose this is already quite a bit of data for a single blog post — if you want a copy of the symbols’ index to start working on some of the action items I listed, just contact me and I’ll send it to you; it’s a bit too big to just publish as-is.

Postmortem of a patch, or how do you find what changed?

Two days ago, Luca asked me to help him figure out what was going on with a patch for libav which he knew to be the right thing, but which was acting up in a fashion he didn’t understand: on his computer, it increased the size of the final shared object by 80KiB — while this number is certainly not outlandish for a library such as libavcodec, it does seem odd at first glance that a patch removing source code would increase the final size of the executable code.

My first wild guess, which (spoiler alert) turned out to be right, was that removing branches from the functions let GCC optimize them further and decide to inline them. But how to actually be sure? It’s time to get the right tools for the job: dev-ruby/ruby-elf, dev-util/dwarves and sys-devel/binutils enter the battlefield.

We built libav with and without the patch on my server, and then rbelf-size told us more or less the same story:

% rbelf-size --diff libav-{pre,post}/avconv
        exec         data       rodata        relro          bss     overhead    allocated   filename
     6286266       170112      2093445       138872      5741920       105740     14536355   libav-pre/avconv
      +19456           +0         -592           +0           +0           +0       +18864 

Yes, there’s a bug in the command’s output, I noticed. So there is a total increase of around 20KiB; how is it split up? Given this is a build that includes debug info, it’s easy to find out through codiff:

% codiff -f libav-{pre,post}/avconv
[snip]

libavcodec/dsputil.c:
  avg_no_rnd_pixels8_9_c    | -163
  avg_no_rnd_pixels8_10_c   | -163
  avg_no_rnd_pixels8_8_c    | -158
  avg_h264_qpel16_mc03_10_c | +4338
  avg_h264_qpel16_mc01_10_c | +4336
  avg_h264_qpel16_mc11_10_c | +4330
  avg_h264_qpel16_mc31_10_c | +4330
  ff_dsputil_init           | +4390
 8 functions changed, 21724 bytes added, 484 bytes removed, diff: +21240

[snip]

If you wonder why it’s adding more code than we expected, it’s because there are other places where functions have been deleted by the patch, causing reductions elsewhere. Now we know that the three functions the patch deleted did remove some code, but five other functions grew by over 4KiB each. It’s time to find out why.

A common way to do this is to generate the assembly files (which GCC does not normally keep around) and compare the two — due to the size of the dsputil translation unit, this turned out to be completely pointless, as just the changes in the jump labels cause the whole file to be rewritten. So we rely instead on objdump, which allows us to get a full disassembly of the executable section of the object file:

% objdump -d libav-pre/libavcodec/dsputil.o > dsputil-pre.s
% objdump -d libav-post/libavcodec/dsputil.o > dsputil-post.s
% diff -u dsputil-{pre,post}.s | diffstat
 unknown |245013 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 125163 insertions(+), 119850 deletions(-)

As you can see, trying a diff between these two files is going to be pointless: first of all because of the size of the disassembled files, and secondly because each instruction is prefixed with its address offset, which means that every single line will be different. So what to do? Well, first of all it’s useful to isolate one of the functions, to reduce the scope of the changes to check — I found out that there is a nice way to do so, and it relies on the way the function is declared in the file:

% fgrep -A3 avg_h264_qpel16_mc03_10_c dsputil-pre.s
00000000000430f0 <avg_h264_qpel16_mc03_10_c>:
   430f0:       41 54                   push   %r12
   430f2:       49 89 fc                mov    %rdi,%r12
   430f5:       55                      push   %rbp
--
[snip]

While it takes a while to come up with the correct syntax, a simple sed command can get you the data you need:

% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-pre.s > dsputil-func-pre.s
% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-post.s > dsputil-func-post.s
% diff -u dsputil-func-{pre,post}.s | diffstat
 dsputil-func-post.s | 1430 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 1376 insertions(+), 54 deletions(-)

Okay, that’s much better — but it’s still a lot of code to sift through; can’t we reduce it further? Well, actually… yes. My original guess was that some function was inlined, so let’s check for that. If a function is not inlined, it has to be called, and the instruction for that, in this context, is callq. So let’s check whether the calls changed:

% diff -u =(fgrep callq dsputil-func-pre.s) =(fgrep callq dsputil-func-post.s)
--- /tmp/zsh-flamehIkyD2        2013-01-24 05:53:33.880785706 -0800
+++ /tmp/zsh-flamebZp6ts        2013-01-24 05:53:33.883785509 -0800
@@ -1,7 +1,6 @@
-       e8 fc 71 fc ff          callq  a390 
-       e8 e5 71 fc ff          callq  a390 
-       e8 c6 71 fc ff          callq  a390 
-       e8 a7 71 fc ff          callq  a390 
-       e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 00 00 00 00          callq  43261 
+       e8 00 00 00 00          callq  8e670 
+       e8 71 bc f7 ff          callq  a390 
+       e8 52 bc f7 ff          callq  a390 
+       e8 33 bc f7 ff          callq  a390 
+       e8 14 bc f7 ff          callq  a390 
+       e8 00 00 00 00          callq  8f8d3 

Yes, I do use zsh — on the other hand, now that I look at the code above, I notice that there’s a bug: it does not respect $TMPDIR, as it should have used /tmp/.private/flame as its base path, dang!

So the quick check shows that avg_pixels8_l2_10 (the function at 72e0) is no longer called — but does that account for the whole size? Let’s see if it changed:

% nm -S libav-{pre,post}/libavcodec/dsputil.o | fgrep avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10

The size is the same, and it’s 274 bytes. The increase is 4330 bytes, around 15 times the size of the single function — what does that mean, then? Well, a quick look around shows this piece of code:

        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        4c 89 e7                mov    %r12,%rdi
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8d b4 24 80 00 00    lea    0x80(%rsp),%rsi
        00 
        49 8d 7c 24 10          lea    0x10(%r12),%rdi
        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        48 89 ea                mov    %rbp,%rdx
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8b 84 24 b8 04 00    mov    0x4b8(%rsp),%rax
        00 
        64 48 33 04 25 28 00    xor    %fs:0x28,%rax
        00 00 
        75 0c                   jne    4325c 

This is just a fragment, but you can see that there are two calls to the function at 72e0, and the same pattern of calls repeats over and over through the function’s body (the pair of xor and jne instructions against %fs:0x28 at the end is, for the record, the stack-protector check closing the function). Which means the function gets called multiple times. Knowing that this function is involved in 10-bit processing, it becomes likely that the calls happened in a loop — remove the call overhead (as the function is inlined) and you can see how fifteen or so copies of that small function per caller account for the 4KiB.

So my guess was right, but incomplete: GCC not only inlined the function, it also unrolled the loop, probably doing constant propagation in the process.

Is this it? Almost — the next step was to get some benchmark data for the code, which was mostly Luca’s work (and I have next to no info on how he did that, to be entirely honest); the results on my server have been inconclusive, as the 2% loss that he originally registered was gone in further testing, and would, anyway, be vastly within the margin of error of a non-dedicated system — no, we weren’t using full-blown profiling tools for that.

While we don’t have any sound numbers about it, what we’re worried about is cache-starved architectures, such as Intel Atom, where the unrolling and inlining can easily cause performance losses rather than gains — which is why all of us developers facepalm in front of people using -funroll-all-loops and the like. I guess we’ll have to find an Atom system to do this kind of run on…

Symbolism and ELF files (or, what does -Bsymbolic do?)

Julian asked by mail – uh… last month! okay, I have a long backlog – what the -Bsymbolic and -Bsymbolic-functions options to the GNU linker do. The answer is not extremely complicated, but it calls for some explanation of how function calls in Unix work. I say in Unix because there are a few differences in how OS X and Windows behave, if I recall correctly, and I’m definitely not an expert on those. I wish I were — then I would be able to work on a book to replace Levine’s Linkers and Loaders.

PLT and GOT tables diagram

Please don’t complain about the very bad drawing above; it’s just there to illustrate what’s going on — I did it on my iPad with a capacitive stylus. I’ll probably try to do a few more of these, since I don’t have my usual Intuos tablet, and I won’t have it until I find my place in London.

You see, the whole issue of linking in Unix is implemented with a long list of tables: the symbol table, the procedure linking table (PLT) and the global offset table (GOT). All objects involved in a dynamic linking chain (executables and shared objects, ET_EXEC and ET_DYN) possess a symbol table, which mixes defined (exported) and undefined (requested) symbols. Objects that export symbols (ET_DYN, and ET_EXEC with plugins, callbacks, or simply bad design) possess a PLT, and PIC objects (most ET_DYN, with the exception of some x86 prebuilt objects, and PIE ET_EXEC) possess GOTs.

The GOT and the text section

Let’s start from the bottom — that is, from the GOT; or actually, before the GOT itself, from the executable code. As far as ELF is concerned, by default (there are a number of options that change this, but I don’t want to go there for now), data and function sections are completely opaque. Access to functions and data has to be done through start addresses: for non-PIC objects, these are absolute addresses, as the objects are assumed to always be loaded at the same position; when using position-independent code (PIC), as the name hints, this position has to be ignored, so the data or function position has to be derived by using offsets from the load address of the object. When using non-static load addresses with non-PIC objects, you actually have to patch the code to use the new full address — which is what causes a text relocation (TEXTREL), which requires the ability to write to the executable segments, which is an obvious security issue.

So here’s where the global offset table enters the scene. Whenever you refer to a particular piece of data or a function, you use an offset from the start of the containing section. This makes it possible for that section to be loaded at different addresses, and yet keep the code untouched. (Do note that I’m simplifying the whole thing a lot, but I don’t want to go too much into the details, because otherwise half the people reading wouldn’t understand what I’m saying.)

But the GOT is only used when the data or function is guaranteed to be in the same object. If you’re not using any special option to either the compiler or the linker, this means only static symbols are addressed directly through the GOT. Everything else is accessed through the object’s PLT, to which all the functions that the object calls are added. The PLT then contains code to ask the dynamic loader what address a given symbol is defined at.

Global and local symbol tables

To answer that question, the dynamic loader has to have a global table which resolves symbol names to addresses. This is basically a global PLT, from a certain point of view. Depending on some settings in the objects, in the environment or in the loader itself, this table can be populated right when the application is executed, or only when the symbols are requested. For simplicity, I’ll assume that what happens is the former, as otherwise we’d end up in details that have nothing to do with what we were discussing to begin with. Furthermore, there is a different complication added by the modern GNU loader, which introduced indirect binding… it’s a huge headache.

While the same symbol name might have multiple entries in the various objects’ symbol tables, because more than one object exports the same symbol, in the resolved table each symbol name has exactly one address, which is found by reading the objects’ GOTs. This means that the loader has to somehow solve the collisions that happen when multiple objects export the same symbol. It also means that, by default, there is no guarantee that an object that both exports and calls a given symbol is going to call its own copy.

Let me try to underline that point: symbols that are exported are added to the symbol table of an object; symbols that are called are added to the symbol table as undefined (if they are not there already), and they are added to the procedure linking table (which then finds their position via its own offset table). By default, with no special options, as I said, only static functions are called directly through the object’s global offset table; everything else is called through the PLT, and thus through the loader’s table of resolved symbols. This is actually what drives symbol interposing (which is used by LD_PRELOAD libraries), and what caused ancient xine’s problems, which steered me to look into ELF itself.

Okay, I’m almost at -Bsymbolic!

As my post about xine shows, there are situations where going through the PLT is not the desired behaviour, as you want to ensure that an object calls its own copy of any given symbol that is defined within itself. You can do that in many ways; the simplest option of all is not to expose those symbols at all. As I said, with default options only static functions are called straight through the GOT, but this can easily be extended to functions that are not exposed, which can be done either by marking the symbols as hidden (at compile time), or by using a linker script to only expose a limited set of symbols (at link time) — see the sketch below.
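
The compile-time version is a one-liner; a minimal sketch, with a made-up function name:

/* never enters the dynamic symbol table; internal calls bypass the PLT */
__attribute__((visibility("hidden"))) void internal_helper(void);

With GCC you can also flip the default for a whole object with -fvisibility=hidden and then mark only the public interface with visibility("default"); the link-time route is a version script where everything matched by local: is hidden, with the same effect.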

This is logical: the moment the symbols are no longer exported by the object, the dynamic loader has no way to answer for them through the PLT, which means the only option left is to use the GOT directly.

But sometimes you have to expose the symbols, and at the same time you want to make sure that you call your own copy, and not any other interposed copy of those symbols. How do you do that? That’s where the -Bsymbolic and -Bsymbolic-functions options come into play. What they do is duplicate the GOT entries for the symbols that are both called and defined in a shared object: the loader points to one, but the object itself points to the other. This way, it’ll always call its own copy. An almost identical solution is applied, just at compile time rather than link time, when you use protected visibility (instead of default or hidden).
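
In practice, the two variants look like this — the file names are made up, the flag and the attribute are real:

% gcc -shared -fPIC -Wl,-Bsymbolic-functions -o libfoo.so foo.c

/* the compile-time analogue: still exported, but bound locally within the object */
__attribute__((visibility("protected"))) int foo_frobnicate(int x);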

Unfortunately, this makes a small change to the semantics we’re used to: since the way the symbols are resolved varies depending on whether you’re referring to a symbol from within or from outside the object, pointers taken within and outside the object will no longer match. While for most libraries this is not a problem, there are some cases where it really is. For instance, in xine we hit a problem with the special memcpy() function implementation: it was a weak symbol, so simply a pointer to the actual function, which was being set within the libxine object… but it would have been left unset for the external objects, including the plugins, for which it would still have been NULL.

Comparing function symbols is a rare corner case, but comparing objects’ addresses is common enough, if you’re trying to see whether a default, global object is being passed to your function instead of a custom one… in that case, having the addresses no longer match is a big pain. Which is basically why you have -Bsymbolic-functions — it’s exactly like -Bsymbolic, but it limits itself to functions, whereas data objects are still handled as if no option was passed. It’s a compromise, making it easier to not go through the PLT for everything, while not breaking as much code (it would then only break corner cases like xine’s memcpy()).

By the way, if it’s not obvious, the use of symbolic resolution is not only about making sure that the objects know which functions they are calling; it’s also a performance improvement, as it avoids a round-trip to the dynamic loader and a lookup of where the symbol is actually defined. This is minor for most functions, but it can be visible if there are many, many functions being called. Of course it shouldn’t make much of a difference if the loader is good enough, but that’s a completely different story. As is the compiler’s option to emit two copies of a given function, to avoid doing the full preamble when it is called from within the object. And again for what concerns link-time optimization, which is connected, but only indirectly, with what I’ve discussed above.

Oh, and if it wasn’t clear from what I wrote: you should not ever use the -Bsymbolic flag in your LDFLAGS variable in Gentoo. It’s not a flag you should mess with; only upstream developers should care about it.

Is x32 for me? Short answer: no.

For the long answer keep reading.

So thanks to Anthony and the PaX team, yesterday I was able to set up the first x32 testing system — the one I lamented last week I was unable to get a hold of. The reason why it wasn’t working with LXC was actually quite simple: the RANDMMAP feature of the hardened kernel, responsible for the stronger ASLR, was changing the load address of x32 binaries to be outside the 32-bit range of x32 pointers. Version 3.4.3 finally solves the issue, so I could set it up properly.

The first step was the hard one: trying out libav. This is definitely an interesting and valuable test, simply because libav has so many hand-crafted assembly routines that it really shouldn’t suffer much from the otherwise slower amd64 ABI, and at the same time, Måns was sure that it wouldn’t work out of the box — which is indeed the case. On one side, YASM still doesn’t support the new ABI, which means that everything relying on it in libav, x264 and other projects won’t work; on the other side, the inline asm (which is handled by GCC) is now a completely different “third way” from the usual 32-bit x86 and the 64-bit amd64.

While I was considering setting up a tbx32 instance to see what would actually work, the answer right now is “way too little”; heck, even Ruby 1.9 doesn’t work right now, because it uses some inline asm that no longer works.

More interesting is that the usual ways to discern which architecture one is on are going to fail, badly:

  • sizeof(long) and sizeof(void*) are both 4 (which means that both types are 32-bit), like on x86;
  • __x86_64__ is defined just like on amd64; the best you can do is to check for both __x86_64__ and __ILP32__ at the same time — edit: yes, I knew that there had to be a define more specific than the __SIZEOF_LONG__ == 4 check I originally used here; I just didn’t bother to look it up before. The point of needing two checks still stands, mmkay? (See the sketch below.)
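
Put together, the detection looks something like this (the predefined macros are the real ones; the ARCH strings are just for illustration):

#if defined(__x86_64__) && defined(__ILP32__)
# define ARCH "x32"    /* 64-bit instruction set, 32-bit pointers */
#elif defined(__x86_64__)
# define ARCH "amd64"
#elif defined(__i386__)
# define ARCH "x86"
#endif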

What does this mean? It simply means that, considering it took us years to get a working amd64 system — which was in many ways a “pure” new architecture, easily discerned from the older x86 — we’re going to spend some more years trying to get x32 working … and all for what? To have a smaller address range, and thus smaller pointers, to save on memory usage and memory bandwidth … by the time x32 is ready, I’d be ready to bet that neither concern will be that important — heck, I don’t have a single computer with less than 8GB of RAM right now!

It might be more interesting to know why x32 is so important to enough people for them to work on it; to me, it seems like the main reason is that it saves a lot of memory in C++ programs, simply because every class and every object has so many pointers (functions, virtual functions and so on and so forth) that the change from 32- to 64-bit was a big enough hit. Given that there is so much software still written in C++ (I’m unconvinced as to why, to be honest), it’s likely that there is enough interest in working on this to improve performance.

But at the end of the day, I’m really concerned that this might not be worth the effort: we’re calling upon ourselves another big problem with this kind of multilib (considering we never really solved multilib for the “classic” amd64, and that Debian is still working hard to get their multiarch method out of the door), plus a number of software porting problems that will keep a lot of people busy for the next few years. The effort would probably be better directed at improving the current software and moving everything to pure 64-bit.

On a final note: of course libav works if you disable all the hand-written assembly code, as everything is also available in pure C. But many routines can be ten or more times slower in pure C than when using AVX, for instance, which means that even if you see a noticeable improvement going from amd64 to x32, you’re going to lose more by losing the assembly.

My opinion? It’s not worth it.

Choosing an MD5 implementation

Yes, this post might give you a sense of déjà vu, but I’m trying to be more informative than ranty, if I can, two and a half years after the original post…

Yesterday I ranted about gnulib, and in particular about the hundred-and-some copies of the MD5 algorithm that it brings with it. Admittedly, the numbers seem more sensational than they would be if you were to count the packages involved, as most of the copies come from GNU Octave, another bunch come from GCC, and so on and so forth.

What definitely surprises me, though, is the way people rely on said MD5 implementation as if there were no others available. I mean, I can understand GCC not wanting to add any more dependencies that could make it ABI-dependent – just look at the recent, and infamous, gmp bump – but did GNU forget that it has its own hash library?

There are already enough MD5 implementations out there to fill a long list, so how do you choose which one to use, rather than adding a new one to the set? Well, that’s a tricky question. As far as I can tell, the most prominent factors in choosing an implementation of a hash algorithm are non-technical:

Licensing is likely to be the most common issue: relying on OpenSSL’s libcrypto means relying on software whose license has been deemed incompatible enough with the GPL-2 that there is a documented SSL exception, used by projects that combine the GNU General Public License with those libraries. This is, for the most part, the reason why libgcrypt exists; but, continuing GNU’s series of “let’s reinvent the wheel, and then reinvent it, and again”, GnuTLS (which is supposed to be a replacement for OpenSSL itself) also provides its own implementation. Great.

Availability can be a factor as well: software designed for BSD systems – such as libarchive – will make use of the libcrypto-style interface just fine; the reason is that at least FreeBSD (and I think NetBSD as well) provides those functions in its standard set of libraries, making it the obvious available implementation (I wish we had something like that). Adding a dependency to a piece of software is usually a problem, which is why gnulib is used so often (being imported into the project’s sources, it adds no further dependency). So if your average system configuration already contains an acceptable implementation, that’s what you should go for.

Given that all the interfaces are almost identical to one another, with the exception of names and structures, and that their actual implementations have to follow the standard to make sense, the lack of technical reasons among the prominent factors for choosing one library over another is generally understandable. So how should one proceed in choosing which one to use? Well, I have some general rules of thumb that could be useful.

The first is to use something that you already have available, or are already using, in your project: OpenSSL’s libcrypto, GLib and libav/ffmpeg all provide implementations of most hash functions. If you already rely on them for some of your needs, simply use their interfaces rather than looking for new ones. If there are bugs in them, or the interface is not good enough, or the performance is not as expected, try to get those fixed instead of cooking up your own implementation.

If you are not using any other implementation already, then look at libgcrypt first. The reasons why I suggest this: it’s a common library (it is required, among others, by GnuPG); it implements all the main hashing algorithms; it’s LGPL, so you have very few license concerns; and it doesn’t implement any other common interface, so you don’t risk the symbol collision issues you would face with a libcrypto-compatible library when something else in the process brings in OpenSSL — I have seen that happen with Drizzle a longish time ago.
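
To give an idea of what a port looks like, here’s a minimal, self-contained sketch of MD5-hashing a buffer with libgcrypt — illustrative only; check the library’s documentation for proper initialization in multi-threaded programs:

/* md5demo.c — build with: gcc md5demo.c $(libgcrypt-config --cflags --libs) */
#include <stdio.h>
#include <string.h>
#include <gcrypt.h>

int main(void) {
    const char *msg = "Hello, world!";
    unsigned char digest[16]; /* MD5 produces a 128-bit digest */
    int i;

    /* one-shot convenience wrapper; for streaming, use gcry_md_open() and friends */
    gcry_md_hash_buffer(GCRY_MD_MD5, digest, msg, strlen(msg));

    for (i = 0; i < 16; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}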

If you’re a contributor to one of the projects that use gnulib’s md5 code… please consider porting it to libgcrypt; all distributors will likely thank you for that!

Are -g options really safe?

Tonight feels like the night after a very long day. But it was just half a day, spent trying to find the end of a bug saga that started about a month ago for me.

It starts like this: postgresql-server started failing; the link editor – BFD-based ld – reported that one of the static archives installed by postgresql-base didn’t have a proper index, which should have been generated by ranlib. But simply running ranlib on said file didn’t solve the problem.

I originally blamed the build system of PostgreSQL, but when I launched an emerge -e world yesterday to rebuild everything with GCC 4.6, another package failed in the same way: lvm2, linking to /usr/lib64/libudev.a — since I know the udev build system very well, almost as if I wrote it myself, I trusted that the archive was built correctly, so it was time to look at what the real problem was.

After poking around a bit, I found that binutils’ nm, objdump and, at that point, even ld refused to display information for some relocatable objects (ET_REL files). This would have made it very difficult to debug the issue, if not for two things: first, eu-nm could read the file just fine, and second, my own home-cooked nm.rb tool, which I wrote to test Ruby-Elf, reported issues with the file — but without exploding.

flame@yamato mytmpfs % nm dlopen.o
nm: dlopen.o: Bad value
flame@yamato mytmpfs % eu-nm -B dlopen.o 
0000000000000000 n .LC0
                 U _GLOBAL_OFFSET_TABLE_
                 U __stack_chk_fail
                 U dlclose
                 U dlerror
                 U dlopen
00000000000001a0 T dlopen_LTX_get_vtable
                 U dlsym
                 U lt__error_string
                 U lt__set_last_error
                 U lt__zalloc
0000000000000000 t vl_exit
00000000000000a0 t vm_close
0000000000000100 t vm_open
0000000000000040 t vm_sym
0000000000000000 d vtable
0000000000000000 n wt.1af52e75450527ed
0000000000000000 n wt.2e36542536402b38
0000000000000000 n wt.32ec40f73319dfa8
0000000000000000 n wt.442ae951f162d46e
0000000000000000 n wt.90e079bbb773abcb
0000000000000000 n wt.ac43b6ac10ce5688

I don’t have the original output from my tool, since I have fixed it since then, but the issues were related, as you can guess from that output, to the various wt. symbols at the end of the list. Where do they come from? What does the ‘n’ code they are marked with mean? And why is BFD failing to deal with them? I set out to find those answers with, well, more than a hunch of what the problem would turn out to be.

So what are those symbols? Google doesn’t help at all here, since searching for “wt”, even enclosed in double quotes, turns up only results for “weight”. Yes, I know it is a way to shorten that word, but what the heck, I’m looking for a specific string! The answer, actually, is simple: they are additional debug symbols added by -gdwarf-4, which is used to emit the latest DWARF format revision. This was implemented in GCC 4.6, and is supposed to reduce the size of the debug information while carrying more of it, which is generally positive.

It turns out that libbfd (the library that implements all the low-level access for nm, ld and the other utilities) doesn’t like those symbols — I’m not sure whether it’s the sections they are defined in, their type (which is set to STT_NONE), or something else, but it doesn’t like them at all. Interestingly enough, this does not happen with final executables and dynamic libraries, which makes it at least bearable: only less than 40 packages had to be rebuilt on my system because they shipped broken static objects; unfortunately, one of those was LibreOffice, d’oh!
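
If you want to check whether an object on your system is affected, the DWARF version is reported in the compilation unit header; and until BFD catches up, forcing the older format is the obvious workaround (the file names here are just examples):

% readelf --debug-dump=info dlopen.o | grep -m1 Version
% gcc -gdwarf-3 -c dlopen.c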

Now, let’s look back at the nm issue, though: when I started writing Ruby-Elf, I decided not to reimplement the whole suite of ELF tools, since there are already quite a few implementations of those out there. But I did write an nm tool to debug my own issues — it also worked quite nicely, because implementing access to the codes used by nm allowed me to use the same output in my elfgrep tool to show the results. This implementation, which was actually never ported to the tools framework I wrote for Ruby-Elf, didn’t get installed; it was just part of the repository for my own good.

But after noticing that my version is more resilient than binutils’ and produces more interesting output than elfutils’, I decided to rework it and make it available as rbelf-nm, writing a man page and documenting the codes for the various symbol kinds. But before all this, I also rewrote the function that chooses the codes. Before, it relied on binding types and then on section names to produce the right code; now it relies on the symbol type, the binding type, and the sections’ flags and types, making it as resilient as elfutils’ and as informative as binutils’, at least for what I have encountered so far.

I have also released a new version (1.0.6.1, don’t ask!) that includes the new tool; it is already on RubyGems and in Portage if you wish to use it. Please remember that the project has a Flattr page, so if you like the project, your support is definitely welcome.

Looking for symbols? elfgrep to the rescue!

About three years after starting my work on Ruby-Elf, I have finally implemented one of the scripts I had wanted to write for the longest time: elfgrep. As the name implies, it’s a tool with a grep-like interface to look up symbols defined and used in ELF files.

I avoided writing it for a long time because scanelf (part of pax-utils) already implements a similar, but definitely not identical, feature through the -gs options. The main feature missing in scanelf is the ability to look for multiple symbols at once: it does allow you to specify multiple symbols, but then it only prints the first one found, rather than all of them.

The other night, mru from FFmpeg pointed out to me another limitation of scanelf: it cannot be used to look up symbols depending on their version information (on GNU systems). So I finally decided to start writing my own. Thankfully, Ruby-Elf was designed to be easy to extend, if anything, so the original implementation doing the job it was aimed at only required 83 lines of Ruby code, including my license header.

Right now the implementation is a bit more complex, and so it has more lines of code, but it implements a number of switches analogous to those in grep itself, which makes it a very flexible tool for finding both definitions and uses of symbols: you can look either for the library defining a given symbol, or for the objects making use of it; you can get the types of the symbols (the output is similar to nm(1)’s), or you can simply list the files that matched, or those that didn’t. You can also count symbols, without having to go through wc -l, thanks to the -c option, and the list output is suitable for use with xargs -0 as well.

Most of the time, when analysing a library, I end up having to do something like nm | grep; this unfortunately doesn’t work that well when you have multiple files, as you lose sight of which file actually hits; elfgrep solves this just fine, as it prefixes the file’s path to the nm-like output, which makes it terrific for identifying which object file exports a given symbol, for instance — something along the lines of the sketch below.
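
To illustrate the difference — the symbol and paths here are made up, and the exact invocation may differ from the released tool:

% nm -D /usr/lib64/lib*.so | grep my_symbol     # matches, but good luck telling which file hit
% elfgrep my_symbol /usr/lib64/lib*.so          # every match comes prefixed with the file's path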

All in all, I’m very, very happy with how elfgrep turned out, so I’ll likely try to make a release of Ruby-Elf soonish; but to do so I have to make it a Ruby Gem, just for the sake of ease of distribution — I’ll look at it in the next week or so. In the meantime you can find the sources on the project’s page, and in my overlay there is an ebuild that installs it from Git until I make a release (I’ll package it in the main tree as soon as there is one!).

If you have any particular comments, patches, requests, or anything like that, feel free to send me an email; you’ll find the references above.

Your worst enemy: undefined symbols

What ties reckless glibc unmasking, GTK+ 2.20 issues, Ruby 1.9 porting and --as-needed failures all together? Okay, the title is a dead giveaway for the answer: undefined symbols.

Before delving into the topic I first have to tell you about symbols, I guess; and to do so, and to continue further, I’ll be using C as the base language for all of my notes. When considering C, then, a symbol is any function or data (constant or variable) that is declared extern; that is, anything that is neither static nor defined in the same translation unit (which is, most of the time, the source file).

Now, what nm shows as undefined (U code) is not really what we’re concerned about: object files (.o, just intermediate files) will report undefined symbols for any function or data element used that is not in the same translation unit; most of those get resolved when all the object files are linked together to form a final shared object or executable — actually, it’s a lot more complex than this, but since I don’t care about describing symbol resolution here, please accept it as if it were true.

The remaining symbols will keep the U code in the shared object or executable, but most of them won’t concern us: they will be loaded from the linked libraries when the dynamic loader actually resolves them. So, for instance, the executable built from the following source code will have the printf symbol “undefined” (for nm), but it’ll be resolved by the dynamic loader just fine:

#include <stdio.h>

int main() {
  printf("Hello, world!");
}

I have explicitly avoided using the fprintf function, mostly because that would require a further undefined symbol (stdout), so…

Why do I say that undefined symbols are our worst enemy? Well, the problem is actually with undefined symbols that remain unresolved after the loader has had its way. These are either symbols for functions and data that are not really defined anywhere, or that are defined in libraries that are not linked in. The former case is what you get with most of the new-version compatibility problems (glibc, gtk, ruby); the latter is what you get with --as-needed.

Now, if you have a bit of practice with development and writing simple commands, you might be wondering why this is any kind of problem: if you were to mistype the function above into priltf – a symbol that does not exist, at least in the basic C library – the compiler would refuse to create an executable at all, even though the implicit declaration is only treated as a warning, because the symbol is, well, not defined. But this rule only applies, by default, to final executables, not to shared objects (shared libraries, dynamic libraries, .so, .dll or .dylib files).

For shared objects, you have to explicitly ask the linker to refuse linking with undefined references; otherwise they link just fine, with no warning, no error, no bothering at all. The way you tell the linker to refuse that kind of linkage is by passing the -Wl,--no-undefined flag: this way, if there is even a single symbol that is not defined in the current library or any of its dependencies, the linker will refuse to complete the link. Unfortunately, using this by default is not going to work that well.
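
For clarity, this is where the flag goes on the compiler driver’s command line (the names are placeholders):

% gcc -shared -fPIC -Wl,--no-undefined -o libfoo.so foo.c -lbar

With the flag, the link fails immediately if foo.c uses any symbol that neither libbar nor the other linked-in dependencies define, instead of deferring the failure to load time.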

There are indeed some more or less good reasons to allow shared objects to have undefined symbols; here are a few:

Multiple ABI-compatible libraries: okay, this is a very far-fetched one, simply because of the difficulty of having ABI-compatible libraries (it’s difficult enough to have them API-compatible!), but it happens; for instance, on FreeBSD you – at least used to – have a few different implementations of the threading libraries, and more or less the same situation exists for multiple OpenGL and mathematical libraries. The idea behind this is actually quite simple: if you had libA1 and libA2 providing the symbols, then libB linking to libA1, and libC linking to libA2, an executable foo linking to both libB and libC would get both libraries linked together, creating nasty symbol collisions.

Nowadays, FreeBSD handles this through a libmap.conf file, which makes everything link to the same library, but then allows switching to a different one at load time; a similar approach is taken by things like libgssglue, which allows switching the GSSAPI implementation (which might be either Kerberos or SPKM) with a configuration file. On Linux, besides such custom implementations, or hacks like the one used by Gentoo (eselect opengl) to handle switching between different OpenGL implementations, there seems to be no interest in tackling the problem at its root. Indeed, I complained about that when --as-needed was softened to allow this situation, although I guess it at least removed one common complaint about adopting the option by default.

Plug-ins hosted by a standard executable: plug-ins are, generally speaking, shared objects; and with the exception of the most trivial plug-ins, whose output is only defined in terms of their input, they use functions that are provided by the software they plug into. When they are hosted (loaded and used) by a library, such as libxine, they are linked back to the library itself, and that makes sure the symbols are known at the time the plugin object is created. On the other hand, when the plug-ins are hosted by software that is not a shared object (which is the case of, say, zsh), there is no way to link them back, and the linker has no way to discern between undefined symbols that will be lifted from the host program, and those that are bad, plain undefined.

Plug-ins providing symbols for other plug-ins: here you have a perfect example in the Ruby-GTK2 bindings. When I first introduced --no-undefined in the Gentoo packaging of Ruby (1.9 initially; nowadays all three C-based implementations have the same flag passed on), we got reports from non-Portage users of Ruby-GTK2 having build failures. The reason? Since all the GObject-derived interfaces have to share the same tables and lists, the solution they chose was to export an interface, unrelated to the Ruby-extension interface (which is actually composed of a single function, bless them!), that the other extensions use; since you cannot reliably link modules to one another, they don’t link to them, and you get the usual problem of not being able to distinguish between expected and unexpected undefined symbols.

Note: this particular case is not tremendously common; when loading plug-ins with dlopen() the default is to use the RTLD_LOCAL option, which means that the symbols are only available to the branch of libraries loaded together with that library, or through explicit calls to dlsym(); this is a good thing, because it reduces the chances of symbol collisions and unexpected linking consequences. On the other hand, Ruby itself seems to go all the way against the common idea of safety: it requires RTLD_GLOBAL (register all symbols in the global procedure linking table, so that they are available to be resolved at any point in the whole tree), and also requires RTLD_LAZY, which makes it more troublesome if there are missing symbols — I’ll get to what lazy bindings are later.

Finally, the last case I can think of where there is at least some sense to all of this trouble is reciprocating libraries, such as those in PulseAudio. In this situation, you have two libraries, each using symbols from the other. Since you need the second to fully link the first, but you need the first to link the second, you cannot break the deadlock with --no-undefined turned on. This, and the executable-hosted plug-ins, are the only two reasons I find valid for not using --no-undefined by default — but unfortunately they are not the only two in use.

So, what about that lazy stuff? Well, the dynamic loader has to perform a “binding” of the undefined symbols to their definitions; binding can happen in two main modes: immediate (“now”) or lazy, the latter being the default. With lazy binding, the loader does not try to find the definition to bind a symbol to until it’s actually needed (that is, until the function is called, or the data is read or written); with immediate binding, the loader iterates over all the undefined symbols of an object when it is loaded (eventually loading up the dependencies). As you might guess, if there are undefined, unresolved symbols, the two binding modes behave very differently. An immediately-bound executable will fail to start, and an immediately-bound library will fail dlopen(); a lazily-bound executable will start up fine, and abort as soon as a symbol is hit that cannot be resolved — and a library will simply make its host program abort the same way. Guess which is safer?
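
Incidentally, the glibc loader lets you flip the default from the environment, which doubles as a cheap smoke test for unresolved symbols: with immediate binding forced, a missing symbol aborts the program at startup rather than halfway through a run.

% LD_BIND_NOW=1 ./program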

With all these catches and issues, you can see why undefined symbols are a particularly nasty situation to deal with. To the best of my knowledge, there isn’t a real way to post-mortem an object to make sure that all of its symbols are defined. I started writing support for that in Ruby-Elf, but the results weren’t really… good. Lacking that, I’m not sure how we can proceed.

It would be possible to simply change the default to --no-undefined, and work around the few objects that require undefined symbols with --undefined (we decided to proceed that way with Ruby); but given the kind of support I’ve received before for my drastic decisions, I don’t expect enough people to help me tackle that anytime soon — and I don’t have the material time to work on it, as you might guess.