Symbolism and ELF files (or, what does -Bsymbolic do?)

Julian asked by mail – uh… last month! Okay, I have a long backlog – what the -Bsymbolic and -Bsymbolic-functions options to the GNU linker do. The answer is not extremely complicated, but it calls for some explanation of how function calls work in Unix. I say in Unix because there are a few differences in how OS X and Windows behave, if I recall correctly, and I’m definitely not an expert on those. I wish I were; then I would be able to work on a book to replace Levine’s Linkers and Loaders.

[Figure: PLT and GOT tables diagram]

Please don’t complain about the very bad drawing above; it’s just meant to illustrate what’s going on, and I did it on my iPad with a capacitive stylus. I’ll probably try to do a few more of these, since I don’t have my usual Intuos tablet, and I won’t have it until I find my place in London.

You see, the whole issue of linking in Unix is implemented with a long list of tables: the symbol table, the procedure linkage table (PLT) and the global offset table (GOT). All objects involved in a dynamic linking chain (executables and shared objects, ET_EXEC and ET_DYN) possess a symbol table, which mixes defined (exported) and undefined (requested) symbols. Objects that export symbols (ET_DYN, and ET_EXEC that support plugins or callbacks, or are simply badly designed) possess a PLT, and PIC objects (most ET_DYN, with the exception of some x86 prebuilt objects, and PIE executables) possess GOTs.

The GOT and the text section

Let’s start from the bottom, that is, the GOT, or rather from just before the GOT: the executable code itself. As far as ELF is concerned, by default (there are a number of options that change this, but I don’t want to go there for now), data and function sections are completely opaque. Access to functions and data has to go through their start addresses. For non-PIC objects these are absolute addresses, as the objects are assumed to always be loaded at the same position. With position-independent code (PIC), as the name hints, that position has to be ignored, so the position of data or functions has to be derived from offsets relative to the object’s load address. When a non-PIC object is loaded at a non-fixed address, the code itself has to be patched to use the new full addresses; this is what a text relocation (TEXTREL) is, and it requires the ability to write to the executable segments, which is an obvious security issue.

So here’s where the global offset table enters the scene. Whenever you refer to a particular piece of data or a function, you have an offset from the start of the containing section. This makes it possible for that section to be loaded at different addresses while keeping the code untouched. (Do note that I’m simplifying the whole thing a lot, but I don’t want to go too much into detail, because otherwise half the people reading wouldn’t understand what I’m saying.)

But that kind of direct, offset-based addressing only works when the data or function is guaranteed to be in the same object. If you’re not using any special option for either the compiler or the linker, this means only static symbols are addressed directly. Everything else is reached indirectly: functions through the object’s PLT, to which all the functions that the object calls are added, and data through GOT entries filled in by the dynamic loader. The PLT then contains code to ask the dynamic loader at which address a given symbol is defined.

Global and local symbol tables

To answer that question, the dynamic loader has to have a global table that resolves symbol names to addresses. From one point of view, this is basically a global PLT. Depending on some settings in the objects, in the environment or in the loader itself, this table can be populated right when the application is executed, or only when the symbols are requested. For simplicity, I’ll assume the former, as otherwise we’d end up in details that have nothing to do with what we were discussing to begin with. Furthermore, there is another complication added by the modern GNU loader, which introduced indirect binding… it’s a huge headache.

While the same symbol name might have multiple entries across the objects’ symbol tables, because more than one object exports the same symbol, in the resolved table each symbol name has exactly one address, which the loader looks up in the objects’ symbol tables (with the default rules, the first object in load order that defines it wins) and then writes into the GOTs. This means that the loader has to resolve, one way or another, the collisions that happen when multiple objects export the same symbol. It also means that, by default, there is no guarantee that an object that both exports and calls a given symbol is going to call its own copy.

Let me try to underline that point: symbols that are exported are added to the symbol table of an object; symbols that are called are added to the symbol table as undefined (if they are not there already), and they are added to the procedure linkage table (which then finds the position via its own offset table). By default, with no special options, as I said, only static functions are called directly; everything else is called through the PLT, and thus through the loader’s table of resolved symbols. This is actually what drives symbol interposition (which is what LD_PRELOAD libraries rely on), and what caused ancient xine’s problems, which steered me to look into ELF itself.
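To make the lookup order concrete, here is a toy model in C, not how ld.so is actually implemented: each loaded object lists the symbols it exports, objects are kept in load order (executable first, then LD_PRELOAD libraries, then dependencies), and a lookup returns the first object that defines the name. The type, function and object names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a loaded ELF object: its name and the symbols it exports. */
struct loaded_object {
    const char *name;            /* e.g. "libfoo.so" */
    const char *const *defined;  /* NULL-terminated list of exported symbols */
};

/* Return the object whose definition of `symbol` wins: the first one in load
 * order that defines it. Every caller gets this same answer, including the
 * object that defines the symbol itself. NULL means "unresolved". */
static const struct loaded_object *
lookup_symbol(const struct loaded_object *scope, size_t count, const char *symbol)
{
    for (size_t i = 0; i < count; i++)
        for (const char *const *s = scope[i].defined; *s != NULL; s++)
            if (strcmp(*s, symbol) == 0)
                return &scope[i];
    return NULL;
}
```

With a preloaded library that also defines foo_log, libfoo’s own calls to foo_log resolve to the preloaded copy, which is exactly the interposition described above.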

Okay, I’m almost at -Bsymbolic!

As my post about xine shows, there are situations where going through the PLT is not the desired behaviour, as you want to ensure that an object calls its own copy of any given symbol that is defined within itself. You can do that in many ways; the simplest option is not to expose those symbols at all. As I said, with default options only static functions are called directly, but this can easily be extended to functions that are not exposed, either by marking the symbols as hidden (which happens at compile time), or by using a linker script to only expose a limited set of symbols (which happens at link time).
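A minimal sketch of the compile-time route, assuming GCC or Clang (both accept the visibility attribute); the function names are hypothetical:

```c
/* Built into a shared object with, e.g.:
 *   gcc -shared -fPIC -o libexample.so example.c */

/* Hidden: the symbol is not exported from the shared object, so calls to it
 * from inside the object cannot be interposed and need no PLT entry. */
__attribute__((visibility("hidden")))
int example_internal_scale(int value)
{
    return value * 2;
}

/* Default visibility: exported, callable from other objects. */
int example_public_compute(int value)
{
    return example_internal_scale(value) + 1;
}
```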

This is logical: once the symbols are no longer exported by the object, the dynamic loader has no way to answer for them through the PLT, which means the only option left is to call them directly.

But sometimes you have to expose the symbols, and at the same time you want to make sure that you call your own copy and not some other interposed copy of those symbols. How do you do that? That’s where the -Bsymbolic and -Bsymbolic-functions options come into play. What they do is bind, at link time, the references to symbols that are both called and defined in a shared object to the object’s own definitions: the symbols are still exported, so the loader resolves them for everybody else, but the object itself no longer asks the loader about them. This way, it’ll always call its own copy. An almost identical result is obtained, just at compile time rather than link time, when you use protected visibility (instead of default or hidden).
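For comparison, a sketch of the protected-visibility variant, again assuming GCC or Clang on an ELF platform, with hypothetical names; the linker flag in the comment is the link-time equivalent applied to every function in the object:

```c
#include <stddef.h>
#include <string.h>

/* Exported, but with protected visibility: references from inside the shared
 * object that defines it bind to this definition, even if another object
 * (an LD_PRELOAD library, say) exports the same name. */
__attribute__((visibility("protected")))
int example_checksum(const unsigned char *data, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* This internal caller always reaches the example_checksum() above; linking
 * with gcc -shared -fPIC -Wl,-Bsymbolic-functions would give the same result
 * for every function defined in the object, without touching the source. */
int example_checksum_string(const char *s)
{
    return example_checksum((const unsigned char *)s, strlen(s));
}
```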

Unfortunately this makes a small change to the semantics we’re used to: since the way the symbols are resolved varies depending on whether you’re referring to the symbol from within or outside the object, pointers taken inside and outside the object will no longer necessarily match. While for most libraries this is not a problem, there are some cases where it really is. For instance, in xine we hit a problem with the special memcpy() implementation: it was a weak symbol, simply a pointer to the actual function, which was set within the libxine object… but would have been left unset for the external objects, including the plugins, for which it would still have been NULL.

Comparing function symbols is a rare corner case, but comparing objects’ addresses is common enough, for instance if you’re trying to see whether a default, global object is being passed to your function instead of a custom one… in that case, having the addresses no longer match is a big pain. This is basically why you have -Bsymbolic-functions: it’s exactly like -Bsymbolic but limits itself to functions, while data objects are still handled as if no option was passed. It’s a compromise that makes it easier to avoid going through the PLT for everything, without breaking as much code (it would then only break on corner cases like xine’s memcpy()).
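The kind of check that suffers is sketched below, with a hypothetical options structure and default instance. With -Bsymbolic, the library’s own reference to the default object may point to a different copy than the one the application sees (for instance when the executable uses a copy relocation for it), so the comparison can fail; -Bsymbolic-functions leaves data alone, which is why it is the safer compromise.

```c
#include <stddef.h>

/* Hypothetical library type with a shared, exported default instance. */
struct example_options {
    int verbose;
    int retries;
};

const struct example_options example_default_options = { 0, 3 };

/* Identity check against the exported default object. Under -Bsymbolic the
 * library's &example_default_options may differ from the address the
 * application passes in, and this would then return 0 for the default. */
int example_uses_defaults(const struct example_options *opts)
{
    return opts == NULL || opts == &example_default_options;
}
```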

By the way, in case it’s not obvious, the use of symbolic resolution is not only about making sure that the objects know which function they are calling; it’s also a performance improvement, as it avoids a round-trip to the dynamic loader and a lookup of where the symbol is actually defined. This is minor for most functions, but it can be visible if many functions are being called. Of course it shouldn’t make much of a difference if the loader is good enough, but that’s a completely different story. So is the option for the compiler to emit two copies of a given function, to avoid doing the full preamble when called within the object. The same goes for link-time optimization, which is connected, if only indirectly, to what I’ve discussed above.

Oh, and in case it wasn’t clear from what I wrote: you should never use the -Bsymbolic flag in your LDFLAGS variable in Gentoo. It’s not a flag you should mess with; only upstream developers should care about it.

Hide those symbols!

Last week I wrote in passing about my old linking-collision script. Since then I’ve restarted working on it, and I have a few more comments to share.

First of all, you might have seen the 2008 GhostScript bug. This is a funny story: back in 2008, when I started working on finding and killing symbol collisions between libraries and programs, I filed a bug with GhostScript (the AFPL version), since it exported a symbol that was present, with the same name, in libXfont and libt1. I found that particularly critical since these aren’t libraries used in totally different applications: they are all related to rendering data.

At the time, the upstream GS developer (who happens to be one of the Xiph developers; don’t I just love those guys?) asked me to provide him with a real-world crash. Since any synthetic testcase I could come up with would look contrived, I didn’t really want to spend time trying to come up with one. Instead I argued the semantics of the problem, explaining why, albeit theoretical at that point, the problem should have been solved. To no avail: the bug was closed at the time with a threat that anyone reopening it would have their account removed.

It turns out, in 2011, that there is a program that links together both libgs and libt1: Evince. And it crashes when you try to render a DVI document (through libt1) containing an Encapsulated PostScript (EPS) image (rendered through the GhostScript library). What a surprise! Even though the problem was known, and one upstream developer (Henry Stiles) knows that the proper fix is using unique names for internal functions and data entries, the “solution” was limited to the one colliding symbol, leaving all the others to cause problems whenever someone runs into them. Oh well.

Interestingly, most packages don’t seem to care about their internal symbols, be they libraries or final binaries. For final binaries this is usually not much of a problem, as two binaries cannot collide with one another, but that doesn’t mean their symbols couldn’t collide with a library’s; for this reason, the script now ignores symbols that collide only between executables, but keeps listing those colliding with at least one library.

Before moving on to how to hide those symbols, I’d like to point out that the Ruby-Elf project page has a Flattr button, while the sources are on GitHub (originally on Gitorious) for those who are curious.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, which shut the service down, so the project is now on GitHub. I also stopped using Flattr a long time ago.

You might now wonder how to hide the symbols. One way that I often suggest is to use the GCC-provided -fvisibility=hidden support; this is obviously not always an option, as you might want to support older compiler versions, or simply don’t want to start adding visibility markers to your library. Thankfully there are two other options you can make use of. One is to directly use the version script support of GNU ld (compatible with Sun’s linker and gold, for what it’s worth); basically, you can declare something like:

{
  global:
    func1;
    func2;
    func3;
  local: *;
};

This way only the three named functions are exported, and everything else is hidden. While this option works quite nicely, it often seems too cumbersome, mostly because version scripts are designed to also allow assigning multiple versions to the symbols. But that’s not the only option, at least if you’re using libtool.
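As a hypothetical example of the matching source, assuming the version script above is saved as libexample.map, here is a library where shared_helper() is not static (say, because other source files need it) but still ends up local:

```c
/* Hypothetical build line:
 *   gcc -shared -fPIC -o libexample.so example.c -Wl,--version-script=libexample.map
 * func1, func2 and func3 stay global; shared_helper matches "local: *;" and
 * is not exported, even though it is not static. */
int shared_helper(int x)
{
    return x * x;
}

int func1(int x) { return shared_helper(x); }
int func2(int x) { return shared_helper(x) + 1; }
int func3(int x) { return shared_helper(x) - 1; }
```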

In that case there are, once again, two separate options: one is to provide libtool with a list of exported symbols, similar to the one above but with one symbol per line (-export-symbols SYMBOL-FILE); the other is to provide a regular expression matching the symbols to export (-export-symbols-regex REGEX), so that you just have to name the symbols correctly to have them exported or not. This loses the advantage of multiple versions for symbols (but even that is a bit hairy, so I won’t get into it), but gains the advantage of also working when generating Windows libraries, where you have to list the symbols to export.
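To see how a naming convention and -export-symbols-regex interact, here is a small checker using POSIX regex.h. It assumes the pattern is applied as an extended regular expression to each symbol name (libtool runs the list through egrep); the example_ prefix and the double-underscore convention for internal names are hypothetical:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Would `symbol` survive a -export-symbols-regex filter using `pattern`? */
static bool would_export(const char *pattern, const char *symbol)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;  /* invalid pattern: treat as exporting nothing */
    bool match = regexec(&re, symbol, 0, NULL, 0) == 0;
    regfree(&re);
    return match;
}
```

With "^example_[a-z]" as the pattern, public names like example_buffer_new are exported, while internal ones named example__parse_header or parse_header are not.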

I should add here that hiding symbols for executables should also reduce their startup time, as the runtime loader (ld.so) doesn’t need to look through a long list of symbols when preparing the binary to be executed; the same goes for libraries. So in a utopian world where each library and program only exports its tiny, required list of symbols, the system should also be snappier. Think about it.

Your worst enemy: undefined symbols

What ties together reckless glibc unmasking, GTK+ 2.20 issues, Ruby 1.9 porting and --as-needed failures? Okay, the title is a dead giveaway for the answer: undefined symbols.

Before going deeper into the topic, I guess I first have to tell you about symbols; to do so, and to continue further, I’ll be using C as the base language for every one of my notes. When considering C, then, a symbol is any function or data (constant or variable) with external linkage, that is, anything that is not declared static. A symbol that is used but not defined in the same translation unit (that is, source file, most of the time) is declared extern.
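A minimal sketch of those categories in one translation unit; all names except strlen() are hypothetical, and the nm codes in the comments are what GCC on Linux typically produces:

```c
#include <string.h>

/* static: internal linkage, not a symbol other objects can see */
static int call_count = 0;

/* Defined here with external linkage: nm lists it as defined (B, in .bss). */
int example_total_length = 0;

/* Also a defined symbol (T). It calls strlen(), which <string.h> declares
 * and libc defines: in this object file nm lists strlen as undefined (U). */
int example_length(const char *s)
{
    int len = (int)strlen(s);
    call_count++;
    example_total_length += len;
    return len;
}

/* Exposes the static counter through a function with external linkage. */
int example_calls(void)
{
    return call_count;
}
```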

Now, what nm shows as undefined (U code) is not really what we’re concerned about: object files (.o, just intermediates) will report undefined symbols for any function or data element used that is not in the same translation unit, and most of those get resolved when all the object files are linked together to form a final shared object or executable. Actually, it’s a lot more complex than this, but since I don’t want to describe symbol resolution here, please accept it as if it were true.

The remaining symbols will keep the U code in the shared object or executable, but most of them won’t concern us: they will be loaded from the linked libraries when the dynamic loader actually resolves them. So, for instance, the executable built from the following source code will have the printf symbol “undefined” (for nm), but it’ll be resolved by the dynamic linker just fine:

#include <stdio.h>
int main(void) {
  printf("Hello, world!"); /* nm lists printf as U; ld.so resolves it from libc */
  return 0;
}

I have explicitly avoided using the fprintf function, mostly because that would require a further undefined symbol (the stdout or stderr stream), so…

Why do I say that undefined symbols are our worst enemy? Well, the problem is actually with undefined symbols that are still unresolved after the loader has had its way. These are either symbols for functions and data that are not really defined anywhere, or that are defined in libraries that are not linked in. The former case is what you get with most of the new-version compatibility problems (glibc, GTK+, Ruby); the latter is what you get with --as-needed.

Now, if you have a bit of practice with development and writing simple commands, you might be wondering why this is a problem at all: if you were to mistype the function above as priltf (a symbol that does not exist, at least in the basic C library), the toolchain would refuse to create an executable at all, even if the implicit declaration was only treated as a warning, because the symbol is, well, not defined. But this rule only applies, by default, to final executables, not to shared objects (shared libraries, dynamic libraries, .so, .dll or .dylib files).

For shared objects, you have to explicitly ask the linker to refuse undefined references; otherwise they are linked just fine, with no warning, no error, no bother at all. The way to tell the linker to refuse that kind of linkage is passing the -Wl,--no-undefined flag; this way, if there is even a single symbol that is not defined in the current library or any of its dependencies, the linker will refuse to complete the link. Unfortunately, using this by default is not going to work that well.

There are indeed some more or less good reasons to allow shared objects to have undefined symbols, and here come a few:

Multiple ABI-compatible libraries: okay, this is a very far-fetched one, simply because of the difficulty of having ABI-compatible libraries (it’s difficult enough to have them API-compatible!), but it happens; for instance, on FreeBSD you have (or at least used to have) a few different implementations of the threading libraries, and more or less the same situation for multiple OpenGL and mathematical libraries. The idea behind this is actually quite simple: if you have libA1 and libA2 providing the same symbols, with libB linking to libA1 and libC linking to libA2, an executable foo linking to both libB and libC would get both libraries loaded together, creating nasty symbol collisions.

Nowadays, FreeBSD handles this through a libmap.conf file that lets you always link against the same library and then switch it for a different one at load time; a similar approach is taken by things like libgssglue, which allows switching the GSSAPI implementation (either Kerberos or SPKM) with a configuration file. On Linux, besides this custom implementation, or hacks such as the one used by Gentoo (eselect opengl) to switch between different OpenGL implementations, there seems to be no interest in tackling the problem at the root. Indeed, I complained about that when --as-needed was softened to allow this situation, although I guess it at least removed one common complaint about adopting the option by default.

Plug-ins hosted by a standard executable: plug-ins are, generally speaking, shared objects; and with the exception of the most trivial plug-ins, whose output is only defined in terms of their input, they use functions that are provided by the software they plug into. When they are hosted by (loaded and used from) a library, such as libxine, they are linked back to the library itself, and that makes sure that the symbols are known at the time the plug-in object is created. On the other hand, when the plug-ins are hosted by software that is not a shared object (which is the case of, say, zsh), you have no way to link them back, and the linker has no way to discern between undefined symbols that will be lifted from the host program and those that are bad, simply undefined.

Plug-ins providing symbols for other plug-ins: here you have a perfect example in the Ruby-GTK2 bindings. When I first introduced --no-undefined in the Gentoo packaging of Ruby (1.9 initially; nowadays all three C-based implementations get the same flag), we got reports of non-Portage users of Ruby-GTK2 having build failures. The reason? Since all the GObject-derived interfaces had to share the same tables and lists, the solution they chose was to export an interface, unrelated to the Ruby-extension interface (which is actually composed of a single function, bless them!), that the other extensions use; since you cannot reliably link modules one with the other, they don’t link to them, and you get the usual problem of being unable to distinguish between expected and unexpected undefined symbols.

Note: this particular case is not tremendously common; when loading plug-ins with dlopen() the default is RTLD_LOCAL, which means that the symbols are only available to the branch of libraries loaded together with that library, or through explicit calls to dlsym(); this is a good thing, because it reduces the chances of symbol collisions and unexpected linking consequences. On the other hand, Ruby itself seems to go all the way against the common idea of safety: it requires RTLD_GLOBAL (which makes all of the loaded object’s symbols available for resolving the symbols of any object loaded afterwards), and also RTLD_LAZY, which makes things more troublesome if there are missing symbols. I’ll get to what lazy bindings are later.

Finally, the last case I can think of where there is at least some sense to all of this trouble is reciprocating libraries, such as those in PulseAudio. In this situation you have two libraries, each using symbols from the other. Since you need the other to fully link the one, but you need the one to link the other, you cannot break the deadlock with --no-undefined turned on. This and the executable-hosted plug-ins are the only two reasons that I find valid for not using --no-undefined by default; unfortunately, they are not the only ones in use.

So, what about that lazy stuff? Well, the dynamic loader has to perform a “binding” of the undefined symbols to their definitions; binding can happen mainly in two modes: immediate (“now”) or lazy, the latter being the default. With lazy binding, the loader will not try to find the definition of a function until it’s actually needed (that is, until the function is first called; references to data are resolved at load time either way); with immediate binding, the loader will iterate over all the undefined symbols of an object when it is loaded (after loading its dependencies). As you might guess, if there are undefined, unresolved symbols, the two binding types behave very differently. An executable using immediate binding will fail to start, and a library loaded that way will make dlopen() fail; an executable using lazy binding will start up fine and abort as soon as it hits a symbol that cannot be resolved, and a library would simply make its host program abort in the same way. Guess which is safer?
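A small sketch of choosing the binding mode when loading a plug-in with dlopen(); the helper name is hypothetical, and on glibc older than 2.34 you need to link with -ldl:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Load a shared object with the binding mode chosen by the caller.
 * RTLD_NOW: an unresolvable symbol makes dlopen() itself fail, so the
 * problem is reported here, at load time.
 * RTLD_LAZY: the same object may load fine and abort the process later,
 * on the first call to the missing function.
 * RTLD_LOCAL keeps the object's symbols out of the global scope. */
static void *load_object(const char *path, int bind_now)
{
    int flags = (bind_now ? RTLD_NOW : RTLD_LAZY) | RTLD_LOCAL;
    void *handle = dlopen(path, flags);
    if (handle == NULL)
        fprintf(stderr, "cannot load %s: %s\n",
                path != NULL ? path : "(main program)", dlerror());
    return handle;
}
```

Passing NULL as the path returns a handle for the main program, which is a convenient way to try the helper without a real plug-in.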

With all these catches and issues, you can see why undefined symbols are a particularly nasty situation to deal with. To the best of my knowledge, there isn’t a real way to check an object post-mortem to make sure that all its symbols are defined. I started writing support for that in Ruby-Elf, but the results weren’t really… good. Lacking that, I’m not sure how we can proceed.

It would be possible to simply change the default to be --no-undefined, and work around with --undefined the few that require the undefined symbols to be there (we decided to proceed that way with Ruby); but given the kind of support I’ve received before in my drastic decisions, I don’t expect enough people to help me tackle that anytime soon — and I don’t have the material time to work on that, as you might guess.

Application Binary Interface

Following my earlier post about libtool archives, Jorge asked me how they relate to the numbers that come at the end of shared objects; mostly, they don’t, with the exception that the libtool archive keeps a list of all the symlinks with different suffix components for a given library.

I was actually already planning on writing something about shared object versioning, since I noted the problems with C++ libraries related to the “Ardour Problem”. Unfortunately, that first requires some knowledge of what makes up an ABI, so I need to write about that before going deep into shared object versioning. And this hits on what I’m most lacking right now: time. Since I now have more or less two jobs at hand, the time I can spare is dedicated more to Gentoo or my personal problems than to writing in-depth articles for the blog. You can, anyway, always bribe me to write about a particular topic.

So let’s start with the basics of what the Application Binary Interface (ABI) is, in the smaller context that I’m going to care about, and how it relates to the shared object versioning topic I wanted to discuss. For simplicity, I’m not going to discuss issues like the various architecture-dependent calling conventions, and, for now, I’m also focusing on software written in C rather than C++; the ABI problems with the latter are an order of magnitude worse than those in C.

Basically, the ABI of a library is the interface between it and the software using it; it is related to the API, but you can maintain API compatibility and still break ABI compatibility, since in the ABI you have to take many different factors into account:

  • the callable functions, with their names; adding a function is a backward-compatible ABI change (although it also means that you cannot execute something built against the newer library on a system with an older one), removing or renaming a function breaks ABI;
  • the parameters of the function, with their types; while restricting the parameters of a function (for instance taking a constant string rather than a non-constant string) is ABI-compatible, removing those restrictions is an ABI breakage; changing the compound or primitive type of a parameter is also an ABI change, since you change its meaning; this is also why using parameters with types like off_t is bad (its size depends on the feature-selection macros used, such as _FILE_OFFSET_BITS);
  • the returned value of functions; this does not only mean the type of it, but also the actual meaning of the value;
  • the size and content of the transparent structures (i.e.: the structures that are defined, and not just declared, in the public header files);
  • any major API change also produces an ABI change (unless symbol versioning is used to keep backward-compatibility); it’s particularly important to note that changing how a dynamically-allocated returned value is allocated does change API and ABI if there is not a symmetrical function to free it; this is why, even for very simple data types, you should have a symmetrical alloc/free interface.
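To illustrate the last two points, here is a sketch of a hypothetical library type kept opaque, with a symmetrical new/free pair; in a real library the declarations would live in the public header and the structure definition in a private source file:

```c
#include <stddef.h>
#include <stdlib.h>

/* "Public header" part: the structure is only declared, so applications
 * never depend on its size or layout, and fields can be added later. */
typedef struct example_buffer example_buffer;

example_buffer *example_buffer_new(size_t capacity);
void example_buffer_free(example_buffer *buf);
size_t example_buffer_capacity(const example_buffer *buf);

/* "Private" part: only the library sees the definition. */
struct example_buffer {
    size_t capacity;
    unsigned char *data;
};

example_buffer *example_buffer_new(size_t capacity)
{
    example_buffer *buf = malloc(sizeof *buf);
    if (buf == NULL)
        return NULL;
    buf->data = calloc(capacity ? capacity : 1, 1);
    if (buf->data == NULL) {
        free(buf);
        return NULL;
    }
    buf->capacity = capacity;
    return buf;
}

/* Symmetric with example_buffer_new(): callers never call free() themselves,
 * so the library can change how buffers are allocated without breaking them. */
void example_buffer_free(example_buffer *buf)
{
    if (buf == NULL)
        return;
    free(buf->data);
    free(buf);
}

size_t example_buffer_capacity(const example_buffer *buf)
{
    return buf->capacity;
}
```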

Now there are a few important notes about what I just wrote, and to explain them I want to use FFmpeg as an example. It is often said that FFmpeg has no guarantee of ABI compatibility within the same shared object version (I’ll return to that another time); this is false, because FFmpeg developers do pay attention to keeping the public ABI compatible between releases as long as the released shared object has the same version. What they don’t guarantee is ABI compatibility for internal symbols, and software like VLC, xine and GStreamer used to use the internal symbols without thinking twice.

This is why it’s important to use symbol visibility to hide the internal symbols: once you have hidden them you can do whatever you want with them, since no software can rely on them and then suffer subtle crashes or misbehaviour because of a change in them. But that’s a topic for another day.
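As a sketch of the -fvisibility=hidden approach, assuming GCC or Clang: everything is hidden unless marked with an export macro. The EXAMPLE_API macro and the function names are hypothetical:

```c
/* Build with, e.g.:
 *   gcc -shared -fPIC -fvisibility=hidden -o libexample.so example.c
 * Everything is hidden by default; only what is marked EXAMPLE_API is exported. */
#if defined(__GNUC__)
#  define EXAMPLE_API __attribute__((visibility("default")))
#else
#  define EXAMPLE_API
#endif

/* Internal helper: hidden thanks to -fvisibility=hidden, even though it is
 * not static (other source files of the library may need it). */
int example_parse_digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Public entry point: exported, and part of the library's ABI. */
EXAMPLE_API int example_parse_number(const char *s)
{
    int value = 0;
    for (; *s != '\0'; s++) {
        int d = example_parse_digit(*s);
        if (d < 0)
            return -1;
        value = value * 10 + d;
    }
    return value;
}
```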