Julian asked by mail – uh.. last month! okay I have a long backlog – what the -Bsymbolic
and -Bsymbolic-functions
options to the GNU linker do. The answer is not extremely complicated but it calls for some explanation on how function calls in Unix work. I say in Unix, because there are a few differences in how OS X and Windows behave if I recall correctly, and I’m definitely not an expert on those. I wish I was, then I would be able to work on a book to replace Levine’s Linkers and Loaders.
Please don’t complain about the very bad drawing above, it’s just going to illustrate what’s going on, I did it on my iPad with a capacitive stylus. I’ll probably try to do a few more of these, since I don’t have my usual Intuos tablet, and I won’t have it until I’ll find my place in London.
You see, the whole issue of linking in Unix is implemented with a long list of tables: the symbol table, the procedure linking table (PLT) and the global offset table (GOT). All objects involved in a dynamic linking chain (executables and shared objects, ET_EXEC and ET_DYN) posses a symbol table, which mixes defined (exported) and undefined (requested) symbols. Objects that are exporting symbols (ET_DYN and ET_EXEC with plugins, callbacks or simply badly designed) posses a PLT, and PIC objects (most ET_DYN, with the exception of some x86 prebuilt objects, and PIE ET_EXEC) posses GOTs.
Let’s start from the bottom, that is the GOT, or actually before the GOT itself to the executable code itself. For what ELF is concerned, by default (there are a number of options that change this but I don’t want to go there for now), data and functions sections are completely opaque. Access to functions and data has to be done through start addresses; for non-PIC objects, these are absolute addresses, as the objects are assumed to be loaded always at the same position; when using position-independent code (PIC), like the name hints, this position has to be ignored, so the data or function position has to be derived by using offsets from the load address for the object — when using non-static load addresses, and non-PIC objects, you actually have to patch the code to use the new full address, which is what causes a text relocation (TEXTREL), which requires the ability to write to the executable segments, which is an obvious security issue.
So here’s where the global offset table enters the scene. Whenever you’re referring to a particular data or function, you have an offset from the start of the containing section. This makes it possible for that section to be loaded at different addresses, and yet keep the code untouched. (Do note I’m actually simplifying a lot the whole thing, but I don’t want to go too much into details because half the people reading wouldn’t understand what I’m saying otherwise.)
But the GOT is only used when the data or function is guaranteed to be in the same object. If you’re not using any special option to either compiler or linker, this means only static symbols are addressed directly in the GOT. Everything else is accessed through the object’s PLT, in which all the functions that the object calls are added. This PLT has then code to ask the dynamic loader what address the given symbol is defined at.
To answer that question, the dynamic loader had to have a global table which resolve symbol names to addresses. This is basically a global PLT from a point of view. Depending on some settings in the objects, in the environment or in the loader itself, this table can be populated right when the application is being executed, or only when the symbols are requested. For simplicity, I’ll assume that what’s happening is the former, as otherwise we’ll end up in details that have nothing to do with what we were discussing to begin with. Furthermore there is a different complication added by the modern GNU loader, which introduced the indirect binding.. it’s a huge headache.
While the same symbol name might have multiple entries in the various objects’ symbol tables, because more than one object is exporting the same symbol, in the resolved table each symbol name has exactly one address, which is found by reading the objects’ GOTs. This means that the loader has to solve in some way the collisions that happen when multiple objects export the same symbol. And also, it means that there is no guarantee by default that an object that both exports and calls a given symbol is going to call its own copy.
Let me try to underline that point: symbols that are exported are added to the symbol table of an object; symbols that are called are added to the symbol table as undefined (if they are not there already) and they are added to the procedure linking table (which then finds the position via its own offset table). By default, with no special options, as I said, only static functions are called directly from the object’s global offset table, everything else is called through the PLT, and thus through the linker’s table of resolved symbols. This is actually what drives symbol interposing (which is used by LD_PRELOAD
libraries), and what caused ancient xine’s problems which steered me to look into ELF itself.
Okay, I’m almost at -Bsymbolic
!
As my post about xine shows, there are situations where going through the PLT is not the desired behaviour, as you want to ensure that an object calls its own copy of any given symbol that is defined within itself. You can do that in many ways; the simplest possible of options, is not to expose those symbols at all. As I said with default options, only static functions are called straight through the GOT, but this can be easily extended to functions that are not exposed, which can be done either by marking the symbols as hidden (happens at compile time), or by using a linker script to only expose a limited set of symbols (happens at link time).
This is logical: the moment when the symbols are no longer exported by the object, the dynamic loader has no way to answer for the PLT, which means the only option you have is to use the GOT directly.
But sometimes you have to expose the symbols, and at the same time you want to make sure that you call your own copy and not any other interposed copy of those symbols. How do you do that? That’s where -Bsymbolic
and -Bsymbolic-functions
options come into play. What they do is duplicate the GOT entries for the symbols that are both called and defined in a shared object: the loader points to one, but the object itself points to the other. This way, it’ll always call its own copy. An almost identical solution is applied, just at compile-time rather than link-time, when you use protected visibility (instead of default or hidden).
Unfortunately this make a small change in the semantics we’re used to: since the way the symbols are calculated varies depending on whether you’re referring to the symbol from within or outside the object, pointers taken without and outside will no longer match. While for most libraries this is not a problem, there are some cases where it really is. For instance in xine we hit a problem with the special memcpy()
function implementation: it was a weak symbol, so simply a pointer to the actual function, which was being set within the libxine object… but would have been left unset for the external objects, including the plugins, for which it would still have been a NULL.
Comparing function symbols is a rare corner case, but comparing objects’ addresses is common enough, if you’re trying to see if a default, global object is being passed to your function instead of a custom one… in that case, having the addresses no longer matching is a big pain. Which is basically why you have -Bsymbolic-functions
— it’s exactly like -Bsymbolic
but limits itself to the functions, whereas the objects are still handled like if no option was passed. It’s a compromise to make it easier to not go through the PLT for everything, while not breaking so much code (it would then only break on corner cases like xine’s memcpy()
).
By the way, if it’s not obvious, the use of symbolic resolution is not only about making sure that the objects know which function they are calling, it’s also a performance improvement, as it avoids a virtual round-trip to the dynamic loader, and a lookup of where the symbol is actually defined. This is minor for most functions, but it can be visible if there are many many functions that are being called. Of course it shouldn’t amke much of a difference if the loader is good enough, but that’s also a completely different story. As is the option for the compiler to emit two copies of a given function, to avoid doing the full preamble when called within the object. And again for what concerns link-time optimization, which is connected to, but indirectly, with what I’ve discussed above.
Oh and if it wasn’t clear from what I wrote you should not ever use the -Bsymbolic
flag in your LDFLAGS
variable in Gentoo. It’s not a flag you should mock with, only upstream developers should care about it.