LibreSSL: drop-in and ABI leakage

There has been some confusion around my previous post, involving Bob Beck of LibreSSL, on whether I would advocate for using a LibreSSL shared object as a drop-in replacement for an OpenSSL shared object. Let me state this here, boldly: you should never, ever, for any reason, use shared objects from different major/minor OpenSSL versions or implementations (such as LibreSSL) as drop-in replacements for one another.

The reason is, obviously, that the ABI of these libraries differs, sometimes subtly enough that they may actually load and run, but then perform abysmally insecure operations, as their data structures will have changed, and now instead of reading your randomly-generated key, you may be reading the master private key. And in general, for other libraries you may even be calling the wrong set of functions, especially for those written in C++, where the vtable content may be rearranged across versions.

What I was discussing in the previous post was the fact that lots of proprietary software packages, by bundling a version of Curl that depends on the RAND_egd() function, will require either unbundling it, or keeping a copy of OpenSSL around for runtime linking. And I think that is a problem that people need to consider now rather than later, for a very simple reason.

Even if LibreSSL (or any other reimplementation, for that matter) takes hold as the default implementation for all Linux (and non-Linux) distributions, you’ll never be able to fully forget about OpenSSL: not only if you have proprietary software that you maintain, but also because a huge amount of software (and especially hardware) out there will not be able to update easily. And the fact that LibreSSL is throwing away so much of the OpenSSL clutter also means that it’ll be more difficult to backport fixes — while at the same time I think that a good chunk of the black hattery will focus on OpenSSL, especially if it feels “abandoned”, while most users will still be using it somehow.

But putting aside the problem of direct drop-in incompatibilities, there is one more problem that people need to understand, especially Gentoo users and users of most other systems that do not completely rebuild their package set when replacing a library like this. The problem is what I would call “ABI leakage”.

Let’s say you have a general-purpose libfoo that uses libssl, relying on a subset of the API that works with both OpenSSL and LibreSSL. Now you have a bar program that uses libfoo. If the library is written properly, it’ll treat all the data structures coming from libssl as opaque, providing no way for bar to call into libssl without depending on the SSL API du jour (and thus putting a direct dependency on libssl for the executable). But it’s very well possible that libfoo is not well written, and actually treats the libssl API as transparent. For instance, a common mistake is to use one of the SSL data structures inline (rather than as a pointer) in one of its own public structures.
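
To make the mistake concrete, here’s a minimal sketch of such a header (libfoo and its types are hypothetical, and the inline variant assumes the OpenSSL headers of the time, which still exposed the contents of the SSL structure):

/* hypothetical libfoo public header */
#include <openssl/ssl.h>

struct foo_conn_leaky {
  int fd;
  SSL ssl;   /* inline: sizeof(struct foo_conn_leaky) now depends on
                whichever libssl this header was compiled against */
};

struct foo_conn_opaque {
  int fd;
  SSL *ssl;  /* opaque pointer: the layout of SSL stays libssl's private
                business, and does not leak into libfoo's ABI */
};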

This situation would be barely fine, as long as the data types of libfoo are also completely opaque: in that case it’s only the code of libfoo that relies on the structures, and since you’re rebuilding it anyway (as libssl is not ABI-compatible), you solve your problem. But if we keep assuming a worst-case scenario, then you have bar actually dealing with the data structures, for instance by allocating a sized buffer itself, rather than calling into a proper allocation function from libfoo. And there you have a problem.

Because now the ABI of libfoo is not defined by its own code alone, but also by whichever ABI libssl has! It’s a problem similar to the symbol table being used as an ABI proxy: while your software will load and run (for a while), you’re really using a different ABI, as libfoo almost certainly does not change its soname when it’s rebuilt against a newer version of libssl. And that can easily cause crashes and worse (see the note above about dropping in LibreSSL as a replacement for OpenSSL).

Now honestly, none of this is specific to LibreSSL. The same is true if you were to try using OpenSSL 1.0 shared objects for software built against OpenSSL 0.9 — which is why I cringed any time I heard people suggesting to use symlinks at the time, and it seems like people are giving the same suicidal suggestion now with LibreSSL, according to Bob.

So once again, don’t expect binary compatibility across different versions of OpenSSL, LibreSSL, or any other implementation of the same API, unless they explicitly aim for that (and LibreSSL definitely doesn’t!).

Why Foreign Function Interfaces are not an easy answer

The term FFI is usually used to refer to techniques related to GCC’s libffi and its various bindings, such as Python’s ctypes. The name should instead encompass a number of different approaches that work in very different ways.

What I’m going to talk about is the subset of FFI techniques that work the way libffi does, which also covers .NET’s P/Invoke — which I briefly talked about in an old post.

The idea is that the code in the language you’re writing declares the arguments that the foreign-language interfaces expect. While this works in theory, it has quite a few practical problems, which are not really easy to see, especially for developers whose main field of expertise is interpreted languages such as Ruby, or intermediate ones like C#. That’s because the problems are related to the ABI: the Application Binary Interface.

While the ABIs for C and C++ are quite different, I’ll start with the worst-case scenario, and that is using FFI techniques for C interfaces. A C interface (a function) is exposed only through its name, and no other data; the name encodes neither the number nor the types of its parameters, which means that you can’t reflectively load the ABI based off the symbols in a shared object.

What you end up doing, in these cases, is declaring in the Ruby code (or whatever else; I’ll stick with Ruby because that’s where I have the most experience) the list of parameters to be used with that function. And here it gets tricky: which types are you going to use for the parameters? Unless you’re sticking with C99’s fixed-width integers, and mostly pure functions, you’re going to have trouble sooner or later.

  • the int, long and short types do not have fixed sizes, and depending on the architecture and the operating system they are going to be of different sizes; Win64 and eglibc’s x32 are making that even more interesting;
  • the size of pointers (void*) depends once again on the operating system and architecture;
  • some types such as off_t and size_t depend not just on the architecture and operating system but also on the configuration of said system: on glibc/x86 they are 32-bit by default, but if you enable the so-called largefile support they become 64-bit (the same goes for st_ino, as that post suggests) — see the sketch after this list;
  • on some architectures, the char type is unsigned, on others it is signed, which is one of the things that made PPC porting quite interesting, if you weren’t using C99’s types;
  • if structures are involved, especially with bitfields, you’re just going to suffer, since the layout of the structure, if not packed, depends on both the size of the fields and the endianness of the architecture — plus you have to factor in the usual chance of differences due to architecture and operating system.
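
As a quick illustration of the off_t point, here’s a sketch you can compile yourself on a glibc/x86 system; the very same translation unit changes its idea of the type’s size depending on a preprocessor switch:

/* sizes.c — build it twice:
 *   gcc sizes.c -o sizes                              (off_t is 32-bit)
 *   gcc -D_FILE_OFFSET_BITS=64 sizes.c -o sizes-lfs   (off_t is 64-bit)
 */
#include <stdio.h>
#include <sys/types.h>

int main() {
  printf("long: %zu, void*: %zu, off_t: %zu\n",
         sizeof(long), sizeof(void *), sizeof(off_t));
  return 0;
}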

Up until now, the situation doesn’t seem unsolvable; indeed it should be quite easy, structures aside, if you create type mappings for each and every standard type that could change, and make sure developers use them… of course things don’t stop there.

Rule number one for library consumers: ABIs change.

If you’re a Gentoo user you’re very likely to be on not-too-friendly terms with revdep-rebuild or the new preserved-libraries feature. And you have probably heard or read that the reason you need to rebuild other packages is that one of the dependencies changed its ABI. To be precise, what changes in those situations is the soname, which is how the library declares that its ABI changed — which is nice of it.

But most changes in ABI are not declared, either by mistake or for legitimate reasons. In the former case, what you have is a project that didn’t care enough about its own consumers: it didn’t make sure that its ABI stayed compatible from one release to the next, and it didn’t follow the soname-bumping rules, which is actually all too common. In the latter scenario, instead, you have a quite more interesting situation, especially where FFI is concerned.

There are some cases where you can change ABI, and yet keep binary compatibility. This is usually achieved by two types of ABI changes: new interfaces and versioned interfaces.

The first one is self-explanatory: if you add a new exported function to a library, it’s not going to conflict with the other exposed interfaces (remember I’m talking about C here; this is not strictly true for C++ methods!). Yet that means that the new versions of the library have functions that are not present in the older ones — this, by the way, is the reason why downgrading libraries is never well supported, especially in Gentoo (if you rebuilt the library’s consumers, it is possible that they used the newly-available functions — they wouldn’t be there after the downgrade, and yet the soname didn’t change, so revdep-rebuild wouldn’t flag them as bad).

The second option is trickier; I have written something about versioning before, but I never went out of my way to describe the whole handling of it. Suffice it to say that by using symbol versioning, you can turn an API-compatible change that would otherwise break the ABI into an ABI-compatible change.

A classical example is moving from uint32_t to uint64_t for the parameters of a function: changing the function declaration is not going to break the API, because you’re increasing the integer size (and I explicitly referred to unsigned integers so you don’t have to worry about sign extension), so a simple rebuild of the consumer would be enough for the change to be integrated. At the same time, such a change is incompatible at the C ABI level, as the size of the parameters on the stack doubled, so calls using the previous ABI would crash on the new one.

This can be solved – if you used versioning to begin with (due to the bug in glibc I discussed in the article linked earlier) – by keeping a wrapper with the old function signature alongside the new implementation, and giving each of the two symbols its own version. At that point, programs built against the old API will keep using the symbol with the original version (the wrapper), while new ones will link straight to the new API. There you are: a compatible API change leads to a compatible ABI change.

Yes, I know what you’re thinking: you could just add a suffix to the function and use macros to switch consumers to the new interface, without using versioning at all; that’s absolutely true, but I’m not trying to discuss the merits of symbol versioning here, just explaining how it connects to FFI trouble.

Okay, so why is all of this relevant? Well, what FFI techniques use to load the libraries they wrap is the dlopen() and dlsym() interfaces; the latter in particular follows in the steps of the link editor when a symbol with multiple versions is encountered: it will use the one that is declared to be the “default symbol”, that is, (usually) the latest one added.

Now return to the example above: you have wrapped, through FFI, the function expecting two uint32_t parameters, but now dlsym() is loading in its place a function that expects two uint64_t parameters… there you are, your code has just crashed.

Of course it is possible to override this through the use of dlvsym(), but that’s not optimal because, as far as I can tell, it’s a GNU extension, and most libraries wouldn’t care about it at all. At the same time, symbol versioning — or at least this complex and messed-up version of it — is mostly exclusive to GNU operating systems, and its use is discouraged for libraries that are supposed to be portable… the blade there is two-sided.
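
Here’s a minimal sketch of the difference (the library name, symbol name and version are all hypothetical; dlvsym() needs _GNU_SOURCE):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main() {
  void *lib = dlopen("libfoo.so.0", RTLD_NOW);
  if (lib == NULL)
    return 1;
  /* dlsym() behaves like the link editor: it binds to whatever
     version is marked as the default (usually the latest) */
  void *latest = dlsym(lib, "my_symbol");
  /* dlvsym(), the GNU extension, can pin an explicit version instead */
  void *old = dlvsym(lib, "my_symbol", "LIB1");
  printf("default: %p, LIB1: %p\n", latest, old);
  return 0;
}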

Since these methods are usually employed only by Linux-specific libraries, there aren’t that many susceptible to this kind of crash; on the other hand, since most non-Linux systems don’t offer this choice, most Ruby developers (who seem to use OS X or Windows, judging from how often we encountered case-sensitivity issues compared to any other class of projects) would be unaware of its very existence…

Oh, and by the way: if your FFI-based extension is loading libfoo.so without any soversion, you don’t really understand shared objects, and you should learn a bit more about them before wrapping them.

What’s the moral? Be sure about what you want to do: wrapping C-based libraries is often a good choice to avoid reimplementing everything, but consider whether it might not be a better idea to write the whole thing in Ruby; it might not be as time-critical as you think it is.

Writing a C-based extension moves the compatibility issues to build time, which is a bit safer: even if you write tests for each and every function you wrap (which you should be doing), the ABI can change under you when you update packages, making install-time tests not very reliable for this kind of usage.

Your worst enemy: undefined symbols

What ties reckless glibc unmasking, GTK+ 2.20 issues, Ruby 1.9 porting and --as-needed failures all together? Okay, the title is a dead giveaway for the answer: undefined symbols.

Before digging deeper into the topic I first have to tell you about symbols, I guess; and to do so, and to continue further, I’ll be using C as the base language for all of my notes. In C, then, a symbol is any function or data element (constant or variable) that is declared extern; that is, anything that is neither static nor defined in the same translation unit (that is, source file, most of the time).

Now, what nm shows as undefined (U code) is not really what we’re concerned about; object files (.o, just intermediates) will report undefined symbols for any function or data element used that is not in the same translation unit; most of those get resolved at the time all the object files get linked together to form a final shared object or executable — actually, it’s a lot more complex than this, but since I don’t care about describing symbol resolution here, please accept it as if it were true.

The remaining symbols will keep the U code in the shared object or executable, but most of them won’t concern us: they will be loaded from the linked libraries, when the dynamic loader actually resolves them. So for instance, the executable built from the following source code will have the printf symbol “undefined” (for nm), but it’ll be resolved by the dynamic linker just fine:

#include <stdio.h>

int main() {
  printf("Hello, world!");
}

I have explicitly avoided using the fprintf function, mostly because that would require a further undefined symbol, so…

Why do I say that undefined symbols are our worst enemy? Well, the problem is actually with undefined, unresolved symbols after the loader has had its way. These are symbols for functions and data that either are not really defined anywhere, or are defined in libraries that are not linked in. The former case is what you get with most of the new-version compatibility problems (glibc, GTK+, Ruby); the latter is what you get with --as-needed.

Now, if you have a bit of practice with development and writing simple commands, you’d now be wondering why this is a problem at all; if you were to mistype the function above into priltf – a symbol that does not exist, at least in the basic C library – the compiler would refuse to create an executable, even though the implicit declaration is only treated as a warning, because the symbol is, well, not defined. But this rule only applies, by default, to final executables, not to shared objects (shared libraries, dynamic libraries, .so, .dll or .dylib files).

For shared objects, you have to explicitly ask for linking to fail on undefined references; otherwise they are linked just fine, with no warning, no error, no bothering at all. The way you tell the linker to refuse that kind of linkage is by passing the -Wl,--no-undefined flag; this way, if there is even a single symbol that is not defined in the current library or any of its dependencies, the linker will refuse to complete the link. Unfortunately, using this by default is not going to work that well.
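
A minimal sketch of the behaviour (file and function names hypothetical):

/* broken.c — calls a function that no linked library defines */
extern void missing_function(void);
void wrapper(void) { missing_function(); }

/* gcc -shared -fPIC broken.c -o libbroken.so
 *     links silently; the undefined symbol is deferred to load time
 * gcc -shared -fPIC broken.c -o libbroken.so -Wl,--no-undefined
 *     the link fails right here, reporting the undefined reference
 */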

There are indeed some more or less good reasons to allow shared objects to have undefined symbols; here are a few:

Multiple ABI-compatible libraries: okay, this is a very far-fetched one, simply for the difficulty of having ABI-compatible libraries (it’s difficult enough to have them API-compatible!), but it happens; for instance on FreeBSD you – at least used to – have a few different implementations of the threading libraries, and more or less the same situation holds for multiple OpenGL and mathematical libraries; the idea behind this is actually quite simple: if you have libA1 and libA2 providing the symbols, then libB linking to libA1, and libC linking to libA2, an executable foo linking to libB and libC would get both libraries linked together, creating nasty symbol collisions.

Nowadays, FreeBSD handles this through a libmap.conf file, which allows software to always link to the same library name, but then switch to a different implementation at load time; a similar approach is taken by things like libgssglue, which allows switching the GSSAPI implementation (which might be either Kerberos or SPKM) with a configuration file. On Linux, beside these custom implementations, or hacks such as the one used by Gentoo (eselect opengl) to handle the switch between different OpenGL implementations, there seems to be no interest in tackling the problem at the root. Indeed, I complained about that when --as-needed was softened to allow this situation, although I guess it at least removed one common complaint about adopting the option by default.

Plug-ins hosted by a standard executable: plug-ins are, generally speaking, shared objects; and with the exception of the most trivial plugins, whose output is only defined in terms of their input, they use functions that are provided by the software they plug into. When they are hosted (loaded and used) by a library, such as libxine, they are linked back to the library itself, and that makes sure that the symbols are known at the time of creating the plugin object. On the other hand, when the plug-ins are hosted by some software that is not a shared object (which is the case of, say, zsh), then you have no way to link them back, and the linker has no way to discern between undefined symbols that will be lifted from the host program, and those that are bad, and simply undefined.

Plug-ins providing symbols for other plug-ins: here you have a perfect example in the Ruby-GTK2 bindings; when I first introduced --no-undefined in the Gentoo packaging of Ruby (1.9 initially; nowadays all three C-based implementations have the same flag passed on), we got reports of non-Portage users of Ruby-GTK2 having build failures. The reason? Since all the GObject-derived interfaces had to share the same tables and lists, the solution they chose was to export an interface, unrelated to the Ruby-extension interface (which is actually composed of a single function, bless them!), that the other extensions use; since you cannot reliably link modules one with the other, they don’t link to them, and you get the usual problem of not being able to distinguish between expected and unexpected undefined symbols.

Note: this particular case is not tremendously common; when loading plug-ins with dlopen() the default is to use the RTLD_LOCAL option, which means that the symbols are only available to the branch of libraries loaded together with that library, or through explicit calls to dlsym(); this is a good thing because it reduces the chances of symbol collisions and unexpected linking consequences. On the other hand, Ruby itself seems to go all the way against the common idea of safety: it requires RTLD_GLOBAL (register all symbols in the global procedure linking table, so that they are available to be bound at any point in the whole tree), and also requires RTLD_LAZY, which makes it more troublesome if there are missing symbols — I’ll get later to what lazy bindings are.
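
For reference, here’s a sketch of the two loading styles (the plugin path is hypothetical):

#include <dlfcn.h>
#include <stdio.h>

int main() {
  /* the safe default: RTLD_LOCAL keeps the plugin's symbols out of the
     global table, so unrelated plug-ins cannot collide with them */
  void *plugin = dlopen("./plugin.so", RTLD_NOW | RTLD_LOCAL);
  if (plugin == NULL) {
    /* with RTLD_NOW, a missing symbol is caught here, not at call time */
    fprintf(stderr, "load failed: %s\n", dlerror());
    return 1;
  }
  /* Ruby instead asks for RTLD_LAZY | RTLD_GLOBAL: every symbol goes
     into the global table, and binding is delayed until first use */
  return 0;
}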

Finally, the last case I can think of where there is at least some sense to all of this trouble is reciprocating libraries, such as those in PulseAudio. In this situation, you have two libraries, each using symbols from the other. Since you need the second to fully link the first, but you need the first to link the second, you cannot exit the deadlock with --no-undefined turned on. This, and the executable-hosted plug-ins case, are the only two reasons I find valid for not using --no-undefined by default — but unfortunately they are not the only two used.

So, what about that lazy stuff? Well, the dynamic loader has to perform a “binding” of the undefined symbols to their definitions; binding can happen in two modes, mainly: immediate (“now”) or lazy, the latter being the default. With lazy binding, the loader will not try to find the definition to bind to a symbol until it’s actually needed (so until the function is called, or the data is fetched or written to); with immediate binding, the loader will iterate over all the undefined symbols of an object when it is loaded (eventually loading up the dependencies). As you might guess, if there are undefined, unresolved symbols, the two binding types behave very differently. An immediately-bound executable will fail to start, and an immediately-bound library will fail dlopen(); a lazily-bound executable will start up fine, and abort as soon as a symbol that cannot be resolved is hit; and a library will simply make its host program abort in the same way. Guess which is safer?
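
Using the hypothetical libbroken.so sketch from above, you can see the two binding modes diverge already at dlopen() time (link the host against libdl on glibc):

#include <dlfcn.h>
#include <stdio.h>

int main() {
  /* lazy: the load succeeds; the program would only abort later,
     when (and if) the missing symbol is actually called */
  void *lazy = dlopen("./libbroken.so", RTLD_LAZY);
  printf("lazy: %s\n", lazy != NULL ? "loaded" : dlerror());
  if (lazy != NULL)
    dlclose(lazy);
  /* now: the loader resolves everything up front, so the failure is
     caught immediately and can be handled */
  void *now = dlopen("./libbroken.so", RTLD_NOW);
  printf("now: %s\n", now != NULL ? "loaded" : dlerror());
  return 0;
}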

With all these catches and issues, you can see why undefined symbols are a particularly nasty situation to deal with. To the best of my knowledge, there isn’t a real way to check an object post-mortem to make sure that all its symbols are defined. I started writing support for that in Ruby-Elf but the results weren’t really… good. Lacking that, I’m not sure how we can proceed.

It would be possible to simply change the default to --no-undefined, and work around, with --undefined, the few packages that require the undefined symbols to be there (we decided to proceed that way with Ruby); but given the kind of support I’ve received before for my drastic decisions, I don’t expect enough people to help me tackle that anytime soon — and I don’t have the material time to work on it myself, as you might guess.

Important names, pointless names

So I recently said that I wanted to tackle the problem of QA warnings about missing sonames (Shared Object Names), and to do so I needed to give you some details about shared objects… I’ll add that you probably want to read my explanation about sonames that I wrote last October, since it covers some basics I’d rather avoid repeating here.

The soname is, as I said before, the name that the dynamic linker will search for when loading a program, or another library, that links to it. This is because the value of the DT_SONAME tag gets copied into the DT_NEEDED entries of the linked programs; this is why the soname is particularly important. Especially if it’s properly used with ABI versioning, so that libfoo.so.0 and libfoo.so.1 are used for two different, incompatible ABIs that cannot be used interchangeably.

At this point it is clear that the soname’s main importance is in generating proper NEEDED entries; so what happens when it’s missing? In that case, the basename of the library file (which usually is just libfoo.so) is used as the NEEDED entry; since -lfoo translates to searching for libfoo.so without any further extension, even when the library provides (manually) the version numbers, they won’t be encoded into the NEEDED entry at all, and that is, as you might guess, bad.
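
You can verify this with a couple of commands (foo.c and bar.c being hypothetical sources for a library and its consumer):

gcc -shared -fPIC foo.c -o libfoo.so                          # no soname set
gcc bar.c -o bar -L. -lfoo
readelf -d bar | grep NEEDED        # shows [libfoo.so]: unversioned, bad

gcc -shared -fPIC foo.c -Wl,-soname,libfoo.so.1 -o libfoo.so
gcc bar.c -o bar -L. -lfoo
readelf -d bar | grep NEEDED        # shows [libfoo.so.1]: the versioned ABI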

But is it, always?

While for a general-purpose library (I don’t mean just a library like glib, but rather a library in the broader sense, counting in also specific libraries like libpng or libcurl) the lack of a soname is always a bad thing, there are a few situations where it’s not too important and can easily be ignored, to avoid spending time where it’s not useful: whenever ABI versioning is not important. This assertion then splits into further cases, as I’ll try to explain now.

The first case is when the library will never be listed as NEEDED; this is the case for almost all the plugins I know of. They get loaded through dlopen() rather than by the implicit handling of the dynamic linker. This makes sonames and NEEDED entries totally pointless. In these cases, it shouldn’t be a problem if the library lacks a soname.

The second case relates to internal shared libraries, which might or might not be worth it, as I wrote before. Internal libraries are shipped with the binaries that link to them. They are rebuilt whenever the package is rebuilt. They are only there to reduce the code duplication between binaries. In these cases, sonames, and shared object versioning, are of no use. While it won’t cause trouble to have them, they aren’t necessary.

So, how well does the Gentoo QA test work? Quite well, actually. It won’t cause you to waste time with most of the plugins and most of the internal libraries; on the other hand, there will be false negatives: currently, it only takes the files matching the glob {,usr/}lib*/lib*.so*… this is correct most of the time, but not always. A package might add configuration scripts that link consumers to libraries in other paths, or it might install new paths into ld.so.conf, thus adding more directories to the default search path, where NEEDED entries would be required. These are unfortunately cases that are not trivial to look out for.

So what about the private libraries that are installed into the /usr/lib path (or similar), for which no soname is required but the warning is printed anyway? Well, you have different ways to proceed: you might just add the soname, which might be more work than it’s worth; you might move the library to a sub-directory and then use rpath to add it to the search path of just that package’s binaries; or you might just use the QA_SONAME_${ARCH} variable to silence the warning (although I would argue that this should then be reported as a workaround warning by repoman).
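
The rpath variant is a one-line change at link time; a sketch, with all paths and names hypothetical:

gcc mytool.c -o mytool -L/usr/lib/mypackage -lprivate -Wl,-rpath,/usr/lib/mypackage

With that in place, libprivate.so is found at runtime without /usr/lib/mypackage ever entering the global linker search path.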

My favourite solution here would be to actually use rpath to move the library out of the generic linker search path. While this adds one further search path for all the libraries those binaries link to, it reduces the size of the main search path’s content… and you should know that the number of files in a directory relates to the slowness of accessing the directory itself. While I don’t have any benchmark at hand, I find that much, much easier to deal with (and it’s actually one of the reasons why I’d like the .la files to disappear: they artificially increase the size of the library paths without good reason, in my opinion).

But if you want to bring the issue upstream, it might be a good time to ask whether those libraries are worth it or not, following the indications I wrote in the article linked at the top of this post.

Shared libraries worth their while

This is, strictly speaking, a non-Gentoo-related post; on the other hand, I’m going to introduce here a few concepts that I’ll use in a future post to explain one Gentoo-specific warning, so I’ll consider this a prerequisite. Sorry if you feel like Planet Gentoo should never cover technical non-Gentoo topics, but then again, you’re free to ignore me.

I have, in the past, written about the need to handle shared code in packages that install multiple binaries (real binaries, not scripts!) to perform various tasks, which end up sharing most of their code. Doing the naïve thing, compiling the source code into all of them, or the slightly less naïve thing, building a static library and linking it into all the binaries, tends to increase the size of the commands on disk, and the memory required to fully load them. In my previous post I noted a particularly nasty problem with the smbpasswd binary, which was almost twice the size it needed to be, because of unused code injected by the static convenience library (and probably even more, given that I never went down to hide the symbols and clean them up).

In another post I also proposed the use of multicall binaries to handle these situations; the idea behind multicall binaries is that you end up with a single program, with multiple “applets”: all the code is merged into a single ELF binary object, and at runtime the correct path is taken to call the right applet, depending on the name used to invoke the binary. It’s not extremely easy, but not impossible either, to get right, so I still suggest that as the main alternative for handling shared code, when the shared code is bigger than the single applet’s code.
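
A minimal sketch of the dispatch logic (applet names hypothetical); the installed commands are then just symlinks or hard links to the single binary:

/* multicall.c — dispatch on the name used to invoke the binary */
#include <libgen.h>
#include <stdio.h>
#include <string.h>

static int applet_frob(int argc, char **argv) {
  (void)argc; (void)argv;
  puts("frobbing");
  return 0;
}

static int applet_munge(int argc, char **argv) {
  (void)argc; (void)argv;
  puts("munging");
  return 0;
}

int main(int argc, char **argv) {
  const char *name = basename(argv[0]);
  if (strcmp(name, "frob") == 0)
    return applet_frob(argc, argv);
  if (strcmp(name, "munge") == 0)
    return applet_munge(argc, argv);
  fprintf(stderr, "unknown applet: %s\n", name);
  return 1;
}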

This does not solve the Samba situation though: the code of the single utilities is still big enough that a single super-binary would not be very manageable, and a different solution has to be devised. In this case you end up having to choose between static linking (the naïve approach) and using a private shared object. An easy way out here is trying to be sophisticated and always going with the shared object approach; but it definitely might not be the best option.

Let me be clear here: shared objects are not a panacea for the shared code problems. As you might have heard already, using shared objects is generally a compromise: you ease problems related to bugs and security vulnerabilities, by using a shared object, so that you don’t have to rebuild all the software using that code — and most of the time you also share read-only memory to reduce the memory consumption of a system — at the expense of load time (the loader has to do much more work), sometimes execution speed (PIC takes its toll), and sometimes memory usage, as counter-intuitive as that might sound, given that I just said that they reduce memory consumption.

While the load time and execution speed tolls are pretty much immediate to understand, and you can find a lot of documentation about them on the net, the shared-memory versus wasted-memory situation is less obvious. I wrote extensively about the Copy-on-Write problem, so if you follow my blog regularly you might have guessed the issue already at this point, but that does not fill in all the gaps yet, so let me try to explain how this compromise works.

When we use ELF objects, parts of the binary file itself are shared in memory across different processes (homogeneous or heterogeneous). This means that only those parts that are not modified from the ELF file can be shared. This usually includes the executable code – text – for standard executables (and most code compiled with PIC support for shared objects, which is what we’re going to assume), and part (most) of the read-only data. In all cases, what breaks the sharing for us is Copy-on-Write, as that will create private copies of the pages for the single process, which is why writeable data does not matter when choosing the code-sharing strategy (it’ll mostly be the same whether you link it statically or via shared objects — there are corner cases, but I won’t dig into them right now).

What is this talk about homogeneous or heterogeneous processes? Well, it’s a mistake to think that the only memory shared in the system is due to shared objects: read-only text and data of an ELF executable file are shared among processes spawned from the same file (what I called, and will call, homogeneous processes). What shared objects accomplish with memory is sharing between processes spawned by different executables, which load the same shared objects (heterogeneous processes). The KSM implementation (no, it’s not KMS!) in the current versions of the Linux kernel allows for something similar, but it’s a story so long that I won’t really bother counting it in.

Again, the first approach to shared objects might make you think that moving whatever amount of memory from being shared between homogeneous processes to being shared between heterogeneous processes is a win-win situation. Unfortunately, you have to cope with data relocations (a topic I wrote about extensively): a constant pointer is read-only when the code is always loaded at a given address (as happens with most standard ELF executables), but it is not when the code can be loaded at an arbitrary address (as happens with shared objects): in the latter case it’ll end up in the relocated data section, which follows the same destiny as the writeable data section: it’s always private to the single process!
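
A tiny sketch of the difference:

/* compiled with -fPIC into a shared object, these two declarations
   land in different sections even though both are const */
static const char message[] = "hello";
/* .rodata: pure bytes, shared among all processes */

static const char *const messages[] = { "hello", "world" };
/* .data.rel.ro: each element is an address the loader must patch at
   load time, so the page gets written to, and thus goes private, in
   every process */

const char *get_message(unsigned i) {
  return i == 0 ? message : messages[i - 1];
}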

*Note about relocated data: in truth you could ensure that the data relocation is the same among different processes, by using either prelinking (which is not perfect especially with modern software, which is more and more plugin-based), or methods like KDE’s kdeinit preloading. In reality, this is not really something you could, or should, rely upon because it also goes against the strengthening of security applied by Address Space Layout Randomization.*

So when you move shared code from static linking to shared objects, you have to weigh the two factors: how much code will be left untouched by the process, and how much will be relocated? The size utility from either elfutils or binutils will not help you here, as it does not tell you how big the relocated data section is. My ruby-elf instead has an rbelf-size script that gives you the size of .data.rel.ro (another point here: you only care about the increase in size of .data.rel.ro, as that’s what is added as private memory; .data.rel would be part of the writeable data anyway). You can see it in action here:

flame@yamato ruby-elf % ruby -I lib tools/rbelf-size.rb /lib/libc.so.6
     exec      data    rodata     relro       bss     total filename
   960241      4507    359020     12992     19104   1355864 /lib/libc.so.6

As you can see from this output, the C library has some 950K of executable code, 350K of read-only data (both will be shared among heterogeneous processes) and just 13K (tops) of additional relocated memory, compared to static linking. (Note: the rodata value does not only include .rodata but all the read-only non-executable sections; the values of exec and rodata roughly correspond to what size calls text.)

So how is knowing the amount of relocated data useful in assessing how to deal with shared code? Well, if you build your shared code as a shared object and analyse it with this method (hint: I just implemented rbelf-size -r to reduce the columns to the three types of memory we have in front of us), you’ll have a rough idea of how much gain and how much waste you’ll have memory-wise: the higher the shared-to-relocated ratio, the better results you’ll have. An infinite ratio (when there is no relocated data) is perfection.

Of course the next question is what you do if you have a low ratio. Well, there isn’t really one correct answer here: you might decide to bite the bullet and go into the code to improve the ratio; cowstats from the Ruby-Elf suite helps you do just that; it can actually help you reduce your private sections as well, since many times you have mistakes in there, due to missing const declarations. If you have already done your best to reduce the relocations, then your only chance left is to avoid using a library altogether: if you’re not going to improve your memory usage by using a library, and it’s something internal only, then you really should look into using either static linking or, even better, multicall binaries.

Important Notes of Caution

While I’m trying to go further on the topic of shared objects than most documentation I have read on the argument, I have to point out that I’m still generalising a lot! While the general concepts are as I put them down here, there are some specific situations that change the picture and make it much more complex: text relocations, position independent executables, and PIC overhead are just some of the problems that might arise while trying to apply these general ideas to specific situations.

Still trying not to dig too deep into the topic right away, I’d like to spend a few words on the PIE problem, which I have already described and noted in the blog: when you use Position Independent Executables (which is usually done to make good use of the ASLR technique), you can discard the whole check on relocated data: almost always you’ll have good results if you use shared objects (minus complications added by overlinking, of course). You would still get the best results with multicall binaries if the commands have very little code.

Also, please remember that using shared objects slows down the loading process, which means that if you have a number of fire-and-forget commands, which is not too unusual in UNIX-like environments, you will probably get better results with multicall binaries, or static linking, than with shared objects. The shared memory is also something you’ll probably get to ignore in that case, as it’s only worth its while if you normally keep the processes running for a relatively long time (and thus loaded into memory).

Finally, all I said refers to internal libraries used for sharing code among commands of the same package: while most of the same notes about performance-related up- and down-sides hold true for all kinds of shared objects, you have to factor in the security and stability problems when you deal with third-party (or third-party-available) libraries — even if you develop them yourself and ship them with your package: they’ll still be used by many other projects, so you’ll have to handle them with much more care, and they should really be shared.

A shared object is (not) enough

In my immediately previous post I threw in a couple of nods to two particularly nasty issues related to shared object plugins; I have written extensively, or even excessively, about shared objects (or dynamic libraries, if you prefer the Microsoft term for the same concept), so I’m not going to present them again, and I’d rather go on with the two current problems at hand, which I’ll try to cover in a proper manner.

Shared objects as plugins

When building shared objects for plugin usage, as is the case for the NSS modules I noted, PAM plugins, or extensions for languages like Ruby, Python, Perl, Java… you don’t need static libraries at all, so a shared object is enough!

While some of those systems do support statically linking engine and plugins into an application, this rarely works out as intended; for instance FreeBSD (used to?) support statically-linked PAM, but that worked only for the default modules: if you configured your service authentication chain to use non-default modules, you had a non-working setup. So the net result is definitely against having to support statically-linked PAM, or any other statically-linked plugin system.

Since you cannot link this stuff statically, you can easily see that there is no need to install (nor build!) the static archive version of those plugins; this is usually handled properly both by custom-tailored build systems (as upstream likely tries to minimise the effort) and by the language-specific build systems (like the various incarnations of mkmf and rake-compiler in Ruby, distutils in Python, ant for Java, and so on and so forth). On the other hand, especially with autotools-based build systems, most people seem to forget that there is a nasty overhead in building both versions, besides the waste of installing the extra file.

Indeed, since libtool will prefer building PIC objects for the shared objects (as required on AMD64 and most non-x86 architectures), and non-PIC objects for static archives (to reduce overhead), and since you cannot build once for both (nor can you pre-process once, as you can have __PIC__-conditional code!), you end up having to call the compiler twice for each source file. To avoid this overhead you can change the package’s default to disable static libraries, or disable them through the ./configure invocation, so that only the shared object version is ever built, and not the static one. Unfortunately this does not stop libtool from installing a pointless .la file, but that’s a different story.
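
For an autotools-based project, a sketch of the two places where that default can be changed (the first being the upstream fix, the second the downstream workaround):

# in configure.ac, make shared-only the package's default:
LT_INIT([disable-static])

# or, from the outside, at configure time:
./configure --disable-static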

While there is a safety check in Portage proper for .la and .a files in /lib, there is no such check for Ruby, Python, Perl or Java extensions. My tinderbox has an extra check for that and is usually able to find them; I also have a bug report template that tries to explain to the maintainers involved why I’m reporting that a .la file is pointless, and that they might want to fix the eventual static archives at the same time as well. Unfortunately, sometimes people decide it’s too much of a hassle to prepare the patch to send upstream and apply it in Gentoo, so you end up with ebuilds that avoid using make install so as not to install the already-built archives, or that just delete them after install (which works okay for .la files, since they are usually small and it’s definitely not trivial to avoid installing them), leaving the double build still to be performed.

Boot-critical programs and shared objects

A different page, but a correlated issue, concerns boot-critical programs: things that need to be started before you mount all your local filesystems (maybe because they are needed to mount such filesystems, like lvm) need to have all their shared objects available at that time. This becomes a problem when you end up needing libraries installed in /usr/lib and you split out /usr (similarly to what I said about /boot, I don’t think the general case should be to split it out; sure, there are cases where it’s needed for various reasons, but it shouldn’t be the default!), as the programs wouldn’t be able to run.

To solve this problem you either move the libraries (and all their dependencies) to /lib, or you have to statically link the applications. The former creates a chain reaction that makes the whole point of splitting /usr mostly moot; the latter moves the problem down to the user: since Gentoo policy is to never force static linking on the user, as shared object linking has many, many advantages, this is usually made conditional on the static USE flag; such a flag builds the software with static linking, and thus requires the dependent libraries to be available in static archive form (which is why static libraries are usually made optional via a USE flag, rather than through Portage features or anything else: a USE flag can be depended upon).

Mix the two! Shared object plugins for boot-critical programs!

And here is the reason why I merged the two problems, as they might otherwise seem only barely related: there is a case where you actually need a static archive to build a shared object plugin; that’s the case for PAM plugins that need libraries, such as pam_userdb, which uses the Berkeley DB library for storage.

It’s not an easy case to solve, because of course you’d be looking to have the library available as a static archive, but at the same time it has to be PIC to work properly… up to the 0.99 version, the solution was to build an internal copy of Berkeley DB within the PAM ebuild; without counting the additional problems with security, we ended up with a very complex ebuild, a lot more complex than would be needed for PAM alone. I discarded that solution when I took over PAM, and split the Berkeley DB support into its own ebuild… doing the same thing as before. That ebuild has been, up to now, pretty much untouched, and the result is that we have a stale ebuild in the tree using a stale version of Berkeley DB. I don’t like that situation at all.

Sincerely, after thinking about it, I believe the best solution at this point is simply to get rid of the stale ebuild, and decide that even though PAM is installed in /lib, it does not warrant total coverage: we won’t be moving, or building statically, things like PostgreSQL, LDAP, MySQL and so on and so forth, and yes, those are all possible providers for PAM modules. I guess we should just add one very simple statement: don’t use externally-dependent modules for authenticating users and services that are boot-critical.

If I’m still a Gentoo developer next month, after I free myself from my current work tasks, I’ll merge sys-auth/pam_userdb back into sys-libs/pam, and then take care of getting the new PAM stable and the old ebuild removed. This should solve quite a few of our problems and set a better precedent than what we have right now.

A shared library by any other name

One interesting but little-known fact among users and developers alike is the reason why shared libraries are installed on systems under multiple file names. This ties a bit into the problem of .la files, but that’s not my topic here.

Shared libraries, especially when built and installed through libtool, usually get installed as one file and two symlinks: libfoo.so, libfoo.so.X and libfoo.so.X.Y.Z. The reasoning for this is not always clear, so I’ll try to shed some light on the matter, for future reference; maybe it can help solve trouble for others in the future:

  • first of all, the real file is the one with the full version, libfoo.so.X.Y.Z: this is because libtool uses some crazy-minded versioning scheme that should make it consistent to add or remove interfaces… in truth it usually just drives developers crazy when they start wondering which value they have to increase (hint: no, the three values you set in the libtool flags are not the same three you get in the filename);
  • the presence of the other two names is due to the presence of two linker programs: the build-time linker (or link editor) and the runtime (or dynamic) linker, ld and ld.so; each one uses a different name for the library;
  • the link editor (ld), when linking a library by short name (-lfoo), isn’t in the know about which version you’re speaking of, so it tries its best to find the library by transforming the name to libfoo.so, without any version specification; that’s why the link with the unversioned name is there;
  • the dynamic linker, when looking up the libraries to load, uses the NEEDED entries in the .dynamic section of the ELF file; those entries are created based on the SONAME entry (in the same section) of the linked library; since the link editor found the library as libfoo.so, it wouldn’t be able to use the filename properly; the SONAME also serves to indicate ABI compatibility, so it is usually versioned (with one or more version components depending on the operating system’s convention — in Gentoo systems, both Linux and FreeBSD, the convention is one component, but exceptions exist); in this case it’d be libfoo.so.X; so this is what the dynamic linker looks up, as it too isn’t in the know about the full version specification. (A short command-line sketch of this dance follows the list.)
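
Putting the list together, a sketch of the three-name dance (foo.c and bar.c hypothetical):

gcc -shared -fPIC foo.c -Wl,-soname,libfoo.so.1 -o libfoo.so.1.2.3
ln -s libfoo.so.1.2.3 libfoo.so.1    # the name ld.so looks up (the SONAME)
ln -s libfoo.so.1 libfoo.so          # the name ld resolves -lfoo to
gcc bar.c -o bar -L. -lfoo
readelf -d bar | grep NEEDED         # [libfoo.so.1], copied from the SONAME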

Now, there are a few things to say about this whole situation with file names: while libtool takes care of creating the symlinks by itself, not all build systems do so; one very common reason for that is that they have no experience of non-GNU systems (here the idea of a “GNU system” is that of any system that uses the GNU C library). The thing is, ldconfig on GNU systems does not limit itself to regenerating the ld.so cache: it also ensures that all the libraries are properly symlinked (I sincerely forgot whether it takes care of .so symlinks as well or just SONAME-based symlinks, but I’d expect only the latter). A few packages have been found, before, that explicitly relied on ldconfig to do that, using a GNU-specific parameter (a GNU lock-in — might sound strange, but there are those as well) that fixes the links without changing the ld.so cache.

And here our beloved .la files come back into the scene. One of the things that .la files do is provide an alternative to the -lfoo to libfoo.so translation for the linkers that don’t do it by themselves (mostly very old linkers, or non-ELF-based linkers). And once again this is not useful to us, at least in the main tree, since all our supported OSes (Linux and FreeBSD, with all three supported C libraries) are new enough to properly take care of that by themselves.

Plugins and static libraries don’t mix well

There is one interesting thing that, I’m afraid, most developers happen to ignore, either deliberately or because they really don’t know about it: the fact that static libraries and plugins usually don’t mix well. Although I have to warn you that this is not an absolute; they can be designed to work fine together.

The main problem, though, is to ponder whether it is useful to use static libraries and plugins together, and then to find out whether it’s safe to or not. Let’s start from the basics. What are static libraries used for? Mostly for two reasons: performance, and not having to depend on the dynamic version of the same library on the system. If performance of the library is a problem, it’s much more likely that the culprit is the plugin system rather than the dynamic nature of the library; I have said something about this in the past, although I didn’t go into much detail and I haven’t had time to continue the discussion yet.

For what concerns dependencies, the plugins usually need a way to access the functions provided by the main library; this means there is an ABI dependency between the two; now, the plugins might not link against the library directly, to support static library usage, but that also means that if the internal ABI changes in any way between versions, you’re screwed.

What does this mean? That in most cases when you have plugins, you don’t want to have static libraries around; it also means that you don’t need the .la files, so quite a bit of cleanup becomes possible.

More to the point, if you’re building a plugin, you don’t want to build the static version at all, since the plugin will be opened through the dlopen() interface, from the dynamic version of the library (the module). Since upstreams don’t always remember to disable the static archive building in their original build system, ebuild authors should take care of disabling it, either with --disable-static or by patching the build system (if you don’t want to stop all static lib building). And this is not my idea, but proper development procedure (and no, it does not admit any discussion: if it’s a plugin — and it’s not possible to make it a builtin — you shouldn’t install the archive file! Full stop!).

Now, you can see where this brings us again: more .la files that are currently installed and are not useful at all. Like .la files for PAM modules (libpam does not load them through the .la, so they are not useful — and this is definitely the word of the PAM maintainer! And for PAM-related packages, that word is The Word). Let’s try to continue this way, shall we? From the leaves.

ABI changes keeping backward compatibility

So, I’ve written about what ABI is and what breaks it, and about how to properly version libraries, and I left off saying that I would write about some further details on how to improve compatibility between library versions. The first trick is using symbol versioning to maintain backward compatibility.

The original Unix namespace for symbols (which is the default one) was flat; that means that a symbol (function, variable or constant) is only ever identified by its name. To make things more versatile, Sun’s Solaris and the GNU C library both implemented “symbol versioning”: a method to assign to the same function name one extra “namespace” identifier; the use of this feature is split between avoiding symbol collisions and maintaining ABI compatibility.

In this post I’m focusing on using the feature to maintain ABI compatibility: by changing the ABI of a single function, simulating two versions of the same library, and yet allowing software built against the old version to work just fine with the new one. While it’s possible to achieve the same thing on many different systems, for ease of explanation I’m going to focus on glibc-based Linux systems, in particular on my usual Yamato Gentoo box.

While it does not sound like a real-life situation, I’m going to use a very clumsy API-compatible function whose ABI can be broken, and which simply reports the ABI version used:

/* lib1.c */
#include <stdio.h>

void my_symbol(const char *str) {
  printf("Using ABI v.1\n");
}

The ABI break will come in the form of the const specifier being dropped; since this un-restricts what the function can do with the parameter, it is a “silent” ABI break (in the sense that it wouldn’t warn the user if we weren’t taking precautions). Indeed, if we wrote the new version this way:

/* lib2.c */
#include <stdio.h>

void my_symbol(char *str) {
  printf("Using ABI v.2\n");
}

the binaries would still execute “cleanly”, reporting the new ABI (of course, if we were changing the string, and the passed string was a static constant string, that would be a problem; but since I’m using x86-64 I cannot show that, as it still does not seem to produce a bus error or anything). Let’s assume that instead of reporting the new ABI version, the software would crash because something happened that shouldn’t have.

The testing code we’re going to use is going to be pretty simple:

/* test.c */
#if defined(LIB1)
void my_symbol(const char *str);
#elif defined(LIB2)
void my_symbol(char *str);
#endif

int main() {
  my_symbol("ciao");
}

Now we have to make the thing resistant to ABI breakage, which in our case involves using inline asm and GNU binutils extensions (because I’m trying to keep it simple for now!). Instead of a single my_symbol function, we’re going to define two public my_symbol_* functions, one per ABI variant, and then alias them:

/* lib2.c */
#include <stdio.h>

void my_symbol_v1(const char *str) {
  printf("Using ABI v.1\n");
}

void my_symbol_v2(char *str) {
  printf("Using ABI v.2\n");
  str[0] = '\0';
}

__asm__(".symver my_symbol_v1,my_symbol@");
__asm__(".symver my_symbol_v2,my_symbol@@LIB2");

The two aliases are the ones doing the magic: the first defines the v1 variant as having the default, empty version (nothing after the “at” character), while the v2 variant, which is the default used for linking if no version is specified (two “at” characters), gets the LIB2 version. This does not work right away though: first of all, the LIB2 version is defined nowhere; second, the two v1 and v2 symbols are also exported, allowing other software direct access to the two variants. Since you cannot use the symbol visibility definitions here (you’d be hiding the aliases too), you have to provide the linker with a version script:

/* lib2.ldver */
LIB2 {
    local:
        my_symbol_*;
};

With this script you’re telling the linker that all the my_symbol_* variants are to be considered local to the object (hidden), and you are, as well, creating a LIB2 version to assign to the v2 variant.

But how well does it work? Let’s build two binaries against the two libraries, then execute them with the two:

flame@yamato versioning % gcc -fPIC -shared lib1.c -Wl,-soname,libfoo.so.0 -o abi1/libfoo.so.0.1
flame@yamato versioning % gcc -fPIC -shared lib2.c -Wl,-soname,libfoo.so.0 -Wl,-version-script=lib2.ldver -o abi2/libfoo.so.0.2
flame@yamato versioning % gcc test.c -o test-lib1 abi1/libfoo.so.0.1                              
flame@yamato versioning % gcc test.c -o test-lib2 abi2/libfoo.so.0.2                              
flame@yamato versioning % LD_LIBRARY_PATH=abi1 ./test-lib1       
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi1 ./test-lib2
./test-lib2: abi1/libfoo.so.0: no version information available (required by ./test-lib2)
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi2 ./test-lib1
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi2 ./test-lib2
Using ABI v.2

As you can see, the behaviour of the program linked against the original version does not change when moving to the new library, while the opposite is true for the program linked against the new one and executed against the old one (that would be forward compatibility of the library, which is rarely guaranteed).

This actually makes it possible to break the ABI (but not the API) and still be able to fix bugs; it shouldn’t be abused, though, since it does not always work that well and it does not work on all the systems out there. But it helps to deal with legacy software that needs to be kept backward-compatible and yet requires fixes.

For instance, if you didn’t follow the advice of always keeping alloc/free interfaces symmetrical, and you wanted to move a structure allocation from standard malloc() to g_malloc() and then to GSlice, you could do that by using something like this:

/* in the header */

my_struct *my_struct_alloc();
#define my_struct_free(x) g_slice_free(my_struct, x)

/* in the library sources */

#include <stdlib.h> /* for malloc() */
#include <glib.h>   /* for g_malloc() and g_slice_new() */

my_struct *my_struct_alloc_v0() {
  return malloc(sizeof(my_struct));
}

my_struct *my_struct_alloc_v1() {
  return g_malloc(sizeof(my_struct));
}

my_struct *my_struct_alloc_v2() {
  return g_slice_new(my_struct);
}

__asm__(".symver my_struct_alloc_v0,my_struct_alloc@");
__asm__(".symver my_struct_alloc_v1,my_struct_alloc@VER1");
__asm__(".symver my_struct_alloc_v2,my_struct_alloc@@VER2");

Of course this is just a proof of concept: if all you did was allocate the structure, two macros would be fine; in a real case you’d expect something more to be done to the newly-allocated structure. If versioning weren’t used, software built against the first version, using malloc(), would crash when freeing the memory area with the GSlice deallocator; with this method, on the other hand, each piece of software built against a particular allocator will properly get its matching deallocator, even in future releases.

Shared Object Version

Picking up where I left off with my post about ABI, I’d like to provide some insight into the “soversion”, or shared object version, that is part of the “soname” (the canonical name of a shared object/library), and its relationship with the -version-info option of libtool.

First of all, the “soname” is the string listed in the DT_SONAME tag of the .dynamic section of an ELF shared object. It represents the canonical name the library should be called with, and it’s used to create the DT_NEEDED entries for the shared objects and dynamic executables depending on it, as well as the canonical name used when opening the library through dlopen() (without the full path).

Usually, the soname is composed of the library’s basename (libfoo.so) followed by a reduced shared object version; the extent to which it is reduced (or not) depends on the standard rules of the operating system and a few other notes. What I’m going to talk about today is that last part, the shared object version, which is probably the most important part of the soname.

First of all, the “soversion” corresponds to neither the package version nor the -version-info parameter (although it is calculated starting from the latter); using either directly would be a big mistake, unless you expect to be able to keep a perfect ABI based on your package versioning; in that case you might want to try using the package version, but that’s quite a difficult thing to pull off.

The part of this version that is embedded in the soname is the version of the ABI, and it has to change when the ABI changes, following the rules I showed previously. If it were kept the same between versions while the ABI was broken, software would subtly crash because of the changes in the ABI. By changing the ABI version, and thus the soname, you make the loader refuse to start the program with a different library than the one it was developed for; of course that does not make the software magically work, but it will at least stop it from crashing further along the road.

By default on Linux and Solaris, there is a single component used for the soname’s ABI version, at least with libtool. Projects following this rule manually, and setting their soversion to the same value as their package’s major version, would be providing a single ABI for each major version of their software; I have rarely seen anything like that work out well. Ruby uses a mix of this, by defining two components as the soversion, so that eventually you could have libruby.so.1.8 and libruby.so.1.9 (on the other hand, we rename them to libruby18 and libruby19 so that they don’t collide, for other reasons, but that’s beside the point). This works as long as they never have to change, for any reason, the ABI within a minor release series of Ruby; when that happens, something will certainly break.

The -version-info option of libtool is explicitly distinct from the package version, as well as from the actual soversion, and is used to provide consistent library versioning across releases, by providing three components: current, age and revision; they represent the information in terms of interfaces supported and dropped; understanding the separation takes quite some time, but it can be summarised in three simple steps (a small libtool sketch follows the list):

  • if you don’t change the interface at all just increase the “interface revision” value;
  • if you make backward-compatible changes (like adding interfaces), increase the “current interface” value and the “older interface age” value, reset “interface revision” to zero;
  • if you make backward-incompatible changes, breaking ABI (removing interfaces for instance), increase the “current interface” value and reset both “older interface age” and ”interface revision” to zero.
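
As a small sketch (library name hypothetical), the triplet goes into the link flags, and libtool derives the file names from it:

# Makefile.am — the order is current:revision:age
libfoo_la_LDFLAGS = -version-info 3:0:1

# on Linux this installs libfoo.so.2.1.0 with SONAME libfoo.so.2,
# since the soname major is computed as current minus age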

Depending on the operating system, this will create a soname change either on backward-incompatible changes (Linux, Solaris and Gentoo/FreeBSD), or with any type of change to the interface (vanilla FreeBSD).

Again, the idea is that each time you might have a backward-incompatible change, you get a different soname, so that the loader can’t mix and match different interfaces. When you don’t guarantee any ABI stability between versions, usually for internal libraries, as GNU binutils does for libbfd, you just put the package version in the library’s basename rather than in the soversion, and set the soversion to all zeros, so you get stuff like libbfd-2.20.so.0.0.0. This way you’re sure that, whether you change interfaces or not, an upgrade of your package won’t break others’ software. Of course it should also be enough for people to understand that such a library should not be used at all, since it’s not guaranteed to be stable.

The next step will be to describe the symbol versioning technique, which reduces the amount of backward-incompatible changes, keeping the same ABI available until it really has to go.