The why and how of RPATH

This post is brought to you by a conversation with Fabio, which actually reminded me of an older conversation I had with someone else (exactly whom, right now, escapes me) about the ability to inject RPATH values into already-built binaries. I’m sorry to have forgotten who asked me about that, I hope he or she won’t take it bad.

But before I go to the core of the problem, let me try to give a brief introduction of what we’re talking about here, because jumping straight to talk about injection is going to be too much for most of my readers, I’m sure. Even though, this whole topic reconnects with multiple other smaller topics I discussed about in the past on my blog, so you’ll see a few links here and there for that.

First of all, what the heck is RPATH? When using dynamic linking (shared libraries), the operating system need to know where to look for the libraries an executable uses; to do so, each operating system has one or more PATH variables, plus eventual configuration files, that are used to look up the libraries; on Windows, for instance, the same PATH variable that is used to find the commands is used to load the libraries; and the libraries are looked for in the same directory where the executable is first of all. On Unix, the commands and the libraries use distinct paths, by design, and the executable’s directory is not searched for; this is also because the two directories are quite distinct (/usr/bin vs /usr/lib as an example). The GNU/Linux loader (and here the name is proper, as the loader comes out of GLIBC — almost identical behaviour is expected by uClibc, FreeBSD and other OSes but I know that much for sure) differs extensively from the Sun loader; I say this because I’ll introduce the Sun system later.

In your average GNU/Linux system, including Gentoo, the paths where to look up the libraries in are defined in the /etc/ file; prepended to that list, the LD_LIBRARY_PATH variable is used. (There is a second variable, LDPATH LIBRARY_PATH, that tells the gcc frotnend to the linker where to look for libraries to link to at build time, rather than to load — update: thanks to Fabio who pointed me I got the wrong variable; LDPATH is used by the env-update script to set the proper search paths in the file ). All the executables will look in all these paths for both the libraries they link to directly and for non-absolute dlopen() calls. But what happens with private libraries — libraries that are shared only among a small number of executables coming from the same package?

The obvious choice is to install them normally in the same path as the general libraries; this does not require playing with the search paths at all, but it causes two problems: the build-time linker will still find them during link time, and it might not be what you want and it increases the number of files present in the single directory (which means accessing its content slows down, little by little). The common alternative approach is installing it in a sub-directory that is specific to the package (automake already provides a pkglib installation class for this type of libraries). I already discussed and advocated this solution so that internal libraries are not “shown” to the rest of the software on the system.

Of course, adding the internal library directories to the global search path also means slowing down the libraries’ load, as more directories are being searched when looking for the libraries. To solve this issue, runpath first and rpath now is used. The DT_RPATH is a .dynamic attribute that provides further search paths for the object it is listed in (it can be used both for executables and for shared objects); the list is inserted in-between the LD_LIBRARY_PATH environment variable and the search libraries from the /etc/ file. In that paths, there are two special cases: $ORIGIN is expanded to be the path of the directory where the object is found, while both empty and . values in the list are meant to refer to the current work directory (so-called insecure rpath).

Now, while rpath is not a panacea and also slightly slow down the load of the executable, it should have decent effects, especially by not requiring further symlinks to switch among almost-equivalent libraries that don’t have ABIs that stable. They also get very useful when you want to build a software you’re developing so that it loads your special libraries rather than the system one, without relying on wrapper scripts like libtool does.

To create an RPATH entry, you simply tell it to the linker (not the compiler!), so for instance you could add to your ./configure call the string LDFLAGS=-Wl,-rpath,/my/software/base/my/lib/.libs to build and run against a specific version of your library. But what about an already-linked binary? The idea of using RPATHs make it also for a nicer handling of binary packages and their dependencies, so there is an obvious advantage in having the RPATH editable after the final link took place… unfortunately this isn’t as easy. While there is a tool called chrpath that allows you to change an already-present RPATH, and especially to delete one (it comes handy to resolve insecure rpath problems), it has two limitations: the new RPATH cannot be longer than the previous one, and you cannot add a new one from scratch.

The reason is that the .dynamic entries are fixed-sized; you can remove an RPATH by setting its type to NULL, so that the dynamic loader can skip over it; you can edit an already-present RPATH by changing the string it points to in the string table, but you cannot extend neither .dynamic nor the string table itself. This reduces the usefulness of RPATH for binary package to almost nothing. Is there anything we can do to improve on this? Well, yes.

Ali Bahrami at Sun already solved this problem in June 2007, which means just over three years ago. They implemented it with a very simple trick that could be implemented totally in the build-time linker, without having to change even just a bit of the dynamic loader: they added padding!

The only new definition they had to add was DT_SUNW_STRPAD, a new entry in the .dynamic table that gives you the size of the padding space at the end of the string table; together with that, they added a number of extra DT_NULL entries in the same .dynamic table. Since all the entries in .dynamic are fixed sized, a DT_NULL can become a DT_RPATH without a problem. Even if some broken software might expect the DT_NULL at the end of the table (which would be wrong anyway), you just need to keep them at the end. All the ELF software should ignore the .dynamic entries they don’t understand, as long as they are in the reserved specific ranges, at least.

Unfortunately, as far as I know, there is no implementation of this idea in the GNU toolchain (GNU binutils is where ld is). It shouldn’t be hard to implement, as I said; it’s just a matter of emitting a defined amount of zeros at the end of a string table, and add a new .dynamic tag with its size… the same rpath command from OpenSolaris would probably be usable on Linux after that. I considered porting this myself, but I have no direct need for it; if you develop proprietary software for Linux, or need this feature for deployment, though, you can contact me for a quote on implementing it. Or you can do it yourself or find someone else.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s