Shared libraries worth their while

This is, strictly speaking, a non Gentoo-related post; on the other hand, I’m going to introduce here a few concepts that I’ll use in a future post to explain one Gentoo-specific warning, so I’ll consider this a prerequisite. Sorry if you feel like Planet Gentoo should never go over the technical non-Gentoo work, but then again, you’re free to ignore me.

I have, in the past, written about the need to handle shared code in packages that install multiple binaries (real binaries, not scripts!) to perform various tasks, which end up sharing most of their code. Doing the naïve thing, using the source code in all of them, or the slightly less naïve thing, building a static library and linking it to all the binaries, tend to increase the size of the commands on disk, and the memory required to fully load them in memory. In my previous post I noted a particularly nasty problem with the smbpasswd binary, that was almost twice the size because of unused code injected by the static convenience library (and probably even more, given that I never went down for to hide the symbols and clean them up).

In another post, I also proposed the use of multicall binaries to handle these situations; the idea behind multicall binaries is that you end up with a single program, with multiple “applets”; all the code is merged into a single ELF binary object, and at runtime the correct path is taken to call the right applet, depending on the name used to call up the binary. It’s not extremely easy but not even impossible to get right, so I still suggest that as main alternative to handle shared code, when the shared code is bigger in size than the single applet’s code.

This does not solve the Samba situation though: the code of the single utilities is still big enough than having a single super-package will not make it very manageable, and a different solution has to be devised. In this case you end up having to choose between the static linking (naïve approach) or using a private, shared object. An easy way out here is trying to be sophisticated, and always go with the shared object approach; it definitely might not be the best option.

Let me be clear here: shared objects are not a panacea to the shared code problems. As you might have heard already, using shared objects is generally a compromise: you relax problems related to bugs and security vulnerability, by using a shared object, so that you don’t have to rebuild all the software using that code — and most of the times you also share read-only memory to reduce the memory consumption of a system — at the expense of load time (the loader has to do much more work), sometime execution speed (PIC takes its toll), and sometimes memory usage, as counter-intuitive as that might sound, given that I just said that they reduce memory consumption.

While the load time and execution speed tolls are pretty much immediate to understand, and you can find a lot of documentation about them on the net, it’s less obvious to understand the share-memory, waste-memory situation. I wrote extensively about the Copy-on-Write problem so if you follow my blog regularly you might have guessed the problem already at this point, but it does not fill in all the gaps yet, so let me try to explain how this compromise works.

When we use ELF objects, part of the binary file itself are shared in memory across different processes (homogeneous or heterogeneous). This means that only those parts that would not be modified from the ELF files can be shared. This usually includes the executable code – text – for standard executables (and most code compiled with PIC support for shared objects, which is what we’re going to assume), and part (most) of the read-only data. In all cases, what breaks the share for us is Copy-on-Write, as that will create private copies of the pages to the single process, which is why writeable data is nothing we care about when choosing the code-sharing strategy (it’ll mostly be the same whether you link it statically or via shared objects — there are corner cases, but I won’t dig into them right now).

What is that talking about homogeneous or heterogeneous processes above? Well, it’s a mistake to think that the only memory that is shared in the system is due to shared objects: read-only text and data for an ELF executable file are shared among processes spawned from the same file (what I called and will call homogeneous processes). What shared object accomplish with memory is sharing between processes spawned by different executables, but loading the same shared objects (heterogeneous processes). The KSM implementation (no it’s not KMS!) in the current versions of the Linux kernel allows for something similar, but it’s a story so long that I won’t really bother count it in.

Again, the first approach to shared objects might make you think that moving whatever amount of memory from being shared between homogeneous processes to be shared between heterogeneous processes is a win-win situation. Unfortunately you have to cope with data relocations (which is a topic I wrote about extensively): a constant pointer is read-only when the code is always loaded at a given address (as it happens with most standard ELF executables), but it’s not when the code can be loaded at an arbitrary address (as it happens with shared objects): in the latter case it’ll end up in the relocated data section, which follows the same destiny as the writeable data section: it’s always private to the single process!

*Note about relocated data: in truth you could ensure that the data relocation is the same among different processes, by using either prelinking (which is not perfect especially with modern software, which is more and more plugin-based), or methods like KDE’s kdeinit preloading. In reality, this is not really something you could, or should, rely upon because it also goes against the strengthening of security applied by Address Space Layout Randomization.*

So when you move shared code from static linking to shared objects, you have to weight in the two factors: how much code will be left untouched by the process, and how much will be relocated? The size utility from either elfutils or binutils will not help you here, as it does not tell you how big the relocated data section is. My ruby-elf instead has an rbelf-size script that gives you the size of .data.rel.ro (another point here: you only care about the increment in size of .data.rel.ro as that’s the one that is added as private: .data.rel would be part of the writeable data anyway). You can see it in action here:

flame@yamato ruby-elf % ruby -I lib tools/rbelf-size.rb /lib/libc.so.6
     exec      data    rodata     relro       bss     total filename
   960241      4507    359020     12992     19104   1355864 /lib/libc.so.6

As you can see from this output, the C library has some 950K of executable code, 350K of read-only data (both will be shared among heterogeneous processes) and just 13K (top) of additional relocated memory, compared to static linking. _Note: the rodata value does not only include .rodata but all the read-only non-executable sections; the value of exec and rodata roughly corresponds of what size calls text).

So how is knowing how much relocated data useful in assessing how to deal with shared code? Well, if you build your shared code as shared object and analyse it with this method (hint: I just implemented rbelf-size -r to reduce the columns to the three types of memory we have in front of us), you’ll have a rough idea of how much gain and how much waste you’ll have for what concern memory: the higher the shared-to-relocated ratio, the better results you’ll have. Having an infinite ratio (when there is no relocated data) it’s the perfection.

Of course the next question is what do you do if you have a low ratio? Well there isn’t really a correct answer here: you might decide to bite the bullet and go in the code to improve the ratio; cowstats from the Ruby-Elf suite helps you to do just that; it can actually help you reducing your private sections as well, as many times you have mistake in there, due to missing const declarations. If you have already done your best to reduce the relocations, then your only chance left is to avoid using a library altogether; if you’re not going to improve your memory usage by using a library, and it’s something internal only, then you really should look into using either static linking or, even better, multicall binaries.

Impootant Notes of Caution

While I’m trying to go further on the topic of shared objects than most documentation I have read myself on the argument, I have to point out that I’m still generalising a lot! While the general concept are as I put them down here, there are some specific situations that change the table making it much more complex: text relocations, position independent executables, PIC overhead, are just some of the problems that might arise while trying to apply these general ideas over specific situations.

Still trying not to dig too deep on the topic right away, I’d like to spend a few words about the PIE problem, which I have already described and noted in the blog: when you use Position Independent Executables (which is usually done to make good use of the ASLR technique), you can discard the whole check of relocated code: almost always you’ll have good results if you use shared objects (minus complications added by the overlinking, of course). You still would have the best results by using multicall binaries if the commands have very little code.

Also, please remember that using shared objects slows down the loading process which means that if you have a number of fire-and-forget commands, which is something not too unusual in the UNIX-like environments, you will probably have best results with multicall, or static linking, than with shared objects. The shared memory is also something that you’ll probably get to ignore in that case, as it’s only worth its while if you normally keep the processes running for a relatively long time (and thus loaded into memory).

Finally, all I said refers to internal libraries used for sharing code among commands of the same package: while most of the same notes about performance-wise up- and down-sides holds true for all kind of shared objects, you have to factor in the security and stability problems when you deal with third-party (or third-party-available) libraries — even if you develop them yourself and ship them with your package: they’ll still be used by many other projects so you’ll have to handle them with much more care, and they should really be shared.

One thought on “Shared libraries worth their while

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s