More details about symbol collisions

You might remember that I was quite upset by the number of packages that block each other because they install the same files (or files with the same name). It is far from obvious why I insist that these blockers be solved whenever possible, and why I still wish we had a proper way to switch between alternative implementations without using blockers. So I’ll explain it here, while I also discuss again the problem of symbol collisions, which is itself a rather nebulous problem.

I have to say, first, that the same problem with blockers is also present with conflicting USE requirements; the bottom line is the same: there are packages that you cannot have merged at the same time, and that’s a problem for me.

The problem could probably be solved just as well by changing the script I use to gather ELF symbol information and check for collisions, but since that requires quite a bit of work just to work around this trouble, I’m fine with accepting a subset of packages, and ranting about the fact that we have no way (yet) to solve the problem of colliding packages and incompatible USE requirements.

The script, basically, roams the filesystem to gather all the symbols that are exported by various ELF files, both executables and libraries, and saves them into a PostgreSQL database; a separate script then combines that data to generate a list of possibly colliding symbols. The whole point of this script is to find possible hard-to-diagnose bugs due to unwanted symbol interposition: an internal function or datum replaced with one from another ELF object (shared library or executable) that uses the same name.
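
To make the idea concrete, here is a minimal sketch of the collision check. It is not the actual script: it keeps everything in memory instead of PostgreSQL, it assumes the symbol lists come from text in the style of `nm -D --defined-only`, and the function names are made up for illustration. It also only looks at a few symbol types, which is a simplification.

```python
from collections import defaultdict


def parse_nm_output(text):
    """Return the names of defined global symbols found in nm-style output.

    Simplification (an assumption of this sketch): only the T, D, B and R
    symbol types are considered, so weak symbols are ignored.
    """
    symbols = set()
    for line in text.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        if len(fields) != 3:
            continue
        _address, kind, name = fields
        if kind in "TDBR":
            # Drop a symbol version suffix such as "@@VERS_1" if present.
            symbols.add(name.split("@")[0])
    return symbols


def find_collisions(exports):
    """Map each symbol exported by more than one object to those objects.

    `exports` maps an object path to the set of symbol names it exports.
    """
    owners = defaultdict(set)
    for path, names in exports.items():
        for name in names:
            owners[name].add(path)
    return {
        name: sorted(paths)
        for name, paths in owners.items()
        if len(paths) > 1
    }
```

Feeding `find_collisions` a mapping built from `parse_nm_output` for each ELF file gives the list of candidates; whether each candidate is a real problem still has to be judged by hand.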

This kind of work is, as you can guess by now, hindered by the inability to keep all the packages merged at once, because then I cannot compare the symbols between them. It’s also hindered by those bad packages that install ELF files in uncommon and wrong paths (like /usr/share); running the script over the whole filesystem would solve that problem, but the time it would take is definitely going to be a problem.

On a different note, I have to point out that not all the cases where two objects export the same symbol are mistakes. Sometimes you’re deliberately interposing symbols, like we do with the sandbox and google-perftools does with malloc and mmap; some other times, you’re implementing a specific interface, which might be that of a plugin but might also be required by a library, like the libwrap interface for allow/deny symbols. In some cases, plugins will not interfere with one another because they get loaded with the RTLD_LOCAL option, which tells the runtime loader that their symbols are not to be made available to other objects.

This makes the whole work pretty cumbersome: some packages will have to be ignored altogether, others will require specific rules, others will have to be fixed upstream, and some are actually issues with bundled libraries. The whole work is long, tedious and will not bring huge improvements all around, but it’s a nice polishing action that any upstream should really consider to show that they do care about the quality of their code.
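
The "ignore some packages, write specific rules for others" part could look roughly like the sketch below. The rule format (shell-style patterns for object paths and for symbol names) and the function name are assumptions made for the example, not how the real script does it.

```python
import fnmatch


def filter_collisions(collisions, ignored_objects=(), allowed_symbols=()):
    """Drop collisions that are known to be intentional.

    `collisions` maps a symbol name to the list of objects exporting it.
    `ignored_objects` are shell-style patterns for objects to leave out
    (for instance plugins loaded with RTLD_LOCAL); `allowed_symbols` are
    patterns for symbols that are interposed on purpose.
    """
    kept = {}
    for name, paths in collisions.items():
        if any(fnmatch.fnmatchcase(name, pat) for pat in allowed_symbols):
            continue
        remaining = [
            path
            for path in paths
            if not any(fnmatch.fnmatchcase(path, pat) for pat in ignored_objects)
        ]
        # A collision needs at least two objects left after filtering.
        if len(remaining) > 1:
            kept[name] = remaining
    return kept
```

With rules like `allowed_symbols=["malloc", "mmap"]` the deliberate interposition cases disappear from the report, and what is left is the list worth reporting upstream.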

But if I were to expect all of them to start actually making use of hidden symbols and the like, I’d probably be waiting forever. For now, I guess I’ll have to start with making use of these ideas on the projects I contribute to, like lscube. It might end up getting used more often.

Should only libraries be concerned about PIC?

Position Independent Code is a technique used to create executable code that, as the name implies, is independent of the address at which it is loaded (the position). This means that the addresses of data and functions used by the code, as well as the initial values of pointers, cannot be assumed to always be the ones written into the executable file (or the library) by the build process.

What this means in practical terms is that, as you can’t be sure how many and which libraries a program might load at runtime, libraries are usually loaded at dynamically assigned addresses, so the code must not hard-code a single value as its base address. When a shared library is built without PIC, it has to be relocated by the runtime loader when it is loaded, and that causes changes to the .text section, which breaks the assumption that sections should be either writable or executable, but not both at the same time.

When using PIC, instead, access to symbols (data and functions) goes through a global offset table (GOT), so the code itself does not need to be relocated, only the GOT and the pointers stored in the data sections. As you can guess, this kind of indirect access takes more time than the direct access that non-PIC code uses, and this is why a lot of people hate the use of PIC on x86 systems. On the other hand, shared libraries not using PIC not only break the security assumption noted above, making it impossible to use mitigation technologies like NX – PaX in Linux, W^X in OpenBSD – they also increase the memory usage of software, as the parts of the .text sections that need relocating are duplicated by copy-on-write instead of being shared.

Using hidden visibility it is possible to reduce the performance hit caused by GOT access, by using PC-relative addressing (relative to the address of the instruction being executed), if the architecture supports it of course. It does not save much as far as pointers in the data sections are concerned, as they will still need relocations. This is what causes arrays of strings to be written to .data.rel.ro sections rather than .rodata sections: the former gets relocated, the latter doesn’t, so it is always shared.

So this covers shared libraries, right? Copy-on-write on shared libraries is bad, shared libraries use PIC, pointers in the data sections of PIC code cause copy-on-write. But does it stop with shared libraries?

One often useful security mitigation is choosing a random base address for executables: instead of always loading the code at the same address, the address is randomised between executions. This is useful because an attacker can no longer rely on knowing in advance the address at which the program will be loaded. But applying this technique to non-PIC code will cause relocations in the .text section, which in turn will break another security mitigation technique, so it is not really a good idea.

Introducing PIE (Position Independent Executable).

PIE is not really anything new: it only means that even executables are built with PIC enabled. This means that while arrays of pointers to characters are often considered fine for executables, and are written to .rodata (if properly declared) for non-PIC code, the problem with them reappears when using PIE.

It’s not much of a concern usually because the people using PIE are usually concerned with security more than performance (after all it is slower than using non-PIC code), but I think it’s important for software correctness to actually start considering that an issue too.

In this light, not only is it important to replace pointers to characters with character arrays (and similar), but hiding the symbols of executables becomes even more important to reduce the hit caused by PIC.

I’m actually tempted to waste some performance on my next box and start using PIE all over, just to find this kind of problem more easily… yeah, I’m a masochist.

Why would an executable export symbols?

I think this post might be interesting for those people interested in trying to get all the performance power out of a box, without breaking anything in the process.

I’ve blogged before about the problems related to exporting symbols from final executables, but I haven’t really dug deep enough to actually provide useful information to developers and users about what those exported symbols represent, for an executable.

First of all, let’s start by saying that an executable under Linux, and on most modern Unixes, is just the same kind of file as a shared object (shared library, if you prefer; what is called a DLL under Windows, if you come from a Windows background). And exactly like shared libraries, executables can export symbols.

Exported symbols are resolved by the dynamic (runtime) linker, through the process of dynamic binding, and they might collide. I’ll return to the way the runtime linker works another time. For now let’s just say that exported symbols require some extra steps to be taken during the execution of a program, and that the process takes time.

Executables don’t usually need to export symbols, and they usually don’t export symbols at all. Although there are rare cases where executables are required to export symbols, for instance because they are used by some of the libraries they link to as a “callback” from the library to the program, or for RTTI to work properly in C++ programs, most of the time the symbols are exported just because of libtool.

By default when you link a program, it doesn’t get its symbols exported, they are hidden and thus resolved directly at buildtime, for those symbols present in the source files themselves. When you add code to a convenience library that is built with libtool, then something changes, and the symbols defined inside that library are exported even when linking it statically inside the final executable.

This has quite a few drawbacks and, as I said, the exported symbols are not usually used for anything:

  • the symbols are resolved at runtime through dynamic binding, which takes time; even if that is usually very little on a normal system, the time repeatedly wasted during dynamic binding can add up to a good deal;

  • the symbols might collide with those of a library loaded afterward; this is, for instance, why recode breaks PHP;

  • using --gc-sections won’t help much, because exported symbols are seen as always used, and this might increase the amount of code added to the executable for no good reason;

  • prelink will likely set the wrong addresses for symbols that collide, which in turn will negate the improvement of using prelink entirely, at least for some packages.

The easy solution for this is for software packages to actually check whether the compiler supports hidden visibility and, if it does, hide all the symbols except the ones in the public API of their libraries. In the case of software like cmake that installs no shared objects, hidden visibility could be forced by the ebuild, but to give back to the community, the best thing is to get as much software as possible to use hidden visibility, thus reducing the number of symbols that get exported from both binaries and shared libraries.

I hope these few notes might actually help Gentoo maintainers to understand why I’m stressing this point. It would be nice if we could all improve the software we maintain, even one step at a time.

As for what concerns my linking collision scripts, the bad packages’ list got a few more entries today: KViewShell with djvulibre, Karbon with gdk-pixbuf, and gcj, with both boehm-gc and libltdl.

And now I can actually start seeing the true collisions, like the gfree symbol, which has two different definitions in libgunicode.so (fontforge) and poppler/libkpdfpart (xpdf code), or the scan_token symbol in ghostscript, which has a completely different definition in libXfont/libt1.

Talking about libXfont and libt1 (or t1lib), I wonder if there is hope in the future for one to use the other, rather than both carrying the same parser code for Type 1 fonts. I’ll have to check the FreeDesktop bugzilla tomorrow to see if it was ever discussed. At the moment they duplicate a lot of each other’s symbols.

I have to say, PostgreSQL is an important speed improvement, that will allow me to complete my task in shorter time. Now I’m waiting for Patrick to run my script over the whole set of packages in Gentoo, that might actually be something. If only there was an easy way to make building and testing code faster (for COW reduction) without changing hardware, that would be awesome. Unfortunately that I need to do locally :(

Hidden visibility, programs and ebuilds

One thing that might be interesting to start doing in Gentoo to improve the users’ experience is to start playing with hidden visibility.

While I’ve already written before that hidden visibility is risky for most libraries unless upstream prepared them to be used with hidden visibility (xine, most recent GNU libraries excluding glibc), it is usually not a problem for final executables, as they aren’t supposed to export symbols at all, most of the time at least (there are some cases where binaries export symbols with which they want to override those of other libraries, like the C library or stuff like that).

Users should not play with the -fvisibility=hidden flag, as that might break things badly; but we developers should know better. It would be nice to check if we’re using GCC 4.0 or later (you don’t want to use hidden visibility with older compilers), and then append the flag, for software we know is compatible with this. This software usually includes final executables (no shared objects) and static libraries (which usually should be linked without exporting the symbols, as otherwise you might end up breaking stuff badly, if two shared objects link two different versions of the same static library).
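
The check described above is small enough to sketch. This is written in Python rather than ebuild shell purely for illustration; the function name and the `known_compatible` argument are hypothetical, and the version string is assumed to be in the format printed by `gcc -dumpversion`.

```python
def visibility_flags(gcc_version, known_compatible):
    """Return the extra compiler flags for one package.

    `gcc_version` is a dotted version string such as "4.1.1";
    `known_compatible` says whether the package has been vetted for
    hidden visibility (final executables, static libraries).
    """
    major = int(gcc_version.split(".")[0])
    # Only vetted packages get the flag, and only with GCC 4.0 or later.
    if known_compatible and major >= 4:
        return ["-fvisibility=hidden"]
    return []
```

The important design point is that the decision is per package and made by the maintainer, not a global setting the user flips in their own flags.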

By using hidden visibility we should be able to reduce the number of possible symbol collisions, we might be able to reduce the size of the applications, and finally we reduce startup time, as the binary’s symbols are not resolved at runtime by the dynamic linker.

This combined with --gc-sections should also mitigate the problems caused by mishandled convenience libraries. The sad note is that, once this is in use, my script won’t be able to identify the two problems anymore. I’ll have to upgrade it to use DWARF data one day or another.

On a few technical notes, I’ll have to rewrite the script to use PostgreSQL instead of SQLite; SQLite is way too slow and seems to waste too much space even handling my quite reduced set of 1006 packages, and I don’t want to know what might happen to Patrick’s box if he runs it over the extracted binpkgs of his tinderbox. This will use up some of my free time during the week, but having it run on a huge number of packages might really help find more cases like the ones I already documented and opened bugs for.

Considering that this is eating up the little free time I have at the moment (I can’t remember the last time I had time to lie on the bed watching some good anime), I’d quite like at least some feedback from users, not only from developers :) – especially considering the amount of ranting we have witnessed in the last weeks about developers not talking to users enough… I always ask for feedback but rarely get any!

Also my Sony Reader should arrive soon, so I’m considering the idea of buying something from O’Reilly to start looking at things I don’t know yet; if somebody has read these before, I’d gladly take suggestions: Linux System Programming by Robert Love (well, not an area where I know nothing, but it might be interesting to see if Love has any suggestion there that we don’t widely apply yet), Postfix: The Definitive Guide (I know nothing of postfix, so this seems a decent start, and I want to know about it as I’m using it on this server for the xine bug tracker), and CJKV Information Processing (I don’t speak any of those languages, unfortunately, but I find those problems interesting, as they are reflected on a smaller scale in Italian and most European languages too).

Sometimes I think I’m wasting my time

Let’s start with a service communication: this morning I was able to order the V1075 phone on Vodafone’s e-Shop, it’s currently traveling to my home, yuppie :)

Now, on a more related note, this is half a rant, so I apologise in advance, but I feel this way tonight and I want to express it as best I can.

As I said before, I have been working for a while now on getting visibility support into FFmpeg, which allows reducing the number of exported symbols, resulting in a faster load time for xine and similar.

Unfortunately, I hit a big obstacle in submitting these patches upstream, as many of the FFmpeg developers seem convinced that it’s just “GNU crap” that was invented to corrupt the ELF format and so on. Well, apart from the fact that ELF is designed to be extensible, so I think that everyone who tries to improve it is in the right, I don’t think visibility is crap (well, its implementation in GCC < 4.1 was crap, but not the theory itself).

Another “interesting” argument was that dynamic linking is slow by itself so it shouldn’t be improved and instead static linking should always be preferred. Let me discuss this from a security and simplicity standpoint: think of having to rebuild xine-lib, VLC, alsa-plugins, mt-daapd and any other program linking to FFmpeg every time that a security issue is discovered (that happens quite often lately) or you want to change a useflag of FFmpeg… now you can see where it’s useful to have dynamic linking. It’s slower, okay, but it’s more practical.

Anyway, the bottom line is that I’m asked to explain and defend my choice of working on those patches to see if they are worth merging or not, because they seem to be too invasive (granted, I said they were, but maintaining them is not invasive at all). As benchmarking on timing is quite pointless (xine startup is slow on its own, as I already described, and bindings are done lazily, which means that you need to count the time spent during playback, for instance), I’ve written a simple script that parses the output of LD_DEBUG=bindings and gives you statistics about the bindings themselves. It’s not yet complete; I’ll try to complete it soon and release it.
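
For the curious, a parser of that kind can be sketched in a few lines. This is not the Ruby script itself, just an illustration in Python. It assumes the glibc line format "binding file A [n] to B [n]: normal symbol `name'", and it counts every binding line rather than unique symbols, which may or may not match how the original figures were counted.

```python
import re
from collections import Counter

# One binding line as printed by glibc's LD_DEBUG=bindings (assumed format).
BINDING_RE = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"\w+ symbol `(?P<sym>[^']+)'"
)


def binding_stats(lines):
    """Count bindings per requesting object and per providing object.

    Returns three Counters keyed by object path: bindings required by the
    object, bindings resolved from the object, and bindings where the
    object resolved one of its own symbols.
    """
    required = Counter()
    resolved = Counter()
    internal = Counter()
    for line in lines:
        match = BINDING_RE.search(line)
        if match is None:
            continue  # not a binding line (init calls, search paths, ...)
        src, dst = match.group("src"), match.group("dst")
        required[src] += 1
        resolved[dst] += 1
        if src == dst:
            internal[src] += 1
    return required, resolved, internal
```

The three counters correspond to the "required", "resolved" and "in itself" figures: hidden symbols never show up as bindings at all, so a library with good visibility should see its internal count drop towards zero.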

My mail was basically this:

So, as people seem to want some concrete results produced by the visibility
patches I’m working on, I drafted up a parser for the LD_DEBUG=bindings
output, and counted the results (I’ll make the script available in the next
days, give me time to make it something less specific than what I’ve hacked up
in 30 minutes of ruby hacking today).

The output comes out of a xine-ui playing an AVI with ISO MPEG-4 video stream,
and MP3 audio stream (both decoded with FFmpeg plugin). xine-lib CVS HEAD
with visibility enabled in that, too (and external FFmpeg from SVN of
course).

Without visibility patch:

libavcodec required 625 symbols (591 in itself)
libavutil required 10 symbols (3 in itself)
libpostproc required 9 symbols (none in itself)

libavcodec resolved 702 symbols (591 from itself)
libavutil resolved 13 symbols (3 from itself)
libpostproc resolved 3 symbols (none from itself)

-rwxr-xr-x 1 root root 2605648 22 set 19:32 /usr/lib/libavcodec.so
-rwxr-xr-x 1 root root  438528 22 set 19:32 /usr/lib/libavformat.so
-rwxr-xr-x 1 root root   17576 22 set 19:32 /usr/lib/libavutil.so
-rwxr-xr-x 1 root root   32368 22 set 19:32 /usr/lib/libpostproc.so

With visibility patch:

libavcodec required 105 symbols (71 in itself)
libavutil required 7 symbols (none in itself)
libpostproc required 10 symbols (none in itself)

libavcodec resolved 182 symbols (71 from itself)
libavutil resolved 10 symbols (none from itself)
libpostproc resolved 5 symbols (none from itself)

-rwxr-xr-x 1 root root 2558832 22 set 20:56 /usr/lib/libavcodec.so
-rwxr-xr-x 1 root root  426240 22 set 20:56 /usr/lib/libavformat.so
-rwxr-xr-x 1 root root   16984 22 set 20:56 /usr/lib/libavutil.so
-rwxr-xr-x 1 root root   32368 22 set 20:56 /usr/lib/libpostproc.so

I’m not sure why libpostproc required a different number of symbols; different
runs give me values between 3 and 6, it probably depends on which postproc xine
is trying to do depending on system load. The numbers for libavutil and
libavcodec are otherwise constant.

It is a good improvement in the number of bindings required, and please
consider that xine does not use libavformat and I tried only with one video
at a time; the advantage increases when you play more videos of different
formats.
The size difference is minimal, but consider that I built it with -Os on
itself, and that this isn’t C++ where the visibility improves drastically the
size of DSOs.

It seems pretty straightforward to me. The decrease in the number of bindings is significant, and the result is noticeable in the time needed to start the first track in Amarok. Unfortunately, it seems like FFmpeg upstream cares nothing about this. I think the line I should read from the mailing list is “If it does not involve ffmpeg(1) proper or mplayer, patches are not welcome”… so I’ll probably follow the suggestion of Derk-Jan Hartman (of Videolan) and simply send and forget: if they apply the patch, good; if they don’t, I couldn’t care less.
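
To put the figures from the mail in proportion, here is the arithmetic as a tiny sketch. The numbers are copied from the mail; only the helper function is new.

```python
def reduction(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# (before, after) pairs taken from the mail quoted above.
figures = {
    "libavcodec bindings required": (625, 105),
    "libavcodec bindings resolved": (702, 182),
    "libavcodec.so size in bytes": (2605648, 2558832),
    "libavformat.so size in bytes": (438528, 426240),
}

for label, (before, after) in figures.items():
    print(f"{label}: {before} -> {after} ({reduction(before, after):.1f}% less)")
```

So libavcodec needs roughly 83% fewer bindings of its own and serves about 74% fewer overall, while the file sizes shrink by only a couple of percent, which is consistent with the note in the mail that the size difference is minimal.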

So, trying to sum up this experience: I started four days ago with a whole night working on FFmpeg’s code, patching and fixing; I spent the following days working on improving the patch together with Luca, trying to make it less invasive; and for the last two days I worked on proving my point, but I’m asked to prove it in a way that’s just unfeasible (it’s almost impossible to get benchmark results for the time spent on bindings, because it’s spread across the playback of the file).

This bluntly results in 4 days totally wasted, 4 days of useless unpaid work, that I could have probably spent better on doing something else. I don’t think I’ll try to submit a patch to FFmpeg ever again now.

Follow up on the FFmpeg insomniac work

So, as I said, last night I was working once again on getting hidden visibility support into FFmpeg. I worked on this all night and through the morning until 9 am, and went to sleep at about 11 am.

The result of my night of hacking around is a huge patch (117 KiB) that exports as protected all the symbols of the public API, minus the ones it needs to take the address of (those need to have default visibility), and hides all the rest. The result is quite good: the bindings for the libav* libraries on a “false start” of picoxine (asking it to read a nonexistent file) were reduced from 654 to 183. They were 174 with a previous patch, but I was then asked to avoid duplicating some tables, so there are 9 more bindings; they are protected bindings, though, which means they won’t take too much time (they’ll be resolved inside the library itself, not outside).

With this change, the output of callgrind also shows a noticeable (even if very small) reduction in the time spent looking up symbols. It’s a step forward, even if too small to be satisfactory.

Unfortunately there are two main obstacles to drastically reducing that time. The first is that it is unlikely that my patches will be accepted by the FFmpeg developers, as they are a bit invasive when you actually add the visibility, and there seem to be a few developers who doubt the sanity of -fvisibility= (I can understand: on GCC < 4.1 it was badly broken, and even on 4.1 there are a few cases where you can add a text relocation by messing up with #pragma directives). The second is that I cannot start reworking the plugin loading in xine on the CVS HEAD (as 1.1.3 might not be that far away), and branching/merging with CVS is a pain (not that Subversion is all that good at merging; it might be simpler to branch, but merging is still a pain), so that will depend on the move to GIT I’m trying to push for. Unfortunately it seems hard for us to reach a consensus on whether to move the repository, and to what… the only choice I do not like at all is Bazaar-NG over launchpad, and using their bug tracking system: it might be as advanced as they want, as featureful as they want, I do not intend to rely on proprietary Canonical software!

Playing with visibility, once more

Okay, so I decided that now that GCC supports visibility almost decently (although there are a few ways to f* it up with #pragmas), it’s a good moment to start playing with it a bit more to have a bit of improvements here and there.

My first step was obvious, considering what I work on mostly: making ffmpeg respect it :)

The patch was non-trivial, also because #pragmas are broken, but it’s done and I sent it to lu_zero so he can validate it. Apart from the performance improvement, it also helps developers know what they are leaving available to other software. Luckily xine worked fine with it without problems, but VLC needed a patch because it was relying on two internal symbols. The patch now is in patchset 21 in portage.

Update (2017-04-22): there used to be a reference to my overlay for the patch here. The patch is gone and so has my overlay, so I removed the whole section.

I also decided to have a look at adding visibility support to flac; luckily this was really simple, as upstream already handles visibility for Windows DLLs, so I just needed to add a bit of logic and remove a test that is broken (it relies on internal symbols), and now it works, yuhuu… there’s not a huge reduction of symbols, but it’s still something.

I’ll send it upstream now and add a new revision to portage later.

Unfortunately my hacking at this is limited to the free time I have while I think about what to prepare for the course, until after the course (and even after, if I find another job to work on), so don’t count on much improvement about this in the short term.

My next target is libmodplug, just because it’s C++ and the symbol reduction might be drastic, it being not a wrapper like libFLAC++ but a full-featured C++ library; but it also means the work is harder.

Well, time to work, as the night was a bit… sad from a side :/ oh well.

Small and big updates

Okay time for some updates, some of them are small, some a bit bigger.

First of all, I decided the time had come to give KDE’s hidden visibility support another try. I fought with it a lot in the past, and those who have been reading me since then should remember it :)

But many things have changed since then: GCC 4.1 is now released, support is official and (mostly) stable for C++ as well as C, and KDE had enough time to fix the missing symbols during this year, so I decided to start working on a way to enable it for testing without messing around too much. What wasn’t so good was finding another bug in GCC 4.1.1_pre with #pragma GCC visibility…

So here is the first update: the kde eclass now has support for a kdehiddenvisibility useflag, which can be enabled with GCC 4.1 and will enable hidden symbols in KDE. Note that before enabling this flag you need to have GCC 4.1 selected and you need to rebuild Qt 3.3.6-r1 with the visibility patch (again with GCC 4.1).

The reason why this is a useflag is not easy to explain, but trust me, right now you don’t want to use other ways to achieve this. First of all, you cannot pass -fvisibility=hidden to KDE without informing it that you want hidden symbols, or it won’t export the symbols at all and you’ll have useless libraries. Then, you can’t enable -fvisibility=hidden for every package, as you’ll end up with many broken libraries that no longer export their symbols (really just a few packages are ready for that; if you want to improve the situation, remember to send the patches upstream, not to us directly). Also, I don’t trust that support enough to just enable it for everybody, not just yet at least.

This situation might change in the future, if hidden visibility proves stable, so that it will always be enabled if GCC 4.1 is found, but this is not yet the time, so please don’t push on this just yet.

The second update is related to Gentoo/FreeBSD. Thanks to Steve’s KVM, I was able to get defiant up and running, and it’s now going to be my X11 and related testing box, with distcc over the other boxes to make compiling take less time. At this point, I found some time yesterday between work breaks to test Quake3, as arachnist asked me to, and also xine-ui to confirm that it worked well; now you’ll see more games and multimedia packages marked ~x86-fbsd :)

I also added a few packages to the tree to complete the FreeBSD support, like eject-bsd, bsnmp, smbfs and getopt-long. Some of them are packages extracted from the base sources, so that they can be used with other ports, too (bsnmp should work on Linux too if I’m able to complete the port); others are external but specific to FreeBSD or to BSD userland in general. In particular getopt-long was added by coincidence, as it is needed by xmlto, which is a new dependency of opensp.

Now, considering I started adding specific packages to the main tree, that I’m keywording games & multimedia… I can think that Gentoo/FreeBSD is proceeding well :)

Follow my next entry for some thoughts on Google’s Summer of Code ;)

Update: wrong revision, qt is just 3.3.6-r1 :)