Future planning for Ruby-Elf

My work on Ruby-Elf tends to happen in “sprees” whenever I actually need something from it that wasn’t supported before — I guess this is true for many projects out there, but it seems to happen pretty regularly with my projects. The other day I prepared a new release after fixing the bug I found while doing the postmortem of a libav patch — and then I proceeded to give another run to my usual collisions check, after noting that I could improve the performance of the regular expressions …

But where is the project heading? Well, I hope I’ll be able to have version 2.0 out before the end of 2013 — in this version, I want to make sure I get full support for archives, so that I can actually analyze static archives without having to extract them beforehand. I’ve got a branch with the code to get access to the archives themselves, but it can only read a file after extracting it. The key to supporting archives would probably be supporting in-memory IO objects, as well as offset-in-file objects.
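
For what it’s worth, the ar container format itself is trivial — the hard part is exposing the members as IO-like objects rather than extracted files. Just to give an idea, here is a standalone sketch (not the code in my branch; the archive name is made up) that walks an archive’s members in plain Ruby:

# Sketch: walk the members of a static archive (.a).
# Format: 8-byte global magic, then per member a 60-byte header
# (16 name, 12 mtime, 6 uid, 6 gid, 8 mode, 10 size, 2 terminator).
File.open("libfoo.a", "rb") do |ar|
  raise "not an ar archive" unless ar.read(8) == "!<arch>\n"
  until ar.eof?
    header = ar.read(60)
    break if header.nil? || header.bytesize < 60
    name = header[0, 16].rstrip
    size = header[48, 10].to_i
    data_offset = ar.pos
    puts "#{name}: #{size} bytes at offset #{data_offset}"
    ar.seek(data_offset + size + size % 2)  # members are 2-byte aligned
  end
end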

I’ve also found an interesting gem called bindata, which seems to provide a decent way to decode binary data in Ruby without having to fully pre-decode it. This would probably be a killer feature for Ruby-Elf: a lot of the time I’m forcibly decoding everything up front, because accessing it on the spot was extremely difficult — so the first big change for Ruby-Elf 2 is going to be handing the task of decoding over to bindata (or, failing that, another similar gem).
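
Just to give a taste of the declarative style (the class and field names here are mine for the example, not Ruby-Elf’s), decoding the first few ELF identification bytes with bindata would look something like this:

require 'bindata'

# Sketch: the start of the ELF identification, declared with bindata.
class ElfIdent < BinData::Record
  string :magic,      :read_length => 4  # "\x7fELF" for a valid file
  uint8  :ei_class                       # 1 = 32-bit, 2 = 64-bit
  uint8  :ei_data                        # 1 = little-endian, 2 = big-endian
  uint8  :ei_version
end

File.open("/usr/bin/env", "rb") do |f|
  ident = ElfIdent.read(f)
  puts "class=#{ident.ei_class} data=#{ident.ei_data}"
end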

Another change that I plan is to drop the current version of the man pages. While DocBook is a decent way to write man pages, and standard enough to be around in most distributions, it’s one “strange” dependency for a Ruby package — and honestly the XML is a bit too verbose at times. For the beefiest man pages, the generated roff page is half as big as the source, which is the opposite of what anybody would expect.

So I’ve pretty much decided that the next version of Ruby-Elf will use Markdown for the man pages — while it does not have the same amount of semantic tagging, and thus I might have to handle some styling in the synopsis manually, using something like md2man should be easy (I’m not going to use ronn because of the old issue with JRuby and rdiscount), and at the same time it gives me a public HTML version for free, thanks to GitHub conversion.
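
For illustration, a Markdown man page in the ronn-style dialect would start out like this (md2man’s exact header conventions might differ slightly):

# rbelf-nm(1) -- list symbols from ELF files

## SYNOPSIS

`rbelf-nm` [<option>...] <file>...

## DESCRIPTION

`rbelf-nm` lists the symbols defined in the given ELF files, one per
line, with a one-letter code describing each symbol's kind.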

Finally, I really hope that by Ruby-Elf 2 I’ll be able to get at least a symbol demangler for the Itanium C++ ABI — that is the one used by modern GCC; yes, it was originally specified for the Itanic. Working toward supporting the full DWARF specification is on the back of my mind, but I’m not very convinced right now, because it’s huge. Also, if I were to implement it I would then have to rename the library to Dungeon.
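
To give an idea of the scale of the task, even the most trivial corner of the mangling grammar takes some care; a toy sketch that only handles plain, non-nested function names with builtin parameter types might look like this:

# Toy sketch of Itanium C++ ABI demangling, covering only the most
# trivial case: _Z + length-prefixed name + builtin parameter types.
BUILTINS = { "v" => nil, "i" => "int", "c" => "char",
             "d" => "double", "b" => "bool" }

def demangle(sym)
  return sym unless sym =~ /\A_Z(\d+)/
  digits = $1
  name = sym[2 + digits.size, digits.to_i]
  rest = sym[(2 + digits.size + digits.to_i)..-1] || ""
  # anything fancier than a builtin type: give up and return as-is
  params = rest.chars.map { |c| BUILTINS.fetch(c) { return sym } }
  "#{name}(#{params.compact.join(', ')})"
end

puts demangle("_Z3foov")   # => foo()
puts demangle("_Z3addii")  # => add(int, int)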

Are -g options really safe?

Tonight feels like the night after a very long day. But it was just half a day, spent trying to find the end of a bug saga that started about a month ago for me.

It starts like this: postgresql-server started failing to build; the link editor – BFD-based ld – reported that one of the static archives installed by postgresql-base didn’t have a proper index, which should have been generated by ranlib. But simply executing ranlib on said file didn’t solve the problem.

I originally blamed the build system of PostgreSQL, but when I launched an emerge -e world yesterday to rebuild everything with GCC 4.6, another package failed in the same way: lvm2, linking to /usr/lib64/libudev.a — since I know the udev build system very well, almost as if I wrote it myself, I trusted that the archive was built correctly, so it was time to look at what the real problem was.

After poking around a bit, I found that binutils’s nm, objdump and, at that point, even ld refused to display information for some relocatable objects (ET_REL files). This would have made it very difficult to debug the issue if not for two things: first, eu-nm could read the file just fine, and second, my own home-cooked nm.rb tool, which I wrote to test Ruby-Elf, reported issues with the file — but without exploding.

flame@yamato mytmpfs % nm dlopen.o
nm: dlopen.o: Bad value
flame@yamato mytmpfs % eu-nm -B dlopen.o 
0000000000000000 n .LC0
                 U _GLOBAL_OFFSET_TABLE_
                 U __stack_chk_fail
                 U dlclose
                 U dlerror
                 U dlopen
00000000000001a0 T dlopen_LTX_get_vtable
                 U dlsym
                 U lt__error_string
                 U lt__set_last_error
                 U lt__zalloc
0000000000000000 t vl_exit
00000000000000a0 t vm_close
0000000000000100 t vm_open
0000000000000040 t vm_sym
0000000000000000 d vtable
0000000000000000 n wt.1af52e75450527ed
0000000000000000 n wt.2e36542536402b38
0000000000000000 n wt.32ec40f73319dfa8
0000000000000000 n wt.442ae951f162d46e
0000000000000000 n wt.90e079bbb773abcb
0000000000000000 n wt.ac43b6ac10ce5688

I don’t have the original output from my tool, since I have since fixed it, but the issues were related, as you can guess from that output, to the various wt. symbols at the end of the list. Where do they come from? What does the ‘n’ code they are marked with mean? And why is BFD failing to deal with them? I set out to find those answers with, well, more than a hunch of what the problem would turn out to be.

So what are those symbols? Google doesn’t help at all here, since searching for “wt”, even enclosed in double quotes, turns up only results for “weight”. Yes, I know it is a way to shorten that word, but what the heck, I’m looking for an exact string! The answer, actually, is simple: they are additional debug symbols added by -gdwarf-4, which selects the latest DWARF format revision. This was implemented in GCC 4.6, and is supposed to reduce the size of the debug information while including more of it, which is generally positive.

Turns out that libbfd (the library that implements all the low-level access for nm, ld and the other utilities) doesn’t like those symbols — not sure if it’s the sections they are defined in, their type (which is set to STT_NONE), or something else, but it doesn’t like them at all. Interestingly enough, this does not happen with final executables and dynamic libraries, which makes it at least bearable: fewer than 40 packages had to be rebuilt on my system because they had broken static objects; unfortunately one of those was LibreOffice, d’oh!

Now, let’s look back at the nm issue, though: when I started writing Ruby-Elf, I decided not to reimplement the whole suite of ELF tools, since there are already quite a few implementations of those out there. But I did write an nm tool to debug my own issues — it also worked quite nicely, because implementing access to the codes used by nm allowed me to use the same output in my elfgrep tool to show the results. This implementation, which was never actually ported to the tools framework I wrote for Ruby-Elf, didn’t get installed; it was just part of the repository for my own use.

But after noticing that my version is more resilient than binutils’s, and produces more interesting output than elfutils’s, I decided to rework it and make it available as rbelf-nm, writing a man page and listing the codes for the various symbol kinds. Before all this, though, I also rewrote the code-choice function. Before, it relied on binding types and then on section names to produce the right code; now it relies on the symbol type, the binding type, and the sections’ flags and type, making it as resilient as elfutils’s version and as informative as binutils’s, at least for what I have encountered so far.
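
The gist of the new logic, boiled down to a standalone sketch with plain Ruby values as inputs (this is hypothetical, not Ruby-Elf’s actual API, and the real rbelf-nm covers many more cases):

# Sketch of nm-style code choice driven by the symbol type, the
# binding and the section's flags and type, instead of section names.
def nm_code(sym_type, sym_binding, section)
  return "U" if section.nil?                  # undefined symbol
  return "C" if section == :common            # common symbol

  code =
    if sym_type == :func || section[:flags].include?(:execinstr)
      "t"                                     # text
    elsif !section[:flags].include?(:alloc)
      "n"                                     # not allocated: debug sections
    elsif section[:flags].include?(:write)
      section[:type] == :nobits ? "b" : "d"   # .bss-like vs .data-like
    else
      "r"                                     # read-only data
    end

  sym_binding == :local ? code : code.upcase
end

# The wt.* symbols live in non-allocated sections, so they come out
# as "n" even though their type is STT_NONE:
nm_code(:none, :local, { :flags => [], :type => :progbits })  # => "n"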

I also released a new version (1.0.6.1, don’t ask!) that includes the new tool; it is already on RubyGems and in Portage if you wish to use it. Please remember that the project has a Flattr page, so if you like the project, your support is definitely welcome.

Security considerations: scanning for bundled libraries

My fight against bundled libraries might soon transcend the implementation limits of my ruby-elf script.

The script I’ve been using to find the bundled libraries was not originally designed with that in mind; the idea was to identify colliding symbols between different object files, so as to identify failure cases like xine’s AAC decoder, hopefully before they become a nuisance to users, as happened with PHP. Unfortunately the amount of data the script generates because of bundled libraries makes it tremendously difficult to deal with in advance, so it can currently only be used for post-mortems.

But as a security tool, I already stated it’s not enough, because it only checks for symbols that are exported by shared objects (and often mistakenly by executables). To actually go deeper, one would have to look at one of two options: the .symtab entries in the ELF files (which are stripped out before installing), or the data that the compiler emits for each output file in the form of DWARF sections, with -g flags. The former can be done with the same tools I’ve been using up to now; the latter you can list with pfunct from dev-util/dwarves. Trust me, though: if the current database of optimistically suppressed symbols is difficult to deal with, a search using DWARF functions is likely to be unmanageable, at least with the same algorithm that I’m using for the collision detection script.
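
The .symtab side of this is at least mechanically easy; a rough sketch of harvesting defined symbols from unstripped files by shelling out to GNU nm (the glob here is just a sample; a real scan would cover far more of the system) could be:

require 'set'

# Sketch: map each defined symbol to the set of files defining it.
defined_in = Hash.new { |h, k| h[k] = Set.new }

Dir.glob("/usr/lib64/*.so*") do |path|
  `nm --defined-only #{path} 2>/dev/null`.each_line do |line|
    fields = line.split            # address, code, name
    defined_in[fields[2]] << path if fields.size == 3
  end
end

# Symbols defined in more than one file are collision candidates.
defined_in.each do |symbol, paths|
  puts "#{symbol}: #{paths.to_a.sort.join(' ')}" if paths.size > 1
end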

Being able to identify bundled libraries in that kind of output is going to be tremendously tricky; if my collision detection script already finds collisions between executables like the ones from MySQL (well, before Jorge’s fixes at least) and Samba packages, because they don’t use internally shared libraries, running it against the internal symbols list is going to be even worse, because it would then find equally-named internal functions (usage() anybody?), statically linked libraries (including system support libraries) and so on.

So there is little hope of tackling the issue this way, which makes finding beforehand all the bundled libraries in a system an inhuman task; on the other hand, that doesn’t mean I have to give up on the idea. We can still make use of that data to do some kind of post-mortem, once again, with some tricks though. When it comes to vulnerabilities, you usually have a function, or a series of functions, that are involved; depending on how central the functions are to the library, there will be more or fewer applications using the vulnerable codepath. While it’s not extremely difficult to track them down when the function is a direct API (just look for software having external references to that symbol), it’s quite another story with internal convenience functions, since they are called indirectly. For this reason, while some advisories do report the problematic symbols, most of the time this information is just ignored.
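
The direct-API case really is simple; a sketch of looking for external references to a symbol named in an advisory (the symbol and the path are just examples) would be:

# Sketch: find binaries carrying an undefined (external) reference to
# a known-vulnerable function; the symbol name would come from the
# advisory, png_handle_tRNS is only an example here.
VULNERABLE = "png_handle_tRNS"

Dir.glob("/usr/bin/*") do |path|
  undef_syms = `nm -D --undefined-only #{path} 2>/dev/null`
  puts path if undef_syms =~ /\b#{Regexp.escape(VULNERABLE)}\b/
end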

We can, though, use that particular piece of information to track down extra vulnerable software that bundles the code. I’ve been doing that on request for Robert a couple of times with the data produced by the collision detection scripts, but unfortunately it doesn’t help much, because it too is only able to check the externally-defined API, just like a search for use would. How to solve the problem? Well, I could simply not strip the files, and read the data from .symtab to see whether the function is defined; this might actually be what I’m going to do soonish. Unfortunately it creates a couple of issues that need to be taken care of.

The first is that the debug data is not exactly small; the second is that the chroots volume is on RAID1, so space is a concern: it’s already 100GB big with just 10% of it free, and if I am not to strip data, it’s going to require even more space. I can probably split some of the data out into a throwaway chroots volume that I don’t have to keep on RAID1; if I split the debug data out with the splitdebug feature, that would make it quite easy to deal with.

Unfortunately this brings me to the second problem, or rather the second set of problems. First, ruby-elf does not currently support the debuglink facilities; that’s easy to implement, after all it’s just a small section carrying the name of the debug file. The second is nastier, and relates to the fact that the debuglink section created by Portage lists the basename of the file with the debug information, which is basically the same name as the original with a .debug suffix. The reason why this is not simply left implicit is that if you look up the debuglink for libfoo.so you’ll see the real name might be libfoo.so.2.4.6.debug; on the other hand, it’s far from trivial since it still leaves something implicit: the path to find the file in. By default all tools will take the same path as the executable file, and prepend /usr/lib/debug to it. All well as long as there are no symlinks in the path, but if there are (like on multilib AMD64 systems), it starts to be a problem: accessing a shared object via /usr/lib/libfoo.so will try a read of /usr/lib/debug/usr/lib/libfoo.so.2.4.6.debug, which will not exist (it would be /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug). I have to check whether it’s feasible to use a full canonicalised path for the debuglink; on the other hand, that would assume that the root for the file is the same as the root of the system, which might not be the case. The third option is to use a debugroot-relative path, so that the debuglink would look like usr/lib64/libfoo.so.2.4.6.debug; unfortunately I have no clue how gdb and company would take a debuglink like that, and I have to check.
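
For the record, the section’s payload itself is trivial: a NUL-terminated file name, padded to a four-byte boundary, followed by a CRC32 of the debug file. A sketch that extracts and parses it, relying on objcopy to dump the section (the library path at the bottom is hypothetical):

require 'tempfile'

# Sketch: read the .gnu_debuglink payload out of an ELF file.
def read_debuglink(path)
  Tempfile.create("debuglink") do |tmp|
    system("objcopy", "-O", "binary",
           "--only-section=.gnu_debuglink", path, tmp.path)
    data = File.binread(tmp.path)
    return nil if data.empty?
    name = data.unpack("Z*").first
    crc_at = (name.bytesize + 1 + 3) & ~3    # pad name+NUL to 4 bytes
    # the CRC uses the ELF file's endianness; little-endian assumed here
    crc = data[crc_at, 4].unpack("V").first
    [name, crc]
  end
end

link = read_debuglink("/usr/lib64/libfoo.so.2.4.6")
puts "debug file: #{link[0]}, crc32: 0x#{link[1].to_s(16)}" if link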

The problem does not stop here, though: since packages collide with one another when they try to install files with the same name (even when they are not really alternatives), I cannot rely on having all the packages installed in the tinderbox, which actually makes it even worse to analyse the symbol collisions dataset. So I should at least scan the data before the merge on the live filesystem is done, load it into a database indexed on a per-package, per-slot basis, and then search that data to identify the problems. Not an easy or quick solution.
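
Nothing fancy would be needed on the database side, though; a minimal sketch of the per-package, per-slot index with the sqlite3 gem (the table layout is just my guess at what would be needed):

require 'sqlite3'

db = SQLite3::Database.new("symbols.db")
db.execute(<<-SQL)
  CREATE TABLE IF NOT EXISTS symbols (
    package TEXT, slot TEXT, path TEXT, symbol TEXT
  )
SQL
db.execute("CREATE INDEX IF NOT EXISTS by_symbol ON symbols (symbol)")

# Filled in at scan time, before the merge to the live filesystem:
db.execute("INSERT INTO symbols VALUES (?, ?, ?, ?)",
           ["sys-libs/zlib", "0", "/usr/lib64/libz.so.1", "adler32"])

# Queried when an advisory names a function:
db.execute("SELECT DISTINCT package FROM symbols WHERE symbol = ?",
           ["adler32"]).each { |row| puts row.first }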

Nor a complete one, to be honest: the .symtab method will not show the symbols that are not emitted, like inlined functions. While we do want the unused symbols to be cut out, we still need the names of static inlined functions, since if a vulnerability is found in one of them, it has to be findable. I should check whether DWARF data is emitted for those at least, but I wouldn’t be surprised if it wasn’t either. And this approach does not cope at all with renamed symbols, or copied code… So, there is still a long way to go before we can actually reassure users that all security issues are tracked down when found (and this is not limited to Gentoo, remember; Gentoo is the base tool I use to tackle the task, but the same problems involve basically every distribution out there).

Some more Gentoo activity

Even though I’m spending most of my time working on paid jobs, I’ve become active in Gentoo again, although lately I’m mostly doing test-building for --as-needed. Yamato, with its 8-core horsepower, is still building the tree; when I left it, before going to my bedroom to relax a bit, it was finishing the games-arcade category. I’ve been committing the most trivial stuff (missing deps, broken patches, and the like), and opening bugs for the rest. The result is that the “My Bugs” search on Gentoo’s Bugzilla reports over one thousand bugs.

Also, tonight I spent the night fixing the patches currently in the tree so that absolute paths are replaced by relative ones, since epatch now fails if you try to apply a patch that has absolute paths (because when they don’t add up, you’d be introducing subtle bugs that might apply to users but not to you). The result has been an almost tree-wide commit spree that sanitised the patches so that they won’t fail to apply for users. It was a long, boring, manual job, but it is done, and of that I’m happy.
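
To show the difference, here is a made-up example of the kind of header that epatch now rejects, followed by the relative form it gets rewritten to:

--- /var/tmp/portage/app-misc/foo-1.0/work/foo-1.0/src/main.c
+++ /var/tmp/portage/app-misc/foo-1.0/work/foo-1.0/src/main.c

versus:

--- a/src/main.c
+++ b/src/main.c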

But it’s not all well. As Mike (vapier) pointed out, trying to just report all failures related to -Wl,--no-undefined is going to produce a huge amount of bugspam for false positives. In cases like zsh plugins, you really can’t do much more than pass -Wl,--undefined to disable the previous option, which makes -Wl,--no-undefined too much of a hassle to be usable in the tree. On the other hand, it’s still a useful flag to ask upstream to adopt for their own builds, so that there are no undefined symbols. I think I’ll double-check all the software I contribute to, to add this flag (as a special exception, this flag needs to be used only on Linux, since on other systems it might well be a problem: for instance on BeOS dynamic patching is more than likely to cause problems, and on FreeBSD the pthread functions are not usually linked into libraries).

This, and the largefile problem I wrote about, bring me to wonder what we can do to improve even further the symbiosis between Gentoo and the various upstreams. I’m sure there are tons of patches in the tree that haven’t been sent upstream, and I’m afraid --as-needed patching will cause even more to be introduced. I wonder if there could be volunteers who spend time checking, package by package, that the patches are sent upstream, and checking the Unmaintained Free Software wiki so that, if a package is not maintained by upstream anymore, there are references to our patches in case somebody wants to pick it up.

I could be doing this myself, but it takes time, and lately I haven’t had much; I could try to push myself further, but I currently don’t see much point, since I sincerely have had very little feedback from users lately, beside the small, stable group of users whom I esteem very much and who are always around when I need help. Even just a kudo on Ohloh would be nice, to know my work is appreciated, you know. Anyway, if you’re interested in helping with submitting stuff upstream, please try to contact me, so I can see to writing down upstream references in the patches that we have in the tree.

Also, since I started working “full time” on --as-needed issues, I had to leave behind some things, like closing some sudo bugs and some PAM issues, like the one with OpenSSH and double lastlogin handling. I hope to resume those as soon as I free up some of my time from paid jobs (hopefully with some money to spare to pay for Yamato, which still isn’t completely paid for, and which at this pace is going to need new disks soon enough, considering the amount of space that the distfiles archive, as well as the built chroot, take). I actually hope that Iomega will send the replacement for my UltraMax soon, since I wanted to move music and video onto that external drive to free up a few hundred gigabytes on the internal drives.

Once the build is completed, I’ll also have to spend time optimising my ruby-elf tools that identify symbol collisions. I haven’t run the tools in almost a year, and I’m sure that with all the stuff I merged in that chroot, I’m going to have more interesting results than the ones I had before. I already started looking for internal libraries, although just on account of exported symbols, which is, as you probably know, a very bland way to identify them. A much more useful way to identify them is by looking at the DWARF debug data in the files, with utilities like pfunct from the dwarves package. I haven’t built the chroot with debug information, though, since otherwise it would have required much more on-disk space, and the system is already a few tens of gigabytes, without counting Portage, distfiles, packages, logs or build directories.

In the meantime, even with the very bland search, I already found a few packages that bundle zlib, one of which bundles an old version of zlib (1.2.1) which, as far as I remember, is vulnerable to some security issues. Once again, having followed policy would have avoided the problem altogether, just like SDL bundling its own versions of Xorg libraries, which can now make xorg-server crash when an SDL application is executed. Isn’t it pure fun?

At any rate, for tonight I’m off, I did quite a lot already, it’s 4.30 am and I’m still not sleepy, something is not feeling right.

Using dwarves for binaries’ data mining

I’ve written a lot about my linking collisions script, which also shows the presence of internal copies of libraries in binaries. What might not be understood is that this is just a side effect: the primary purpose of my script is not to find the included libraries, but rather to find possible collisions between two pieces of software with identical symbols and no link between them. This is what I found in Ghostscript bug #689698 and poppler bug #14451. Those are really bad things to happen, and that was my first reason for writing the script.

One reason why this script cannot be used with discovery of internal copies of libraries as its main function is that it will not find internal copies of libraries if they have hidden visibility, which is a prerequisite for properly importing an internal copy of whatever library (leaving aside the fact that such an import is not a good idea in the first place).

To find internal copies of libraries, the correct thing to do is to build all packages with almost full debug information (so -ggdb), and use the DWARF data in them to find the definitions of functions. These definitions won’t disappear with hidden visibility, so they can be relied upon.

Unfortunately, parsing DWARF data is a very complex matter, and I doubt I’ll ever add DWARF parsing support to ruby-elf, not unless I can find someone else to work with me on it. But there is already a toolset that you can use for this: dwarves (dev-util/dwarves). I haven’t written a harvesting and analysis tool yet, and at the moment I’m just wasting a lot of CPU cycles scanning all the ELF files for single functions, but I’ll soon write something for that.

The pfunct tool in dwarves allows you to find a particular function in a binary file. I ran pfunct over all the ELF files in my system, looking for two functions so far: adler32 and png_free. The first is a common function from zlib, the latter is, obviously, a common function from libpng. Interestingly enough, I found two more packages that use an internal copy of zlib (one of which is included in an internal copy of libpng): rsync and doxygen.

It’s interesting to see how a base system package like rsync suffers from this problem. It means that it’s not just uncommon libraries being bundled by rarely used programs: widely known and accepted software also includes omnipresent libraries like zlib.

I’m now looking for internal copies of popt, which I’ve also seen imported more than a couple of times by software (cough distcc cough), and which is a dependency of system packages already. The problem is that DWARF parsing is slow, and it takes pfunct a long time to scan the whole system. That’s why I should use a separate harvesting script and an analysis script.
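
What I have in mind for the harvesting script is roughly this: run pfunct once per file, cache the prototypes, and grep the cache afterwards instead of re-running pfunct per function. A sketch (assuming pfunct’s default output of one prototype per line, and a sample glob rather than a whole-system scan):

require 'set'

cache = {}

# Harvest pass: one pfunct run per file, results kept in memory.
Dir.glob("/usr/lib64/*.so*") do |path|
  protos = `pfunct #{path} 2>/dev/null`.lines.map(&:strip)
  cache[path] = Set.new(protos) unless protos.empty?
end

# Analysis pass: grep the cache for as many functions as needed.
%w[adler32 png_free poptGetContext].each do |wanted|
  cache.each do |path, protos|
    puts "#{wanted}: #{path}" if protos.any? { |p| p.include?(wanted) }
  end
end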

Oh well, more analysis for the future :) And eliasp, when I’ve got this script done, then I’ll likely accept your offer ;)