F-Spot problems

I like taking photos, although I rarely have nice subjects outside of my garden (I haven’t travelled a lot before, although I intend to fix that in the next few months, for instance by going back to London and somewhere else too). Since I want most of the photos I take to be seen by friends and family, I resolved some time ago to publish all of them on flickr with a pro account, with no upload limits and with full-size photos available as well.

I don’t want to go on explaining why I prefer flickr to Picasa; let’s just leave it at the fact that I can interface with flickr through XSLT without having to implement the GData protocol, okay?

You might then expect that I was pretty upset when, after upgrading to F-Spot 0.6.1.1, I didn’t find the option to upload photos to flickr (in the meantime, even Apple’s iPhoto gained that feature!). Turns out the problem wasn’t really F-Spot’s, but rather a problem with the patch, applied in Gentoo, to “fix” parallel builds (in truth, it was just a hack to provide some serialisation to avoid breaking when using parallel make).

So indeed I was able to fix the Flickr exporter in 0.6.1.1-r1, and that was that: the exporter used a system copy of the FlickrNet .NET library instead of building and using the internal copy, so the keywords were dropped, but in the end it worked fine.

But the way I did the “fix” in Gentoo was as much of a hack as the original “parallel make fix”, so I wanted to rewrite it properly for upstream, giving them something that merged directly and could really be used. When I started doing so, I finally found what broke: the original “fix” stopped make from recursing into the bundled libraries’ directories on all targets, including clean and, more importantly for us, install. So the libraries were built but not installed, and that was our original problem.

The final problem, instead, is that F-Spot bundles libraries! FlickrNet is one of those, but there seem to be more, including Mono.Addins (which is already in the tree, by the way) and gnome-keyring-sharp (also). And as usual, this is a violation of Gentoo policy; bug #284732 was filed, and I’ll try to work on it when I have some time; but before doing that I have to hope that upstream will accept the changes I made up to now, since I would rather not do the work if upstream is going to reject it.

If you’re interested in seeing what I changed, or you’re an F-Spot developer who arrived on my blog, you can find my changes in my git repository, which is available on this server.

I told you so — The expat debacle

A few days before leaving for my vacation (more on that later), I noticed an identi.ca post from tante about XML parsers’ vulnerabilities reported by CERT-FI. Since I was leaving for vacation I didn’t want to pick it up myself, but I nudged our security team about it. Unfortunately this was the preamble to a multi-level fuck up.

When I first saw the advisory, it didn’t name expat at all, not even in passing; it referred to the Python parser, and I remembered that Python uses an internal copy of expat by default. So I was worried, and the worry seems to have been correct: the bug is in the expat code, rather than in the glue to Python, so it is present in all software using expat. Robert was able to reproduce the issue with software that used only libexpat, and not Python; at the time of writing, though, CERT-FI does not include standalone expat in the list of vulnerable software, just listing “Python’s libexpat”.

Indeed, the fix is present in the latest revision of expat in Gentoo’s tree, but up to today that fix had been escalated and pushed without properly going through the security team, which would have meant it wasn’t going to be scheduled for security stabling.

But there is an even greater fuck-up in all this, and those who have been following me for some time are already expecting it: bundled copies of libexpat! Indeed, the library is bundled not only in a bunch of closed-source software but also in a lot of free software packages. The bundled libs bug is a good index of those.

The problem now is to make sure the list is updated, and also to make sure that the vulnerable proprietary software will, hopefully, be handled properly. Unfortunately expat is probably the second most commonly bundled library after zlib, and that has already made me shiver more than a few times at the thought of a vulnerability in it. Well, that time has come.

Now, can somebody really find the unbundling effort still pointless? Seriously?

Security considerations: scanning for bundled libraries

My fight against bundled libraries might soon transcend the implementation limits of my ruby-elf script.

The script I’ve been using to find bundled libraries up to now was not originally designed with that in mind; the idea was to identify colliding symbols between different object files, so as to identify failure cases like xine’s AAC decoder, hopefully before they become a nuisance to users as they did with PHP. Unfortunately the amount of data the script generates because of bundled libraries makes it tremendously difficult to deal with in advance, so it can currently only be used for post-mortems.
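The core idea of the script can be sketched in a few lines of Python; this is a toy reimplementation of the algorithm only (the real script is written in Ruby, reads actual ELF export tables, and filters through a suppression list), and the library and symbol names are made up for illustration:

```python
# Toy sketch of the collision-detection idea: given the symbols each
# object exports, report every symbol defined by more than one object.
from collections import defaultdict

def find_collisions(exports):
    """exports: dict mapping object name -> iterable of exported symbols."""
    owners = defaultdict(list)
    for obj, symbols in exports.items():
        for sym in symbols:
            owners[sym].append(obj)
    # Only symbols with two or more definitions are interesting.
    return {sym: objs for sym, objs in owners.items() if len(objs) > 1}

# Hypothetical data, loosely modelled on the xine/FFmpeg AAC case:
exports = {
    "libavcodec.so": ["avcodec_open", "faad_decode_frame"],
    "libfaad.so":    ["faad_decode_frame", "faad_init"],
}
print(find_collisions(exports))
# {'faad_decode_frame': ['libavcodec.so', 'libfaad.so']}
```

The hard part, as the paragraph above says, is not the algorithm but the volume of noise it produces on a full system’s worth of objects.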

But as a security tool, I already stated it’s not enough, because it only checks for symbols that are exported by shared objects (and, often mistakenly, by executables). To go deeper, one would have to look at one of two options: the .symtab entries in the ELF files (which are stripped out before installing), or the data that the compiler emits for each output file in the form of DWARF sections when building with -g flags. The former can be done with the same tools I’ve been using up to now; the latter can be listed with pfunct from dev-util/dwarves. Trust me, though: if the current database of optimistically suppressed symbols is difficult to deal with, a search using DWARF functions is likely to be unmanageable, at least with the same algorithm that I’m using for the collision detection script.

Being able to identify bundled libraries in that kind of output is going to be tremendously tricky; my collision detection script already finds collisions between executables like the ones from the MySQL (well, before Jorge’s fixes at least) and Samba packages, because they don’t use internally shared libraries; running it against the internal symbols list is going to be even worse, because it would then find equally-named internal functions (usage(), anybody?), static library links (including system support libraries) and so on.

So there is little hope of tackling the issue this way, which makes finding beforehand all the bundled libraries in a system an inhuman task; on the other hand, that doesn’t mean I have to give up on the idea. We can still make use of that data to do some kind of post-mortem, once again with some tricks. When it comes to vulnerabilities, there is usually a function, or a series of functions, involved; depending on how central the functions are in a library, there will be more or fewer applications using the vulnerable codepath. While it’s not extremely difficult to track them down when the function is a direct API (just look for software having external references to that symbol), it’s quite another story with internal convenience functions, since they are called indirectly. For this reason, while some advisories do report the problematic symbols, most of the time they are just ignored.

We can, though, use that particular piece of information to track down extra vulnerable software that bundles the code. I’ve done that on request for Robert a couple of times with the data produced by the collision detection scripts, but unfortunately it doesn’t help much, because it too is only able to check externally-defined API, just like a search for users of the symbol would. How to solve the problem? Well, I could just not strip the files, and read the data from .symtab to see whether the function is defined; this might actually be what I’m going to do soonish. Unfortunately it creates a couple of issues that need to be taken care of.
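The kind of query this would enable can be sketched in Python. Assume we have already dumped each package’s .symtab (e.g. with nm on unstripped files) into a per-package symbol map using nm’s type letters; the package and symbol names here are hypothetical, chosen only to mirror the expat scenario:

```python
# Sketch of the post-mortem search: flag packages that *define* a
# known-vulnerable internal function, not ones that merely reference it.
# In nm's notation, 'T'/'t' mean defined in the text section, 'U' means
# undefined (i.e. just an external reference).

def vulnerable_packages(symtabs, vulnerable_symbol):
    """symtabs: dict mapping package -> {symbol: nm type letter}."""
    hits = []
    for package, symbols in symtabs.items():
        if symbols.get(vulnerable_symbol, "U") in ("T", "t"):
            hits.append(package)
    return hits

symtabs = {
    "dev-libs/expat":      {"doContent": "T", "XML_Parse": "T"},
    "app-misc/bundles-it": {"doContent": "t", "main": "T"},  # static copy, local symbol
    "app-misc/uses-it":    {"XML_Parse": "U", "main": "T"},  # links the system library
}
print(vulnerable_packages(symtabs, "doContent"))
# ['dev-libs/expat', 'app-misc/bundles-it'] — the bundler is caught,
# the package merely linking the system library is not.
```

The point of going through .symtab rather than the dynamic symbol table is exactly the second entry: a bundled static copy shows up as a local (`t`) definition that an export-only scan would never see.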

The first is that the debug data is not exactly small; the second is that the chroots volume is under RAID1, so space is a concern: it’s already 100GB big with just 10% of it free, and if I’m not going to strip data, it will require even more space. I can probably split some of the data out into a throwaway chroots volume that I don’t have to keep on RAID1; if I split the debug data out with the splitdebug feature, it would be quite easy to deal with.

Unfortunately this brings me to the second problem, or rather the second set of problems. The first is that ruby-elf does not currently support the debuglink facilities, but that’s easy to implement; after all, it’s just a note section with the name of the debug file. The second is nastier: the debuglink section created by Portage lists the basename of the file with the debug information, which is basically the same name as the original with a .debug suffix. The reason why the name is not simply left implied is that if you look up the debuglink for libfoo.so you’ll see the real name might be libfoo.so.2.4.6.debug; on the other hand, something is still left implied: the path to find the file in. By default all tools will look at the same path as the executable file, and prepend /usr/lib/debug to it. All well as long as there are no symlinks in the path, but if there are (like on multilib AMD64 systems), it starts to be a problem: accessing a shared object via /usr/lib/libfoo.so will try a read of /usr/lib/debug/usr/lib/libfoo.so.2.4.6.debug, which will not exist (it would be /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug). I have to check whether it’s feasible to use a fully canonicalised path for the debuglink; on the other hand, that assumes the root for the file is the same as the root of the system, which might not be the case. The third option is to use a debugroot-relative path, so that the debuglink would look like usr/lib64/libfoo.so.2.4.6.debug; unfortunately I have no clue how gdb and company would take a debuglink like that, and I have to check.
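The lookup problem can be made concrete with a small sketch. This follows my understanding of the gdb-style search order (next to the binary, in a .debug subdirectory, then under the global debug root); the library name is made up, and the code only does path arithmetic, no filesystem access:

```python
# Sketch of the gnu_debuglink lookup described above: compute where a
# debugger will look for the debug file, given the path the binary was
# referenced through and the basename stored in .gnu_debuglink.
import os.path

def debug_candidates(binary_path, debuglink_name, debug_root="/usr/lib/debug"):
    """Candidate paths, always built from the directory the binary was
    referenced through — not from its canonicalised real path."""
    d = os.path.dirname(binary_path)
    return [
        os.path.join(d, debuglink_name),
        os.path.join(d, ".debug", debuglink_name),
        os.path.join(debug_root, d.lstrip("/"), debuglink_name),
    ]

# On a multilib system where /usr/lib is a symlink to /usr/lib64, the
# binary can be reached via a path that never exists under the debug root:
paths = debug_candidates("/usr/lib/libfoo.so", "libfoo.so.2.4.6.debug")
print(paths[-1])
# /usr/lib/debug/usr/lib/libfoo.so.2.4.6.debug — but splitdebug actually
# installed /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug
```

Canonicalising `binary_path` before splitting off the directory would fix this case, at the cost of assuming the file’s root matches the system root, which is exactly the trade-off discussed above.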

The problem does not stop here, though. Since packages collide with one another when they try to install files with the same name (even when they are not really alternatives), I cannot rely on having all the packages installed in the tinderbox, which actually makes analysing the symbol collision dataset even worse. So I should at least scan the data before the merge to the live filesystem is done, load it into a database indexed on a per-package, per-slot basis, and then search that data to identify the problems. Not an easy or quick solution.

Nor a complete one, to be honest: the .symtab method will not show the symbols that are not emitted, like inlined functions; while we do want the unused symbols to be cut out, we still need the names of static inlined functions, since if a vulnerability is found in one of them, it has to be findable. I should check whether DWARF data is emitted for those at least, but I wouldn’t be surprised if it wasn’t either. And it does not cope at all with renamed symbols, or copied code… So there is still a long way to go before we can actually reassure users that all security issues are tackled when found (and this is not limited to Gentoo, remember; Gentoo is the base tool I use to tackle the task, but the same problems affect basically every distribution out there).

Bundling libraries: the curse of the ancients

I was very upset by one comment from Ardour’s lead developer Paul Davis in a recently reported “bug” about the un-bundling of libraries from Ardour in Gentoo. I was, to be honest, angry after reading his comment, and I was tempted to answer badly for a while; but then I decided my health was more important and backed away, thought about it, then answered how I answered (which I hope is diplomatic enough). Then I thought it might be useful to address the problem in a less concise way and explain the details.

Ardour bundles a series of libraries; as I wrote previously, there are problems related to this, and we dealt with them by simply unbundling the libraries. Now Ardour is threatening to withdraw support from Gentoo as a whole if we don’t back away from that decision. I’ll try to address his comments in multiple parts, so that you can understand why they really upset me.

First problem: the oogie-boogie crashes

It’s a quotation from Adam Savage of MythBusters; watch the show if you want to know the details. I learnt about it from Irregular Webcomic years ago, but I only saw it about six months ago, since in Italy it only airs on satellite pay TV, and the DVDs are not available (which is why they are on my wishlist).

Let’s see what exactly Paul said:

Many years ago (even before Gentoo existed, I think) we used to distribute Ardour without the various C++ libraries that are now included, and we wasted a ton of time tracking down wierd GUI behaviour, odd stack tracks and many other bizarre bugs that eventually were traced back to incompatibilities between the way the library/libraries had been compiled and the way Ardour was compiled.

I think I have now coined a term for my own dictionary, and will call this the syndrome of oogie-boogie bugs, for each time I hear (or find myself muttering!) “we know of past bad behaviour”. Sorry, but without documentation, these things are unprovable myths, just like the one Adam commented upon (the “pyramid power”). I’m not saying these things didn’t happen; far from it, I’m sure they did. The problem is that they are not documented, and thus unprovable, and impossible to dissect and correct.

Also, I’m not blaming Paul or the Ardour team for being superficial, because, believe it or not, I suffer(ed, hopefully) from that syndrome myself: some time ago, I reported to Mart that I had maintainer-mode-induced rebuilds on packages that patched both Makefile.am and Makefile.in, and that thus the method of patching both was not working; while I still maintain that it’s more consistent to always rebuild autotools (and I know I have to write about why that is), Mart pushed me into proving it, and together we were able to identify the problem: I was using XFS for my build directory, which has sub-second mtime precision, while he was using ext3 with mtimes precise only to the second, so I was experiencing difficulties he would never have been able to reproduce on his setup.

Just to show that this goes beyond that kind of problem: since I joined Gentoo, Luca has told me to be wary of suggesting the use of -O0 when debugging, because it can cause stuff to miscompile. I never took his word for it, because that’s just how I am, and he didn’t have any specifics to prove it. Turns out he wasn’t that wrong after all, since if you build FFmpeg with -O0 and Sun’s compiler, it cannot complete the link. The reason is that with older GCC, Sun’s compiler, and others I’m sure, -O0 turns off the DCE (Dead Code Elimination) pass entirely, and causes branches like if (0) to be compiled anyway; FFmpeg relies on the DCE pass always happening. (There is more to say about relying on the DCE pass, but that’s another topic altogether.)

So again, if you want to solve bugs of this kind, you have to just do like the actual Mythbusters: document, reproduce, dissect, fix (or document why you have to do something rather than just saying you have to do it). Not having the specifics of the problem, makes it an “oogie-boogie” bug and it’s impossible to deal with it.

Second problem: once upon a time

Let me repeat one particular piece of the previous quote from Paul Davis (emphasis mine): “Many years ago (even before Gentoo existed, I think)”. How many years ago is that? Well, since I don’t want to track down the data on our own site (I have to admit I find it appalling that we don’t have a “History” page), I’ll go around quoting Wikipedia. If we talk about Gentoo Linux under this very name, version 1.0 was released on March 31, 2002 (hey, that’s almost seven years ago by now). If we talk about Daniel’s project, Enoch Linux 0.75 was released in December 1999, which is more than nine years ago. I can’t seem to confirm Paul’s memories, since their Subversion repository seems to have discarded the history information from when it was in CVS (it reports the first commit in 2005, which is certainly wrong if we consider that Wikipedia puts their “Initial Release” in 2004).

Is anything the same as it was at that time? Well, most likely there are still pieces of code older than that, but I don’t think any of them are in actual use nowadays. There have been, in particular, a lot of transitions since then. Are difficulties found at that time of any relevance nowadays? I sincerely don’t think so. Paul also doesn’t seem to have any documentation of this happening more recently, and just says that they don’t want to spend more time on debugging these problems:

We simply cannot afford the time it takes to get into debugging problems with Gentoo users only to realize that its just another variation on the SYSLIBS=1 problem.

I’ll get back to that statement in more detail in the next problem, but for now let’s accept that there has been no documentation of new cases, and that all we have to go on here is bad history. Let’s try to think about what that bad history was. We’re speaking about libraries, first of all; what does that bring to mind? If you’re an avid reader of my blog, you might remember what actually brought me to investigate bundled libraries in the first place: symbol collisions! Indeed this is very likely; if you remember, I did find one crash in xine due to the use of system FFmpeg, caused by symbol collisions. So it’s certainly not a far-fetched problem.

The Unix flat namespace for symbols is certainly one big issue that projects depending on many libraries have to deal with, and I admit there aren’t many tools that can deal with it. While my collision analysis work has focused up to now on identifying the problem areas, in the big scheme of things it only helps to find possible candidates for collision problems. This actually made me think that I should adapt my technique to identify problems on a much smaller scale, taking one executable as input and identifying duplicated symbols. I just added this to my TODO map.
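That smaller-scale check could look something like this sketch: given the libraries one executable links against, in resolution order, report the symbols defined more than once, since with a flat namespace the first definition silently wins. The library and symbol names are hypothetical, loosely modelled on the xine/FFmpeg case:

```python
# Per-executable duplicate-symbol check: with the Unix flat namespace,
# the dynamic linker binds each symbol to its *first* definition in
# resolution order; any later definition is silently shadowed.

def duplicated_symbols(link_order):
    """link_order: list of (library, exported_symbols) in resolution order.
    Returns {symbol: [libraries defining it, first one wins]}."""
    first_seen = {}
    duplicates = {}
    for lib, symbols in link_order:
        for sym in symbols:
            if sym in first_seen:
                duplicates.setdefault(sym, [first_seen[sym]]).append(lib)
            else:
                first_seen[sym] = lib
    return duplicates

link_order = [
    ("libxine.so",    ["xine_init"]),
    ("libavcodec.so", ["ff_aac_decode"]),               # internal FFmpeg copy
    ("libfaad.so",    ["ff_aac_decode", "faad_init"]),  # hypothetical clash
]
print(duplicated_symbols(link_order))
# {'ff_aac_decode': ['libavcodec.so', 'libfaad.so']}
```

Every symbol reported here is one where the executable gets libavcodec.so’s definition no matter which library the call was “meant” for, which is exactly the class of crash described above.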

Anyway, thinking about the amount of time passed since Gentoo’s creation (and thus since what Paul thinks is when the problems started to happen), we can see that there has been at least one big “event horizon” in GNU/Linux since then (and for once I use this term because it’s proper here): the libc5 to libc6 migration; the HOWTO I’m linking, from Debian, was last edited in 1997, which puts it well within the timeframe that Paul described.

So it’s quite possible that people at the time were using libraries built against one C library with an Ardour built against a different one, which would almost certainly have created subtle issues, difficult to identify (at least for a person not skilled with linkers). And it’s certainly not the only possible cause of similar crashes or, even worse, unexpected behaviour. If we look again at Paul’s comment, he speaks of “C++ libraries”; I know that Ardour is written in C++, and I think I remember some of the libraries being built are written in C++ too. I’m not sure he’s right in calling all of them “C++ libraries” (C and C++ are two different languages, even if the foreign calling convention glue is embedded in the latter), but if even a single one is, it can open a different Pandora’s box.

See, if you look at GCC’s history, it wasn’t long before the Enoch 0.75 release that a huge paradigm shift started for Free Software compilers. The GNU C Compiler, nowadays the GNU Compiler Collection, was forked into the Experimental/Enhanced GNU Compiler System (EGCS) in 1997, which was merged back into GCC with the historic 2.95 release in April 1999. EGCS contained a huge amount of changes, a lot of them related to C++. But even that wasn’t near perfection; for many, C++ support was only really ready for prime time after release 3 at least, so there were wild changes going on at that time. Libraries built with different versions of the compiler might well have had wildly differently built symbols with the same name and, even worse, would have been using different STL libraries. Add to the mix the infamous 2.96 release of GCC as shipped by RedHat, I think the worst faux pas in the history of RedHat itself, with so many bugs due to backporting that a project I was working with at the time (NoX-Wizard) officially declared it unsupported, suggesting the use of either 2.95 or 3.1. We even had an explicit #error if the 2.96 release was used!

A smaller-scale paradigm shift happened with the release of GCC 3.4 and the change from libstdc++.so.5 to libstdc++.so.6, which is what we use nowadays. Mixing libraries using the two ABIs and the two STL versions caused obvious and non-obvious crashes; we still have software using the older ABI, and that’s why we have libstdc++-v3 around. Mozilla, Sun and Blackdown hackers certainly remember that time, because it was a huge mess for them. It’s one very common (and one of my favourite) arguments against the use of C++ for mission-critical system software.

Also, GCC’s backward compatibility is nearly non-existent: if you build something with GCC 4.3, without using static libraries, executing it on a system with GCC 4.2 will likely cause a huge number of problems (forward compatibility is always ensured, though). Which adds to the picture I’ve already painted. And do we want to talk about the visibility problem? (On a different note, I should ask Steve for a dump of my old blog to merge here; it’s boring not remembering that one post was written on the old one.)

I am thus not doubting Paul’s memories at all regarding problems with system libraries and so on. I would also stress another piece of his comment: “eventually were traced back to incompatibilities between the way the library/libraries had been compiled and the way Ardour was compiled”. I understand he might not be referring just to the compiler (and compiler version) used in the build, so I wish to point out two particular GCC options: -f(no-)exceptions and -f(no-)rtti.

These two options enable or disable two C++ language features: exception handling and run-time type information. I can’t find any reference to it in the current man page, but I remember it used to warn that mixing code built with and without them in the same software unit was bad; I wouldn’t expect it to be any different now, sincerely. In general the problem is avoided because each piece of software builds its own independent unit, in the form of an executable or shared object, and the boundary between those is subject to the contract that we call the ABI. Shared libraries built with and without those options are supposed to work fine together (I sincerely am not ready to bet on it, though), but if the lower-level object files are mixed together, bad things may happen, and since we’re talking about computers, they will, at the moment you least want them to. It’s important to note here, for all the developers not expert with linkers, that static libraries (or more properly, static archives) are just a bunch of object files glued together, so linking something statically still means linking lower-level object files together.

So the relevance of Paul’s memories is, in my opinion, pretty low. Sure, shit happened, and we can’t swear that it’ll never happen again (most likely it will), but we can deal with it, which brings me to the next problem:

Third problem: the knee-jerk reaction

Each time some bug happens that is difficult to pin down, it seems like any developer tries to shift the blame. Upstream. Downstream. Sidestream. Ad infinitum. As a spontaneous reflex.

This happens pretty often with distributions, especially with Gentoo, which gives users “too much” freedom with their software, but it happens in general too, and I think it is the most frequent reason for bundling libraries. By using system libraries, developers lose what they think is “control” over their software, which in my opinion is often just sheer luck. Sometimes developers admit that their reason is just the desire to spend as little time as possible working on issues; some other times they try to explicitly shift the blame onto distributions or other projects; but at the end of the day the problem is the same.

Free software is a moving target: you might develop software against one version of a library, not touch the code for a few months while it works great, and then a new version is released and your software stops working. And you blame the new release. You might be right (a new bug was introduced), or you might be wrong (you breached the “contract” called the API: some change happened, something that was not guaranteed to work in any particular way changed the way it worked, and you relied on the old behaviour). In either case, the answer “I don’t give a damn, just use the old version” is a sign of something pretty wrong with your approach.

The Free Software spirit should be the spirit of collaboration. If a new release of a given dependency breaks your software, you should probably just contact the author and try to work out between the two projects what the problem is. If it’s a newly introduced bug, make sure there is a testsuite, and that the testsuite includes a testcase for the particular issue you found; writing testcases for bugs that happened in the past is exactly why testsuites are so useful. If the problem is that you relied on a behaviour that has changed, the author might know how not to rely on it and still have the code work as expected, or might take steps to make sure nobody else tries that (either by improving the documentation or by changing the interface so that the behaviour is not exposed). Bundling the dependency, citing multiple problems, and giving no option is usually not the brightest step.

I’m all for giving working software to users by default, so I can understand bundling the library by default; I just think that it should either be documented why that’s the case, or users should be given a chance not to use it. Someone somewhere might actually be able to find what the problem is. Just give them a chance. In my previous encounter with Firefox’s SQLite, I received a mail from Benjamin Smedberg:

Mozilla requires a very specific version of sqlite that has specific compiled settings. We know that our builds don’t work with later or earlier versions, based on testing. This is why we don’t build against system libsqlite by design.

They know, based on testing, that they can’t work with anything else. What that testing consists of, I still don’t know. Benjamin admitted he didn’t have the specifics, and referred me to Shawn Wilsher, who supposedly had more details, but he never got back to me with them. Which is quite sad, since I was eager to find what the problem was, because SQLite is one of the most frequent sources of oogie-boogie bugs. I even noted before that the problem with SQLite seems to lie upstream, and I still maintain that in this case; while I said before that it’s a knee-jerk reaction, I have also witnessed more than a few projects having problems with SQLite, and I have had my own share of headaches because of it. But this should really start to make us think that maybe, just maybe, SQLite needs help.

But we’re not talking about SQLite here, and trust me that most upstreams will likely help you forward-port your code, fix issues, and so on. Even if you, for some reason I don’t want to talk about now, decided to change the upstream library after bundling it, oftentimes you can get it back to a vanilla state by pushing your changes upstream. I know it’s feasible even with the most difficult upstreams, because I have done just that with FFmpeg, with respect to xine’s copy.

But just so that we’re clear, it does not stop with libraries; the knee-jerk reaction happens with CFLAGS too. If you have many users reporting that wild CFLAGS break your software, the most common reaction is to just disallow custom CFLAGS, while the reasoned approach would be to add a warning and then start identifying the culprit; it might be your code assuming something that is not always true, or it might be a compiler bug. In either case the solution is to fix the culprit, instead of disallowing anybody from making use of custom flags.

Solution: everybody’s share

So far I have dissected Paul’s comment into three main problems; I could probably write more about each of them, and I might if the points are not clear, but this post is already long enough (I didn’t want to split it up because it would have taken too long to become available), and I wanted to reach a conclusion with a solution, which is what I already posted in my reply to the bug.

The solution to this problem is to give everybody their share of the work. Instead of “blacklisting Gentoo” like Paul proposed, they should just do the right thing and leave us to deal with the problems caused by our choices and our needs. I have already pointed out some of these in my three-part article for LWN (part 1, part 2 and part 3). This means that if you get a user reporting some weird behaviour using the Gentoo ebuild, your answer should not be “Die!” but “You should report that to the Gentoo folks over at their bugzilla”. Yes, I know it is a much longer phrase and requires much more typing, but it’s much more user-friendly and actually provides us all with a way to improve the situation.

Or you could also do the humble thing and ask for help. I have said it before: if you have a problem with anything I have written about, and have good documentation of what the problem is, you can write to me. Of course I don’t always have time to fix your issues, and sometimes I don’t even have time to look at them in a timely fashion, I’m afraid, but I have never sent someone away because I didn’t like them. The problem is that most of the time I’m not asked at all.

Even if you might end up asking me some question that would be very silly if you knew the topic, I’m not offended by those; just as I’d rather not be asked to learn all the theory behind psychoacoustics to find out why libfaad is shrieking my music, I don’t expect Paul to know all the ins and outs of linking problems to find out why the system libraries cause problems. I (and others like me) have the expertise to identify a collision problem relatively quickly; I should also be able to provide tools to identify it more quickly still. But if I don’t know of the problem, I cannot magically fix it; well, not always at least.

So Paul, this is an official offer; if you can give me the details of even a single crash or misbehaviour due to the use of system libraries, I’d be happy to look into it.

The battles and the losses

In the past years I have picked up more than a couple of “battles” to improve Free Software quality all over. Some of these were controversial, like --as-needed, and some of them have been just lost causes (like trying to get rid of strict C++ requirements on server systems). All of those, though, were fought with the hope of improving the situation all over, and sometimes the few accomplishments were quite a satisfaction by themselves.

I always thought that my battle for --as-needed support was going to be controversial because it does make a lot of software require fixes, but strangely enough, the opposition has been reduced a lot. Most newly released software works out of the box with --as-needed, although there are some interesting exceptions, like GhostScript and libvirt. Among the positive exceptions there is, for instance, Luis R. Rodriguez, who made a new release of crda just to apply an --as-needed fix for a failure that had been introduced in the previous release. It’s very refreshing to see that nowadays maintainers of core packages like these are concerned with these issues. I’m sure that when I started working on --as-needed, nobody would have made a new point release just to address such an issue.

This makes it much more likely for me to work on adding the new --as-needed warning, and makes it even more important to find out why ld fails to link the PulseAudio libraries even though I’d have expected it to.

Another class of changes I’ve been working on that has drawn more interest than I expected is my work on cowstats which, admittedly out of self-interest, formed most of the changes to the userland part of the packages in the ALSA 1.0.19 release (see my previous post on the matter).

On this note, I wish first to thank _notadev_ for sending me Linkers and Loaders, which is going to help me improve Ruby-Elf more and more; thanks! And since I’m speaking of Ruby-Elf, I finally decided its fate: it’ll stay. My reasoning is, first of all, that I was finally able to get it to work with both Ruby 1.8 and 1.9 by adding a single thin wrapper (which is going to be moved to Ruby-Bombe once I actually finish that); most importantly, the code is there, I don’t want to start from scratch, there is no point in that, and I think that Ruby 1.9 and JRuby can both learn from each other (the first by losing the Global Interpreter Lock, the other by trying to speed up its start-up time). And I could even decide to find the time to write a C-based extension, as part of Ruby-Bombe, that takes care of byteswapping memory, maybe even using OpenMP.

Also, Ruby-Elf has been seeing a lot of use in the collision detection script, which is hard to port to something different since it really is a thin wrapper around PostgreSQL queries, and I don’t really like dealing with SQL in C. Speaking of the collision detection script, I stand by my conclusion that software sucks (but proprietary software stinks too).

Unfortunately, while there are good signs on the issue of bundled libraries, like Lennart’s concerns with the internal copies of libltdl in both PulseAudio (now fixed) and libcanberra (also staged for removal), the whole issue is not solved yet: there are still packages in the tree with a huge amount of bundled libraries, like Avidemux and Ardour, and more scream to enter (and thankfully they don’t always make it). -If you’d like to see the current list of collisions, I’ve uploaded the LZMA-compressed output of my script.- If you want, you can clone Ruby-Elf and send me patches to extend the suppression files, to remove further noise from the file.

At any rate, I’m going to continue my tinderboxing efforts while waiting for the new disks, and work on my log analyser again. The problem with that is that I really am slow at writing Python code, so I guess it would be much easier to either reimplement in Ruby the few extra functions I’m using out of Portage’s interface, or find a way to interface with Portage’s Python interface from Ruby. This is probably a good enough reason for me to stick with Ruby: sure, Python can be faster, and sure, I can get better multithreading with C or Vala, but it takes me much less time to write these things in Ruby than it would in any of the other languages. I guess it’s a matter of mindset.

And on the other hand, if I have problems with Ruby I should probably just find the time to improve the implementation; JRuby is evidence enough that my beef with the Ruby 1.9 runtime not supporting real multithreading is an implementation issue, not a language issue.

Software sucks, or why I don’t trust proprietary closed source software

If you have been following my blog since I started writing, you might remember my post about imported libraries from last January and the follow-up related to OpenOffice; you might also know that I did start some major work toward identifying imported libraries using my collision detection script, and that I postponed it until I had enough horsepower to run the script again.

And this is another reason why I’m working on installing as many packages as possible in my testing chroot. The primary reason was, of course, to test for --as-needed support, but I’ve also been busy checking builds with glibc 2.8, GCC 4.3, and recently glibc 2.9. In addition, the builds are also providing me with some data about imported libraries.

With this simple piece of script, I’m doing a very rough cut analysis of the software that gets installed, to check for the most commonly imported libraries: zlib, expat, bz2lib, libpng, jpeg, and FFmpeg:

    # Scan the installed image (${D}) for marker symbols; the "+"
    # prefix makes scanelf match only *defined* symbols, i.e. files
    # that carry the library's code rather than merely linking to it.
    rm -f "${T}"/flameeyes-scanelf-bundled.log
    for symbol in adler32 BZ2_decompress jpeg_mem_init XML_Parse avcodec_init png_get_libpng_ver; do
        scanelf -qRs +$symbol "${D}" >> "${T}"/flameeyes-scanelf-bundled.log
    done
    if [[ -s "${T}"/flameeyes-scanelf-bundled.log ]]; then
        ewarn "Flameeyes QA Warning! Possibly bundled libraries"
        cat "${T}"/flameeyes-scanelf-bundled.log
    fi

This checks for some symbols that are usually not present without the rest of the library, and although it gives a few false positives, it does produce interesting results. For instance, while I knew FFmpeg is very often imported, and I expected zlib to be copied into every other piece of software, it’s interesting to learn that expat is used almost as much as zlib, and that every time it’s imported rather than used from the system. This goes for both Free and Open Source Software and for proprietary closed-source software. The difference is that while you can fix the F/OSS software, you cannot fix the proprietary software.

What is the problem with imported libraries? The basic one is that they waste space and memory, since they duplicate code already present in the system; but there is also another issue: they create situations where even old, known, and widely fixed issues remain around for months, even years, after they were disclosed. What has preserved proprietary software this well up to this point is mostly the so-called “security through obscurity”. You usually don’t know that the code is there, and you don’t know which codepath uses it, which makes it much harder for novices to identify how to exploit those vulnerabilities. Unfortunately, this is far from being a true form of security.

Most people would now wonder: how can they mask the use of particular code? The first option is to build the library into the software, which hides it from the eyes of the most naïve researchers; since the library is never loaded explicitly, its use cannot be identified through the loading of the library itself. But of course the references to those libraries remain in the code, and indeed most of the time you’ll find the libraries’ symbols defined inside the executables and libraries of proprietary software. Which is exactly what my rough script checks for. I could use pfunct from the seven dwarves to get the data out of DWARF debugging information, but proprietary software is obviously built without debug information, so that would just waste my time. If they used hidden visibility, finding the bundled libraries would be much, much harder.
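To make that last point concrete, here is a small demonstration of mine of what hidden visibility does to this kind of check (assuming gcc; the symbol reuses zlib’s adler32 name purely for illustration):

```shell
cat > fakez.c <<'EOF'
int adler32(void) { return 42; }
EOF

# Default visibility: the symbol lands in the dynamic symbol table,
# where nm -D (and scanelf -s) can see it.
gcc -shared -fPIC fakez.c -o libvisible.so
nm -D libvisible.so | grep ' T adler32'

# Hidden visibility: the symbol is bound internally and never exported,
# so the dynamic symbol table has nothing to scan for.
gcc -shared -fPIC -fvisibility=hidden fakez.c -o libhidden.so
nm -D libhidden.so | grep ' T adler32' || echo "adler32 is not exported"
```

The bundled code is still there, of course, but none of the symbol-table-based checks above would notice it.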

Of course, finding which version of a library is bundled in an open source software package is trivial, since you just have to look for the header that defines the version; expat, though, is often stripped of the expat.h header that contains that information. On proprietary software it’s quite a bit more difficult.
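As a rough illustration of how the version can be pinned down anyway: zlib and libpng both embed their version in a copyright banner inside the object code, so even a stripped binary often gives it away. The function name is mine, and this heuristic obviously misses libraries that don’t embed such strings:

```shell
# Grep a binary's printable strings for the version banners that zlib
# ("deflate <version> Copyright ...") and libpng ("libpng version
# <version>") compile into their object code.
bundled_versions() {
    strings "$1" | grep -E -o \
        'deflate [0-9]+\.[0-9.]+ Copyright|libpng version [0-9]+\.[0-9.]+'
}
```

On a binary bundling zlib 1.2.x, for instance, this prints a line like `deflate 1.2.11 Copyright`.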

For this reason I produced a set of three utilities that, given a shared object, find out the version of the bundled library. As it is, it quite obviously doesn’t work on final executables, but it’s a start at least. Running these tools on a series of proprietary software packages that bundle the libraries caused me some kind of hysteria: lots and lots of software still uses very old zlib versions, as well as old libpng versions. The current status is worrisome.

Now, can anybody really trust proprietary software at this point? The only way I can trust Free Software is by making sure I can fix it, and even there, with so many forks, copies, bundles, and morphings, evaluating the security of the software is difficult; with proprietary software, where you cannot really be sure about the origin of the software, the embedded libraries, and so on, there’s no way I can trust it at all.

I think I’ll try my best to improve the situation of Free Software even when it comes to security; as the IE bug demonstrated, free software solutions like Firefox can be considered working, secure alternatives even by the media, and we should try to play that card much more often.