Bundling libraries for despair and insecurity

I’ve been told quite a few times that my posts tend to be too long, and boring for most basic users, so for a time I’ll try to use the “extended content” support from Typo, and see how people react. What this means is that the blog post is just summarised on feeds, and on aggregators like Planet, while the complete text can be read by accessing the article directly on my blog.

When I started my work reporting bundled libraries almost a year ago, my motivation had a lot to do with sharing code, and only marginally with the security issues related to bundled libraries. I had, of course, first-hand experience with the problem, since xine-lib had (and in part still has) a lot of bundled libraries. When I took over its maintainership in Gentoo, it was largely breaching policy, and the number of issues I had with that was huge. With time, and coordination with upstream (to the point of me becoming upstream), the issues were addressed, and nowadays most of xine-lib’s bundled libraries are ignored in favour of the system copies (where possible; some were modified so heavily that the system copy cannot replace them, and that’s still something we’re fighting with). The 1.2 branch of xine-lib no longer has an FFmpeg copy at all; it always uses the system copy (or possibly a properly built static copy).

But lately I have started to see that what is obvious to me about the problems with bundled copies of libraries is not obvious to all developers, and even less obvious to “power users” who proxy-maintain ebuilds and just want them to work for themselves, rather than to comply with Gentoo policies and standards. Which is why I think that sunrise and other overlays should always be scrutinised carefully before being added to a system.

At any rate, for this reason I’m going to explain in this post why you should not use bundled internal copies of libraries for packages added to Gentoo, and why in particular these packages should not be deemed stable at all.

The first issue to discuss is why upstreams bundle libraries, since knowing the reasoning behind that is often helpful to identify whether it makes sense to keep them at all. The first, most obvious answer is: to reduce dependencies. For a long time this was the major reason behind xine-lib’s use of internally bundled libraries. As it turns out, with time this reason became moot: distributions started packaging xine-lib directly, reducing the number of users wishing to build it from sources; and even those wanting to build xine-lib from sources would find all the needed libraries in their distribution, most of the time. When this is the sole reason for bundling libraries, upstream should be quite open to adding a configure option (or anything similar) to use the system copy, optionally or by default, with a fallback to the bundled copy.
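To make the “system copy by default, bundled copy as fallback” idea concrete, here is a minimal sketch of the decision a build script could take. It is not how xine-lib or any particular project does it; the function names (`parse_version`, `probe_system_version`, `choose_copy`) and the `allow_bundled` switch are made up for illustration, and the only external tool assumed is `pkg-config` with its `--modversion` option.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn '1.2.11' into (1, 2, 11).

    Trailing non-numeric characters in a component are ignored,
    so '1.2.3-r1' becomes (1, 2, 3).
    """
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def probe_system_version(module):
    """Ask pkg-config for the installed version; None if it cannot tell."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def choose_copy(module, minimum, allow_bundled=True, probe=probe_system_version):
    """Return 'system' or 'bundled', always preferring the system copy.

    'probe' is injectable so the decision can be exercised without
    pkg-config being installed (an assumption made for this sketch).
    """
    found = probe(module)
    if found is not None and parse_version(found) >= parse_version(minimum):
        return "system"
    if allow_bundled:
        return "bundled"
    raise RuntimeError(
        f"{module} >= {minimum} not found and the bundled copy is disabled"
    )
```

A distribution would call the equivalent of `choose_copy("zlib", "1.2.3", allow_bundled=False)`, so that a missing system library makes the build fail loudly instead of silently falling back to the internal copy.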

A second reason might be that the library’s API is unstable; this is probably the main reason why FFmpeg is often bundled in software rather than used from the system. While this concern makes more sense than the previous one, it’s still mostly a moot point, since it really just requires fixing the issue at the source: get the original project to maintain API compatibility, to provide an API compatibility layer, or to finalise its API. Even when it cannot be helped, because the API is in flux, maintained software fears not the API break; it might be a bit of a hassle, but in general it’s feasible to keep the software and the library in sync.

The third reason, and the more worrisome one, is that the library has been modified, slightly or heavily, after bundling; in this case using the system copy might be quite a burden, because it will lack the specific changes made by the project. There is a lot of work involved here, sometimes more than distributions can take care of, and it requires coordination between the project’s upstream and the library’s upstream. This is what happened with xine-lib and FFmpeg: the copy in the xine-lib sources was heavily modified to suit both the build system and the interface requirements of xine, which also made it very difficult to update the internal snapshot. All the needed interface changes have since been pushed upstream to FFmpeg, and the build system changes were made moot by using FFmpeg’s default build system (with the needed changes pushed upstream) embedded in autotools; after that, FFmpeg was removed from xine-lib’s sources entirely.

Now, on the other hand, the disadvantages of using bundled libraries are probably worse: code duplication means that there is more data to process (both at build time, to compile, and at load time, to keep in memory), more space is used by the binaries, and duplicated bugs need to be fixed twice. Many times in xine-lib, problems with decoding something through FFmpeg were solved by just using a newer FFmpeg; why keep a bundled one, then?

The most important issue, though, is security: when a vulnerability is found in a library like zlib, fixing the library alone is not enough. While that fixes the majority of the software on a system, it’s not going to fix the programs that bundle it, whether closed-source or open source. For instance, take dzip: it uses an ancient internal version of zlib; if somebody knows the format well enough, it’s far from impossible to craft a dzip file containing a deflated stream that can execute malicious code.
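One rough way to spot an embedded zlib in an already built binary is to look for the copyright strings that, as far as I know, zlib’s deflate.c and inflate.c compile into the object code (something like “ deflate 1.2.3 Copyright 1995-2005 Jean-loup Gailly ”). The sketch below is only a heuristic under that assumption: the function names are made up for illustration, a hit on the system’s own libz is expected, and the absence of a hit proves nothing, since the string may have been stripped or changed.

```python
import re

# Assumed marker format: "<deflate|inflate> <version> Copyright ...",
# as embedded by zlib's deflate.c and inflate.c.
ZLIB_MARKER = re.compile(rb"(deflate|inflate) (\d+(?:\.\d+)+) Copyright")


def embedded_zlib_versions(data):
    """Return the sorted, de-duplicated zlib versions whose markers appear in data."""
    versions = {m.group(2).decode("ascii") for m in ZLIB_MARKER.finditer(data)}
    return sorted(versions)


def scan_file(path):
    """Read a binary from disk and report the zlib versions it seems to embed."""
    with open(path, "rb") as handle:
        return embedded_zlib_versions(handle.read())
```

Running `scan_file()` over an executable that is not supposed to contain zlib at all, and getting back an old version number, is a good hint that the package needs a closer look.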

For this latter issue alone, I’d say that any software bundling library source code is not good enough to go stable on its own. Of course sometimes one has to bend the rules because of past mistakes; for instance, even though Ruby bundles stuff, we cannot stop newer versions from going stable, since the problem is not a regression. But we should stop other broken software from entering portage, or at least the stable tree.

But it’s not just security: subtle bugs might actually be quite a problem too. For instance, you might remember all Java applications failing when libX11 was built with XCB support some time ago. The failure was due to some stricter checks in libxcb compared to what libX11 had been checking before, but the source of the problem was Xinerama. Sun bundled an internal copy of the libXinerama sources in the JRE sources, and even though libXinerama had since been fixed for that particular issue (the crash with XCB), the copy in the JRE was never updated before the issue became a nuisance for users.

A very similar issue, also involving X11 (just by chance, it’s not that all the issues involve X11) is this particular bug in Xorg that is triggered when launching SDL-based applications, because libSDL bundles ancient versions of X11 libraries.

As I said earlier, unbundling is rarely easy; there are subtle issues to check. For instance, one has to see whether there are any changes at all besides possible build-system adjustments (made, say, to avoid running a full-fledged ./configure). Altogether, though, it’s usually far from impossible. Of course one has to stop thinking “Oh my, what if a library changes and the software breaks?”, otherwise the task does become impossible. Software changes, software bitrots. It’s not by bundling internal copies of libraries that you can stop that. When the compiler gets upgraded, your software is going to break, and you should fix your software; if the C library cleans up its includes, your software might not compile or might misbehave; deal with it. Sometimes the bundled libraries implement protocols and formats that need to work together with some other piece of software; if that changes, the bundled libraries are just going to break further.

Your software is rarely special enough that you can be exempted from following the rules. Even OpenOffice is using lots of system libraries nowadays!

Bundling and modifying libraries is just like forking a project, and forking might not always be the best approach. Sometimes, when a project’s upstream is dead, forking is your only hope; but even in those cases there are nice ways to bundle libraries. If you look at the way nmap bundles libdnet, you can see that they not only document all the changes they made to the library, but also provide split-up and commented patches for the various changes that adapt it to their needs.

For proprietary software packages, of course, the matter is different, since you cannot usually unbundle the libraries yourself; but it’s a good idea to ask upstream nicely if they can use system copies instead of internal ones. Mind you, some might be happy to fix their packages not to be vulnerable any longer. Although I guess lots of them might actually prefer to keep them as they are since it’s a cost to them. One more reason not to trust them, to me.

So, bottom line: if you’re working on an ebuild for a new piece of software to submit for addition to Portage, please look carefully to see whether the software is bundling libraries, and if it is, don’t let it enter portage that way. And if you’re a developer who wants to push some ebuild to the tree, also remember to ensure that it complies with our policies and doesn’t bundle libraries.
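As a rough aid for that check, here is a minimal sketch that walks an unpacked source tree and flags files that usually belong to some other project. The `MARKERS` table is a deliberately tiny, hypothetical list (four well-known public header names), and `find_bundled` is a name made up for this example; a hit only means “go and look”, since a project may legitimately ship a file with the same name.

```python
import os

# Hypothetical, incomplete table: a header name that is characteristic
# of a library's source tree, mapped to the library it usually belongs to.
MARKERS = {
    "zlib.h": "zlib",
    "png.h": "libpng",
    "jpeglib.h": "libjpeg",
    "expat.h": "expat",
}


def find_bundled(root):
    """Walk root and return sorted (library, relative path) pairs for marker files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            library = MARKERS.get(name)
            if library is not None:
                full = os.path.join(dirpath, name)
                hits.append((library, os.path.relpath(full, root)))
    return sorted(hits)
```

Pointing `find_bundled()` at the directory where the ebuild unpacks the sources gives a first list of candidates to compare against the package’s declared dependencies.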