Rinse and repeat: tinderbox foresees slight libpng-1.5 trouble

As I quickly noted yesterday I started testing libpng-1.5 in the tinderbox to identify the packages failing to build with the new version. As it turns out there are quite a few as usual.

The problem turns out to be more complex than I’d have expected, though. My first step after installing the new libpng version was to make sure that no dangling package was present in the system, by running revdep-rebuild -- -C which supposedly should have removed all the libpng-using packages. I’ll be honest and say here that I was way too naïve and I shouldn’t have done so as first step at all.

While revdep-rebuild identified all the packages that installed .la files referencing libpng, which are quite a lot more than they should be, as I noted in the above-linked post, it didn’t seem to identify the broken ELF linking at all. I’m not sure why that is, maybe revdep-rebuild in the tinderbox is broken, or some package installs a wrong mask entry, but no matter what’s the case, I ended up having all the packages without .la files but with ELF linking to libpng14.so.14 still installed.

Thankfully, I whipped up quickly a scanelf/@qfile@ pipeline that provided me with a list of linked packages, which I could then remove and … compare with the reverse dependencies as produced by the QA Reports — turns out that there are packages linking to libpng (actually using it) and not listing it as a dependency. Reminds me of transitive dependencies that I discussed… over an year ago.

Unfortunately, it also appears to be a problem for the upstream code. One of the common failure has to do with zlib.h not being included by png.h any longer, which is a positive thing (as it stops entangling two libraries’ headers), but there are at least a few packages that used zlib’s interfaces without including zlib.h themselves. Again, isn’t it lovely, how people pretend not to include the headers they import functions and constants from? Unfortunately, while this is easy to notice for packages using constants such as Z_BEST_COMPRESSION, it might leave the compiler with just an implicit declaration (which might be further problem on 64-bit systems) if functions are involved.

And this is not all, of course. Even though I’m doing my best at covering as many packages as possible in the tinderbox, the guest I’m using for the builds right now is far from having a 100% coverage. The main issue is that it’s still testing glibc-2.14 — and I’m still finding packages that try to use RPC code and fail, I wouldn’t have guessed so many packages used it. And that also means I’m lacking Ruby altogether, as it fails badly during its own bootstrap, which in turn means that there are my own packages that I can’t test properly right now. I guess the second round of libpng-1.5 testing will happen in the 64-bit tinderbox that is still running glibc-2.13.

Back to work, I guess.

Library SONAME bumps and .la files: some visual clues

Before going on with the post, I’ll give users who’re confused by the post’s title some pointers on how to decipher it: I discussed .la files extensively before, and you can find a description of SONAMEs in another post of mine.

Long- and medium-time Gentoo users most likely remember what happened last time libpng was bumped last year, and will probably worry now that I’m telling them that libpng 1.5 is almost ready to be unmasked (I’m building the reverse dependencies in the tinderbox as we speak to see what breaks). Since I’ve seen through it with the tinderbox, I’m already going to tell you that it’s going to hurt, as a revdep-rebuild call will ask you to rebuild oh-so-many packages due to .la files that, myself, I’ll probably take the chance to move to the hardened compiler and run an emerge -e world just for the kicks.

But why is it this bad? Well, mostly it is the “viral propagation” of dependencies in .la files, which by itself is the reason why .la files are so bad. Since libgtk links to libcairo, and libcairo to libpng, any other library linking with libgtk will be provided with a -lpng entry to link to libpng, no matter whether it uses it or not. Unfortunately, --as-needed does not apply to libtool archives, so they end up overlinking, and only the link editor can drop the unused libraries.

For the sake of example, Evolution does not use libpng directly (the graphic files are managed through GTK’s pixbuf interface), but all of its plugins’ .la files will refer to libpng, which in turn means that revdep-rebuild will pick it up to rebuild it. D’oh!

So what about the visual clue? Well, I’ve decided to use the data from the gold based tinderbox to provide a graph of how many ELF objects actually link to the most common libraries, and how many libtool archives reference them. The data wasn’t easy to extract, mostly because at a first glance, the .la files seemed to be dwarfed by the actually linked objects.. until I remembered that ELF executable can’t have a corresponding .la file.

Library linking histogram

I’m sorry of some browsers might fail to display the image properly; please upgrade to a decent, modern browser as it’s a simple SVG file. The gnuplot script and the raw data file are also available if you wish to look at them.

The graph corroborates what I’ve been saying before, that the bump of libraries such as libexpat and libpng only is a problem because of overlinking and .la files. Indeed you can see that there are about 500 .la files listing either of the two libraries, when there are fewer than a hundred shared objects referencing them. And for zlib it’s even worse: while there are definitely more shared objects using it (348), there are four times as many .la files listing it as one of the dependencies, for no good reason at all.

A different story applies to GLib and GTK+ themselves: the number of shared objects using them is higher than the number of .la files that list them among their dependencies. I guess the reason here is that a number of their users are built with non-libtool-based build systems, and another good amount of .la files are removed by the less lazy Gentoo packagers (XFCE should be entirely .la free nowadays, and yes, it links to GTK+).

Now it is true that the amount of .la files and ELF files is not proportional to the number of packages installing them (for instance Evolution installs 24 .la files and 69 ELF objects), so you can’t really say much about the number of packages you’d have to rebuild when one of the three “virulent” libraries (libpng, libexpat, libz) is installed, but it should still be clear that marking five hundreds files as broken simply because they list a library that is gone, without their respective binary actually having anything to do with said library, is not the best approach we can have.

Dropping the .la file for libcairo (which is where libgtk picks it up) should probably make it much more resilient to the libpng bumps, which have proven to be the nastiest ones. I hope somebody will step up to do so, sooner or later.

Let’s call a spade a spade

Some people thought that my previous blog about the libpng debacle was meant as an attack, or a derision, of the work and effort that Samuli put into getting us out of the libpng-1.2 mess we were. Let me be clear: it wasn’t. I’m glad that Samuli is there, without him I would probably have left Gentoo a long time ago, frustrated by nothing happening.

But I think that we shouldn’t hide our head under the sand and keep repeating “it’s all good, it’s all good”. It isn’t.

Samuli did the best he could to get us out of the trouble, which is much bigger than a single person, two, three or even a dozen could properly tackle with all the possible bases covered, at this point, unless there is consensus among the whole developer body, which isn’t there.

But first, I have to say that one thing I’m going to maintain was done wrong, in the rush of the moment: stabling libpng-1.4 as part of a security fix. That was simply reckless. But the fault does not lie in a single person, but rather in the general spirit of avoiding doing extra work… still, reckless or not, it’s done and we have to live with it, and learn from it.

And learning seems like we are; finally there is enough traction for --as-needed to become default as I wrote recently and I got to thank Samuli and Kacper without whom we wouldn’t be able to reach that point at all. I unfortunately still don’t see the same traction behind the hidea of dropping .la files. Removing them from gtk+ which doesn’t install any static library and thus does not need the .la files at all, would have solved if not all, most of the problems people had with the upgrade…

Oh and by the way, the update script for libpng is a hack and it will leave behind .la files when packages will start dropping them, as it changes their checksum and timestamp without updating the package database. The same is true for the (generic) lafilefixer which is why I’ll recommend again to apply the incremental one, as declared in the two posts linked at the beginning.

Finally, libpng-1.5 is going to be released sometime soon… either we make a plan now, or we’re going to suffer through another identical pain soon. And libpng is known for having security issues quite often… I already sent Samuli the plan I was thinking on this morning; I’ll write more details about that as I find the time.

Stable users’ libpng update

Seems like my previous post didn’t make enough of a fuss to get other developers to work on feasible solutions to avoid the problem to hit stable users… and now we’re back to square one for stable users.

Since I also stumbled across two problems today while updating my stable chroots and containers, that represent the local install of remote vservers, and a couple of testing environments for my work, I guess it’s worth writing of a couple of tricks you might want to know before proceeding.

Supposedly, you should be able to properly complete the update without running the libpng-1.4.x-update.sh hack! (and this is important because that hack will create a number of problems on the longer run, so please try to avoid it!). If you have been using --as-needed for a decent amount of time, the update should come almost painless. Almost.

I maintained that revdep-rebuild should be enough to take care of the update for you, but it comes with a few tricks here that make it slightly more complex. First of all, the libpng-1.4 package will try to “preserve” the old library by copying it inside itself, avoiding dynamic link breakage. This supposedly makes for a better user experience as you won’t hit packages that fail to start up for missing libraries, but has two effects; one is that you may be running a program with both libpng objects around, which is not safe; the second is that revdep-rebuild will not pick up the broken binaries at all, this way.

Additionally, there is a slot of the package that will bring in only the library itself, so that binary-only packages linked to the old libpng can be used still; if you have packages such as Opera installed, you might have this package brought in on your system; this will further complicate matters because it will then collide with libpng-1.4… bad thing.

These are my suggested instructions:

  • get a console login, make sure that GNOME, KDE, any other graphical interface is not running; this is particularly important because you might otherwise experience applications that crash mid-runtime;
  • emerge -C =libpng-1.2* make sure that you don’t have the old library around; this works for both the old complete package and for the new library-only binary compatibility package;
  • rm -f /usr/lib/libpng12.so* (replace lib/ with lib64/ on x86-64 hosts; this way you won’t have the old libraries around at all; actually this should be a no-op since you removed it, but this way you ensure you don’t have them around if you had already updated;
  • emerge -1 =libpng-1.4* installs libpng-1.4 without preserving the libraries above; if you had already updated, please do this anyway, this way you’ll make sure it registers the lack of the preserved libraries;
  • revdep-rebuild -- --keep-going it shouldn’t stop anywhere now, but since it might, it’s still a good idea to let it build as much as it can.

Make also sure you follow my suggestion of running lafilefixer incrementally after every merge, that way you won’t risk too much breakage when the .la files gets dropped (which I hope we’ll start doing systematically soon), by adding this snipped to your /etc/portage/bashrc:

post_src_install() {
    lafilefixer "${D}"
}

Important, if you’re using binary packages! Make sure to re-build =libpng-1.4* after you deleted the file, if you had updated it before; otherwise the package will have preserved the files, and will pack it up in the tbz2 file, reinstalling it every time you merged the binary.

This post brought to you by the guy who has been working the past four years to make sure that this problem is reduced to manageable size, and that has been attacked, defamed, insulted and so on so forth for expecting other developers to spend more time testing their packages. If you find this useful, you might want to consider thanking him somehow…

Gentoo Failed Us Again

Kudos to Markos who basically gave me the title for this blog post!

I’ve spent the past week or so away from computers, I’m having some personal trouble, tied with bad migraines that would have burnt the hell out of me. I came back to updating my systems today, and I received a nasty surprise. Unmasked libpng 1.4 is wrecking havoc on so many systems that it’s not even funny.

I’m not complaining about the fact that we’ve finally unmasked the new libpng, it was needed and we should probably proceed on getting it stable soonish as well. What I complain about is that we’re hitting the same obstacles we hit with libexpat:

  • we still don’t have enabled --as-needed by default, which would reduce considerably the amount of packages that actually need to be rebuilt after such an update (and, by the way, not using --as-needed also increases tremendously the chance that some program will be loading both 1.2 and 1.4 versions, with the usual trouble of symbols’ collisions);
  • we still haven’t solved the problem with libtool archives , requiring the rebuild (or nasty hack) of a number of packages for no good reasons.

The worst part is that I have been preaching about both things for a while, a few years I’d say, and yet we have not gotten our heads out of the sand, so we hit users in the face after this kind of updates. Still.

Using --as-needed, only a fraction of the packages installed in the system will actually link against the libtool file, and only those would need to be rebuilt; without it, it’s very likely almost all the libtool-using packages, as well as most pkg-config using packages will be linking in libpng as a dependency of other libraries, such as GTK+ or Qt. And since you will start updating from those libraries, the newly-started packages will have problems because both libpng versions will be loaded at the same time: once from the library and once from the application.

For what concerns the .la files, the problem is mostly at buildtime and it makes it very very difficult to get out of the mess caused by the update, as a number of packages will start lamenting of missing targets for -lpng12. The solution for this would be to, obviously, carefully remove .la files within the ebuilds; this way we reduce the chances that the dependencies end up polluting packages that would, otherwise, have no involvement whatsoever with libpng.

Unfortunately, removing all the libpng file indiscriminately is a Bad Idea™ (and yes, I know some people experimented with that, I still maintain it’s a bad idea!). What you want to do is to reduce their impact as much as possible, but to do so you have to do some extra work, and that requires developers to understand the problem and accept working on a solution, even a temporary, imperfect one, to avoid staying in the problem area.

Do you remember when I checked eog (Eyes of Gnome) .la files resulting in stating clearly that they are totally useless? Well, eog still installs them; sure enough they are not excessively important in this situation, as they are not linked against, but they will create false positives in revdep-rebuild for instance. Even my flowchart has gone mostly unused by developers.

And we still don’t have any way to sanitise those files within Portage; lafilefixer does solve some stuff, but it’s not part of Portage proper, nor it’s integrated with it. If you want, in the future, to reduce your system’s pollution, do something like this:

# /etc/portage/bashrc
post_src_install() {
    lafilefixer "${D}"
}

This way the files will be sanitised before being merged in the system, and you won’t have to fix them manually.

Will we ever learn?

Software sucks, or why I don’t trust proprietary closed source software

If you follow my blog since I started writing, you might remember my post about imported libraries from last January and the follow up related to OpenOffice; you might know I did start some major work toward identifying imported libraries using my collision detection script and that I postponed till I had enough horsepower to run the script again.

And this is another reason why I’m working on installing as many packages as possible on my testing chroot. Now, of course the primary reason was to test for --as-needed support, but I’ve also been busy checking build with glibc 2.8, and GCC 4.3, and recently glibc 2.9 . And in addition to this, the build is also providing me with some data about imported libraries.

With this simple piece of script, I’m doing a very rough cut analysis of the software that gets installed, to check for the most commonly imported libraries: zlib, expat, bz2lib, libpng, jpeg, and FFmpeg:

    rm -f "${T}"/flameeyes-scanelf-bundled.log
    for symbol in adler32 BZ2_decompress jpeg_mem_init XML_Parse avcodec_init png_get_libpng_ver; do
    scanelf -qRs +$symbol "${D}" >> "${T}"/flameeyes-scanelf-bundled.log
    done
    if [[ -s "${T}"/flameeyes-scanelf-bundled.log ]]; then
    ewarn "Flameeyes QA Warning! Possibly bundled libraries"
    cat "${T}"/flameeyes-scanelf-bundled.log
    fi

This checks for some symbols that are usually not present without the rest of the library, and although it gives a few false positives, it does produce interesting results. For instance while I knew FFmpeg is very often imported, and I expected zlib to be copied in every other software, it’s interesting to know that expat as much used as zlib, and every time it’s imported rather than used from the system. This goes for both Free and Open Source Software and for proprietary closed-source software. The difference is that while you can fix the F/OSS software, you cannot fix the proprietary software.

What is the problem with imported libraries? The basic one is that they waste space and memory since they duplicate code already present in the system, but there is also one other issue: they create situations where even old, known, and widely fixed issue remain around for months, even years after they were disclosed. What preserved proprietary software this well to this point is mostly related to the so-called “security through obscurity”https://en.wikipedia.org/wiki/Security_through_obscurity. You usually don’t know that the code is there and you don’t know in which codepath it’s used, which makes it much harder for novices to identify how to exploit those vulnerabilities. Unfortunately, this is far from being a true form of security.

Most people would now wonder, how can they mask the use of particular code? The first option is to build the library inside the software, which hides it to the eyes of the most naïve researchers; by not loading explicitly the library it’s not possible to identify its use through the loading of the library itself. But of course the references to those libraries remain in the code, and indeed most of the times you’ll find the libraries’ symbols as defined inside executables and libraries of proprietary software. Which is exactly what my rough script checks. I could use pfunct from the seven dwarves to get the data out of DWARF debugging information, but proprietary software is obviously built without debug information so it would just waste my time. If they used hidden visibility, finding out the bundled libraries would be much much harder.

Of course, finding which version of a library is bundled in an open source software package is trivial, since you just have to look for the headers to find the one defining the version — although expat often is strippedf out of the expat.h header that contains that information. On proprietary software is quite more difficult.

For this reason I produced a set of three utilities that, given a shared object, find out the version of the bundled library. As it is it quite obviously doesn’t work on final executables, but it’s a start at least. Running these tools on a series of proprietary software packages that bundled the libraries caused me some kind of hysteria: lots and lots of software still uses very old zlib versions, as well as libpng versions. The current status is worrisome .

Now, can somebody really trust proprietary software at this point? The only way I can trust Free Software is by making sure I can fix it, but there are so many forks and copies and bundles and morphings that evaluating the security of the software is difficult even there; on proprietary software, where you cannot be really sure at all about the origin of the software, the embedded libraries, and stuff like that, there’s no way I can trust that.

I think I’ll try my best to improve the situation of Free Software even when it comes to security; as the IE bug demonstrated, free software solutions like Firefox can be considered working secure alternatives even by media, we should try to play that card much more often.

OpenJDK, the Vulnerabilities, and You

The title is an adaption of Alberto Sordi’s movie «Venezia, la luna e tu», for who wonders.

So, those of you who are actually reading my blog daily (I know there are still a few out there ;) ) know that I’ve started looking at OpenJDK since day one, hoping to be able to actually get it to work on Gentoo/FreeBSD to replace the Diablo JRE/JDK that are not updated for a while now (1.5 version), and with most probability vulnerable to something.

Well, beside a few patches here and there to use external libraries, that you can find here, and a minimal memory leak for a tempnam() value never freed, today I had the time to look at the unmodified libpng copy that OpenJDK distributes.. and the security issue that was being fixed last November is not fixed there. CVE-2006-5793 is still valid for sun-jdk-1.6 series, as well as 1.7 and OpenJDK of course.

This is just one thing you can know by having the source code; the issue would have probably be ignored for a lot of time if it wasn’t for the release of the code, and now that we know it’s there, it has to be fixed somehow.

I’ve filed bugs at Gentoo and at Sun, so that the issue can be taken care of. In the mean time, my local builds of OpenJDK uses external copies of libpng (as well as zlib, jpeg – for the most part – and giflib), so I’m relatively safe. It’s not a nasty, pernicious bug, as the code is only used for splashscreens.

Now the question is: how much time will it take before nasty pernicious bugs are found in JDK’s proper code?
At least I’m happy to see that Sun people are open to actually apply the needed changes.