Boosting my morale? Nope, still not.

I’m not sure if you’re following the development of this particular package in Gentoo, but with some discussion, quite a few developers reached a consensus last week that the slotted dev-libs/boost that we’ve had for the past couple of years had to go, replaced with a single-slot package like we have for most other libraries.

The main reason for this is that the previous slotting was not really doing what the implementers expected it to do — the idea for many is that you can always depend on whatever highest version of Boost you support, and if you don’t support the latest, no problem, you’ll get an older one. Unfortunately, this clashes with the fact that only the newest version of Boost is supported by upstream with modern configurations, so it happens that a new C library, or a new compiler, can (and do) make older versions non-buildable.

Like what happened with the new GLIBC 2.16, which is partially described in the previous post of the same series, and lately summarized, where there’s no way to rebuild boost-1.49 with the new glibc (the “patch” that could be used would change the API, making it similar to boost-1.50 which ..), but since I did report build failures with 1.50, people “fixed” them by depending on an older version… which is now not installable. D’oh!

So what did I do to sort this out? We dropped the slot altogether. Now all Boost versions install as slot zero and each replace the others. This makes it much easier for both developers and users, as you know that the one version you got installed is the one you’re building against, instead of “whatever has been eselected” or “whatever was installed last” or “whatever is the first one that the upstream user is finding” which was before — usually a mix of all.

But this wasn’t enough because unfortunately, libraries, headers and tools were all slotted so they all had different names based on the version. This was handled in the new 1.52 release which I unmasked today, by going back to the default install layout that Boost uses for Unix: the system layout. This is designed to allow one and only one version of each Boost library in the system, and does neither provide a version nor a variant suffix. This meant we needed another change.

Before going back to system layout, each boost version installed two sets of libraries, one that was multithread-safe and oen that wasn’t. Software using threads would have to link to the mt variant, while those not using threads could link to the (theoretically lower-overhead) single-thread variant. Which happened to be the default. Unfortunately, this also meant that a ton of software out there, even when using threads, simply linked to the boost library they wanted without caring for the variant. Oopsie.

Even worse, it was very well possible, and indeed was the case for Blender, that both variants were brought in, in the process’s address space, possibly causing extremely hard to debug issues due to symbol collisions (which I know, unfortunately, very well).

An easy way to see (using older versions of boost ebuilds) whether your program is linking to the wrong variant, is to see if you see it linking to libboost_threads-mt and at the same time to some other library such as libboost_system (not mt variant). Since our very pleasant former maintainer decided to link the mt variant of libboost_threads to the non-mt one, quite a few ways to check for multithreaded Boost simply … failed.

Now the decision on whether to build threadsafe or not is done through an USE flag like most other ebuilds do, and since only one variant is installed, everybody gets, by default and in most cases, the multithread-safe version, and all is good. Packages requiring threads might want already to start using dev-libs/boost[threads(+)] to make sure that they are not installed with a non-threadsafe version of Boost, but there are symlinks in place righ tnow so that even if they are looking for the mt variant they get the one installed version of boost anyway (only with USE=threads of course).

One question that raised was “how broken will people’s systems be, after upgrading from one Boost to another?” and the answer is “quite” … unless you’re using a modern enough Portage (the last few versions of the 2.1 series are okay, and most of the 2.2), which can use preserve-libs. In that case, it’ll just require you to run a single emerge command to get back on the new version, and if not, you’ll have to wait for revdep-rebuild to finish.

And to make things sweeter, with this change, the time it takes for Boost to build is halved (4 minutes vs 8 on my laptop), while the final package is 30MB less (here at least), since only one set of libraries is installed instead of two — without counting the time and space you’d waste by having to install multiple boost versions together.

And for developers, this also mean that you can forget about the ruddy boost-utils.eclass, since now everything is supposed to work without any trickery. Win-win situation, for once.

Gentoo Linux Health Report — October 2012 Edition

I guess it’s time for a new post on what’s the status with Gentoo Linux right now. First of all, the tinderbox is munching as I write. Things are going mostly smooth but there are still hiccups due to some developers not accepting its bug reports because of the way logs are linked (as in, not attached).

Like last time that I wrote about it, four months ago, this is targeting GCC 4.7, GLIBC 2.16 (which is coming out of masking next week!) and GnuTLS 3. Unfortunately, there are a few (biggish) problems with this situation, mostly related to the Boost problem I noted back in July.

What happens is this:

  • you can’t use any version of boost older than 1.48 with GCC 4.7 or later;
  • you can’t use any version of boost older than 1.50 with GLIBC 2.16;
  • many packages don’t build properly with boost 1.50 and later;
  • a handful of packages require boost 1.46;
  • boost 1.50-r2 and later (in Gentoo) no longer support eselect boost making most of the packages using boost not build at all.

This kind of screwup is a major setback, especially since Mike (understandably) won’t wait any more to unmask GLIBC 2.16 (he waited a month, the Boost maintainers had all the time to fix their act, which they didn’t — it’s now time somebody with common sense takes over). So the plan right now is for me and Tomáš to pick up the can of worms, and un-slot Boost, quite soon. This is going to solve enough problems that we’ll all be very happy about it, as most of the automated checks for Boost will then work out of the box. It’s also going to reduce the disk space being used by your install, although it might require you to rebuild some C++ packages, I’m sorry about that.

For what concerns GnuTLS, version 3.1.3 is going to hit unstable users at the same time as glibc-2.16, and hopefully the same will be true for stable when that happens. Unfortunately there are still a number of packages not fixed to work with gnutls, so if you see a package you use (with GnuTLS) in the tracker it’s time to jump on fixing it!

Speaking of GnuTLS, we’ve also had a smallish screwup this morning when libtasn1 version 3 also hit the tree unmasked — it wasn’t supposed to happen, and it’s now masked, as only GnuTLS 3 builds fine with it. Since upstream really doesn’t care about GnuTLS 2 at this point, I’m not interested in trying to get that to work nicely, and since I don’t see any urgency in pushing libtasn1 v3 as is, I’ll keep it masked until GNOME 3.6 (as gnome-keyring also does not build with that version, yet).

Markos has correctly noted that the QA team – i.e., me – is not maintaining the DevManual anymore. We made it now a separate project, under QA (but I’d rather say it’s shared under QA and Recruiters at the same time), and the GIT Repository is now writable by any developer. Of course if you play around that without knowing what you’re doing, on master, you’ll be terminated.

There’s also the need to convert the DevManual to something that makes sense. Right now it’s a bunch of files all called text.xml which makes editing a nightmare. I did start working on that two years ago but it’s tedious work and I don’t want to do it on my free time. I’d rather not have to do it while being paid for it really. If somebody feels like they can handle the conversion, I’d actually consider paying somebody to do that job. How much? I’d say around $50. Desirable format is something that doesn’t make a person feel like taking their eyes out when trying to edit it with Emacs (and vim, if you feel generous): my branch used DocBook 5, which I rather fancy, as I’ve used it for Autotools Mythbuster but RST or Sphinx would probably be okay as well, as long as no formatting is lost along the way. Update: Ben points out he already volunteered to convert it to RST, I’ll wait for that before saying anything more.

Also, we’re looking for a new maintainer for ICU (and I’m pressing Davide to take the spot) as things like the bump to 50 should have been handled more carefully. Especially now that it appears that it’s basically breaking a quarter of its dependencies when using GCC 4.7 — both the API and ABI of the library change entirely depending on whether you’re using GCC 4.6 or 4.7, as it’ll leverage C++11 support in the latter. I’m afraid this is just going to be the first of a series of libraries making this kind of changes and we’re all going to suffer through it.

I guess this is all for now.

Boosting my morale? I wish!

Let’s take a deep breath. You probably remember I’m running a tinderbox which is testing some common base system packages before they are unmasked (and thus unleashed on users); in particular I use it for testing new releases of GCC (4.7) and GLIBC (2.16).

It didn’t take me much after starting GLIBC 2.16 testing to find out that the previously-latest version of Boost (1.49) was not going to work with it. The problem is that there is a new definition that both of them tried to provide, TIME_UTC (probably relates to C++11/C11). Now unfortunately since it’s an API breakage to replace that definition, it means that it can’t be applied to the older versions, and it means that packages need to be fixed. Furthermore, the new 1.50 version has also broken the temporary compatibility introduced in 1.48 for their filesystem module’s API. This boils down to a world of pain for maintainers of packages using Boost (which includes yours truly, luckily none is directly maintained by me, just proxy).

So I had to add one extra package to the list, and ran the reverse dependencies — the positive side is that it didn’t take long to fill the bug although there are still a few issues with older boost versions not being supported yet. This brought up a few issues though…

First problem is the way Boost build itself, and its tests, is obnoxious: it’s totally serial, no parallelisation at all! The result is that to run the whole testsuite it takes over eight hours on Excelsior! The big issue is that for the testing, it takes some 10-20 times longer to build the test than run it (lovely language, C++), so a parallel build of the tests, even if the tests were executed in series, would mean a huge impact, and would also likely mean that the tests would become meaningful. As they sit, the (so-called) maintainer of the package has admitted to not run them when he revbumps, but only on full new versions.

The second problem is how Boost versions are discovered. The main issue is that Boost, instead of using proper sonames to keep binary compatibility, embeds its major/minor version pair in the library name — although most distributions symlinks the preferred version to the unversioned name (in Gentoo this is handled through the eselect boost tool). This is not extremely far from what most distributions do with Berkeley DB — but it causes problem when you have to find which one you should link to, especially when you consider that sometimes the unversioned name is not there at all.

So both CMake and Autotools (actually Autoconf Archive) provide macros to try a few different libraries. The former does it almost properly, starting from the highest version and then going in descending order — but uses a pre-defined list of versions to try! Which mean that most packages with CMake will try 1.49 first, as they don’t know that 1.50 is out yet! If no known version is found, then it will fallback to the unversioned library, which makes it work quite differently whether you have only one or more than one version installed!

For what concerns the macros from Autoconf Archive, they are quite nasty; from one side they really aren’t portable at all, as they use GNU sed syntax, they use uname (which makes them totally useless during cross-compilation), but most worrisome of all, is that they use ls to find which boost libraries are available and then take the first one that is usable. This means that if you have 1.50, 1.49 and 1.44 installed, it’ll use the oldest! Similarly to CMake, it uses the unversioned library last. In this case, though, I was able to improve the macros by reversing the check order, which makes them work correctly for most distributions out there.

What is even funnier about the AX macros (that were created for libtorrent, and are used by Gource, which I proxy-maintain for Enrico), is that due to the way they are implemented, it is feasible that they end up using the old libraries and the new headers (it was the case for me here with 1.491.50, as it didn’t fail to compile, just to link). As long as the interface used have different names and the linker will error out, all is fine. But if you have interfaces that are source-compatible, linker-compatible, but with different vtables, you have a crash waiting to happen.

Oh well…