I have written before about the big problems with BerkDB, and it was over six months ago that the problems started to show up with release 5 of the library. Although this new version introduces a number of new features, a few of which I’m sure packages have started using or soon will, and although upstream has already moved on to work on the 5.1 series, Gentoo still doesn’t have this version available even in ~arch.
What’s going on here? Is this a failure of QA itself like people muse from time to time? Are people going to insist that ~arch is becoming “the new stable”? I don’t think any of this is right, actually.
There are a few problems in all this. One is that, unfortunately, because of the way we’ve been installing Berkeley DB, developers tend to “linger” over fixing their packages’ Berkeley DB support, and would rather let a package use a previous version when it hasn’t been updated to work with the new one. This results in the current mess of dependencies, with packages depending on particular versions of sys-libs/db, and in the need to keep eleven versions of the same package in the tree at any time.
Now, you can guess that having more code around to maintain, build and install is usually a bad thing. But there are reasons to keep the old versions around. One is that the binary format of berkdb files is not stable between versions, so if you have a huge amount of data stored with, say, version 4.3, you cannot simply switch to 5.0 or vice versa. For this reason people often try to stick with a single version of berkdb per system and don’t upgrade even when new versions are available.
Unfortunately, the fact that some packages bring in older BerkDB versions hampers the diagnosis of packages broken by BerkDB 5. Some of them will definitely stop working at the mere presence of Berkeley DB 5; others will simply fall back to something they seem to understand, by detecting BerkDB 4.8 or earlier and using that. This detection could easily be faulty and cause very obnoxious results.
The main issue is that while we do provide slotted names for the libraries (libdb-4.8.so and libdb-5.0.so), and a different directory for the headers (/usr/include/db4.8 and /usr/include/db5.0), we also provide compatibility links for libdb.so and /usr/include/db.h, both of which will cause autodetection to easily fall back to “whatever is available”. Depending on how crazy the checks are, a package could even use the header from one version and the library from another, which is definitely a bad idea.
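To make the header/library mix-up concrete, here is a minimal sketch of a check that compares the version a db.h declares with the version of the library a given name resolves to. It assumes the header carries the usual DB_VERSION_MAJOR and DB_VERSION_MINOR defines and that libraries follow the slotted libdb-X.Y.so naming mentioned above; the function names are made up for this illustration and are not part of any Gentoo tool.

```python
import re
from pathlib import Path

# Matches e.g. "#define DB_VERSION_MAJOR 4" in a Berkeley DB header.
_DEFINE = re.compile(
    r"^\s*#\s*define\s+DB_VERSION_(MAJOR|MINOR)\s+(\d+)", re.MULTILINE
)
# Matches slotted library names such as "libdb-4.8.so".
_SLOTTED_LIB = re.compile(r"^libdb-(\d+)\.(\d+)\.so")


def header_version(header_text):
    """Return (major, minor) declared by a db.h, or None if not found."""
    found = {kind: int(value) for kind, value in _DEFINE.findall(header_text)}
    if "MAJOR" in found and "MINOR" in found:
        return (found["MAJOR"], found["MINOR"])
    return None


def library_version(lib_path):
    """Return (major, minor) of the slotted library behind lib_path.

    Symlinks such as libdb.so are resolved first, so the generic name
    reports the version it currently points at.
    """
    match = _SLOTTED_LIB.match(Path(lib_path).resolve().name)
    return (int(match.group(1)), int(match.group(2))) if match else None


def versions_agree(header_path, lib_path):
    """True only when both versions are known and identical."""
    hdr = header_version(Path(header_path).read_text(errors="replace"))
    lib = library_version(lib_path)
    return hdr is not None and hdr == lib
```

Calling versions_agree("/usr/include/db.h", "/usr/lib/libdb.so") on a system with several slots installed would show whether the two generic names currently point at the same version; the /usr/lib location is an assumption and may differ on your system.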
So what am I doing and proposing to solve these issues? Well, first of all I re-used a virtual machine I had lying around, removing all the old db versions and then rebuilding a few of the packages that I knew were having problems with db5, some of which I was able to fix, luckily. I’ll go through a few more soonish, since the tinderbox cannot reliably identify these problems (as it has all the versions installed).
A second task is making sure that the packages that currently depend on “any version 4” of BerkDB are actually doing what they say. One common mistake was to depend on any version 4 just because the code wasn’t going to work with version 3, which is wrong, since that also excludes version 5 for no reason. Another common mistake is to require the presence of version 4 because the code doesn’t work with 5, but still not ensure that version 4 is the one used (by leaving it to the build system to decide what to pick up). I know this is a bit hazy; let’s just say that these packages might not do the right thing as it is.
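As a rough illustration of what “ensuring version 4 is used” means in practice, the sketch below builds the flags that select one slot explicitly instead of the generic db.h and libdb.so. It only relies on the slotted header directories and library names described earlier; the helper name is hypothetical, and how a given package’s build system accepts such flags varies from package to package.

```python
def flags_for_slot(slot):
    """Return flags that select one Berkeley DB slot explicitly.

    slot is a string such as "4.8". The header directory and library
    name follow the slotted layout (/usr/include/dbX.Y, libdb-X.Y.so),
    so the generic compatibility links are never consulted.
    """
    return {
        "CPPFLAGS": "-I/usr/include/db" + slot,
        "LIBS": "-ldb-" + slot,
    }
```

With slot "4.8" this yields -I/usr/include/db4.8 for the preprocessor and -ldb-4.8 for the linker, so the header and the library are guaranteed to come from the same version.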
Thankfully, Zac already wrote a script that can help us here, for my previous quest of fighting old automake last month (which is almost, but not completely, won), so we know which specific packages need work.
One lesson to be learnt here: if you’re looking to version-slot libraries, make sure you remove the generic fallback, and fix the packages relying on it before it turns into a problem like this.