The Berkeley DB fiasco — Barely avoided!

This is one of my ranty posts, so if you don’t want to read me complaining about various things, in both my personal life and Gentoo, you can simply stop reading here; sorry for the noise.

I’m currently in a bit of a pinch; as I stated before, I had to take a week off because some extra stress in my life drove me to very nasty migraines, and I was taking way too many meds for them. Luckily, I’ve now been able to lighten my load a bit, and thanks to a few other coincidences, I don’t have to break my neck working for a month or two, time enough to get in better shape.

The state of Gentoo when I came back, as I said, wasn’t very encouraging; not only was there the libpng bump that caused disarray for many, many users (and could have been avoided if the other developers had listened to me when I suggested both solutions, as messed up as they sounded), there was also another, less visible problem, which I hit myself but most users wouldn’t have noticed: Robin bumped Berkeley DB to version 5.0. I hit this because I had unmasked sys-libs/db a long time ago, to check it against my packages in particular.

It might be interesting to note that this version of BerkDB implements a DB-backed, SQLite-compatible interface (libdbsql), which is something that lots of people, me included, are curious about for the future; having an alternative to SQLite (which is usually pretty slow) is not a bad thing, considering how much software relies on it, including Firefox.

Now, with almost all the BerkDB releases we have some kind of problem; most of the time, it’s not even a problem with API breakage, but rather with the software using BerkDB trying to be smarter, detecting and accepting only a limited range of BerkDB versions even when it works perfectly fine with newer ones. Then there are the API problems, of course. This basically brings us a number of problems:

  • API changes: these are unavoidable, of course; the changes usually aren’t too big, which means that it’s usually trivial to fix the packages for the new versions;
  • packages doing autodetect to find the latest DB version: just a minor annoyance, usually; the packages know that most distributions install multiple BerkDB versions, and try to look for the latest version available, checking in descending order, so 4.8, 4.7, 4.6 and so on; it’s not a bad thing, but there are two catches:
    ◦ packages need to have a way to override the detection, so that we can use our db-use.eclass to force our latest version (say the package checks only up to 4.6… we can either patch the package to detect 4.7, 4.8 and now 5.0, or we can simply tell the package to use what we know to be the latest one, without patches);
    ◦ packages need to understand that newer versions provide (mostly) the same options as the older ones; this usually matters only for features that are introduced in a given version and maintained in later ones; if a given feature is added in 4.6, you should expect it to be available in 4.7 and 4.8 as well;
  • some packages explicitly test that the version used is within a given expected range; even when they do provide an override, they check the declared version and fail. These are nasty, as they need to be patched every time, sometimes while waiting for upstream to accept the patches;
  • some packages compare versions as “equal” rather than “greater or equal”; this is again a minor, easy-to-patch problem (and rare enough) when it’s used to check the minor version of the package; it becomes a real problem with DB 5.0, as the major version changed and the minor is now… lower;
  • some packages distinguish between the old DB versions (db3) and the “new” ones (db4), as the API changed a lot between the two; to do so, they check the major version of the DB library… the problem is that it has changed now, and they only check for 3 or 4 rather than for “4 or later”.
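
To make the version-check points above a bit more concrete, here is a minimal Python sketch of the difference between the fragile checks and the more tolerant ones. Everything in it is an assumption made for illustration: the function names are made up, and none of this is taken from a real package's build system or from db-use.eclass. It only shows the comparison logic, not how any particular configure script does it.

```python
# Hypothetical helpers illustrating the version checks discussed above.
# None of these names come from Berkeley DB, a real package, or db-use.eclass.

def parse_version(text):
    """Turn a string such as "4.8" into a (major, minor) tuple of ints."""
    major, minor = text.split(".")[:2]
    return (int(major), int(minor))


def minor_only_at_least(found, wanted):
    """Fragile check: looks at the minor number alone, ignoring the major."""
    return found[1] >= wanted[1]


def at_least(found, wanted):
    """Tolerant check: tuples compare major first, then minor."""
    return found >= wanted


def has_new_api(major):
    """Treat every major version from 4 on as the "new" (post-db3) API."""
    return major >= 4


def pick_version(installed, override=None):
    """Use the forced version if one is given, else the newest installed."""
    if override is not None:
        return override
    return max(installed)


if __name__ == "__main__":
    found = parse_version("5.0")
    wanted = parse_version("4.6")
    print("minor-only check accepts 5.0:", minor_only_at_least(found, wanted))
    print("tuple check accepts 5.0:", at_least(found, wanted))
    print("5.x counts as new API:", has_new_api(found[0]))
    print("picked:", pick_version([(4, 6), (5, 0), (4, 8)]))
```

The `override` argument stands in for whatever knob a package exposes to let the distribution force a version; without such a knob, the only option left is patching the detection itself.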

There are a couple of extra problems caused by BerkDB upstream in the 5.0 version, but those are not something I care about here; they are the usual problems you get with any package.

While Robin’s idea was to unmask Berkeley DB as soon as the testsuite passed green, I hope I was able to slow him down on this; when I saw the implications of the new version on my running system – and decided for once to back off rather than fixing what I used, like I usually do – I prepared to run a special cycle of tinderboxing, building all the reverse-dependencies of sys-libs/db to see how many failures we would be getting with the new version. The tracker bug gives a good idea of the extent of the problem.

What I did not foresee when I started the cycle is that the build-time failures risk being just the tip of the iceberg. I’m actually very glad that I decided to run the tinderbox right away, as the packages that showed the symptoms first aren’t really among the common ones, and very few people run testsuites on these. You might have guessed the problem by now: runtime failures.

Indeed, it seems like Berkeley DB 5.0 has some stricter runtime checks on the status of the files and the database, so that some operations can only be executed after the database has reached given states. If it wasn’t for testsuites, we probably would have had to experience these problems at runtime, breaking user systems (hopefully, not production systems, though!) before we could find the root cause of the problem and fix it.

Unfortunately, as I have often written, there are too many packages with no testsuites, and too many ebuilds that restrict them without a good reason (“package’s testsuite needs a local daemon running”, “RESO FIXED restricted”, “Eh? No! REOPEN” is a common condensed exchange between me and other maintainers on the matter of testsuites), so even if we get a complete pass on the tinderbox, we’re doomed to find more of these problems at runtime.

Anyway, the tinderbox is still running, and not all the packages have been hit yet; I have at least one package of mine to fix, and I’ll hopefully do so today, and in the next weeks all of us developers will try our best to avoid another huge failure with updates. On my side, I accept thank-you tokens, as this kind of work is not only thankless on average, but sometimes even becomes controversial, since I have to fight to get some fixes done when they require more than the basic effort to handle the package.