This Time Self-Hosted

That’s not the DB you’re looking for

I have written before about the big problems with BerkDB, and it was over six months ago that the problems started to show up with release 5 of the library. Although this new version introduces a number of new features, a few of which I’m sure packages have started using or soon will, and although upstream has moved on to work on the 5.1 series, Gentoo still doesn’t have this version available even in ~arch.

What’s going on here? Is this a failure of QA itself like people muse from time to time? Are people going to insist that ~arch is becoming “the new stable”? I don’t think any of this is right, actually.

There are a few new problems in all this; one of these is that, unfortunately, because of the way we’ve been installing Berkeley DB, developers feel free to “linger” over fixing their Berkeley DB support, and would rather let a package use the previous versions when it hasn’t been updated to use the new ones. This results in the current mess of dependencies, with packages depending on particular versions of sys-libs/db, and in the need to keep eleven versions of the same package in tree at any time.

Now, you can guess that having more code around to maintain, to build and to install is usually a bad thing. But there are reasons to have them around; one of these is that the binary format of berkdb files is not stable between versions, so if you have a huge amount of data stored in version, say, 4.3, you cannot simply switch to 5.0 or vice versa. For this reason people often try to stick with a single version of berkdb per system and don’t upgrade even when new versions are available.

Unfortunately, the fact that some packages bring in older BerkDB versions hampers the diagnosis of packages broken by the presence of BerkDB 5; the problem is that some of them will definitely stop working at the mere presence of Berkeley DB 5, while others will simply fall back to something they seem to understand, by identifying the presence of BerkDB 4.8 or earlier and using that. This detection could easily be faulty and cause very obnoxious results.

The main issue is that while we do provide slotted names for the libraries (libdb-4.8.so and libdb-5.0.so), and a different directory for the headers (/usr/include/db4.8 and /usr/include/db5.0), we also provide compatibility links for libdb.so and /usr/include/db.h. Both of these cause autodetection to easily fall back to “whatever is available”, and depending on how crazy the checks are it could even use the header from one version and the library from another, which is definitely a bad idea.
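To make the header/library mismatch concrete, here is a minimal Python sketch of a sanity check. It reads the `DB_VERSION_MAJOR` and `DB_VERSION_MINOR` macros from the text of a `db.h` and compares them with the version embedded in a slotted library name such as `libdb-5.0.so`. The function names and the sample header text are invented for the example; a real check would also have to work out where the `db.h` and `libdb.so` compatibility links actually point.

```python
import re

# Matches lines such as "#define DB_VERSION_MAJOR 5" in a db.h.
_MACRO = re.compile(
    r"^\s*#\s*define\s+DB_VERSION_(MAJOR|MINOR)\s+(\d+)", re.MULTILINE
)
# Matches slotted library names such as "libdb-4.8.so".
_SONAME = re.compile(r"^libdb-(\d+)\.(\d+)\.so")


def header_version(header_text):
    """Return (major, minor) from the version macros, or None if missing."""
    found = {name: int(value) for name, value in _MACRO.findall(header_text)}
    if "MAJOR" in found and "MINOR" in found:
        return (found["MAJOR"], found["MINOR"])
    return None


def library_version(filename):
    """Return (major, minor) from a slotted library name, or None."""
    match = _SONAME.match(filename)
    if match is None:
        # An unversioned name like "libdb.so" tells us nothing by itself.
        return None
    return (int(match.group(1)), int(match.group(2)))


def versions_agree(header_text, library_filename):
    """True only if both versions are known and identical."""
    header = header_version(header_text)
    library = library_version(library_filename)
    return header is not None and header == library


if __name__ == "__main__":
    # Hypothetical header contents, for illustration only.
    sample = "#define DB_VERSION_MAJOR 4\n#define DB_VERSION_MINOR 8\n"
    print(versions_agree(sample, "libdb-4.8.so"))
    print(versions_agree(sample, "libdb-5.0.so"))
```

A build system that insists on this kind of agreement, rather than accepting whatever `db.h` and `libdb.so` happen to resolve to, would at least fail loudly instead of mixing versions silently.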

So what am I doing and proposing to solve these issues? Well, first of all I re-used a virtual machine I had lying around, removing all the old db versions and then rebuilding a few of the packages that I knew were having problems with db5, some of which I was able to fix, luckily. I’ll go through a few more soonish, since the tinderbox is not reliable for identifying these problems (as it has all the versions installed).

A second task to handle is making sure that the packages that currently depend on “any version 4” of BerkDB are actually doing what they say. A common mistake was to depend on any version 4 just because the code wasn’t going to work with version 3, which is wrong; another common mistake is to require the presence of version 4 because the code doesn’t work with 5, but still not ensure that version 4 is actually used (by leaving it to the code to decide what to use). I know this is a bit hazy; let’s just say that they might not do the right thing as things stand.
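The second mistake can be shown with a toy model in Python. This is a sketch under a loud assumption: that the unversioned compatibility links point at the newest installed version. The function names are hypothetical, and nothing here reflects how Portage resolves dependencies; it only shows that "version 4 is installed" and "version 4 is what the build uses" are separate questions.

```python
def satisfies_any_4(installed):
    """True if an 'any version 4' dependency is met by what is installed.

    `installed` is a list of (major, minor) tuples.
    """
    return any(major == 4 for major, _minor in installed)


def autodetected(installed):
    """Model a build that just uses the unversioned libdb.so and db.h.

    Assumption made for this sketch: the compatibility links point at
    the newest installed version.
    """
    return max(installed) if installed else None


def dependency_is_honoured(installed):
    """True only if the dependency is met AND the build really uses a 4.x."""
    picked = autodetected(installed)
    return (
        satisfies_any_4(installed)
        and picked is not None
        and picked[0] == 4
    )


if __name__ == "__main__":
    system = [(4, 8), (5, 0)]
    print(satisfies_any_4(system))         # the dependency looks satisfied
    print(autodetected(system))            # but this is what gets picked
    print(dependency_is_honoured(system))  # so the dependency is not honoured
```

On a system with both 4.8 and 5.0 installed, the dependency is nominally satisfied while the modelled build still picks 5.0, which is why the package has to select its version explicitly.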

Thankfully, Zac already wrote a script that can help us here, for my previous quest against old automake last month (which is almost, but not completely, won), so we know which specific packages need work.

One lesson to be learnt here: if you’re looking to version-slot libraries, make sure you remove the generic fallback, and fix the packages relying on it before it turns into a problem like this.

Comments 5
  1. Ciao Diego. The problem with Berkeley DB 4 and 5 is the same problem as with closed source databases. In a closed environment binary data stored in databases is never assumed to be compatible across architectures, operating systems or even versions; only the wire protocol is assumed to be the same if the db version is the same. And the reason why is probably one of: performance, convenience, or not wanting to rack your brains writing a probably useless compatibility layer. I had a client that used the database from Big Blue on intel/windows, intel/linux and ppc/linux, and migrating data meant producing sql import scripts from the application’s live data and then feeding them to the new databases. Slow but steady and safe. I personally would make a case for application developers to adopt both a db and at the same time an extensible wire protocol api (protocol buffers, thrift etc…) to enable people to do a sane sitewide migration of their data. As for the Automake/Autoconf problem, I hear the laughter coming from Redmond-lovers all over the world saying “Ah! AH! So you have DLL HELL too!!!”. And that is not a good thing. 🙁

  2. Uhm Davide, I think you missed just about every point I could have made in that post.

  3. It is likely to be a silly and offtopic question, but is sys-libs/db any good? I set USE="-berkdb" in make.conf and completely got rid of it, and I fail to see any drawback. Should a normal user (that is to say, a user who doesn’t deal with BerkeleyDB directly) notice the difference?

  4. Most packages have optional berkdb support and you probably won’t notice it at all. Mostly, a number of servers use berkdb for performance (Apache does so for the rewrite databases, postfix for the aliases, …). Also, a number of those that you’d find on a normal desktop have a fallback on gdbm if berkdb is not installed; in general, though, I think performance for berkdb is superior to gdbm.

  5. Thanks for the explanation. But if we’re talking about gdbm… Since all the packages I use on my desktop/notebook have optional dependencies on sys-libs/gdbm, I thought about dropping it too. Is it a bad idea, what do you think? (Sorry for deepening the offtopic)
