Why Berkeley DB is slotted (but Boost isn’t anymore)

It’s no mystery that I hated Boost’s slotted package handling; I unslotted it last November because it was doing nothing good for us. But since then, one of the most common comparisons has been with Berkeley DB (sys-libs/db), which we still have slotted and which I have no intention of unslotting any time soon. So let me document the reasons behind this, as they tie in quite nicely with another berkdb-related task that I’ll get to at the end of the post.

First of all, whether or not you like my not-exactly-clinical assessment, Berkeley DB’s APIs have been much more stable than Boost’s. Even the bump to 5.0 hasn’t been as bad as many think; the biggest problem I’ve noticed with it is broken version-check logic like this:

#if DB_VERSION_MAJOR == 4 && DB_VERSION_MINOR >= 3
// use db 4.3+ API
#else
// use db 4.2- API
#endif

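A check like this sends 5.0 down the 4.2- branch, simply because the major version is no longer 4. Sometimes it’s even worse: instead of checking for a major version of exactly 4, the code checks for a major version greater than or equal to 4 and a minor version greater than or equal to X, which ends up working on 5.x but not on 5.(x-1). The fixes for these are trivial, and they are usually more of a nuisance than a real problem.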
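For illustration, here is a minimal sketch of a version check that doesn’t break on a new major version, next to the broken "major >= 4 and minor >= X" pattern for comparison. DB_VERSION_MAJOR and DB_VERSION_MINOR are the macros that <db.h> defines; the fallback values and the names DB_VERSION_AT_LEAST, version_at_least, broken_at_least and api_branch are hypothetical, there only so the sketch compiles without Berkeley DB installed.

```c
/* In real code these macros come from <db.h>. The fallback values are
 * hypothetical, only so this sketch compiles without Berkeley DB. */
#ifndef DB_VERSION_MAJOR
#define DB_VERSION_MAJOR 5
#define DB_VERSION_MINOR 1
#endif

/* Hypothetical helper: true when the headers are at least maj.min.
 * Any newer major version qualifies, so 5.x counts as newer than 4.3. */
#define DB_VERSION_AT_LEAST(maj, min)                        \
    (DB_VERSION_MAJOR > (maj) ||                             \
     (DB_VERSION_MAJOR == (maj) && DB_VERSION_MINOR >= (min)))

/* The same correct comparison, on plain integers. */
static int version_at_least(int major, int minor,
                            int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* The broken pattern, kept for comparison: it still demands
 * minor >= want_minor even when the major version is already newer. */
static int broken_at_least(int major, int minor,
                           int want_major, int want_minor)
{
    return major >= want_major && minor >= want_minor;
}

/* Reports which branch the preprocessor picked for the headers in use. */
static const char *api_branch(void)
{
#if DB_VERSION_AT_LEAST(4, 3)
    return "db 4.3+ API";
#else
    return "db 4.2- API";
#endif
}
```

With 5.1 headers, version_at_least(5, 1, 4, 8) is true while broken_at_least(5, 1, 4, 8) is false: exactly the "works on 5.x but not on 5.(x-1)" failure.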

But this is only part of the reason. While the API is stable, the ABI really is not (the library can change in a way that produces a different ABI while requiring no change to the source code that uses it), which means that after an update we do need to rebuild everything that was built against the older version. This is not uncommon for unslotted packages either, though, so on its own it can’t be enough to warrant slotting.

Here is the all-important detail: Berkeley DB mandates a file format, and that file format changes between versions! Okay, it does not always change, but it changes often enough. What happens in this situation is that files created by Berkeley DB version X cannot be read by version Y, and vice versa. This is not much of a problem when the DB files are used as a cache for fast access (like Postfix maps or, in my case, Apache rewrite maps), but it is when you use them to actually store information that can’t be regenerated directly, and here pam_userdb comes to mind.
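To make "the file format changes" a bit more concrete, the sketch below reads the format version out of a database file’s first (metadata) page. The layout is an assumption on my part, not a documented interface: a 32-bit magic number at byte offset 12 (0x053162 for Btree, 0x061561 for Hash, the same values file(1) looks for) and a 32-bit version right after it at offset 16, both in the byte order of the machine that wrote the file. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed metadata-page layout (see the lead-in): magic at offset 12,
 * on-disk format version at offset 16, in the writer's byte order. */
#define BDB_MAGIC_OFFSET   12
#define BDB_VERSION_OFFSET 16
#define BDB_BTREE_MAGIC    0x053162u
#define BDB_HASH_MAGIC     0x061561u

/* Reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

static int known_magic(uint32_t m)
{
    return m == BDB_BTREE_MAGIC || m == BDB_HASH_MAGIC;
}

/* Hypothetical helper: returns the format version stored in the first
 * page of a Btree or Hash database, or 0 if the header is unrecognized.
 * Files written on a machine of the other endianness are handled by
 * byte-swapping both fields. */
static uint32_t bdb_format_version(const unsigned char *page, size_t len)
{
    uint32_t magic, version;

    if (len < BDB_VERSION_OFFSET + sizeof version)
        return 0;
    memcpy(&magic, page + BDB_MAGIC_OFFSET, sizeof magic);
    memcpy(&version, page + BDB_VERSION_OFFSET, sizeof version);
    if (known_magic(magic))
        return version;
    if (known_magic(bswap32(magic)))
        return bswap32(version);
    return 0;
}
```

A field like this is what ties a file to the generation of the library that wrote it; the sketch only reports the number and deliberately doesn’t guess which library versions accept which format versions.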

Indeed, for a long time pam_userdb was built against its own bundled copy of Berkeley DB rather than the system one, so that the file format would not change between rebuilds. Unfortunately, this also meant skipping security updates, which was not considered a good thing. The current PAM ebuilds can build the module against the system Berkeley DB, as the rare situations where this module is used do not warrant the extra work of maintaining a bundled copy.

So, while this brings us close to the other topic I want to talk about in this post, how does slotting the library help us? The answer is that we don’t just slot the library, we also slot the tools that come with it, including the dump and restore tools. This means you can use the old slotted tool to dump the contents of the file you can’t recreate, and then restore it with the new one (the new tools can read the dumps produced by the older ones). By slotting the package, you’ll always have a set of tools that can read the older databases.
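The migration boils down to piping the old slot’s dump tool into the new slot’s load tool: db_dump writes to standard output and db_load reads from standard input by default. As a minimal sketch, the helper below only builds that pipeline as a string. The tool names follow the db<SLOT>_dump / db<SLOT>_load pattern (for example db4.8_dump), which is an assumption about how the slotted binaries are named; the function name is hypothetical, and it assumes paths that need no shell quoting.

```c
#include <stdio.h>

/* Hypothetical helper: writes "db<old>_dump <src> | db<new>_load <dst>"
 * into buf. Assumes slot-suffixed tool names and paths that need no
 * shell quoting. Returns the snprintf result, i.e. the length the full
 * command needs; a value >= bufsize means the output was truncated. */
static int build_migration_cmd(char *buf, size_t bufsize,
                               const char *old_slot, const char *new_slot,
                               const char *src, const char *dst)
{
    return snprintf(buf, bufsize, "db%s_dump %s | db%s_load %s",
                    old_slot, src, new_slot, dst);
}
```

For example, build_migration_cmd(buf, sizeof buf, "4.8", "5.3", "users.db", "users.new.db") produces "db4.8_dump users.db | db5.3_load users.new.db". Loading into a new file rather than over the original keeps the irreplaceable data intact until you have checked the result.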

This is the reason why we have a slotted Berkeley DB.

Now, to complete the post, I promised something else. You have probably noticed because of the fallout, but there is work underway on a realistic native multilib eclass, which allows us to build both the 64-bit and the 32-bit versions of a package in a single pass. Why is this relevant? Well, if you have a 32-bit proprietary service that uses PAM, you need a 32-bit PAM library and 32-bit PAM modules as well…

Since we do provide pam_userdb with the emul-linux packages, it’s quite possible that the Berkeley DB version used by the 32-bit module doesn’t match the one installed on a given system, because the user is using a different one altogether. Being able to use a single ebuild for both ABIs would make much more sense, and it’s definitely something I’m looking forward to!