A few risks I see related to the new Portage 2.2 preserve-libs behaviour

Fellow developer Zhang Le wrote about the new preserve-libs feature from Portage 2.2 that removes the need for revdep-rebuild.

As I wrote on the gentoo-dev mailing list when Marius asked for comments, there are a few problems with its implementation as it is, in my not-so-humble opinion. (Not-so-humble because I know exactly what I’m talking about, and I know it’s a problem.)

Let’s take a common scenario: a system where --as-needed was never used, which is updating a common library from ABI 0 to ABI 1 (so with a change of soname). This library might be, for instance, libexpat.

I don’t want to discuss here what an ABI is and what an ABI bump consists of. Let’s just say that when you make an ABI bump you either remove functions, or you change the meaning of some functions (like the parameters, the behaviour or other things like those).

In the first case the bump is annoying but not much of a problem: executables stop being loaded because symbols are undefined. With lazily-bound executables, they might instead die midway through, at the moment the undefined symbol is called, but that’s not our concern here.

The problem comes when a function with the same exact name changes meaning, parameters or return type. In this case, the executable might pass too much or too little data to the function, and the pointers might refer to something completely different, or might be truncated. In general, when you change the interface or the meaning of a function, executables built against the previous version and run with the new one will either crash or behave in a corrupted manner. Both are subtle issues we should be looking out for, as they are hard to debug unless you know about them.
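To make the mismatch concrete, here is a minimal sketch in Python that simulates it with the struct module. The layouts and values (OLD_ABI, NEW_ABI, 640, 480, the sample pointer) are invented for illustration and don’t come from any real library. The caller packs its arguments the way the old ABI expected, and the callee reads the same bytes the way the new ABI defines them.

```python
import struct

# Hypothetical layouts, invented for this example.
# Old ABI: the function takes two 32-bit integers (width, height).
OLD_ABI = "<ii"
# New ABI: same function name, but it now takes a single 64-bit size.
NEW_ABI = "<q"

def caller_built_against_old_abi(width, height):
    """Pack the arguments the way an old executable would."""
    return struct.pack(OLD_ABI, width, height)

def callee_from_new_abi(raw):
    """Read the same bytes the way the new library does."""
    (size,) = struct.unpack(NEW_ABI, raw)
    return size

raw = caller_built_against_old_abi(640, 480)
misread = callee_from_new_abi(raw)
# No error is raised: the call "works", it just means something else.
print("callee sees size =", misread)

# Truncation: a 64-bit value read back through a 32-bit field.
pointer = 0x00007F12345678A0
(truncated,) = struct.unpack("<I", struct.pack("<Q", pointer)[:4])
print(hex(pointer), "->", hex(truncated))
```

Nothing here fails loudly, which is the point: the bytes are valid for both layouts, so the only symptom is a wrong value further down the line.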

So let’s return to our library changing ABI. Let’s say we have libfooA.so.0 and libfooA.so.1 installed; the first is preserved by preserve-libs, the second is the new one. libfooB.so.$anything links to libfooA as it uses it directly, so it will be in the set of packages to rebuild.

Now introduce libfooC.so.$anything, which links to libfooB.so.$anything but, as destiny wishes, also uses libfooA directly.

At this point before the ABI bump we have libfooB depending on libfooA.0, and libfooC depending on libfooB and libfooA.0; after the bump, we decide to rebuild only libfooB, which means that libfooB now depends on libfooA.1 while libfooC is still depending on libfooA.0.
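The situation after the partial rebuild can be sketched as a small dependency graph. This is only a model, not something read from real ELF files: the NEEDED table is hypothetical, "app" stands for any executable using libfooC, and the version numbers on libfooB and libfooC are arbitrary placeholders for $anything.

```python
from collections import deque

# Hypothetical DT_NEEDED entries after the partial rebuild described above.
# The version numbers on libfooB and libfooC are arbitrary placeholders.
NEEDED = {
    "app": ["libfooC.so.2"],
    "libfooC.so.2": ["libfooB.so.3", "libfooA.so.0"],  # not rebuilt
    "libfooB.so.3": ["libfooA.so.1"],                  # rebuilt
    "libfooA.so.0": [],
    "libfooA.so.1": [],
}

def loaded_objects(root, needed):
    """Walk the dependency graph breadth-first and return every object reached."""
    seen = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in needed.get(current, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen

def conflicting_sonames(objects):
    """Group sonames by library name; keep names present in more than one version."""
    by_name = {}
    for obj in objects:
        name, sep, _version = obj.partition(".so.")
        if sep:  # skip objects without a versioned soname, such as "app"
            by_name.setdefault(name, []).append(obj)
    return {name: sorted(v) for name, v in by_name.items() if len(v) > 1}

objects = loaded_objects("app", NEEDED)
print(objects)
print(conflicting_sonames(objects))
```

Running it on this table reports libfooA as present in two versions inside the same process, which is exactly the case the rest of this post is worried about.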

What this means is that, minus symbol versioning, the same symbol would have two (probably different) definitions, which will collide one with the other, leading to subtle crashes, misbehaviour and other fun-to-debug problems.
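As a rough illustration of the collision, the sketch below models unversioned symbol lookup under one simplifying assumption: there is a single global lookup scope, and the first object in load order that defines a symbol provides it to every caller. The load order, the symbol name fooA_parse and its two signatures are all hypothetical, and real loaders have more rules than this.

```python
# Assumed load order for the process (breadth-first from the executable).
LOAD_ORDER = ["app", "libfooC.so.2", "libfooB.so.3", "libfooA.so.0", "libfooA.so.1"]

# Hypothetical exports: both ABIs define a symbol with the same name.
EXPORTS = {
    "libfooA.so.0": {"fooA_parse": "ABI 0: fooA_parse(buf)"},
    "libfooA.so.1": {"fooA_parse": "ABI 1: fooA_parse(buf, len, flags)"},
}

# Which ABI each caller was compiled against.
BUILT_AGAINST = {
    "libfooC.so.2": "libfooA.so.0",  # not rebuilt
    "libfooB.so.3": "libfooA.so.1",  # rebuilt
}

def resolve(symbol, load_order, exports):
    """Return the first object in load order that defines the symbol."""
    for obj in load_order:
        if symbol in exports.get(obj, {}):
            return obj
    return None

provider = resolve("fooA_parse", LOAD_ORDER, EXPORTS)
for caller, expected in BUILT_AGAINST.items():
    status = "ok" if provider == expected else "MISMATCH"
    print(f"{caller}: built against {expected}, gets {provider} -> {status}")

mismatched = [c for c, e in BUILT_AGAINST.items() if e != provider]
```

Under this model one of the two callers always ends up with a definition it was not built for; which one depends only on the load order.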

The problem is that the two ABIs of the library are both being loaded into the same process, which is a very bad thing unless the symbols are versioned. On the other hand, symbol versioning is a bit of a mess: it’s not implemented by all operating systems, and I find it quite convoluted.

At the moment I don’t see anything in Portage that stops you from shooting yourself in the foot by doing a partial rebuild. I hope I’m mistaken, but if I’m not, please remember to always do a full rebuild rather than a partial one. Otherwise, instead of having programs that don’t start, you might have programs corrupting your data.
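A simple way to check whether a rebuild was complete is to look for anything that still links to the preserved soname. The sketch below does that over a hypothetical NEEDED table. It is not how Portage tracks preserved libraries, and the version numbers on libfooB and libfooC are placeholders.

```python
# Hypothetical DT_NEEDED entries after rebuilding only libfooB.
NEEDED = {
    "app": ["libfooC.so.2"],
    "libfooC.so.2": ["libfooB.so.3", "libfooA.so.0"],  # not rebuilt
    "libfooB.so.3": ["libfooA.so.1"],                  # rebuilt
}

# Sonames kept on disk only for the sake of old consumers.
PRESERVED = {"libfooA.so.0"}

def still_using_preserved(needed, preserved):
    """List every object that still links directly to a preserved soname."""
    return sorted(
        obj for obj, deps in needed.items() if preserved.intersection(deps)
    )

leftovers = still_using_preserved(NEEDED, PRESERVED)
if leftovers:
    print("partial rebuild, still to do:", ", ".join(leftovers))
else:
    print("nothing links to the preserved libraries any more")
```

The rebuild is only complete once this list is empty; until then, both ABIs can still end up in the same process.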