Ebuilds have to be done right

There is quite a stir right now on the gentoo-dev mailing list following a mass-masking for removal of packages, for QA and security reasons; I think Alec nailed most of the issues in his comments:

> > This thread is yet another proof that we need to introduce a “Upcoming
> > masking” for unmaintained packages.
>
> Shall I file those forms in triplicate and fax them to the main office sir?
>
> Since amazingly I actually started the Treecleaners project; the intent was actually to fix problems with packages. Part of the problem is that there are hundreds of packages in the tree and the fixes vary in complexity so it is difficult to create hard-and-fast rules on when to keep a package versus when to toss it. One of the things I like about masking is that it quickly gets people who actually care about the package up to bat to fix it instead of leaving it broken for months. I realize maintainers do not exactly enjoy this kind of poking, however when things have been left for long enough I believe our options become a bit more limited (in this case, masking for removal due to unfixed sec bugs.)

Now, this is an issue I already partly addressed in my post about the five-minute fix myth, but I’d like to repeat it: even though we can easily spot some blatant problems with packages, a package that compiles and passes the obvious, programmatic QA checks tells you very little about its health. Indeed, you won’t know whether the package works at all for end users. Tying this to another post of mine (incidentally, someone complained about my self-references to posts… should I stop giving pointers and context?), I have to admit that 100% coverage of packages is sometimes impossible, among other reasons because some packages need particular hardware, or particular software components set up, to be tested effectively. On the other hand, when such a complex setup isn’t strictly needed, we should expect some level of testing when making changes, minor or otherwise.

Sometimes the mistakes are in the messages logged by the ebuild. At other times, some important part of the package is missing. For example, the install phase may be written manually in the ebuild, and upstream may have added an extra utility that make install installs but the ebuild obviously ignores. This is actually one of the points Donnie brought up when I suggested overriding upstream build systems with an eclass: we’d have to triple-check each new release to make sure that no source files, objects or libraries were added since the previously packaged version. All these things are almost impossible to identify in a nice, programmatic, scripted way. They need knowledge of the package: checking the release notes and having an idea of how to test it.
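One narrow part of that triple-checking can be scripted: noticing files that upstream installs but a handwritten install phase skips. Here is a minimal sketch in Python. It assumes you have two staging trees: one filled by upstream’s make install into a temporary DESTDIR, and one filled by the ebuild’s own install phase. The helper names are hypothetical and not part of Portage. The check only reports which files are missing; it says nothing about whether the package works.

```python
import os


def installed_files(root):
    """Return the set of non-directory paths found under `root`, relative to it.

    Symlinks to files are included; directories themselves are not.
    A nonexistent `root` yields an empty set.
    """
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_from_ebuild(upstream_files, ebuild_files):
    """Sorted list of files that upstream installs but the ebuild image lacks."""
    return sorted(set(upstream_files) - set(ebuild_files))


# Hypothetical usage; both staging directories are placeholders:
# upstream = installed_files("/tmp/upstream-destdir")
# ebuild = installed_files("/tmp/ebuild-image")
# for path in missing_from_ebuild(upstream, ebuild):
#     print("not installed by the ebuild:", path)
```

Comparing whole trees, rather than grepping Makefiles, means the check does not depend on how upstream’s build system is written.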

For instance, I’ve been looking into sys-libs/libnss-pgsql today, as I have an interest in it. The ebuild installs the shared library manually (skipping libtool’s relinking phase, by the way). Why does it take four steps when make install needs only one? The reason became obvious, though nobody had commented on it, once I changed the ebuild to use make install: a post-install check aborted the merge. The package installed the Name Service Switch library in /lib, but it also installed the static archive and the libtool .la file there, and neither is needed in /lib. The handwritten install works around the symptom but leaves the following problems:

  • it still builds the static (non-PIC) archive, doubling the number of compiler calls;
  • it won’t tell upstream that they forgot one thing in their Makefile.am;
  • it’s still wrong, because the libraries it links to are not available in /lib. If /usr is on a separate partition (who still does that nowadays?!), the module won’t work until /usr is mounted. At this point it should go under /usr itself. And yes, you can do that: both GNU libc and FreeBSD (which has a different NSS interface, by the way) look in both /lib and /usr/lib.
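Here is an illustration of the kind of post-install check described above. It is a sketch, not the check that actually aborted the merge. It scans a list of installed paths and flags static archives and libtool .la files sitting in the root library directories. The set of directories, the function name, and the sample paths are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Assumed set of root-level library directories; adjust for your layout.
ROOT_LIBDIRS = {"lib", "lib32", "lib64"}
# Static archives and libtool archives are not needed under /lib.
UNWANTED_SUFFIXES = {".a", ".la"}


def stray_root_lib_files(paths):
    """Return the .a/.la paths that live under a root libdir, sorted."""
    hits = []
    for path in paths:
        p = PurePosixPath(path.lstrip("/"))
        if p.parts and p.parts[0] in ROOT_LIBDIRS and p.suffix in UNWANTED_SUFFIXES:
            hits.append(str(p))
    return sorted(hits)


# Hypothetical image contents for illustration:
image = [
    "/lib/libnss_pgsql.so.2",
    "/lib/libnss_pgsql.a",
    "/lib/libnss_pgsql.la",
    "/usr/lib/libfoo.a",
]
print(stray_root_lib_files(image))  # ['lib/libnss_pgsql.a', 'lib/libnss_pgsql.la']
```

A check like this catches the symptom. The real fix still belongs in upstream’s Makefile.am, as the list above points out.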

Incidentally, why does glibc’s default nsswitch.conf list db files for services, protocols, rpc and ethers? Having db there means that each time you call into glibc to resolve a port name, it makes eight open() syscalls trying to find the file. That doesn’t sound right.
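To see which databases a given nsswitch.conf sends through db, a small parser is enough. This sketch makes simplifying assumptions. It strips comments and bracketed [STATUS=action] items, then treats each remaining line as `database: source source …`. The function names are made up, and the sample text is an illustrative excerpt, not glibc’s shipped file.

```python
import re

# Bracketed action items such as [NOTFOUND=return].
ACTION_RE = re.compile(r"\[[^\]]*\]")


def parse_nsswitch(text):
    """Map each database name to its ordered list of lookup sources."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        database, _, rest = line.partition(":")
        config[database.strip()] = ACTION_RE.sub(" ", rest).split()
    return config


def databases_using(config, source="db"):
    """Sorted names of databases whose lookup order includes `source`."""
    return sorted(name for name, sources in config.items() if source in sources)


SAMPLE = """\
services:  db files
protocols: db files
hosts:     files dns
netgroup:  nis [NOTFOUND=return] files
"""
print(databases_using(parse_nsswitch(SAMPLE)))  # ['protocols', 'services']
```

You could pass the contents of your own /etc/nsswitch.conf in the same way to see which lookups try db before files.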

I have patches and a new ebuild. I’ll try to send the patches upstream and get the ebuild committed in the next day or so, either by someone else or by picking up maintainership myself. In the meantime, I have to get back to my work.