Gentoo’s QA soft spots

Gentoo’s quality assurance is quite a difficult task to keep running smoothly: there are problems at so many levels that it’s not funny at all. I’ve been doing my share of the work, both with the tinderbox effort and with manual work to fix issues as they come up. But there are some particular soft spots that I think should be addressed sooner rather than later.

One of the most annoying problems I’m having lately is related to test failures with Python modules. Some of the failures were “expected”, since they were tied to the presence of Python 3.1 (I don’t use it on my standard system since it’s pointless to me, but it’s available in the tinderbox); some were caused by GCC 4.4 and code breaking the strict aliasing rules; for others I’ve got no clue at all. Arfrever cannot seem to reproduce them, and I don’t know how to look deeper into them. Since I have no idea where to start looking for the cause of the failures, for me the problems remain unsolved.

But before somebody reads something for Ruby and against Python into the fact that I have reported one failure, if any at all, with Ruby packages’ tests, you should probably remember that we’re not running tests for any of the Ruby gems. Thankfully, the work Alex posted about yesterday is now being reviewed and added to the main Portage tree, and as we move to the fakegem eclass we’re also going to add tests for all the packages. This is going to be a sorry job, because I have already noticed that a huge number of packages in Ruby land fail their tests, and that’s not only with Ruby 1.9 or JRuby!

Another thing that is in definitely bad shape in Gentoo is scientific software and its libraries. I’m not sure why that is, but it seems like most of the people writing scientific software have no clue about build systems, portability, good programming practices and the like. Probably it’s tied to the fact that people writing scientific software are mainly scientists who have some vague idea about programming (and you would definitely find it pointless to seriously use software written by programmers who have only some vague idea about science). The result is that not only are the ebuilds sometimes way overcomplicated for the task they have to take care of, but they often breach QA, and end up failing badly as soon as something changes in their dependencies.

This alone wouldn’t be a problem were it not that half the sci team, and similarly half the cluster team that seems to have been supporting them, disappeared over time, and now the ebuilds are mostly unmaintained. Thankfully, we’ve got people like jlec who are still updating the ebuilds in the overlay, but here is the catch: either you keep the stuff in the overlay entirely, or you’ve got to fix it in the main tree as well. We’re going to need some hands porting stuff over to the main tree from the sci overlay.

A similar problem happens with the LISP overlay: packages that are in Portage, and used by other packages as well, end up failing over time, and the solution you find around is “just use the overlay”, which is no solution at all. Again, fortunately, Ulrich is moving some (requested) ebuilds from the overlay to the main tree, as a user points out, but it’s still a sorry state. We have artificially increased our dependency on overlays, and it’s showing its limitations right now, to me at least.

And finally, another problem comes up when you look at the external kernel modules, which, as the kernel team has expressed many times, are “simply evil”. While some tend to be at least vaguely maintained, a lot are not. Think about iscsitarget, which I ported to 2.6.32 myself but have basically stopped maintaining: I moved to sys-block/tgt, which uses the SCSI target module already present in the Linux kernel. This way I have no more external modules on my system, and I don’t need to rebuild packages at every update or fix the build if it breaks.

We’ve still got a few packages that are designed to work only on kernel 2.4 (since we’re going to prune old glibc versions, shouldn’t we start pruning 2.4 kernel support as well? It has been so long that I doubt anybody is still interested in it, even in the most conservative environments), and there are quite a few modules that only work with pretty old kernels, like 2.6.24 (which current udev does not support either). The problem with those is that they often require specific hardware; lacking that, there is no way to ensure they work. And some of their maintainers are going missing over time.

So these are some of the directions we should try to work on more heavily. Hopefully somebody else will join me, since I cannot really do much more than I’m already doing at this point.