I’m sincerely starting to get tired of repeating this, but here it comes: testsuites are important to know whether a package is good or not, so the package you’re developing should have a testsuite. And the ebuild you’re writing should execute it. And you should run it!
This is something I first noticed in the tinderbox, but some people have even complained about it to me directly: a lot of ebuilds make a mess of testsuites. The problems range from not running them at all and restricting them without really good reasons, to testsuites that are blatantly broken because they were never run before.
I guess the first problem here is the fact that, while the test feature that executes the actual testsuites is disabled by default, Portage provides a default src_test. Why is this a problem, you say? After all, it really does add some value when the testsuite is present, even if the maintainer of the ebuild didn’t spend a few extra minutes writing it down. Unfortunately, while it adds test phases to lots of ebuilds where they are correctly executed, it also adds them to packages that don’t have testsuites at all (but, if they use automake, it’ll still run a long make chain, recursive if the build system wasn’t built properly!), to packages that have different meanings for the check or test targets (like all the qmail-related packages, for which make check checks the installation paths and not the just-built software), and to packages whose testsuite is not only going to fail, but also going to hog a computer for a pretty long time (did somebody say qemu?).
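To make the problem a bit more concrete, here is a rough sketch of the kind of probing a default test phase does. This is a simplification written for this post, not Portage’s actual code: the function names and the overridable MAKE variable are made up for the sketch.

```shell
# Simplified sketch of a default test phase; NOT Portage's real implementation.
# MAKE is overridable only so the logic can be tried without a real build tree.
MAKE=${MAKE:-make}

pick_test_target() {
    # "make -n <target>" is a dry run: it succeeds whenever the target can be
    # resolved, and says nothing about whether the target is a real testsuite.
    if "$MAKE" -n check >/dev/null 2>&1; then
        echo check
    elif "$MAKE" -n test >/dev/null 2>&1; then
        echo test
    else
        echo none
    fi
}

run_default_tests() {
    target=$(pick_test_target)
    if [ "$target" = none ]; then
        echo "no check/test target, nothing to run"
        return 0
    fi
    # Whatever the target means for this package, it gets executed.
    "$MAKE" -j1 "$target"
}
```

The point of the sketch is that the probe only asks whether a target exists. An automake package with no tests still has a check target, and a package whose check target inspects installation paths looks exactly the same from here.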
Now, the problems with tests do not stop with the default src_test, otherwise it would be pretty easy to fix; the problem is that we don’t really have a clear policy on how to deal with testsuites, especially those that fail. And I have to say that I’m as bad as the rest of the group when it comes to dealing with testsuites. To begin with, I can bring up two packages I deal with that have problems with their testsuites.
- PulseAudio, which is a pretty important package, you’d say, has a complex testsuite. For quite a long time in the test releases (which in Gentoo become RCs even though they really are not, but that’s another issue) one of the tests (mix_test) failed because the test itself hadn’t been updated to support the new sample format; this was only fixed recently (there were other test failures, but those I fixed myself at the first chance). On the other hand, the tests for the translations, which are also part of the package’s testsuite, are still not executed: the current version of intltool (0.40) does not interpret the configure.ac file correctly (it parses it as if it were a text file, rather than accepting that it’s a macro file), and causes the test to fail in a bad way. The solution for this part is to package intltool 0.41 and add a dependency on it, but it seems like nobody is sure whether that’s an official release or a development release. For now, only the software tests are executed.
- Speaking of DocBook, the XSL stylesheets for DocBook used to have a complex testsuite that checked that the output was what it was supposed to be; the tests weren’t really comprehensive, though, and indeed at least one bug was missed by the testsuite in the whole 1.74 series. Starting from 1.75 the new testsuite should probably be tighter and support more than just one XSLT engine… the problem is that upstream doesn’t seem to have described the testing procedure anywhere, and I haven’t figured out how it works yet, with the result that the testsuite is now restricted in the ebuilds (with a test USE flag that is not masked; I already filed an enhancement request for Portage to handle this case).
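For reference, the two workarounds described above look more or less like this in an ebuild. This is a hypothetical fragment, not copied from the actual PulseAudio or DocBook ebuilds: the src subdirectory and the bug placeholder are assumptions made for the example. emake and die are the usual Portage helpers.

```shell
# Hypothetical ebuild fragments; directory name and bug reference are placeholders.

# Case 1: the whole testsuite cannot be run yet.
# Restrict it, and point at the bug that stays open until it works:
# RESTRICT="test"   # bug #NNNNNN

# Case 2: only part of the testsuite is usable.
# Override the default phase and run just that part
# (assuming here that the software tests live under src/).
src_test() {
    emake -C src check || die "software tests failed"
}
```

The two cases are alternatives: with RESTRICT="test" set, the src_test function would never be called.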
At this point what I’m led to wonder is: how harsh should we be on packages with flawed, broken, or incomplete testsuites? Should they be kept in package.mask? Should they not reach stable? The stable-stopper for this kind of problem used to be Ferris, and it’s one reason I’m really going to miss him badly. On the other hand, it seems like a few other arch team members have started applying the same strictness, which I don’t dislike at all (although it’s now keeping libtool-2.2 from going stable, and with that PulseAudio as well). But what about the packages that already fail in stable? What about the packages failing because of mistakes in the testsuites?
There are also arch-specific issues. For instance, I remember some time ago Linux-PAM requiring a newer glibc than was available on some arches for its testsuite to run correctly… the running logic of PAM, though, seemed to work fine apart from the tests. What should have been the correct approach? Make the whole of Linux-PAM depend on the new glibc, making it unusable on some arches, or just the tests? I decided for the tests, because the new version was really needed, but from a pure policy point of view I’m not sure it was the right step.
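The “just the tests” option boils down to a conditional dependency on the test USE flag. A minimal sketch, not the real Linux-PAM ebuild: X.Y is a placeholder for the glibc version the tests need, which I’m deliberately not filling in, and the MIN_TEST_GLIBC variable is made up for the example.

```shell
# Hypothetical ebuild fragment; X.Y is a placeholder, not a real version.
MIN_TEST_GLIBC="X.Y"

IUSE="test"

# Runtime dependencies stay untouched; only building with tests enabled
# pulls in the newer glibc.
DEPEND="test? ( >=sys-libs/glibc-${MIN_TEST_GLIBC} )"
```

This keeps the package installable on arches with the older glibc, at the price of not being able to run its tests there.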
I guess the only thing I can add here is, once again, if you need to restrict or skip tests, keep the bug open, so that people will know that the problem has only been worked around and not properly fixed. And maintainers, always remember to run the testsuites of your packages when bumping, patching or otherwise changing your packages. Please!
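In case it helps, running the testsuite for a single package does not require enabling tests globally. A small sketch: the run_pkg_tests wrapper is just a hypothetical convenience, while FEATURES=test and emerge --oneshot are the standard Portage bits.

```shell
# Hypothetical helper; pass the atom of whatever package you maintain.
# FEATURES=test enables the test phase for this one invocation only.
run_pkg_tests() {
    FEATURES="test" emerge --oneshot "$1"
}

# Example: run_pkg_tests media-sound/pulseaudio
```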