I have already explained why one tinderbox is not enough, but I think I should return to this topic and write again why Patrick, Ryan, and I can't find all the issues, even if we put all our efforts together.
The first problem is the sheer number of combinations of packages: the different USE flags enabled, the different arches, the different merge orders, the different packages installed (and the way they are installed) all combine into far too many configurations to test in a lifetime. Of course we can probably find the most outstanding bugs quite quickly in a first pass with default USE flags; thanks to EAPI 2 and USE dependencies we can also keep a decently clean record of what needs to be enabled. On the other hand, it would be interesting to try disabling all the optional features (except those strictly needed) and see how the ebuilds behave.
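As a rough sketch of what I mean (the package atom and flag below are invented, and USE="-*" is admittedly a blunt instrument), a minimal-options run and the EAPI 2 USE-dependency syntax it relies on would look something like this:

```
# Preview a build with every optional USE flag disabled; anything the
# profile forces, or that an ebuild requires through EAPI 2 USE
# dependencies, gets pulled back in by Portage itself.
USE="-*" emerge --pretend --verbose app-misc/foo

# EAPI 2 USE dependencies, in a hypothetical ebuild:
#   dev-libs/bar[ssl]   -> bar must be built with USE=ssl enabled
#   dev-libs/bar[ssl?]  -> bar needs ssl only if this package has it
DEPEND="ssl? ( dev-libs/bar[ssl] )"
```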
Then there is the problem of architectures: while in the past the architecture with the most keywords was x86, I'm not really sure this is still true nowadays with the rise of amd64-based systems. I know I don't usually keyword for ~x86 the stuff I add to Portage, since I don't run x86 anywhere. And while such packages can probably be compiled and used on x86, some can't, and there are issues that don't apply to x86 at all but only to other systems.
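To make the keywording point concrete, this is what the relevant line of a hypothetical ebuild of mine looks like (app-misc/foo is made up); until an x86 arch tester extends it, x86 users never see the package at all:

```
# Hypothetical app-misc/foo ebuild: only ~amd64 is keyworded, because
# that is the only architecture the maintainer can actually test on.
# An x86 tester would later change this to KEYWORDS="~amd64 ~x86".
KEYWORDS="~amd64"
```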
There are problems with packages that provide kernel modules, because they tend to break badly between one kernel release and the next (I only help maintain one such package nowadays, iscsitarget, and I usually manage to get it working properly a day or two after a new kernel is released, which means this weekend I'll probably be writing another patch). I also had to blacklist a few packages that are only available for 2.4 kernels (why do we have a kernel 2.4 profile but none for 2.6? and why don't we mask them at profile level? no idea).
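For the record, masking at profile level would not take much; it is just a package.mask file in the profile's directory. A hedged sketch (the atom is invented, but the leading-dash unmask syntax is how Portage profiles work):

```
# In a hypothetical base profile's package.mask: this driver only
# works with 2.4 kernels, so mask it for everyone...
net-dialup/fictional-24-only-driver

# ...and in the 2.4 subprofile's package.mask, undo the mask
# (a leading "-" removes an entry inherited from a parent profile):
-net-dialup/fictional-24-only-driver
```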
What about alternative packages? Collisions between packages create a bit of a problem when they are solved by blockers instead of allowing side-by-side installs (and sometimes you have no choice but to block one or the other; see the two PAM providers). And there are still lots of packages that fail to merge because of non-blocked collisions, sometimes due to simply too-generic file names (and in my case they get re-added to the tinderbox build queue, because they don't show up as installed!).
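For reference, the blocker side of this is a single atom in the ebuild; a minimal sketch with the PAM situation as inspiration (the atoms in the real ebuilds may well differ):

```
# In a hypothetical sys-libs/pam ebuild: refuse to be merged while the
# alternative PAM provider is installed, because both packages want to
# own the same generically-named files (libpam.so and friends).
RDEPEND="!sys-auth/openpam"
```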
Then there is the problem of overlays: my tinderbox can only check packages that are in the main tree (and not masked); all the packages in overlays get ignored because they are, obviously, not added to the tinderbox. And the sheer number of overlays makes it practically impossible to deal with all of them. Let's not even start thinking about the combinations created by which overlays are added and the order in which they are added (which is one extra argument for not splitting our tree into multiple overlays!).
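Just to show how tied to the main tree the whole setup is, here is a minimal sketch of how a queue like mine could be seeded (this is not the actual tinderbox code; the category glob is a rough heuristic, and masked packages would still need filtering afterwards, for instance through emerge --pretend):

```
#!/bin/bash
# Emit one category/package entry per directory in the main tree;
# overlays never show up here, which is exactly the limitation.
PORTDIR=$(portageq envvar PORTDIR)
for pkgdir in "${PORTDIR}"/*-*/*/; do
    entry=${pkgdir#"${PORTDIR}"/}   # strip the tree prefix
    echo "${entry%/}"               # e.g. app-misc/foo
done > tinderbox-queue.txt
```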
Are you not convinced yet? Well, I really wish I had more convincing numbers, but I don't. I only know that the amount of work involved in my tinderbox effort, which is likely the least sophisticated one around, is probably just a minuscule part of the effort needed for real quality control in Gentoo. And even though I can file, test, apply and close bugs, I cannot solve all the issues, because there are far too many variables in play.
Anyway, I'm taking a break now because my head is tremendously tired, and I've been filing bugs, working and scanning documents all day long. I could use some play time instead…
I don't know in detail what your tests look like, but if there are scripts which run automated on a Gentoo install, I'd be happy to help by providing one of my boxes (x86/amd64) to run as many tests as possible. Can you point me to some more information about your test framework (if available, and if this makes sense at all)? Regards, Elias P.
It's easy. You can't test everything. No chance. So just accept it and try to get tests for the most important stuff, the general case and so on. That way many bugs will be found quite fast and the remaining ones can be filed by people stumbling upon them. If I encounter bugs only rarely, I'm much more inclined to actually file them.
Maybe a distributed approach would be best? Publish all the results of the tests on the per-maintainer QA web page and on the per-package QA web page. That way, when people work on a package they can see things that need doing. BTW, Lucas Nussbaum from Debian has access to Grid5000, a cluster of 5000 amd64 nodes in France. If you ever want to run something more time-consuming than is possible on your tinderbox that could benefit from parallelism, you might try asking him to run a job for you.
I doubt Debian is interested in the kind of tinderboxing we do (or they would have done so already; it's much easier for them). On the other hand, I'm trying to get the special code I'm using in the tinderbox merged into Portage proper, and new tests added there, so that the thing can be made much easier to replicate on other systems.