Bigger, better tinderbox

Well, not in the hardware sense, not yet at least, even though it’d be wicked to have an even faster box here (with some due control of course, but I’ll get back to that later). I’ll probably get some more memory and AMD Istanbul CPUs when I have some cash to spare, which might not be soon.

Thanks to Zac and his irreplaceable help, Portage gained a few new features that made my life as “the tinderbox runner” much easier. Collisions are now recorded in the build/merge log, so I can grep for them just as I do for failures. The die hook is now smarter, working even when the failure comes from the Python side of Portage (as collisions do), and it’s accompanied by a success hook. The two hooks are what I’m using to post the whole progress of the tinderbox (so you can follow that account if you feel like being “spammed” by the tinderbox’s proceedings; the tags give a quick look at how things are going on average).

But it’s not just that; if you remember me looking for a run control for the tinderbox, I’ve implemented one of the features I talked about in that post even without any fancy, complex application: when a merge fails, the die hook masks the failed package (the exact revision), and this has some very useful domino effects. The first is that the same exact package version can only ever fail once in the same tinderbox run (I cannot count the times my tinderbox wasted time rebuilding stuff like mplayer, asterisk or boost and failing, as they are dependencies of other packages). That’s what I was planning for; what I got on top of it is even more interesting.

The tinderbox already runs in “keep going” mode (which means that a failed, optional build will not cause the whole request to be dropped, and applies mostly to package updates). By masking the specific, failing revisions of some packages, it now also happens to force downgrades, or stop updates, of the involved packages, which means that more code is getting tested (and sometimes it gets lucky, as older versions build where newer ones don’t). Of course the masking does not happen when the failure is in the tests, as those are quite messed up and warrant a post by themselves.
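The masking step described above can be sketched roughly as follows. This is only an illustration in Python, not the actual hook (which presumably lives in Portage's shell-side configuration): the function name, its parameters, and the example package are all made up, and the only thing taken as given is that a package.mask entry of the form `=category/package-version` masks that exact revision.

```python
import tempfile
from pathlib import Path


def mask_failed_package(mask_file, category, pf, phase):
    """Append an exact-revision mask atom for a failed package.

    Hypothetical helper: `pf` is the package name with version and
    revision (e.g. "example-1.2-r1"), `phase` is the build phase that
    failed. Returns the atom, or None when nothing was masked.
    """
    # Failures in the test phase are deliberately not masked.
    if phase == "test":
        return None
    atom = f"={category}/{pf}"
    path = Path(mask_file)
    existing = path.read_text().splitlines() if path.exists() else []
    # Write each atom only once, so repeated failures don't pile up.
    if atom not in existing:
        with path.open("a") as handle:
            handle.write(atom + "\n")
    return atom


if __name__ == "__main__":
    # Demo against a throwaway file rather than the real Portage config.
    with tempfile.TemporaryDirectory() as tmp:
        mask = Path(tmp) / "package.mask"
        mask_failed_package(mask, "app-misc", "example-1.2-r1", "compile")
        mask_failed_package(mask, "app-misc", "example-1.2-r1", "test")
        print(mask.read_text(), end="")
```

Because only the exact revision is masked, the resolver stays free to pick an older or newer version of the same package, which is where the forced downgrades come from.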

Unfortunately I’m now wondering how taxing the whole tinderbox process is getting: in the tree there are just shy of 14 thousand packages. Of these, some will merge in about three minutes (this is back-to-back from the call to emerge to the end of the process; I found nothing going faster than that), and some rare ones, like Berkeley DB 4.8, will take over a day to complete their tests (db-4.8 took 25 hours, no kidding). Assuming an average of half an hour per package, this brings us to 7 thousand hours, about 300 days, almost a year. Given that the tinderbox is currently set to re-merge the same package on a 10-week schedule, this definitely gets problematic. I sincerely hope the average is more like 10 minutes, even though a full pass would then still take about 97 days, longer than the 10-week window, so the rebuild would never catch up. I’ll probably have to find the real average by looking through the whole emerge log, and at the same time I’ll probably have to reduce the rebuild frequency.
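The back-of-the-envelope numbers above are easy to reproduce. The sketch below uses only the figures from the text (14 thousand packages, a 10-week schedule, and the two guessed averages); the function and constant names are mine, and the real average is still unknown.

```python
def full_pass_days(packages, avg_minutes):
    """Days needed to merge every package once, strictly one after another."""
    return packages * avg_minutes / 60 / 24


PACKAGES = 14_000        # "just shy of 14 thousand" in the tree
SCHEDULE_DAYS = 10 * 7   # the 10-week re-merge schedule

if __name__ == "__main__":
    for avg in (30, 10):  # guessed averages, in minutes per package
        days = full_pass_days(PACKAGES, avg)
        verdict = "fits" if days <= SCHEDULE_DAYS else "does not fit"
        print(f"{avg} min/package: {days:.1f} days, {verdict} in {SCHEDULE_DAYS} days")

    # Largest average merge time that still completes a pass within the schedule.
    break_even = SCHEDULE_DAYS * 24 * 60 / PACKAGES
    print(f"break-even average: {break_even:.1f} minutes per package")
```

With these inputs the half-hour guess gives roughly 292 days and the 10-minute guess roughly 97 days, while the schedule would only hold with an average of about 7.2 minutes per package.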

Again, the main problem turns out to be parallel make: I’d expect the system load to be pretty high, while in practice it always stays below 3. Quite a few build systems, including those of Haskell extensions and Python’s setuptools, do not seem to support parallel builds (in the case of setuptools, it seems to call make directly, thus ignoring Gentoo’s emake wrapper), and quite a few packages force serial make (-j1) anyway.

And a note here: you cannot be sure that calling emake will give you parallel make; besides the already-discussed “jobserver unavailable” problem, there is the .NOTPARALLEL directive, which instructs GNU make not to build in parallel even though the user asked for -j14. I guess this is going to be one further thing to look for when I start on the idea of distributed static analysis.
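As a first approximation of what "looking for" .NOTPARALLEL could mean, here is a minimal sketch that scans a source tree for makefiles declaring it. The function names and the list of file names are my own assumptions; it does not follow include directives or look at makefiles generated during the build, so it can miss cases.

```python
import re
import tempfile
from pathlib import Path

# .NOTPARALLEL is written as a target, so look for it at the start of a
# line (leading spaces allowed, tabs not, since those mark recipe lines).
NOTPARALLEL_RE = re.compile(r"^ *\.NOTPARALLEL[ \t]*:", re.MULTILINE)

# Default file names GNU make looks for; other names are ignored here.
MAKEFILE_NAMES = {"GNUmakefile", "makefile", "Makefile"}


def forces_serial_make(makefile_text):
    """True if the makefile text declares the .NOTPARALLEL special target."""
    return bool(NOTPARALLEL_RE.search(makefile_text))


def find_serial_makefiles(root):
    """Return the makefiles under `root` that declare .NOTPARALLEL."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name in MAKEFILE_NAMES:
            if forces_serial_make(path.read_text(errors="replace")):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Demo on a throwaway tree with one serial and one parallel-safe makefile.
    with tempfile.TemporaryDirectory() as tmp:
        serial = Path(tmp) / "serial"
        parallel = Path(tmp) / "parallel"
        serial.mkdir()
        parallel.mkdir()
        (serial / "Makefile").write_text(".NOTPARALLEL:\nall:\n\t@true\n")
        (parallel / "Makefile").write_text("all:\n\t@true\n")
        for hit in find_serial_makefiles(tmp):
            print(hit.relative_to(tmp))
```

Run over the unpacked sources of each package, something like this would at least flag the packages where a low load is expected no matter what -j value was requested.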