To complete the topic I started in the previous post, I would like to give you some more notes about the way the tinderbox works, and in particular about the manual fiddling I have to do to keep it running smoothly, and about the issues that haven’t been tackled yet.
As I said, the tinderbox is a Linux Container; this helps isolate the test environment from my workstation: killall and other similarly unbounded commands are never going to hit my system, which is good. On the other hand, this still has a couple of rough patches to go through. Even with the latest (unreleased) version of lxc, /dev is either statically created or bound: udev does not work properly inside the container, for somewhat obvious reasons. The problem with that is that if you bind the /dev directory (or mount devtmpfs, which is basically the same thing with a recent kernel), then you’ll have only one directory where FIFOs and sockets are created. This not only causes sysvinit to shut down the host rather than the container if you use the shutdown command, but also makes it impossible to have a running, working syslog from within the container. This shouldn’t hinder the tinderbox’s work, but it seems like it does.
Another problem is something all users have to fight with every time: incompatible language updates. Python, Perl, OCaml, Haskell, you name it. Almost all of these languages come with an “updater” script that is intended to rebuild their extensions and libraries so that they are compatible again with the new release; failing to run these scripts introduces quite a few failure cases within the tinderbox that, obviously, are spurious. The same goes for lafilefixer. I’ll probably have to write a maintenance script to improve the flow of that, so I don’t forget any steps.
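A minimal sketch of what such a maintenance script could look like, in Python. The step list is an assumption for illustration: the commands and flags shown are the ones I would expect on a Gentoo system of this era, not a confirmed description of the tinderbox setup, and OCaml is left out because the post does not name its updater. The `run_maintenance` and `run_command` names are hypothetical.

```python
import subprocess

# Assumed step list: adjust commands and flags to what is actually installed.
MAINTENANCE_STEPS = [
    ["python-updater"],
    ["perl-cleaner", "--all"],
    ["haskell-updater"],
    ["lafilefixer", "--justfixit"],
]


def run_command(argv):
    """Run one command and return its exit status (127 if it cannot be started)."""
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError:
        return 127


def run_maintenance(steps, runner=run_command):
    """Run every step in order and return the names of the steps that failed.

    A failing step does not stop the run, so a single broken updater
    does not hide the state of the others.
    """
    failed = []
    for argv in steps:
        print("running:", " ".join(argv))
        if runner(argv) != 0:
            failed.append(argv[0])
    return failed
```

Calling `run_maintenance(MAINTENANCE_STEPS)` inside the container after a language update would then run each updater once and report which ones need a second look; the `runner` parameter exists only so the flow can be tried without touching the system.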
Yamato’s hardware is designed to work in parallel (as Mike also found out, the sweet spot for builds seems to be twice the number of cores), so another problem with the tinderbox is that it merges everything sequentially: making that parallel is quite hard because of the interdependencies between packages. So to speed things up, the build process itself has to be parallel-safe, which, as you probably know, it often is not; that is one of the reasons why I often fix packages.
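To illustrate why interdependencies limit parallel merging, here is a small sketch that groups a dependency graph into “waves”, where every package in a wave only depends on packages from earlier waves. This is not how Portage schedules merges; the package names and the `parallel_waves` helper are made up for the example, and it relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter


def parallel_waves(deps):
    """Group packages into waves that could, in principle, be merged together.

    `deps` maps each package to the set of packages it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical packages: two libraries share a base, one app needs both.
diamond = {
    "app": {"libfoo", "libbar"},
    "libfoo": {"libbase"},
    "libbar": {"libbase"},
    "libbase": set(),
}

# Hypothetical worst case: a pure chain, one package per wave.
chain = {"c": {"b"}, "b": {"a"}, "a": set()}
```

With the diamond graph only the middle wave holds more than one package, and with the chain no wave does, so however many cores are available the only parallelism left is the one inside each package’s own build.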
One pretty bad example of time wasted because of serial make runs is boost: the merge took almost four hours yesterday, because the tests are built and executed one at a time. Instead of building all the test binaries and then executing them in series (which is a good compromise if you cannot run them in parallel), it builds a test, runs it, and only then moves on to the next; the result is obviously pretty bad on my system.
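The compromise described above, building every test binary first and only then running them one after the other, can be sketched like this. It is a generic illustration, not boost’s build system: `build_then_run`, `build` and `run` are hypothetical names, and the two callables stand in for whatever compiles and executes a single test.

```python
from concurrent.futures import ThreadPoolExecutor


def build_then_run(tests, build, run, jobs):
    """Build all tests with up to `jobs` workers, then run them serially.

    `build(name)` returns something `run` can execute (for example a path);
    `run(binary)` returns that test's result.
    """
    # Phase 1: the builds are independent, so they can overlap.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        binaries = list(pool.map(build, tests))  # keeps the order of `tests`

    # Phase 2: execution stays serial, for tests that cannot share the machine.
    return [run(binary) for binary in binaries]
```

The point of the split is that the expensive part, compilation, gets to use all the cores, while the tests still never run at the same time as each other.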
Quite a few times, by the way, the whole situation is exacerbated by the fact that the build failures were already reported, often by me, last year. Yep, we have year-old build failures in the tree that hit users. And guess what? At least a couple of times the proposed solution was “use an overlay”. No, the right solution is not to let software bitrot in the tree!
Anyway, thanks to Genone, who sent me the patch to have better collision diagnostics, and thanks to Mauro, who’s working on new bashrcng plugins for the QA tests. Hopefully, some of the tests will also find their way into Portage soon; and again, I’ll suggest you consider the idea of contributing somehow (if you cannot contribute code or fixes): it might not be extremely difficult to deal with the tinderbox, but it sure is time-consuming, and time, well, is money…