Sealed tinderbox

I’ve been making the tinderbox one notch stricter from time to time. A few weeks ago I set it up so that any network access other than the basic protocols (HTTP, HTTPS, FTP and rsync) was denied; the idea is that if the ebuilds try to access the network by themselves, something is wrong: once the files are fetched, that should be enough. Incidentally, this is why live ebuilds should not be in the tree.

Now, since I’ve received a request regarding the actual network traffic issued by the tinderbox, I decided to go one step further still and make sure that, apart from the tasks that do require network access, the tinderbox does not connect to anything outside of the local network. To do so, I set up a local rsync mirror, then added a Squid passthrough proxy that does not cache anything. At that point, rather than allowing some protocols on the router for the tinderbox, I simply reject anything originating from the tinderbox that tries to reach the Internet; all the outgoing connections from the tinderbox go through Yamato, so I have something like this in my make.conf:

FETCHCOMMAND="/usr/bin/curl --location --proxy yamato.local:3128 --output \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
RESUMECOMMAND="/usr/bin/curl --location --proxy yamato.local:3128 --continue-at - --output \"\${DISTDIR}/\${FILE}\" \"\${URI}\""

Note: while googling how to set up those two variables in Gentoo to use curl, I did find some descriptions on the Gentoo Forums that provide most of what is needed; unfortunately all the ones I found ignore the --location option, and without it curl fails to fetch from the SourceForge mirrors and any other mirroring system that answers with HTTP 302 redirects.

I also modified the bti-calling script so that the dents are sent properly through the proxy. I didn’t set the http_proxy variable, because that would have made the sealing moot. Instead, by setting the proxy explicitly for the fetch and the dent only, any testsuite that tries to fetch something, even via HTTP, will be denied.
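To give an idea of what passing the proxy explicitly for the dent could look like, here is a minimal sketch. It is not the actual script: the dent function name and the BTI override are hypothetical, and it assumes bti's --proxy option (described in its man page) and that the account details live in bti's own configuration file.

```shell
# Proxy host and port as used in the make.conf lines above.
PROXY="yamato.local:3128"
# Hypothetical override so the wrapper can be dry-run with a stub.
BTI="${BTI:-bti}"

# dent MESSAGE: send one message, naming the proxy on the command line
# instead of exporting http_proxy to every process in the tinderbox.
dent() {
    # bti reads the message text from standard input.
    printf '%s\n' "$1" | "$BTI" --proxy "$PROXY"
}
```

Since nothing is exported, a testsuite started from the same environment still has no proxy to fall back on.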

But… why should it be a problem if testsuites were to access services on the network? Well, the answer is actually easy once you understand two rules of Gentoo: what is not in package.mask is supposed to work, and any bug found needs to be fixable; and testsuite results need to be reproducible, to make sure that the package works. When you rely on external infrastructure like Git repositories, you have no way to make sure that a problem can be fixed if one turns up; and when your testsuite relies on remote network services, it might fail because of connection problems, and it will fail if the remote service is shut down entirely.

I’ve also been tempted to remove IPv4 connectivity from the tinderbox altogether; IPv6 should be quite enough given that it only needs to connect to Yamato, and it would be under NAT anyway.

Opening up the tinderbox

As I said before, the tinderbox is hardly parallelisable; on the other hand, it can yield much better results if multiple instances are executed independently by more people. Of course, this also requires that the executions are somewhat coordinated so that they don’t repeat the exact same process over and over, but rather run some slight variation (different architecture, compiler, flags, basic USE settings, etc.).

Now, while Mark has been working on setting up a tinderbox for PPC64, I wanted to publish the scripts that I’ve been using all this time; I did so a few weeks ago by posting them, but since then more problems and more solutions came up. So today, thanks to Tomas (why does the roll call show the “normalised” name? I’m pretty sure his name is not just ASCII), I started publishing the scripts in a public git repository, which both other developers and interested users can use to improve, simplify and extend the tests.

If you look at the scripts and compare them with the old versions, you can see that I have made a few important changes, the first of which would be the presence of bti in them. Yes, I’m currently denting away the tinderbox results so that you all can follow them. This also gives a bit of insight into how the tinderbox works, even to those who don’t want to look into the dirty details of the code.

The rest of the changes are largely thanks to Zac. The first is that the merge operations now run with --update --selective=n so that all the dependencies are considered as soon as possible; this solves some nasty deadlock cases, like the gvim, vim and vim-core dependencies rolling around to the point of being rejected by Portage. Unfortunately, this also calls for having a way to get some packages out of the build loop, and I don’t think that a complex solution like Gearman is what I should be looking for now.
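As a rough illustration of what such a merge loop might look like, here is a sketch that feeds package atoms one at a time to emerge with the two options named above. The merge_all function and the EMERGE override are hypothetical names for this example, not what the published scripts use, and the real scripts pass more options than this.

```shell
# Hypothetical override: set EMERGE=echo to dry-run the loop.
EMERGE="${EMERGE:-emerge}"

# merge_all: read one package atom per line from standard input
# and merge each one in turn.
merge_all() {
    while IFS= read -r pkg; do
        # Skip empty lines in the package list.
        [ -n "$pkg" ] || continue
        # </dev/null keeps emerge from eating the rest of the list.
        "$EMERGE" --update --selective=n "$pkg" </dev/null \
            || echo "FAILED: $pkg" >&2
    done
}
```

A failure is only reported here; the loop moves on to the next atom rather than stopping the whole round.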

The other change is still incomplete for now, as I wait for the fix for bug #295715 to be released: when a package fails to be merged (right now only if the ebuild fails; once complete, even if it fails because of collisions) it gets masked in a temporary file that is cleaned up at the next restart of the round. This way, when a dependency fails, all the packages that depend on it are automatically rejected (or keep using the old merged version if present, or fall back to an older version if that works). This helps reduce the time wasted trying and re-trying the same package over and over again.
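The temporary masking can be pictured with a sketch like the following. The file location and the two function names are assumptions made up for this example; in a real setup the file would have to live somewhere Portage reads masks from, such as a file under /etc/portage/package.mask/.

```shell
# Hypothetical location of the temporary mask file.
MASK_FILE="${MASK_FILE:-/tmp/tinderbox-failed.mask}"

# mask_failed CATEGORY/PACKAGE-VERSION: mask exactly the version that failed,
# so that an older version can still be picked up if it works.
mask_failed() {
    atom="=$1"
    # Append the atom only if it is not already listed.
    grep -qxF -- "$atom" "$MASK_FILE" 2>/dev/null \
        || printf '%s\n' "$atom" >> "$MASK_FILE"
}

# reset_masks: empty the file when a new round starts.
reset_masks() {
    : > "$MASK_FILE"
}
```

Masking the exact version rather than the whole package is what leaves room for the fallback to an older version.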

I also dropped the test for AC_CANONICAL_TARGET since that produces way too much noise, and it’s rather something that could be made to work with the static analysis idea that I had. With that, it’d also be easier to check for bashisms and other issues without adding noise to logs that are already as full as those of the tinderbox.

One very heavy check is the one ensuring that binchecks-restricted packages are not installing ELF files. The original idea for that restriction was to avoid running a number of ELF checks and mangling over non-ELF packages, such as kernel sources, fonts and similar. Those checks are quite an issue when using virtual systems (where I/O has a nasty overhead) and are pointless for packages that we can be sure will not install executables; unfortunately a few developers seem to think that the restriction is a shortcut to avoid dealing with the ELF QA checks, instead of filling in the boring bits that tell Portage to expect QA failures.

To reduce the chance of something breaking further down the road due to the removal of .la files, I’ve also made sure lafilefixer is executed on each and every package.

And finally I’ve created a “restart” script that deals with the long procedure of restarting the tinderbox: it syncs, checks whether gcc has changed and, if so, makes sure that the as-needed version is selected. Right now it also deals with ghc updates; in the future I hope to be able to handle all those kinds of updates together. The problem there is that I don’t think the script works that well when something fails, as it’s mostly untested for now; and the updater scripts often don’t support the --keep-going option, which is exactly what I’d like to use to avoid the domino effect.

In the next few days I’ll try to write in more detail about the things I end up checking along the way, which may be of help to others who want to run their own tinderbox.