From scratch: about my tinderbox

I know that quite a bit of people are still unsure on what a tinderbox is, on what it does, and especially on why they should care enough that I’m bothering them every other day about it on my blog (and by extension on Planet Gentoo and the Gentoo.org homepage which syndicate it). So I’m now trying to write a long-breathed article that hopefully will cover all the important details.

What is “a” tinderbox

The term tinderbox, as reported by Wikipedia has been extended to the general use starting from the software, developed by Mozilla, designed to perform continuous integration over a given software project. This doesn’t really help that much because the term “continuous integration“ might still not say anything to users. So I’ll try to explain that from scratch as well.

Gentoo users, especially those running the unstable branch, are unfortunately used to breakage caused by build failures after a library update or other environment changes like that. This problem is increasingly present also during the development phase of any complex software. It gets even worse because while developers might not see any build failure on their machine, they can introduce “buggy“ code that breaks on others, as not all distributions have the same environment, the same libraries, the same filesystem layout and so on so forth.

And this is just by limiting the scope of our consideration to a single operating system – Linux in this case – and not considering the further problems caused by the presence of other software platforms, such as FreeBSD (which is still a Unix system), Mac OS X, and Windows. This is one of the biggest problems with multi-platform development that make it definitely non-trivial to write software that works correctly between these systems. I could write more about that topic but I’ll leave that to another time.

To make it possible for the developers not to have to manually get their code on multiple machines with various combination of operating systems and versions to ensure that the code does build properly after every change is merged, the smart approach was to invent the concept of continuous integration. To sum it up, think about what I just said – getting the code on multiple machines, with different operating systems, or versions of operating systems, build and eventually execute it – but done by a software rather than having an human doing it.

Mozilla’s software for continuous integration is called Tinderbox and for whatever reason, the name has stuck in Gentoo to refer to the various tries we have had at setting up some continuous integration testing. I’m not sure on the reasons why the name stuck in the Gentoo development team, but so it is, and so I’m keeping it.

Why is it “my” tinderbox

As I just said, Gentoo has had in the past more than one try at making continuous integration a major player in its development model. I think at least four people separately have worked on various “tinderboxes“ in the past five years, both as proper Gentoo projects and as outside, autonomous efforts. So I cannot call it “the Tinderbox”, but rather “my tinderbox”, to qualify which one among those that were and are.

Right now, we have three actively maintainer continuous integration efforts called tinderboxes: my own is the latest one joining the party, before that we already had Patrick’s and before that the Gentoo tinderbox which runs on proper Gentoo infrastructure boxes.

As I already rambled on about this “duplication“ of efforts is not a problem, as the three current systems have different procedures, and different targets. In particular, the “infra“ Tinderbox is mostly there to ensure the packages in the system set work for most major architectures, to provide reverse dependency information, and to make available binary packages useful for system recovery in case of major screw-up. On the other hand, me and Patrick are using our tinderboxes to identify bugs and report them to developers who might have not known about them otherwise.

What my Tinderbox *does*

I’ve written more or less how it works although that was really going in the detail of what I do to have it working. For what concern users, those details are mostly unimportant.

What my Tinderbox does, put it simply, is building the whole Portage tree, at least that’s the theoretical target. Unfortunately, as many noted before, it’s a target that no one box in the world can likely reach in our lifetime, let alone continuously. Continuous integration tools are designed to take away from the developers the time-consuming task of trying their code on different platforms (software or hardware, it doesn’t matter); even though I’m now limiting the scope of them to testing Gentoo, the possible field of variations is limitless.

Not only we support at least two operating systems (Linux and FreeBSD), a number of architectures (over a dozen), two branches (stable and unstable), but for the very essence of Gentoo, each installed system is never identical to another one: USE flags, CFLAGS, package sets, accepted packages from the unstable branches, overlays… Testing all these variables is impossible, so my tinderbox reduces (a lot) the scope, while still trying to be as broadly applicable as possible.

It runs over the unstable x86-keyworded branch of the tree (~x86 keyword), along with some packages that are not available by default and are rather unmasked (mostly toolchain-related). It uses vary bland CFLAGS, forced --as-needed and some common USE flag setup. It runs tests but it doesn’t let their failure stop the merge. It enables USE flag (for dependencies) as needed to have as many packages installed as possible.

The big part of the work is done by a single loop that tries to install, one by one, all the packages that haven’t been tested in the past two and a half months. Portage is left to remove conflicting packages as it finds fit. Sometimes packages are rejected for various reasons (they collide with another package, there is a conflict in the needed USE flag settings between different packages, or they require very old software versions that would in turn cause other problems), so instead of having the full Portage tree available for ~x86, I can only get a subset, the biggest subset I can at least.

The results

Of course, to understand why the Tinderbox is an important project, and why I’m investing so much time on it, and looking for other people to invest part of their time (or resources) to help me, is necessary to look at what the by-products of its execution are.

I’m not making my binpkgs available to the public (like the infra-based tinderbox has), mostly because I’m not allowed to: firstly I don’t enable the bindist USE flag, which means I’m allowing build configurations that are non-redistributable, because of licenses or trademarks, secondly because that’s not my target at all, and they may very well not be functional, in some parts.

The accomplishments coming from my efforts are, generally speaking, bug reports. Not only I scan through the failure logs for those ebuilds that fail to build in this environment, but I also scan for failing test procedures, and other similar quality issues. But again, this is not the result final users are looking for, as bug reports don’t mean anything for them.

Results that are good for users are, instead, the fixes for these bug reports: you can be safe to have documentation installed in the right place where you expect it; you can be safe that if you need to debug a problem, you only need to follow the guide without the build system of the package coming in your way; you can be safe that a package that will report itself as installed is indeed installed and it does work.

While, obviously, there are still problems that my Tinderbox effort won’t be finding anytime soon (for instance the problem of missing dependencies, which is instead the speciality of Patrick’s Tinderbox), and test procedures are rarely that well thought, designed and realised that they don’t leave possible runtime problems to be found, I think that keeping it running, continuously integrating as many ebuilds as it can, is definitely going to be of help for Gentoo’s general quality and stability.

And there is another thing that users are most likely interested in: testing of the future toolchain. As I said before, beside using the ~x86 visible ebuilds, my Tinderbox is also running some masked packages, which include, nowadays, gcc-4.4 and autoconf-2.65. Using these versions, not yet available to the general public of users, allows me to find failure cases related to them before they are made available. This serves the double role of assessing the impact that any given change might have on the users (by knowing how many packages will be broken if the package is unmasked), and correcting those problems before they have the chance of wasting users’ time.

Helping

Continuous integration can’t be said to be a “boring” task, in the sense that every day you might stumble into a new problem, for which you’ll have to devise a solution. On the other hand, it’s an heavy duty, for both the machines performing the job and the people that have to look through the logs. Even the sole task of filing the bugs is time-consuming, and that’s without adding the problems dealing with the human factor which often take you much more time.

Furthermore there are a number of practical issues tied into running this kind of continuous integration: the time needed to build packages conditions the amount of packages they can test over a given time frame, and unfortunately to reduce that time often you have to push for higher end machines (a lot of the work is I/O bound, and not all the software can build or execute tests in parallel), which tend to have high price tags, as well as consuming more power (and thus costing more over the long term) than your average machine.

For this reason both me and Patrick tend to ask for material help: we are taking the costs of these efforts directly.

It has been suggested a number of time to host the Tinderboxes over Gentoo infrastructure hardware, but there have been problems before about this. On the other hand, I think that the way I’m currently proceeding, for my effort at least, could open the possibility for that to happen. The one big issue for which I don’t really feel comfortable with running the tinderbox remotely is easy access to the logs for identifying problems and, especially, to attach them to bug reports. I’m currently thinking how to close that gap but it needs another kind of help in the form of available developers.

And one kind of help that costs very little on users, but can help a lot our work is commenting (and criticising if needed) our design choices, our code, our methods. Constructive criticism of course: telling us that our method sucks and not giving any suggestion on how to make them better is not going to help anybody, and can actually lower our morale quite a bit.

So please, the greatest favour you can do me is continuing to listen my ramblings about QA, tests, Tinderbox and log analysis; and if you are still unsure how something works in this system, feel free to ask.