Everybody’s a critic: the first comment I received when I showed other Gentoo developers my previous post about the tinderbox was a question on whether I would be using
pkgcore for the new generation tinderbox. If you have understood what my blog post was about, you probably understand why I was not happy about such a question.
I thought the blog post made it very clear that my focus right now is not to change the way the tinderbox runs but the way the reporting pipeline works. This is the same problem as 2009: generating build logs is easy, sifting through them is not. At first I thought this was hard just for me, but the fact that GSoC attracted multiple people interested in doing continuous build, but not one interested in logmining showed me this is just a hard problem.
The approach I took last time, with what I’ll start calling TG3 (Tinderbox Generation 3), was to: highlight the error/warning messages; provide a list of build logs for which a problem was identified (without caring much for which kind of problem), and just showing up broken builds or broken tests in the interface. This was easy to build up, and to a point to use, but it had a lots of drawbacks.
Major drawbacks in that UI is that it relies on manual work to identify open bugs for the package (and thus make sure not to report duplicate bugs), and on my own memory not to report the same issue multiple time, if the bug was closed by some child as NEEDINFO.
I don’t have my graphic tablet with me to draw a mock of what I have in mind yet, but I can throw in some of the things I’ve been thinking of:
- Being able to tell what problem or problems a particular build is about. It’s easy to tell whether a build log is just a build failure or a test failure, but what if instead it has three or four different warning conditions? Being able to tell which ones have been found and having a single-click bug filing system would be a good start.
- Keep in mind the bugs filed against a package. This is important because sometimes a build log is just a repeat of something filed already; it may be that it failed multiple times since you started a reporting run, so it might be better to show that easily.
- Related, it should collapse failures for packages so not to repeat the same package multiple times on the page. Say you look at the build failures every day or two, you don’t care if the same package failed 20 times, especially if the logs report the same error. Finding out whether the error messages are the same is tricky, but at least you can collapse the multiple logs in a single log per package, so you don’t need to skip it over and over again.
- Again related, it should keep track of which logs have been read and which weren’t. It’s going to be tricky if the app is made multi-user, but at least a starting point needs to be there.
- It should show the three most recent bugs open for the package (and a count of how many other open bugs) so that if the bug was filed by someone else, it does not need to be filed again. Bonus points for showing the few most recently reported closed bugs too.
Why do I spend this much time thinking and talking (and soon writing) about UI? Because I think this is the current bottleneck to scale up the amount of analysis of Gentoo’s quality. Running a tinderbox is getting cheaper — there are plenty of dedicated server offers that are considerably cheaper than what I paid for hosting Excelsior, let alone the initial investment in it. And this is without going to look again at the possible costs of running them on GCE or AWS at request.
Three years ago, my choice of a physical server in my hands was easier to justify than now, with 4-core HT servers with 48GB of RAM starting at €40/month — while I/O is still the limiting factor, with that much RAM it’s well possible to have one tinderbox building fully in tmpfs, and just run a separate server for a second instance, rather than sharing multiple instances.
And even if GCE/AWS instances that are charged for time running are not exactly interesting for continuous build systems, having a cloud image that can be instructed to start running a tinderbox with a fixed set of packages, say all the reverse dependencies of libav, would make it possible to run explicit tests for code that is known to be fragile, while not pausing the main tinderbox.
Finally, there are different ideas of how we should be testing packages: all options enabled, all options disabled, multilib or not, hardened or not, one package at a time, all packages together… they can all share the same exact logmining pipeline, as all it needs is the
emerge --info output, and the log itself, which can have markers for known issues to look out for or not. And then you can build the packages however you desire, as long as you can submit them there.
Now my idea is not to just build this for myself and run analysis over all the people who want to submit the build logs, because that would be just about as crazy. But I think it would be okay to have a shared instance for Gentoo developers to submit build logs from their own personal instances, if they want to, and then have them look at their own accounts only. It’s not going to be my first target but I’ll keep that in mind when I start my mocks and implementations, because I think it might prove successful.