Everybody’s a critic: the first comment I received when I showed other Gentoo developers my previous post about the tinderbox was a question on whether I would be using pkgcore for the new generation tinderbox. If you have understood what my blog post was about, you probably understand why I was not happy about such a question.
I thought the blog post made it very clear that my focus right now is not to change the way the tinderbox runs but the way the reporting pipeline works. This is the same problem as 2009: generating build logs is easy, sifting through them is not. At first I thought this was hard just for me, but the fact that GSoC attracted multiple people interested in doing continuous builds, and not one interested in logmining, showed me this is simply a hard problem.
The approach I took last time, with what I’ll start calling TG3 (Tinderbox Generation 3), was to highlight the error/warning messages, provide a list of build logs for which a problem was identified (without caring much about which kind of problem), and just show broken builds or broken tests in the interface. This was easy to build, and to a point easy to use, but it had a lot of drawbacks.
The major drawbacks of that UI are that it relies on manual work to identify open bugs for the package (and thus make sure not to report duplicate bugs), and on my own memory not to report the same issue multiple times if the bug was closed by some child as NEEDINFO.
I don’t have my graphic tablet with me to draw a mock of what I have in mind yet, but I can throw in some of the things I’ve been thinking of:
- Being able to tell what problem or problems a particular build is about. It’s easy to tell whether a build log is just a build failure or a test failure, but what if instead it has three or four different warning conditions? Being able to tell which ones have been found and having a single-click bug filing system would be a good start.
- Keep in mind the bugs filed against a package. This is important because sometimes a build log is just a repeat of something filed already; it may be that it failed multiple times since you started a reporting run, so it might be better to show that easily.
- Related, it should collapse failures for packages so as not to repeat the same package multiple times on the page. Say you look at the build failures every day or two: you don’t care if the same package failed 20 times, especially if the logs report the same error. Finding out whether the error messages are the same is tricky, but at least you can collapse the multiple logs into a single entry per package, so you don’t need to skip it over and over again.
- Again related, it should keep track of which logs have been read and which weren’t. It’s going to be tricky if the app is made multi-user, but at least a starting point needs to be there.
- It should show the three most recent bugs open for the package (and a count of how many other open bugs) so that if the bug was filed by someone else, it does not need to be filed again. Bonus points for showing the few most recently reported closed bugs too.
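As a rough sketch of the collapsing and read-tracking ideas in the list above, something like the following Python would do for a start. This is not TG3 code: `BuildLog`, `fingerprint` and `collapse` are names made up for illustration, and the normalization in `fingerprint` is a deliberately naive assumption about what varies between two logs reporting the same error.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class BuildLog:
    """Hypothetical record for one submitted build log."""
    package: str        # e.g. "media-video/libav"
    timestamp: int      # seconds since the epoch
    errors: List[str]   # the highlighted error/warning lines
    read: bool = False  # whether somebody already looked at it


def fingerprint(errors):
    """Naive signature for a set of error lines.

    Paths and numbers tend to change between runs, so they are replaced
    with placeholders before hashing. Real logs will need more than this.
    """
    normalized = set()
    for line in errors:
        line = re.sub(r"/\S+", "<path>", line)
        line = re.sub(r"\d+", "<n>", line)
        normalized.add(line.strip())
    return hashlib.sha1("\n".join(sorted(normalized)).encode()).hexdigest()


def collapse(logs):
    """Group logs by package and keep one summary entry per package."""
    by_package = defaultdict(list)
    for log in logs:
        by_package[log.package].append(log)

    summary = {}
    for package, entries in by_package.items():
        entries.sort(key=lambda entry: entry.timestamp, reverse=True)
        summary[package] = {
            "latest": entries[0],
            "failures": len(entries),
            "distinct_errors": len({fingerprint(e.errors) for e in entries}),
            "unread": sum(1 for e in entries if not e.read),
        }
    return summary
```

With this, a package that failed 20 times with the same error shows up once, with a failure count of 20 and a single distinct error, instead of 20 rows to skip.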
Why do I spend this much time thinking and talking (and soon writing) about UI? Because I think this is the current bottleneck to scaling up the amount of analysis of Gentoo’s quality. Running a tinderbox is getting cheaper: there are plenty of dedicated server offers that are considerably cheaper than what I paid for hosting Excelsior, let alone the initial investment in it. And this is without looking again at the possible costs of running them on GCE or AWS on demand.
Three years ago, my choice of a physical server in my hands was easier to justify than now, with 4-core HT servers with 48GB of RAM starting at €40/month — while I/O is still the limiting factor, with that much RAM it’s well possible to have one tinderbox building fully in tmpfs, and just run a separate server for a second instance, rather than sharing multiple instances.
And even if GCE/AWS instances that are charged for time running are not exactly interesting for continuous build systems, having a cloud image that can be instructed to start running a tinderbox with a fixed set of packages, say all the reverse dependencies of libav, would make it possible to run explicit tests for code that is known to be fragile, while not pausing the main tinderbox.
Finally, there are different ideas of how we should be testing packages: all options enabled, all options disabled, multilib or not, hardened or not, one package at a time, all packages together… they can all share the same exact logmining pipeline, as all it needs is the `emerge --info` output, and the log itself, which can have markers for known issues to look out for or not. And then you can build the packages however you desire, as long as you can submit them there.
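To make the "all it needs" part concrete, here is a minimal Python sketch of what a submission and a first classification pass could look like. Everything in it is an assumption for illustration: the `Submission` type, the `classify` function, and especially the `MARKERS` table, whose patterns are only examples of the kind of thing to look for and not the list the tinderbox actually uses.

```python
import re
from dataclasses import dataclass

# Hypothetical marker table: label -> pattern to look for in the build log.
# The patterns are illustrative only; a real table would be much longer.
MARKERS = {
    "compile-error": re.compile(r"\berror:"),
    "qa-notice": re.compile(r"QA Notice:"),
    "test-failure": re.compile(r"\bFAIL:"),
}


@dataclass
class Submission:
    """The two inputs the pipeline needs, however the build was run."""
    emerge_info: str  # raw text of the `emerge --info` output
    build_log: str    # raw text of the build log


def classify(submission):
    """Return {label: [line numbers]} for every marker found in the log."""
    found = {}
    lines = submission.build_log.splitlines()
    for number, line in enumerate(lines, start=1):
        for label, pattern in MARKERS.items():
            if pattern.search(line):
                found.setdefault(label, []).append(number)
    return found
```

The point of keeping the input this small is that the pipeline does not need to know whether the log came from a hardened, multilib, or all-options-disabled run; that information travels with the `emerge --info` text.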
Now my idea is not to just build this for myself and run analysis over all the people who want to submit the build logs, because that would be just about as crazy. But I think it would be okay to have a shared instance for Gentoo developers to submit build logs from their own personal instances, if they want to, and then have them look at their own accounts only. It’s not going to be my first target but I’ll keep that in mind when I start my mocks and implementations, because I think it might prove successful.
Just wondering if adding additional metadata to build logs from the Gentoo build system would assist your processing. I’m not suggesting that be your problem, and perhaps it even makes sense to have a second structured log for automated parsing stored in the work directory during a build. I don’t have any domain knowledge, so this is just my out loud thoughts.
There has been more metadata added to the logs already, which is quite neat. Unfortunately, back in the day I could not find a good way to provide additional logs that may be useful. Things like a copy of `config.log` would be very useful to debug, say, a broken configure run.
Right now one of the things that is done by the tinderbox support scripts is appending the failure logs for autotools and epatch, by simply catting them to the standard output before the build log gets tarred to the network. Having the ability to send multiple relevant logs and then a way to select which one to attach (defaulting to all) would be very neat for the tinderbox.
> Finding out whether the error messages are the same is tricky […]
Tricky, but I wonder if it would be worth the time to make it work anyway? I’ve only used a few systems like this, and only as a spectator, but the ability to load up a page and quickly see a certain package is still suffering from bug X and bug Y but has no new/unreported problems seems like a good time savings.
Then again, I guess it depends on use case. Someone who files new bugs on thousands of packages and only comes back to them when they’ve finished the whole list won’t care much, whereas someone who has 10 packages they care about and watches them closely would benefit more.
I think you would have a better time using paludis instead of pkgcore. Everyone with a brain knows that pkgcore is a piece of shit.
And comments like yours show exactly why I won’t be touching that code and community any time soon.
Wojdan, you are surely rude.
You might provide facts about your statement above:
- I know from some benchmarks that by default paludis is quite slow.
- It is surely written in a language with a runtime that is quite easy to break in subtle ways.
That is enough to make it less suitable for tinderboxing.
On the other hand pkgcore is quite compact, cleanly written and with a sort of regular API, and python is usually quite stable.
I’m not sure if all the paludis users are rabid fans like you, but it is quite a telltale sign that people should NOT use paludis in general.
And the both of us are unable to read. This post is nothing about any change to what is going to run the tinderbox.
But sure, keep wasting my time and we’ll see how much contribution I’ll keep up with.
I’ve been researching log analysis recently for the purpose of system health reporting, but what I found might be of interest to you too. Recent versions of syslog-ng have support for pattern matching databases with ability to correlate and group related messages and to trigger actions conditionally based on such message classification and parsing. Sure, it’s only nearly as powerful as an average awk script, but I still think it’s pretty cool and might be of use if one could pipe build output to it easily.