I need help publishing tinderbox logs

I’ve had a bit of free time lately, since I chose not to keep one of the gigs I’ve been working on for the past year or so. This doesn’t mean I’ll be around more than usual for now, though, mostly because I’m so tired that I need to recharge before I can even start answering the email I’ve received in the past ten days or so.

While tinderboxing has been troublesome lately (the dev-lang/tendra build makes Yamato run out of memory and hoses the whole system; it actually looks like a fork bomb), I’ve also had some ideas on how to make it easier for me to report bugs and, more generally, to get other people to help out with analysing the results.

My original hope for the log analysis was to make it possible to push out the raw logs, find the issues, and report them one by one. I now see that this is pretty difficult, bordering on infeasible, so I’d rather try a different approach. The end result I’d like now is a web-based equivalent of my current emacs+grep combination: a list of log names, highlighting those that hit any trouble at all, and then, within those logs, highlights on the rows showing the trouble.

To get there, I’d like to start small and proceed in a series of little steps… and since I honestly don’t have the time to work on it myself, I’m asking for help from anybody who’s interested in helping Gentoo. The first step would be to find a way to process a build log file and translate it into HTML. While I know this means increasing its size tremendously, it is the simplest way to read the log and to add data on top of it. What I’m looking for is a line-numbered page akin to the various pastebins you can find around. This means having one span per line, to make sure the lines stay aligned, which will be important in the following steps.
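As a starting point, a minimal sketch of that first step could look like the Python below. It assumes the log has already been cleaned up into a list of plain-text lines; the function name `log_to_html`, the `line` class and the `L<number>` id scheme are hypothetical choices for illustration, not an existing tool.

```python
import html


def log_to_html(lines, title="build log"):
    """Render already-cleaned log lines as a line-numbered HTML page.

    Hypothetical helper: every log line becomes exactly one <span> with an
    id like "L42", so a later step can highlight or link to individual rows.
    """
    out = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title>",
        "</head><body><pre>",
    ]
    for number, text in enumerate(lines, start=1):
        # Escape &, <, > and quotes so compiler output cannot break the markup;
        # the line number is a self-link, pastebin style.
        out.append(
            f'<span class="line" id="L{number}">'
            f'<a href="#L{number}">{number:>6}</a> {html.escape(text)}</span>'
        )
    out.append("</pre></body></html>")
    return "\n".join(out)
```

Calling `log_to_html(["checking for gcc... yes", "a < b && c"])` produces a page where each row can be addressed as `#L1`, `#L2` and so on. Keeping the spans inline inside a `<pre>` lets the newlines do the layout, so the rows stay aligned without any extra CSS.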

The main issue with this is that some build logs include escape sequences, even though I disable Portage’s colours as well as the build systems’, which means that whatever converts the logs also has to strip those sequences away. There are also logs that include output from tools such as wget or curl, which use the carriage-return character to overwrite the current line; that works in a terminal but creates a mess when viewing the log anywhere else (I’m not sure why the heck they don’t check whether they’re actually writing to a tty). Finally, some packages (usually Java ones) produce logs that grep treats as binary files, and the conversion code has to deal with that as well.
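One possible way to handle all three problems is sketched below in Python. It is only a sketch under a few assumptions: that the escape sequences are mostly standard CSI ones (colours, cursor movement, line clearing), that carriage returns should be rendered the way a terminal would show them, and that the "binary" logs are caused by NUL bytes or invalid UTF-8, which are the usual reasons GNU grep gives up on a file. The names `clean_log` and `collapse_carriage_returns` are hypothetical.

```python
import re

# CSI sequences (colours, cursor movement, line clearing) such as
# "\x1b[01;32m", plus two-byte escapes such as "\x1bM". This covers the
# common cases but is not a complete ECMA-48 parser (e.g. OSC title
# sequences are not handled).
ANSI_ESCAPE = re.compile(r"\x1b(?:\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])")

# Remaining C0 control characters except tab, newline and carriage return,
# plus DEL; NUL bytes are one of the things that make grep call a file binary.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def collapse_carriage_returns(line):
    """Emulate what a terminal shows for a line rewritten with '\\r'."""
    shown = ""
    for segment in line.split("\r"):
        # Each segment overwrites the beginning of what is already shown.
        shown = segment + shown[len(segment):]
    return shown


def clean_log(raw):
    """Turn the raw bytes of a build log into a list of printable lines."""
    # Invalid UTF-8 (another reason grep may call a file binary) becomes U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    if text.endswith("\n"):
        text = text[:-1]
    lines = []
    for line in text.split("\n"):
        line = ANSI_ESCAPE.sub("", line)
        line = collapse_carriage_returns(line)
        line = CONTROL_CHARS.sub("\ufffd", line)
        lines.append(line)
    return lines
```

The carriage-return handling deliberately emulates a terminal rather than just keeping the last segment: `collapse_carriage_returns("downloading 10%\rdone")` yields `"doneloading 10%"`, exactly what would have been on screen, while a typical progress meter such as `" 10%\r 50%\r100%"` collapses to `"100%"`. Escape sequences are stripped before that step, so sequences like "erase line" don’t get in the way.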

As a preview of the next few steps, I’ll need a way to match error messages on each line and highlight them. Once they are highlighted, simply using XPath expressions to count the matched lines should make it much easier to produce an index of the relevant logs… then it’s a matter of where to publish them. I think it might be possible to just upload everything to Amazon’s S3 and serve it from there, but I might be optimistic for no good reason, so that has to be discussed and designed.
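To give an idea of how the highlighting and counting could fit together, here is a rough Python sketch using only the standard library. The pattern list is purely illustrative (a real one would need to be curated from actual failures), and the names `highlight_log`, `count_trouble` and `build_index`, as well as the `line trouble` class, are hypothetical. It builds the per-line spans as an XML tree so that the limited XPath subset supported by `xml.etree.ElementTree` can count the flagged rows.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative patterns only; not a curated list of tinderbox failures.
TROUBLE_PATTERNS = [
    re.compile(r"\berror:"),
    re.compile(r"undefined reference to"),
    re.compile(r"QA Notice:"),
    re.compile(r"command not found"),
]


def highlight_log(lines):
    """Build a <pre> element with one <span> per line, adding a 'trouble'
    class to every line that matches any of the patterns."""
    pre = ET.Element("pre")
    for number, text in enumerate(lines, start=1):
        trouble = any(p.search(text) for p in TROUBLE_PATTERNS)
        span = ET.SubElement(pre, "span", {
            "id": f"L{number}",
            "class": "line trouble" if trouble else "line",
        })
        span.text = text
        span.tail = "\n"  # keep one row per line inside <pre>
    return pre


def count_trouble(pre):
    # ElementTree understands exact attribute predicates like this one; a
    # full XPath engine could use contains(@class, 'trouble') instead.
    return len(pre.findall(".//span[@class='line trouble']"))


def build_index(logs):
    """Map {log name: lines} to (name, count) pairs for logs with trouble,
    most troubled first."""
    counts = [(name, count_trouble(highlight_log(lines)))
              for name, lines in logs.items()]
    return sorted((c for c in counts if c[1] > 0), key=lambda c: -c[1])
```

`ET.tostring(pre, encoding="unicode")` serialises the highlighted tree back to markup for publishing, while `build_index` gives the list of log names that hit any trouble at all, which is the index page described above.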

Are you up to help me on this task? If so, join in the comments!