I need a (log) analyst

Flameeyes

16 years ago

Since I started working on my own tinderbox experiments, I have filed a few hundred different bugs, all by hand, for various issues like pre-stripped files, packages installing files in invalid paths like /usr/man and /usr/doc, packages bundling libraries, packages needing to learn EAPI=1, and so on.

Today, I added to that list a few more packages with maintainer-mode induced rebuilds, which are bad for so many reasons; unfortunately I knew I had already filed a few and I really didn’t want to file them a second time. So I decided I needed a way to quickly check whether an issue found in a package had already been reported. This is particularly useful for stuff like pre-stripped files, since between me and Patrick we have already filed a lot of bugs about them, although, since software sucks, I’m sure we didn’t file all of them.

First I found an interesting sh one-liner approach, which was pretty cool, but wasteful, and it still required me to check each package’s metadata every time to find out who to assign the bug to. Since I wanted to do as few lookups as possible, I decided I needed something more complex, and I’ve started working on a Python (yes, you read that right, Python) script that can analyse log files and identify issues in them, just like I would do with multiple, repeated fgrep commands all over the files.
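As a rough illustration of the "repeated fgrep" idea (this is not the actual script, which isn’t published), a fixed-string scan over log lines could look like the sketch below. The marker strings, the issue labels and the function names are all assumptions made up for the example; the real markers depend on what the build logs actually contain.

```python
from collections import defaultdict

# Hypothetical marker -> issue label table. These strings are illustrative
# assumptions, not the patterns used by the real script.
ISSUE_MARKERS = {
    "Pre-stripped files found": "pre-stripped",
    "/usr/man": "invalid-path",
    "/usr/doc": "invalid-path",
    "maintainer mode": "maintainer-mode",
}


def scan_log(lines):
    """Return the set of issue labels whose marker appears in any line.

    This is a plain substring match, like running fgrep once per marker,
    but the log is only read a single time.
    """
    found = set()
    for line in lines:
        for marker, label in ISSUE_MARKERS.items():
            if marker in line:
                found.add(label)
    return found


def scan_logs(logs):
    """Map each issue label to the sorted list of log names showing it.

    `logs` is a dict of {log name: iterable of lines}.
    """
    report = defaultdict(list)
    for name in sorted(logs):
        for label in sorted(scan_log(logs[name])):
            report[label].append(name)
    return dict(report)
```

Reading each log once and checking all markers per line is what makes this cheaper than one fgrep pass per issue.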

I chose Python for exactly two reasons. The first is that Portage is written in Python, and it was easier to just use Portage’s interface to split package names than to change qatom to do something more useful for sh scripts. The second is that in the future I hope to make it ask me whether I want to file a bug for the package, and then open the page with the bug form already filled in. For now, though, reporting is already quite enough.

I’m not going to release the code of the script just yet; it’s far from finished, my Python coding skills suck, and it would amount to a nasty DDoS on Bugzilla if lots of people were to use it simultaneously, since for almost every “condition” found in a log file, a query is made to Bugzilla to check whether the bug was already reported. I have yet to script the broadest check, though: when a package fails to build, it would have to look at all the bugs open for that package. With enough time, I hope to be able to detect the most common compiler errors too, so that I can also identify unreported GCC failures and the like.
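One way to soften that load, sketched here purely as an assumption and not as something the script is known to do, is to remember the answer for each (package, issue) pair so the same question never reaches Bugzilla twice in a run. The `search` callable is a hypothetical stand-in for whatever actually talks to Bugzilla; no real Bugzilla API is used in this example.

```python
class CachedBugLookup:
    """Memoise 'was this already reported?' answers per (package, issue).

    `search` is a hypothetical callable taking (package, issue) and
    returning a list of matching bugs; it stands in for the real query.
    """

    def __init__(self, search):
        self._search = search
        self._cache = {}
        self.queries = 0  # how many times the backend was really hit

    def is_reported(self, package, issue):
        key = (package, issue)
        if key not in self._cache:
            self.queries += 1
            self._cache[key] = bool(self._search(package, issue))
        return self._cache[key]
```

With duplicated logs for the same package, this keeps the number of real queries tied to the number of distinct conditions rather than to the number of log files.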

Unfortunately, I can’t reasonably unleash the tool on the full set of logs produced by the tinderbox just yet, since we are talking about roughly 28 thousand log files. Some of these are duplicates, because, for instance, when asterisk failed, all the asterisk plugins tried to merge it again and my tinderbox tried compiling it multiple times; others are obsolete because newer versions or revisions were added to the tree, and the old ones might have issues that the new ones have fixed. For this reason I’ve asked the nearly almighty Zac to find me an easy way to prune the log archive so that it keeps just one log per slot per package, also removing any cruft left there for other reasons, hopefully reducing the dataset to something much more manageable.
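The pruning rule itself, one log per slot per package, can be sketched as follows. This is only an illustration under stated assumptions: the `LogRecord` fields are hypothetical, the records are assumed to be already parsed, and versions are simplified to tuples of integers. Real Gentoo version ordering (suffixes, revisions) is more involved and would be better left to Portage’s own comparison code.

```python
from collections import namedtuple

# Hypothetical, already-parsed description of one log file.
# `version` is a simplified tuple of ints; `timestamp` is any sortable value.
LogRecord = namedtuple("LogRecord", "package slot version timestamp path")


def prune_logs(records):
    """Split log paths into (keep, drop), keeping one log per package slot.

    For every (package, slot) pair the record with the highest version
    wins; ties between repeated builds are broken by the latest timestamp.
    """
    records = list(records)
    newest = {}
    for rec in records:
        key = (rec.package, rec.slot)
        best = newest.get(key)
        if best is None or (rec.version, rec.timestamp) > (best.version, best.timestamp):
            newest[key] = rec
    keep = sorted(rec.path for rec in newest.values())
    kept = set(keep)
    drop = sorted(rec.path for rec in records if rec.path not in kept)
    return keep, drop
```

Repeated builds of the same version collapse to the most recent log, and older versions in the same slot are dropped, while a second slot of the same package keeps its own log.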

And the work is not done just by analysing these logs. There is also the elog output that I have to make some sense of. In particular, I have to identify which packages had fetch restriction turned on (and for which I didn’t have the file fetched), so that I can get more distfiles for the next run; which packages used USE-based dependencies and thus require me to enable more USE flags on other packages; and finally I have to check for and report all the package conflicts that are not expressed by blockers (although I have already said that blocker abuse is bad stuff).

Hopefully, in the not too distant future, problems like these can be taken care of by an automated tinderbox like Autoua or whatever that’s called. In the meantime I’m going to provide the sweatwork for this at least. On the other hand, my tinderbox takes into consideration more than a few things that Portage and Gentoo as a whole don’t care about, like the canonical target problem, which I’m trying to push upstream as much as possible. So I guess I’ll be running my tinderbox for quite a long time still.

And of course, I’m accepting endorsements, if you think this is work whose fruits you can benefit from. It would also make it less of a problem for me to find the money to replace the disk; as I said, one started clicking, and I would have replaced it already if I hadn’t heard about Seagate’s firmware failure. Funnily, I had a similar failure with a 160GB Maxtor drive a few years ago (at the time it was a big drive), and so did a few friends of mine, although neither Maxtor nor the retailers ever admitted it to be related to firmware failures. The problem was solved with the MaxBlast software, which was the Maxtor version of the SeaTools that Seagate now suggests using (by the way…). Suggestions on which brand of disks to buy next are welcome; I have now discarded Samsung (as soon as they run a little hotter they break, and I can’t put more fans into Yamato), Maxtor (after the 160GB debacle) and Seagate (after the 7200.11 series).
