Log analysis, yet again

So I’m again trying to find a solution to the log analysis problem. The main issue at this point is that the tinderbox is generating something along the lines of 200MB of logs a week, probably also because, thanks to Zac, it’s much more efficient than it was before. With that much data to sift through, running grep from within Emacs is no longer feasible.

What I’m considering now is storing most of the data directly in a database (PostgreSQL, since that’s what I’m already using here) and then presenting it through a simple web interface. The reason for going with a web interface is that it’s likely the quickest thing to design, and it makes it easy to report and copy content.

On the storage side, the main question for me is whether the database should also contain the specific details of each problem, or just record that the problem is present along with a pointer to the log file. In the former case, the web application could easily be extended into something more than a glorified grep, but it’d have to store a non-trivial amount of data. Some log files are well over 10MB, so handling those properly gets a bit tricky.
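To make the "pointer only" option a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the table layout, the `scan_log` helper, and especially the two patterns, since the real log messages may look different. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs on its own. The log is read line by line, so even a file well over 10MB never has to sit in memory at once.

```python
import re
import sqlite3

# Hypothetical patterns; the real tinderbox log messages may look different.
PATTERNS = {
    "elf_in_usr_share": re.compile(r"ELF file.*?/usr/share/"),
    "test_failure": re.compile(r"tests? failed", re.IGNORECASE),
}

# Hypothetical schema: only the kind of problem and where to find it in the log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS problems (
    id INTEGER PRIMARY KEY,
    log_path TEXT NOT NULL,
    kind TEXT NOT NULL,
    line_no INTEGER NOT NULL,
    byte_offset INTEGER NOT NULL
)
"""


def scan_log(conn, log_path):
    """Stream one log file and record a pointer for every matching line."""
    conn.execute(SCHEMA)
    found = 0
    offset = 0  # byte position of the start of the current line
    with open(log_path, "rb") as log:
        for line_no, raw in enumerate(log, start=1):
            text = raw.decode("utf-8", errors="replace")
            for kind, pattern in PATTERNS.items():
                if pattern.search(text):
                    conn.execute(
                        "INSERT INTO problems (log_path, kind, line_no, byte_offset)"
                        " VALUES (?, ?, ?, ?)",
                        (log_path, kind, line_no, offset),
                    )
                    found += 1
            offset += len(raw)
    conn.commit()
    return found
```

With the byte offset stored, the interface could seek straight to the interesting part of a big log instead of re-reading the whole file. Storing the parsed details as well would mean adding columns (or a second table) on top of this.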

Thinking a bit further on the interface, it should really be a way to report bugs directly: if the application can find that the merge found ELF files in /usr/share, filing the bug directly is just a matter of finding who exactly maintains a particular package (which is quite easy), and it wouldn’t even require copy-pasting if the data is available directly in the database, already parsed. Obviously, it would still require manual confirmation before opening the bug, and before doing so, it should also implement an easy search function to show possible duplicates.
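The duplicate check could start out very simple. The sketch below is only an illustration under assumptions: `KnownBug` and `possible_duplicates` are made-up names, and the list of known bugs is imagined as a local cache of open bug summaries rather than a live Bugzilla query. It flags any bug whose summary mentions the package and at least one keyword describing the problem.

```python
from dataclasses import dataclass


@dataclass
class KnownBug:
    """Hypothetical cached record of an already-filed bug."""
    bug_id: int
    summary: str


def possible_duplicates(package, keywords, known_bugs):
    """Return bugs whose summary mentions the package and at least one keyword."""
    package = package.lower()
    keywords = [keyword.lower() for keyword in keywords]
    matches = []
    for bug in known_bugs:
        summary = bug.summary.lower()
        if package in summary and any(keyword in summary for keyword in keywords):
            matches.append(bug)
    return matches
```

The result would only be shown as a list of candidates next to the confirmation button; deciding whether something really is a duplicate stays a manual step.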

While my first idea was to write a stupid CGI (or a script using one of Ruby’s integrated web servers) to show the results from the database in the browser, I’m now more interested in having a more complete application to deal with this. Pavel also suggested allowing other developers to access the interface to report the bugs, so that even if I’m not around to do the filing, someone else can. Unfortunately that also brings up a problem: if I were to allow developers to file bugs with their own accounts, I’d have to make them give their login information to the tinderbox (and I don’t like that, not even with me running it). On the other hand, I’d rather not make them file bugs with my own account, so I guess I’d need to set up a no-mail account for the tinderbox (no-mail since it’d be pointless to have mail coming in for a tinderbox account), and then make the users CC their own address by default.

Now comes the problem: I can probably start working on such an interface myself, using Ruby on Rails, which is something I’m somewhat fluent in; on the other hand, I know of no Ruby interface for the Bugzilla RPC protocol, while there is a well-tested pybugz extension for Python (which I’m definitely not fluent in). The choice is going to change quite a few bits of the interface: if I were to use Ruby on Rails, the ORM would most likely call for an abstracted interface to the database, which is good for some things but not for everything. So before I start hacking on anything at all, I really need to see if somebody could help me with such a task in the long run.

If somebody is up to writing the interface in Python to my specs, using pybugz, that’d be fine; otherwise I’d like to know whether somebody has already worked on a pybugz-like interface for Ruby. At worst I could settle for opening the bug with pre-filled fields and attaching the build log afterwards. Attaching the log requires knowing the number of the just-filed bug, though, so that step can’t be done by just providing a link to the pre-filled bug (although even that link alone would still be quite an improvement to my workflow!).
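For the fallback, the pre-filled link is little more than a query string. A rough sketch, assuming a stock Bugzilla where `enter_bug.cgi` accepts field names such as `product`, `component`, `short_desc`, `comment` and `cc` as URL parameters; the helper name, the base URL and the product and component values are placeholders, not the real ones:

```python
from urllib.parse import urlencode


def prefilled_bug_url(base_url, product, component, summary, comment, cc=None):
    """Build a link that opens Bugzilla's new-bug form with fields filled in."""
    params = {
        "product": product,
        "component": component,
        "short_desc": summary,  # Bugzilla's field name for the bug summary
        "comment": comment,     # initial description of the bug
    }
    if cc:
        params["cc"] = cc
    return f"{base_url.rstrip('/')}/enter_bug.cgi?{urlencode(params)}"


# Placeholder values, for illustration only.
url = prefilled_bug_url(
    "https://bugzilla.example.org/",
    "Example Product",
    "Example Component",
    "app-misc/foo: ELF files in /usr/share",
    "Found by the tinderbox.",
    cc="dev@example.org",
)
```

Since the person clicking the link files the bug from their own browser session, nobody has to hand their login information to the tinderbox. The build log still has to be attached by hand once the bug number is known.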

So, can anybody provide any insight, or volunteer to help me out?