I announced it last November and now it became reality: the Tinderbox is no more, in hardware as well as software. Excelsior was taken out of the Hurricane Electric facility in Fremont this past Monday, just before I left for SCALE13x.
Originally the box was hosted by my then-employer, but as of last year, to allow more people to have access to is working, I had it moved to my own rented cabinet, at a figure of $600/month. Not chump change, but it was okay for a while; unfortunately the cost sharing option that was supposed to happen did not happen, and about an year later those $7200 do not feel like a good choice, and this is without delving into the whole insulting behavior of a fellow developer.
Right now the server is lying on the floor of an office in the Mountain View campus of my (current) employer. The future of the hardware is uncertain right now, but it’s more likely than not going to be donated to Gentoo Foundation (minus the HDDs for obvious opsec). I’m likely going to rent a dedicated server of my own for development and testing, as even though they would be less powerful than Excelsior, they would be massively cheaper at €40/month.
The question becomes what we want to do with the idea of a tinderbox — it seems like after I announced the demise people would get together to fix it once and for all, but four months later there is nothing to show that. After speaking with other developers at SCaLE, and realizing I’m probably the only one with enough domain knowledge of the problems I tackled, at this point, I decided it’s time for me to stop running a tinderbox and instead design one.
I’m going to write a few more blog posts to get into the nitty-gritty details of what I plan on doing, but I would like to provide at least a high-level idea of what I’m going to change drastically in the next iteration.
The first difference will be the target execution environment. When I wrote the tinderbox analysis scripts I designed them to run in a mostly sealed system. Because the tinderbox was running at someone else’s cabinet, within its management network, I decided I would not provide any direct access to either the tinderbox container nor the app that would mangle that data. This is why the storage for both the metadata and the logs was Amazon: pushing the data out was easy and did not require me to give access to the system to anyone else.
In the new design this will not be important — not only because it’ll be designed to push the data directly into Bugzilla, but more importantly because I’m not going to run a tinderbox in such an environment. Well, admittedly I’m just not going to run a tinderbox ever again, and will just build the code to do so, but the whole point is that I won’t keep that restriction on to begin with.
And since the data store now is only temporary, I don’t think it’s worth over-optimizing for performance. While I originally considered and dropped the option of storing the logs in PostgreSQL for performance reasons, now this is unlikely to be a problem. Even if the queries would take seconds, it’s not like this is going to be a deal breaker for an app with a single user. Even more importantly, the time taken to create the bug on the Bugzilla side is likely going to overshadow any database inefficiency.
The part that I’ve still got some doubts about is how to push the data from the tinderbox instance to the collector (which may or may not be the webapp that opens the bugs too.) Right now the tinderbox does some analysis through bashrc, leaving warnings in the log — the log is then sent to the collector through -chewing gum and saliva- tar and netcat (yes, really) to maintain one single piece of metadata: the filename.
I would like to be able to collect some metadata on the tinderbox side (namely,
emerge --info, which before was cached manually) and send it down to the collector. But adding this much logic is tricky, as the tinderbox should still operate with most of the operating system busted. My original napkin plan involved having the agent written in Go, using Apache Thrift to communicate to the main app, probably written in Django or similar.
The reason why I’m saying that Go would be a good fit is because of one piece of its design I do not like (in the general use case) at all: the static compilation. A Go binary will not break during a system upgrade of any runtime, because it has no runtime; which is in my opinion a bad idea for a piece of desktop or server software, but it’s a godsend in this particular environment.
But the reason for which I was considering Thrift was I didn’t want to look into XML-RPC or JSON-RPC. But then again, Bugzilla supports only those two, and my main concern (the size of the log files) would still be a problem when attaching them to Bugzilla just as much. Since Thrift would require me to package it for Gentoo (seems like nobody did yet), while JSON-RPC is already supported in Go, I think it might be a better idea to stick with the JSON. Unfortunately Go does not support UTF-7 which would make escaping binary data much easier.
Now what remains a problem is filing the bug and attaching the log to Bugzilla. If I were to write that part of the app in Python, it would be just a matter of using the pybugz libraries to handle it. But with JSON-RPC it should be fairly easy to implement support for it from scratch (unlike XML-RPC) so maybe it’s worth just doing the whole thing in Go, and reduce the proliferation of languages in use for such a project.
Python will remain in use for the tinderbox runner. Actually if anything I would like to remove the bash wrapper I’ve written and do the generation and selection of which packages to build in Python. It would also be nice if it could handle the USE mangling by itself, but that’s difficult due to the sad conflicting requirements of the tree.
But this is enough details for the moment; I’ll go back to thinking the implementation through and add more details about that as I get to them.