TG4: Tinderbox Generation 4

Everybody’s a critic: the first comment I received when I showed other Gentoo developers my previous post about the tinderbox was a question about whether I would be using pkgcore for the new generation tinderbox. If you understood what that post was about, you can probably see why the question did not make me happy.

I thought the blog post made it very clear that my focus right now is not to change the way the tinderbox runs but the way the reporting pipeline works. This is the same problem as in 2009: generating build logs is easy, sifting through them is not. At first I thought this was hard just for me, but the fact that GSoC attracted multiple people interested in doing continuous builds, yet not one interested in log mining, showed me that this is simply a hard problem.

The approach I took last time, with what I’ll start calling TG3 (Tinderbox Generation 3), was to highlight the error/warning messages, provide a list of build logs for which a problem was identified (without caring much about which kind of problem), and show only broken builds or broken tests in the interface. This was easy to build, and up to a point easy to use, but it had a lot of drawbacks.

The major drawbacks of that UI are that it relies on manual work to identify the bugs already open for a package (and thus to make sure not to report duplicates), and on my own memory not to report the same issue multiple times if the bug was closed by somebody as NEEDINFO.

I don’t have my graphics tablet with me to draw a mockup of what I have in mind yet, but I can throw in some of the things I’ve been thinking of:

  • Being able to tell which problem or problems a particular build log shows. It’s easy to tell whether a build log is just a build failure or a test failure, but what if it instead has three or four different warning conditions? Being able to tell which ones have been found, and having a single-click bug-filing system, would be a good start.
  • Keep track of the bugs already filed against a package. This is important because sometimes a build log is just a repeat of something filed already; it may also be that the package failed multiple times since the reporting run started, so it is better to show that at a glance.
  • Related, it should collapse failures per package so as not to repeat the same package multiple times on the page. Say you look at the build failures every day or two: you don’t care if the same package failed 20 times, especially if the logs report the same error. Finding out whether the error messages are the same is tricky, but at least you can collapse the multiple logs into a single entry per package, so you don’t need to skip it over and over again.
  • Again related, it should keep track of which logs have been read and which haven’t. It’s going to be tricky if the app is made multi-user, but at least a starting point needs to be there.
  • It should show the three most recent open bugs for the package (and a count of how many more are open), so that if the bug was already filed by someone else, it does not need to be filed again. Bonus points for showing the few most recently reported closed bugs too. (A rough data-model sketch follows this list.)
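
To make the list above a bit more concrete, here is a rough data-model sketch in Go (since that is the direction I’m leaning towards for the implementation anyway); the type and field names are placeholders of mine, not a final design:

```go
package tinderbox

// BuildLog is a single log submitted by a tinderbox instance. One
// log can carry more than one issue (a build failure plus a couple
// of warning conditions, for instance).
type BuildLog struct {
	ID      int64
	Package string // category/package-version, e.g. dev-libs/foo-1.2
	Host    string // which tinderbox instance produced it
	Read    bool   // whether somebody already looked at it
	Issues  []Issue
}

// Issue is one problem identified in a log: a build failure, a test
// failure, or one of the known warning conditions.
type Issue struct {
	Kind    string // "build", "test", "warning:fortify", ...
	Excerpt string // the highlighted lines from the log
	BugID   int64  // zero until a bug is filed through the one-click link
}

// PackageView is what the interface would actually render: one row
// per package, with the logs collapsed together and the most recent
// bugs listed next to them.
type PackageView struct {
	Package    string
	Logs       []BuildLog
	OpenBugs   []int64 // the three most recent, plus a count of the rest
	ClosedBugs []int64 // a few recently closed ones, to avoid re-filing
}
```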

You can tell already that this is a considerably more complex interface than the one I used before. I expect it’ll take some work with JavaScript at the very least, so I may end up doing it with AngularJS and Go mostly because that’s what I need to learn at work as well, don’t get me started. At least I don’t expect I’ll be doing it in Polymer but I won’t exclude that just yet.

Why do I spend this much time thinking and talking (and soon writing) about UI? Because I think this is the current bottleneck to scaling up the analysis of Gentoo’s quality. Running a tinderbox is getting cheaper — there are plenty of dedicated server offers that are considerably cheaper than what I paid for hosting Excelsior, let alone the initial investment in it. And that is without even looking again at the possible cost of running them on GCE or AWS on demand.

Three years ago, my choice of a physical server in my own hands was easier to justify than it is now, with 4-core HT servers with 48GB of RAM starting at €40/month — while I/O is still the limiting factor, with that much RAM it is entirely possible to have one tinderbox building fully in tmpfs, and simply run a separate server for a second instance, rather than sharing one box among multiple instances.

And even if GCE/AWS instances, which are charged for the time they run, are not exactly interesting for continuous build systems, having a cloud image that can be instructed to start running a tinderbox with a fixed set of packages, say all the reverse dependencies of libav, would make it possible to run explicit tests for code that is known to be fragile, without pausing the main tinderbox.

Finally, there are different ideas of how we should be testing packages: all options enabled, all options disabled, multilib or not, hardened or not, one package at a time, all packages together… they can all share the same exact log-mining pipeline, as all it needs is the emerge --info output and the log itself, which may or may not carry markers for known issues to look out for. And then you can build the packages however you like, as long as you can submit the logs.
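
To show how little the log-mining pipeline needs to know about how the packages were built, here is a hedged sketch of the submission and of the marker scan; the marker strings are invented for the example, the real ones would be whatever the bashrc hooks (or their future replacement) leave in the log:

```go
package tinderbox

import (
	"bufio"
	"strings"
)

// Submission is everything the log-mining pipeline needs: the
// emerge --info output and the build log itself. How the package
// was built (USE flags, hardened or not, and so on) only matters
// to whoever reads the resulting report.
type Submission struct {
	EmergeInfo string
	Log        string
}

// Hypothetical marker strings mapped to the kind of issue they flag.
var markers = map[string]string{
	"will always overflow destination buffer": "warning:fortify",
	"implicit declaration of function":        "warning:implicit-decl",
	"ERROR: ":                                 "build-failure",
}

// Scan returns the kinds of issues found in a submitted log.
func Scan(s Submission) []string {
	var found []string
	scanner := bufio.NewScanner(strings.NewReader(s.Log))
	for scanner.Scan() {
		for marker, kind := range markers {
			if strings.Contains(scanner.Text(), marker) {
				found = append(found, kind)
			}
		}
	}
	return found
}
```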

Now, my idea is not to build this just for myself and then run the analysis for everybody who wants to submit build logs, because that would be just about as crazy as the current situation. But I think it would be okay to have a shared instance where Gentoo developers can submit build logs from their own personal instances, if they want to, and then look at their own accounts only. It’s not going to be my first target, but I’ll keep it in mind when I start on mockups and implementation, because I think it might prove successful.

The tinderbox is dead, long live the tinderbox

I announced it last November and now it has become reality: the Tinderbox is no more, in hardware as well as in software. Excelsior was taken out of the Hurricane Electric facility in Fremont this past Monday, just before I left for SCALE13x.

Originally the box was hosted by my then-employer, but as of last year, to allow more people access to its workings, I had it moved to my own rented cabinet, at a figure of $600/month. Not chump change, but it was okay for a while; unfortunately the cost-sharing arrangement that was supposed to happen never did, and about a year later those $7200 do not feel like a good choice, and this is without delving into the whole insulting behavior of a fellow developer.

Right now the server is lying on the floor of an office in the Mountain View campus of my (current) employer. The future of the hardware is uncertain right now, but it’s more likely than not going to be donated to the Gentoo Foundation (minus the HDDs, for obvious opsec reasons). I’m likely going to rent a dedicated server of my own for development and testing; even though it would be less powerful than Excelsior, it would be massively cheaper at €40/month.

The question becomes what we want to do with the idea of a tinderbox — it seemed like, after I announced the demise, people would get together to fix it once and for all, but four months later there is nothing to show for it. After speaking with other developers at SCaLE, and realizing that at this point I’m probably the only one with enough domain knowledge of the problems I tackled, I decided it’s time for me to stop running a tinderbox and instead design one.

I’m going to write a few more blog posts to get into the nitty-gritty details of what I plan on doing, but I would like to provide at least a high-level idea of what I’m going to change drastically in the next iteration.

The first difference will be the target execution environment. When I wrote the tinderbox analysis scripts I designed them to run in a mostly sealed system. Because the tinderbox was running in someone else’s cabinet, within its management network, I decided I would not provide any direct access to either the tinderbox container or the app that would mangle the data. This is why the storage for both the metadata and the logs was on Amazon: pushing the data out was easy and did not require me to give anyone else access to the system.

In the new design this will not matter — not only because it’ll be designed to push the data directly into Bugzilla, but more importantly because I’m not going to run a tinderbox in such an environment. Well, admittedly I’m just not going to run a tinderbox ever again, and will only build the code to do so, but the whole point is that I won’t keep that restriction to begin with.

And since the data store is now only temporary, I don’t think it’s worth over-optimizing for performance. While I originally considered and dropped the option of storing the logs in PostgreSQL for performance reasons, this is now unlikely to be a problem. Even if the queries took seconds, it’s not like that would be a deal breaker for an app with a single user. Even more importantly, the time taken to create the bug on the Bugzilla side is likely going to overshadow any database inefficiency.

The part that I’ve still got some doubts about is how to push the data from the tinderbox instance to the collector (which may or may not be the webapp that opens the bugs too). Right now the tinderbox does some analysis through bashrc, leaving warnings in the log — the log is then sent to the collector, chewing gum and saliva style, through tar and netcat (yes, really), to preserve one single piece of metadata: the filename.

I would like to be able to collect some metadata on the tinderbox side (namely, emerge --info, which before was cached manually) and send it down to the collector. But adding this much logic is tricky, as the tinderbox should still operate with most of the operating system busted. My original napkin plan involved having the agent written in Go, using Apache Thrift to communicate with the main app, probably written in Django or similar.
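
To give an idea of how small that agent could be kept, here is a minimal sketch of it in Go; the collector endpoint and the JSON shape are invented for the example, and it uses a plain HTTP POST rather than Thrift or JSON-RPC:

```go
// tinderbox-agent: submit one build log, plus emerge --info, to the
// collector. Being a statically linked Go binary, it keeps working
// even when most of the installed system is busted.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"os/exec"
)

func main() {
	// The build log to submit is the only argument.
	buildLog, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	// Collect emerge --info on the tinderbox side, instead of
	// caching it manually as done until now.
	info, err := exec.Command("emerge", "--info").Output()
	if err != nil {
		log.Fatal(err)
	}

	hostname, _ := os.Hostname()
	payload, _ := json.Marshal(map[string]string{
		"host":        hostname,
		"filename":    os.Args[1],
		"emerge_info": string(info),
		"log":         string(buildLog),
	})

	// Hypothetical collector endpoint.
	resp, err := http.Post("http://collector.example.org/submit",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}
```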

The reason why I’m saying that Go would be a good fit is one piece of its design I do not like at all (in the general use case): static compilation. A Go binary will not break during a system upgrade of any runtime, because everything it needs is linked into the binary itself; that is, in my opinion, a bad idea for a piece of desktop or server software, but it’s a godsend in this particular environment.

The reason I was considering Thrift in the first place was that I didn’t want to look into XML-RPC or JSON-RPC. Then again, Bugzilla supports only those two, and my main concern (the size of the log files) would be a problem when attaching them to Bugzilla either way. Since Thrift would require me to package it for Gentoo (it seems nobody has yet), while JSON-RPC is already supported in Go, I think it might be a better idea to stick with JSON. Unfortunately Go does not support UTF-7, which would have made escaping binary data much easier.

Now what remains a problem is filing the bug and attaching the log to Bugzilla. If I were to write that part of the app in Python, it would just be a matter of using the pybugz libraries to handle it. But with JSON-RPC it should be fairly easy to implement support from scratch (unlike with XML-RPC), so maybe it’s worth doing the whole thing in Go and reducing the proliferation of languages in use for the project.
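
To back up the "fairly easy" claim, here is a hedged sketch of what that from-scratch support could look like in Go, assuming Bugzilla’s Bug.create method on its jsonrpc.cgi endpoint; the product, component and login parameters reflect my reading of the Bugzilla WebService documentation and would need verifying against the actual instance:

```go
package tinderbox

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// rpcRequest is all the JSON-RPC framing Bugzilla needs: a method
// name, a single struct of named parameters, and a request ID.
type rpcRequest struct {
	Method string        `json:"method"`
	Params []interface{} `json:"params"`
	ID     int           `json:"id"`
}

// FileBug calls Bug.create on a Bugzilla instance. Field names and
// values here are assumptions to double-check, not gospel.
func FileBug(bugzilla, login, password, pkg, logURL string) error {
	params := map[string]interface{}{
		"Bugzilla_login":    login,
		"Bugzilla_password": password,
		"product":           "Gentoo Linux",
		"component":         "Current packages",
		"version":           "unspecified",
		"summary":           pkg + " fails to build",
		"url":               logURL,
		"description":       "Full build log at " + logURL,
	}

	body, err := json.Marshal(rpcRequest{
		Method: "Bug.create",
		Params: []interface{}{params},
		ID:     1,
	})
	if err != nil {
		return err
	}

	resp, err := http.Post(bugzilla+"/jsonrpc.cgi",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// A real implementation would decode the response to get the new
	// bug ID and then call Bug.add_attachment with the log contents.
	return nil
}
```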

Python will remain in use for the tinderbox runner. Actually if anything I would like to remove the bash wrapper I’ve written and do the generation and selection of which packages to build in Python. It would also be nice if it could handle the USE mangling by itself, but that’s difficult due to the sad conflicting requirements of the tree.

But this is enough details for the moment; I’ll go back to thinking the implementation through and add more details about that as I get to them.

Heartbleed and xine-project

This blog post comes way too late, I know, but there have been extenuating circumstances behind my delay in clearing this through. First of all, yes, this blog and every other website I maintain were vulnerable to Heartbleed. Yes, they are now completely fixed: new OpenSSL first, new certificates after. For most of the certificates, though, no revocation has been issued, as they come from StartSSL, which means they are free to issue and expensive to revoke. The exception to this has been the certificate used by xine’s Bugzilla, which was revoked, free of charge, by StartSSL (huge thanks to the StartSSL people!)

If you have an account on xine’s Bugzilla, please change your passwords NOW. If somebody knows a way to automatically reset all passwords that were not changed before a given date in Bugzilla, please let me know. Also, if somebody knows whether Bugzilla has decent support for (optional) 2FA, I’d also be interested.

More posts on the topic will follow, this is just an announcement.

The latest development in Tinderboxing: semi-automated bug filing

Besides criticising Intel for their x32 ABI, I’ve spent the last few days working hard on the Tinderbox to make sure that the GCC 4.7 failures are identified quickly.

While the analysis of the logs is now vastly automated compared to earlier efforts, I’m still not using automated scripts for bug filing, and I file the bugs by hand. How do I do that? Well, I used to rely on Delicious to store a number of bug templates for situations such as fortified sources, but this is no longer a viable option… for one simple reason. When the service was bought off Yahoo!, instead of focusing on the one reason why I’m pretty sure a number of users stopped using it (namely, the lack of a decent extension for Chrome), they decided to remake it in a more "web 2.0" fashion…

… the problem is that it came with what I would call "web 0.2" developers, and the data import corrupted the (very long) URLs of those bug templates.

I have had people contribute possible ways to deal with automated bug filing from the tinderbox, but all of them were fragile, because the interface between Bugzilla and the scripts has already changed too many times, and you need to be able to create the bug before you can attach a file to it (okay, that’s not strictly true, but it is the case most of the time).

What did I decide to do then? Well, the output of the log analysis is displayed as a webpage, through Sinatra, and when I do the analysis I can come up with a little more data than before… and the very same template system I used to file bugs can be driven by a simple link. This means no worrying about Bugzilla authentication, credential sharing, bug filing or attachment adding.

What happens now is that next to every build log link I have a [B] (which I’d love to replace with a 16×16 icon of a bug, if somebody feels like finding or drawing one! — I could also use a wrench with a red bar to indicate a log with a build failure, and a similarly barred magnifying glass for test failures). This is actually a link to a half-filled template, with the URL field pointing to the build log, the package name already set in the summary, and most importantly the maintainers already properly assigned.
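
Under the hood the [B] link is nothing more than a pre-filled enter_bug.cgi URL, the same kind of thing Bugzilla’s bookmarkable templates produce. Here is a hedged illustration of how such a link can be put together (shown in Go purely to keep one language across these sketches, rather than the Ruby the Sinatra app actually uses; the field names are the usual Bugzilla ones, but double-check them against the instance):

```go
package tinderbox

import "net/url"

// TemplateLink builds a pre-filled enter_bug.cgi URL: the maintainer
// goes into assigned_to, the other metadata addresses into cc, the
// build log URL into the URL field, and the package name into the
// summary. Product and component values are illustrative.
func TemplateLink(pkg, logURL, maintainer string, cc []string) string {
	v := url.Values{}
	v.Set("product", "Gentoo Linux")
	v.Set("component", "Current packages")
	v.Set("short_desc", pkg+" fails to build")
	v.Set("bug_file_loc", logURL)
	v.Set("assigned_to", maintainer)
	for _, addr := range cc {
		v.Add("cc", addr)
	}
	return "https://bugs.gentoo.org/enter_bug.cgi?" + v.Encode()
}
```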

The only remaining issue was having a way to tell the template which emerge --info posting to reference, so for now it simply uses a file on the system, named after the hostname from which the log was sent — this way I can easily set up multiple tinderboxes; after all, the emerge --info output doesn’t change that often. The alternative would be a Portage feature that always outputs the full info at the top of the build log.

And to finish this off — I said before that I can set up multiple tinderboxes with this new box. Indeed, next week I’ll set a new one up: amd64 stable with hardened (right now it targets ~arch and only runs a hardened kernel). I was considering setting one up for x32, but then I would just end up filing bug after bug after bug for no real gain, and I don’t think the rest of the guys would like that.

Mister 9% — Why I like keeping bugs open.

Last week, Kevin noted that Gentoo’s Bugzilla – recently redesigned, thanks to all the people involved! – has thousands of open bugs; in particular, as of today, we have over 17,000 unresolved bugs. Of those, 1,500 have been reported by me, which is about 9% of the total.

For those wondering why there is this disparity, the main reason is that I still report the bugs found by my tinderbox manually (which means I use my own account), and thus they add up. On top of that, I’m one of those people who like to keep bugs open until they are fixed; this is why I keep insisting that if you add a workaround to a package, you should keep a bug open so that we know the problem is not actually fixed yet, which unfortunately happens quite often with parallel build failures.

So how do you shorten the list of open bugs without losing data?

It is obviously natural for people joining a team that maintains a number of general packages, or picking up a new package, to go through the bugs open for those packages and try to solve or close them. And I think that is the process we should go through right now; the problem is that you cannot simply resolve the issues with "WONTFIX" as some people proposed: the bugs need to be fixed. You can close a bug as NEEDINFO if the user didn’t provide enough information (it happens with my bugs as well when I forget to attach a build log), or as TEST-REQUEST if the bug is no longer reproducible on the latest version — if you do know that the bug is solved, FIXED is a good idea as well.

A number of trivial bugs are also open for areas of Gentoo that aren’t much followed at all nowadays: this includes, for instance, most of the mail-related packages, some easier to solve than others. Most of the time they simply need a new, dedicated maintainer who uses the software: you cannot really maintain a package you stopped using, and I speak from experience.

Most of the bugs opened by the tinderbox, at least, also fall into the five-minute-fix category that I so much loathe: they look so trivial that people are prone to simply ignore the need for action on them, but when you do look into them you end up with a very messed-up ebuild that takes hours to get right — when they don’t uncover upstream code or a build system so messed up that it would have to be rewritten to work properly.

Unfortunately, simply ignoring the "nitpick" QA bugs is not a good idea either: more often than not, the presence of these bugs shows that the package is stale, and the fact that they are relatively easy to solve only goes to show that the package hasn’t been cared for in too long. Closing them out of lack of interest is not going to solve anything at all either; it’ll just mean that they’ll be hidden, and re-opened the next time the tinderbox or a user merges the package.

So rather than closing the bugs that look abandoned and waiting for other people to report them again, the solution should be to fix them. Closing them would just ensure that they get reported once more, increasing the size of the Bugzilla database, which is a more serious problem than just inflating a counter.

So if the open bugs seem too many… proxy-maintain new packages!

Using reports for bugs autofiling

In my previous post I wrote about my thoughts on developing a system to automatically produce bug reports out of a compilation failure from the tinderbox. While the thought was born out of a need of the tinderbox, after writing about it I started to think it might actually be useful to implement on a much more generic scale, so that it can be used by users as well.

The idea is that if you can extract all the most important information about a merge failure into a single report, it would also be possible to use that report to open a bug on Bugzilla. Since you know the package name and version, you can also find the proper maintainer, making sure you’re using the latest version of the metadata file. Since you know the revision, you can also compare it with the version currently available on some websites, so as to stop the user from reporting a bug that might have been fixed already.
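
As an illustration of the maintainer lookup, here is a hedged sketch of reading the addresses out of a package’s metadata.xml (in Go, purely to keep one language across these sketches); the struct only models the small subset of the format needed here, and the tree is assumed to be a plain checkout on disk:

```go
package tinderbox

import (
	"encoding/xml"
	"os"
	"path/filepath"
)

// pkgMetadata models just the maintainer e-mail addresses out of a
// package's metadata.xml; everything else in the file is ignored.
type pkgMetadata struct {
	Maintainers []struct {
		Email string `xml:"email"`
	} `xml:"maintainer"`
}

// Maintainers returns the maintainers' addresses for category/pkg,
// read from the metadata.xml in the given tree checkout.
func Maintainers(tree, category, pkg string) ([]string, error) {
	data, err := os.ReadFile(filepath.Join(tree, category, pkg, "metadata.xml"))
	if err != nil {
		return nil, err
	}
	var md pkgMetadata
	if err := xml.Unmarshal(data, &md); err != nil {
		return nil, err
	}
	var emails []string
	for _, m := range md.Maintainers {
		emails = append(emails, m.Email)
	}
	return emails, nil
}
```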

Furthermore, by filing bugs through an automatic or semi-automatic system you can also provide a standard summary line for bugs, or make use of the status whiteboard, so that the auto-filing feature can also check for bugs that have already been reported (this would also turn out quite useful in the tinderbox).

The whole point of writing down these ideas is that they might sound far-fetched, and they did to me for a long while, but the truth is that you just need to break through the doubts, breach the barrier by finding one place to start attacking it, and I think I did: starting from what I need for the tinderbox now, I really want to develop the idea of report files, and then start working with somebody, and with pybugz, so that the auto-filing feature can actually be tested somewhere. This is something we could have by the end of the year, in my opinion.

But there is more to this; there are times when just knowing versions and USE flags is not enough, and for build failures you need some more logs: for instance, I already noted that econf produces a config.log, and that both epatch and the autotools.eclass functions create logs in the package’s temporary directory. While for these examples we could just wire the support up, there are many other packages that don’t use autotools, and thus won’t be helped by this. We could probably wire up CMake as well, since it is becoming widespread, but what about the rest?

Indeed there are packages with custom build systems, like FFmpeg, that have their own logs for the configure stage, and others, like a Python extension whose name I sincerely forget, that hide the errors from the default build log and force you to pick up another file to find the actual errors (I find this pretty stupid, but that’s what you’re given to work with).

What I propose, then, is adding a new function that can either provide a list of files to attach, or even generate further content that is needed to be able to file a bug for a build failure of that package (think about non-GCC compiler settings, like ghc and fpc, or the Java compilers). With that inside the ebuild you could really write up a complete report, and make bug filing an almost entirely automated process, which would mean more bugs filed by users (rather than by the tinderboxes) and a much more pleasant tinderbox setup (provided somebody can actually wrangle the bugs in the latter case).

Anybody up to the task of helping me out here?

Why you should use trackers rather than multi-assigned bugs

Today seems to be the day of multi-assigned bugs, so I decided to write a few words on why you should not file a multi-assigned bug but rather a tracker and then multiple bugs, which is what I do for the issues I find with the tinderbox (and not just those), and the way I reached over 1500 open bugs at one point.

While creating new bugs means that you have more work to do, and even the Bugzilla database gets loaded more, multi-assigning a bug (that is, creating a bug with a list of the ebuilds involved and CCing all the maintainers of those ebuilds) means overloading the mail server and, even worse, the mailboxes of the developers involved. For each comment (useless or not) multiple emails get sent, and it’s not at all unlikely that a developer receives multiple copies of the same comment if they are on many aliases.

Even more obnoxious is when a fight breaks out over whether that is the correct way to proceed for a given package, and the back-and-forth arguments between the filer (or the QA team, or whoever) and the maintainer of that package get sent to multiple developers who are not involved and most likely not interested in the problem. Even worse, you have no clear timeline of when the issue was fixed for a given package, since the history is merged and you have to check when someone commented and de-CCed an alias from the bug.

The tracker method allows the tracker to be visible to the QA team, which can monitor which bugs get fixed and which get ignored, while each team only gets to see what they have to be concerned with. And to file the actual bugs, you can just use the "Save as bookmarkable template" method that I also use to file my bugs.

So please use trackers and templated-bugs, I beg you!

Manual template-based bugs

Today I filed a new batch of mass-bugs, this time related to the -Werror compiler flag that I talked about last week. To file the bugs, I used once again the output of the tinderboxing run that is being executed in the chroot that is (still) called “asneeded” (it started as a way to test how much stuff breaks with --as-needed, but is now a generic tinderbox).

Unfortunately, it is no coincidence that I talked about -Werror last week: it is related to the new warnings of GCC 4.3.3 that were added by Gentoo. A few packages got fixed between then and now, and the result is that I ended up filing at least two duplicates for things that had already been fixed.

Now, just so people don’t assume that I don’t care about the time other developers spend looking at bugs, I’d like to explain why it ended up like this. The tinderbox has been running since last week. Since, with each iteration, I check for packages that have not been installed, that need to be updated, or that haven’t been merged in the past six weeks, I don’t usually stop and resume it, or it would repeatedly re-compile the same packages (well, not as much as it did the first few times; the merge order of the packages is now random rather than alphabetical). I also don’t like to stop it and resume it from where it left off with a sync in the middle, because that can cause further problems. So at the end of the day I just go without syncing until I need to sync to update my own packages or something.

Since I don’t have updated logs for every package, I tried checking all of the packages merged against the synced tree to see whether the problem had been fixed already; unfortunately this leaves me with a blind spot of about a week: all the bugs fixed during that one week I will hit nonetheless, and the result is duplicate bugs.

Yes, of course I could check CVS for each of them, but it would take much more time, to the point where the amount of time I’d have to spend to file a single bug would really be too high for me to accept without an explicit reason. So I usually count on the collaboration between developers, and hope they accept the occasional duplicate bug. Which usually happens.

Now, what many people have suggested before is that we make the whole thing much more automated, with bugs being filed automatically, logs being mined automatically and so on and so forth. My last stab at that actually made me file more duplicates than doing it manually. Sure, it might be possible to fine-tune the reporting further so that it would not find false positives, but as Duncan said, I should do what I’m good at, and honestly log analysis software is not my area of expertise, nor would I like it to be.

Sure, there have been quite a few projects related to automated building and reporting of problems in Gentoo; if I remember correctly, one was among the Google Summer of Code projects from last year. But I haven’t seen many bugs filed through them; I think the only ones filing bugs for mostly-unknown and mostly-unused packages have been Patrick and me, which says that something is missing here.

Now, please consider that I do have much better things to do than read through logs manually to check whether -Werror is only used for configure-time checks (as in xine and other autotools-based software I work on, just as an example) or whether it is actually used during the build. And I do know quite well that most of my bugs are pre-emptive, and show issues that have not been hit yet, but that’s the whole point. If I report the bug to you before it hits users, I’m doing so at the expense of my own time, to save you and the users from wasting more time when the problem arises and you don’t know what to do.

So, I beg you, bear with me if I file a few dupes from time to time, okay?

Why I like my bugs open

Even though I did read Donnie’s posts with his proposals on how to better Gentoo, I don’t really want to discuss them at the moment: I have to admit that Gentoo politics is far from what I want to care about right now. I have little time lately, and that little time should not be wasted discussing non-technical stuff, in my opinion, since what I’m best at is technical stuff. Yet I wanted to at least discuss one problem I see with most non-technical ideas about how to improve Gentoo: the bug-count idea.

Somehow, it seems like management-minded people like to use the bug count of something as a metric to decide whether it is good or not. I disagree with this ferociously, because it really does not say much about software if we count as closed the bugs that have just been temporarily worked around. It would be like using the number of lines of code as a metric to evaluate the value of software. Any half-decent software engineer knows that the number of lines of code alone is pointless, and that you should really consider the language (it changes a lot whether the lines can contain one or twenty instructions each) and the comment-to-code ratio.

If Gentoo developers were evaluated based on how many open bugs there are, or on the average time a bug is kept open, we would end up with a huge number of bugs closed as "need info" or "invalid" or "works for me" (which carries a vague "you suck" undertone), and of course "later", which is one kind of resolution I really loathe. The problem is that sometimes, to reduce the impact of a bug on users, you have to put in a quick workaround, like a -j1 in the emake call, or you have to wait for something else to happen (for instance you might need to use built_with_use checks until the Portage version that supports EAPI 2 is stable).

Here the problem is that the standard workflow used in Bugzilla does not suit the way Gentoo works at all. And not everybody in Gentoo follows the same policy when it comes to bugs. For instance, many people find that any problem with the upstream code does not concern Gentoo; others feel that crashes and similar problems should be taken care of, but improvements and other requests should be sent upstream. Some people think that parallel make is a priority (I’m one of them) but others don’t care. All in all, before we decide to measure the performance of developers based on bugs, we should first come up with a strict policy on how bugs are handled.

But still, open bugs mean that we know there are problems; they may or may not mean that we’re actively working on solving them, as it might well be that we’re waiting for someone else to take care of them, for instance. How should we deal with this?

I sincerely think that we should look into adding more states to the bugs’ state machine. For instance, we could make it so that there is a "worked around" state, which would be used for things like parallel make being disabled, --as-needed being turned off and similar. It would mean that the bug is still there and should be resolved, but in the meantime users shouldn’t be hitting it any longer. Furthermore, we should have a way to take a bug off our radar for a while by setting it "on hold" until another bug is solved: for instance, until the new Portage goes stable, so we can then deal with all the bugs that need EAPI 2 features.

Then we should make sure that bugs that are properly resolved, and confirmed as such, get closed, so that we can be sure they won’t come up again. Of course it can’t be the same person who marked the bug as resolved who also marks it as closed; someone else on the same team, or the user who reported the bug, should be able to confirm that the resolution is correct and that the bug is definitely closed.

But the most important thing for me is to take away the idea that bugs are a way to measure how much software sucks, and to consider them instead documentation of what has yet to be done. This is probably why trackers other than Bugzilla often call their entries "tasks" or "issues"; language has a strong psychological impact, and we might want to deal with that ourselves.

So, how many tasks have you completed today?