What’s wrong with Gentoo, anyway?

Yesterday I snapped and declared my intent to resign from Gentoo together with stopping the tinderbox and leaving the use of Gentoo either. Why did that happen? Well, it’s a huge mix of problems, all joined together by one common factor: no matter how much work I pour into getting Gentoo working like it should be, more problems are generated by sloppy work from at least one or two developers.

I’m not referring about the misunderstandings about QA rules, which happens and are naturally caused by the fact we’re humans and not being of pure logic (luckily! how boring it would be otherwise, to always behave in the most logical way!). Those can upset me but they are still after all no big deals. What I’m referring to is the situation where one or two developers can screw up the whole tree without anybody being (reasonably) able to do a thing about it. We’ve had to two (different) examples in the past few months, and while both have undeniably bothered QA, users, and developers alike, no action has been taken in any of these cases.

We thus have developer A, who decided that it’s a good idea to force all users to have Python 3 installed on their systems, because upstream released it (even when upstream consider it still experimental, something to toy with), and who kept on ignoring calls for dropping that from both users and developers (luckily, the arch teams are not mindless drones, and wouldn’t let this slide to stable as he intended in the first place). The same developer also hasn’t been able to properly address one slight problem with the new wrapper after months from the unleashing of that to the unstable users (unstable does not mean unusable).

Then we have developer B who feels like the tree’s saviour, the only person who can make Gentoo bleeding edge again… while most of if not all of the rest the developer pool is working on getting Gentoo more stable and more maintainable. So, among the things he went on doing, there was a poorly-performed Samba bump (suboptimal was the term he used — I ended up having to fix the init scripts myself because they weren’t stopping/restarting properly, as the ebuild and the init scripts went out of sync regarding paths), some strangely incomplete PostgreSQL changes, and a number of minor problems with the packages.

Of the two, I was first upset most by the former, but on the long run, the latter is the one who drove me mad. Let’s not dig too much on the stance about --as-needed (cosmetics — yeah because being able to return from a jpeg bump with less than 100 packages, rather than the whole world, is just cosmetics), and the fact that he’s ignored most of the QA issues with the packages he touched. Instead look at the behaviour with a package of mine (alas, I made the mistake of let this one slip with just a warning, I should have taken the chance to actually defer it to devrel…): vbindiff.

The package is something I added a while ago because from time to time it comes out useful. I’m in metadata.xml; I’m definitely not an unresponsive maintainer. Yet, while my last bump was on June 2008, the version in tree was not the latest one up to last September (2009). Why? A quick glance at the homepage shows that the beta4 release was mostly fixing a Win32 bug, and introducing a way to enable debug-mode. So what happens? Our mighty developer decides to go on and bump the package; without asking me; with nobody asking him; without a mail, a nod or anything. I literally notice this as emerge tries to upgrade a package I know I maintain. You’d expect for the debug support to be present in the ebuild then, and you’d find a debug USE flag if you checked now indeed, but that’s something I added myself afterwards, as the damage of pointlessly bumping something was already done.

Now, why did that happen? Well, he admitted he just went through the dev-* categories, without considering maintainers declared in metadata, and blindly bumped ebuilds when the latest version available on the site was higher than the one in tree. Case in point he had to open the vbindiff site and thus the release notes regarding Win32 and --enable-debug would have been clearly visible, if he cared to even read part of them. Whoever tried doing serious ebuild business should know that most of the time even the upstream-provided release notes are not something to go on by… Interestingly enough, his bleeding-edge hunger didn’t make him ask for a new stable, and we currently have a very old one.

So there we have your developer B, the super-hero, the last good hope of the bleeding edge, who bumps packages without consulting the guy who maintain them (and is around almost 247) and without even caring to use them at all. Why did I let it slip? Because I was most focused on trying to stop developer A at the time is probably the right answer. I did issue a reprimand reminding him to not touch someone else’s packages, and to learn using package.mask for things like Samba. I was hoping he would listen. Oh boy, was I ever so wrong.

Speaking a second again about Samba, did I tell you yet that the split into multiple packages was done, straight to ~arch, without any plan to follow-up to convert dependencies? Wonder why the whole thing is now stalemated again. Maybe the arch teams don’t see it all too well to have the same kind of dependency breakage in stable as there was/is on unstable right now.

First-hand information about our developer B wants him to be inlined with a zealot point of view regarding the Mono project — you’d then guess that dotnet stuff would be the last thing he’d be touching, but instead, without any questioning, ignoring the fact I stated at FOSDEM that I was going to look into that as soon as I had time, the fact that I stated before multiple times that I was already working on un-splitting the gtk-sharp packages, and the fact that I took contact with the Mono developers (again at FOSDEM) to try following upstream more closely. Oh and the one thing that pissed me off about that bump? Beside the fact that tomboy now refuses to work? Remember this patch? It was dropped; without even mailing me if I had or could make a version for the latest version. It was dropped in unstable (or, how it should be called if this kind of stuff is allowed to continue, unusable).

And the cherry on top? As I said, this developer touched Samba, PostgreSQL, now Mono… there are three aliases for these things (samba, pgsql-bugs and dotnet), who the bugs are assigned to… he’s on none of them! And before somebody tries to argue that, I’m pretty confident he’s not following the aliases on the Bugzilla (plus, given he also argued that the problem was with leaving security-vulnerable stuff in the tree – which by the way means having working, complete, safe ebuilds to be able to mark stable, and he doesn’t seem to be able to come up with any of those – the most important security bugs don’t get sent to watchers). How does he suppose to see the bugs coming? Oh but by wrangling the bug himself! Yeah, after all developers don’t file bugs themselves assigning them straight to the maintainers by procedure, do they? (fun fact: Bugzilla queries report at most 5K bugs, so that list is a very much limited result from what I was hoping to get); nor do other developers ever wrangle it would be silly, and there is no Arch Tester to speak of, right?

You can now see most of the pictures, and why I’m mostly upset with developer B. What made me snap yesterday were remarks that insisted that I was just “whining” and “not doing enough” as bugs kept piling up. What the heck? I constantly had over 1000 bugs (over 1300 today) for the past year or so, I know very well that bugs keep piling up! And I’ve been doing all I can do outside of my work hours (while I have to thank some people, including Paul, David, Simon, Andrew and Bela for their contributions, I’m not paid to do Gentoo work; and while I do get to use it, and thus contribute back to, for some of the jobs I take, it’s definitely not the same as working on Gentoo), including the whole RubyNG porting and improvement trying to make sure we can actually get to a point where unmasking Ruby 1.9 will not break any user whatsoever. Am I really doing too little? ”Not enough”?

Okay so the proper way to handle this, with the current procedures, would be to take this up to the Developers’ Relations so that they could act on it; QA can only ask infra to restrict commit access if we’re expecting a grave and dangerous breaking of the tree, or misuse of commit rights. So why didn’t I bring this up to devrel? Well, the main reason is that devrel nowadays, as far as I can tell, is exactly three people: Petteri, Denis and Jorge, and of the three the only one who’s for preventive suspension of commit rights is Denis (this has been proven with the case about developer A above); one out of three does not really sound much of a chance for this to improve the situation. And if – again as happened with developer A – DevRel then decided that the right action would be to issue a reprimand, that would amount to scolding the developer and asking to work more with others… well, it wouldn’t change a thing.

The whole QA system has to change! We’ve got to write down guidelines, rules, and laws, and be conservative in applying them. You shouldn’t go around breaching them and then appealing when QA finds you out of line, you should talk with QA if you feel the rule is misapplied to your case in any way.

So here you go, in a nutshell, why my preservation instinct right now is telling me to flee. I’m not sure yet if I’ll outright flee or just give it time for the situation is addressed and then decide. The reason is: I still like the Gentoo system, and since I rely on it for my work I cannot leave it alone; if I were to move to anything else I would have to spend (waste?) even more time to fix the same issues anyway, and I’d much rather get Gentoo working right. But I cannot do this alone, I cannot do this especially if I have support neither from developers nor users. So please voice your concern.

If you feel like Gentoo needs the better QA, if you feel like we shouldn’t be translating unstable to unusable, then please ask for it. I’m not saying that we should become stale like Debian stable, but if it takes a few months to get something straight, then it should take its time and not be forced through (that’s what the Ruby team has been doing all this time to work with Ruby 1.9 and Ruby EE and other implementations as well!). If you use Twitter, identi.ca, Digg, Reddit, Slashdot, whatever, get this post running. Maybe I’m subverting the process, but to quote BBC’s NewsQuiz, “Trial by media is the most efficient form of justice” (this was in reference to the British MP expenses scandal last year), and right now my only concern is effectiveness.