The myth of the perfectionist QA

There is a bad trend going on that portrays Gentoo’s QA (which for the past couple of years has meant mostly me alone) as a perfectionist hindering getting stuff done — and I call all of this utter bull feces. It’s probably telling that the same people then seem to expect that every single issue has an explicit written rule, with no leeway for different situations.

So let me give you some insights so that you can actually get a realistic clue of what’s going on. Rich is complaining that I made it a task of mine to make sure that the software in Portage doesn’t use bundled libraries; for some reason, he seems to assume that I have no clue how cumbersome it is to deal with said bundled libraries, and he couldn’t be more wrong. You know what my first bundled-libraries project was? xine. And it took me quite a long time to get rid of (almost all of) the bundled libraries, even more so because many of said libraries were more or less heavily modified internally. Was it worth it? Totally. It actually made the xine package much more resilient to security issues (and there had been quite a few), as well as solving a number of other problems, including the most infamous one: the symbol collisions between the system’s libfaad (brought in by ffmpeg) and the internal copy of it.

So, I know it’s a lot of work, and I know it’s not always a task for the faint of heart, and most of the time there is no immediate improvement from fixing those. Why am I insisting on it as a point of policy? Because we have quite a few examples of bundled software in which vulnerabilities were found that could be leveraged by an attacker, especially where zlib is concerned, in software that either downloads or receives compressed content over the network.

So from what Rich wrote, we’re trying to hinder getting stuff in tree by refusing it if it bundles libraries. That’s so much of a lie that it’s not even funny. We have had packages entering the tree with bundled libraries, and I’m pretty sure there still is a ton of them. Heck, do you remember what I wrote about Blender just last month? Blender has a number of bundled libraries, and yet it’s in tree, I maintain it, and it goes stable from time to time.

What is important in that picture? That I’m trying to get rid of said libraries. The same applies to Chromium and most other packages that carry a ton of bundled libraries; most maintainers are responsible enough, and generally know enough about their package, that they can work on getting rid of said libraries, if that’s feasible at all — in the case of Chromium it’s an extremely difficult task, I’m sure, mostly because upstream does not care in the least, and that was clear at the last VDD when we were discussing the patches applied over ffmpeg/libav.

So let’s get into the specific details of the complaints, as Rich’s “just an example” is not an example of what happens at all. There is a server software package written by Logitech for their SqueezeBox devices, now called logitechmediaserver-bin in Gentoo. Said package has been known to bundle things for years — Logitech ships a large set of CPAN modules with it, as it seems the bulk of the code is written in Perl or something like that. I’ve known quite a few of its versions to bundle a CPAN module that, in turn, bundled an old copy of zlib, one that is vulnerable to a few different issues. Right now, it’s not only bundling a bunch of modules; I found out that it also installs a number of files for architectures that are completely incompatible with your system (e.g. a PowerPC binary on amd64). This bothered me quite a lot more. Why? Because it means that the (proxy) maintainer is not discriminating at all, and just installing whatever the upstream package comes with. Guess what? That’s not a good way to package the software.

When the proxy maintainer is not doing work that gets anywhere near the quality level of most of the rest of the tree, and the Gentoo developer who should be proxying him is ignoring the problems, things are messy enough. But it gets worse when you add a person known, from over three years ago, for bad patches sold as fixes, who still expects to be able to call the shots.

If you look at bug #251494, he’s actually suggesting marking the package stable because it’s not going to run with the current unstable Perl, which is a completely backwards reason (if it can’t work with the latest Perl, it means that it hasn’t been tested in the testing branch for a long time) and is going to create more trouble later on (the moment the new Perl goes stable, we have a known-broken, unfixable package in stable). But he’s Daniel Robbins, and some people expect that to be enough for him to be right — if that were the case, I suppose he’d still be the Gentoo Linux lead, and not just the lead of a wannabe project.

Anyway, here’s the deal: QA policies are more like guidelines, in general. On the other hand, there is no reason why we should be forced to mark things stable if they do not follow the common QA policies — especially for proprietary software, and software requiring particular hardware, marking things stable is not that great an idea, as such packages tend to be orphaned quite easily if a single developer retires. We already have quite enough packages stable on x86, ppc and sparc that are not really able to run, because they were always broken, but were marked stable in ancient times. Sometimes we even have packages keyworded that cannot be used on a given arch, but they do build, and the failure would only happen at runtime, so they were, again in ancient times, keyworded.

Maybe this is what people who want to follow Daniel expect: coming back to the “it builds, ship it!” mentality that made Gentoo a joke in the eyes of almost everybody at the time. For sure, it’s not what I want, and I don’t think it’s what the users as a whole group want or need.

Tinderbox and manual intervention

So after my descriptive post you might be wondering what’s so complex or time-consuming about running a tinderbox. That would be because I haven’t spoken about the actual manual labor that goes into handling it.

The major work is of course scouring the logs to make sure that I file only valid bugs (and often even that is not enough, as things hide beneath the surface), but there are quite a number of tasks that are not related to the bug filing, at least not directly.

First of all, there is the matter of making sure that the packages are available for installation. This used to be more complex, but luckily, thanks to REQUIRED_USE and USE deps, this task is slightly easier than before. The tinderbox.py script (which generates the list of visible packages that need to be tested) also generates a list of USE conflicts, unsatisfied dependencies and so on. I have to look at this list manually, and then update the package.use file so that the requirements are satisfied. If a package’s dependencies or REQUIRED_USE are not satisfied, the package is not visible, which means it won’t be tested.

This sounds extremely easy, but there are quite a few situations, which I discussed previously, where there is no real way to satisfy the requirements for all the packages in the tree. In particular, there are situations where you can’t enable the same USE flag all over the tree — for instance, if you enable icu for libxml2, you can’t enable it for qt-webkit (well, you can, but then you have to disable gstreamer, which is required by other packages). Handling all the conflicting requirements takes a bit of trial and error.
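To give an idea, the entries that end up in the tinderbox’s package.use look something like the following (the combination below is an illustration of the trial and error involved, not a verbatim copy of my file):

# enable icu for libxml2, but not for qt-webkit, which would then
# have to drop gstreamer (needed by other packages)
dev-libs/libxml2     icu
x11-libs/qt-webkit  -icu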

Then there is a much worse problem, and that is tests that can get stuck, so that things like this happen:

localhost ~ # qlop -c
 * dev-python/mpi4py-1.3
     started: Sat Nov  3 12:29:39 2012
     elapsed: 9 hours, 11 minutes, 12 seconds

And I’ve got to keep growing the list of packages whose tests are unreliable — I sometimes wonder if the maintainers ever try running their own tests.
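For what it’s worth, the skip list itself is nothing exotic; it’s the standard package.env machinery, with file names that are just my own convention:

# /etc/portage/env/no-tests.conf
FEATURES="-test"

# /etc/portage/package.env
dev-python/mpi4py    no-tests.conf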

This task used to be easier because the tinderbox supports sending out tweets or dents through bti, so that it would tell me what it was doing — unfortunately identi.ca kept marking the tinderbox’s account as spam, and while they did unlock it three times, it meant I had to ask support to do so every other week. I grew tired of that and stopped caring about it. Unfortunately that means I have to connect to the instance(s) from time to time to make sure they are still crunching.

I need help publishing tinderbox logs

I’ve had a bit of time lately, since I decided not to keep one of the gigs I’ve been working on over the past year or so… this does not mean I’m more around than usual for now, though, mostly because I’m so tired that I need to recharge before I can even start answering the email messages I’ve received in the past ten days or so.

While tinderboxing lately has been troublesome – the dev-lang/tendra package build causes Yamato to run out of memory, hosing the whole system; it actually looks like a fork bomb – I’ve also had some ideas on how to make it easier for me to report bugs, and in general to get other people to help out with analysing the results.

My original hope for the log analysis was to make it possible to push out the raw log, find the issues, and report them one by one… I see now that this is pretty difficult, bordering on infeasible, so I’d rather try a different approach. The final result I’d like to have now is a web-based equivalent of my current emacs+grep combination: a list of log names, highlighting those that hit any trouble at all, and then, within those logs, highlights on the rows showing trouble.

To get to this point, I’d like to start small and in a series of little steps… and since I honestly don’t have the time to work on it, I’m asking the help of anybody who’s interested in helping Gentoo. The first step here would be to find a way to process a build log file and translate it into HTML. While I know this means increasing its size tremendously, this is the simplest way to read the log and to add data over it. What I’d be looking for would be a line-numbered page akin to the various pastebins that you can find around. This does mean having one span per line to make sure that they stay aligned, which will be important in the following steps.

The main issue with this is that there are build logs that include escape sequences, even though I disable Portage’s colours as well as the build systems’, and that means that whatever converts the logs should also take care of stripping away said sequences. There are also logs that include output such as wget’s or curl’s, which use the carriage-return code to overwrite the current line; that creates a mess when viewing the log outside a terminal — I’m not sure why the heck they don’t check whether they’re outputting to a tty. There are also some (usually Java) packages whose logs appear to grep as binary files, and that’s also something the conversion code will have to deal with.
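Just to sketch the idea, a first cut could look like the following; this is nowhere near production quality, and it only handles the issues I just listed:

#!/bin/sh
# Strip ANSI escape sequences and carriage-return overwrites, escape
# the HTML metacharacters, then wrap each line in a numbered span.
sed -e 's/\x1b\[[0-9;]*[A-Za-z]//g' -e 's/\r$//' -e 's/.*\r//' "$1" |
  awk 'BEGIN { print "<pre>" }
       { gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
         printf("<span id=\"l%d\">%s</span>\n", NR, $0) }
       END { print "</pre>" }' > "$1.html"

The binary-looking Java logs would probably need an extra pass through something like iconv -c beforehand, to drop the invalid byte sequences.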

As a forecast of what’s to come in the next few steps, I’ll need a way to match error messages in each line, and highlight those. Once they are highlighted, just using XPath expressions to count the number of matched lines should make it much easier to produce an index of relevant logs… then it’s a matter of where to publish these. I think it might be possible to just upload everything to Amazon’s S3 and use that, but I might be optimistic for no good reason, so it has to be discussed and designed.
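For instance, assuming the converter tags the matched lines with, say, class="err" (a convention I’m making up on the spot), the counting really is a one-liner:

xmllint --html --xpath 'count(//span[@class="err"])' build.log.html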

Are you up to help me on this task? If so, join in the comments!

I wouldn’t want to be a mirror admin

I noted in my previous post that I recently built a 12TB storage server; half a terabyte has already been reserved for Gentoo’s distfiles, both as a local mirror to update my boxes without having to re-download everything, and because the tinderboxes require a lot of distfiles by definition (since they build the whole tree).

The original way I downloaded all the files was to simply pass the whole list of packages to emerge -Of, run in parallel with the tinderbox process… unfortunately this has shown to be of limited reliability; in particular, due to REQUIRED_USE, there are situations where a newly-introduced requirement will cause the fetch to misbehave, and thus slow down the tinderboxing of new packages while the new files are fetched. Plus, if the tinderbox masked all the versions of a particular package (which it can do when no version of said package builds in its environment), and I passed it to emerge -f, it wouldn’t fetch anything at all — and you can’t really run a single emerge -f command, as the command-line argument limit is hit much sooner, so xargs splits it into multiple, serial calls. And as a final straw, whenever the tinderbox has to fall back to an older version of a package, it’ll have to find that distfile as well, which might not be in its cache already.
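For reference, the old approach amounted to little more than this (the list file name is made up):

# xargs splits the package list into several serial emerge calls,
# as the kernel's argument-length limit is hit long before the end
xargs emerge --nodeps --fetchonly < packages.list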

To solve all these issues and make good use of the new box that stores the data, Zac gave me the set of infra scripts that are used to manage distfiles; in particular the mirror-dist script, written by Brian a long time ago, is the one that takes care of fetching the packages from the upstream sources and adding them to the master mirror. Looking at its output, I’m… honestly scared.

Let’s begin with the whole kernel.org issue: you probably already know that their master server was compromised and all the attached services have been disabled, including the network of mirrors for both kernel- and non-kernel-related software (among others, Linux-PAM is also hosted at kernel.org). Well, this means that all the upstream fetch URIs for those packages are unusable. Due to the nature of the mirror-dist script, it was obviously not going to fetch the packages out of Gentoo mirrors, so until I asked Fabio to hack around it (I’m no good with Python) and get the packages from Gentoo mirrors first, it was unable to fetch any package released on kernel.org. Lovely.

There is a second condition, outside of Gentoo’s control, that is causing headaches for this, and probably for our mirror admins as well. It hasn’t gotten as much coverage as the whole kernel.org issue, but the FSF found themselves not in compliance with the GPL with respect to binutils, as some intermediate output was provided in the tarballs without the original sources used to generate it. So what did they decide to do? Revoke all the tarballs and replace them with new releases with new version numbers? No. Reissue them with an appended “a” noting the change? No. They decided to simply rewrite all of them. Same filename, same URL, but different content. Congratulations for the headache you’re causing us!

But kernel.org projects and GNU packages are definitely not the only packages that have trouble with fetching; a number of upstream sites no longer allow their packages to be downloaded, and this causes major headaches if you don’t want to rely on the Gentoo mirrors’ network.

It has been proposed many times before to fix the SRC_URI variable for the packages that point to unfetchable sources. I even opened a wishlist bug for repoman to check (with HTTP’s HEAD method) whether the file is available upstream or not (and the same goes for the homepage). Unfortunately I lack the Python skills to implement this, and nobody else seems to be interested. I would have suggested this for GSoC, but… let’s not go there, please.
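A dumb proof of concept of that check, without any of the repoman integration that would make it actually useful, could be as simple as this (the input file name is made up):

while read -r uri; do
    # --head sends a HEAD request; --fail turns HTTP errors into a
    # non-zero exit status, so a 404 upstream shows up as MISSING
    if curl --silent --head --fail --location "$uri" > /dev/null; then
        echo "OK      $uri"
    else
        echo "MISSING $uri"
    fi
done < src-uri.list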

But if you have the skills, and the time, having repoman check for the availability of the files before committing would be a godsend — it would, for instance, have prevented the commit, in the past months, of one system package and two system-related packages without their respective patchsets. Well, if we were to also ban direct access to mirror://gentoo/, of course.

Are there still qmail users in Gentoo?

This might sound like a stupid question, but looking at the tinderbox results, it looks like qmail is in need of a new maintainer, or set of maintainers.

Let’s start with the old app-doc packages that have been lingering around since 2002, taking app-doc/ucspi-tcp-man as an example. The man pages were originally installed as a separate package, but in 2008, with the 0.88-r17 revision of sys-apps/ucspi-tcp, they were merged in. Yet it took two years for the maintainer to start a stable request, and it was only this year that it finally went stable. A similar issue happened with daemontools, and both of those required me to go through and remove the old ebuilds and the -man packages that cause so much trouble with my tinderbox setup.

But there are many more issues with the related packages: for instance, the huge list of man-page collisions between the many qmail-related packages, or the fact that some ebuilds have bogus src_test functions (which I’m now removing myself, without even waiting for maintainers, at this point).

All in all, the long series of bugs, some of which appear to be security-related, makes me wonder whether we need a more “hands-on” maintainer team for qmail and the rest of the djb software, which usually seems to be quite tightly interrelated. So if you are a qmail user in Gentoo, please look at the proxy maintainers project that Markos announced two days ago; your help will be appreciated.

Rinse and repeat: tinderbox foresees slight libpng-1.5 trouble

As I quickly noted yesterday, I started testing libpng-1.5 in the tinderbox to identify the packages failing to build with the new version. As it turns out, there are quite a few, as usual.

The problem turns out to be more complex than I’d have expected, though. My first step after installing the new libpng version was to make sure that no dangling package was present in the system, by running revdep-rebuild -- -C, which supposedly should have removed all the libpng-using packages. I’ll be honest and say that I was way too naïve, and I shouldn’t have done that as a first step at all.

While revdep-rebuild identified all the packages that installed .la files referencing libpng (which are quite a lot more than they should be, as I noted in the above-linked post), it didn’t seem to identify the broken ELF linking at all. I’m not sure why that is; maybe revdep-rebuild in the tinderbox is broken, or some package installs a wrong mask entry. But whatever the case, I ended up with all the packages that had no .la files, but whose ELF objects linked to libpng14.so.14, still installed.

Thankfully, I quickly whipped up a scanelf/qfile pipeline that provided me with a list of linked packages, which I could then remove and… compare with the reverse dependencies as produced by the QA Reports. Turns out that there are packages linking to libpng (actually using it) without listing it as a dependency. Reminds me of the transitive dependencies I discussed… over a year ago.
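I don’t have the exact pipeline at hand anymore, but it was something along these lines (a plausible reconstruction rather than the original):

# list every installed ELF object still linked to the old libpng,
# then map each file back to the package that owns it
scanelf -qRN libpng14.so.14 /usr/bin /usr/sbin /usr/lib* |
    awk '{ print $NF }' | xargs qfile | cut -d' ' -f1 | sort -u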

Unfortunately, it also appears to be a problem for the upstream code. One of the common failures has to do with zlib.h no longer being included by png.h, which is a positive thing (as it stops entangling two libraries’ headers), but there are at least a few packages that used zlib’s interfaces without including zlib.h themselves. Again, isn’t it lovely how people expect not to have to include the headers they import functions and constants from? While this is easy to notice for packages using constants such as Z_BEST_COMPRESSION, if only functions are involved it might leave the compiler with just an implicit declaration (which can be a further problem on 64-bit systems).

And this is not all, of course. Even though I’m doing my best at covering as many packages as possible in the tinderbox, the guest I’m using for the builds right now is far from having 100% coverage. The main issue is that it’s still testing glibc-2.14 — and I’m still finding packages that try to use the RPC code and fail; I wouldn’t have guessed so many packages used it. That also means I’m lacking Ruby altogether, as it fails badly during its own bootstrap, which in turn means there are packages of my own that I can’t test properly right now. I guess the second round of libpng-1.5 testing will happen in the 64-bit tinderbox that is still running glibc-2.13.

Back to work, I guess.

Library SONAME bumps and .la files: some visual clues

Before going on with the post, I’ll give users who’re confused by the post’s title some pointers on how to decipher it: I discussed .la files extensively before, and you can find a description of SONAMEs in another post of mine.

Long- and medium-time Gentoo users most likely remember what happened when libpng was bumped last year, and will probably worry now that I’m telling them that libpng 1.5 is almost ready to be unmasked (I’m building the reverse dependencies in the tinderbox as we speak, to see what breaks). Having seen it through with the tinderbox, I can already tell you that it’s going to hurt: a revdep-rebuild call will ask you to rebuild oh-so-many packages due to .la files. Myself, I’ll probably take the chance to move to the hardened compiler and run an emerge -e world just for the kicks.

But why is it this bad? Well, mostly it is the “viral propagation” of dependencies in .la files, which by itself is the reason why .la files are so bad. Since libgtk links to libcairo, and libcairo to libpng, any other library linking with libgtk will be provided with a -lpng entry to link to libpng, no matter whether it uses it or not. Unfortunately, --as-needed does not apply to libtool archives, so they end up overlinking, and only the link editor can drop the unused libraries.

For the sake of example, Evolution does not use libpng directly (its graphic files are managed through GTK’s pixbuf interface), but all of its plugins’ .la files will refer to libpng, which in turn means that revdep-rebuild will pick them up for rebuilding. D’oh!
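You can see the propagation for yourself by peeking at the dependency_libs entry of almost any GTK+-based .la file; the path and content below are illustrative, your copies will differ:

localhost ~ # grep dependency_libs /usr/lib/evolution/2.32/plugins/liborg-gnome-audio-inline.la
dependency_libs=' ... /usr/lib/libgtk-x11-2.0.la /usr/lib/libcairo.la -lpng14 -lz ...'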

So what about the visual clue? Well, I’ve decided to use the data from the gold-based tinderbox to provide a graph of how many ELF objects actually link to the most common libraries, and how many libtool archives reference them. The data wasn’t easy to extract, mostly because at first glance the .la files seemed to be dwarfed by the actually-linked objects… until I remembered that ELF executables can’t have a corresponding .la file.

Library linking histogram

I’m sorry if some browsers fail to display the image properly; please upgrade to a decent, modern browser, as it’s a simple SVG file. The gnuplot script and the raw data file are also available if you wish to look at them.

The graph corroborates what I’ve been saying before: the bump of libraries such as libexpat and libpng is only a problem because of overlinking and .la files. Indeed, you can see that there are about 500 .la files listing either of the two libraries, when there are fewer than a hundred shared objects referencing them. And for zlib it’s even worse: while there are definitely more shared objects using it (348), there are four times as many .la files listing it as one of their dependencies, for no good reason at all.

A different story applies to GLib and GTK+ themselves: the number of shared objects using them is higher than the number of .la files that list them among their dependencies. I guess the reason here is that a number of their users are built with non-libtool-based build systems, and another good amount of .la files are removed by the less lazy Gentoo packagers (XFCE should be entirely .la free nowadays, and yes, it links to GTK+).

Now, it is true that the number of .la files and ELF files is not proportional to the number of packages installing them (for instance, Evolution installs 24 .la files and 69 ELF objects), so you can’t really say much about the number of packages you’d have to rebuild when one of the three “virulent” libraries (libpng, libexpat, libz) is bumped. But it should still be clear that marking five hundred files as broken simply because they list a library that is gone, without their respective binaries actually having anything to do with said library, is not the best approach we can have.

Dropping the .la file for libcairo (which is where libgtk picks it up) should probably make it much more resilient to the libpng bumps, which have proven to be the nastiest ones. I hope somebody will step up to do so, sooner or later.

Tinderbox limits: blockers, collisions and conflicting requirements

For those who still don’t know: my tinderbox is simply a properly configured container, based on LXC, that builds and installs, one by one, all of the fifteen thousand packages available in Gentoo’s Portage tree — or at least those that are keyworded for either ~x86 or ~amd64, depending on the running instance.

This task is obviously time-consuming (which is why I keep shooting down the proposals of moving the tinderbox over to EC2: in my experience I/O is simply too slow for I/O-bound processes such as builds, and the fact that you pay for time doesn’t make it any better — S3 might be a better idea to store logs though, that is true), and it gets worse as the tree keeps growing. But that’s not half of it.

The most obnoxious issue relates to packages that provide (more or less) the same functionality. Since Gentoo never implemented a generic, integrated system for alternatives – something I still envy Debian for – when two packages provide the same interface, or the same command, you end up either choosing only one of them as the provider, or implementing a custom eselect module for each case, neither of which is tremendously fun to consider. The third solution is the straightforward one, which is the fastest to implement but, from my point of view, the worst: you add blockers.

Blockers in Gentoo ebuilds simply mean that two packages cannot be installed at the same time on the same system, which is “perfect” when the two packages provide the same interface, such as jpeg and jpeg-turbo. But at the same time, this is bothersome when two packages install the same file with different meanings, which means you have to take one branch of the deptree and cull it out. And this is unfortunately common with overly generic names for libraries, such as libnw, which is either dev-games/libnw or sci-biology/newick-utils.

It might be of interest to note here that this became increasingly annoying with the various MTAs we have in Gentoo: ever since the mailwrapper idea was discarded, you could only have one of the different packages providing an MTA installed at any given time, as they all provide the sendmail command. This was relatively easy to deal with in the past, because even if you added a new MTA to the tree, it would only have to register itself as virtual/mta, and you had no blockers to update. With the introduction of the new-style virtual/mta package, the blocker list needs to be fully expanded in each and every MTA ebuild, which has become a bit of a burden. On the other hand, the new-style virtual also allows some more fine-grained control, so that both msmtp and ssmtp conflict with the other MTAs only if their mta USE flag is enabled.
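In practice, that means each MTA ebuild now carries a stanza along these lines (an illustrative fragment, not copied from any actual ebuild):

RDEPEND="
    !mail-mta/courier
    !mail-mta/exim
    !mail-mta/netqmail
    !mail-mta/postfix
    !mail-mta/sendmail
    !mail-mta/msmtp[mta]
    !mail-mta/ssmtp[mta]
"

and every new MTA added to the tree means touching all of these lists again.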

Again, unfortunately, the problem isn’t solved by blockers alone. Another problem happens when you have conflicting requests for USE flags. For instance, thunderbird-bin requires curl with the nss USE flag enabled, as that’s the Mozilla library implementing secure transports. At the same time, dev-libs/libcaldav wants curl with GnuTLS instead. And this is without adding collisions to the mess that makes curl so much of a pain to deal with.

Some of the requirements are also pretty arbitrary, and usually stem from an overuse of “same flag” dependencies. For instance, if you have a program that can optionally use IPv6, it makes sense for it to depend on its backend library with IPv6 enabled as well… but is the opposite true? If the program is built without IPv6 support, can the backend library still have IPv6 support enabled? When the answer is yes, you want a [ipv6?] dependency, while if the two need to be kept in sync, you want a [ipv6=] dependency. Unfortunately, more than once I’ve seen the latter form used in places where the former was needed.
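Spelled out in ebuild syntax, with a hypothetical package name, the two forms look like this:

# the backend may have ipv6 enabled regardless; it only has to have
# it when our own ipv6 flag is enabled
RDEPEND="net-libs/libbackend[ipv6?]"

# the backend's ipv6 flag must match ours exactly, on or off
RDEPEND="net-libs/libbackend[ipv6=]"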

Hopefully, at some point, this could really be fixed once and for all… even if it might end up requiring building multiple copies of the same library, for instance three different libcurl builds with the different secure-layer implementations, kinda like we have to build the same Ruby extension three or four times to install it for multiple implementations.

PSA: Packages failing to install with new, OpenRC-based stages: missing users and groups

This month, Gentoo finally marked baselayout-2 and OpenRC stable, which is an outstanding accomplishment, even though it happens quite late in the game, given that OpenRC has existed for a few years now. Now that these packages are stable, they are also used to build the new stages that are provided for installing new copies of Gentoo.

This has had an unforeseen (but not totally unexpected) consequence for users and groups handling, since the new version of baselayout dropped a few users and groups that were previously defined by default, in light of its more BSD-compatible nature. Unfortunately, some of these users and groups were referenced by ebuilds, as in the case of Asterisk, which adds its user to both the asterisk and dialout groups — the latter is no longer part of the default set created by baselayout, so until recently installing Asterisk on a new system created from an OpenRC-based stage would have failed.
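The fix on the ebuild side is to stop assuming that baselayout provides the groups, and create them explicitly; roughly like this (a sketch rather than the actual Asterisk ebuild, with the conventional GID for dialout):

pkg_setup() {
    # dialout is no longer in baselayout-2's default set
    enewgroup dialout 20
    enewgroup asterisk
    enewuser asterisk -1 -1 /var/lib/asterisk "asterisk,dialout"
}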

Okay, so this is a screw-up, and one that we should fix as soon as possible. But why did it happen in the first place? There are two things to consider here: the root cause of the problem, and why it wasn’t caught before this happened. I’ll start with the second.

When testing OpenRC, we all came to the conclusion that it worked fine. Not only my computers, but also my vservers, my customers’ servers, and most of the developers’ production boxes have been running OpenRC for years. I even stopped caring about providing non-OpenRC-compatible init scripts at some point. Why did none of us hit this problem before? Being a rolling distribution, our main testing process does not involve making a new install altogether: you upgrade to OpenRC and judge whether it works or not.

Turns out this is not such a great idea where critical system packages are concerned (we have seen issues with Python before as well): when upgrading from baselayout-1 to baselayout-2, not all files are replaced; the users and groups added by baselayout-1 are kept around, which makes it impossible to spot this class of issues. We should probably document a more stringent stable-marking process for system components, and work with releng to find a way to test a stage so that it actually boots up with a given kernel and configuration (KVM should help a lot there).

As for the root cause of the problem, we have been fighting with this issue since I became a dev, and that’s why there is GLEP27, which is supposed to take care of managing users and groups and assigning them global IDs. Unfortunately, this is one of those GLEPs that were defined but never implemented.

To be honest, there has been work on the issue, which was even funded by the Google Summer of Code program, but the end results didn’t make it into Gentoo; they ended up in another project instead (which is why I always have doubts about whether Gentoo’s GSoC funding is being wasted).

So until we have a properly-implemented GLEP27, which is nothing glamorous, nothing that newcomers seem to feel like tackling, we’re just dancing around a huge number of problems with the handling of users and groups, and that is not going to get easier with time, at all.

What is my plan here? I’ll probably find some time tonight or so to set up a tinderbox that uses the OpenRC-based stage, and see what doesn’t work out of the box; unfortunately, even that is not going to be a complete solution: if two ebuilds use the same group, and they are independent of each other, it is well possible that the group is added by one and not the other, so whether they install correctly depends on the order of installation. Which is simply a bad thing to have, and a difficult one to test for.

In the mean time, please do report any package that fails to build with the new stages. Thank you!

Mister 9% — Why I like keeping bugs open.

Last week, Kevin noted that Gentoo’s Bugzilla – recently redesigned, thanks to all the people involved! – has thousands of open bugs; in particular, as of today we have over 17000 unresolved bugs. Of those, 1500 were reported by me, which is about 9% of the total.

For those wondering why there is this disparity, the main reason is that I still report the bugs found by my tinderbox manually (which means I use my own account), and thus they add up. Together with that, you have to count the fact that I’m one of those people who like to keep bugs open until they are fixed; this is why I keep insisting that if you add a workaround to a package, you should keep a bug open so that we know the problem is not fixed yet, which unfortunately happens quite often with parallel build failures.

So how do you shorten the list of open bugs without losing data?

It is obviously natural for people joining a team that maintains a set of general packages, or those picking up a new package, to go through the bugs open for the package and try to solve and close them. And I think that is the process we should go through right now; the problem is that you cannot simply resolve the issues as “WONTFIX”, as some people proposed: the bugs need to be fixed. You can close a bug as NEEDINFO if the user didn’t provide enough information (it happens with my bugs as well, when I forget to attach a build log), or as TEST-REQUEST if the bug is no longer reproducible with the latest version — and if you do know that the bug is solved, FIXED is a good idea as well.

A number of trivial bugs are also open in areas of Gentoo that aren’t followed much at all nowadays: this includes, for instance, most of the mail-related packages, some easier to solve than others. Most of the time they simply require a new, dedicated maintainer who uses the software: you cannot really maintain a package you stopped using, and I speak from experience.

Most of the bugs opened by the tinderbox, at least, also fall into the “5 minutes fix” category that I loathe so much: they look so trivial that people are prone to simply ignore the need for action on them, but when you do look into them, you can end up with a very messed-up ebuild that makes you spend hours getting it right — when they don’t uncover that the upstream code or build system is so messed up that it would have to be rewritten to work properly.

Unfortunately, simply ignoring the “nitpick” QA bugs is not a good idea either: more often than not, the presence of these bugs shows that the package is stale, and the fact that they are relatively easy to solve only goes to show that the package hasn’t been cared for in too long. Closing them for lack of interest is not going to solve anything at all; it’ll just mean that they’ll be hidden, and re-opened the next time the tinderbox or a user merges the package.

So rather than closing the bugs that look abandoned, waiting for other people to report them again, the solution should be to fix them. Closing them will just ensure that they’ll be reported once again, increasing the size of the Bugzilla database, which is a more serious problem than just inflating a counter.

So if the bugs look too many… proxy-maintain new packages!