Restarting a tinderbox

So after my post about glibc 2.17 we got the ebuild in tree, and I’m now re-calibrating the ~amd64 tinderbox to use it. This sounds like an easy task, but it really isn’t. The main problem is that with the new C library you want to make sure you start afresh: no pre-built dependencies should be left around, or problems won’t be found; you want the highest coverage possible, and that takes some work.

So how do you re-calibrate the tinderbox? First off you stop the build, and then you have to clean it up. Sometimes the cleanup is as easy as emerge --depclean — but in some cases, like this time, the Ruby packages’ dependencies cause a bit of a stir, so I had to remove them altogether with qlist -I dev-ruby virtual/ruby dev-lang/ruby | xargs emerge -C, after which the depclean command actually starts working.
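
For reference, the whole cleanup boils down to something along these lines (the same two commands from above, not a script to run blindly on your own box):

# drop the Ruby packages that confuse --depclean, then let Portage
# remove everything that is not required by the world set
qlist -I dev-ruby virtual/ruby dev-lang/ruby | xargs emerge -C
emerge --depclean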

Of course it’s not a two-minute command like on any other system, especially when going through the “Checking for lib consumers” step — the tinderbox has 181G of data in its partition (a good deal of which is old logs that I should actually delete at this point — and no, that won’t delete the logs in the reported bugs, as those are stored on S3!), not counting the distfiles (which are shared with its host).

In this situation, if there were automagic dependencies on system/world packages, it would actually bail out and I’d have to go clean them up manually. Luckily for me there’s no such problem today, but I have had this kind of problem before. This is actually one of the reasons why I want to keep the world set in the tinderbox as small as possible — right now it consists basically of: portage-utils, gentoolkit (for revdep-rebuild), java-dep-check, Python 2.7 (it’s an old entry, it might be droppable now, I’m not sure), and netcat6 for sending the logs back to the analysis script. I would have liked to remove netcat6 from the list, but last time I tried, the busybox nc implementation didn’t work as expected with IPv6.
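
For the curious, this means the world file (/var/lib/portage/world) is pretty much limited to something like the following — categories quoted from memory, so treat them as illustrative rather than exact:

app-portage/portage-utils
app-portage/gentoolkit
app-portage/java-dep-check
dev-lang/python:2.7
net-analyzer/netcat6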

The unmerge step should be straightforward, but unfortunately it seems to be causing more grief than expected, in many cases. What happens is that Portage has special handling for symlinked directories — and after we migrated to use /run instead of /var/run, all the packages that have not been migrated, ebuild-side, to stop using keepdir on it will spend much more time at the unmerge stage to make sure nothing gets broken. This is why we have a tracker bug and I’ve been reporting ebuilds creating the directory, rather than just packages that do not re-create it in the init script. Also, this is when I’m thankful I decided to get rid of XFS, as file deletion there was just way too slow.
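
To give an idea of what the fix looks like on the ebuild side, here’s a sketch (the foo daemon and its paths are made up): rather than the ebuild carrying

keepdir /var/run/foo

the init script should (re)create the directory at start-up, for instance with OpenRC’s checkpath:

start_pre() {
    # re-create the runtime directory on a tmpfs-backed /run
    checkpath --directory --mode 0755 --owner foo:foo /run/foo
}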

Even though Portage takes care of verifying link-time dependencies, I’ve noticed that sometimes things are broken nonetheless, so depending on what your target is, it might be a good idea to run revdep-rebuild to make sure the system is consistent. In this case I’m not going to spend the time, as I’ll be rebuilding the whole system in the next step, after glibc gets updated. This way we’re sure we’re running with a stable base. If packages are broken at this level we’re in quite a pinch, but it’s not a huge deal.
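
If you do want to run that check on your own system, it’s as simple as this (the tool comes from gentoolkit, which is already in the world set as noted above):

revdep-rebuild --pretend    # only list the broken reverse dependencies
revdep-rebuild              # actually rebuild them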

Even though I keep my world file to the minimum, the world and system set is quite huge once you add up all the dependencies. The main reason is that the tinderbox enables lots and lots of flags – as I want to test as much code as possible – so things like GTK+ get brought in (by GCC, no less), and the cascade effect can be quite nasty. The system rebuild can easily take a day or two. Thankfully, the tinderbox scripts are designed so that the logs are sent through the bashrc file, and not through the tinderbox harness itself, which means that even if I get failures at this stage, I’ll get a log for them in the usual place.
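
The rebuild itself is nothing clever — it is essentially along these lines, once the new glibc is in place (a sketch, not the literal commands I run):

emerge --oneshot sys-libs/glibc
emerge --emptytree --keep-going @world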

After this is completed, it’s finally possible to resume the tinderbox building, and hopefully some things will then work more as intended — for instance I might be able to get PHP to work again… and I’ll probably change the tinderbox harness to retry building packages without USE=doc when they fail, as too many packages right now fail with it enabled or, as Michael Mol pointed out, because of circular dependencies.

So expect me to work on the tinderbox for the next couple of days, and then to start reporting bugs against glibc-2.17, the tracker for which I already opened, even though it’s empty at the time of writing.

A worldly opponent

Yes, I’m writing again about Ruby after a very long hiatus… it feels like there’s a pattern behind this.

After my recent changes, Hans approached me today with two issues: the first is that I should stop adding virtual/ruby-test-unit dependencies to ebuilds since (finally!) Ruby 1.9 gained a completely compatible test-unit implementation, which means you can rely on packages behaving correctly without adding extra dependencies and without keeping old, stale, darcs-maintained gems around — hopefully, at least. The other issue is more troublesome, and relates to bug #380409 where users are being forced to get Ruby Enterprise installed on their system.

Why would this happen? Well, as you can see in that bug, once the version of ruby-rdoc that references Ruby Enterprise (this applies to the other virtuals as well, though) was marked stable, Portage wanted to update it, causing the new implementation to be requested through the dependency tree. You can easily guess that we didn’t have this behaviour in mind when we decided to go with the virtuals. So what’s going on?

When a package such as redmine (which I fixed yesterday to do this) requires the virtual/ruby-ssl ebuild (which resolves to a Ruby implementation that includes the ssl extension — a USE flag for the C-based implementations, but a separate package for JRuby), the expanded dependencies appear like this: ruby_targets_ruby18? ( virtual/ruby-ssl[ruby_targets_ruby18] ) … since each one of those virtuals lives in a different slot, and only supports one target, you get one ebuild requested for each target you enabled … and the target USE flag on those is forced on.
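
Spelled out for two targets, the expanded dependency is roughly the following (sketched here, not copied from the actual redmine ebuild):

ruby_targets_ruby18? ( virtual/ruby-ssl[ruby_targets_ruby18] )
ruby_targets_ruby19? ( virtual/ruby-ssl[ruby_targets_ruby19] )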

Since the dependencies resolve to a specific slot, Portage wouldn’t try to “update” the virtual to get the highest-versioned one… which would then bring in a different Ruby implementation altogether. So why is bug #380409 there? As Hans points out, the problem has to be that the virtuals are added to the world set: being in the world set (unbound from a slot) means that Portage will try to install the highest version of the package, no matter which slot it belongs to. Incidentally, this is also why you got Python 3 installed on every system even though you didn’t ask for it, unless you masked it explicitly.

Handling of the world set’s content is fishy at best: when you install a package with a straight emerge command, you’re “selecting” the package, and that adds it to the world set; if you want to avoid that, you have to use emerge -1… and most people forget about that. This is probably how the bug crept onto users’ systems.

So how do you clean up the situation so that it behaves as expected instead? Well, it’s actually easier than you’d expect:

qlist -IC virtual/ruby | xargs emerge --deselect

This removes the packages from the world file, but keeps them installed, so that you won’t be asked to merge them again later. And it handles all the Ruby virtuals at once, so even if in your case the problem was with rubygems, this will fix it.

Sharing distfiles directories, some lesser known tricks

I was curious to see why there was so much noise coming from the Gentoo Wiki LXC page to my blog lately, so I decided to peek at the documentation there once again, slightly more thoroughly. While there are a number of things that could likely be simplified (but unless I’m paid to do so, I’m not going to work on documenting LXC until it’s stable enough upstream; as it is, it might just be a waste of time since they change the format of the configuration files every other release), there is one thing that really baffled me:

If you want to share distfiles from your host, you might want to look at unionfs, which lets you mount the files from your host’s /usr/portage/distfiles/ read-only whilst enabling read-write within the guest.

While unionfs is the kind of feature that goes quite well with LXC, for something as simple as this it is quite the overkill. On the other hand, I remembered that I also had to ask Zac about this, so it might be a good idea to take a moment and document it.

Portage already allows you to share a read-only distfiles directory.

You probably remember that I originally started using Kerberos because I wanted some safe way to share a few directories between Yamato and the other boxes. One of the common situations I had was the problem of sharing the distfiles, since quite a few are big enough that I’d rather not download them twice. Well, even with that implemented, I couldn’t get Portage to properly access them, so I looked for alternative ways. And there is a variable that solves everything: PORTAGE_RO_DISTDIRS.

How does that work? Simply mount some path such as /var/cache/portage/distfiles/remote as the remote distfiles directory, then set PORTAGE_RO_DISTDIRS to that value. If the required files are in the read-only directory, and not in the read-write one, Portage will symlink them to ensure access on a stable path. Depending on your setup, you might want to export/import the directory over NFS, or – in the case of LXC – you might want to set it up as a read-only bind mount. Whatever you choose, you don’t need kernel support for unionfs to use a read-only distfiles directory. Okay?
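
To make the LXC case concrete, here’s a sketch — the paths, and the assumption that the guest’s DISTDIR sits in /var/cache/portage/distfiles, are just examples:

# host side, in the container’s configuration: read-only bind mount
lxc.mount.entry = /usr/portage/distfiles var/cache/portage/distfiles/remote none bind,ro 0 0

# guest side, in /etc/portage/make.conf
DISTDIR="/var/cache/portage/distfiles"
PORTAGE_RO_DISTDIRS="/var/cache/portage/distfiles/remote"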

Not all failures are caused equal

Ryan seems to think me inconsistent for promoting Portage failures when _FORTIFY_SOURCE points out a sure fortification error. The related bug is getting more useful now than it was when it started, but it also shows some misunderstanding of my position regarding breakages and QA in general.

So first of all, let me repeat (because I have said it before) that I prefer build-time failures to run-time failures. Why? It’s not difficult to understand when you think about it; consider my three different use cases:

  • I have two remote vservers running Gentoo; I cannot build on them (and neither should you!), but I keep a local chroot environment where I pre-build binary packages for the software I need; I update the environment once daily, or when there are specific security concerns; when I do that, I build the packages and upload them to the servers, which can take quite a bit of time as my upload is around 40 KiB/s (see the sketch after this list);
  • also, I connect my whole house (my computers, my mother’s, and a bunch of various devices and appliances, plus guests’ devices) not through a standard SoHo router, but through a much more flexible Gentoo box, running basically “embedded” on a Celeron board with only SSH and serial console access; I update the “firmware” of the system once in a blue moon, mostly for security concerns, by flashing a CF card connected to the EIDE bus; here too I use a chroot environment to build the packages;
  • finally, I have a laptop, also running Gentoo; this one is updated more irregularly, because there are days I don’t even touch it; I also try not to build stuff when I’m on the road because I might not have enough bandwidth to download the sources (while connected to my local network I access them directly).
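
For the first two cases, the flow is more or less the following sketch — the host name and the exact emerge options are made up, the point is the binary-package round trip:

# inside the local build chroot: build and collect binary packages
FEATURES="buildpkg" emerge --update --deep --newuse @world
# upload the packages, then install from binaries only on the remote box
rsync -az /usr/portage/packages/ server:/usr/portage/packages/
ssh server emerge --usepkgonly --update --deep --newuse @world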

In all these three cases I’m bothered by build failures, but they aren’t much of a problem: if I’m building something in a pinch it’s a nuisance; in the other cases, upgrade or new build, I usually have time to look into what is broken before it becomes a real problem. If any package fails at runtime, though, it’s much worse than a nuisance. If Apache dies, then I have to downgrade and quickly rebuild another version; if the router’s SSH daemon crashes, I have to go downstairs with the laptop and access the serial console; and if at that moment the picocom tool I use to access the serial console doesn’t start, then I’m simply going to cry.

So with this situation at hand, can you blame me for preferring stricter compilers and build managers (don’t forget that Portage and its cousins are not just package managers à la APT, but also build managers à la FreeBSD Ports) that actually disallow the installation of broken code? This is the same reason why we added --no-undefined to Ruby — rather than hiding the problems under the hood and leaving the car to melt down on failure, we notice the leak beforehand, and stop it from ever leaving the garage!

As Mike pointed out, and as I exemplified on the bug linked above, even if the code looks perfectly fine at a bird’s-eye view, because it relies on a number of assumptions about how the compiler and the C library work, if the compiler is reporting a sure failure, it is going to fail. This is the same reason why, with GCC 4.5, we had a runtime breakage of GNU tar, which tried to fill two adjacent strings in a structure with a single copy rather than two. It wasn’t going to create an actual overflow problem, but it was stepping over the final character; not only did the compiler warn about it but… the C library aborted at runtime!

Now you could argue that this means they are not discerning between hacky-but-good solutions that improve performance and security-vulnerable code. But in truth, you have to take a stand at some point, and it makes total sense to be as safe as possible, especially in today’s world where security should be that much of a concern, while performance can easily be improved with faster hardware.

This should cover my reasons for being in favour of dying when the code provides a path where the C library is going to abort at runtime (and that path is not properly controlled). What about my earlier criticism of the unmasking of glibc and other software when they caused failures? Well, it is also two-fold. On one side, glibc 2.12 does not only cause build-time failures; as I explained, there are enough indications that software could abort at runtime when a header is missing from the sources. But whatever the problem is, let’s look at it from a need perspective: what is the reason for this failure to be introduced? Simply to clean up the code and make it faster to build (fewer headers means fewer files to load and parse), forcing users to include what they need rather than a long chain of dependent headers. It’s a laudable target, but mostly one of optimisation and enhancement.

On the other hand, fortification features are used to increase security by mitigating possible vulnerabilities in the software’s design. This wide difference in the root reason for the breakage alone is what makes them different in my personal assessment of the problem. And because of this, I think there is a case for Portage aborting right away in ~arch for packages with known overflows, even though I maintain that glibc 2.12 and similar problems shouldn’t have been let loose on the users. The former does not make anything more broken; it makes things safer!

And tomorrow, “Is fixing implicit declarations for me?”

Are we done with LDFLAGS?

Quite a few weeks ago, Markos started asking for more thorough testing of ebuilds to check whether they respect LDFLAGS; he then described why that is important, which can be summarised as “if LDFLAGS are ignored, setting --as-needed by default is pointless”. He’s right on this, of course, but there are a few more tricks to consider.

You might remember that I described an alternative approach to --as-needed that I call “forced --as-needed”, implemented by editing the GCC specifications. The idea is that by forcing the flag in through this method, packages ignoring LDFLAGS and packages using an unpatched libtool won’t simply drop it. This was one of the things I had in mind when I suggested that approach, but alas, as it happens, I wasn’t listened to.
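
I won’t repeat the whole procedure here, but the gist is along these lines — a rough sketch from memory, so refer to the original post for the exact spec edit:

# dump the compiler’s built-in specs to a file you can edit
gcc -dumpspecs > /etc/portage/gcc-specs-as-needed
# edit the "*link:" rule in that file so the linker always gets --as-needed,
# then point the (Gentoo-patched) gcc driver at the modified specs
export GCC_SPECS=/etc/portage/gcc-specs-as-needed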

Right now there are 768 bugs reported blocking the tracker, of which 404 are open; in total, 635 were reported by me (mostly through the tinderbox, but not exclusively). And I’m still opening new ones on a daily basis.

But let’s say that these are all the problems there are, and that in two weeks time no package is reporting ignored LDFLAGS at all. Will that mean that all packages will properly build with --as-needed at that point? Not really.

Unfortunately, LDFLAGS-ignored warnings have both false positives (prebuilt binary packages, packages linking with non-GNU linkers) and partial false negatives. To understand why, you have to understand how the current warning works. A few binutils versions ago, a new option was introduced, called --hash-style; this option changes the way the symbol table is hashed for faster loading by the runtime linker/loader (ld.so); the classic hashing algorithm (SysV) is not particularly good for the very long, similar symbols that are all too common when using C++, so there has been some work to replace it with a better, GNU-specific algorithm. I followed most of the related development closely at the time, since Michael Meeks (of OpenOffice.org fame) actually came asking Gentoo for some help testing things out; it was that work that got me interested in linkers and the ELF format.

At any rate, while the original hash table was created in the .hash section, the new hash table, being incompatible, lives in .gnu.hash. The original patch simply replaced one with the other, but what actually landed in binutils was slightly different (it allows choosing between the SysV, the GNU, or both hash tables), and the default has been to enable both, so that older systems can load .hash and newer ones can load .gnu.hash; win-win. On the other hand, on Gentoo (and many other distributions) where excessively old glibc versions are not supported at all, there is enough of a reason to use --hash-style=gnu to disable the generation of the SysV table entirely.

Now, the Portage warning derives from this situation: if you add -Wl,--hash-style=gnu to your LDFLAGS, Portage will check the generated ELF files and warn if it finds the SysV .hash section. Unfortunately this does not work for non-Linux profiles (as far as I know, FreeBSD does not support the GNU hashing style yet; uClibc does), and it will obviously report all the prebuilt binaries coming from proprietary packages. In those cases, you don’t want to strip .hash, because it might be the only hash table there, keeping ld.so from falling back to a linear search.
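
In practice you can reproduce the check by hand with something like this (the library path is, of course, just an example):

# make.conf: ask the linker to emit only the GNU-style hash table
LDFLAGS="${LDFLAGS} -Wl,--hash-style=gnu"
# an installed object that still carries the SysV table — because the build
# system dropped our LDFLAGS — will still show a .hash section afterwards
readelf -S /usr/lib/libfoo.so | grep ' \.hash '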

So, what is the problem with this test? Well, let’s note one big difference between the --as-needed and --hash-style flags: the former is positional, the latter is not. That means that --as-needed, to work as intended, needs to be added before the actual libraries are listed, while --hash-style can come at any point on the command line. Unfortunately this means that if any package has a linking line such as

$(CC) $(LIBS) $(OBJECTS) $(LDFLAGS)

it won’t hit the LDFLAGS warning from Portage, but (basic) --as-needed would fail — on the other hand, my forced --as-needed method would work just fine. So there is definitely enough work to do here for the next… years?
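
For the record, an ordering that plain --as-needed digests fine looks like this instead, with the flags before the objects and the libraries last:

$(CC) $(LDFLAGS) $(OBJECTS) $(LIBS)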

Gentoo needs you! A few things that would definitely be useful

After my ranting about QA – which brought our lead back from hiding, at least for a while, let’s see how he fares this time – a number of people asked me what they could do to help QA improve. Given that we want both to test and to prevent, here comes a list of desirable features and requests that would obviously be helpful:

  • as usual, providing for another tinderbox would be a good thing, as more situations could be tested; right now I’d very much like to have a server powerful enough to run a parallel tinderbox set up to test against the stable amd64 tree, rather than the current unstable x86;
  • repoman right now does not warn about network errors in HOMEPAGE or SRC_URI, since it performs no network checks at all; see this bug for why; it would be a good thing to have;
  • again repoman, lacking network checks, has no way to make sure that the herd specified in metadata.xml exists at all; it’s pretty important to do that, since it has happened before that herds weren’t updated at all;
  • there is also no way to ensure that metadata.xml lists maintainers that have accounts in Bugzilla; with proxy-maintainership it sometimes happens that either the email address is not properly updated (proxier fail) or the maintainer does not have an account on Bugzilla at all (which is against the rules!); unfortunately this requires login access to Bugzilla, and is thus a no-no right now;
  • the files/ subdirectories are still a problem for the tree, especially since they are synced down to all users and should thus be kept slim, if not for the critical packages; as it turns out, between Samba and PostgreSQL they use over half a megabyte for files that a lot of people won’t ever use; we also had trouble with people not understanding that the 20KB size limit on files does not mean you can split a file into two 10KB pieces, nor compress it to keep it in tree (besides the fact we’re using CVS, there is a separate patch repository for that; Robin is working on providing reliable patch tarball hosting as well); see bug #274853 and bug #274868, and the one-liner after this list for a quick way to size up the offenders;
  • moving on to Bugzilla, there is currently no way to display, inline in the browser, logs compressed with gzip — which the new Portage can generate, and which would make the tinderbox’s work much easier; this makes bug reports a bit more complex to manage; if somebody knows how to get Bugzilla to provide the proper headers so that browsers can read them, it’d be pure magic;
  • right now we lack a decent way to test for automagic dependencies; you could make use of something similar to my oneliner but it won’t consider transitive dependencies on virtual packages;
  • with split-unsplit-split-unsplit cycles, it happens that a number of packages in the tree are now obsolete – this is why I had to remove a number of geda ebuilds the other day – removing these packages not only reduces the pressure on the tree itself, but also allows the tinderbox to run much more smoothly without me having to mask them manually: when they are visible, the tinderbox might end up trying to merge them by removing the (newer) un-split ebuild, or vice versa;
  • one more Portage (non-repoman!) feature that would be nice: per-package FEATURES, or at least a per-package maintainer mode; Portage has a number of features to make sure that maintainers don’t commit crap (test-suite support, die-on-some-warnings, fix-or-die) – such as the stricter feature – but using them system-wide is unfeasible (I’m okay with fixing my own stuff, but I don’t want an unusable system because some other developer didn’t fix his).
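
As for the files/ size issue above, a quick and dirty way to size up the worst offenders from the top of a tree checkout is a one-liner like this:

du -sk */*/files | sort -n | tail -n 20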

The five-minutes-fix myth

One of the complaints I often get for my QA work (which, I have to say, is vocally unappreciated even by other devs) is that I could just go on and fix the issues I find rather than opening bugs for them, because “it just takes five minutes from bug to commit” to fix the problem.

No it does not take five minutes to fix something, I can assure you!

Of course people will continue to say that it just takes a few minutes to find the problem and come up with a patch; the problem is that most of the time, for the kind of bugs I report, fixing them properly takes much, much more.

Most of the time, when some developer decides that a single problem does not warrant removing a package, even if nobody is looking after it, the same package re-enters my radar at the next round of tinderboxing or, in the best of cases, a few months later.

The problem is, when a package has a maintainer, that maintainer is supposed to keep the package in line with the development policies; if you don’t maintain a package but commit a five-minutes fix, you’re not ensuring it complies with the policies, you’re fixing the single symptom I reported — which most likely is just the first one I hit.

When I fix a package (and I’m not boasting here, just stating how it is), I do things like removing older versions, making sure it complies with all the policies, checking all the bugs the package has open, checking for things there aren’t bugs for yet — like CFLAGS and LDFLAGS being respected, files not being pre-stripped, features that might be missing, whether a version bump is needed, correct dependencies, and so on and so forth. It’s not five minutes of work! It’s often multiple hours of work instead!

What is upsetting me, to return to the fact that Gentoo’s QA problem is with developers, is that some developers think it’s fine to remove a package from the QA removal queue just because they fixed the last bug I filed. I’m sincerely considering the idea of making it policy that if you wish to save a package from QA you have to fix all the problems with it, and take maintainership of it if it breaks again. And for those ignoring this, we should definitely enforce some kind of penalty, in the form of no longer being able to “save” a package from removal.

From scratch: about my tinderbox

I know that quite a few people are still unsure about what a tinderbox is, what it does, and especially why they should care enough that I’m bothering them every other day about it on my blog (and by extension on Planet Gentoo and the Gentoo.org homepage, which syndicate it). So I’m now trying to write a long-breathed article that hopefully will cover all the important details.

What is “a” tinderbox

The term tinderbox, as reported by Wikipedia, has been extended to general use starting from the software, developed by Mozilla, designed to perform continuous integration on a given software project. This doesn’t really help that much, because the term “continuous integration” might still not mean anything to users. So I’ll try to explain that from scratch as well.

Gentoo users, especially those running the unstable branch, are unfortunately used to breakage caused by build failures after a library update or other environment changes like that. This problem is increasingly present during the development phase of any complex software as well. It gets even worse because, while developers might not see any build failure on their machine, they can introduce “buggy” code that breaks on others’, as not all distributions have the same environment, the same libraries, the same filesystem layout and so on and so forth.

And this is just limiting the scope of our consideration to a single operating system – Linux in this case – without considering the further problems caused by the presence of other software platforms, such as FreeBSD (which is still a Unix system), Mac OS X, and Windows. This is one of the biggest problems of multi-platform development, and it makes it definitely non-trivial to write software that works correctly across these systems. I could write more about this topic, but I’ll leave that for another time.

To make it possible for developers not to have to manually get their code onto multiple machines, with various combinations of operating systems and versions, to ensure that the code builds properly after every change is merged, the smart approach was to invent the concept of continuous integration. To sum it up, think about what I just said – getting the code onto multiple machines, with different operating systems or versions of operating systems, building and eventually executing it – but done by software rather than by a human.

Mozilla’s software for continuous integration is called Tinderbox and, for whatever reason, the name has stuck in Gentoo to refer to the various tries we have had at setting up some continuous integration testing. I’m not sure of the reasons why the name stuck with the Gentoo development team, but so it is, and so I’m keeping it.

Why is it “my” tinderbox

As I just said, Gentoo has had more than one try in the past at making continuous integration a major player in its development model. I think at least four people have separately worked on various “tinderboxes” in the past five years, both as proper Gentoo projects and as outside, autonomous efforts. So I cannot call it “the Tinderbox”, but rather “my tinderbox”, to qualify which one among those that were and are.

Right now, we have three actively maintained continuous integration efforts called tinderboxes: my own is the latest one to join the party; before that we already had Patrick’s, and before that the Gentoo tinderbox, which runs on proper Gentoo infrastructure boxes.

As I already rambled on about, this “duplication” of efforts is not a problem, as the three current systems have different procedures and different targets. In particular, the “infra” Tinderbox is mostly there to ensure the packages in the system set work on most major architectures, to provide reverse dependency information, and to make available binary packages useful for system recovery in case of a major screw-up. On the other hand, Patrick and I are using our tinderboxes to identify bugs and report them to developers who might not otherwise have known about them.

What my Tinderbox *does*

I’ve written more or less how it works, although that was really going into the detail of what I do to keep it working. As far as users are concerned, those details are mostly unimportant.

What my Tinderbox does, put simply, is build the whole Portage tree — at least, that’s the theoretical target. Unfortunately, as many have noted before, it’s a target that no single box in the world is likely to reach in our lifetime, let alone continuously. Continuous integration tools are designed to take away from developers the time-consuming task of trying their code on different platforms (software or hardware, it doesn’t matter); even though I’m limiting their scope to testing Gentoo, the possible field of variations is limitless.

Not only do we support at least two operating systems (Linux and FreeBSD), a number of architectures (over a dozen), and two branches (stable and unstable), but by the very essence of Gentoo, each installed system is never identical to another: USE flags, CFLAGS, package sets, packages accepted from the unstable branch, overlays… Testing all these variables is impossible, so my tinderbox reduces the scope (a lot), while still trying to be as broadly applicable as possible.

It runs over the unstable x86-keyworded branch of the tree (~x86 keyword), along with some packages that are not available by default and are unmasked by hand (mostly toolchain-related). It uses very bland CFLAGS, forced --as-needed and a fairly common USE flag setup. It runs tests, but it doesn’t let their failure stop the merge. It enables USE flags (for dependencies) as needed to have as many packages installed as possible.
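
To give a rough idea, the relevant bits of the configuration look more or less like this — the exact values change over time, so take it as a sketch rather than the real file:

ACCEPT_KEYWORDS="~x86"
CFLAGS="-O2 -pipe"
CXXFLAGS="${CFLAGS}"
# run tests, but don’t let a failing test suite abort the merge
FEATURES="test test-fail-continue"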

The big part of the work is done by a single loop that tries to install, one by one, all the packages that haven’t been tested in the past two and a half months. Portage is left to remove conflicting packages as it sees fit. Sometimes packages are rejected for various reasons (they collide with another package, there is a conflict in the required USE flag settings between different packages, or they require very old software versions that would in turn cause other problems), so instead of having the full Portage tree available for ~x86, I can only get a subset — the biggest subset I can manage, at least.
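
In pseudo-shell, the core of that loop is nothing more sophisticated than the following — the list-generation command is the interesting part, and it’s purely hypothetical here:

# list_stale_packages is a stand-in for whatever produces the queue of
# packages not merged in the last ten weeks or so
for atom in $(list_stale_packages); do
    emerge --oneshot --verbose "${atom}"
done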

The results

Of course, to understand why the Tinderbox is an important project, why I’m investing so much time in it, and why I’m looking for other people to invest part of their time (or resources) to help me, it is necessary to look at what the by-products of its execution are.

I’m not making my binpkgs available to the public (like the infra-based tinderbox does), mostly because I’m not allowed to: first, I don’t enable the bindist USE flag, which means I’m allowing build configurations that are non-redistributable because of licenses or trademarks; second, that’s not my target at all, and they may very well not be functional in some parts.

The accomplishments coming from my efforts are, generally speaking, bug reports. Not only do I scan through the failure logs for those ebuilds that fail to build in this environment, I also scan for failing test procedures and other similar quality issues. But again, this is not the result final users are looking for, as bug reports don’t mean anything to them.

Results that are good for users are, instead, the fixes for these bug reports: you can rest assured that documentation is installed in the right place, where you expect it; that if you need to debug a problem, you only need to follow the guide without the build system of the package getting in your way; and that a package that reports itself as installed is indeed installed and does work.

While, obviously, there are still problems that my Tinderbox effort won’t be finding anytime soon (for instance missing dependencies, which are instead the speciality of Patrick’s Tinderbox), and test procedures are rarely so well thought out, designed and realised that they leave no possible runtime problems to be found, I think that keeping it running, continuously integrating as many ebuilds as it can, is definitely going to help Gentoo’s general quality and stability.

And there is another thing that users are most likely interested in: testing of the future toolchain. As I said before, besides using the ~x86-visible ebuilds, my Tinderbox is also running some masked packages, which nowadays include gcc-4.4 and autoconf-2.65. Using these versions, not yet available to the general public, allows me to find failure cases related to them before they are made available. This serves the double role of assessing the impact that any given change might have on users (by knowing how many packages will break if the package is unmasked), and of getting those problems corrected before they have the chance of wasting users’ time.

Helping

Continuous integration can’t be said to be a “boring” task, in the sense that every day you might stumble into a new problem for which you’ll have to devise a solution. On the other hand, it’s a heavy duty, for both the machines performing the job and the people who have to look through the logs. Even the sole task of filing the bugs is time-consuming, and that’s without adding the problems of dealing with the human factor, which often takes even more time.

Furthermore, there are a number of practical issues tied to running this kind of continuous integration: the time needed to build packages limits the number of packages that can be tested over a given time frame, and unfortunately to reduce that time you often have to push for higher-end machines (a lot of the work is I/O bound, and not all software can build or execute tests in parallel), which tend to have high price tags, as well as consuming more power (and thus costing more over the long term) than your average machine.

For this reason both Patrick and I tend to ask for material help: we are bearing the costs of these efforts directly.

It has been suggested a number of times to host the Tinderboxes on Gentoo infrastructure hardware, but there have been problems with this before. On the other hand, I think that the way I’m currently proceeding, for my effort at least, could open the possibility for that to happen. The one big issue that makes me uncomfortable with running the tinderbox remotely is easy access to the logs for identifying problems and, especially, for attaching them to bug reports. I’m currently thinking about how to close that gap, but it needs another kind of help, in the form of available developers.

And one kind of help that costs users very little, but can help our work a lot, is commenting on (and criticising, if needed) our design choices, our code, our methods. Constructive criticism, of course: telling us that our methods suck without giving any suggestion on how to make them better is not going to help anybody, and can actually lower our morale quite a bit.

So please, the greatest favour you can do me is to continue listening to my ramblings about QA, tests, Tinderbox and log analysis; and if you are still unsure how something works in this system, feel free to ask.

Coming up with QA tests

For the tinderbox I’ve been adding quite a few tests that are not currently handled by Portage; some of these tests, as I said before, are taxing, and all of them are linear and serial, which means that even with the huge amount of power Yamato has, they take quite some time (here the only thing that really helps is the memory, which allows more files to be kept in cache).

But even more difficult than implementing the tests is coming up with them. As I said, they are not in Portage for one reason or another: too specific, too taxing, possibly disruptive and so on. Not all of them are my ideas; some are actually suggestions by other developers for particular cases, or by users.

For instance, while I added the tests for the /usr/man, /usr/doc and /usr/local directories, the one for misplaced Perl modules (site dir rather than vendor dir) is something that tove came up with (and I’m filing bugs about it to help him transition away from the site dir); and the make “cheat” to call a parallel build anyway is something Kevin Pyle came up with.

I came up with the “ELF file in /usr/share” check because I had seen Portage stripping some files in those paths which I knew shouldn’t be there; and from that, I started wondering about the binchecks restriction, so I added the (very slow!) test that ensures that bincheck-restricted packages don’t install ELF files at all (actually it seems like all Linux kernel sources do install some ELF files; I need to track down whether it’s a Gentoo problem or whether upstream packages them up like that).
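
That particular check is easy to approximate with scanelf from pax-utils, run against the image directory of the package being merged:

# recursively list any ELF object that ended up under /usr/share in the image
scanelf -R "${D}/usr/share"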

The test for bundled libraries is very shallow (and of course leads to false positives on the packages that actually provide those libraries), but it tries to identify the most common offenders, which is quite important given that a nasty weakness was found in libtool’s libltdl, which is among the libraries most often bundled with software.

The notice on setXid files is something that Robert (rbu) asked me to add so that it would be possible to get a list of setXid executables in Gentoo (just getting them off the livefs of the tinderbox is not possible, because you can never get all packages merged into the same system).

Recently I’ve added a check for libtool .la files so that I can find at least some certainly unhelpful ones (those installed with Ruby, Python and Perl extensions: none of those use ltdl to load them, so none of them need the extra file beside the shared object).
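
A rough approximation of that check, again run from the image directory — the patterns are illustrative and by no means complete:

find "${D}" -name '*.la' | grep -E 'site_ruby|site-packages|vendor_perl'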

If you have any ideas for QA tests that you think the tinderbox should be running, you can either mail them to me or leave them as a comment here; just remember that they need to strike a fair balance, so that I don’t get one hit out of three being a false positive, and they need to be executed efficiently over all the packages in the tree.