Video: unpaper with Meson — pytesting our way out

Another couple of hours spent looking at porting Unpaper to Meson, this time working on porting the tests from the horrible mix of Automake, shell, and C to a more manageable Python testing framework.

I’ll write up a more complete debrief of this series of porting videos, as there’s plenty to unpack from all of them, and some of it might be good feedback for the Meson folks, to see whether there’s any chance to make a few things easier — or at least to make it easy to find the right solution.

Also, you can see a new avatar in the corner to make the videos easier to recognize 😀 — the art is from the awesome Tamtamdi, commissioned by my wife as a birthday present last year. It’s the best present ever, and it seemed a perfect fit for the streams.

And as I said at the end, the opening “getting ready” photo stars Pesto from Our Super Adventure, and you probably saw it already when I posted about Sarah and Stef’s awesome comics.

As a reminder, I have been trying to stream a couple of hours of Free Software every so often on my Twitch channel — and then archiving these on YouTube. If you’re interested in being notified about these happening, I’m usually announcing them with a few hours to spare (rarely more than that due to logistics) on Twitter or Facebook.

When the tests are more important than the functions

I’ve got to admit that the whole “Test-Driven Development” hype never appealed to me much — not because I think tests are wrong, but because, while tests are important, focusing almost exclusively on them is just as dumb as ignoring them altogether.

In the Ruby world there is so much talk about tests that it’s still very common to find gems whose specs pass just fine, yet which don’t work at all with newer Ruby versions. Or the tests pass fine for the original author but fail on everyone else’s system, because they depend heavily on custom configuration — sometimes they depend on case-insensitive filesystems, because the gem was developed on Windows or Mac and never tested on Linux. Indeed, for the longest time Rails’ own tests failed to run at every chance, and the “vendored” code it brought in never had a working testsuite. Things have improved nowadays, but not significantly.

Indeed, RubyGems does not make it easy to run tests at install time, which means that many distributed gems lack part of the testsuite altogether — sometimes this is an explicit choice; in the case of my own Ruby-Elf gem the tests are not distributed because they keep growing, and they amount to quite a few megabytes at this point; if you want to run them you fetch the equivalent snapshot from GitHub — the Gentoo ebuild uses that as its basis for exactly that reason.

Sometimes even gems coming from people who are considered nearly Ruby gods, like rails_autolink by tenderlove, end up released in a state that fails tests, badly — the version we have in Portage is patched up, and the patch has been sent upstream. Only the best for our users.

Now, unfortunately, as I noted in the post’s title, some projects care more about the tests than the functionality — the project in question is the very same Typo that I use for this blog, and which I have already mused about forking to implement fixes that are not important to upstream. Maybe I should have done that already; maybe I will.

So I sent a batch of changes and fixes upstream: some of them fixing issues caused by their own changes, others implementing changes to allow proper usage of Typo over SSL vhosts (yes, my blog is now available over SSL — I have to fix a few links and object load paths in some of the older posts, but it will soon work fine), others again simply making it a bit more “SEO”-friendly, since that seems to be a big deal for the developers.

What kind of response do I get to the changes? “They fail the specs” — never mind that the one commit I’m first told breaks the specs actually fixes editing of blog posts after a change that went straight to master; it might break specs, but it solves a real-life issue that makes the software quite obnoxious to use. So why did I not check the specs myself?

group :development, :test do
  gem 'thin'
  gem 'factory_girl', '~> 3.5'
  gem 'webrat'
  gem 'rspec-rails', '~> 2.12.0'
  gem 'simplecov', :require => false
  gem 'pry-rails'
end

I have no intention of digging into this whole set of gems just to be able to run the specs for a blog, whose specs I find vastly messed up anyway. Why do I think so? Well, among other reasons, I’ve been told quite a few times before that they would never pass on PostgreSQL — which happens to be the database that has been powering this very instance for the past eight years. I’m pretty sure it’s working well enough!

Well, after asking (a few times) for the spec output, it turns out that most of the broken specs are actually those that hardcode http:// in the URLs. Of course they break! My changes use protocol-relative URIs, which means the output changes to use // — and no spec is present that validates the output for SSL-delivered blogs, which used to break before my changes.

And what is upstream’s response to my changes? “It breaks here and there, would you mind looking into it?” Nope! Just “the commit breaks specs” — no offer (until I complained loudly enough on IRC) to look into it themselves and fix either the patches or the specs, as needed. No suggestion that there might be something to change in the specs.

Not even a cherry-pick of the patches that do not break specs.

Indeed, as of this writing, even the first patch in the series (the only one I really care to see merged, because I don’t want to get out of sync with master’s database, at least until I decide to just go for the fork) is still lingering there, even though there is no way in this world it breaks the specs, as it introduces entirely new code.

Am I going to submit a new set of commits with at least the visible spec failures fixed? Not sure — I can hardly bring myself to care, since right now my blog is working and has the features I need, the only one missing being the user-agent forwarding to Akismet. I don’t see much friendliness coming from upstream, and I keep thinking that a fork might be the best option at this point. Especially since, when I suggested using Themes for Rails to replace the current theme handling, so that it works properly with the asset pipeline (one of the best features of Rails 3), the answer was “it’s not in our targets” — well, it would be in mine, if I had the time! Mostly because being able to use SCSS would make it easier to share the stylesheets with my website (even though I’m considering getting rid of my website altogether).

So my plea to the rest of the development community, which I hope I can be considered part of, is not to be so myopic that you care more about tests passing than about features working. For sure Windows didn’t reach its popularity by being completely crash-proof — and at the same time I’m pretty sure they did at least a basic level of testing on it. The trick is always in the compromise, not in absolute care for, or neglect of, tests.

On test verbosity (A Ruby Rant, but not only)

Testing of Ruby packages is nowadays mostly a recurring joke, to the point that I didn’t feel I could add much more to my last post — but there is something I can write about, and it’s not limited to Ruby at all!

Let’s take a step back and see what’s going on. In Ruby you have multiple test harnesses available, although the most commonly used are Test::Unit (which was part of Ruby 1.8 and is available as a gem in 1.9) and RSpec (version 1 before, version 2 now). The two have very different defaults for test output: the former uses, by default, a quiet “progress” style (one dot per passing test), the latter a verbose “documentation” style. But both of them can produce either style.
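
With RSpec 2, for instance, the style is just a formatter switch away; the command lines below only illustrate the flags, and the spec path is whatever the gem uses:

# quiet "progress" output: one dot per passing example
rspec --format progress spec/

# verbose "documentation" output: one line per example, nested by group
rspec --format documentation spec/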

Now, we used to just run tests through rake, using the Rakefile coming with the gem, the tarball, or the git snapshot, to run tests and build documentation, since both harnesses integrate tightly with it — but since the introduction of Bundler, this has become more troublesome than just writing the test commands out explicitly in the ebuild.

Today I went to bump a package (origin, required for the new Mongoid), and since I didn’t wire in the tests last time (I was waiting for a last-minute fix), I decided to do so this time. It should be noted that while, when using rake to run the tests, you’re almost entirely left to upstream’s preference on output style, it’s tremendously easy to override that in the ebuild itself.

The obvious question is then “which of the two styles should ebuilds use for packaged gems?” — The answer is a bit complex, which is why this post came up. Please also note that this does not only apply to Ruby packages; it should be considered for all packages where the tests can be either silent or verbose.

In default usage, you only care whether the tests pass or fail — if they fail, they’ll give you as much detail as you need, most of the time at least, so the quiet (progress) style is a very good choice: less output means smaller logs and less noise; it’s a win-win. For the tinderbox, though, I found that more verbose output is useful: to make sure that tests are not being skipped, and to make sure that the build does not get stuck, and if it does, where it got stuck.

Reconciling these two requirements is actually easier than one might think at first, because it’s an already-solved problem: both perl.eclass and, lately, cmake-utils.eclass support a single variable, TEST_VERBOSE, which can be set to make the tests output more detail while running. So right now origin is the first Ruby ebuild supporting that variable, but talking about this with Hans we came to the conclusion that we need a wrapper to take care of it.

So what I’m working on right now is a few changes to the Ruby eclasses so that we support TEST_VERBOSE and NOCOLOR (while the new log analysis actually strips out most of the escape codes, it’s easier if it doesn’t have to!) with something such as a ruby-ng-rspec call (which can also check whether rspec is properly listed in the dependencies!). Furthermore I plan on changing ruby-fakegem.eclass so that it allows switching between the current default test function (which uses rake), one that uses RSpec (adding the dependency!), and one that uses Test::Unit. Possibly, after this, we might want to do the same for API documentation building, although that might be trickier — in general, though, installing the same style of API docs for every gem, instead of each project’s own, might be a good idea.
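
To give an idea of the direction, here is a minimal sketch of what honouring the variable could look like in a Ruby ebuild’s per-implementation test function; the rspec invocation and the spec path are just illustrative, and only the TEST_VERBOSE convention itself is borrowed from perl.eclass and cmake-utils.eclass:

each_ruby_test() {
    # default to the quiet style; switch to the verbose one on request
    local format=progress
    [[ -n ${TEST_VERBOSE} ]] && format=documentation
    ${RUBY} -S rspec --format ${format} spec || die "tests failed"
}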

So if your package runs tests, please try supporting TEST_VERBOSE, and if it can optionally use colors, make sure it supports NOCOLOR as well, thanks!

Autotools Mythbuster: On parallel testing

A “For A Parallel World” crossover!

Now that the tinderbox is actually running pretty well and the logs are getting through just fine, I’ve decided to spend some more time expanding the Autotools Mythbuster guide with more content, in particular in areas such as porting to automake 1.12 (and 1.13).

One issue that I’ll have to discuss in that guide soon, and that I’m posting about right now, is parallel testing: it is not really well known, and, at least for Gentoo, it ties into the EAPI=5 discussion.

Build systems using automake have a default target for testing purposes called check. This target is designed to build and execute testcases, in a pretty much transparent way. Usually this involves two main variables: check_PROGRAMS and TESTS. The former defines the binaries to build for the testcases, the latter which testcases to run.

This might sound counter-intuitive, or even silly, but in some cases you want to build test programs as binaries, yet list scripts as the tests, with the scripts calling the binaries and comparing their results. This is often the case when you test a library, as you want to compare the output of a test program against a known-good output.
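
In a Makefile.am this typically looks something like the following fragment (the names are made up for the example):

# build the helper binary during "make check"…
check_PROGRAMS = test-output
test_output_SOURCES = tests/test-output.c
test_output_LDADD = libfoo.la

# …but run the wrapper script, which calls it and compares the output
TESTS = tests/compare-output.sh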

Now, up to automake 1.12, if you run make -j16 check, what is parallelized is only the building of the binaries and other targets; you can for instance make use of this with check_DATA to preprocess some source files (I do that for unpaper, whose repository only ships the original PNG files of the test data), but if your tests take time and there is little that needs to be built, then running make -j16 check is not going to be a big win. This, added to the chance that the tests might simply not work in parallel, is why the default in Gentoo up to now has been to run the tests serially.

But that’s why recent automake versions introduced the parallel-tests option, which is actually going to become the default starting from 1.13. In this configuration the tests are executed by a driver script, which launches several of them at once and then collects their results. Note that this is just an alternative default test harness; Automake actually supports custom harnesses as well, which may or may not run in parallel.
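
For versions before 1.13 the option has to be requested explicitly, for instance in the automake initialization in configure.ac (a one-line sketch):

# require at least automake 1.11 and enable the parallel test harness
AM_INIT_AUTOMAKE([1.11 parallel-tests])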

Anyway, this is something that I’ll have to write about in more detail in my guide — please be patient. In the meantime you can look at unpaper as an example, as I just updated the git tree to make the best use of the parallel test harness (it actually saved me some code).

Do-s and Don’t-s of Ruby testing

This article is brought to you by some small time I had free to write something down. I’m clearing my own stuff out of the way in Italy, before going back to Los Angeles at the end of the month.

I’ve been working on updating some Gentoo Ruby ebuilds lately, and I’m starting to feel the need to scream at some of the gem developers out there… to be more constructive, I’ll try to phrase my grievances this way.

Do provide tests in your gem file, or in a tarball

You’d be surprised how many times we have to rely on git snapshots in Gentoo simply because the gem file itself lacks the test files; even worse is when the test units are there, but they rely on data that is not shipped at all. This is usually due to the test data being too big to fit in a small gem.

Interestingly enough, there used to be a gem command parameter that would run a gem’s tests at install time. I can’t seem to find it any more — this is probably why people don’t want to add big tests to their gem files: what’s the point, especially given that most people won’t be using them at all?

It is true that even my own Ruby-Elf ships neither test units nor test data in the gem, but that’s why I still release a tarball together with the gem file itself. And yes, I do release it, not just tag it on GitHub.

Do tag your releases in GitHub!

Especially if your tests are too big or too boring to fit into the gem file, please do tag your release versions in GitHub; this way we already have a workaround for the missing tests: we’ll just fetch the tarball from there and use it for building our fake gem installation.
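
On our side, a tagged release means the ebuild can fetch the matching tarball straight from GitHub; a hypothetical ruby-fakegem ebuild fragment could look like this (the project name and URL are made up):

# fetch the tagged release tarball instead of the test-less .gem file
SRC_URI="https://github.com/example/${PN}/archive/v${PV}.tar.gz -> ${P}.tar.gz"
RUBY_S="${PN}-${PV}"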

Do not mandate presence of analysis tools.

Tools such as simplecov, or ruby-debug, or lately guard, seem to be all the rage with Ruby developers. They are indeed very useful, but please do not mandate their presence, even if only in development mode.

While using these tools is dramatically important if you plan on doing real development on the gem, we don’t care how much coverage the gem’s tests have: if the tests pass, we know our environment should be sane enough; if they fail, we have a problem. That’s enough for us, and yet we end up having to fight with stuff like simplecov on a per-package basis most of the time.

Do not require test dependencies to build API docs (and vice versa)

This is unfortunately tied with the very simplistic dependency handling of RubyGems. When you look up a Gem’s dependencies you only see “runtime” and “development” — it says nothing about what’s needed to build the docs versus what’s required to run the tests.

Since testing and documentation are usually both handled through a Rakefile, and often enough, especially for complex testing setups, the Rakefile uses the task files provided by the testing framework to declare its targets, it’s hard to even run rake -D on a gem that doesn’t have its testing framework installed.

Now, in Gentoo you might want to install the API documentation of your gem but not run the tests, or, vice versa, run the tests but not care about the API documentation. This is a pain in the neck if the Rakefile requires all the development dependencies to be present.

This, together with the previous point, comes down to Bundler’s simplistic tracking — there is no way to avoid installing the gems in Gentoo even if they are properly grouped and you “exclude” them: Bundler will try to resolve the dependencies anyway, and that defeats the purpose of using Gentoo’s own dependency tracking.

Do not rely on system services.

This is actually something that happens with non-Ruby packages as well. Many interface libraries that implement access to services such as PostgreSQL or MongoDB will try to connect to the system instance during their test run. This might be okay when you’re the developer, but if you’re building and testing on a continuous basis you really don’t want that: the service might not be there, or it might not be set up the way you expect it to be.

Please provide (and document!) environment variables that can be set to change the default connection strings for the services. This way we can spin up a testing instance when running the tests, and you get a fully self-contained environment. Even better if you’re able to write setup/teardown scripts that run unprivileged and respect $TMPDIR.
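
As an example of how little it can take: for gems that go through libpq, the library’s own environment variables are often enough to point a test run at a throw-away cluster without any code change (the rake target, port and socket directory here are just illustrative):

# run the tests against a disposable PostgreSQL instance listening on a
# private socket directory and a non-default port
PGHOST="${TMPDIR}/test-pgsql" PGPORT=5433 rake test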

Do have mocks available.

Possibly due to how much Ruby is used for Rails and other webapp/webservice technologies, there are gems out there for almost every possible third-party webservice. Unfortunately, testing these against the full-blown webservice requires private access keys; most gems I’ve seen allow you to specify keys to be used during the testing phase — it’s a step in the right direction, but it’s not enough.

While letting the environment declare the access keys allows you to set up testing within Gentoo as well, if you care about one particular package, it’s not going to let everybody run the tests automatically. What can you do then? Simple: use mocks. There are tools and libraries out there designed to let you simulate a stateless or partially-stateful webservice, so that you can exercise your own methods and see whether they behave.

One important note here: the mocks don’t strictly have to be written in Ruby! While gems don’t allow you to specify non-Ruby dependencies, distributions can, so if you rely on a mock written in Python, C or Java, you can always document it. It’s still going to be easier than getting an access key and using that.

Even better, if your webservice is friendly to open source (which one isn’t, nowadays? okay, don’t answer), ask them to either provide an “official mock”, or to create some standard access keys to be used against a sandbox environment.

Do verify the gems you rely upon.

Especially for tests, it’s easy to forget to make sure that the software you rely upon works. Unfortunately, I’ve seen more than a couple of times that the gems used for testing do not work as expected themselves, causing a domino effect.

This is part of the usual “make sure that the software you rely upon really works”, but it seems that while most people do verify their own direct runtime dependencies, they more often fail to verify those used only during testing.

Of course this list is not complete, and there probably are more things that I didn’t think about, but it’s enough of a rant for now.

Bundler and JRuby — A Ruby Rant

This post is supposed to be auto-posted on Saturday, March 3rd, since I’m testing the new Typo 6 — I used to write posts in advance and have them published automatically, but then that support broke; I hope the new version handles it again. If not, it might be a good excuse to fork Typo altogether.

So after a quick poll on Twitter, it seems people don’t really dislike my series of Ruby rants, so I think I’ll continue writing a bit more of them. Let me start by making it clear that I’m writing this with the intent of showing the shortcomings of a development environment – and a developer circle – that has closed itself up so much lately that anything goes, as long as some other Ruby developer did it before. This shows clearly in some of the posts defending Bundler in the reddit comments about one of my previous posts.

For what concerns Bundler, I don’t think I’m blaming the wrong technology. Sure, Bundler has simplified the handling of dependencies quite a bit, and that’s true for Gentoo developers as well (it’s still better than the stupid bundling done by the Rails components in the 2.3 series), but the problem here is that it’s way too prone to abuse, especially due to the use of “one and only one” dependencies. I could show you exactly what I mean by simply pointing out that the bourne I already complained about depends on a version of mocha (0.10.4) that is broken on Ruby 1.9 (neither 0.10.3 nor 0.10.5 suffers from this problem, and bourne works fine with either, but it still depends on the only broken one because that was the version out when bourne-1.1.0 was released), but I’ll show you exactly how screwed the system is with a different issue altogether.

Can you vouch that Bundler works on JRuby just as well as it works on MRI 1.8 and 1.9?

If you just answered “Yes”, you have a problem: you’re overconfident that widely used Ruby gems work just because others use them. And this kind of self-validation is not going to let you understand why my work (and Hans’s, and Alex’s) is important.

Let’s take a step back again; one of Ruby’s proclaimed strengths has always been considered to be the fact that most developers tie long and complex tests into their code, so that the code can be shown to behave as expected when changing dependencies, implementations and operating systems. While this can build confidence in one’s code, the sheer presence of these tests is not going to be enough, as many times the developer is the only one running them, and he or she might not have many different operating systems to try them on (multiple implementations are somewhat covered by using RVM and similar projects). Of course it is true that Travis is making it more accessible to try your code on as many different configurations as possible, but it’s still no magic wand.

So there are tests in the libraries, and that’s good — tests are also supposed to be installed by RubyGems, and it should be possible to run them through the gem package manager itself (although I can’t find an option to do that right now), but sometimes you end up with a file missing here or there; in some cases it’s data files that the tests use (rather than the test files themselves), or a configuration file that is required for them to run — lately, the nasty and ironic part is that it’s the Gemfile that is missing from the .gem packaging, making it impossible to run the tests from the installed gem, especially when the dependencies are only listed there.

An aside, for those of you who might want to find me at fault: I only own a single RubyGems package, Ruby-Elf, and it does not come with tests. The reason is that the test data itself is quite big, and most gem users wouldn’t care about the tests anyway (especially since I couldn’t find how to run them). For this reason I decided that the .gem package is only meant for execution, not for packaging. The tarball is always released and tagged, so you should use that for packaging, which is exactly what dev-ruby/ruby-elf does.

Okay, now you’re probably confused: what does it have to do with JRuby that some gems lack a Gemfile (which is what Bundler uses)? Nothing really; I was just trying to give you an idea of how tricky it can be for us to make sure that the gems we package actually work. Now, back to Bundler.

So Bundler comes with its own set of tests — and most importantly it’s designed to be tested against different versions of the RubyGems library, which is very important for them since Bundler hooks deep into the RubyGems API — to the point that one wonders whether it wouldn’t make more sense to merge Bundler’s dependency rigging and RubyGems’ loading code into one library, and the bundle and gem commands into one package-manager tool project. This is good, although we might not want to test it with a number of RubyGems libraries, but just with the one we have installed. Different target, same code to leverage, though.

Anyway, Bundler’s tests work fine in Gentoo for Ruby 1.8, 1.9 and Ruby Enterprise… but they can’t work with JRuby, and it’s not just a matter of Gentoo packaging. The problem is that Bundler’s Rakefile insists on running tests related to the manual and documentation as well. That’s not too bad in itself; indeed the bundle documentation is installed as man pages, even though these man pages are not installed in the standard man directory, so while they are displayed by bundle --help, they are not available to man bundle. I still find it nice: it’s not too different from what I’ve done with Ruby-Elf, where the tools’ help is only available as man pages (but the Gentoo packaging makes them available to man as well).

These man pages are generated through a tool that could probably interest me as well — the current man pages for Ruby-Elf are generated through DocBook, which does keep them well organized, but is still one hefty dependency. Anyway, the tool is ronn, a Ruby gem that builds man pages out of Markdown-written text files. The problem is that, rather than allowing the man pages to be generated beforehand, the Rakefile wants to make sure ronn can be called by the current Ruby implementation.

Which should be easy: since ronn is written in Ruby, just make it possible to install it for JRuby. It would indeed be easy, if not for the library it uses for Markdown parsing and output generation; it’s not the usual BlueCloth that we’re mostly used to, instead it uses rdiscount, another (faster?) implementation of Markdown, which is based once again on a native C extension. Being a C extension, it is not available to JRuby, which means that you can’t use ronn with JRuby as it is now.

Since both BlueCloth and rdiscount are based on C extensions, one might wonder how you make it possible for JRuby to handle Markdown at all. The answer comes with another gem, which I learnt about while bumping Radius (a support gem for Radiant): kramdown, a pure-Ruby, yet fast, implementation of Markdown. For basic usage it’s so close to BlueCloth that my monster switched this week from BlueCloth to kramdown with just a two-line change (one in the Gemfile, the other in the code).

So I reported to Charles that Bundler can’t have its tests run under JRuby, and he obviously suggested that I poke the Bundler developers to see if they can help fix this up. Unfortunately, the suggestion amounts to “you can patch it up, but we won’t do the work for you” — which is understandable up to a point.

The problem is that this means nobody has ever run Bundler’s tests on JRuby; if you’re using it there, then you’re assuming it works because everyone else is using it, but there might well be a completely unknown, nasty bug just waiting to eat a user’s work on some corner case.

If you do care about JRuby, one very nice thing you could do here is get in touch with ronn’s upstream and convert it to use kramdown; then it should be possible to have a fully functional, fully tested Bundler relying solely on JRuby. Until that’s done, Gentoo’s JRuby will lack Bundler, and that also means it’ll lack a long list of other software.

One’s DoS fix is one’s test breach (A Ruby Rant)

What do you know, it did indeed become a series!

Even though I’m leaving for Los Angeles again on Monday, March 5th, I’m still trying to package up the few dependencies that I’m missing to be able to get Radiant 1.0 in tree. This became more important to me now that Radiant 1.0.0 has finally been released, even though it might become easier for me to just target 1.1.0 at some later point, given that it would then use Rails 3. Together with that I’m trying to find what I need to get Typo 6 running to update this blog’s engine under the hood.

Anyway, one of the dependencies I needed to package for Radiant 1.0 was delocalize, an interesting library for dealing with localised date and time formats (something I once had to hack together for a project myself, so I’m quite interested in it). Unfortunately, the version I committed to the tree is not the one I should have packaged, which would have been 0.2.6 or so, since Radiant 1.0 is still on Rails 2.3. Unfortunately, there is no git tag for 0.2.6, so I haven’t packaged that one yet (and the .gem file only ships some of the needed test files, not all of them — it’s the usual issue: why package the test files in the .gem file if you do not package the test data?), but I’m also trying to find out why the code is failing on Ruby 1.9.

Upstream is collaborative and responsive, but like many others is now more focused on Ruby 1.9 than on 1.8. What does this mean? Well, one of the recent changes to most programming languages out there (the fix for the hash-collision denial of service) involves changing the way keys are hashed in hash-based storage objects (such as Ruby’s Hash); this changed the order in which Ruby 1.8 reports the content of those objects (Ruby 1.9 has ordered hashes, which means they don’t appear to have changed), and that caused headaches for both me and Hans for quite a while.
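
To see how this bites tests, consider a contrived illustration (not delocalize’s actual failure): on a 1.8 interpreter that carries the randomised-seed DoS fix, the order of the keys printed below can differ from older 1.8 builds, and even between runs, so any test that compares such output verbatim starts failing.

# prints the Hash keys in whatever order the interpreter's string hashing yields
ruby -e 'p({ "day" => 5, "month" => 3, "year" => 2012 }.keys)'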

This is just one of the many reasons why we might have to step on the throttle and try to get Ruby 1.9 stable and usable soon, possibly even before Ruby 1.8 falls out of security support, which is scheduled for June — which is what I promised in the FOSDEM talk, and I’m striving to make sure we deliver.

EAPI=4, missing tests, and emake diagnostics

Some time ago, well, quite a bit of time ago, I added the following snippet to the tinderbox’s bashrc file:

make() {
    if [[ "${FUNCNAME[1]}" == "einstall" ]] ; then
        emake -j1 "$@"
    else
        eqawarn "Tinderbox QA Notice: 'make' called by ${FUNCNAME[1]}"
        emake "$@"
    fi
}

I’m afraid I had forgotten whose suggestion that was, beyond it being a user’s. Edit: thanks to the anonymous contributor who actually remembered the original author of the code. And sorry Kevin, my memory is starting to feel like old flash.

Kevin Pyle suggested this code to me to find packages that ignore parallel building, simply by catching straight make invocations where the correct emake should have been used. It has served me well, and it has shown me a relatively long list of packages that needed fixing, only a handful of which actually failed with parallel make. Most of the issues were due to simple omission, rather than wilful disregard for parallel compilation.

One of the problems this has shown is that, by default, the make calls within the default src_test() function do not run in parallel. I guess the assumption is that tests are too fragile to run in parallel, but at least with (proper) autotools this is a mistake: the check_PROGRAMS and other check-related targets are rebuilt before running the tests, and that can be done with parallel make just fine. Only starting with automake 1.11 do you have the option of executing the tests themselves simultaneously; otherwise the generated Makefile still calls them one by one, no matter how many make jobs you requested. This is why most of my ebuilds rewrite the src_test() function simply to call emake check.
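
The override in question is as simple as it gets (under EAPI=4 emake already dies on failure; on older EAPIs you’d append || die):

src_test() {
    emake check
}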

This is the set-up. Now, to get to the topic of the post… I’ve been seeing some strange output in the tinderbox, when looking at the compilation output, that I couldn’t find in the logs either. Said output simply told me that emake failed, but there was no obvious failure output from it, and the package merged fine. What was the problem? Well, it’s a bit convoluted.

The default src_test() function contains a bit of logic to identify what the Makefile’s test target is, so that it can be used both with autotools (the default target to run tests with an automake-generated Makefile is check) and with otherwise-provided Makefiles, which likely use test as their testing target. This already has unfortunate implications, especially for packages whose maintainers didn’t test with FEATURES=test enabled:

  • if the package has a check target that doesn’t run tests, but rather checks whether the installed copy is in working state, then you have a broken src_test(); qmail-related packages are known for this, and almost none of the ebuilds for said packages ever restricted or blocked tests;
  • if the package does not have any specific test or check target, but relies on make’s implicit rules, it could end up trying to build a test command out of a non-existent test.c source file;
  • in any case, it requires make to parse and load the Makefile, which can be a bit taxing, depending on the package’s build system.

To this you can now add a further issue, when you combine the above-quoted snippet from the tinderbox’s bashrc with the new EAPI=4 semantics, under which ebuild helpers die by default on failure: if there is no Makefile at all in $S, then you get an emake failure, which of course isn’t saved in the logs, but is still repeated at the end of the merge just for the sake of completeness. Okay, it’s just a nuisance and not a real issue, but it still baffled me.

So what is my suggestion here? Well, it’s something I’ve been applying myself, independently of all this: declare your src_test explicitly. This means not only adding it with the expected interface when tests are present, but actually nulling it out when there are no tests: this way no overhead is added when building with FEATURES=test (which is what I do in the tinderbox). To do so, rather than using RESTRICT=test (which at least I read as “tests are supposed to work but they don’t”), do something like:

src_test() { :; } # no tests present

_I usually do the same for each function in virtual packages._

I really wish more of my fellow developers started doing the same; the tinderbox would probably have much lower overhead on packages that ship no tests at all, which is unfortunately a common scenario, as most automake-based packages will try a pointless, empty make check each time they are merged, even if no test at all is present.

I’ll follow up with more test-related issues for the tinderbox in the next few weeks, if time permits. Remember that the tinderbox is on Flattr if you wish to show your appreciation… and I could definitely use some help with the hardware maintenance (I just had to replace the system’s mirrored disks, as one WD RE3 died on me).

Testing stable; stable testing

It might not be obvious, but my tinderbox is testing the “unstable” ~x86 keyword of Gentoo. This choice was originally due to the kind of testing being done (the latest and greatest versions of packages, whose adverse effects we don’t yet know), and I think it has helped tremendously, up to now, to catch things that could otherwise have bitten people months, if not years, later, especially in the least-used packages. Unfortunately, that decision also meant ignoring, for the most part, the other, now more common architecture (amd64 — although I wonder if the new, experimental amd32 architecture will take its place), as well as ignoring the stable branch of Gentoo, which is luckily tracked by other people from time to time.

Lacking continuous testing, though, what is actually considered stable sometimes isn’t very stable at all. This problem is made worse by the fact that sometimes the stable requests aren’t proper to begin with: it can be a user asking to stabilise a package, with the arch teams called in before they should have been; it might be an overlooked issue that the maintainer didn’t think of; or it might simply be that multiple maintainers had different feelings about stabilisation, which happens pretty often with team-maintained packages.

Whatever the case, once a stable request has been filed, it is quite rare for issues to be brought up that are severe enough for the stabilisation to be denied: it might be that the current stable version is in much worse shape, or maybe it’s a security stabilisation, or a required dependency of something following one of those destinies. I find this a bit unfortunate; I’m always happy when issues are brought up that delay a stabilisation, even if that sometimes means having to skip a whole generation of a package, just as long as the result is a nicer stable package.

Testing a package is obviously a pretty taxing task: there are countless combinations of USE flags, compiler and linker flags, features, and so on and so forth. For some packages, such as PHP, testing each and every combination of USE flags is not just infeasible, but impossible within this universe’s lifetime. To make the work of the arch team developers more bearable, years ago, starting with the AMD64 team, the concept of Arch Testers was invented: so-called “power users” whose task is to test packages and give the green light for stable-marking.

At first, all of the ATs were as technically capable as a full-fledged developer, to the point that most of the original ones (me included) “graduated” to full devship. With time this requirement seems to have relaxed, probably because AMD64 became usable by the average user, and no longer just by those with enough knowledge to work around the landmines originally present when trying to use a 64-bit system as a desktop — I still remember seeing Firefox fail to build because of integer and pointer variable swaps, once that became an error-worthy mistake in GCC 3.4. Unfortunately, this also meant that a number of non-apparent issues became even less apparent.

Most of these issues end up being caused by one simple fault: lack of direction in testing. For most packages, and that includes, unfortunately, a lot of my packages, the ebuild’s selftests are nowhere near comprehensive enough to tell whether the package works or not, and unless the tester actually uses the package, there is little chance that he or she really knows how to test it. Sure, that covers most of the common desktop packages and a huge number of server packages, but it’s far from a perfect solution, since it’s the less-common packages that need more eyes on their issues.

The problem with most other software is that it might require specific hardware, software or configuration to be tested at all, and most of the time it requires a few procedures to be followed to make sure it is actually tested properly. At the same time, I only know of the Java and Emacs teams publishing proper testing procedures for at least a subset of their packages. Most packages could use such documentation, but it’s not something that maintainers, me included, are used to working on. I think one of the most important tasks here is to remind developers, when they ask for a stabilisation, to come up with a testing plan. Maybe after asking a few times, we’ll get to work and write up that documentation, once and for all.

A “new” Tinderbox

While I’m still hoping for somebody to fund the PAM audit and fixup (remember: if you rely on PAM on your Gentoo systems, you really want somebody to do that work!), and even though I have to reduce the tinderbox costs, I’ve got some pretty cool news for many of you out there.

Up to now, the tinderbox has been running against the unstable/testing keyword for x86. The new tinderbox, which I simply called tinderbox64, uses the unstable/testing keyword for amd64 instead.

The new tinderbox is now testing ~amd64 rather than ~x86.

Why did I decide to go this route? Well, while 64-bit builds require more space and time, I thought a bit about it, and realized that even the stuff I introduce myself does not get keyworded ~x86 right away, so the old tinderbox was skipping tests on my own packages! Besides, with even my router moving to 64-bit to get the best out of hardened, I’m starting to think x86 is not really relevant for anything nowadays.

That’s not all, of course; there are a number of issues that only appear on 64-bit (well, there are almost as many that only appear on 32-bit, but for now let me focus on the former): integer and buffer overflows, implicit function declarations that truncate pointers to integers, 64-bit unsafety that makes packages fail to build… All these conditions are all the more relevant because 64-bit is what you should be using on most modern systems, so that’s what should be tested, even more than ~x86.

Now, of course, it would be better to have both tinderboxes running, and I think I could get the two of them to run in parallel, but then I’d need a new “frontend” system, one I could use both for storage and for virtual machine hosting; probably something a little beefier than the laptop I’m using, mostly in terms of RAM, would be quite nice (the i7 performs quite nicely, but 4GB of RAM is just too little to play with KVM). But even if I could afford to buy a new frontend now (I cannot), it would still mean a higher monthly power cost. Right now I can roughly estimate that between power and maintenance costs (hard disks, UPSes, network connection), running the tinderbox is costing me between €150 and €200/month, which is not something I can easily afford, especially considering that last year, net of taxes and most expenses, I had an income of €500/month to pay for groceries and food. Whoopsie. And this is obviously without including the time I spend manually reviewing the results, or fixing them.

Anyway, expect another flood of bugs once the tinderbox gets up to speed again; for now, it might find a few more problems that it previously ignored, since it started building from scratch. And while the 32-bit filesystem is frozen, I’ll probably find some time to run again the collision-detection script that is part of Ruby-Elf, which is supposed to find possible symbol collisions between libraries and the like, something that is particularly important to take into consideration, as those bugs tend to be the most complex to debug.