When the tests are more important than the functions

I have to admit that the whole “Test-Driven Development” hype never appealed to me much. Not because I think tests are wrong; I just think that while tests are important, focusing almost exclusively on them is just as dumb as ignoring them altogether.

In the Ruby world there is so much talk about tests that it’s still very common to find gems that don’t work at all with newer Ruby versions, even though their specs pass just fine. Or the tests pass fine for the original author but fail on everyone else’s system, because they depend heavily on custom configuration; sometimes they depend on case-insensitive filesystems, because the gem was developed on Windows or Mac and never tested on Linux. Indeed, for the longest time Rails’ own tests failed at every chance you gave them, and the “vendored” code it brought in never had a working testsuite. Things have improved nowadays, but not significantly.

Indeed, RubyGems does not make it easy to run tests upon install, which means that many gems, as distributed, lack part of the testsuite altogether. Sometimes this is an explicit choice: in the case of my own Ruby-Elf gem, the tests are not distributed because they keep growing and weigh quite a few megabytes at this point; if you want to run them, you fetch the equivalent snapshot from GitHub. The Gentoo ebuild uses that snapshot as its basis for this very reason.

Sometimes even gems coming from people who are considered nearly Ruby gods, like rails_autolink (https://rubygems.org/gems/rails_autolink) by tenderlove, end up released with badly failing tests. The version we have in Portage is patched up, and the patch has been sent upstream. Only the best for our users.

Now, unfortunately, as I noted in the post’s title, some projects care more about the tests than about the functionality. The project in question is the very same Typo that I use for this blog, and which I have already mused about forking to implement fixes that are not important to upstream. Maybe I should have done that already; maybe I still will.

So I sent a batch of changes and fixes upstream: some fixing issues caused by their own changes, others implementing changes to allow proper use of Typo on SSL vhosts (yes, my blog is now available over SSL; I have to fix a few links and object load paths in some of the older posts, but it will soon work fine), others again simply making it a bit more “SEO”-friendly, since that seems to be a big deal for the developers.

What kind of response do I get about the changes? “They fail spec.” Never mind that the one commit I was first told breaks the specs actually fixes editing of blog posts after a change that went straight to master: it might break the specs, but it solves a real-life issue that made the software quite obnoxious to use. So why did I not check the specs myself? Because running them requires this whole set of gems:

group :development, :test do
  gem 'thin'
  gem 'factory_girl', '~> 3.5'
  gem 'webrat'
  gem 'rspec-rails', '~> 2.12.0'
  gem 'simplecov', :require => false
  gem 'pry-rails'
end

I have no intention of installing this whole set of gems just to be able to run the specs of a blog engine whose specs I find vastly messed up anyway. Why do I think so? Well, among other reasons, I’ve been told quite a few times before that they would never pass on PostgreSQL, which happens to be the database that has been powering this very instance for the past eight years. I’m pretty sure it’s working well enough!

Well, after asking (a few times) for the specs’ output, it turns out that most of the broken specs are the ones that hardcode http:// in the URLs. Of course they break! My changes use protocol-relative URIs, which means the output now uses // instead; there is no spec that validates the output for SSL-delivered blogs, which would otherwise have been broken all along.
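
To illustrate the idea with a sketch (this is the concept, not Typo’s actual code): a protocol-relative URI drops the scheme entirely, so the browser reuses whichever protocol the page itself was served over.

# Hypothetical helper, for illustration only: strip the scheme off
# an absolute URL so it becomes protocol-relative.
def protocol_relative(url)
  url.sub(%r{\Ahttps?:}, '')
end

protocol_relative('http://example.com/logo.png')  # => "//example.com/logo.png"
protocol_relative('https://example.com/logo.png') # => "//example.com/logo.png"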

And what was upstream’s response to my changes? “It breaks here and there, would you mind looking into it?” Nope! Just “The commit breaks specs.” No offer (until I complained loudly enough on IRC) to look into it themselves and fix either the patches or the specs, as needed. No suggestion that there might be something to change in the specs.

Not even a cherry-pick of the patches that do not break specs.

Indeed, as of this writing, even the first patch in the series, the only one I really care about getting merged (because I don’t want to fall out of sync with master’s database, at least until I decide to go ahead with the fork), is still lingering there, even though there is no way in this world that it breaks specs, as it introduces new code altogether.

Am I going to submit a new set of commits with at least the visible spec failures fixed? I’m not sure; I can hardly bring myself to care, since right now my blog is working and has the features I need, the only one missing being the user-agent forwarding to Akismet. I don’t see friendliness coming from upstream, and I keep thinking that a fork might be the best option at this point. Especially when, after I suggested using Themes for Rails to replace the current theme handling, so that it works properly with the asset pipeline (one of the best features of Rails 3), the answer was “it’s not in our targets”. Well, it would be in mine, if I had the time! Mostly because being able to use SCSS would make it easier to share the stylesheets with my website (even though I’m considering getting rid of my website altogether).

So my plea to the rest of the development community, of which I hope I can be considered part, is this: don’t be so myopic that you care more about tests passing than about features working. For sure Windows didn’t reach its level of popularity by being completely crash-proof, and at the same time I’m pretty sure they did at least some basic testing on it. The trick is always in the compromise, not in absolute care for, or neglect of, tests.

Dos and Don’ts of Ruby testing

This article is brought to you by a small bit of free time I found to write something down. I’m clearing my own stuff out of the way in Italy before going back to Los Angeles at the end of the month.

I’ve been working on updating some Gentoo Ruby ebuilds lately, and I’m starting to feel the need to scream at some of the gem developers out there… to be more constructive, I’ll try to phrase my complaints this way.

Do provide tests in your gem file, or in a tarball

You’d be surprised how many times we have to rely on Git snapshots in Gentoo simply because the gem file itself lacks the test files; even worse is when you have the test units, but they rely on data that is not there at all. This is usually due to the test data being too big to fit in a slim gem.
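
A gem that wants to ship its tests usually only needs to list them in its gemspec; here is a minimal, illustrative sketch (all the names are made up):

# mygem.gemspec: hypothetical example of shipping tests and their data
Gem::Specification.new do |s|
  s.name       = 'mygem'
  s.version    = '1.0.0'
  s.summary    = 'Example gem that ships its own tests'
  s.authors    = ['Jane Doe']
  s.files      = Dir['lib/**/*.rb']
  s.test_files = Dir['test/**/*']   # test units *and* their data
end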

Interestingly enough, there used to be a gem command parameter that would run a gem’s tests at install time. I can’t seem to find it any more, which is probably why people don’t want to add big tests to their gem files: what’s the point, especially given that most people won’t be running them at all?

It is true that even my own Ruby-Elf ships neither test units nor test data in the gem, but that’s why I still release a tarball together with the gem file itself. And yes, I do release it, not just tag it on GitHub.

Do tag your releases on GitHub!

Especially if your tests are too big or boring to fit in the gem file, please do tag your release versions on GitHub; this way we already have a workaround for the missing tests: we’ll just fetch the tarball from there and use it for building our fake gem installation.

Do not mandate presence of analysis tools.

Tools such as simplecov, ruby-debug, or lately guard seem to be all the rage with Ruby developers. They are indeed very useful, but please do not mandate their presence, even if only in development mode.

While using these tools is dramatically important if you plan on doing real development on a gem, we don’t care how much coverage the gem’s tests have: if the tests pass, we know our environment should be sane enough; if they fail, we have a problem. That’s enough for us, and yet we end up having to fight with stuff like simplecov on a per-package basis most of the time.
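
A packager-friendly pattern, sketched here from common practice rather than taken from any particular gem, is to load the coverage tool only when explicitly asked to:

# spec_helper.rb (sketch): coverage becomes opt-in, so the tests
# still run on systems where simplecov is not installed.
if ENV['COVERAGE']
  begin
    require 'simplecov'
    SimpleCov.start
  rescue LoadError
    warn 'simplecov not available, running without coverage'
  end
end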

Do not require test dependencies to build API docs (and vice versa)

This is unfortunately tied to the very simplistic dependency handling of RubyGems. When you look up a gem’s dependencies you only see “runtime” and “development”; nothing tells you what’s needed to build the docs versus what’s required to run the tests.

Since testing and documentation are usually both handled through a Rakefile, and often enough (especially for complex testing setups) the Rakefile uses the task files provided by the testing framework to declare its targets, it’s hard to even run rake -D on a gem that doesn’t have the testing framework installed.

Now, in Gentoo you might want to install the API documentation of a gem but not run its tests, or vice versa run the tests but not care about the API documentation. This becomes a pain in the neck when the Rakefile requires all the development dependencies to be present.
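
One way out, sketched here as a suggestion rather than an established convention, is to declare each task only when its framework is actually available, so that rake -D and the documentation task keep working without the test dependencies installed:

# Rakefile (sketch; the task layout is illustrative)
require 'rdoc/task'

RDoc::Task.new do |rdoc|
  rdoc.rdoc_files.include('README.rdoc', 'lib/**/*.rb')
end

begin
  require 'rspec/core/rake_task'
  RSpec::Core::RakeTask.new(:spec)
rescue LoadError
  warn 'rspec not installed; the spec task is unavailable'
end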

This, together with the previous point, ties back to Bundler’s simplistic tracking: there is no way to avoid installing these gems in Gentoo even if they are properly grouped and you “exclude” them. Bundler will try to resolve the dependencies anyway, which defeats the purpose of using Gentoo’s own dependency tracking.
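
One admittedly hacky workaround I can sketch (PACKAGING is a made-up variable name, not a convention) is to make the whole group conditional, so that Bundler never even sees those gems when a packager asks it not to:

# Gemfile (sketch): skip the development/test group entirely
group :development, :test do
  gem 'rspec-rails'
  gem 'simplecov', :require => false
end unless ENV['PACKAGING']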

Do not rely on system services.

This is actually something that happens with non-Ruby packages as well. Many interface libraries that implement access to services such as PostgreSQL or MongoDB will try connecting to the system instance during their test routine. This might be fine when you’re the developer, but if the tests run on a continuous basis you really can’t rely on it: the service might not be there, or it might just not be set up the way you expect.

Please provide (and document!) environment variables that can be set to change the default connection strings for these services. This way we can set up a dedicated testing instance when running the tests, so that they get their own fully controlled environment. Even better if you’re able to write setup/teardown scripts that run unprivileged and respect $TMPDIR.
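
As a sketch of what such a knob could look like (PG_TEST_URI is a hypothetical name, not an established convention):

# test/test_helper.rb (sketch)
require 'pg'

# Packagers and CI systems can point the tests at their own instance.
PG_TEST_URI = ENV.fetch('PG_TEST_URI', 'postgres://localhost/gem_test')

def test_db
  PG.connect(PG_TEST_URI)
end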

Do have mocks available.

Possibly due to the heavy use of Rails and other webapp/webservice technologies built on it, there are gems out there for almost every possible third-party webservice. Unfortunately, testing these against the full-blown webservice requires private access keys; most gems I’ve seen allow you to specify different keys to be used during the testing phase, which is a step in the right direction, but it’s not enough.

While letting the environment declare the access keys allows you to set up testing within Gentoo as well, if you care about one particular package, it’s not going to let everybody run the tests automatically. What can you do then? Simple: use mocks. There are tools and libraries out there designed to let you simulate a stateless, or partially stateful, webservice, so that you can exercise your own methods and see whether they behave.
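
For instance, with the webmock gem (one of several such libraries) a spec can intercept HTTP calls entirely; the URL and payload below are made up for illustration:

# spec sketch using webmock: no network access ever happens
require 'webmock/rspec'

describe 'MyService client' do
  it 'parses a successful lookup' do
    stub_request(:get, 'https://api.example.com/v1/items/1').
      to_return(:status  => 200,
                :body    => '{"id": 1}',
                :headers => { 'Content-Type' => 'application/json' })

    # exercise the client code here and assert on its behaviour
  end
end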

One important note here: the mocks don’t strictly have to be written in Ruby! While gems don’t allow you to specify non-Ruby dependencies, distributions handle those just fine, so if you rely on a mock written in Python, C or Java, you can always document it. It’s still going to be easier than getting an access key and using that.

Even better, if your webservice is friendly to open source (which one isn’t, nowadays? okay, don’t answer), ask them to either provide an “official mock”, or create some standard access keys to be used against some kind of sandbox environment.

Do verify the gems you rely upon.

Especially for tests, it’s easy to forget to make sure that the software you rely upon actually works. Unfortunately, I’ve seen more than a couple of times that the gems used for testing don’t work as expected themselves, causing a domino effect.

This is part of the usual “make sure that the software you rely upon really works”, but it seems that while most people do verify their direct runtime dependencies, they far more often fail to verify the ones used only during testing.

Of course this list is not complete, and there are probably more things I didn’t think about, but it’s enough of a rant for now.

Ruby-NG: We’ll Always Have an Implementation

While we’re definitely not yet done with translating ebuilds to the Ruby-NG eclasses (ruby-ng proper and ruby-fakegem have been joined by ruby-ng-gnome2, which Hans implemented to replace the older ruby-gnome2 that used the old eclasses), we’re coming to a point where the knowledge we’re amassing from dealing with common troubles in Ruby packages can be very useful to other developers. I have ranted widely about the amount of trouble we’re going through to get tests to work properly and similar things, but, as has been pointed out to me, I haven’t given much information on how to solve the problems. I’ve decided to try to remedy that now.

I have to say, first of all, that running tests is definitely quite important to find latent issues, so we’re trying our best to run them whenever possible, also thanks to the help of arch team members, starting with Brent (rangerpb), who has spent the past two days keywording ~ppc64 the Ruby packages that had dropped keywords with the port to the new eclasses (because of newly-introduced dependencies). Not only did he report some missing dependencies (when you already have a long list of gems installed, it gets difficult to identify further needed gems, especially when they are not listed in the gemspec, as I already noted), but he also found one very nasty issue in RedCloth on PPC and PPC64 systems (where char is unsigned by default), thanks to the testsuite that can now actually be executed. On the other hand, 4.2.2 with the old system was already keyworded, and yet it couldn’t have worked properly, which shows how easy it is to ignore problems when you don’t have testsuites to run.

Anyway, as the first in a list of possible solutions to the Ruby problems we’re hitting, I’d like to look at the varying ways a testsuite can be invoked. While there are many different frameworks for running tests on Ruby, that is not what I’m trying to draw your attention to. Instead, think of the ways you can actually call the (for the sake of argument) test-unit testsuites from a Rakefile.

The rough way consists of just requiring all the files matching a given pattern inside the test directory; this is what the original fakefs gem does:

$LOAD_PATH.unshift File.join(File.dirname(__FILE__), 'test')
desc "Run tests"
task :test do
  Dir['test/**/*_test.rb'].each { |file| require file }
end

This does not seem to work properly with either test-unit-2 installed on Ruby 1.8, or with JRuby 1.4.0 (at least). I guess it relies on the test runner’s ability to automatically pick up the test classes.

The nasty way is to call the testrb command, which invokes the test runner “properly”. I say nasty because a Rakefile that invokes testrb will ignore the currently-running Ruby version and fire up whatever default Ruby implementation is selected (I reported a similar problem with specrb before):

# Flameeyes's note: this code comes from amatch, so it's very real.

desc "Run unit tests"
task :test => :compile_ext do
  sh %{testrb -Iext:lib tests/test_*.rb}
end

The correct way is even easier! Just rely on rake to do the right thing for you if you use it properly!

require 'rake/testtask'
Rake::TestTask.new do |t|
  t.libs = [ "ext", "lib", "tests" ]  # assign, don't <<: pushing an array would nest it inside libs
  t.test_files = FileList["tests/test_*.rb"]
end

In this case, rake will take care of setting up Ruby exactly as needed, running the tests with the same version of Ruby that started the Rakefile in the first place. And it makes the Rakefile quite a bit more standardised.

I’m going to send this change to amatch exactly as I did for fakefs, but there are most likely more cases of that out there. If you find one, please fix it, so that we’re not going to hit more bumps down the road. Thanks!

And for those interested, my GitHub page lists some of the packages for which I submitted patches and fixes; some relate entirely to packaging, others are Ruby 1.9 or JRuby related. You can see that I don’t stop at ranting: I actually write code to tackle the issues.

Unit testing frameworks

For a series of reasons, I’m interested in writing unit tests for a few projects, one of which will probably be my 1.3 branch of xine-lib (yes, I know 1.2 hasn’t been released yet, even in beta form).

While unit tests can be written quite easily without much of a framework, having at least a basic one would help make automated testing possible. Since the projects I have to deal with use standard C, I started looking at some basic unit test frameworks for it.

Unfortunately, it seems like both CUnit and Check last saw a release in 2006, and their respective repositories seem quite calm. In CUnit’s case I have also noticed a quite broken buildsystem.

GLib seems to have some basic support for unit testing, but even they don’t use it, so I doubt it would be a good choice. There are quite a few unit testing frameworks for particular environments, like Qt or GNOME, but I haven’t found anything generic.

It seems funny that even though people always seem to cheer for test-driven development, there isn’t a good enough framework for C. Ruby, Java, Perl and Python already have their well-established frameworks, and most software written in them uses those, but for C there is neither a standard nor a widely accepted framework.

I could probably write my own framework, but that’s not really an option: I don’t have that much free time on my hands. I suppose the path of least effort would be to contribute to one of the frameworks already out there, so that I can fix whatever I need fixed and have it work as I need. Unfortunately, I’d have to start by looking at all of them to find the least problematic one, and that’s not worth it if the original authors have gone MIA or similar, especially since at least CUnit was still being developed in CVS.

If somebody has suggestions on how to proceed, they are most welcome. I could probably fork one of them if I have to, although I dislike the idea. From what I gathered in a brief look, CUnit’s ability to generate XML results might be useful for gathering test statistics in an automated testing facility.