Ruby-NG: We’ll Always Have an Implementation

While we’re definitely not done yet with translating ebuilds to the Ruby-NG eclasses (ruby-ng proper and ruby-fakegem have been joined by ruby-ng-gnome2, which Hans implemented to replace the older ruby-gnome2, which used the old eclasses), we’re coming to a point where the knowledge we’re amassing from dealing with common troubles in Ruby packages can be very important to other developers. I have ranted widely about the amount of trouble we’re going through for tests to work properly and things along those lines but, as has been pointed out to me, I haven’t given much information on how to solve the problems. I’ve decided to try fixing that now.

I have to say first of all that running tests is definitely important to find latent issues, so we’re trying our best to run them whenever possible, also thanks to the help of arch team members, starting with Brent (rangerpb), who has spent the past two days keywording ~ppc64 the Ruby packages that had dropped keywords with the port to the new eclasses (because of newly introduced dependencies). Not only did he report some missing dependencies (when you already have a long list of gems installed, it gets difficult to identify other needed gems, especially when they are not listed in the gemspec, as I already noted), but he also found one very nasty issue in RedCloth on PPC and PPC64 systems (where char is unsigned by default), thanks to the testsuite that can now be executed. On the other hand, 4.2.2 with the old system was already keyworded, and yet it couldn’t work properly, which shows how easy it is to ignore problems when you don’t have testsuites to run.

Anyway, first in a list of possible Ruby solutions to the problems we’re hitting, I’d like to bring up the varying ways of invoking a testsuite. While there are many different frameworks for running tests on Ruby, that is not what I’m trying to draw your attention to. Instead, think of the way you can actually call the (for the sake of argument) test-unit testsuites from a Rakefile.

The rough way consists of just requiring all the files with a given name inside the test directory; this is what the original fakefs gem does:

$LOAD_PATH.unshift File.join(File.dirname(__FILE__), 'test')
desc "Run tests"
task :test do
  Dir['test/**/*_test.rb'].each { |file| require file }
end

This does not seem to work properly with either test-unit-2 installed for Ruby 1.8, or with JRuby 1.4.0 (at least). I guess this relies on the ability, for the test runner, to automatically pick up the test objects.

The nasty way is to call the testrb command, which invokes the test runner “properly”. I put it this way because a Rakefile that invokes testrb will ignore the currently running Ruby version, and will fire up the default Ruby implementation selected (I reported a similar problem with specrb before):

# Flameeyes's note: this code comes from amatch, so it's very real.

desc "Run unit tests"
task :test => :compile_ext do
  sh %{testrb -Iext:lib tests/test_*.rb}
end

The correct way is even easier! Just rely on rake to do the right thing for you if you use it properly!

require 'rake/testtask'
Rake::TestTask.new do |t|
  t.libs.concat ["ext", "lib", "tests"]
  t.test_files = FileList["tests/test_*.rb"]
end

In this case, rake will take care of setting up Ruby exactly as needed, running the tests with the same Ruby implementation that started the Rakefile in the first place. It also makes the Rakefile a lot more standard.
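For the rare Rakefile that really has to spawn a subprocess by hand, here is a minimal sketch of how to reuse the running interpreter instead of whatever testrb is first in the PATH. The helper names current_ruby and run_tests_with_current_ruby are hypothetical, made up for this example; only RbConfig comes from Ruby itself, and as far as I can tell this is the same idea Rake's own `ruby` helper relies on.

```ruby
require 'rbconfig'

# Path of the interpreter that is executing this script, rebuilt from
# RbConfig values (hypothetical helper name).
def current_ruby
  File.join(
    RbConfig::CONFIG['bindir'],
    RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT'].to_s
  )
end

# Hypothetical helper: run every test file in its own process, with the
# same interpreter that is running right now. Returns true only when all
# of the files exit successfully.
def run_tests_with_current_ruby(load_paths, test_files)
  include_flag = "-I#{load_paths.join(File::PATH_SEPARATOR)}"
  test_files.all? { |file| system(current_ruby, include_flag, file) }
end
```

Inside a Rakefile, Rake's `ruby` method already does the interpreter lookup for you, so this is only worth it outside of Rake or when Rake::TestTask cannot be used.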

I’m going to send this change to amatch exactly as I did for fakefs, but there are most likely more cases of that out there. If you find one, please fix it, so that we’re not going to hit more bumps down the road. Thanks!

And for those interested, my GitHub page lists some of the packages for which I submitted patches and fixes; some are related to packaging entirely, others are Ruby 1.9 or JRuby related. You can see that I don’t stop at ranting but actually write code to tackle the issues.

Ruby-NG: Code of Horror

I’ve decided to try basing the titles of the blog posts about Ruby-NG, from now on, on the titles of Star Trek: The Next Generation episodes to have some fun myself when I post this (quite unfun) content. Hope you like the idea.

Even if some people seem to try painting me as an anti-Ruby activist, I’d like to repeat that I like Ruby as a language, and to some extent I like Rails as a framework. What I don’t like is what a lot of things in the Ruby environment seem to do: make it easier to create low-quality code, and to make use of it. Don’t get me wrong: there is a lot of good Ruby code out there, and it’s often easily available because of tools like RubyGems, GemCutter, RubyForge and, increasingly, GitHub. The problem is that while these tools make it simple to produce and distribute good code… they also make it too simple to produce and distribute bad code.

I have ranted many times about the problems related to testsuites: while Ruby, and Rails in particular, seem to be considered by many the apotheosis of Test- and Behaviour-Driven Development (TDD and BDD), and of the Agile development model (which also stresses testing at both unit and integration levels, although not as much as the other two), most of the software that gets published for these environments doesn’t seem to apply them all that much. Even when tests are present, they might fail because of blatant errors in the code itself, or because of wrong assumptions about the standard behaviour provided by the language and by the other libraries the code makes use of. And even when they are written properly so that these problems don’t arise, they might never have been tested outside of the package’s development repository, and so ship with missing test sources, or without the help files that are needed for the tests to succeed.

Tight integration between different projects of the same developers also tends to be quite a problem when trying to independently verify the tests: this is the case for most of the software released by the seattlerb project, which expects the sources for all the different extensions to be present in the same directory, and unversioned, for some of the testsuites to execute properly. A similar problem arose today with rspec-rails accessing rspec directly in its testsuite; I have to report this upstream tonight.

Another common problem caused by the very simplified distribution and handling provided by RubyGems is the ease with which many projects break their interface, or with which other projects use interfaces that are not guaranteed to be stable among releases. In the first case, the solution is to hope that all the dependencies are written correctly, and thus that slotting the package will not break anything. For the other case, the problem is much bigger. Projects like those keep on being used because there is an implicit slotting of all packages by default in RubyGems, so if you depend on a particular version, say test-unit = 1.2.3, you generally won’t trigger any suspicious behaviour in RubyGems. But depending on an older major version of a package, whatever package that is, is certainly frowned upon by distributions like Gentoo.

And again, it doesn’t help that the various Ruby versions and implementations have changed the amount of software bundled with them: Ruby 1.9 dropped test-unit (now available as a standalone gem, which does not work properly with Ruby 1.8 or with RSpec) and instead added minitest (which is available standalone for Ruby 1.8 and JRuby). RDoc was once standalone, then it was bundled with Ruby, and now a new version (RDoc 2) is again released as a standalone gem (although hanna works only with an older version, as it depends on some internal interfaces that have since changed a lot). JRuby comes with its own version of rake as well.

Don’t get me started on the various middleware packages used on top of RubyGems either: Hoe, Echoe, Jeweler. Three options to do more or less the same thing (testing gets even worse, as I said above, but I’ll pick it up in a moment). If you’re lucky, the developers know enough to make them optional, and simply omit the gem-publishing tasks from the Rakefile when they are not available (it might be bugged, as happened with samuel, but it’s easy to fix in that case); in other cases you’re not so lucky, so you have to depend on them for both building the documentation and running the tests, or even for building the extension if it’s native code. Up to now Gentoo has been able to ignore the necessity of Jeweler, but we ended up having to keep around Hoe and Echoe at least, since they both look more widely (mis)used: I don’t think I have encountered any package that mandates Jeweler, if not because of a minor bug.

Returning to the “there are many ways to skin a cat” situation noted above (I find this phrase icky, but I don’t know of a nicer alternative in English, so if you have suggestions, they are welcome; I’ll also gladly edit the post), testing is another not-so-nice area. While I can understand that different testing frameworks have different targets, and everybody should be able to use whatever tool they are comfortable with, this puts quite a burden on distributors, and generally on people wanting to verify the software they depend on. It’s not just a matter of finding which packages are needed for tests (as most packages don’t seem to document that at all, neither in the gemspec nor in the documentation), but also of finding how to launch them: it might be rake test, rake spec or even rake examples; sometimes two or more of those options are available, but only one is the right one, so it requires human intervention. The same happens with the API documentation: rake rdoc, rake redoc, rake doc, rake rdocs, …
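To give an idea of the guesswork involved, here is a rough sketch of how a packager could probe a loaded Rakefile for the usual suspects. The candidate list and the helper name first_defined_task are assumptions made up for this example; Rake::Task.task_defined? is the only part that comes from Rake itself.

```ruby
require 'rake'

# Task names seen in the wild, in order of preference. The list is an
# assumption for illustration, not a standard of any kind.
TEST_TASK_CANDIDATES = %w[test spec examples].freeze

# Hypothetical helper: name of the first candidate task defined in the
# current Rake application, or nil when none of them exists.
def first_defined_task(candidates = TEST_TASK_CANDIDATES)
  candidates.find { |name| Rake::Task.task_defined?(name) }
end
```

Note that this only tells you that a task exists, not that it is the right one to run, so a human still has to look at the result.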

I already tried prodding other distributors into authoring a set of guidelines (not rules, but suggestions welcomed by a wide and varied group of developers) for the packaging of Ruby extensions. I’ll probably get back to that at some point (for now I have other projects that I should be taking care of). Lacking a mandated interface for RubyGems, we can at least try to write down some suggested interface – like how to call the tasks for testing and for API documentation generation – and maybe convince the various middleware packages named above to implement the equivalent of make distcheck for autotools: package the extension, then extract it in a different path, and run the testsuite; if it fails there is something wrong. Of course this also requires convincing the developers to use that feature.
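The distcheck idea can be sketched in a few lines of plain Ruby. This is not something Hoe, Echoe or Jeweler provide as far as I know: the distcheck helper, its manifest parameter (the list of files that would be shipped) and the single test script are all assumptions made for the sake of the example.

```ruby
require 'fileutils'
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper. Copies only the files listed in manifest (paths
# relative to source_dir) into a fresh temporary directory, runs
# test_script there with the currently running interpreter, and returns
# true only if the script exits successfully.
def distcheck(source_dir, manifest, test_script)
  ruby = File.join(
    RbConfig::CONFIG['bindir'],
    RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT'].to_s
  )
  Dir.mktmpdir('distcheck') do |staging|
    manifest.each do |relative|
      target = File.join(staging, relative)
      FileUtils.mkdir_p(File.dirname(target))
      FileUtils.cp(File.join(source_dir, relative), target)
    end
    # Anything the tests need but the manifest forgot is missing here,
    # which is exactly the kind of failure we want to catch.
    Dir.chdir(staging) { system(ruby, test_script) == true }
  end
end
```

A real implementation would unpack the actual gem or tarball rather than copy from a manifest, but the principle is the same: the tests must pass using nothing but what gets shipped.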

And if somebody complains now that this would be bad for the Ruby community because “it limits the developers’ creative freedom”, then they have no idea what it means to develop complex software, and especially to make it work as intended.

RubyGems, CPAN and other languages

I’m surprised that my previous post about RubyGems got so much coverage, since I really didn’t think I added much to the common knowledge about RubyGems’ shortcomings. It’s not like I’m piling on with the Debian guys in the “gems bashing” just for fun. My switch away from gems for the packages I maintain started a few weeks ago already, and I’ve spoken with Hans about this at the time too. Now, let me try to address some of the concerns that people seem to bring up, since I’m not the kind of person who throws a rock and then hides to see the results.

As Alex said, it is actually possible to patch up Ruby gems; and indeed I knew how to do that myself, but… it takes unpacking and repacking the files, which is generally a waste of CPU time, and that is not really something I would call “feasible” just to fix the code, when we do the same thing without any kind of issue for a huge amount of software already. Also, as he pointed out, log4r and other gems don’t really use the tar format; instead they seem to contain some YAML-formatted list of base64-encoded files. Not nice, no.

On the other hand, Elias added that the problem is shared by all common toolkit-specific package managers: PHP/PEAR, Ruby/Gem, Python/Egg, Perl/CPAN, TeX/CTAN. This is probably right, although I think I have something to add here, since lots of people seem to think that CPAN and RubyGems are on the same page, when they certainly are not, as Debian developers already pointed out. As for Python Eggs and PHP PEAR, I have no experience working with them, though I know the Gentoo Python team does not like Eggs for just about the same reasons the Ruby team does not like Gems.

As Wouter Verhelst points out:

There’s CPAN.pm, which does much the same thing as RubyGems; but much of CPAN can be easily turned into a Debian (or RPM, or whatnot) package. The few cases where it can’t, you’re usually dealing with a broken CPAN package, anyway.

He’s perfectly right here: CPAN modules can be easily turned into distribution-specific packages, including ebuilds thanks to g-cpan. The reason for this is that instead of inventing its own freaking file format, CPAN uses standard tarballs or ZIP packages, and a specific structure of the files inside of the package. Once you know what the structure of a package is, implementing downstream-specific packaging of such tools is very quick, and still allows for all the versatility that downstream package managers require.

For the little I know about it, I think CTAN does just the same, and indeed Wouter states that too:

There are similar things for Python (egg), and, heck, even TeX (CTAN). The fact that other scripting languages have proper and working packaging systems only outlines that there really isn’t an excuse for the horrors of RubyGems.

I think that Wrobel’s post is indeed very important reading because it summarises the fact that the argument that Gentoo and Debian have against RubyGems is not that there shouldn’t be a language-specific package manager for Ruby, but rather that we would certainly have liked a better-designed package format that could suit both the system-independent packaging and the downstream packaging at once, allowing for proper integration between modules installed by users and by the distribution.

The fact is that besides gems there is no vast and common standardisation of practices between Ruby packages, which makes the role of distributions much more complicated than it has been historically for Perl, Python, TeX and the like. What do I mean? Well, take the extconf.rb and setup.rb files that are often used to build binary C-based extensions for Ruby: they are not generated by tools like autotools or distutils, and instead they are copied and morphed from package to package, causing them to have subtly different syntax and the like. To use a DESTDIR-like parameter (to install in a subtree before packaging) some require you to specify it during the install phase only, others require you to pass it to every command. One would expect that Rake could have brought some more standardisation in packaging, but that’s not the case either.

For one, a Rakefile does not usually have an “install” task, and even less often one that takes a DESTDIR parameter to install in an offset. This is something I’ll probably try to find a solution for once Ruby-Elf arrives at a point where a release is needed, because I don’t intend to use RubyGems as my primary form of distribution, quite obviously. But it is not limited to that: even the tasks that are usually common between packages don’t follow the same interface. Some packages will use “rake test” for running the testsuite, others “rake spec” and others “rake rspec” because they don’t use Test::Unit but rather Spec. Some packages will build the HTML documentation with “rake doc” and others with “rake rdoc”.
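To show what a DESTDIR-aware install could look like, here is a minimal sketch. It is not the task planned for Ruby-Elf: install_lib is a hypothetical helper, installing into RbConfig's sitelibdir is an assumption, and it only handles plain .rb files on Unix-like paths (no native extensions, no executables).

```ruby
require 'fileutils'
require 'pathname'
require 'rbconfig'

# Hypothetical helper: install every .rb file found under lib_dir into
# the interpreter's sitelibdir, offset by destdir when one is given.
# Returns the list of installed paths.
def install_lib(lib_dir, destdir = ENV['DESTDIR'])
  base = Pathname.new(lib_dir)
  # With an empty or unset DESTDIR this is just sitelibdir itself.
  target_root = File.join(destdir.to_s, RbConfig::CONFIG['sitelibdir'])
  Pathname.glob(base.join('**', '*.rb').to_s).sort.map do |source|
    target = File.join(target_root, source.relative_path_from(base).to_s)
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.install(source.to_s, target, mode: 0o644)
    target
  end
end
```

In a Rakefile this could be wired up as `task(:install) { install_lib('lib') }` and called as `rake install DESTDIR=/tmp/image`, so that a package manager can stage the files before merging them.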

I think that the underlying problem here is that Ruby has really been the most “magic” of the languages. While this is very nice for the developers, it is not always good; sometimes having too much magic around can make integration or debugging very hard. Just like automagic dependencies make it difficult to properly package software, Ruby magic can be a hindrance when doing something that strays a bit from what the mainstream developers do.

Now, since just criticising software is rarely useful, let me try to explain my plan to resolve, or work around, the issue. First of all, I’ll try to work with Hans and Alex, as soon as I have time, to find a way to simulate the presence of a gem when the package was installed by the ebuild using a tarball, so that there is no loss of magic when using the ebuilds themselves. Then I’m going to try preparing a distribution-friendly “install” task for Rake, first and foremost for Ruby-Elf as I said, but I’m going to release it as-is so that other projects can use it too, and try to standardise it.

But these are little steps; what I’d like to do, but lack the time for, is writing a specification for the installation and design of Ruby packages, so that a project following that specification can not only build gems for “magic” installation on heterogeneous systems, but also prepare tarballs that distributions can turn into proper packages without having to deal with black magic. This would not mandate a particular file format like gems do, but rather a particular directory structure or Rake-based interface. Once the same exact steps can be run for any given package following the specification, that is going to be enough for the distributions.

Introducing the Rust category

Here I am, creating one of the new categories that I told you I would create when I moved to blojsom, so that I can easily talk about Rust, my Ruby bindings generator for C++ and C.

I started writing the RDoc documentation in the Ruby files yesterday; if all goes as I hope, I’ll soon be able to get ruby-hunspell to build with Rust, which means that at least the C++ bindings support will be working as I intend.

When that is done, I’ll be working on packaging Rust in gem format so that it can be easily installed in a Ruby fashion; unfortunately, for C++-based extensions using a gem does not seem easy (extsetup.rb does not work with C++), so ruby-hunspell and rubytag++ will have to remain built with CMake.

One of the problems I’m having now is finding a decent way to run unit tests; Rake does not seem to be that good when it comes to generating a cc/hh pair and then building an extension based on that. I’m afraid I’ll have to resign myself to using CMake again, but this time relying on Test::Unit for most of it.

Let’s see if I can get it into decent shape before tonight, so I can publish the Git repository for good.