Always a better Ruby

To make Gentoo a much better platform for Ruby development, last year I started working on the Ruby-NG eclasses, which provide a way to install Ruby extensions for multiple Ruby implementations in parallel, leaving the user free to choose which implementations to install them for (unlike Python). I say “eclasses” because there are two of them: a general one, and another used to install RubyGems-based packages with “fake” specifications that sidestep problematic dependencies and similar issues.
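
The eclasses themselves are shell code, but the idea behind a “fake” specification can be sketched with RubyGems' own Gem::Specification API: generate a minimal gemspec for the installed files that deliberately declares no runtime dependencies. This is an illustrative sketch under that assumption, not what the eclass actually writes; the fake_gemspec helper, the gem name, the version and the file list are all hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: build a minimal specification for an already-installed
# gem. No runtime dependencies are declared, so activating the gem cannot fail
# because a declared dependency is missing or packaged differently.
def fake_gemspec(name, version, files)
  Gem::Specification.new do |s|
    s.name          = name
    s.version       = version # converted to a Gem::Version by RubyGems
    s.summary       = "Fake specification for #{name}"
    s.require_paths = ['lib']
    s.files         = files
    # Intentionally no s.add_runtime_dependency calls here.
  end
end

spec = fake_gemspec('example-ext', '1.0.0', ['lib/example.rb'])
puts spec.to_ruby # the Ruby source of the generated gemspec
```

With this approach, keeping the real dependencies correct becomes the job of the ebuild rather than of the gemspec.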

Now, when I started implementing this, my idea was to add support for Ruby 1.9 and JRuby (both of which were missing before), but the result turned out to be suitable for Ruby Enterprise as well, which Alex has been working on lately. The end result is that the eclasses were well received for standalone Ruby extensions, and more than half the tree now uses the new eclasses:

Update (2016-04-29): This used to include live graphs, but these graphs are now lost, I’m sorry.

What we haven’t experimented much with yet is using the new eclasses to support bindings that are part of bigger packages, like obexftp, which is still broken. I guess this is another reason why you should split foreign-language bindings out rather than keeping them monolithically inside a single package. I was talking about this with Hans today, and I think this is one of the things we should work on soon if we want to deprecate the old eclasses.

As it is, only a handful of simple Ruby extensions still need to be migrated before we can “safely” unmask Ruby 1.9 (I say “safely” because I expect the tinderbox to go crazy once Ruby 1.9 is unmasked and selected, but that’s beside the point for now).

Now, back to Ruby 1.8 and Enterprise. Since I had to fix both of them for BerkDB 5.0, I decided to backport the patch I made for Ruby 1.9 to enable --no-undefined when linking extensions. Interestingly enough, this exposed the problem (already fixed by Alex) with the OpenSSL bindings in the upstream package. Remind you of something?

Enabling the --no-undefined flag on all the available Ruby versions means that we can be much more confident that the extensions we build will work as intended on all of them, and that a patch written for one version won’t break the extension on another. It does not give us 100% safety, but it at least increases it. Without this change, adding a call to a newly introduced function could produce a non-working extension with no warning at build time, only an abort at runtime.
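
For a single extension, the same check can be opted into from its extconf.rb. The fragment below is a sketch under stated assumptions rather than the backported patch itself: it assumes a GNU toolchain (-Wl,--no-undefined is a GNU ld option) and a Ruby built with a shared libruby, so that the interpreter's own symbols resolve at link time; example_ext is a hypothetical extension name.

```ruby
# extconf.rb for a hypothetical extension called "example_ext".
require 'mkmf'

# Ask GNU ld to fail the link if any symbol is left unresolved, instead of
# producing a shared object that only aborts when the symbol is first used.
# Assumes Ruby was built with a shared libruby that mkmf links against.
$LDFLAGS << ' -Wl,--no-undefined'

create_makefile('example_ext')
```

With the flag in place, a call to a function that the Ruby being built against does not provide shows up as a link error instead of a runtime abort.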

Unfortunately this does not come without consequences and false positives; ruby-gstreamer is an example: it fails to build because of undefined symbols in the extension. The extension itself is not broken (the ebuild is); it simply needs another extension to provide those symbols before it is loaded. The ebuild is broken because the extension needs ruby-glib at runtime and we currently don’t depend on it at all. I think this is a very rare situation, and I’d rather deal with it on a case-by-case basis than treat all undefined references as “fine”.

The next steps are obviously to run the tinderbox with all the Ruby implementations enabled and see how it works out, so that maybe we can improve the lines on this graph:

Update (2016-04-29): This used to include live graphs, but these graphs are now lost, I’m sorry.

To improve this situation, I tried to solve the test-unit problem. Ruby 1.9 ships with a reduced test-unit implementation (which is also available as minitest in Gentoo, for 1.8, JRuby and EE); since most test suites need the full-blown test-unit interface, there is a test-unit gem to provide it for Ruby 1.9. It’s not entirely API-compatible, but it comes very close. After that, another implementation was created, test-unit-2, which is even less API-compatible but provides enhanced features, and works on (almost) all implementations; it fails on JRuby, possibly because of a JRuby bug.

Unfortunately, automatic gem loading causes test-unit-2 to be loaded on all the implementations whenever it is installed, which is why we’re keeping it masked. I still haven’t found a proper solution; the best option I can see right now is to depend on the 1.x series of test-unit (only available for Ruby 1.9) by default, depend on test-unit-2 if the package needs it, and block test-unit-2 if the package’s tests fail with it installed. This should cover most of our users’ needs.
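
On the Ruby side, RubyGems’ Kernel#gem method is the usual way to keep a script from picking up whatever test-unit version happens to be the newest installed: called with a version requirement before require, it activates a matching version or raises Gem::LoadError. The sketch below assumes the 1.x-by-default policy described above; the helper name activate_test_unit_1x is hypothetical, and this complements, rather than replaces, the dependencies and blockers set in the ebuild.

```ruby
require 'rubygems'

# Hypothetical helper: activate a test-unit gem from the 1.x series.
# It must run before `require 'test/unit'`, otherwise a newer version may
# already have been activated. Returns the activated version, or nil when
# no matching version can be activated (e.g. no 1.x release is installed).
def activate_test_unit_1x
  gem 'test-unit', '< 2'
  Gem.loaded_specs['test-unit'].version
rescue Gem::LoadError
  nil
end

version = activate_test_unit_1x
puts(version ? "using test-unit #{version}" : 'no test-unit 1.x gem available')
```

After a successful call, a plain `require 'test/unit'` picks up the pinned gem, since its lib directory has been added to the load path.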

Finally, a request for anybody who feels like playing around a bit with Unix commands: help improve the way we currently install Ruby-NG-based ebuilds. Since we install for up to four targets at the same time, most of the time we install multiple copies of the same files, which can easily become a problem. While I know there is (very incomplete) work for btrfs to support live data de-duplication, it would be very nice if we could, at some point, reduce this waste without relying on the filesystem.

I’m afraid I don’t know how to do that myself, but if we could run some kind of pass after the install is complete (we can easily hook something like that into the eclass), we could replace identical files with hardlinks rather than keeping multiple copies of them.
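
A post-install pass of that kind could look roughly like the Ruby sketch below. It is a hypothetical helper, not existing eclass code: it walks the given directories (for example, one per installed target), groups regular files with the same device, size, permissions, owner, group and SHA-256 digest (hardlinks share all of that metadata and cannot cross filesystems), double-checks the contents byte for byte, and replaces each duplicate with a hardlink to the first copy. Whether the final merge into the live filesystem preserves those hardlinks is a separate question that this sketch does not address.

```ruby
require 'digest/sha2'
require 'fileutils'
require 'find'

# Group regular files under `roots` that could safely share one inode.
def duplicate_groups(roots)
  groups = Hash.new { |hash, key| hash[key] = [] }
  roots.each do |root|
    Find.find(root) do |path|
      st = File.lstat(path)
      next unless st.file? # skip directories, symlinks and special files
      # Digest::SHA256.file reads the file in chunks rather than all at once.
      key = [st.dev, st.size, st.mode, st.uid, st.gid,
             Digest::SHA256.file(path).hexdigest]
      groups[key] << path
    end
  end
  groups.values.select { |paths| paths.size > 1 }
end

# Replace every duplicate with a hardlink to the first file in its group.
# Returns the number of files that were replaced.
def hardlink_duplicates(roots)
  replaced = 0
  duplicate_groups(roots).each do |master, *copies|
    copies.each do |copy|
      next if File.identical?(master, copy)            # already the same inode
      next unless FileUtils.compare_file(master, copy) # rule out digest collisions
      tmp = "#{copy}.dedup-tmp"
      File.link(master, tmp) # create a new name for the master's inode...
      File.rename(tmp, copy) # ...then atomically replace the duplicate with it
      replaced += 1
    end
  end
  replaced
end
```

Hashing every file keeps the sketch simple; grouping by size first and hashing only files that share a size would avoid reading unique files at all.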

Anyway, that’s enough for now; news will follow. Please let us know if an extension that “worked” before now fails to build because of undefined symbols… we’ll have to deal with them, one way or the other!

4 thoughts on “Always a better Ruby”

  1. Regarding file de-duplication, here are some ideas:
       1. Install one central instance of the gem contents, then install shadow instances of it for each Ruby implementation using symlinks, and only use real files where a file differs.
       2. Hardlink identical files in a post-installation pass, iterating over each file in each installed instance (compare instance 1 with 2-4, instance 2 with 3-4, and so on), unlinking one of two identical files and replacing it with a hardlink to the other.
     However, I don’t know what support the final ebuild merge process has for hardlinked files. I suppose it would currently simply re-duplicate each hardlink?


  2. Amending my first reply post: is a very basic (and limited) Ruby script to collect checksums of files from directories (calling “checksums”), merge these into (checksum, [*files]) pairs (calling “merger”) and generate a bash script to execute (calling “hardlinker”). Feel free to improve it. Current problems: it cannot work with big files, it is slow, and it does not check ownership.

