ruby-bombe and consistency

My work on ruby-bombe seems to be proceeding nicely; for now it’s a very nice Ruby library that … seeks. The only thing I’ve implemented up to now, and not yet entirely, is seeking. Why this? Because it’s probably the trickiest part, since on some backends it has to be emulated, and on others it has to be adapted. Reading is probably going to be easier to implement.
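To give an idea of what "emulated" means here, this is a minimal sketch of a cursor kept by hand over a plain String, standing in for a backend that has no seek or tell of its own. The StringCursor class and its method set are my assumptions for illustration, not the actual ruby-bombe code.

```ruby
# Hypothetical class, not ruby-bombe's actual code: a read-only cursor
# over a String, standing in for a backend with no native seek/tell.
class StringCursor
  def initialize(data)
    @data = data
    @pos = 0
  end

  # Current offset from the start of the data.
  def tell
    @pos
  end

  # Move the cursor, using the same whence constants as IO#seek.
  def seek(offset, whence = IO::SEEK_SET)
    target =
      case whence
      when IO::SEEK_SET then offset
      when IO::SEEK_CUR then @pos + offset
      when IO::SEEK_END then @data.bytesize + offset
      else raise ArgumentError, "unknown whence: #{whence.inspect}"
      end
    raise ArgumentError, "negative position" if target < 0
    @pos = target
    0 # IO#seek returns 0 as well
  end

  # Read up to length bytes from the cursor and advance past them.
  def read(length)
    chunk = @data.byteslice(@pos, length) || ""
    @pos += chunk.bytesize
    chunk
  end
end

cursor = StringCursor.new("\x7fELF and the rest".b)
cursor.seek(1)
magic = cursor.read(3)
puts magic
```

Raising ArgumentError for a negative position is just a choice made for the sketch; which exception a real backend should raise there is exactly the kind of consistency question I care about.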

Together with the seeking support I also added tests, a huge number of tests, to make sure that my code works as I want. I have to thank Jason again, since RSpec makes it much easier to understand whether the tests are being enabled, and also makes it easier to split the tests and reuse the same logic, which Test::Unit makes much more difficult.

But this is not what I’m here to talk about; I’m rather trying to flesh out, for myself and for those interested in ruby-bombe, some notes about consistency. In particular, I found that using ruby-mmap to access files is totally inconsistent with accessing them through the File class: the ruby-mmap extension always raises ArgumentError exceptions when the path used points to a file that does not exist, or when the file is unreadable. I’ll have to add quite a few more tests, for instance to check what happens when the path given points to a directory rather than a file.

As of this moment, ruby-bombe has its own exceptions to handle the FileNotFound and PermissionError cases, rather than using the Errno module exceptions, since I’m not really sure whether those would apply properly to network cases like HTTP and similar. For the rest, I’m trying to use the standard Ruby exceptions wherever they make sense (ESPIPE is honestly not very user-friendly, in my opinion).
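As a rough sketch of that idea, a path-based backend could translate the Errno exceptions raised by File into backend-neutral ones. The BombeSketch module, the open_path helper and the exact class hierarchy are assumptions made up for this example; only the two exception names come from the library.

```ruby
# Hypothetical namespace and helper, for illustration only; the real
# ruby-bombe classes may be named and organised differently.
module BombeSketch
  class FileNotFound < StandardError; end
  class PermissionError < StandardError; end

  # Open a path for reading, translating the Errno exceptions raised by
  # File into backend-neutral ones.
  def self.open_path(path)
    File.open(path, "rb")
  rescue Errno::ENOENT
    raise FileNotFound, "no such file: #{path}"
  rescue Errno::EACCES
    raise PermissionError, "cannot read: #{path}"
  end
end

begin
  # A made-up path, assumed not to exist.
  BombeSketch.open_path("no-such-file-for-this-example")
rescue BombeSketch::FileNotFound => e
  puts "caught #{e.class}"
end
```

The same two classes could then be raised by an HTTP backend for a 404 or a 403, which is the reason for not exposing Errno directly.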

Unfortunately, of course, I don’t know all of Ruby’s facets myself, so here is why I’m blogging: I’d be very happy if somebody could help me ensure that the behaviour, the naming, and the tricks used in my library are consistent with the rest of the language. I think I was able to get a similar enough interface, even when the backend libraries are pretty inconsistent with the rest of Ruby (like ruby-mmap).

As of now, I have backends for IO streams (like pipes), in particular files (with path-based access) and sockets (both TCP and UDP), for gzip-compressed files (with emulated seek), for memory-mapped files, and for strings/arrays. Planned are at least bzip2-compressed files with no seeking, bzip2 files with seeking (using lots of memory), and HTTP-downloadable files. The two backends I’m particularly interested in completing, for ruby-elf, are the gzip and mmap backends. The reasons are very practical. The first is needed to reduce the space taken up by the testsuite of ruby-elf: since the files don’t have to be executed, they might as well be compressed. The second is interesting if I get to implement scanelf.rb or something like that, since mapping ELF executables and libraries into memory is most likely going to find some data already loaded in system memory, mapped for files that are loaded for execution. This might actually improve its performance, but before judging that I’ll have to write the code.

On a different note, if you’re interested in Ruby packages, I’ve added two more ebuilds to Portage: dev-ruby/uuidtools and dev-ruby/flickr, both extensions used by Typo. While the upstream version gets them from Subversion, I’ve decided to pick them up from Portage to reduce the amount of code I cannot easily control, like I did for the original 4.0 version on Gentoo/FreeBSD. So enjoy them, if you need them.

Ruby, tests and specs

While I started working on ruby-bombe, I also wanted to improve the tests of Ruby-Elf, possibly making them more explicit, commenting them so that they are meaningful not just to me, and so on. But when I started looking at the code, I found myself reaching for some dynamic programming, trying to write methods that could generate the test methods, since until then I had been piling up code that did the same thing over and over with very slight differences in names and the like.
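The kind of generation I mean looks roughly like the following sketch, written in plain Ruby without any test framework so it can run on its own. FakeElf, DynamicExecutableChecks and expect_sections are all hypothetical names invented for the example; none of them is Ruby-Elf's API.

```ruby
# Hypothetical sketch: generate near-identical test methods from a list
# instead of writing each one by hand. FakeElf stands in for a real
# Ruby-Elf file object and is NOT Ruby-Elf's API.
FakeElf = Struct.new(:sections) do
  def has_section?(name)
    sections.include?(name)
  end
end

class DynamicExecutableChecks
  def initialize(elf)
    @elf = elf
  end

  # Define one test_has_*_section method per expected section name.
  def self.expect_sections(*names)
    names.each do |name|
      method_name = "test_has_#{name.delete('.')}_section"
      define_method(method_name) do
        raise "missing section #{name}" unless @elf.has_section?(name)
        true
      end
    end
  end

  expect_sections ".dynamic", ".dynstr"
end

elf = FakeElf.new([".dynamic", ".dynstr", ".text"])
checks = DynamicExecutableChecks.new(elf)
generated = DynamicExecutableChecks.instance_methods(false).grep(/\Atest_/).sort
results = generated.map { |name| checks.public_send(name) }
puts generated.inspect
```

It works, but it hides what is being tested behind the generator, which is part of why I went looking for a framework that does this for me.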

So I decided to look into alternative testing frameworks, to see if there was something providing the features I felt I needed. While the basic Test::Unit support in Ruby suits ruby-bombe pretty well, since it is mostly a functional library (I’m interested in behaviour rather than data), it does not seem to apply very well to Ruby-Elf, where I’m instead testing returned values to ensure they conform to what I expect from them.

I asked Jason from Ohloh if he had any suggestions, and he pointed me at RSpec (which I knew by name but had never tried myself) and ZenTest (which seems to be lacking any sort of decent documentation). I tried RSpec first and it seems to be almost what I need. Almost.

It might be that I haven’t found how to do it yet, so I’m here asking the Ruby-knowing lazyweb for some help. Basically, I’m fine with the description idea at the base of RSpec, and indeed it’s useful to see it this way:

describe "linux_amd64_dynamic_executable" do
  it_should_behave_like "dynamic ELF executables"
end

so that it actually runs the tests like I want them to be run:

linux_amd64_dynamic_executable
- should be an ELF file
- should be version 1
- should be an executable file
- should have a .dynamic section
- should have a .dynstr section

But I need to go much deeper. For instance, I’d like to be able to describe the behaviour of the dynamic section further, with code similar to this:

  it "should have a .dynamic section" do
    @elf.should have_section(".dynamic")
    @elf[".dynamic"].describe do
      it_should_behave_like "all dynamic sections"
    end
  end

and then get something like this as a result:

linux_amd64_dynamic_executable
- should be an ELF file
- should be version 1
- should be an executable file
- should have a .dynamic section
  - should be a dynamic section
  - should be of dynamic type
  - should be called .dynamic
  - should have a final NULL entry
  - ...
- should have a .dynstr section
  - should be a string table
  - should be of string table type
  - ...

The problem is that, as far as I can see, I cannot use the describe method there to do what I need. Or if I can, the documentation does not tell me how and why… as far as I can tell, it only creates a new top-level example, which I don’t need in this case.

Does anybody know anything like this or do I have to hack my own rspec-alike?

Seamless access to files, IO streams, compressed files, …

Since the testsuite for Ruby-Elf is becoming disproportionate to the actual code that Ruby-Elf consists of, and I’m still lacking regression tests for the two scripts in it (missingstatic and cowstats), I considered some time ago supporting access to ELF files compressed with either gzip or bzip2, so that the space required would be drastically reduced.

Unfortunately my idea ended up being unrealisable, at least at the time, since Ruby-Elf needs to seek, and neither format allows for easy seeking around.

I started working on some generic IO access to files in a branch, but it didn’t turn out very well, and I left it behind for a while. Since I’m now at the point where I really need to write the testsuites for the two scripts, I decided to revive the idea, and implement it with a system of “backends”.

I started with two simple backends: access through path (with a File instance) and access with a direct IO stream. Very easy and not really complex at all. Then I introduced a ruby-mmap backend so that the file could be mapped into memory rather than read and copied over, and this also was fine, although I had to emulate the cursor handling (seek and tell). Reading gzip compressed files was also quite easy since Ruby already provides a good interface to that. Unfortunately bzip2 support is a totally different matter, since the bzlib interface does not provide the tell and rewind methods that are needed to emulate seek for compressed files (slowly).
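The slow seek emulation for gzip can be sketched with nothing more than Zlib::GzipReader from the standard library: going forward means decompressing and discarding, going backward means rewinding first. The GzipSeeker class below is a hypothetical name for illustration, not the backend as it exists in my tree, and it only handles absolute offsets.

```ruby
require "zlib"
require "stringio"

# Hypothetical wrapper, not ruby-bombe's actual code: absolute seeking on
# a gzip stream by rewinding and decompressing forward.
class GzipSeeker
  def initialize(io)
    # The underlying IO must itself support seek for rewind to work.
    @gz = Zlib::GzipReader.new(io)
  end

  # Offset in the decompressed data.
  def tell
    @gz.pos
  end

  def seek(target)
    raise ArgumentError, "negative position" if target < 0
    # Going backwards is only possible by starting over.
    @gz.rewind if target < @gz.pos
    # Discard decompressed bytes until the target offset is reached.
    remaining = target - @gz.pos
    while remaining > 0
      chunk = @gz.read([remaining, 4096].min)
      break if chunk.nil? || chunk.empty? # hit the end of the data
      remaining -= chunk.bytesize
    end
    @gz.pos
  end

  def read(length)
    @gz.read(length)
  end
end

compressed = Zlib.gzip("0123456789abcdef")
seeker = GzipSeeker.new(StringIO.new(compressed))
seeker.seek(10)
first = seeker.read(3)
seeker.seek(2)
second = seeker.read(3)
puts first, second
```

Every backward seek costs a full decompression from the start, which is why I call it slow; it is still good enough for a testsuite.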

The problem at this point is that the code for the backends is complex on its own, and it would add to Ruby-Elf’s complexity to the point that the two wouldn’t really make sense together at all. Having reached this point, you know what comes next: factoring the code out.

For this reason I’m now thinking about what the best course of action can be; I want to have access to possibly a lot of backends: straight files, compressed files (gzip, bzip2, lzma), archive files (tar and pax, ar, zip, …; ruby-libarchive would help here), network files and so on. I already decided on the name: ruby-bombe (if you like history you should get the reason for the name). The problem is now taking the code out of Ruby-Elf, writing ruby-bombe, adapting Ruby-Elf to the new library, and hoping none of the three users of ruby-elf gets mad at me for requiring a dependency.

Tonight is going to be a long night.