When I wrote Ruby-Elf I was mostly looking forward to learning about the ELF format itself, and thus to understanding more deeply the performance issues related to it. The choice of Ruby was so I could gloss over most of the tricks related to C programming, like memory management and error detection.
Over time, especially with the creation of cowstats first and missingstatic after, I came to think that Ruby was a suboptimal language for what I’ve been doing; the ELF format was designed with C in mind, and indeed maps pretty well to C processing: you map the file in memory and access it as a single byte array. Pretty nice, but I couldn’t do it (easily) with Ruby. Ruby-Elf makes no attempt at saving memory; instead it uses lots of it, parsing all the information read from the ELF file and keeping it in memory. This has some disadvantages, but also some advantages.
In particular I was reminded of this today: after launching a scanelf process to look for some symbols in all the libraries of the system, I had it crash in front of me while trying to scan the debug files for libstdc++. It was trying to access an out-of-bounds memory area, since the file is actually partial (and has to be taken into consideration together with the other half in the stripped file); the fix has been committed to CVS and will be in the next pax-utils release.
But the point here is that with Ruby, such a problem wouldn’t have stopped me during my task, requiring me to put it on hold, start a new one (fixing scanelf) and then come back to it. If something is broken in Ruby-Elf itself, most of the time the result is that an exception is raised, which can be rescued in the loop over the files to analyse, so that an error can be reported to the user and the processing resumes with the next file.
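To make the rescue-and-continue idea concrete, here is a minimal sketch. It does not use Ruby-Elf's real API: `check_elf_magic`, `NotAnElfFile` and `scan_files` are hypothetical names made up for illustration, and the "parser" only checks the four ELF magic bytes.

```ruby
# Hypothetical stand-in for a real ELF parser: it only checks the magic
# bytes and raises on anything that does not look like an ELF file.
class NotAnElfFile < StandardError; end

def check_elf_magic(path)
  magic = File.binread(path, 4)
  raise NotAnElfFile, "#{path}: bad magic" unless magic == "\x7FELF".b
  path
end

# Scan every file; a failure on one file is reported and skipped, and the
# loop moves on to the next file instead of aborting the whole run.
def scan_files(paths, err: $stderr)
  scanned = []
  paths.each do |path|
    begin
      scanned << check_elf_magic(path)
    rescue StandardError => e
      err.puts "skipping: #{e.message}"
    end
  end
  scanned
end
```

A broken or missing file costs one line on the error stream here, where the equivalent out-of-bounds access in C takes down the whole process.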
This is why I originally thought about learning Ada and converting my ELF analysis tools to that: it is compiled, so it should be faster than Ruby, but it’s also very well protected and will raise exceptions when problems are found. I haven’t had the time to learn it yet, which is unfortunate since I’d really like to use it. For now I’ll have to keep working with Ruby and C, I guess.
One thing that I’m considering, though, is writing a C Ruby extension for byteswapping arrays; the reason for this is that it seems like the most time-consuming task in there is the byteswapping of data read from the ELF file itself.
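For reference, this is roughly what byteswapping looks like in plain Ruby, using only `Array#pack` and `String#unpack`. It is a sketch, not Ruby-Elf's actual code: `byteswap32` and `read_words32` are hypothetical helper names. The second helper shows an alternative that might avoid the separate swap entirely, by picking the unpack directive according to the file's byte order.

```ruby
# Swap the byte order of every 32-bit word in an array of integers:
# pack as little-endian ('V'), then reinterpret as big-endian ('N').
def byteswap32(words)
  words.pack('V*').unpack('N*')
end

# Alternative: decode raw bytes straight into integers using the byte
# order of the file, so no separate swapping pass is needed.
def read_words32(bytes, big_endian:)
  bytes.unpack(big_endian ? 'N*' : 'V*')
end
```

Whether this is fast enough compared to a C extension would have to be measured; the sketch only shows the pure-Ruby baseline.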
I also have to consider the idea of implementing the ability to read compressed ELF files, but the fact that neither Zlib nor Bzlib2 has seeking functionality makes it kinda hard (right now the code first reads the offsets of the sections from the file and then reads the sections as needed, to avoid processing data that’s not going to be used and to be able to properly identify sections by name when deciding the type).
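The read-offsets-first, load-on-demand approach described above can be sketched like this. `LazySection` is a hypothetical class written for illustration, not the one Ruby-Elf actually uses; the point is that `data` depends on `seek`, which is exactly what a plain compressed stream does not offer.

```ruby
# Hypothetical lazy section: it remembers where its bytes live in the
# file and reads them only on first access.
class LazySection
  attr_reader :offset, :size

  def initialize(io, offset, size)
    @io = io
    @offset = offset
    @size = size
    @data = nil
  end

  # Load the section contents on demand; requires a seekable stream.
  def data
    @data ||= begin
      @io.seek(@offset)
      @io.read(@size)
    end
  end

  # True once the contents have been read from the stream.
  def loaded?
    !@data.nil?
  end
end
```

With a compressed input, one workaround would be to decompress the whole file into memory (for example into a `StringIO`) and seek within that, at the cost of giving up the lazy reading.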
At any rate, I might just decide to try out a basic scanelf reimplementation using Ruby-Elf for when I need stuff and I can’t wait to fix scanelf…