I tend to suffer from a trait common among open source developers, and a very worrisome one: the syndrome of auto-expanding task dependencies. That is, whenever I do something, it naturally sprouts more things to do, or more areas to investigate. Since my time is quite limited, I never end up doing, writing, or investigating everything I should, or would like to.
This is easy to see in my linking collision script: originally created to identify subtle bugs with similarly-named symbols, it only marginally found interesting collisions, and for the most part found imported libraries with security concerns, plus some issues with bad performance on modern computers.
Unfortunately, since the low-hanging fruit of my collision detection script is the imported libraries and their related security issues (which, by the way, showed another hit), it might sometimes seem that the output of my script is meant for that purpose and gives absolute answers. That couldn’t be more wrong.
Because of the nature of the script itself, what it analyses is the resulting code as loaded by ld.so, so it can identify the imported libraries only if they keep their symbols exported. This means it can easily be worked around, either by hiding the symbols through ELF visibility or by adding an application-specific prefix. The latter I can sort of work around by accessing the data directly in the database; today I used that to file at least one new bug, and I’m sure it won’t be the only one.
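To give an idea of what chasing prefixed copies involves, here is a minimal sketch, not the actual query I run against the database. It assumes the symbol names are already available as plain sets of strings; the function name, the sample zlib symbols and the `myapp_` prefix are all made up for illustration.

```python
from collections import Counter


def guess_prefix(binary_symbols, library_symbols, min_hits=2):
    """Guess a prefix under which library_symbols reappear in binary_symbols.

    Returns (prefix, hits) for the most frequent prefix, or None if no
    prefix accounts for at least min_hits of the library's symbols.
    """
    counts = Counter()
    for sym in binary_symbols:
        for lib_sym in library_symbols:
            # A renamed copy looks like "<prefix><original name>"; an exact
            # match is an ordinary collision, not a prefixed copy.
            if lib_sym and sym != lib_sym and sym.endswith(lib_sym):
                counts[sym[: -len(lib_sym)]] += 1
    if not counts:
        return None
    prefix, hits = counts.most_common(1)[0]
    return (prefix, hits) if hits >= min_hits else None


# Hypothetical data: three zlib entry points re-exported with a prefix.
zlib = {"inflate", "deflate", "adler32"}
app = {"myapp_inflate", "myapp_deflate", "myapp_adler32", "main"}
print(guess_prefix(app, zlib))
```

The `min_hits` threshold is there because a single matching suffix is far more likely to be a coincidence than a bundled library.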
Now, I could probably try to make my script smarter, so that it identifies when 100% (or some lower share) of one library’s symbols are included in another, but this wasn’t the original plan at all, and it really makes me wonder whether I should work on a different piece of software to take care of that. An alternative implementation could check the built object files and identify common prefixes on system-provided symbols, which would allow me to hit on hidden and static functions as well. It would, however, require much more power and take much longer, and it would also be susceptible to mistakes when the prefixed symbols are just a way to bind libraries between different languages (like caml seems to be doing).
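The symbol inclusion idea can be sketched roughly as follows. This is only an illustration under the assumption that each object’s exported symbols are available as a set; the function names, the 0.8 threshold and the sample libraries are hypothetical, and none of this exists in my script today.

```python
def inclusion_ratio(candidate_symbols, library_symbols):
    """Fraction of library_symbols that also appear in candidate_symbols."""
    library_symbols = set(library_symbols)
    if not library_symbols:
        return 0.0
    shared = library_symbols & set(candidate_symbols)
    return len(shared) / len(library_symbols)


def likely_bundled(exports_by_object, threshold=0.8):
    """List (container, library, ratio) triples where the container exports
    at least `threshold` of the library's symbols."""
    results = []
    for container, container_syms in exports_by_object.items():
        for library, library_syms in exports_by_object.items():
            if container == library:
                continue
            ratio = inclusion_ratio(container_syms, library_syms)
            if ratio >= threshold:
                results.append((container, library, ratio))
    return results


# Hypothetical data: libfoo carries a full copy of four zlib symbols.
exports = {
    "libz.so.1": {"inflate", "deflate", "adler32", "crc32"},
    "libfoo.so.0": {"foo_init", "foo_run", "foo_free",
                    "inflate", "deflate", "adler32", "crc32"},
}
print(likely_bundled(exports))
```

Comparing every pair of objects is quadratic in the number of libraries, which is part of why this would need more power than the current collision check.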
At any rate, I’m going to try to port Ruby-Elf to Ruby 1.9 (right now it fails badly) and to make it multithreaded, so that it can be faster when run on Yamato; if I remember correctly, 1.9 has native threading (rather than the green threads used by Ruby 1.8). I also considered using JRuby so that no porting would be needed, but the start-up time of the JVM is considerably high, and it wouldn’t help.
Oh, and I added some technical books to my wishlist; if you wish to help me research more problems and find more solutions, they would be appreciated.