After my quick last post I decided to look more into what I could do to save the one year of work I put into Ruby-Elf. After fighting a bit to understand how the new String class from Ruby 1.9 works, I got the testsuite to pass on Ruby 1.9, and cowstats to report the correct data as well.
Unfortunately, this broke Ruby 1.8 support; as far as I can see, that is because IO#readpartial does not work that well on 1.8, even for on-disk files. Something similar happens with JRuby.
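As a minimal sketch of one way to sidestep the difference, and not what Ruby-Elf actually does: IO#readpartial is allowed to return fewer bytes than requested, while IO#read with a length keeps reading until it has that many bytes or hits end of file. The helper name read_exactly is hypothetical.

```ruby
# Hypothetical helper, not part of Ruby-Elf: read exactly `length`
# bytes from `io`, or raise EOFError if the stream ends first.
def read_exactly(io, length)
  # IO#read(length) returns nil at EOF, or a shorter string if the
  # stream ends before `length` bytes are available.
  data = io.read(length)
  if data.nil? || data.bytesize < length
    got = data ? data.bytesize : 0
    raise EOFError, "wanted #{length} bytes, got #{got}"
  end
  data
end
```

Something like read_exactly(file, 4) would then fetch the four-byte ELF magic, failing loudly on a truncated file instead of silently returning a short string.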
After getting Ruby 1.9 to work, the obvious next task was to make cowstats multithreaded. The final idea was to have a -j parameter akin to make’s, but for now I only wanted to create one thread per file to scan. In theory, given native threading, all the threads would be executing at once, scheduled by the system scheduler, which would allow the process to saturate the 8 cores, reaching 800% of CPU time as a theoretical maximum.
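The one-thread-per-file idea can be sketched roughly as follows. This is an illustration under assumptions, not the cowstats code: scan_file and scan_all are hypothetical names, and scan_file just returns the file size as a stand-in for the real per-file analysis.

```ruby
# Hypothetical stand-in for the per-file work cowstats would do;
# here it only reports the file size.
def scan_file(path)
  File.size(path)
end

# Start one thread per file, then collect the results into a
# hash of path => result.
def scan_all(paths)
  threads = paths.map do |path|
    # Pass the path as a thread argument so each thread gets its own copy.
    Thread.new(path) { |p| [p, scan_file(p)] }
  end
  # Thread#value joins the thread and returns the block's result,
  # re-raising any exception the thread died with.
  threads.map(&:value).to_h
end
```

Calling scan_all(ARGV) would scan every file named on the command line. A -j option would replace the unbounded map with a fixed number of workers pulling paths from a queue.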
Unfortunately reality soon kicked in, and the ruby19 process limited itself to 100%, which means a single core out of eight, which also means no parallel scan. A quick glance through the sources shows that while YARV (the Ruby 1.9 VM) lists three possible methods to achieve multithreading, only one is currently implemented, the second one. The first method is the old one, green threading, which basically means simulated threads: the code never executes in parallel but uses an event-loop-like construct to switch the execution between different “threads”. The second method makes use of a giant lock, which in this case is called the Giant VM Lock (GVL), and which Python calls the GIL (Global Interpreter Lock). Here the threads are scheduled by the operating system, which allows for fairer scheduling among execution threads, but still allows just one thread per VM to execute at any given time. The third method is the one I was hoping for: it allows multiple threads to execute at the same time on different cores within the same virtual machine; instead of having a single lock on the whole VM, finer-grained locks are spread around the code to lock just the resources each thread needs.
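A rough way to see the effect of the giant lock is to time the same CPU-bound work run serially and run in threads. The sketch below uses hypothetical names (busy_work, run_serial, run_threaded, compare), with busy_work standing in for the real parsing. On an interpreter with a giant lock the two timings would be expected to come out close to each other; on one with truly parallel threads the threaded run should be noticeably shorter.

```ruby
require 'benchmark'

# CPU-bound busy work; a hypothetical stand-in for parsing one file.
def busy_work(n)
  sum = 0
  n.times { |i| sum += i * i }
  sum
end

# Run the jobs one after another in the calling thread.
def run_serial(jobs, n)
  Array.new(jobs) { busy_work(n) }
end

# Run each job in its own thread and wait for all of them.
def run_threaded(jobs, n)
  Array.new(jobs) { Thread.new { busy_work(n) } }.map(&:value)
end

# Wall-clock seconds for both strategies.
def compare(jobs, n)
  {
    serial: Benchmark.realtime { run_serial(jobs, n) },
    threaded: Benchmark.realtime { run_threaded(jobs, n) }
  }
end
```

For example, compare(8, 2_000_000) returns a hash with the two timings; watching the process in top while it runs shows whether CPU usage goes past 100%.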
I also checked this out on JRuby, to compare. Unfortunately the JRuby in Portage cannot handle the code as I changed it to work with Ruby 1.9, so I have been unable to actually benchmark a working run of cowstats with it. I could see, though, that the CPU used by JRuby spiked at 250%, which means it is at least able to execute the threads quite independently, and which proves that Ruby can be parallelised up to that point just fine.
So what is the fuss about Ruby 1.9’s new native threading support if multiple threads cannot be executed in parallel? Well, it still allows a single process to spawn multiple VMs and execute parallel threads on them, isolated from one another, which happens to be useful for Ruby on Rails web applications. If you think about it, the extra complexity added to deal with binary files is also there to address some interesting problems that come up in environments where multiple encodings are often in use, which is to say, web applications. Similarly the JRuby approach, which is very fast once the JVM is loaded, works fine for applications where you start up once and then proceed to process for a long time, which again fits web applications and little more.
I’m afraid to say that what we’re going to see in the near and not-so-near future is Ruby losing its general-purpose support and focusing more and more on the web application side of the fence. Which is sad, since I really cannot think of anything else I would like to rewrite my tools in, besides, maybe, C# (if it could be compiled to ELF; I should try Vala for that). I feel like my favourite general-purpose language is slipping away, and I should stop worrying about that and working on it.
I know you have overlays but I’ve been kind of maintaining JRuby in java-overlay. It’s currently at 1.1.4. 1.1.6 is the latest version and it adds better support for 1.9. A mere bump may work but it’s not the simplest of ebuilds. I’ll probably look at it soon.
Vala indeed is interesting. I really suck at coding C/C++ and I’m a Java programmer (who sometimes is using PHP). So I’m really a “high level” guy. I needed to write a small application for an embedded system that is able to use bluetooth to automatically deliver content. I did that in Vala (using BlueZ over DBus together with SQLite3). I knew nothing about any of BlueZ, DBus, SQLite3 and Vala. And in my opinion Vala really made my life easy. I can only recommend to at least try Vala. It has a good language syntax, some nice code snippets to learn it and the API documentation is lengthy and really good. Just look at http://www.valadoc.org/ I think everything you’d need to write a replacement for ruby-elf should be accessible from Vala, too. And it is also good at threading (using gthread).
I’d echo the other commenter’s recommendation to try 1.1.6. It doesn’t have a ton of 1.9 API stuff, but it’s certainly more stable. 1.1.7, due out in February, should fill out the API a lot more. I’d love to get your stuff working well on JRuby and see some numbers. In theory, we should be both faster and much more parallelizable, once we get past any execution problems.
I’d be glad to try JRuby but I’m sincerely not that good with Java ebuilds, so James just let me know when you bump it in the overlay (which I already have set up by the way), so I’ll try it out ;) As for 1.9, I think the only thing that is currently not working properly is the IO#readpartial behaviour, which seems to have changed between 1.8 and 1.9 (in 1.9 it seems to have more or less the same effect as readbytes from the readbytes.rb extension). As for Vala, I have been wanting to try it out for quite a while; I’ll check it out and see how feasible it is to produce an inheritance scheme similar to Ruby-Elf’s when I can spare some extra time. Hopefully if I can get to use lower-level access like mmap() I should also be able to make it very fast, even though making it nice to work with multiple backends, like I was trying to do with Ruby-Bombe, is likely quite a bit more difficult.
You should check perl6 =P
Well, since you have almost decided to abandon Ruby stuff, may I recommend you look at this: http://www.rosecompiler.org/ Honestly, I don’t think that high level languages will bear fruit in your endeavour. That project might be overkill for your task, but it can bring more pleasure and nice thoughts to you, and maybe to Gentoo as well :)