Book Review: Learning Cython Programming

Thanks to PacktPub I got a chance to read an interesting book this week: Learning Cython Programming by Philip Herron (Amazon, Packt). I was curious because, as you probably noticed, after starting at my current place of work this past April, I ended up having to learn Python, which led to me writing my glucometer utilities in that language, contrary to most of my other work, which has been done in Ruby. But that’s a topic for a different post.

First of all, I was curious about Cython; I had heard the name before but never looked much into it, and when I saw the book’s title and quickly checked what it was about, my interest was definitely piqued. If you haven’t looked into it either, in a nutshell it’s a code-generator bridge between Python and good plain old C, wrapping the latter such that you can either make it run Python callbacks, or generate a shared object module that Python can load, offloading the computation-intensive code to a more performant language. And it looks a lot like a well-designed and well-implemented version of what I hoped to get in Ruby with Rust — no connection with Mozilla’s language of the same name.
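
If you have never seen it, this is roughly what a Cython source file looks like — a sketch of my own, not taken from the book: the cdef declarations become plain C variables, so the loop runs at C speed once Cython translates the file to C and it gets compiled into an importable module.

# fib.pyx — Cython translates this to C; build it with a setup.py
# calling Cython.Build.cythonize(["fib.pyx"]), then "import fib".
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a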

The book is a quick starter, short and to the point, which is an approach I like. Together with the downloadable source code, it makes for a very good way to learn Cython, and I recommend it if you’re interested. Not only does it cover the language itself, it also covers a wide range of use cases that show how to make good use of the options Cython provides. It even goes on to show how to integrate it in a build system (although I have some reservations about the Autotools code in there, which I think I’ll send Philip a correction for).

I seriously wish I had had Cython and this book when I was working on Hypnos, an Ultima OnLine «server emulator» to which I wanted to add Python-based scripting — other emulators at the time used either a very simple, almost BASIC-like scripting language, Perl, or C#. This was before I tried to use Python for real, which led me to hate its whitespace-based indentation. I did write some support for it, but it was a long and tedious process, so I never finished it. Not only would Cython have made that work much less tedious, but the book shows exactly how to add Python scripting capabilities to a bigger C program, using tmux as the example.
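
The way that works, roughly (my own reconstruction with made-up names, following Python 2 semantics, not the book’s tmux code): you mark Cython functions as public, and Cython emits a C header for them, so the host program can call into the scripting layer after initializing the interpreter and the generated module.

# script_glue.pyx — hypothetical example. "cdef public" makes Cython
# emit a script_glue.h declaring this as a plain C function; the host
# program calls Py_Initialize() and initscript_glue() (the Python 2
# module init) first, then can invoke it directly.
cdef public void on_player_speech(char *player, char *text):
    # dispatch to whatever Python-side handlers were registered
    print("%s says: %s" % (player, text))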

The book does not ignore the shortcomings of Cython of course, including the (quite clumsy) handling of exceptions when crossing the foreign-language barrier. While there are still a bunch of issues to be straightened out, I think the book is fairly good at setting proper expectations for Cython. If you’re interested in the language, the book looks like the perfect fit.

Diabetes control and its tech, take 4: glucometer utilities

This is one of the posts I lost due to the blog problems with draft autosaving. Please bear with me for any pieces that I might be forgetting.

In the previous post on the subject I pointed out that, thanks to a post in a forum, I was able to find out how to talk to the OneTouch Ultra 2 glucometers I have (both of them) — the documentation assumes you’re using HyperTerminal on Windows and thus does not work when using either picocom or PySerial.

Since I had the documentation from LifeScan for the protocol, starting to write a utility to access the device was the obvious next step. I’ve published what I have right now in a GitHub repository, and I’m going to write a bit more on it today, after a month of procrastination and other tasks.

While writing the tool, I found another issue with the documentation: every single line returned by the glucometer ends with a four-digit (hex) checksum, but the documentation does not describe how that checksum is calculated. By comparing some strings whose checksums I knew, I originally guessed it might be what I found called “CRC16-Syck” — unfortunately that also meant that the only library implementing it was a GPL-3 one, which clashed with my idea of a loose copyleft license for the tools.

But after introducing the checksum verification, I found out that the checksums did not really match. So, after more looking around on Google and in forums, I was told that the checksum is a 16-bit variation of Fletcher’s checksum, calculated in 32-bit but dropping the higher half… and indeed it would then match, but when I then looked at the code, I found out that “32-bit Fletcher reduced to 16-bit” is actually “a modulo-16-bit sum of all the bytes”. It’s the most stupid and simple checksum.
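
In code, the whole discovery boils down to this — a sketch matching the description above:

def lifescan_checksum(text):
    """The OneTouch Ultra 2 line checksum: just the sum of all the
    bytes in the line, modulo 2**16."""
    return sum(ord(c) for c in text) & 0xFFFF

# the device appends it as four uppercase hex digits:
assert "%04X" % lifescan_checksum("example") == "02EC"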

Interestingly enough, the newer glucometers from LifeScan use a completely different protocol: it’s binary-based and uses a standard CRC16 implementation.

I’ve been doing my best to design the utility in such a way that there is a workable library as well as a utility (so that a graphical interface can be built on top of it), and at the same time I’ve tried to make it possible to have multiple “drivers” implementing access to the glucometer commands. The idea is that this way, if somebody knows the protocol for other devices, they can implement support without rewriting, or worse duplicating, the tool. So if you own a glucometer and want to add support for it to my tool, feel free to fork the repository on GitHub and submit a merge request with the driver.
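
To give an idea of what I mean by “drivers” — the method names here are illustrative, not necessarily the ones in the repository — the frontend would only ever talk to an interface like this, and each supported meter would ship a module implementing it:

class GlucometerDriver(object):
    """Base interface every device driver implements (sketch)."""

    def __init__(self, device_path):
        self.device_path = device_path

    def get_serial_number(self):
        raise NotImplementedError

    def get_datetime(self):
        raise NotImplementedError

    def get_readings(self):
        """Yield (timestamp, glucose_value) pairs from the device."""
        raise NotImplementedError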

A final note about possible Android support. I have been keeping in mind the option of writing an Android app to be able to dump the readings on the go. Hopefully it’s still possible to build Android apps for the market in Python, but I’m not sure about that. At the same time, there is a more important problem: while I could connect my phone (Nexus 4) to the glucometer with a USB OTG cable plus the cable LifeScan sent me, the latter is built around a PL2303 serial adapter, and I doubt most Android devices would support it anyway.

The other alternative I can think of is to find a userland implementation of PL2303 that lets me access it as a serial port without the need for a kernel driver. If somebody knows of any software already written to solve this problem, I’ll be happy to hear about it.

I think I’ll keep away from Python still

Last night I ended up in Bizarro World, hacking at Jürgen’s gmaillabelpurge (which he actually wrote at my request — thanks once more, Jürgen!). Why? Well, the first reason was that I found out it hadn’t been running for the past two and a half months because, for whatever reason, the default Python interpreter on the system where it runs was changed from 2.7 to 3.2.

So I first tried to get it to work with Python 3 while keeping it working with Python 2 at the same time; some of the syntax changed ever so slightly and was easy to fix, but the 2to3 script that comes with Python is completely bogus. Among other things, it adds parentheses to all the print calls… which would be correct, if it checked that said parentheses weren’t there already. In a script like the aforementioned one, the noise in the output is so high that there is really no signal worth reading.
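
To illustrate what I mean (my reconstruction of the behaviour): run on a file that already uses function-style print — which Python 2 parses as a print statement followed by a parenthesized expression — the fixer wraps the arguments a second time:

# before, already valid in both Python 2 and 3:
print("deleting %d messages" % count)

# after 2to3's print fixer, which cannot tell the parentheses
# were already there:
print(("deleting %d messages" % count))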

You might be asking how come I didn’t notice this before. The answer is that I’m an idiot! I found out only yesterday that my firewall configuration was such that postfix was not reachable from the containers within Excelsior, which meant I never got the fcron notifications that the job was failing.

While I wasn’t able to fix the Python 3 compatibility, I was at least able to understand the code a little by reading it, and after remembering something about the IMAP4 specs I read a long time ago, I was able to optimize its execution quite a bit, more than halving the runtime on big folders (like most of the ones I have here) by using batch operations, and by peeking at, rather than “seeing”, the headers. In the end, I spent some three hours on the script, give or take.
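
For reference, this is the kind of batching I mean, sketched with imaplib itself (the folder and header names are just examples): one FETCH for a whole range of messages instead of one round trip each, with BODY.PEEK[] instead of BODY[] so that the fetch doesn’t set the \Seen flag as a side effect:

import imaplib

user, password = "me@gmail.com", "secret"  # placeholders

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(user, password)
imap.select("INBOX")

# one round trip for the whole folder, and no \Seen side effect
typ, data = imap.uid("FETCH", "1:*",
                     "(BODY.PEEK[HEADER.FIELDS (LIST-ID)])")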

But at the same time, I ended up having to work around limitations in Python’s imaplib (which is still nice to have by default), such as it reporting fetched data as an array where each odd entry is a pair of strings (tag and unparsed headers) and each even entry is a string with a closing parenthesis (coming from the tag). Since I wasn’t able to sleep, at 3.30am I started rewriting the script in Perl (which at this point I know much better than I’ll ever know Python, even if I’m a newbie at it); by 5am I had all the features of the original one, and I was supporting non-English locales for GMail — remember my old complaint about natural language interfaces? Well, it turns out that the solution is to use the Special-Use Extension for IMAP folders; I don’t remember seeing that explanation page when we first worked on the script.
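
To show what I mean about the fetched data: pairing the entries back up is left entirely to the caller, something like this:

def parse_fetch_response(data):
    """imaplib returns the FETCH response as a flat list: each message
    is a (tag, literal) tuple holding the unparsed headers, interleaved
    with bare ")" strings closing each tag."""
    for entry in data:
        if not isinstance(entry, tuple):
            continue  # the stray ")" terminator strings
        tag, headers = entry
        yield tag, headers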

But this entry is about Python, and not the script per se (you can find the Perl version on my fork if you want). I have said before that I dislike Python, and my feelings are still unchanged at this point. It is true that the script in Python required no extra dependencies, as the standard library already covered all the bases… but at the same time, that’s about it: the basics are all it has; for anything more complex you still need new modules. Perl modules are generally easier to find, easier to install, and less error-prone — don’t try to argue this; I’ve got a tinderbox that reports Python test errors more often than even Ruby’s (which are a lot), and most of the time for the same reasons, such as the damn Unicode errors “because LC_ALL=C is not supported”.

I also still hate the fact that Python forces me to indent code to have blocks. Yes, I agree that indented code is much better than non-indented code, but why on earth should the indentation mandate the blocks rather than the other way around? What I usually do in Emacs when I’m moving stuff in and out of loops (which is what I had to do a lot in the script, as I was replacing per-message operations with bulk operations) is basically add the curly brackets in a different place, select the region, and C-M-\ it — which means it’s re-indented following my brackets’ placement. If I see an indent I don’t expect, it means I made a mistake with the blocks, and I’m quick to fix it.

With Python, I end up having to manage the whitespace to have it behave as I want, and it’s quite a bit more bothersome, even with the C-c < and C-c > shortcuts in Emacs. I find the whole thing obnoxious. The other problem is that, while Python does provide basic access to a lot more functionality than Perl, its documentation is… spotty at best. In the case of imaplib, for instance, the only real way to know what it’s going to give you is to print the returned value and check against the RFC — and it does not seem to have a half-decent way to return the UIDs without having to parse them. This is simply… wrong.
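
Case in point: getting the UIDs out of a search means splitting a space-separated string by hand (continuing from a connection set up as in the earlier snippet):

typ, data = imap.uid("SEARCH", None, "ALL")
uids = data[0].split()  # e.g. ["1", "2", "5", "42"], parsed by hand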

The obvious question from people in the know would be “why did you not write it in Ruby?” — well… recently I’ve started second-guessing my choice of Ruby, at least for simple one-off scripts. For instance, the deptree2dot tool that I wrote for OpenRC – available here – was originally written as a Ruby script… then I converted it to a Perl script half the size and twice the speed. Part of it, I’m sure, is just a matter of age (Perl has been optimized over a long time, much more than Ruby), and part of it is due to them being different tools for different targets: Ruby is nowadays mostly a language for long-running software (due to webapps and so on), and it’s much more object-oriented, while Perl is streamlined for top-down execution…

I do expect to find the time to convert even my scan2pdf script to Perl (funnily enough, gscan2pdf, which inspired it, is written in Perl), although I have no idea when… in the meantime, I doubt I’ll write many more Ruby scripts for this kind of processing…

Gentoo binhosts notes

I’ve been meaning to write about this before, and since the Gechi asked me something related, I thought this was as good a moment as any for this topic. You probably all know that I run a tinderbox that tries to ensure that the packages in Gentoo build, and work at least as intended. But as the description page says, the same name has been used by other projects, such as the official Infra one, which provides logs, reverse-dependency information, and binary packages for various architectures.

Given that, some people have asked me before whether I could provide binaries of the packages built by my tinderbox; the answer is “not a chance”, for a few reasons that are more complex than a first glimpse would suggest. And that’s mostly beside the basic problem that Gentoo has very shabby support for binary packages, as both Fabio and Constanze could easily tell you.

In the set of problems, the issue at hand is licenses (and here the expert would probably be Matija, as he’s quite interested in getting them right). Not only does the tinderbox accept almost all the licenses available in Portage, so that it can merge as many packages as possible, it also does not turn on the bindist USE flag (which is used to make sure that the result is actually distributable: it disables features that would link GPL-incompatible libraries into GPL software; it disables features that are known to be patented and shouldn’t be used; it makes sure that trademarks are respected — like Firefox’s). Both these issues make a good deal of the generated binaries non-redistributable by themselves.

But that’s not all; even when the license would let me redistribute the binaries, copyleft licenses require me to redistribute the sources as well, and just pointing at the Gentoo mirrors is not a good option there: I would have to provide all the distfiles as downloaded and used, and all the ebuilds as used to build that binary. You can guess it’s neither an easy task nor one that requires just a bit of online space.

Now, after you’ve actually tackled all these issues — enabling the bindist USE flag, only accepting true open-source licenses that allow redistribution, and providing a way to publish all the sources together with the binaries — you’re still not set. The rest of the problems are technological ones, tied to the way my tinderboxing scripts are designed to work. Even without counting the fact that the flag combinations are so many that you have to limit yourself to some sane subset, actually building every package in the tree gives me a number of headaches, starting with the automagic dependencies, which would make the binary packages unusable.

On a side note, I’ve been thinking for a while about setting up dependency verification to ensure that, at the very least, no automagic dependencies enter the ELF files; unfortunately this is not as straightforward as I’d like it to be. New-style virtuals mean that the dependency is hidden under a second layer of packages, which in turn makes it difficult to actually pinpoint the error conditions. Think of the PyCURL trouble that I pointed at a few weeks ago.
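
Just to show where such a check could start — this assumes pax-utils’ scanelf and its default column output; mapping the sonames back to the packages that should (or should not) provide them is the hard part I’m referring to:

import subprocess

def needed_libraries(image_root):
    """Collect the DT_NEEDED entries of all ELF objects under
    image_root, via scanelf from pax-utils (-B drops the header)."""
    output = subprocess.check_output(["scanelf", "-RnB", image_root])
    needed = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # TYPE NEEDED FILE
            needed.update(fields[1].split(","))
    return needed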

What would actually work to produce decently-working binary packages is the method that Patrick has been using for his tinderboxing: build each package one by one, with only its own declared dependencies merged in. After all, this is the method that the Debian- and RPM-based distributions follow. This has the opposite side effect of possibly failing when an indirect (transitive) dependency is missing, which often happens with pkg-config, but at least you wouldn’t find automagic dependencies in the binary packages themselves.

And as I noted in passing above, there are a number of problems related directly to the way Portage manages binary packages by itself: it does not have a properly clean way to track actually-needed dependencies (ABI-wise, that is) – and let’s not even get into the tracking of Python dependencies – and the format itself is not really flexible enough, which causes headaches when trying to deal with special features like file-based capabilities.

So, good luck if you want to provide public binhosts; myself, I have neither the time nor the will to even think about them, for the most part.

Gentoo’s Quality Status

So I have complained about the fact that we had a libpng 1.4 upgrade fallout that we could, for the most part, have avoided by using --as-needed (which is now finally enabled by default in the profiles!) and by starting much earlier to remove the .la files that I have been writing about for so long.

I also had to quickly post about Python breakage — and we should all be glad that I noticed by pure chance, and that Brian was around to find a decent fix to trickle down to users as fast as possible.

And today we’re back with a broken stable Python, caused by the blind backport of over 200KB of patches from the Python upstream repository.

And in all this, the QA team seems to have only me complaining aloud about the problem. While I’ve started writing a personal QA manifesto, so that I can officially request new elections on the public gentoo-qa mailing list, I’m currently finding it hard to complete; Mark refused to call the elections himself, and no one else in the QA team seems to think that we’re being too “delicate”.

But again, all the fuck-ups with Gentoo and Python could have been seen from a mile away: we’ve had a number of eclass changes, some of which broke compatibility; packages trying to force users into using the 3.x version of Python, which even upstream considers not yet ready for production use; and an absolute rejection of working together with others. Should we have been surprised that sooner or later the shit would hit the fan?

So we’re now looking for a new Python team to pick up the situation and fix the problems, which will require more (precious) tinderbox time to make sure we can pull this off without breaking more stuff in the middle of it. And as someone already said to me, whoever picks up Python again will have on their hands the task of replacing the “new” Python packaging framework with something more similar to what the Ruby team has been doing with Ruby NG — which could actually have been done once and for all before this…

Now, thankfully, there are positive developments: one is --as-needed entering the defaults, if not yet as strongly as I’d like it to; another is Alexis and Samuli asking me specifically for OCaml and Xfce tinderbox runs to identify problems beforehand; and now Brian with the new Python revision.

Markos is also trying to raise awareness about the lack of respect for the LDFLAGS variable; since the profile sets --as-needed in that variable, ignoring the variable means ignoring --as-needed too. (My method of using GCC_SPECS actually sidesteps that problem entirely.) I’ve opened a bunch of bugs on the subject today as I added the test to the tinderbox; it’s going to be tricky, because at least the Ruby packages (most of them, anyway) respect the flags set when the Ruby implementation itself was built, rather than those set for their own build, as they’re saved in a configuration file. This is a long-standing problem, and not actually limited to Ruby. I’ve been manually working around it in some extensions, such as eventmachine, but it’s tricky to solve in a general way.

And this is without adding further problems, such as the one pointed out by Kay and Eray, which I could have found earlier if I had more time to work on my linking-collision script — it is designed to find exactly those error cases, but it needs a lot of boring manual lookup to identify the issues.

Now, I’d like to be able to do more about this, but as you can guess, it already eats up enough of my time that I even have trouble fitting in enough work to cover the costs of running it (Yamato is not really cheap to work on: it’s power-hungry, has crunched a couple of hard disks already, and needs a constant flow of network data to work properly, and this is without adding the time I pour into it to keep it working as intended). Given these points, I’m actually going to ask whether somebody can get either of two particular pieces of hardware to me: either another 16GB of Crucial CT2KIT51272AB667 memory (it’s Registered ECC memory), or a Gigabyte i-RAM (or anything equivalent; I’ve been pointed at ACard’s ANS-9010 as an alternative) with 8 or 16GB of memory (or more, but that much is good already). Either option would allow me to build on a RAM-based device, which would reduce the build time and make it possible to run many, many more tests.

Important! Do not update to Python 2.6.5_p20100801

Seems like someone pulled another breakage, almost a year after the last one. Please do not upgrade to this version of Python; and if you have problems like the following:

>>> Emerging (1 of 1) dev-lang/python-2.6.5-r3
Traceback (most recent call last):
  File "/usr/bin/emerge", line 42, in <module>
    retval = emerge_main()
  File "/usr/lib64/portage/pym/_emerge/main.py", line 1555, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/lib64/portage/pym/_emerge/actions.py", line 434, in action_build
    retval = mergetask.merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 914, in merge
    rval = self._merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1222, in _merge
    self._main_loop()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1369, in _main_loop
    self._poll_loop()
  File "/usr/lib64/portage/pym/_emerge/PollScheduler.py", line 134, in _poll_loop
    handler(f, event)
  File "/usr/lib64/portage/pym/_emerge/SpawnProcess.py", line 151, in _output_handler
    buf.fromfile(files.process, self._bufsize)
IOError: [Errno 11] Resource temporarily unavailable

Then please see bug #330937. At this point in time I have no idea how to fix it if you’ve updated already; it’s 3:50am and I’m actually here just because I made a mistake rebooting Yamato and found this out.

Update: we have a quick hotfix for you to apply if you reach this point:

wget http://dev.gentoo.org/~ferringb/fix-python-2.6.5_p20100801.patch -O - | patch /usr/lib/portage/pym/_emerge/SpawnProcess.py

it’s a one-liner; just execute it, then you can simply re-merge Python 2.6.5-r3 and Portage to get back to a pristine system.

Ruby 1.9 vs Python 3

In my previous post, where I declared myself up for hire by those who really, really want Ruby 1.9 sooner than we’re currently planning to release it, I said that the Ruby team doesn’t want to “pull a Python 3”. I guess I should explain a bit what I meant there.

Ruby 1.9 and Python 3 are conceptually similar: while Python 3 makes a much wider change in syntax as well as behaviour, both require explicit, often non-trivial porting of software to work. Thus, both need to be slotted, installed side-by-side with the older, more commonly used alternative — and so do the libraries and programs built on them.

There is more similarity between the way the two are handled than you’d expect, mostly because the Python support has been partly copied out of Ruby NG, stripped of a few features. These features are, for the most part, what I’d say protects us from pulling a Python 3.

As it is, installation of Python 3-powered packages happens once Python 3 is installed; and Python 3 is installed, unless explicitly masked, on every system, stable or not, because of the way Portage resolves dependencies. In my case, I don’t care about having it around, so it’s masked on all my systems (minus the tinderbox, for obvious reasons). You cannot decide whether a given package is installed for 2.6, 2.7 or 3.1, and you can only safely keep one Python around for the 2.x series, as packages will only install for that one — which is going to be fun, because 2.7 seems to break so many things.

Ruby packaging, instead, is coordinated through the RUBY_TARGETS variable, which allows us (and you) to choose which implementations (where supported) a given package is installed for; you can even tweak it package-by-package via package.use! This actually makes the maintenance burden quite a bit higher on our side, because we have to make sure that the whole dependency tree is up to date for a given target; on the other hand, it allows us to be sure that the packages are available, and it would scream at us if they weren’t (or rather, Mr Bones would).

Most importantly, we don’t need no stinkin’ script like python-updater to add or remove an implementation; since the implementations are user-chosen via a USE-expanded variable (RUBY_TARGETS, as I said), what you’d otherwise do with python-updater (or even perl-cleaner) is done through… emerge -avuDN world.

There is, I’ll admit, one thing that at least python-updater seems to take into consideration and that for now we can’t cater for: packages using the Ruby interpreter itself, rather than binding a library to make it usable from Ruby; as I said in the post I linked above, it’s one of the few kinks that still needs to be worked out before 1.9 can be unmasked. Again, you can either wait, or hire somebody to do the dirty job for you.

A note about the “stinkin’ script” notion: one of the reasons why I dislike the python-updater approach is that it lists a few “manual” packages to be rebuilt. The reason for that is the old Python bug that caused packages to link the Python interpreter statically. The problem has since been fixed, but the list (which is very limited compared to what the tinderbox found at the time) is still present.

That is not all, though. I said at the start that right now Python 3 is installed unconditionally by default on all systems; we’re going to do double and triple work to make sure that the same won’t happen with Ruby 1.9 until we’re ready to switch the defaults. Switching the defaults will likely take a much longer time: we’re going to make 1.9 stable first, and start stabling packages supporting it… from there on, we’d consider removing packages that are 1.8-only.

Well, to be honest, we’re going to consider dropping some packages that won’t work with 1.9 (or JRuby), have no users, and are not maintained upstream. For good or bad, a lot of the packages in the tree were added by previous team members, and they, like us, often did so when they had a personal interest in the package… those packages are oftentimes no longer maintained and dead in the water, but we still carry them around.

Anyway, once again, the road is still bumpy, but it’s not impossible; I’m not sure if we can get to unmasking it before the end of the summer as I was hoping, but we’re definitely on track to provide a good user experience for Gentoo users who develop in Ruby, and most of the time, we can even provide a better upstream experience.

Log analysis, yet again

So I’m again trying to find a solution to the log-analysis problem; the main issue at this point is that the tinderbox is generating something along the lines of 200MB of logs a week — probably also because, thanks to Zac, it’s much, much more efficient than it was before. With such an amount of data to shuffle through, running grep from within Emacs is no longer feasible.

What I’m considering now is storing most of the data directly inside a database (PostgreSQL, since that’s what I’m already using here) and then pulling it out of that through a simple (web) interface. The reason I’m going for a web interface is that it’s likely what takes the least time to design for quickly reporting and copying content.

On the storage side, the main question for me is whether the database should contain the specific details of each problem, or just the fact that there is a problem plus a pointer to the log file. In the former case, the web application could easily be extended to something more than a glorified grep, but it’d have to store a non-trivial amount of data: some log files are well over 10MB, so it gets a bit tricky to handle them properly.
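
The “pointer plus parsed tags” variant could be as simple as this sketch — the table layout and the patterns are hypothetical, just to show the shape of the thing:

import re

import psycopg2

# each known QA issue gets a tag and a regex (illustrative patterns)
PATTERNS = [
    ("elf-in-usr-share", re.compile(r"QA Notice: .*ELF.*usr/share")),
    ("test-failure", re.compile(r"ERROR: .* failed \(test phase\)")),
]

def scan_log(conn, package, log_path):
    """Record which known problems a build log shows, storing only a
    pointer to the log instead of its (possibly 10MB-plus) contents."""
    hits = set()
    with open(log_path) as log:
        for line in log:
            for tag, pattern in PATTERNS:
                if pattern.search(line):
                    hits.add(tag)
    cur = conn.cursor()
    for tag in hits:
        cur.execute("INSERT INTO problems (package, tag, log_path)"
                    " VALUES (%s, %s, %s)", (package, tag, log_path))
    conn.commit()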

Thinking a bit further about the interface, it should really become a way to report bugs directly: if the application can find out that the merge left ELF files in /usr/share, filing the bug directly is just a matter of finding out who exactly maintains the particular package (which is quite easy), and it wouldn’t even require copy-pasting if the data is available directly in the database, already parsed. Obviously, it would still require manual confirmation before opening the bug, and before doing so, it should also implement an easy search function to show possible duplicates.

While my first guess was to write a stupid CGI (or use the Ruby integrated web servers in a script) to get the results from the database into the browser, I’m now more interested in the idea of having a more complete application to deal with this. Pavel also suggested allowing other developers to access the interface to report the bugs, so that even if I’m not around to do the filing, someone else can. Unfortunately that also brings up a problem: if I were to allow developers to file bugs with their own accounts, I’d have to make them give their login information to the tinderbox (and I don’t like that, not even with me running it); on the other hand, I’d rather not have them file bugs with my own account, so I guess it’d require setting up a no-mail account for the tinderbox (no-mail since it’d be pointless to have mail coming in for a tinderbox account), and then making the users CC their own address by default.

Now comes the problem: I can probably start working on such an interface myself using Ruby on Rails, which is something I’m somewhat fluent in; on the other hand, I know of no Ruby interface for the Bugzilla RPC protocol, while there is a well-tested pybugz extension for Python (which I’m definitely not fluent in). Before I start hacking anything at all (since the choice is going to change quite a few bits of the interface: if I were to use Ruby on Rails, the ORM would most likely call for an abstracted interface to the database, which is good for some things but not for everything), I really need to see if somebody could help me with such a task in the long run.

If somebody is up for writing the interface in Python to my specs, using pybugz, that’d be fine; otherwise I’d like to hear if somebody has already worked on a pybugz-like interface for Ruby instead. At worst I could settle for just opening the bug with pre-filled fields and then attaching the build log afterwards (to attach the log I need to know the number of the just-filed bug), and that’s not feasible by just providing a link to a pre-filled bug (although even that would be quite an improvement to my workflow!).
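
For what it’s worth, the raw RPC side doesn’t look too scary even without pybugz — this is a sketch assuming a Bugzilla recent enough to expose Bug.create over XML-RPC, with credentials passed in-band; all the field values are illustrative, and attaching the log afterwards would additionally need Bug.add_attachment, which only newer Bugzilla versions provide:

import xmlrpclib

proxy = xmlrpclib.ServerProxy("https://bugs.gentoo.org/xmlrpc.cgi")
result = proxy.Bug.create({
    "Bugzilla_login": "tinderbox@example.org",  # placeholder account
    "Bugzilla_password": "secret",
    "product": "Gentoo Linux",                  # illustrative fields
    "component": "Ebuilds",
    "version": "unspecified",
    "summary": "app-foo/bar-1.0 installs ELF files in /usr/share",
    "description": "Found by the tinderbox; full build log to follow.",
})
bug_id = result["id"]  # needed afterwards to attach the log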

So, can anybody provide any insight, or volunteer to help me out?

Gentoo’s QA soft spots

Gentoo’s quality assurance is quite a difficult thing to keep running smoothly: there are problems at so many levels that it’s not funny at all. I’ve been doing my share of the work, both with the tinderbox effort and with manual work to fix issues as they come up. But there are some particular soft spots that I think should be addressed sooner rather than later.

One of the most annoying problems I’m having lately is related to test failures with Python modules. Part of the failures were “expected”, since they were tied to the presence of Python 3.1 (I don’t use it on my standard system since it’s pointless to me, but it’s available in the tinderbox); some were caused by GCC 4.4 and broken strict-aliasing rules; about others I’ve got no clue at all. Arfrever cannot seem to reproduce them, and I don’t know how to look deeper into them. Since I have no idea where to start looking for the cause of the failures, for me the problems remain unsolved.

But before somebody reads something pro-Ruby and anti-Python into the fact that I have reported barely a single failure with Ruby packages’ tests, you should probably remember that we’re not running tests for any of the Ruby gems. Thankfully, as Alex posted yesterday, gems are now being reviewed and added to the main Portage tree, and as we move to the fakegem eclass we’re also going to add the tests for all the packages. This is going to be sorry work, because I’ve already noticed that a huge lot of packages in Ruby land fail their tests — and not only with Ruby 1.9 or JRuby!

Another thing that is in definitely bad shape in Gentoo is scientific software, and the libraries that go with it. I’m not sure why that is, but it seems like most of the people writing scientific software have no clue about build systems, portability, good programming practices and stuff like that. Probably it’s tied to the fact that people writing scientific software are mainly scientists with some vague idea about programming (you’d definitely find it pointless to seriously use software written by programmers with some vague idea about science). The result is that not only are the ebuilds sometimes way overcomplicated for the task they have to take care of, but they often breach QA, and end up failing badly as soon as something changes in their dependencies.

This alone wouldn’t be a problem, were it not that half the sci team — and similarly half the cluster team that seems to have been supporting them — disappeared over time, and now the ebuilds are mostly unmaintained. Thankfully, we’ve got people like jlec who are still updating the ebuilds in the overlay, but here’s the catch: either you keep the stuff entirely in the overlay, or you’ve got to fix it in the main tree as well. We’re going to need some hands porting stuff over to the main tree from the sci overlay.

And a similar problem happens with the LISP overlay: packages that are in Portage, and used by other packages as well, end up failing over time, and the solution you find around is “just use the overlay”, which is no solution at all. Again, fortunately, Ulrich is moving some (requested) ebuilds from the overlay to the main tree as users ask for them, but it’s still a sorry state, and the dependency on overlays that we’ve artificially increased is showing its limitations right now, to me at least.

And finally, another problem comes up when you look at external kernel modules, which, as the kernel team has expressed many times, are “simply evil”. While some tend to be at least vaguely maintained (think of iscsitarget, which I ported to 2.6.32 myself, but basically stopped maintaining afterwards — I moved to sys-block/tgt, which uses the SCSI target module already present in the Linux kernel; this way I have no more external modules on my system, and I don’t need to rebuild packages at every update, or fix the build when it breaks), a lot are not.

We’ve still got a few packages that are designed to work only on kernel 2.4 (since we’re going to prune old glibc versions, shouldn’t we start pruning 2.4 kernel support as well? It has been so long that I doubt anybody is still interested in it, even in the most conservative environments), and there are quite a few modules that only work with pretty old kernels, like 2.6.24 (current udev also does not support those). The problem with these is that they often require specific hardware; lacking that, there is no way to ensure they work. And some of their maintainers are going missing over time.

So these are some of the directions we should try to work on more heavily. Hopefully somebody else will also join me since I cannot really do much more than I’m doing already at this point.

Needing a run control

You might not be familiar with the term “run control” even though you use openrc; guess what the rc stands for?

This post might not be of any interest to you; it’s going to delineate some requirements for expanding and streamlining my tinderbox. So if you don’t care to help me, or know nothing of development, you can skip it and wait for tomorrow’s.

As the title leads you to guess, I’m looking for a better way to handle the execution of the tinderbox itself. Right now, as I’ve shown before, I’m just using a simple xargs call that takes care of launching a number of emerge requests to build the list of packages one by one. Unfortunately, this approach has quite a few problems, the first of which is that I have no way to check whether the tinderbox is proceeding properly without connecting to its console — which is quite taxing to do, especially when I’m not at home.

I could write a shell script to handle that, but I’d rather have something slightly more sophisticated and more powerful. Unfortunately, because the whole design of the tinderbox relies so heavily on Portage internals, I guess the language of choice for something like this should probably be Python, so that, for instance, it can call into the tinderbox script via a function call rather than by forking.

What I’d be looking for, right now, would be a daemon: have it run in the background, started automatically by the init system inside the container, with two ports open for remote control: one for the command console, which allows starting and suspending the execution, or aborting the current run; and one for the logging console, which shows what emerge is doing. The whole thing would look a lot like the ssh/screen/xargs execution I’m doing right now, just less locally-tied. Bonus points if the whole system only allows SSL connections using known client certificates.

My reason for wanting a separate run control, rather than just ssh’ing to the console, is to eventually allow other developers to access the tinderbox, for instance to prioritize one particular package over another, or, if needed, to change the settings (flags and similar) for a particular run. In case client authentication turned out to be too much to implement, it could probably be easily solved by creating a script that uses nc6 to talk to the console, using that as a shell, and leaving access through SSH (with non-root accounts).

Another reason for this is to better handle the cascading failure of dependencies. If I’m going to merge package C, which requires A and B, but B fails, I should mask B for the rest of the current run. This way, when I’m asked to install D, which also requires B, it’ll be skipped over (instead of insisting on rebuilding the same package over and over). At the end of the run (which would mean at the next --sync), those masks can be dropped entirely.

This means that at the end of an emerge request, you need to find out whether it completed fine, and if not, which package it failed on.
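
A minimal sketch of that bookkeeping — the mask path and the output pattern are assumptions of mine, not how the tinderbox currently works:

import re
import subprocess

RUN_MASK = "/etc/portage/package.mask/tinderbox-run"  # hypothetical

def emerge_and_mask(atom):
    """Run one emerge job; on failure, pull the failing package out of
    the "ERROR: ... failed" line and mask it for the rest of the run."""
    proc = subprocess.Popen(["emerge", "--oneshot", atom],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output = proc.communicate()[0]
    if proc.returncode == 0:
        return True
    failed = re.search(r"ERROR: (\S+) failed", output)
    if failed:
        with open(RUN_MASK, "a") as mask:
            mask.write("=%s\n" % failed.group(1))
    return False

def new_run():
    """At the next --sync, drop the per-run masks entirely."""
    open(RUN_MASK, "w").close()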

Other features for the run control would sprout once this basis is in place; so if anybody is interested in helping out with the tinderbox and wants to start by writing this kind of code, it’s definitely welcome.