Testsuites are important, once again.

I'm sincerely starting to get tired of this, but here it comes again: testsuites are important for knowing whether a package is good or not; so the package you're developing should have a testsuite. And the ebuild you're writing should execute it. And you should run it!

This is something I noticed first in the tinderbox, but some people even complained about it directly to me: a lot of ebuilds make a mess of testsuites. The problems range from not running them at all, or restricting them without really good reasons, to testsuites that are blatantly broken because they were never run before.

I guess the first problem here is that, while the test feature that executes the actual testsuites is disabled by default, Portage provides a default src_test. Why is this a problem, you say? After all, it really does add some value when a testsuite is present, even if the maintainer didn't spend some extra minutes writing the phase into the ebuild. Unfortunately, while it adds test phases to lots of ebuilds where they execute correctly, it also adds them to packages that don't have testsuites at all (and if they use automake, it'll still run a long make chain, a recursive one if the build system wasn't set up properly!), to packages where the check or test targets mean something different (like the qmail-related packages, for which make check checks the installation paths and not the just-built software), and to packages whose testsuite is not only going to fail, but also to hog a computer down for a pretty long time (did somebody say qemu?).

Now, the problems with tests do not stop at the default src_test, otherwise they would be pretty easy to fix; the problem is that we don't really have a clear policy on how to deal with testsuites, especially those that fail. And I have to say that I'm as bad as the rest of the group when it comes to dealing with testsuites. Right off, I can bring up two packages I deal with that have problems with their testsuites.

  • PulseAudio, which is a pretty important package, you'd say, has a complex testsuite. For quite a long time in the test releases (which in Gentoo become RCs even though they really aren't, but that's another issue), one of the tests (mix_test) failed because the test itself hadn't been updated to support the new sample format; this was only fixed recently (there were other test failures, but those I fixed myself at the first chance). On the other hand, the tests for the translations, which are also part of the package's testsuite, are still not executed: the current version of intltool (0.40) does not interpret the configure.ac file correctly (it parses it as if it were a text file, rather than accepting that it's a macro file), and causes the test to fail in a bad way. The solution for this part is to package and add a dependency on intltool 0.41, but it seems nobody is sure whether that's an official release or a development release. For now, only the software tests are executed;

  • speaking of DocBook: the XSL stylesheets for DocBook used to have a complex testsuite that checked that the output was what it was supposed to be. The tests weren't really comprehensive, and indeed at least one bug was missed by the testsuite through the whole 1.74 series. Starting from 1.75 the new testsuite should probably be tighter and support more than just one XSLT engine… the problem is that upstream doesn't seem to have described the testing procedure anywhere, and I haven't figured out how it works yet, with the result that the testsuite is now restricted in the ebuilds (with a test USE flag that is not masked; I already filed an enhancement request for Portage to handle this case).

At this point what I'm brought to wonder is: how harsh should we be on packages with flawed, broken, or incomplete testsuites? Should they be kept in package.mask? Should they not reach stable? The stable-stopper for this kind of problem used to be Ferris, and it's one reason I'm really going to miss him badly. On the other hand, it seems a few other arch team members have started applying the same strictness, which I don't dislike at all (although it's now keeping libtool-2.2 from going stable, and PulseAudio with it). But what about the packages that already fail in stable? What about the packages failing because of mistakes in the testsuites themselves?

There are also arch-specific issues; for instance, I remember some time ago Linux-PAM requiring a newer glibc than was available on some arches for its testsuite to proceed correctly… the runtime logic of PAM, though, seemed to work fine aside from the tests. What would have been the correct approach? Make the whole of Linux-PAM depend on the new glibc, making it unusable on some arches, or just the tests? I decided for the tests, because the new version was really needed, but from a pure policy point of view I'm not sure it was the right step.

I guess the only thing I can add here is, once again: if you need to restrict or skip tests, keep the bug open, so that people will know the problem has only been worked around and not properly fixed. And maintainers, always remember to run your packages' testsuites when bumping, patching or otherwise changing them. Please!

Complex software testing

Yamato is currently ready to start a new tinderbox run with tests enabled (and the test-fail-continue feature, so that a failing test does not stop the whole merge); I still have to launch it, and I'm still not sure whether I should: besides the quite long tests for GCC, which also fail, the glibc tests not only fail but don't even seem to fail reliably, stopping the ebuild from continuing. I wonder if this is a common trait of tests.

The main issue here is that without tests it is very difficult to identify whether the software is behaving as it should; as I said, not using gems helped me before, and I had plans to test otherwise non-testable software (although that failed miserably). And because of the lack of testing in packages such as dev-ruby/ruby-fcgi, so-called “compatibility patches” get added that don't really work as they are supposed to.

By having a testsuite you can easily track down issues with concurrency, arch-specific code and other similar classes of problems. Unfortunately, with software that gets complex pretty quickly, and the need for performance overcoming the idea of splitting code into functional units, testing can get pretty ugly.

I currently have two main projects that are in dire need of testing, both failing badly right now. The first is my well-known ruby-elf which, while already having an extensive (maybe too extensive) unit testing suite, lacks some kind of integration testing for the various tools (cowstats, rbelf-size and so on) that could ensure that the results they report are the ones expected of them. The other project is probably one of the most complex projects I ever worked on: feng.

Testing feng is a very interesting task, since you cannot stop at testing the functional units in it (which, by the way, do not exist: all the code depends one way or another on another piece of it!); you've got to test at the protocol level. Now, RTSP is derived from HTTP, so one could expect that the methods employed to test HTTP would be good enough… not the case, though: while testing an HTTP server or a web application can be tricky, it's at least an order of magnitude easier than testing a streaming server. I can write basic tests that ensure the server responds correctly to a series of RTSP requests, but they'd also have to check that the RTP data being sent is correct, and that RTCP is sent and received correctly.
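A protocol-level test of this kind could start from something as small as checking the status line of an RTSP response. This is just a sketch of the idea; the helper name and the regular expression are mine, not anything from feng's actual (non-existent, as said) testsuite:

```ruby
# Hypothetical helper: parse the status line of an RTSP response,
# so a test can assert on protocol version, status code and reason.
def rtsp_status(response)
  match = response.lines.first.match(%r{\ARTSP/(\d\.\d)\s+(\d{3})\s+(.+?)\r?\n?\z})
  raise ArgumentError, "not an RTSP response" unless match

  { version: match[1], code: match[2].to_i, reason: match[3] }
end

status = rtsp_status("RTSP/1.0 200 OK\r\nCSeq: 1\r\n\r\n")
# status[:code] is 200, status[:reason] is "OK"
```

Of course this only scratches the surface: the hard part is asserting on the RTP and RTCP traffic that follows, not on the RTSP handshake itself.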

As I said, it's also pretty difficult to test the software with unit testing, because the various subsystems are not entirely isolated from one another, so testing the various pieces requires either faking the presence of the other subsystems, or heavily splitting the code. They rely not only on functions but also on data structures, and on the presence of certain, coherent data inside them. Splitting the code, though, is not always an option, because it might make it hard to get good performance, or might require very ugly interfaces to pass the data around.

Between one thing and the other that I have seen lately, I'm really hoping to one day work on a project where extensive testing is a hard requirement, rather than something I do myself, alone, that is not considered essential to delivering the code. Sigh.

Some design notes for unit testing

After my post about unit testing I decided to give check a try. Although I don't like its syntax tremendously, it works quite decently, and it's used by many projects, so it should be safe enough to use and debug.

There are a few issues with the software I'm writing testcases for that lie in the design of the software itself. After these findings, I guess it'll be a good idea to write down some suggestions, so that others might actually find it useful to read my blog from time to time.

The first problem is that even the internal functions seem to work on high-level structures. While it's nice to have functions act directly on high-level structures, it's a bit of a problem to test them if you need to fill in a properly-configured structure full of data just to make sure the parser is parsing a buffer correctly.

For this reason, my suggestion here is to break functions apart: one function has the interface that the users will use, with the high-level data structures and all the nice parts, and one acts directly on the lowest level possible, so that testing becomes actually feasible.

For instance, if you have a function that is given a structure containing an array of characters as a buffer, and a pointer to a character with a value to extract from the buffer, it's easier for testing if that function is just an interface to a lower-level one that acts on two character arrays rather than on the high-level structure directly.
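To illustrate the split (in Ruby for brevity; the same shape applies to the C code being discussed, and all the names here are invented): the outer function only unwraps the structure, while the inner one does the actual work on plain buffers and is what the unit tests call.

```ruby
# Hypothetical high-level structure, as the users of the library see it.
Session = Struct.new(:buffer, :key)

# Thin user-facing interface: it only unwraps the structure...
def lookup(session)
  lookup_in_buffer(session.buffer, session.key)
end

# ...while the real parsing acts on two plain strings, so a test can
# exercise it directly without building a fully-populated Session first.
def lookup_in_buffer(buffer, key)
  buffer[/#{Regexp.escape(key)}=(\S+)/, 1]
end

lookup_in_buffer("host=example.org port=554", "port")  # => "554"
```

The test for the parser then needs nothing but two literal strings, while the high-level wrapper is trivial enough that a single integration test covers it.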

Another problem here is interfacing with visibility-enabled libraries. The idea of unit testing is not, obviously, just for user-facing interfaces, but also, even more importantly, for internal interfaces. This is why even software that results in a final executable, rather than a library, should make heavy use of unit testing. Unfortunately, if you do use visibility for your software, linking against the shared library will prevent the tests from executing properly, since the internal symbols are hidden.

To solve this last part, the easiest way is to make sure you link against the static version of the libraries, even commodity ones. Since visibility is applied at link stage, the static version of a library will not have hidden its functions before they are linked into the test, allowing access to the internal interfaces.

Software properly designed for unit testing is more likely to be solid, especially when porting it to different systems, or when a new version of one of its dependencies is released and its behaviour changes, deliberately or not.

OpenSolaris Granularity

If you have been following my blog for a long time, you know I already had to fight a couple of times with Solaris and VirtualBox to get a working Solaris virtual machine to test xine-lib and other software on.

I tried again yesterday to get one working: since Innotek was bought by Sun, VirtualBox's support for Solaris has improved notably, to the point that they now emulate a different network card by default, one that works with Solaris (the emulated card had been the long-standing problem).

So I was able to install OpenSolaris, and thanks to Project Indiana I was able to check which packages were installed, remove stuff I didn't need and add what I needed. Unfortunately I think the default granularity is a bit concerning. Compiz on a virtual machine?

The first thing I noticed is that updating a newly-installed system from the last released media requires downloading almost the size of the whole disk in updates: the disk is a simple 650MB CD image, and the updates were over 500MB. I suppose this is to be expected but, at that point, why not point to some updated media by default, considering that updating is far from trivial? Somehow I was unable to perform the update properly with the GUI package manager, and I had to use the command-line tools.

Also, removing superfluous packages is not an easy task, since the dependency tracking is not exactly the best out there: it's not strange for a set of packages to fail to be removed because some of them are dependencies… of others in the same set being removed (this usually seems to be due to plug-ins; even after removing the plug-ins, it would still cache the broken dependency and prevent me from removing the packages).

That's not all, of course; for instance, finding the basic development tools in their package manager is a problem of its own: while looking for “automake” will find a package named SUNWgnu-automake, looking for “autoconf” will find nothing, because the package is called SUNWaconf. I still haven't been able to find pkg-config, although the system installs .pc files just fine.

I guess my best bet would be to remove almost everything managed by their own package manager from the system and try prefixed Portage, but I just haven't had the will to look into that yet. I hope it would also help with the version of GCC that Sun provides (3.4.3).

I got interested in Solaris again since, after a merge of Firefox 3.0.2, I noticed cowstats throwing an error on an object file, and following up on that, I found out a couple of things:

  • cowstats didn’t manage unknown sections very well;
  • Firefox ships with some testcases for the Google-developed crash handler;
  • one of these testcases is an ELF ET_EXEC file (with .o extension) built for Solaris, that reports a standard SysV ABI (rather than a Sun-specific one), but still contains Sun-specific sections;
  • readelf from Binutils is not as solid as its counterpart from Elfutils.

Now cowstats should handle these corner cases pretty well, but I want to enrich my testcases with some Solaris objects. Interestingly enough, in ruby-elf probably 80% of the size of an eventual tarball would be taken up by test data rather than actual code. I guess this is a side-effect of TDD, but it's also exactly why TDD-based code is usually more solid (every time I find an error of mine in ruby-elf, I tend to write a test for it).
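That habit is cheap to keep up: a bug found once becomes an assertion that runs forever after. Here's an invented stand-in (using minitest, which ships with Ruby; this is not ruby-elf's actual suite or API) for a bug of the "unknown value crashes the parser" kind:

```ruby
require "minitest/autorun"

# Invented stand-in for a parser bug: symbol type values outside the
# known range used to raise, now they map to an :unknown marker.
SYMBOL_TYPES = { 0 => :notype, 1 => :object, 2 => :func }

def symbol_type(value)
  SYMBOL_TYPES.fetch(value, :unknown)
end

class TestSymbolType < Minitest::Test
  # Regression test written the day the out-of-range value was found,
  # so the fix can never silently regress.
  def test_unknown_type_does_not_raise
    assert_equal :unknown, symbol_type(42)
  end

  def test_known_types_still_map
    assert_equal :func, symbol_type(2)
  end
end
```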

Anyway, bottom line: I think Project Indiana would have done better by adapting RPM to their needs rather than inventing the package manager they invented, since it doesn't seem to have any feature that is missing in Fedora, while it lacks quite a few other things.

Unit testing frameworks

For a series of reasons, I'm interested in writing unit tests for a few projects, one of which will probably be my 1.3 branch of xine-lib (yes, I know 1.2 hasn't been released yet, even in beta form).

While unit tests can be written quite easily without much of a framework, having at least a basic one helps make automated testing possible. Since the projects I have to deal with use standard C, I started looking at some basic unit test frameworks.

Unfortunately, it seems that both CUnit and check last saw a release in 2006, and their respective repositories seem quite calm. In CUnit's case I also noticed a quite broken build system.

GLib seems to have some basic support for unit tests, but even they don't use it, so I doubt it'd be a nice choice. There are quite a few unit testing frameworks for particular environments, like Qt or GNOME, but I haven't found anything generic.

It seems funny that even though people always seem to cheer for test-driven development, there isn't a good enough framework for C. Ruby, Java, Perl and Python each have their well-established frameworks, and most software uses them, but there is neither a standard nor a widely accepted framework for C.

I could probably write my own framework, but that's not really an option: I don't have that much free time on my hands. I suppose the least effort would be to contribute to one of the frameworks already out there, so that I can fix whatever I need fixed and have it working as I need. Unfortunately, I'd have to start looking at all of them and find the least problematic before starting, and it's not worth it if the original authors have gone MIA or similar, especially since at least CUnit was still developed using CVS.

If somebody has suggestions on how to proceed, they are most welcome. I could probably fork one of them if I have to, although I dislike the idea. From what I gathered quite briefly, the XML generation of results in CUnit might be useful for gathering test statistics in an automated testing facility.

Testcases are the way to go

So, having suspended my work on a Valgrind frontend until I can decide whether I should be hacking at helgrind to produce XML output, or just focus for now on writing a memcheck frontend akin to Valkyrie, I decided to resume working on something I started quite some months ago: ruby-elf.

For those of you who haven't been reading my blog since it started, ruby-elf is an ELF file parser written in pure Ruby (available at https://www.flameeyes.eu/p/ruby-elf). I started writing it to have a script capable of identifying colliding symbols between different shared objects on the system.
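To give an idea of what pure-Ruby ELF parsing looks like, here's a minimal sketch (my own illustration, not ruby-elf's actual code) that reads the identification bytes at the start of an ELF file, which tell you the file's class and byte order before you parse anything else:

```ruby
# Sketch: decode the e_ident bytes of an ELF header from a binary
# string. Byte 4 is the class (32/64-bit), byte 5 the data encoding.
def elf_ident(bytes)
  raise "not an ELF file" unless bytes[0, 4] == "\x7fELF".b

  {
    klass:    { 1 => 32, 2 => 64 }.fetch(bytes.getbyte(4)),
    encoding: { 1 => :lsb, 2 => :msb }.fetch(bytes.getbyte(5))
  }
end

elf_ident("\x7fELF\x02\x01".b)  # => { klass: 64, encoding: :lsb }
```

Everything past these bytes depends on the class and encoding just read, which is exactly where the testsuite trouble described below comes from.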

Together with ruby-elf, I also implemented a very simple nm command and a readelf -d script. They are very basic commands and don't follow the behaviour of the equivalent binutils tools 1:1, but they were a nice testcase while working on ruby-elf in the past.

What was really missing in ruby-elf, instead, was a real testsuite. So I decided to go with that, considering that writing testcases for my Valgrind frontend had helped me see how the code was shaping up.

I decided that the proper way to test ruby-elf was to actually provide a set of ELF files to parse, and I started with Linux/amd64 and Linux/x86 files, as those were the ones I could compile without having to install a cross-compiler.

The first tests were trivial and passed fairly easily, but when I added a more complex test, one that looked for a specific symbol that had to be missing from the file, I got a very nasty failure: an OutOfBound exception for the ELF symbol type value on the x86 executable. After looking at the code for a while, I thought it was correct, so I asked solar if he knew why I would find an impossible type on the symbol, but it made no sense at all.

After checking the offsets of the read value, I came to see that there was a 64-bit read for the address, rather than a 32-bit read. Further debugging showed me that using alias to create the class-specific read functions (for addresses) on the Elf file didn't work quite as well as I hoped.

The thing goes this way: I open a 32-bit file, the class is Elf32, so the alias should create read_address as read_u32; then I open a 64-bit file, the class is Elf64, so the alias should create read_address as read_u64. Then I load the 32-bit file's symbols, and read_address is called. I expected alias to create the alias on the instance, as I ran it from an instance method, not at class scope; instead it's created on the class, which means that at that point read_address is still aliased to read_u64, reading a 64-bit address rather than a 32-bit one.
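The behaviour is easy to reproduce in a few lines; the class and method names below are simplified stand-ins for the ruby-elf ones:

```ruby
class ElfFile
  def read_u32; "32-bit read"; end
  def read_u64; "64-bit read"; end

  def initialize(elf_class)
    # alias executes here, at call time, but it defines the alias on
    # the class (the default definee), not on this particular instance.
    if elf_class == 32
      alias read_address read_u32
    else
      alias read_address read_u64
    end
  end
end

big   = ElfFile.new(64)   # read_address now aliased to read_u64
small = ElfFile.new(32)   # ...and now re-aliased to read_u32, class-wide
big.read_address          # "32-bit read", not the 64-bit read expected
```

The second instantiation silently re-points read_address for every existing instance, which is exactly the wrong-width read described above.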

Now, either this is a bug in Ruby, or I misunderstood the alias command… and I have to say that, if it's not a bug, alias is non-intuitive compared with a lot of other Ruby code that does just what you'd think it does.

Anyway, thanks to the fact that I started writing testcases, I was able to identify the problem. Tomorrow I'll add some more executables, of different machine types and different OSes (FreeBSD to begin with), so that the testcases cover as much of ruby-elf's code as possible.

Too bad writing testcases for libxine is almost impossible.