Testing beforehand

Things need to be tested by developers before they are ready for public consumption. This is pretty well known in the Free Software development world, but it does not seem to reach all users, especially novices who come from the world of proprietary software, and Windows software in particular.

This is probably because in that world stable software does not really change much by default, and as a result most users tend to rely on experimental software, often in beta state, for their daily work. It is probably also due to how those companies label their software: Apple’s “beta” Safari 4 is mostly stable compared to the older version, although I guess it’s far from complete behind the scenes; a development version of a piece of Free Software, on the other hand, may very well crash so much as to be unusable, since it is made available much sooner.

Similarly, tricks that increase performance are pretty common in software ecosystems like Windows’s, because there is no other way to get better performance (and Microsoft is pretty bad at that; I guess I’ll write about that one day too). At the same time, what we consider tricky in the Free Software world may very well be totally and utterly broken.

Indeed, since I joined Gentoo there have been quite a few tweaks and tricks that are supposed to either improve your runtime performance tenfold or let you compile stuff in a fifth of the time. Some of these turned out to be just stupid copy and paste, while others are outright disinformation, which I have tried to debunk before. On the other hand, I’m the handler of one of the most successful tricks (at least I hope it is so!).

My problem is that, for some users, the important tricks are the ones that the developers don’t speak about. I don’t know why this is; maybe they think that the developers are out to screw them. Maybe they think distribution developers like me are part of a conspiracy that wants to waste their CPU power or something. Or maybe they think we want to look cool by compiling stuff much faster than they do. Who knows? But the point is that none of this is the case, of course.

What I think is that tricks of this kind should really be tested by developers first, so that they don’t end up biting people in their bottoms. One trick that seems to be pretty popular lately is in-memory builds with tmpfs. Now, this is something I really should look into doing myself, too. With 16GB of memory, and with the exception of OpenOffice, this could be quite a useful addition to my tinderbox (if I can save and restore the state quickly, that is).

I do have a problem with users telling people to use this right now, as it is. The first problem is that, given that ccache and distcc usage are handled by Portage, this probably should be, too. The second problem is what the suggestions lack: any identification of the problems. Because, mind you, what you’re doing is not just building in memory, it’s also building on tmpfs!

By itself, tmpfs does not have any particular bugs that might hinder builds, but it has one interesting feature: sub-second timestamps. These are also available on XFS, so saying that Gentoo does not support building on tmpfs (because it increases the build failure rate) is far from the truth, as we do support XFS builds pretty well. Unfortunately neither users nor, I have to say, developers know about this detail. You can see the difference here:

flame@yamato ~ % touch foobar /tmp/foobar
flame@yamato ~ % stat -c '%y' foobar /tmp/foobar 
2009-06-02 04:04:14.000000000 +0200
2009-06-02 04:04:21.197995915 +0200

How this relates to builds is easy to understand if you know how make works: by tracking the mtime of dependencies and targets. If the timestamps don’t follow in the right sequence, the build may break or enter infinite loops (as in the case of smuxi some time ago), and this happens much more easily when the resolution of mtime is finer than a second: if the timestamp stops at whole seconds, any command that takes less than a second leaves no visible difference between the files, while with sub-second timestamps every such step counts.
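The comparison can be sketched in a few lines of shell. This is only an illustration of the rule "rebuild when a dependency is strictly newer than the target", not make's actual code; the function name is_stale and the timestamp values are made up for the example.

```shell
# is_stale TARGET_NS DEP_NS RESOLUTION_NS
# Hypothetical helper: succeeds when the dependency looks strictly newer
# than the target once both mtimes (given in nanoseconds) are truncated
# to the resolution the filesystem stores.
is_stale() {
    is_target=$(( $1 / $3 ))
    is_dep=$(( $2 / $3 ))
    [ "$is_dep" -gt "$is_target" ]
}

target=21100000000   # target written at 21.1 s
dep=21600000000      # dependency touched again at 21.6 s

# Whole-second timestamps: both truncate to 21, so nothing is rebuilt.
if is_stale "$target" "$dep" 1000000000; then echo "1s: rebuild"; else echo "1s: up to date"; fi

# Nanosecond timestamps: the ordering becomes visible.
if is_stale "$target" "$dep" 1; then echo "ns: rebuild"; else echo "ns: up to date"; fi
```

With the whole-second resolution the out-of-order files go unnoticed; with nanosecond resolution the same build tree suddenly asks for a rebuild.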

I have already written a few posts about fixing make in my “For a Parallel World” series; most of them are useful for fixing this kind of issue too, so you might want to refer to those.

Finally, there are other things you should probably know when thinking about using tmpfs to build straight in memory. One of these is that, by default, gcc already does part of its work in memory. Indeed, the -pipe compiler flag that almost everybody has in their CFLAGS variable tells the compiler just that: to pass temporary data between stages through pipes in memory rather than through temporary files, feeding the assembler directly, for instance. While the temporary data kept in the build directory and the data kept in memory by -pipe are not the same thing, if you’re limited on memory you could probably just try to disable -pipe and leave the compiler to use temporary files on the in-memory filesystem.

But sincerely, I think there would be a much greater gain if people started to help out with fixing parallel make issues; compiling with just one core can get pretty tiresome even on a warbox like Yamato. This is the case for Ardour, for instance, because scons is not currently called with a job option to build in parallel. Unfortunately, the last time I tried to declare a proper variable for the number of parallel jobs, so that it didn’t have to be hackishly extracted from MAKEOPTS, the issue stalled on gentoo-dev in bikeshed arguments over the name of the variable.
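To give an idea of what "hackishly extracted" means, here is a minimal sketch of such an extraction. The function jobs_from_makeopts is hypothetical, not something Portage or any eclass provides under that name, and it only understands the common -jN, -j N, --jobs=N and --jobs N spellings; a bare -j (unlimited jobs in make) simply falls back to 1.

```shell
# Hypothetical helper: print the job count found in a MAKEOPTS-style
# string, or 1 when none is given. The last job option wins.
jobs_from_makeopts() {
    jfm_jobs=1
    jfm_expect=no
    # Unquoted on purpose: split the string into words.
    for jfm_word in $1; do
        if [ "$jfm_expect" = yes ]; then
            jfm_expect=no
            case $jfm_word in
                ''|*[!0-9]*) ;;                      # not a number: bare -j, keep going
                *) jfm_jobs=$jfm_word; continue ;;   # "-j 12" style
            esac
        fi
        case $jfm_word in
            -j|--jobs)     jfm_expect=yes ;;
            -j[0-9]*)      jfm_jobs=${jfm_word#-j} ;;
            --jobs=[0-9]*) jfm_jobs=${jfm_word#--jobs=} ;;
        esac
    done
    echo "$jfm_jobs"
}

jobs_from_makeopts "-j8 -l4"   # prints 8
```

An ebuild-like script could then pass the result on, for instance as scons -j"$(jobs_from_makeopts "$MAKEOPTS")". That every build system needs its own copy of this parsing is exactly why a dedicated variable would be nicer.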

On the other hand, this “trick” (if you want to call it that) could be a nice way to start, given that lots of parallel make issues also appear with tmpfs and XFS (the timestamps might go backward). I think I remember ext4 having an option to enable sub-second timestamps; maybe developers should start by setting up their devbox with that enabled, or with XFS, so that the issues can be found even by those who don’t have enough memory to afford in-memory builds.
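To see what a given build directory actually offers, something along these lines should do. It is a rough sketch that assumes GNU coreutils stat (the same tool used in the session above); check_build_dir is a made-up name, not an existing tool.

```shell
# Hypothetical helper: print the filesystem type of a directory and
# whether a file created there gets a sub-second mtime.
check_build_dir() {
    cbd_dir=${1:-.}
    cbd_probe=$(mktemp "$cbd_dir/mtime-probe.XXXXXX") || return 2
    # stat -f reports on the filesystem; %T is its type in human-readable form.
    cbd_type=$(stat -f -c '%T' "$cbd_dir")
    # %y prints e.g. "2009-06-02 04:04:21.197995915 +0200";
    # keep only the digits after the dot.
    cbd_frac=$(stat -c '%y' "$cbd_probe" | sed -e 's/^[^.]*\.//' -e 's/ .*$//')
    rm -f "$cbd_probe"
    # A single probe could in theory land on exactly .000000000, so a
    # "whole-second" answer is worth double-checking by running it again.
    case $cbd_frac in
        *[1-9]*) echo "$cbd_type: sub-second mtime" ;;
        *)       echo "$cbd_type: whole-second mtime" ;;
    esac
}

check_build_dir /tmp
```

Running it against the directory where builds actually happen tells you whether timestamp-ordering bugs have a chance to show up on that box at all.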

Further reading: Mike’s testing with in-memory builds for FATE.