Converting Unpaper to Meson

You may remember that I took over Unpaper a number of years ago, as the original author was not interested in maintaining it in the long term. One of the first things I did was replace the hand-rolled build system with Autotools, as that made packaging it significantly simpler. That was followed by replacing the image decoding with libav first, and more recently with FFmpeg.

For various reasons, I have not spent much time on Unpaper over the years I spent in my last bubble. When I joined my current employer, I decided that I cared more about turning Unpaper back into a project people can contribute to, and less about keeping the copyright of my contributions; that makes it easier for me to work on it without having to carve out special time for it.

One of the main requests on the GitHub project over these years has been Windows support. And I did say that Windows 10 is becoming increasingly interesting for Free Software development. So when I was asked to rant a bit about build systems, which I did over on Twitch, I decided to take a stab at figuring out whether Meson, which supports Windows natively, would be a good fit. And I did that using Visual Studio Code and Debian/WSL!

If you haven’t seen the video yet, spoiler alert: I tried, got it working within ten minutes, and it worked like a charm. I’m not kidding you, it pretty much worked on the first attempt (well, the first session, not the first execution), and the way it works made total sense. You can tell that the folks involved in building Meson (including Gentoo’s own Nirbheek!) knew what they had set out to do and how to make it all fit together. Even small bits, like large file support always being enabled, made me a very happy user.

I now have a branch on GitHub for the Meson build, although it’s incomplete. It doesn’t install any of the docs, and it doesn’t install the man page. It also neither builds nor runs the tests. For all of those I created a project to track what needs to be done: move on from the current implementation, it’s 2020!

The test system in the current Autotools version of Unpaper leverages the implementation of make check from Automake, together with a C program that compares the expected outputs with what the newly-built unpaper binary generates. It also needs to allow a threshold of difference between the two, because precision is not guaranteed, in either direction. This is incredibly fragile, and indeed it is currently failing for… not sure which reason. Getting a “diff” of the generated versus expected images in C is fairly hard and deserves its own project. Instead, relying on Python to instrument and run the tests would make it much easier to maintain, as you wouldn’t need to maintain at least three integration points to keep this together.
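The threshold comparison described above can be sketched in C. This is a minimal illustration, not unpaper’s actual comparison program: it assumes both images have already been decoded into raw 8-bit grayscale buffers of identical dimensions, and the function names and parameters (`tolerance`, `max_differing`) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: count the pixels whose values differ by more than
 * `tolerance`. Both buffers are assumed to be raw 8-bit grayscale images
 * of the same dimensions, n bytes each. */
size_t count_differing_pixels(const unsigned char *expected,
                              const unsigned char *actual,
                              size_t n, int tolerance)
{
    size_t differing = 0;
    for (size_t i = 0; i < n; i++) {
        /* Absolute difference, so errors in either direction count. */
        int delta = abs((int)expected[i] - (int)actual[i]);
        if (delta > tolerance)
            differing++;
    }
    return differing;
}

/* Accept the output if no more than `max_differing` pixels are off by
 * more than `tolerance`. */
int images_match(const unsigned char *expected, const unsigned char *actual,
                 size_t n, int tolerance, size_t max_differing)
{
    return count_differing_pixels(expected, actual, n, tolerance)
           <= max_differing;
}
```

Counting pixels outside the tolerance, rather than demanding byte-for-byte equality, is what lets small precision differences pass while still catching real regressions; the fragile part remains choosing the two thresholds.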

The documentation has a similar problem: right now it is split between some Markdown files and a single DocBook (XML) file that is converted to a man page with xsltproc. Once the Python bullet is bitten, there’s no reason not to just use reStructuredText and Sphinx, which already provides integration to generate man pages. And honestly, nowadays I feel like the XML version is definitely not a good source, because it can’t be read by itself.

What I have not done yet is making sure that the Meson build system allows a Windows build of Unpaper. The reason is relatively simple: while I have started using Visual Studio Code, and clang is available for Windows in the form of a binary, and so is FFmpeg, fitting it all together will probably take me more time, as I’m not used to this development environment. If someone is interested in making sure this works out of the box, I’m definitely happy to review pull requests.

So yeah, I guess the Autotools Mythbuster can provide a seal(ion) of approval to Meson. Great work, folks! Happy to see we have a modern build system available that compromises in the right places instead of being too dogmatic!

DVD access libraries and their status

I’ve noted when I posted suggestions for GSoC that we’ve been working on improving the DVD-related libraries that are currently used by most of the open-source DVD players out there: libdvdread and libdvdnav. Together with these, the Other Diego and I have been dusting off libdvdcss as well, which takes care of, well, cracking the CSS protection on DVDs so that you can watch your legally owned DVDs on Linux and other operating systems without going crazy.


Yes, I did take the picture just to remind you all that I do pay for content, so if you find me talking about libdvdcss, it’s not because I’m a piracy apologist or because I’m cheap. Whenever I do resort to piracy, it’s because it’s nigh impossible to get the content legally, as is the case for J-Drama.

Anyway, the work we’ve been pouring into these libraries will hopefully soon come to fruition. On my part, it’s mostly a build system cleanup task: while the first fork, on mplayer, tried to replace autotools with a generic, FFmpeg-inspired build system, the results were abysmal enough that we decided to go back to autotools (I mean, with me on board, are you surprised?), so now they have a modern, non-recursive, autotools-based build system. Diego and J-B have been cleaning the Windows conditionals out of the code itself, and Rafaël has now started cleaning up libdvdnav’s own code.

One of the interesting parts of all this is that the symbol table exposed by the libraries does not really match what the headers themselves declare. You can easily see this by using exuberant-ctags – part of dev-util/ctags – to produce the list of symbols declared in a set of header files:

% exuberant-ctags --c-kinds=px -f - /usr/include/dvdread/*.h
DVDClose        /usr/include/dvdread/dvd_reader.h       /^void DVDClose( dvd_reader_t * );$/;"  p
DVDCloseFile    /usr/include/dvdread/dvd_reader.h       /^void DVDCloseFile( dvd_file_t * );$/;"        p
DVDDiscID       /usr/include/dvdread/dvd_reader.h       /^int DVDDiscID( dvd_reader_t *, unsigned char * );$/;" p
DVDFileSeek     /usr/include/dvdread/dvd_reader.h       /^int32_t DVDFileSeek( dvd_file_t *, int32_t );$/;"     p
DVDFileSeekForce        /usr/include/dvdread/dvd_reader.h       /^int DVDFileSeekForce( dvd_file_t *, int offset, int force_size);$/;"  p
DVDFileSize     /usr/include/dvdread/dvd_reader.h       /^ssize_t DVDFileSize( dvd_file_t * );$/;"      p
DVDFileStat     /usr/include/dvdread/dvd_reader.h       /^int DVDFileStat(dvd_reader_t *, int, dvd_read_domain_t, dvd_stat_t *);$/;"    p
DVDISOVolumeInfo        /usr/include/dvdread/dvd_reader.h       /^int DVDISOVolumeInfo( dvd_reader_t *, char *, unsigned int,$/;"       p
DVDOpen /usr/include/dvdread/dvd_reader.h       /^dvd_reader_t *DVDOpen( const char * );$/;"    p
DVDOpenFile     /usr/include/dvdread/dvd_reader.h       /^dvd_file_t *DVDOpenFile( dvd_reader_t *, int, dvd_read_domain_t );$/;"        p
DVDReadBlocks   /usr/include/dvdread/dvd_reader.h       /^ssize_t DVDReadBlocks( dvd_file_t *, int, size_t, unsigned char * );$/;"      p
DVDReadBytes    /usr/include/dvdread/dvd_reader.h       /^ssize_t DVDReadBytes( dvd_file_t *, void *, size_t );$/;"     p
DVDUDFCacheLevel        /usr/include/dvdread/dvd_reader.h       /^int DVDUDFCacheLevel( dvd_reader_t *, int );$/;"      p
DVDUDFVolumeInfo        /usr/include/dvdread/dvd_reader.h       /^int DVDUDFVolumeInfo( dvd_reader_t *, char *, unsigned int,$/;"       p
FreeUDFCache    /usr/include/dvdread/dvd_udf.h  /^void FreeUDFCache(void *cache);$/;"   p

You can then compare this list with the content of the library by using nm:

% nm -D --defined-only /usr/lib/
0000000000004480 T DVDClose
0000000000004920 T DVDCloseFile
0000000000005180 T DVDDiscID
0000000000004e60 T DVDFileSeek
0000000000004ec0 T DVDFileSeekForce
0000000000005120 T DVDFileSize
00000000000049c0 T DVDFileStat
000000000021e878 B dvdinput_close
000000000021e888 B dvdinput_error
000000000021e898 B dvdinput_open
000000000021e890 B dvdinput_read
000000000021e880 B dvdinput_seek
000000000000f960 T dvdinput_setup
000000000021e870 B dvdinput_title
0000000000005340 T DVDISOVolumeInfo
0000000000003fa0 T DVDOpen
0000000000004520 T DVDOpenFile
0000000000004d80 T DVDReadBlocks
0000000000004fa0 T DVDReadBytes

Without going into further detail, I can tell you that there are two functions that should be exported but are not, while the dvdinput_ series, which should never have been exposed, is. So there are a few things to fix there for sure.
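Comparing the two lists by eye is error-prone. Once the names are extracted and sorted (for instance with `LC_ALL=C sort`, so that the order matches `strcmp()`), a single merge-style walk finds the symbols that appear on only one side. The sketch below is a hypothetical helper, not part of any of the DVD libraries: calling it with the header list first finds declared-but-not-exported symbols, and with the `nm` list first finds exported-but-undeclared ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collect the names present in `a` but absent
 * from `b`. Both arrays must be sorted in strcmp() order. Matches are
 * written to `out` (which must hold up to `na` entries) and the number
 * of matches is returned. */
size_t only_in_first(const char *const *a, size_t na,
                     const char *const *b, size_t nb,
                     const char **out)
{
    size_t i = 0, j = 0, count = 0;

    while (i < na) {
        /* Once b is exhausted, every remaining a[i] is missing from it. */
        int cmp = (j < nb) ? strcmp(a[i], b[j]) : -1;
        if (cmp < 0) {          /* a[i] cannot appear later in sorted b */
            out[count++] = a[i++];
        } else if (cmp > 0) {   /* b[j] is not in a: skip it */
            j++;
        } else {                /* present in both lists */
            i++;
            j++;
        }
    }
    return count;
}
```

Because both inputs are sorted, this runs in a single linear pass instead of searching one list for every entry of the other.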

As I said before, my personal preference would be to merge libdvdread and libdvdnav again (they were split a long time ago, as some people didn’t need or want the menu support); if it wasn’t for obvious legal issues I would merge libdvdcss as well, but that’s a different story. I just need to find the motivation to look into the reverse dependencies of these two libraries and see whether the interface exposed between the two is ever used; if it isn’t, it might be possible to reduce their surface as well.

Yes, this would be a relatively big change for a relatively small gain; on the other hand, it might be worth shipping this as a new library, installable side by side with the old ones, that can be used preferentially, falling back to the old ones if it is not present. And given the staleness of the code, I wouldn’t really mind having to go through testing from scratch at this point.

Anyway, at least the build system of the three libraries will soon look similar enough that they seem to be part of the same project, instead of each going its own way — among other things the ebuilds for the three should look almost entirely identical, in my opinion, so that should be a good start.

If you want to contribute, given that the only mailing list we have on videolan is for libdvdcss, you can push your branches to Gitorious thanks to the VideoLAN mirror and from there just contact somebody in #videolan on Freenode to get it reviewed/merged.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, which shut the service down. So there is no VideoLAN mirror there, or anything else, anymore.

For A Parallel World: Parallel building is not passé

It’s been a while since I last wrote about parallel building. This has only to do with the fact that the tinderbox hasn’t been running for a long time (I’m almost done setting up the new one!), and not with the many people who complained to me that getting parallel build systems to work is a waste of time.

This argument has been bolstered by the presence of a --jobs option in Portage, with its proponents insisting that the future will have Portage building packages in parallel, so that the whole process takes less time, rather than shortening the single build time. I said before that I didn’t feel like it was going to help much, and now I definitely have some first-hand experience to tell you that it doesn’t help at all.

The new tinderbox is a 32-way system: it has two 16-core CPUs, and enough RAM for each of them; you can easily build with 64 processes at once, but I’m actually trying to push it further by using the unbound -j option (this is not proper, I know, but still). While this works nicely, we still have too many packages that force serial building due to broken build systems, and a few, such as lynx, that break under these conditions even though they would very rarely break on systems with just four or eight cores.

I then tried, during the first two rebuilds of world (one to set my choices in USE flags and packages, the other to build it hardened), running with five jobs in parallel… Between the issue of the huge system set (yes, that article is 4.24 years old) and the fact that it’s much more likely to have many packages depending on one than one depending on many, this still does not saturate the CPUs if each package is still built serially.

Honestly, seeing such a monstrous system take as long as my laptop, which has a quarter of its cores and a quarter of its RAM, to build the basic system was a bit… appalling.

The huge trouble seems to be with packages that don’t use make but that could, under certain circumstances, perform parallel building. The main problem there is that we still don’t have a variable that tells us exactly how many build jobs to start, so we rely on the MAKEOPTS variable instead. Some ebuilds actually try to parse it to extract the number of jobs, but that fails with configurations such as mine. I guess I should propose such a variable for the next EAPI version… then we might actually be able to use it in the Ruby eclasses to run tests in parallel, which would make testing so much faster.
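To show why parsing MAKEOPTS is fragile, here is a minimal C sketch of a parser that at least distinguishes “no -j given” from an unbounded `-j`. The function name and return convention are hypothetical (this is not a Portage or EAPI interface), and clustered short options such as `-sj4` are deliberately not handled, which is exactly the kind of gap ad-hoc parsers in ebuilds tend to have.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True if s[0..len) is a non-empty run of decimal digits. */
static int all_digits(const char *s, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: job count from a MAKEOPTS-style string.
 *   N  for "-jN", "-j N", "--jobs=N" or "--jobs N"
 *   0  for an unbounded "-j" or "--jobs" with no number
 *   1  when no jobs option is present (make's default)
 * As with make, the last option wins. "-sj4" is NOT recognised. */
int makeopts_jobs(const char *makeopts)
{
    int jobs = 1;
    const char *p = makeopts;

    while (*p) {
        /* Split on whitespace into tokens. */
        while (isspace((unsigned char)*p))
            p++;
        const char *tok = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - tok);
        if (len == 0)
            break;

        int bare;          /* option given without an attached number */
        const char *num;
        if (len >= 2 && tok[0] == '-' && tok[1] == 'j') {
            bare = (len == 2);
            num = tok + 2;
        } else if (len == 6 && strncmp(tok, "--jobs", 6) == 0) {
            bare = 1;
            num = NULL;
        } else if (len > 7 && strncmp(tok, "--jobs=", 7) == 0) {
            bare = 0;
            num = tok + 7;
        } else {
            continue;      /* some other make option */
        }

        if (!bare) {
            if (all_digits(num, len - (size_t)(num - tok)))
                jobs = atoi(num);
            continue;
        }

        /* Bare option: the count, if any, is the next token. */
        const char *q = p;
        while (isspace((unsigned char)*q))
            q++;
        const char *next = q;
        while (*q && !isspace((unsigned char)*q))
            q++;
        if (all_digits(next, (size_t)(q - next))) {
            jobs = atoi(next);
            p = q;         /* consume the number */
        } else {
            jobs = 0;      /* unbounded, like a plain "make -j" */
        }
    }
    return jobs;
}
```

Returning 0 for an unbounded `-j` forces the caller to decide what “unlimited” means for a non-make tool, instead of silently treating it as a serial build.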

Speaking about parallel testing, the next automake major release (1.13; 1.12 was released but it’s not in tree yet, as far as I can tell) will execute tests in parallel by default; this was optional starting with 1.11, and now it’s going to be the default (you can still opt out, of course). That’s going to be very nice, but we’ll also have to change our src_test defaults, which still use emake -j1, forcing serialisation.

Speaking of which, even if your package does not support parallel testing, you should use parallel make, at least with automake, to call make check: the check target also builds the tests’ utilities and units, and that build can be sped up a lot by running it in parallel, especially for test frameworks that rely on a number of small units instead of one big executable.

Thankfully, as of today two more packages are fixed to build in parallel: Lynx (which goes down from 110 to 46 seconds to build!) and Avahi (which I fixed so that it installs fine in parallel).

Cross-compilation and pkg-config

As it happens – and as I noted yesterday – one of my current gigs involves cross-compilation of Gentoo packages; this should also be obvious to those monitoring my commits. Sometimes this involves simply fixing the ebuilds; other times I need to go upstream and fix the original project’s build system. Lately, I hit a number of the latter cases.

In the previous post, I noted that I fixed upower’s build system so that it no longer chooses the backend based on the presence of files on the build machine. Somehow, checking for the presence of files in the system to decide whether to enable or install stuff seems to be the thing to do for build systems; a similar situation happened with SystemTap, which tries to identify the prefix of NSS and Avahi headers by checking for the /usr/include/nss and /usr/include/avahi-common directories, among others. Or, I should have said, tried, since luckily my two patches to solve those issues were also merged today.

Another package I had to fix for cross-compilation recently was ecryptfs-utils, and again NSS is involved. This time rather than testing the presence of files, the build system did almost the right thing, and used the nss-config script that ships with the package to identify which flags and libraries to use. While it is a step in the right direction, compared to SystemTap’s, it is not yet the real right thing to do.

What is the problem? When you’re cross-compiling, simply calling nss-config will not do: the build machine’s script would be used, which will report the library paths of that system and not your target, which is what you really want. How do you solve this trouble then? Simple, you use the freedesktop-developed pkg-config utility, that can be easily rigged to handle cross-compilation nicely, even though it is not as immediate as you’d like, and if you ever tried using it in a cross-compilation environment, you probably remember the problems.

It is for this reason that I started writing cross-compile documentation for pkg-config in my Autotools Mythbuster quasi-book. Speaking of which, I’d like to find some time to focus on writing more sections of it. To do so, though, my only chance is likely going to be to take some vacation and book a hotel room while I work on it: the former because right now my gigs tend to occupy my whole day, leaving me with little to no free time, and the latter because at home I have too many things to take care of to actually focus on writing when I’m not working.

At any rate, please, people, make sure you support pkg-config rather than relying on custom scripts that are not really cross-compilable. It is the best thing you can do, trust me on this.

Buildsystem quirks: now you know why you don’t rely on uname

It is difficult to be angry at Linus for anything, on account of how much good he did, and is doing, for us with the whole idea of having Linux – it’s not like waiting for a GNU kernel would have helped – but honestly I feel quite bothered that he ended up deciding to bump the kernel’s version number as if it were just a matter of public relations, without considering the technical side of it: software relying on version numbers being, you know, meaningful. Which, honestly, reminded me of something, but let’s not go there.

Let’s ignore the fact that kernel modules started to fail — they would probably be failing because of API changes without the need of anything as sophisticated as a version bump to 3. And let me be clear on one thing at least: it’s not the build failures that upset me – as I said last year I prefer it when packages fail at build time, rather than, subtly, at runtime.

At any rate, let’s begin with the first reason why you should not rely on uname results: cross-compilation. Like it or not, cross-compilation is still a major feature of a good build system. Actually, my main sources of revenue in the past five years have involved at least some kind of cross-compilation, and not just for embedded systems, which is why I stress so often the importance of having a build system that handles cross-compilation well.

So what happens with build systems and Linux 3? Well, let’s just say that if you try to distinguish between Linux 2.4 and 2.6, you should not check for major == 2 && minor >= 4 or something along those lines. A variation of this is what happens with ISC DHCP, which didn’t consider any major version besides 2.
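Where a version comparison really is unavoidable (for example at runtime, since at build time the advice here is not to use uname at all), comparing the (major, minor) pair as a whole avoids the trap. The kernel’s own `KERNEL_VERSION()` macro in `<linux/version.h>` follows the same idea by packing the components into a single integer. The function below is a hypothetical, minimal sketch.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the "major.minor[...]" release
 * string is at least want_major.want_minor, 0 if it is older, and -1
 * if it cannot be parsed. The pair is compared as a whole, so a new
 * major number (2 -> 3) does not break the check. */
int release_at_least(const char *release, int want_major, int want_minor)
{
    int major, minor;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;   /* unknown format: report it rather than guess */
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

A test written as `major == 2 && minor >= 6` answers “no” for 3.0; the pairwise comparison answers “yes”, and an unparseable string is reported as such instead of silently selecting a fallback code path.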

Now in the case of DHCP, it’s a simple failure, because the build system is refusing to build at all when not understanding the uname results, but there are a number of other situations where this is not as clear, because the test is to enable features, or backends, or special support for Linux version 2.6, and hitting an unknown version leads to generic (or even worse, dummy!) code to be built. These situations are almost impossible to identify without actually using the software itself; even the tinderbox testing is useless most of the time with these situations, as the tests are probably also throttled down not to hit the Linux-specific codepaths.

And don’t worry, there are enough badly designed build systems that this is not a random, virtual risk. You could take the build system for upower (and devicekit-power) before today: it decided which of its few backends to enable by checking for the presence of some header files on the system used for the build – which by itself hinders cross-compilation – and if it found no known combination of files, it built in the dummy backend. For the curious, today I sent a patch – that Richard Hughes applied right away, thanks Richard! – for upower to choose which backend to build based on the $host value that is handed over by autoconf, which finally makes it cross-compilable without passing extra parameters to ./configure (even though the override is still available, of course).
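In configure.ac the host-based choice is normally a `case "$host" in ... esac` block; the C function below only makes the decision logic explicit. The backend names and triplet fragments are illustrative assumptions, not upower’s actual code. An explicit override always wins, and an unrecognised host returns `NULL` so that the caller can fail loudly instead of quietly building a dummy backend.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: pick a backend from the autoconf $host triplet
 * (cpu-vendor-os) instead of probing the build machine's filesystem. */
const char *select_backend(const char *host, const char *override)
{
    static const struct {
        const char *os_fragment;   /* substring matched in the triplet */
        const char *backend;
    } table[] = {
        { "-linux",   "linux" },
        { "-freebsd", "freebsd" },
        { "-openbsd", "openbsd" },
    };

    if (override != NULL)          /* user choice always wins */
        return override;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strstr(host, table[i].os_fragment) != NULL)
            return table[i].backend;
    return NULL;                   /* unknown host: caller should error out */
}
```

Since `$host` describes the target rather than the build machine, the same logic gives the right answer whether or not you are cross-compiling.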

How long will it take for all the bugs to be sorted out? I’m afraid it’s impossible for me to give you an answer. We might end up finding more bugs a year from now, just as we might not find any in the next six months; unfortunately not all projects update at the same pace. I have an example of that in front of my eyes: my laptop’s HSDPA modem, which includes a GPS module, is well supported by the vanilla kernel as far as the network connection is concerned… but at the same time, GPS support is still lacking. While there is a project to support these cards, the userland GPS driver still relies on HAL rather than the new udev, which makes it quite useless as far as I’m concerned.

So anyway, next time you write a build system, do not even consider uname… and if your build system relies on it, please fix it. And if you write an ebuild that relies on aleatory data such as uname results or the presence of given files (which differs from given headers, be warned!), make sure that there is an override, and use it.

Forking unpaper, call for action

You might or might not know the tool by the name of unpaper that has been in Gentoo’s main tree for a while. If you don’t know it and you scan a lot, please go look at it now, it is sweet.

But sweet or not, the tool itself had quite a few shortcomings; one of these was recently brought to my attention: an unsafe use of sprintf() that upstream fixed after the 0.3 release, but that never made it into a release.
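The usual fix for an unsafe `sprintf()` is to switch to `snprintf()` and check its return value for truncation. The helper below is a hypothetical example (the name, the `%s-%03d.pbm` pattern and the return convention are made up for illustration), not the actual change made in unpaper.

```c
#include <stdio.h>

/* Hypothetical helper: build an output name such as "scan-007.pbm".
 * snprintf() never writes more than `size` bytes (including the
 * terminating NUL), and returns the length the full string would have
 * had, which tells us whether the result was truncated.
 * Returns 0 on success, -1 on error or truncation. */
int build_filename(char *buf, size_t size, const char *prefix, int page)
{
    int needed = snprintf(buf, size, "%s-%03d.pbm", prefix, page);

    if (needed < 0 || (size_t)needed >= size)
        return -1;   /* encoding error or output did not fit */
    return 0;
}
```

Even on truncation the buffer stays NUL-terminated (for a non-zero size), so the failure is detectable rather than a silent overflow.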

When looking at fixing that one issue, I ended up deciding on a slightly more drastic approach: I forked the code, imported it to GitHub and started hacking at it. I did this both because the package lacked a build system, and because the tarball provided didn’t correspond with the sources on CVS (nor with those on SVN, for what it’s worth).

For those who wonder why I got involved in this, even though it is obviously outside my usual area of interest: I’m using unpaper almost daily in my paperless quest, which is actually bearing fruit (my accountant is pleasantly surprised by how little time it takes me to find the paperwork he needs). And if I can shave even fractions of a second from a single unpaper run, it can improve my workflow considerably.

What I now have in my repository is an almost identical version that has gone through some improvements: the build system is autotools (properly written), which works quite fine even for a single-source package, as it can find a couple of features that would otherwise be ignored. The code no longer has the allocation waste it had before, as I’ve replaced a number of character pointers with preprocessor macros, and I started looking at a few strange things in the code.

For instance, it no longer opens the file, seeks to the end, then rewinds to the start to find the file’s size. That was especially unhelpful since the variable where the size was saved was never read, but the stdio calls have side effects, so the compiler couldn’t drop them by itself.
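The fix in this case was simply to drop the dead code. Where a size genuinely is needed, one option on POSIX systems is `fstat()` on the stream’s descriptor, which leaves the stream position alone. This is a minimal sketch with a hypothetical function name; a stream that has been written to must be flushed before asking for its size.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size in bytes of an open stdio stream, or -1 on
 * error. fstat() asks the kernel directly, so no fseek()/ftell()/
 * rewind() round trip is needed and the stream position is untouched.
 * Data still sitting in the stdio buffer is not counted: fflush() first. */
long long stream_size(FILE *f)
{
    struct stat st;

    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```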

And when it is present, it will use sincosf() rather than calling sin() and cos() separately.

I also stopped the code from copying a string from a prebuilt table and parsing it at runtime to get the represented float value… multiple times. This was mostly tied to the page size parsing, which I have basically rewritten, also avoiding looping twice over the two sizes with two separate loops. Duh!
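The general idea behind such a rewrite is to store dimensions as numbers in a table and look them up with one loop. The table below is hypothetical: the millimetre values are the standard ISO and US paper sizes, but the names, layout and units are assumptions, not unpaper’s actual data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical page-size table: dimensions are stored as numbers, so
 * nothing needs to be re-parsed from strings at runtime, and width and
 * height live in the same entry, so one loop finds both. */
struct page_size {
    const char *name;
    float width_mm;
    float height_mm;
};

static const struct page_size page_sizes[] = {
    { "a5",     148.0f, 210.0f },
    { "a4",     210.0f, 297.0f },
    { "a3",     297.0f, 420.0f },
    { "letter", 215.9f, 279.4f },
    { "legal",  215.9f, 355.6f },
};

/* Return the matching entry, or NULL for an unknown name. */
const struct page_size *lookup_page_size(const char *name)
{
    for (size_t i = 0; i < sizeof page_sizes / sizeof page_sizes[0]; i++)
        if (strcmp(page_sizes[i].name, name) == 0)
            return &page_sizes[i];
    return NULL;
}
```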

I also originally overlooked the fact that the repository had some predefined self-tests that were never packaged, and thus couldn’t be used for custom builds before; this is also fixed now, and make check runs the tests just fine. Unfortunately, what this does not do is compare the output with some known-good output, since I need an image compare tool to do so; for now it only ensures that unpaper behaves as expected with the command line it is provided, which is better than nothing.

At any rate this is obviously only the beginning: there are bugs open on the berlios project page that I should probably look into fixing, and I have already started writing a list of TODO tasks that should be taken care of at some point or another. If you’re interested in helping out, please clone the repository and see what you can do. Testing is also very much appreciated.

I haven’t decided when to do a release; for now I’m hoping that Jens will join the fork and publish the releases on berlios based on the autotools build system. There’s a live ebuild in the main tree for testing (app-text/unpaper-9999), so I’d be grateful if you could try it on a few different systems. Please enable FEATURES=test for it so that if something breaks we’ll know soon enough. If you’re a maintainer packaging unpaper on other distributions, feel free to get in touch with me and tell me if you have other patches to provide (I should mail the Debian maintainer at this point, I guess).

Gold readiness obstacle #4: libtool (part 1)

In my current series of posts about gold, this time I’m presenting you with a two-parter that shows how GNU libtool is causing further problems with this new link editor. I split this into two parts because it hits two different issues: one is a “minor inconvenience” due to its design, and the other is a known bug.

You might remember me distinguishing between two schools of build configuration systems: one relying on tests being compiled and executed, and the other relying on knowing intimate details about the workings of the tools to be used with it. Autotools are, for the most part, designed to fall squarely into the former category, with a number of advantages but also disadvantages, the most obvious of which is the slowness of the ./configure process.

This is true of both autoconf and autoconf macros; on the other hand, libtool works largely as a repository of knowledge, knowing rules about various operating systems, link editors and compilers. This shouldn’t surprise anybody: it already spends way more time than one would like discovering how to build shared libraries, and if it were a pure guessing game, it would probably be unbearable to use.

Unfortunately this makes libtool vulnerable to the main issue of knowledge-repository systems, with a bad twist: not only does it become outdated as new operating systems, link editors and compilers are released, but due to the autotools philosophy of not requiring the tools themselves at build time, simply updating the system copy of libtool is not enough. For details about this statement see my previous post on libtool, from which I still haven’t had time to distil documentation for my guide.

What this boils down to is simple yet scary: even though libtool properly implemented support for gold starting from version 2.2.7 – which actually only means that it now knows that gold supports anonymous versions in linker scripts – any package whose autotools had been built with older versions wouldn’t support gold out of the box. Luckily for us, this isn’t an excessively invasive issue: anonymous versioning, as far as I can tell, is only used when using libtool to export and hide symbols which for good or bad is not used that often.

To make sure that projects know about gold’s support for the feature, you’re left with two solutions. The former is the obvious one of rebuilding autotools; while this is often necessary for other reasons, it isn’t that good an idea, because it wastes time. I have other notes about rebuilding autotools, but I’ll skip over them for now. The latter is a method we already have in Gentoo: the elibtoolize function. This function applies a number of patches over an already-generated autotools build system, to fix common issues (mostly due to libtool) without having to rebuild autotools altogether.

My first encounter with this interface was during my early Gentoo/FreeBSD work. One of the things libtool “knows” about is that FreeBSD does not like two-part sonames, which turned many minor version bumps, which should have kept the same or a compatible ABI, into link-breaking updates. Since we weren’t aiming for binary compatibility with the original FreeBSD, I went out of my way to patch libtool so that it would use the Linux naming scheme instead. Given that I already knew how to set it up, it wasn’t that difficult to add one extra patch for when gold is used as the link editor of choice.

Unfortunately, having worked with that before also means I know the bad side of elibtoolize: too many packages do not use it at all. This is partly because developers do not know about the need for it (it applies, among others, the --as-needed compatibility patch), and partly because they think that it causes autotools to be rebuilt, rather than preventing the requirement for it.

While other obstacles get solved, this one is probably going to bother us for months, if not years, to come. I don’t expect a resolution anytime soon, given that the idea of a comprehensive, versioned system for autopatching packages, which was requested or proposed many times, never came to fruition. Maybe I’ll try pushing for it for the next GSoC (I note that I only seem to participate in the even-year editions).

Parameters to ./configure can deceive

As much as I’d like this not to be the case, one of the most obnoxious problems with autotools is that not only is there little consistency between packages in the use of ./configure parameters, but even when there is, the parameters themselves can deceive, either because of their very names or because one package differs in how it uses them.

This is the reason why I dislike autotools-utils.eclass’s idea of wiring the debug USE flag to --enable-debug: that option is often implemented badly, if at all, and is not used consistently across projects. It might enable debug-specific code, it might be used to turn off assertions (even though AC_HEADER_ASSERT already provides a --disable-assert option), it might add debug symbol information (-g and -ggdb) or, very bad!, it might fiddle with optimizations.

One of the most-commonly deceiving options is --enable-static, which is provided by libtool, and thus used in almost any autotools-based build system. What this does is tell libtool to build static archives (static libraries) for the declared libtool targets, but enough people, me included when I started, assumed it was going to tell the build system to build static binaries, and wired it to the static USE flag, which was simply wrong. Indeed, most of the static USE flags are wired to append -static to LDFLAGS instead, while --enable-static is tied to static-libs, when it does make sense — in a lot of cases, building the static libraries doesn’t make sense; for details see my other post where I make some notes about internal convenience libraries.

Unfortunately, while libtool-based packages are forced into consistency on this parameter, nothing stops it from being used inconsistently by other, non-libtool-based projects (which intend it to mean “make the final executable static”) and, ironically, by libtool itself. Indeed, if you pass the --disable-static option to ./configure for libtool itself, what you’re actually asking for is a libtool script that is unable to build static libraries altogether. This doesn’t come out of the blue, though; it is actually fairly consistent within libtool itself. But to understand this you need to take a step back.

Autoconf, automake, and libtool are all designed not to add extra dependencies to build a package, unless you change the autotools sources themselves. The fact that for a distribution such as Gentoo this is so common that building a system without autotools is near impossible is a different story altogether. Following this design, when a package uses autotools and libtool, it cannot rely on the presence of /usr/bin/libtool; what it does instead is create its own libtool script, using the same macros used by the libtool package. Incidentally, this is the reason why we can’t just patch the libtool package to implement features and work around bugs, but usually have to add a call to elibtoolize in the ebuild.

So, from a behaviour point of view, the --disable-static option does nothing more than generating a script that is unable to build static libraries, and from there you get packages without static libraries, when they don’t use the system libtool script.

On the other hand, the system libtool script is still used by some packages, one of which turns out to be lua. Lua’s build system is quite messed up: it relies on creating a static liblua archive, from which just a subset of the sources is used to build the lua compiler, and to do so it uses the system libtool script. If your sys-devel/libtool package is built with --disable-static, though, you get failures that took me a while to debug when they hit me in the chroot I use to build packages for my vservers (lua ended up there through ModSecurity 2.6’s automagic dependency on it, which I haven’t had time to work on yet).

What’s the moral of the story? Do not assume that two packages share the same meaning for a given option unless they come from the same developer, I guess, and even then, check. And don’t pass random parameters to packages in eclasses or generic settings files just because it seems like a good idea; you might end up debugging some random failure.

P.S.: I’ll try to find time to write something about libtool’s system script issue on Autotools Mythbuster over the weekend; last week I added some notes about pkg-config and cross-compiling due to work I’ve been doing for one of my contract jobs.

Endless recursion

I’m not referring to the usual programming mistake, but to the endless problem of build systems designed to run recursively, with one Makefile (or equivalent rule file) in each directory and a top-level one tying them together.

Both within my “duties” in Gentoo and outside of them, I have ended up working heavily on fixing, managing, or at least (ab)using build systems. And since I like my box to perform to its full potential whenever possible, I’ve long maintained that parallel make is the future, to the point that I fixed a number of build systems myself so that they would build fine with parallel make.

But building fine solves only half the problem: unfortunately, a number of build systems are designed to work recursively, as I said above, and you only get a suboptimal parallel build out of them. How suboptimal? Well, I finally have a diagram to give an idea:

Timing diagram of a recursive build with four jobs

In this diagram (modelled after the UML 2.0 new-style timing diagram, which my tool of choice unfortunately doesn’t support, so I had to draw it with Inkscape) you can see the flow of a recursive build using four jobs. Each of the “blocks” is a build job; for ease of understanding, think of the green ones as compiler calls (gcc) and the purple ones as a code generator (such as ragel), while the yellow/orange pair stands for something unrelated to the final output, such as a public header generator or a documentation generator (this distinction matters for the later diagrams, which break these dependencies apart); finally, the blue blocks are the link editor (ld).

Hopefully, the diagram makes clear that the horizontal axis represents the time spent on the build, so the white gaps over the linker calls should be easily read as a sign that something is not performing to its full potential. If it isn’t clear, then I should rethink the usefulness of the new-style timing diagram. The situation gets worse if you increase the job count to seven:

Timing diagram of a recursive build with seven jobs

The problem here is that each recursion ends with a link editor call, which, as we all know by now, is the slowest part of the process (even when using gold). This is the most common situation: a package provides one or more shared libraries that are built and installed in the system, and one utility that makes use of them. There is a positive note, though: when the recursively-built libraries are not installed, libtool is usually smart enough not to run the link editor on them, but only to create an archive file. Unfortunately this doesn’t work too well with build systems like Evolution’s, which force the -module option. Sigh!
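To put rough numbers on the diagrams, here is a minimal Python sketch of the recursive case. Everything in it is an assumption for illustration: a made-up project with two libraries and one program (directory names, job counts and durations are invented), compile jobs of equal length, and a make that runs each directory’s link step alone and only then recurses into the next directory. It reports the total build time and the idle slot-time, that is, how much of the -jN capacity went unused.

```python
import math

# Hypothetical project, numbers invented for illustration:
# (directory, compile jobs, time per compile, time for the link step)
SUBDIRS = [
    ("lib/a", 6, 2, 4),
    ("lib/b", 6, 2, 4),
    ("src", 4, 2, 3),
]


def recursive_build(subdirs, jobs):
    """Model `make -jN` recursing into one directory at a time.

    Assumes equal-length compile jobs: inside a directory they run in waves
    of at most `jobs`, then the link step runs alone, and make only moves on
    to the next directory once that link is done.
    Returns (total build time, idle slot-time).
    """
    now = 0
    busy = 0
    for _directory, n_compiles, compile_time, link_time in subdirs:
        now += math.ceil(n_compiles / jobs) * compile_time  # parallel compile waves
        now += link_time                                    # serial link step
        busy += n_compiles * compile_time + link_time
    return now, jobs * now - busy


for jobs in (4, 7):
    total, idle = recursive_build(SUBDIRS, jobs)
    print(f"-j{jobs}: build time {total}, idle slot-time {idle}")
```

With these invented numbers, going from four to seven jobs only shortens the build from 21 to 17 time units, while the idle slot-time grows from 41 to 76: the extra slots mostly sit unused next to the serial link steps, the same effect the seven-job diagram shows.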

So why would you want to use non-recursive build systems? A non-recursive build system lets the build process take care of all the independent units at once, leaving the dependent ones to finish last, as shown in the following two diagrams, where the build is instead modelled as non-recursive:

Timing diagram of a non-recursive build with four jobs

Timing diagram of a non-recursive build with seven jobs

While the timings are not derived from an actual build, in my experience they should fit most of the builds you’ll find out there. It is probably worth noting that if you have one final binary, you will still wait for all the other linking targets to complete, which means you lose most of the efficiency at the end of your build; but this is also the moment when most build systems build their documentation (to optimize this, automake makes it very explicit that executable targets are built before non-executable ones).
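The same comparison can be sketched as a tiny scheduler. This is a simplified model, not how make actually picks jobs: the project is a hypothetical one with two shared libraries and one program (names and timings invented), ready tasks are dispatched greedily in declaration order whenever one of the N slots is free, and the `barriers` flag adds the artificial “previous directory must be finished” dependencies that recursion imposes.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration: int      # arbitrary time units
    deps: tuple = ()   # names of tasks that must finish first


def make_project(barriers=False):
    """Describe the hypothetical project as one dependency graph.

    With barriers=True, every object in a directory also depends on the
    previous directory's link output, mimicking make recursing into one
    SUBDIRS entry at a time.
    """
    tasks = []
    previous = ()
    for directory, n_objects, compile_time, output, link_time in (
        ("a", 6, 2, "liba.so", 4),
        ("b", 6, 2, "libb.so", 4),
        ("main", 4, 2, "program", 3),
    ):
        objects = [f"{directory}{i}.o" for i in range(1, n_objects + 1)]
        barrier = previous if barriers else ()
        tasks += [Task(o, compile_time, barrier) for o in objects]
        link_deps = tuple(objects)
        if output == "program":
            link_deps += ("liba.so", "libb.so")  # the program links both libraries
        tasks.append(Task(output, link_time, link_deps))
        previous = (output,)
    return tasks


def schedule(tasks, jobs):
    """Greedy list scheduling on `jobs` slots, loosely like `make -jN`.

    Ready tasks start in declaration order whenever a slot is free;
    returns the time at which the last task finishes.
    """
    done, started, running, now = set(), set(), [], 0
    while len(done) < len(tasks):
        for task in tasks:
            if len(running) >= jobs:
                break
            if task.name not in started and all(d in done for d in task.deps):
                started.add(task.name)
                heapq.heappush(running, (now + task.duration, task.name))
        if not running:
            raise ValueError("dependency cycle or missing dependency")
        # Jump to the next completion and retire every task ending then.
        now, name = heapq.heappop(running)
        done.add(name)
        while running and running[0][0] == now:
            done.add(heapq.heappop(running)[1])
    return now


for jobs in (4, 7):
    flat = schedule(make_project(), jobs)
    recursive = schedule(make_project(barriers=True), jobs)
    print(f"-j{jobs}: non-recursive {flat}, recursive {recursive}")
```

With these numbers it prints a non-recursive build time of 15 against 21 for the recursive model at -j4, and 11 against 17 at -j7: once the barriers are gone, the library links overlap with the remaining compile jobs instead of blocking them, and only the final program link is left waiting at the end.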

So why are recursive build systems still so widely used? Unfortunately, there are a number of reasons: some relate to style, others to complexity, and others simply to upstream not being willing to admit that a non-recursive build system is, simply put, more efficient.

For instance, you could look at xine-lib’s build system and note that I never made it non-recursive: the problem there was that my fellow developer Darren didn’t like the idea of separating the sources from their output, so it stayed recursive.

Last summer, instead, I worked on a util-linux branch that made it almost entirely non-recursive, which made it blazing fast to build; unfortunately, my original approach was stylistically bad, and I could indeed see the issues. While I did intend to rebase the patchset to solve the issues found with it, that was a time-consuming task, and I haven’t had the time to work on it.

In other cases, a fully non-recursive build system is impractical simply because it relies on tools designed with recursive makefiles in mind, which is the case for both gettext and gtk-doc. That is why udev, whose build system I worked on a while ago, still has recursive files for its two gtk-doc outputs: there is just no way to get gtk-doc to work fine with non-recursive Makefiles (and, for what it’s worth, it also works badly with out-of-tree builds). Sigh!

What probably saddens me the most is that some build systems are designed to work only recursively, on the idea that you can do the same work with the same efficiency in fewer lines of code, or with much less complex software... and most of those suck at parallel builds, which makes them very inefficient at the one task they have. In this context, I understand and relate to Matthew’s post about people deciding that something is “too bloated” and rewriting it from scratch, only to produce a vastly sub-optimal result; I think I made the same point about CMake before.

This is not to say that there is never room for improvement, or that less code can’t be more efficient if you reuse libraries and apply other tricks; I’m just saying that you should always be wary of software, and build systems, promising simplicity, efficiency and features at the same time: recursion is certainly easy to provide, but it’s far from efficient. An efficient build system needs enough tricks in its dependency resolution to pack jobs tightly and compress the timing. And all of this is without even considering distributed builds, where you can run a high number of compile tasks in parallel, but no more than a few linking or code-generation tasks.

Of course, one could invoke Gustafson’s law and argue that, rather than building a single package fast, people want to build more packages at the same time, which is what emerge’s --jobs option is designed to do, and what the ChromiumOS build system implements with the parallel_emerge script. But this only covers full-system builds and upgrades, while build systems should equally aim to make a single-package build fast enough not to waste developers’ time.
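For reference, Gustafson’s law gives the scaled speedup on N processors as below, where s is the fraction of time spent on serial work when running on the parallel system. Applying it to builds is only a loose analogy: the law assumes the workload grows with N, as when emerge builds more packages at once, whereas a developer rebuilding a single package faces a fixed amount of work, where the serial link steps discussed above still dominate.

$$
S(N) = s + (1 - s)\,N = N - s\,(N - 1)
$$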

At least I hope it is now clear why recursive build systems simply suck for parallelism.

F-Spot problems

I like taking photos, although I rarely have nice subjects outside of my garden (I haven’t travelled much so far, although I intend to fix this in the next few months, for instance by going back to London and somewhere else too). Since I want most of the photos I take to be seen by friends and family, some time ago I decided to publish all of them on flickr with a pro account, which has no upload limits and makes full-size photos available as well.

I don’t want to go into why I prefer flickr to Picasa; let’s just say that I can interface with flickr using XSLT without having to implement the GData protocol, okay?

You might then expect that I was pretty upset when, after upgrading F-Spot, I didn’t find the option to upload photos to flickr (in the meantime, even Apple’s iPhoto gained that feature!). It turned out that the problem wasn’t really F-Spot’s, but rather the patch applied in Gentoo to “fix” parallel build (in truth, it was just a hack providing some serialisation to avoid breaking under parallel make).

So I was able to fix the Flickr exporter in Gentoo, and that was it: it used a system copy of the FlickrNet .NET library instead of building and using the internal copy, so the keywords were dropped, but in the end it worked fine.

But the way I did the “fix” in Gentoo was as much of a hack as the original “parallel make fix”, so I wanted to rewrite it properly for upstream, giving them something that could be merged directly and actually used. When I started doing so, I finally found what broke: the original “fix” stopped make from recursing into the bundled libraries’ directories for all targets, including clean and, more importantly for us, install. So the libraries were built but not installed, and that was our original problem.

The final problem, instead, is that F-Spot bundles libraries! FlickrNet is one of them, but there seem to be more, including Mono.Addins (which is already in the tree, by the way) and gnome-keyring-sharp (likewise). As usual, this is a violation of Gentoo policy; bug #284732 was filed, and I’ll try to work on it when I have some time. Before doing that, though, I want to see whether upstream accepts the changes I’ve made so far: I would rather not do the work if upstream is going to reject it.

If you’re interested in seeing what I changed, or you’re an F-Spot developer who arrived on my blog, you can find my changes in my git repository, which is available on this server.