Autotools Mythbuster: A practical case of TMT (Too Many Tests)

I have written already, not too long ago, about the fact that using autoscan is simply a bad idea. Today, on the other hand, I found one case where using (an old version of) autoscan has created a very practical problem. While this post is not going to talk about the pambase problems, there is a connection to my previous posts: it is related to the cross-compilation of Linux-PAM.

It is an unfortunately well-known problem that the Linux-PAM ebuild fails cross-compilation, and there are a number of workarounds that have never been applied in the ebuilds. The reason is relatively simple: I have insisted that whoever had the issue at hand send them upstream. Finally somebody did, and Thorsten fixed the issues with the famous latest release; or so goes the theory. A quick check shows me that the latest PAM version is still not working as intended.

Looking at the build log, it seems like the problem is that the configure script for Linux-PAM, using the original AC_PROG_LEX macro, is unable to identify the correct (f)lex library to link against. Again, the problem is obvious: the cross-building wrappers that we provide in the crossdev package cause dependencies present in DEPEND but not RDEPEND to be merged only on the root file system. Portage allows us to set the behaviour to merge them on the cross-root instead; but even that is wrong, since we need flex to be present both as a (CBUILD) tool and as a (CHOST) library. I've asked Zac to provide a "both" option; we'll see where we go from there.

Unfortunately, a further problem happens when you try to cross-compile flex itself: it fails with undefined references to the rpl_malloc symbol. You can look it up and you'll see that it's definitely not a problem limited to flex. I know what I'm dealing with when I find these mistakes, but I guess it doesn't hurt to explain them a bit further.

The rpl_malloc and rpl_realloc symbols are two replacement symbols, #define'd during the configure phase by the AC_FUNC_MALLOC autoconf macro. They are meant to replace the standard functions with two custom wrappers, which the project itself is expected to provide; the fact that they are left dangling is, as I'll show in a moment, pretty much a signal of overtesting.

Rather than simply checking for the presence of the malloc() function (can you really expect the function to be missing on any non-embedded system at all?), the AC_FUNC_MALLOC macro (together with its sibling AC_FUNC_REALLOC) checks for the presence of a glibc-compatible malloc() function. That "glibc-compatible" note simply means a function that will not return NULL when passed a length argument of 0 (which is the behaviour found in glibc and a number of other systems). It is a corner-case condition, and most of the software I know does not rely on it at all; but sometimes it is useful to test for, otherwise the macro wouldn't be there.
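The reason this breaks on cross-compilation specifically is that the macro's runtime test cannot be executed on the build host, so it falls back to assuming the replacement is needed: config.h ends up with #define malloc rpl_malloc, and the package is expected to provide the function itself. A minimal sketch of that replacement, following the pattern in the autoconf manual:

/* malloc.c: the replacement a package using AC_FUNC_MALLOC is
   expected to ship; flex ships nothing like it, hence the
   undefined reference. */
#include <config.h>
#undef malloc   /* config.h has: #define malloc rpl_malloc */

#include <stdlib.h>

/* Allocate n bytes, mimicking the glibc behaviour for n == 0. */
void *
rpl_malloc (size_t n)
{
  if (n == 0)
    n = 1;
  return malloc (n);
}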

Of course the sheer fact that nobody implemented the compatibility replacement functions in the flex source tree should make it safe to assume that there is no need for the behaviour to be present.

Looking at the original configure code tells more of the story:

# Checks for typedefs, structures, and compiler characteristics.

AC_HEADER_STDBOOL
AC_C_CONST
AC_TYPE_SIZE_T

# Checks for library functions.

AC_FUNC_FORK
AC_FUNC_MALLOC
AC_FUNC_REALLOC
AC_CHECK_FUNCS([dup2 isascii memset pow regcomp setlocale strchr strtol])

The two comments are the usual ones that you'd find in a configure.in script created by autoscan; it's also one of the few cases where you actually find a check for size_t, as most software assumes its presence anyway. More importantly, if you look at the long list of AC_CHECK_FUNCS macro call arguments, and then compare it with the actual source code of the project, you can see that the boilerplate checks are there, but their results are never used. That's true for all the functions in there.
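For comparison, this is what consuming one of those results would look like; a hypothetical snippet, not taken from flex, just to show the HAVE_* half that is missing there:

#include "config.h"  /* the HAVE_* macros produced by configure */
#include <locale.h>

void
init_locale (void)
{
#ifdef HAVE_SETLOCALE
  /* only call setlocale() where configure actually found it;
     without a guard like this, the corresponding AC_CHECK_FUNCS
     entry is dead weight */
  setlocale (LC_ALL, "");
#endif
}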

What do we make of it? Well, first of all, it shows that at least part of the autotools-based build system in flex is not really well thought out (and you can add to that some quite idiotic stuff in the Makefile.am to express object file dependencies, which then required a patch that produced half-broken results). But it also shows that, in all likelihood, the check for malloc() is there just because, and not because there is any need for the malloc(0) behaviour.

A quick fix, consisting of removing the useless checks and rebuilding the autotools support files with a more modern autotools version, and you have a perfectly cross-compilable, workaround-free flex ebuild. Voilà!
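For the record, the pruned section of configure.in would look something like this; a sketch of the change rather than the exact patch:

# Checks for library functions.

AC_FUNC_FORK
dnl AC_FUNC_MALLOC and AC_FUNC_REALLOC dropped: flex never ships
dnl rpl_malloc()/rpl_realloc(), so on a cross-compile they only
dnl produce dangling symbols; the AC_CHECK_FUNCS list can go the
dnl same way, since none of its results are ever used.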

So why am I blogging about this at all? Well, I'll be updating the Guide later on today; in the meantime I wanted to let you know about this trickery because I sincerely have a problem with having to add tons of workarounds for cross-building: it's time to tackle them, and quite a lot of them are related to this stupid macro usage, so if you find one, please remember this post and fix it properly!

Autotools Mythbuster: autoscan? autof—up

Besides the lack of up-to-date and sane documentation about autotools (for which I started my guide, which you should remember is still extended only in my free time), there is a second huge user experience problem: the skeleton build system produced by the autoscan script.

Now, I do understand why they created it; the problem is that, as it is, it mostly creates fubar'd configure.ac skeletons that confuse newcomers and cause a lot of grief to packagers and to users of source-based distributions (and to those few who still insist on building software manually without getting it from their distribution).

The problem with autoscan is that it embodies, once again, the "GNU spirit", or rather the GNU spirit of the original days, back when GNU tried to support any operating system, to "give freedom" to users forced to use those OSes rather than Linux itself. Given that nowadays the FSF seems to interest itself mostly in discouraging anybody from using non-free operating systems (or even non-GNU-based operating systems) – sometimes failing, and actually discouraging companies from using Free Software altogether – it seems like they had a change of ideas somewhere along the way. But that's something for another post.

Anyway, assuming that you'll have to make your software work on any operating system out there is something you're quite unlikely to need. First of all, a number of projects nowadays, for good or bad, target Linux only; sometimes even just GNU/Linux (that is, they don't support running on other C libraries), because they require specific features from the kernel, specific drivers, and other requirements like that. Secondly, you can easily require your users to have a sane environment to begin with; unless you really have to run on a fifteen-year-old operating system, you can assume at least some basic standard support. I have already written about pointless autotools checks, but I guess I didn't make it clear enough yet.

But it's not just the idea of dropping support for anything that does not support a given standard, whatever that might be (C99, POSIX.1-2008, whatever); it's more that the configure.ac generated by autoscan is not going to make your software magically work on a number of operating systems it didn't support before. What it does, for the most part, is add a number of AC_CHECK_HEADERS and AC_CHECK_FUNCS calls, which will verify the presence of the various functions and headers that your software is using… but it won't change the software to provide alternatives; heck, there might not be alternatives.
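The result tends to look like this; an illustrative sample of typical autoscan boilerplate, not taken from any specific project:

# Checks for header files.
AC_CHECK_HEADERS([fcntl.h limits.h stdlib.h string.h strings.h unistd.h])

# Checks for library functions.
AC_CHECK_FUNCS([memset strchr strdup strerror])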

So if your software keeps on using strings.h (which is POSIX) and you check for it at configure time, you're just making the configure phase longer without gaining anything, because you're not making use of the results of the check. Again, this often translates to things like the following:

#ifdef HAVE_STRINGS_H
#include <strings.h>
#endif

Okay, so what is the problem with this idiom? Well, to begin with, I have seen it so many times from people who have no idea why it is there! A number of people expect that since autoscan added the check, and thus they have the definition, they have to use it. But if you use a function that is declared in that header, and the header is not there, what are you going to do? Not including it is not going to make your software any more portable; if anything, you're going to get an implicit declaration of the function, and probably a failure at link time or at runtime. So, if it's not an optional header, or function, just running the check and using the definition is not enough.
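To make that concrete, here is a hypothetical snippet showing why the guard alone buys you nothing:

#ifdef HAVE_STRINGS_H
# include <strings.h>
#endif

int
names_equal (const char *a, const char *b)
{
  /* if strings.h is missing, the include is silently skipped and
     strcasecmp() ends up implicitly declared anyway (or a hard
     error with modern compilers): nothing was made more portable */
  return strcasecmp (a, b) == 0;
}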

A common alternative is to fail the configure step if the header or function is not found. While that makes a bit more sense, I still dislike the option. Sure, you might be able to tell the user why the function is needed, and whether they have to install something else or upgrade their system; but in truth that made much more sense when there was next to no common ground between operating systems, and users were the common people running the ./configure script. Nowadays, that's a task mostly left to packagers, who know their systems much better. The alternative to failing in configure is failing during build, and that's generally not too bad; especially since you'll be failing the build anyway for any condition you didn't know about beforehand.
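For completeness, the fail-early pattern looks like this; a sketch using the standard autoconf macros:

dnl abort at configure time when a hard requirement is missing
AC_CHECK_HEADERS([strings.h], [],
  [AC_MSG_ERROR([strings.h is required to build this package])])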

I have another reason to offer as to why you shouldn't be running all those tests for things you don't provide a fallback for: autoconf already provides the means to pass external libraries and include directives to the compiler. Since having each package provide its own replacements for common functions is going to cause a tremendous amount of code duplication (which in turn may cause a lot of work for packagers if one of them is broken; dtoa(), anybody remembers that?), I'm always surprised that there aren't many more libraries providing compatibility replacements for the functions missing in the system C library (gnulib does not count, as it solves the problem with code replication, if not duplication). Rather than fail, or try to guess whether you can build or not depending on the OS used, just assume the functions are present if you can't go without them, and leave it to the developers running that system to come up with a fix, which might involve additional tests, or might not.
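That escape hatch, for the record, is nothing more exotic than the standard configure variables; the paths below are hypothetical:

./configure CPPFLAGS="-I/opt/libcompat/include" \
            LDFLAGS="-L/opt/libcompat/lib" LIBS="-lcompat"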

My suggestion here is thus to start by considering the operating systems you're targeting directly, and to find out what actually changes between them. In many cases, for instance, you might still find pieces of very old systems around, like the include of malloc.h: it is only useful if you want to call functions such as memalign(), it has not been needed for malloc() since, well, ever (stdlib.h is enough for that), and it will cause errors on both FreeBSD and OS X if included. Once you find that a header is not present on some of your desired operating systems, look up what replaces it, then make sure to check for it properly; that means using something like this:

dnl in configure.ac
AC_CHECK_HEADERS([stdint.h inttypes.h], [break;])

/* in your C code */
#if HAVE_STDINT_H
#  include <stdint.h>
#elif HAVE_INTTYPES_H
#  include <inttypes.h>
#endif

This way you won't be running checks for the whole list of alternative headers on every system: most modern C99-compatible system libraries will have stdint.h available, and only a few older systems will need inttypes.h to be discovered instead. This might sound like a cheap gain, since it's just two headers; but especially when you're looking for the correct place of a library header, you might end up with an alternative among three or four headers, and once you add a bunch of alternatives like that, you're going to have problems. The same trick can be used for functions; the description is also in my guide, and I'll soon expand it to cover functions as well.
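As a preview, the same break trick applied to functions, using the classic strchr()/index() pair as the example:

dnl in configure.ac: strchr() is the standard spelling; some old
dnl BSD systems only had index()
AC_CHECK_FUNCS([strchr index], [break;])

/* in your C code */
#if !HAVE_STRCHR && HAVE_INDEX
#  define strchr index
#endif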

It shouldn't "sound wrong" to have a configure.ac with next to no AC_CHECK_* calls at all! Autoconf will run most of the basic tests for you, and you have the ability to add further ones, but there is no need to strain to use them when they are unneeded. Take as an example the feng configure.ac: it has a few checks, of course, but they are limited to a few optional features that we work around in the code itself if missing. And some I would probably just remove (like making IPv6 support optional… I'd sincerely just make it work if it's found on the system, as you still need to enable it in the configuration file to use it anyway).

And please, please, just don’t use autoscan, from now on!