Using Gnulib to improve software portability

This article was originally published on Linux.com.

Many, if not most, free and open source software projects are developed primarily on Linux-based systems using the GNU C Library (glibc). Projects that use glibc are likely to depend on functions that are not available on systems that use different C libraries, such as the different BSD flavors. When packages are built on systems that don’t use glibc they often fail, because the other C libraries are missing functions found in glibc. The GNU Portability Library can help developers with cross-platform programming needs.

In the past there were many different libraries, such as publib, that tried to provide alternatives to the functions that are missing in the main C library. Unfortunately, handling compatibility libraries proved to be difficult. The additional libraries would require additional tests when running configuration scripts prior to compilation, and add dependencies for non-glibc systems.

As the number of new functions provided by glibc increased, the GNU project started looking at the requirements for portability of programs across operating systems based on different C libraries, and eventually created the GNU Portability Library (Gnulib) project.

Normally, a library is code that is compiled as a shared or static file, and then linked into the final executable. Gnulib is a source code library, more similar to a student’s collection of notes than a usual compiled library. Gnulib can’t be compiled separately; the code in it is intended to be copied into the projects using it.

### Using Gnulib

Two requirements limit Gnulib use, one technical and the other legal. First, the software you use it with must also use GNU Autotools, as Gnulib provides tests for replacements of functions and headers written in the M4 language, ready for use with GNU Autoconf.

Second, it has to be licensed under the GNU General Public License (GPL) or the GNU Lesser General Public License (LGPL), as the code inside Gnulib is mostly (but not entirely) released under the GPL itself. Some of it is also released under the LGPL, and some of it is available as public domain software.

If you’re working on software released under other licenses, such as the BSD license or Apple Public Source License (APSL), it’s better to avoid the use of functions that are not available in a library licensed under more permissive terms. For example, you could take the functions present in a BSD-licensed C library to replace the ones missing in the current library, whichever license the software is using. Alternatively, you could find replacement functions in other BSD-licensed software or create a “cleanroom” implementation without copying code from GPLed software, leaving the external interface the same but using different code.

It’s usually easy to re-implement functions or just copy missing functions from another project when they are not available through another C library, especially when they are simple functions that consist of fewer than 10 lines of code. Unfortunately, many projects depend heavily on GNU extensions and won’t build with replacement functions, or the code is already so complex that adding cases to maintain manually is an extra encumbrance for developers.

What Gnulib provides is not only the source code of the missing functions, but an entire framework to allow a project to depend on GNU extensions, while retaining portability with non-GNU based systems.

The core of the framework is the gnulib-tool script, which is the automated tool for extracting and manipulating source code from Gnulib. Using gnulib-tool, you can see the list of available modules (gnulib-tool --list) or test them (one by one, or all together, using the --test or --megatest options), but more importantly you can automatically maintain the replacement functions for a source tree.

A practical example should help explain the concept. Let’s say that there’s a foofreak package that uses the strndup() function (not available on BSD systems, for instance) and the timegm() function (not available on Solaris). To make the source portable, a developer can run gnulib-tool --import strndup timegm from the package’s source directory, and the script will copy (into the default directories) the source code and the M4 autoconf tests for strndup(), timegm(), and their dependencies (for example, strndup() depends on strnlen()).

After running, gnulib-tool tells you to make a few changes to your code to allow the replacements to be checked for and used when needed. It requires the Makefile in lib/ to be generated by the configure script, so it has to be added to AC_OUTPUT or AC_CONFIG_FILES. At the same time, the lib/ subdirectory has to be added to the SUBDIRS variable in Makefile.am. The M4 tests are not shipped with other packages, so they must be copied into the m4/ directory, and that has to be added as an include directory for aclocal. Finally, two macros have to be called within configure.ac to initialize the checks (gl_EARLY and gl_INIT).
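A minimal sketch of that wiring, assuming the default lib/ and m4/ directories, the default gl_ macro prefix, and a hypothetical src/ directory holding the foofreak program:

```
# configure.ac (fragment)
AC_PROG_CC
gl_EARLY                      # right after the compiler check, as Gnulib asks
# ... other checks ...
gl_INIT                       # sets up the replacement checks
AC_CONFIG_FILES([Makefile lib/Makefile src/Makefile])
AC_OUTPUT

# top-level Makefile.am
ACLOCAL_AMFLAGS = -I m4
SUBDIRS = lib src

# src/Makefile.am -- link against the auxiliary library built by Gnulib
bin_PROGRAMS = foofreak
AM_CPPFLAGS = -I$(top_builddir)/lib -I$(top_srcdir)/lib
foofreak_LDADD = $(top_builddir)/lib/libgnu.a
```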

You can specify the name of the subdirectories and the prefix of the macros by running gnulib-tool with the parameters --source-base, --m4-base, and --macro-prefix, respectively. It’s also important to note that the replacement functions are built in an auxiliary library called libgnu (by default, but the name can be overridden by using the --lib parameter), so the part of the software using those functions has to be linked against this too.

If later on your project also wants to use the iconv() function, gnulib-tool can detect the currently imported modules and add the required iconv module without rewriting everything from scratch. This makes it simple to add new modules when you use new functions.

The different replacement functions are called “modules” by gnulib-tool, and they consist of some source code, some header files, and an M4 macro test. As some functions depend on the behavior of other functions, the modules depend on one another, so adding a single module can pull in quite a few additional checks and replacements, which make sure that the behavior is safe.

As some modules are licensed under the GPL, while others are licensed under the LGPL, a package licensed under the latter might want to make sure that no GPL modules are pulled in, as that would break the license. To avoid adding GPLed modules, you can use gnulib-tool’s --lgpl option, which restricts the import to LGPL-licensed modules.

You can also use alternative code to provide a replacement function instead of using the Gnulib modules, and to avoid problems with dependencies. Gnulib-tool has an --avoid option that prevents specified Gnulib modules from being pulled in.

Following the previous example, if foofreak already contains a strnlen() function, used when the system library doesn’t provide one, it would be possible to use that, instead of importing the strnlen module from Gnulib, by issuing the command gnulib-tool --import strndup timegm --avoid strnlen. With this syntax the strnlen module will be ignored and the function already present in foofreak will be used. While this option is provided, it’s not advisable to use it unless you really know what you’re doing. A better alternative would be dropping strnlen() from the code where it was used, and using the replacement provided by Gnulib instead.

### Summary

Gnulib is an interesting tool for people working with GPL- or LGPL-licensed software that needs to be portable without dropping the use of GNU extensions, but it has some drawbacks. The major drawback is the license restriction, which requires non-(L)GPL-licensed software to look elsewhere for replacements. It also requires the use of the GNU toolchain with Autotools, as it would be quite difficult to mimic the same tests with something like SCons or Jam.

Finally, the source code sharing between projects breaks one of the basic advantages in the use of libraries: the reuse of the same machine code. When the same function, required by 10 or 20 programs, has to be built inside the executable itself as the system does not provide it, there will be 10 or 20 copies of the same code in memory and on disk, and they may behave in different ways, leading to problems if they are linked inside a library used by third-party software.

Gnulib is worth a try, but you should not use it in critical software or software that might have a limited audience. In those situations, avoid the use of extension functions when possible, and add replacement functions only when they’re actually needed. There’s no point in having a replacement function for something that works on 90% of modern systems and breaks only on obsolete or obscure operating systems or C libraries, especially if the software is written to be run on modern machines.

Best practices with autotools

This article was originally published on Linux.com.

Update: This article is obsoleted by Autotools Mythbuster, which goes above and beyond all of this.

The core of GNU’s compile chain – the set of tools used to build GNU software packages – is the so-called “autotools,” a term that refers to the autoconf and automake programs, as well as libtoolize, autoheader, pkg-config, and sometimes gettext. These tools let you compile GNU software on a wide variety of platforms and Unix and Unix-like operating systems, providing developers a framework to check for the presence of the libraries, functions, and tools that they want to use. While autotools are great in the hands of an experienced developer, they can be quite a handful for the first-time user, and it’s not so rare that packages are shipped with working-but-broken autotools support. This article will cover some of the most common errors people make when using autotools and ways to achieve better results.

Regardless of anyone’s opinion about them, we currently have no valid alternative to autotools. Projects such as SCons are not as portable as autotools, and they don’t embody enough knowledge to be useful yet. We have tons of automatic checks with autotools, and a lot of libraries come with an m4 library file with macros to check for their presence.

The basic structure of an autotooled project is simple. Autoconf uses a configure.ac file (formerly configure.in) written in the m4 language to create a configure script, with the help of an aclocal.m4 file (created by aclocal using the m4 libraries on its search path and the acinclude.m4 file). For every directory there’s a Makefile.am file, used by automake to create the Makefile.in templates, which are processed and transformed into the real makefiles by the configure script. You can also avoid using automake and just write your own Makefile.in files, but this is quite complex, and you lose a few features of autotools.

In a configure.ac file you can use macros you define yourself, the default ones provided by autoconf and aclocal, or external macros provided, for instance, by other packages. In that case aclocal creates the aclocal.m4 file by pulling in the m4 library files it finds in the system’s search path that define the macros you use; this is a critical step in getting a working autotooled project, as we’ll see in a moment.

A Makefile.am is mainly a declaration of intent: you fill target variables with the names of the targets you want to build. These variables are structured in a format like placetoinstall_TYPEOFTARGET. The place is the location in a hierarchical Unix filesystem (bin, lib, include, …), a custom keyword that can be given an arbitrary path (using the keyworddir variable), or the special keyword noinst that marks targets that need not be installed (for example private headers, or static libraries used only during the build). After naming the target, you can use its name (replacing dots with underscores) as the prefix for the variables that affect its build. In this way you can provide special CFLAGS, LDFLAGS, and LDADD variables used during the build of a single target, instead of changing them for all the targets. You can also use variables collected during the configure phase, if you passed them to the AC_SUBST macro in configure.ac, so that they are replaced inside the makefiles. Also, though defining CFLAGS and LDFLAGS on a per-target basis seems useful, adding static flags in Makefile.am is bad for portability, as you can’t tell whether the compiler you’re using supports them, or whether you really need them (-ldl put in LDFLAGS is a good example of a flag needed on Linux but not on FreeBSD); in such cases you should use configure.ac to add these flags.
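Here is a minimal, hypothetical Makefile.am sketch of these conventions; the foofreak name and the FOO_CFLAGS/FOO_LIBS variables are made up for the example and are assumed to be AC_SUBSTed from configure.ac:

```
# Makefile.am (hypothetical fragment)
bin_PROGRAMS = foofreak                 # installed into $(bindir)
foofreak_SOURCES = main.c parser.c parser.h
foofreak_CFLAGS = $(FOO_CFLAGS)         # per-target flags, AC_SUBSTed by configure
foofreak_LDADD  = $(FOO_LIBS)

noinst_HEADERS = compat.h               # private header, never installed

# a custom installation directory using the "keyworddir" convention
exampledir = $(pkgdatadir)/examples
example_DATA = foofreak.conf.sample
```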

The most commonly used macros in configure.ac are AC_CHECK_HEADERS, AC_CHECK_FUNCS, and AC_CHECK_LIB, which test for the presence of, respectively, some header files, some library functions, and a given library (with a specific function in it). They are important for portability as they provide a way to check which headers are present and which are not (for example system headers that have different locations in different operating systems), to check whether a function is present in the system library (asprintf() is missing in OpenBSD, for example, while it’s present in the GNU C library and on FreeBSD), and finally to check for the presence of some third-party library, or to see whether linking to a specific library is needed to get some functions (for example, the dlopen() function is in the libdl library on GNU systems, while it’s provided by the system’s C library on FreeBSD).
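A hypothetical configure.ac fragment using these three macros (the header, function, and library names are just examples) could look like this:

```
# configure.ac (fragment)
AC_CHECK_HEADERS([sys/param.h malloc.h])   # defines HAVE_SYS_PARAM_H, HAVE_MALLOC_H in config.h
AC_CHECK_FUNCS([asprintf strndup])         # defines HAVE_ASPRINTF, HAVE_STRNDUP
AC_CHECK_LIB([dl], [dlopen])               # if found, defines HAVE_LIBDL and adds -ldl to LIBS
```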

Along with testing for the presence or absence of functions or headers (and sometimes libraries), you usually need to change the code’s path (for example to avoid the use of missing functions, or to define a drop-in replacement for them). Autoconf is commonly coupled with another tool, autoheader, which creates a config.h.in template, used by the configure script to create a config.h header that defines preprocessor macros of the form HAVE_givenfunction or HAVE_givenheader_H; these can be tested with #ifdef/#ifndef directives inside a C or C++ source file to change the code according to the features present.
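A hypothetical C source file using those macros might look like the following sketch (the replacement prototype is assumed to live in a private header or nearby in the same file):

```
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_MALLOC_H
# include <malloc.h>            /* only included where it actually exists */
#endif

#ifndef HAVE_ASPRINTF
/* the extension is missing: fall back to a private drop-in replacement */
int asprintf(char **strp, const char *fmt, ...);
#endif
```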

Here are some practices to keep in mind to help you use autotools to create the most portable code possible.

The config.h header file should be considered an internal header file, so it should be used just by the single package in which it’s created. You should avoid editing the config.h.in template to add your own code there, as this requires you to update it manually whenever the configure.ac you’re writing changes.

Unfortunately a few projects, such as Net-SNMP, export this header file with the other library headers, which requires any project that uses their libraries to include it (or to provide its own copy of the internal Net-SNMP structures). This is a bad thing, as the autotools structure of a library project should be invisible to software using it (which might not use autotools at all). Also, changes in autotools behavior are anything but rare, so you can have two identical checks with different results due to changes in the way they are executed. If you need to define your own wrappers or replacements in case something is not in the environment you’re compiling for, you should do that in private headers that do not get installed (declared as noinst_HEADERS in Makefile.am files).

Always provide the m4 files you used. As autotools have been in use for years, many packages (for example libraries) that can be reused by other programs provide an m4 library file in /usr/share/aclocal that makes it possible to check for their presence (for example using the -config scripts) with a simple macro call. These files are used by aclocal to create the aclocal.m4 file, and they are usually present on the developers’ systems where aclocal is executed to create the release, but when they belong to optional dependencies, they can be missing on users’ systems. While this is usually not a problem, because users rarely execute aclocal, it’s a problem for source distributions, such as Gentoo, where sometimes you need to patch a Makefile.am or the configure.ac and then re-run autoconf without having all the optional dependencies installed (or having different, possibly incompatible or buggy, versions of the same m4 file).

To avoid this problem, you should create an m4 subdirectory in your package’s directory and then put the m4 library files you are using there. You must then call aclocal with the -I m4 option (aclocal -I m4) so that it searches that directory before the system one. You can then choose whether to put that directory under revision control (CVS, SVN, or whatever else you are using) or just create it for the releases. The latter case is the bare minimum requirement for a package. It minimizes the amount of revision-controlled code and ensures that you’re always using the latest m4 version, but it has the drawback that anyone who checks out your repository won’t be able to execute autoconf without digging the m4 files out of a release tarball (and that might not work, as you may have updated configure.ac to suit a newer macro or added more dependencies). On the other hand, putting the m4 directory under revision control sometimes tempts developers to change the macros to suit their needs. Although this seems logical, as the m4 files are under your revision control, it will upset many package maintainers, as new versions of m4 files sometimes fix bugs or support newer options and installation paths (for example multilib setups), and having modified m4 files makes it impossible to just replace them with updated versions. It also means that when you’re going to update an m4 file you must redo the modification against the original.
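In practice, shipping and using a local m4 directory boils down to something like the following sketch (the aclocal -I m4 call is the one mentioned above; AC_CONFIG_MACRO_DIR is available in newer Autoconf releases):

```
# configure.ac
AC_CONFIG_MACRO_DIR([m4])

# top-level Makefile.am
ACLOCAL_AMFLAGS = -I m4

# regenerating by hand
aclocal -I m4 && autoconf && automake --add-missing
```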

m4 files are always a problem to work with. They must replicate almost the same code from library to library (depending on the way you need to provide CFLAGS/LDFLAGS: with tests or with a -config script). To avoid this problem, the GNOME and FreeDesktop projects developed a tool called pkg-config, which provides both an executable binary and an m4 file to include in configure.ac files, and lets developers check for the presence of a given library (and/or package), provided that the package itself installed a pkg-config .pc data file. This approach simplifies the work of maintaining configure.ac scripts, and requires a lot less time during the execution of the configure script, as it uses the information provided by the installed package itself instead of probing for the library blindly. On the other hand, this approach means that an error the developers make concerning a dependency can break the user’s program, as the compiler and linker flags are just hardcoded in the data file and the configure script doesn’t actually check whether the library works. Fortunately, this doesn’t happen too often.

To create the configure file, you need PKG_CHECK_MODULES, contained in the pkg.m4 library, so you should add that file to your m4 directory. Even if pkg-config is a mandatory dependency (as the tool is run by the configure script), you can’t be sure that the pkg.m4 on users’ systems is the same as the one you used, nor can you be sure that it doesn’t carry extra bugs, as it may be older than yours.
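A typical use, here with a hypothetical dependency on glib-2.0, looks like this; the macro substitutes GLIB_CFLAGS and GLIB_LIBS for use in Makefile.am:

```
# configure.ac (fragment)
PKG_CHECK_MODULES([GLIB], [glib-2.0 >= 2.6])

# Makefile.am (fragment)
foofreak_CFLAGS = $(GLIB_CFLAGS)
foofreak_LDADD  = $(GLIB_LIBS)
```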

Always check for the libraries you’re going to link to, if you have them as mandatory dependencies. Usually autoconf macros or pkg-config data files define the prerequisite libraries that you need in order to link successfully to a given library. Also, some functions that live in extra libraries on some systems (like dlopen() in libdl on Linux and Mac OS X) can be in the libc of another system (the same function is in libc on FreeBSD). In these cases you need to check whether the function can be found without linking to anything extra, or whether you need a specific library (for example to avoid linking to a non-existent libdl, which would fail where it’s not needed).
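One way to express this in configure.ac is the AC_SEARCH_LIBS macro, which first tries the bare C library and only then the listed libraries, as in this sketch for dlopen():

```
# configure.ac (fragment)
AC_SEARCH_LIBS([dlopen], [dl], [],
               [AC_MSG_ERROR([dlopen() is required but was not found])])
# on FreeBSD nothing is added to LIBS; on GNU systems -ldl ends up in LIBS
```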

Be careful with GNU extensions. One of the things that makes portability a big pain is the use of extension functions, which are provided by GNU libc but aren’t present in other C libraries like the BSDs’ or uClibc. When you use such functions, you should always provide a “drop-in replacement,” a function that provides the same functionality as the library function, maybe with less performance or security, which can be used when the extension function is not present in the system’s C library. Those replacements must be guarded by an #ifndef HAVE_function ... #endif block, so that they don’t get defined when the real function is already present. Make sure that these functions are not exported by the library to external users; they should be declared inside an internal header, to avoid breaking other libraries that may be doing similar tricks.
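A minimal sketch of such a drop-in replacement, assuming a hypothetical strndup() replacement kept in a private compat.c/compat.h pair that is never installed:

```
/* compat.c -- hypothetical private replacement file */
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

#include <stdlib.h>
#include <string.h>
#include "compat.h"

#ifndef HAVE_STRNDUP
/* compiled only when configure did not find strndup() in the C library */
char *strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    while (len < n && s[len] != '\0')
        len++;

    copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}
#endif
```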

Avoid compiling OS-specific code when it’s not needed. When a program optionally supports specific libraries or specific operating systems, it’s not rare to have entire source files that are specific to that code path. To avoid compiling them when they’re not needed, use the AM_CONDITIONAL macro inside the configure.ac file. This automake macro (usable only if you’re using automake to build the project) allows you to define if .. endif blocks inside a Makefile.am file, inside which you can set special variables. You can, for example, add a platformsrcs variable that you set to the right source file for the platform to build for, then use it in a _SOURCES variable.

However, there are two common errors developers make when using AM_CONDITIONAL. The first is calling AM_CONDITIONAL in an already conditional branch (for example under an if or in a case switch), which leads to automake complaining about a conditional defined only conditionally (AM_CONDITIONAL must be called at global scope, outside every if block, so you must define a variable holding the result of the condition and test it when calling AM_CONDITIONAL). The other is that you can’t change the targets’ variables directly inside a Makefile.am conditional; you must define “convenience” variables, which are empty outside the conditional, and use them to add or remove source files and targets.
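A sketch of the pattern, assuming a hypothetical optional OSS audio backend: the result of the check is stored in a shell variable, AM_CONDITIONAL is called once at global scope, and the Makefile.am uses a convenience variable that is empty when the condition is false:

```
# configure.ac (fragment)
have_oss=no
AC_CHECK_HEADERS([sys/soundcard.h], [have_oss=yes])
AM_CONDITIONAL([USE_OSS], [test "$have_oss" = yes])

# Makefile.am (fragment)
if USE_OSS
oss_srcs = audio_oss.c
else
oss_srcs =
endif
foofreak_SOURCES = main.c audio.c $(oss_srcs)
```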

To avoid compiling code for specific code paths, many projects instead wrap entire files in #ifdef ... #endif preprocessor conditionals. While this usually works, it makes the code ugly and error-prone, as a single statement left outside the conditional block can end up compiled even where the source file is not needed. It also sometimes misleads users, as the source files seem to be compiled in situations where they don’t make sense.

Be smart when looking for the operating system or hardware platform. Sometimes you need to check for a specific operating system or hardware platform. The right way to do this depends on where you need to know it. If you need it to enable extra tests in configure, or to add extra targets in the makefiles, you must do the check in configure.ac. On the other hand, if the difference must be known in a source file, for example to enable an optional asm-coded function, you should rely directly on the compiler/preprocessor, using #ifdef directives with the macros predefined on the target platform (for example __linux__, __i386__, _ARCH_PPC, __sparc__, __FreeBSD__, and __APPLE__).
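On the source side, that means relying on the compiler’s predefined macros, as in this small hypothetical dispatch:

```
/* hypothetical platform dispatch relying only on predefined macros */
#if defined(__linux__)
# define PLATFORM_NAME "Linux"
#elif defined(__FreeBSD__)
# define PLATFORM_NAME "FreeBSD"
#elif defined(__APPLE__)
# define PLATFORM_NAME "Darwin"
#else
# define PLATFORM_NAME "unknown"
#endif

#if defined(__i386__) || defined(__x86_64__)
/* an optional asm-coded fast path could be enabled here */
#endif
```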

Don’t run commands in configure.ac. If you need to check for the hardware or operating system in configure.ac, you should avoid using the uname command, despite this being one of the most common ways to do such a test. It is actually an error, because it breaks crosscompilation. Autotools support crosscompiling projects from one machine to another through host definitions: strings in the form “hardware-vendor-os” (actually “hardware-vendor-os-libc” when GNU libc is used), such as i686-pc-linux-gnu and x86_64-unknown-freebsd5.4. CHOST is the host definition of the system you’re compiling the software for, and CBUILD is the host definition of the system you’re compiling on; when CHOST and CBUILD differ, you’re crosscompiling.

In the examples above, the first host definition shows an x86-like system, with a pentium2-equivalent (or later) processor, running a Linux kernel with a GNU libc (usually this refers to a GNU/Linux system). The second refers to an AMD64 system with a FreeBSD 5.4 operating system. (For a GNU/kFreeBSD system, which uses the FreeBSD kernel and GNU libc, the host definition is hw-unknown-freebsd-gnu, while for Gentoo/FreeBSD, using FreeBSD’s kernel and libc but with the Gentoo framework, the host definition is hw-gentoo-freebsd5.4.) By using the $host and $build variables inside a configure.ac script you can enable or disable specific features based on the operating system or the hardware platform you’re compiling for or on.
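In configure.ac this usually becomes a case statement on $host (set by AC_CANONICAL_HOST) instead of a call to uname; the feature macros below are hypothetical:

```
# configure.ac (fragment)
AC_CANONICAL_HOST
case "$host" in
  *-*-linux*)   AC_DEFINE([OS_LINUX],   [1], [Building for Linux])   ;;
  *-*-freebsd*) AC_DEFINE([OS_FREEBSD], [1], [Building for FreeBSD]) ;;
  *-*-darwin*)  AC_DEFINE([OS_DARWIN],  [1], [Building for Darwin])  ;;
esac
```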

Don’t abuse “automagic” dependencies. One of the most useful features of autotools is the automatic check for the presence of a library, which is often used to automatically enable support for extra dependencies. However, abusing this feature makes building a package a bit of a problem. While automatic checks are quite useful for first-time users, and although most projects with complex dependencies (such as multimedia programs like xine and VLC) use a plugin-based framework that lets them avoid most of the breakage, “automagic” dependencies are a great pain for packagers, especially ones working on source-based distributions such as Gentoo and ports-like frameworks. When you build something with automagic dependencies, you enable whatever features are supported by the libraries found on the system on which the configure script is run. This means that the output binaries might not work on a system that shares the same base packages but misses one extra library, for example. Also, you can’t tell the exact dependencies of a package, as some might be optional and not built in when the libraries are not present.

To avoid this, autoconf allows you to add --enable/--disable and --with/--without options to configure scripts. With such options you can forcefully enable or disable a specific option (such as the support for an extra library or for a specific feature), while leaving the default to the automatic tests.

Unfortunately, many developers misunderstand the meaning of the two parameters of the macros used to add those options (AC_ARG_ENABLE and AC_ARG_WITH). They represent the code to execute when a parameter is passed and when one is not. Many developers mistakenly think that the two parameters define the code to execute when the feature is enabled and when it is disabled. While that usually works when a parameter is passed only to change the default behavior, many source-based distributions pass parameters also to confirm the default behavior, which leads to errors (features explicitly requested go missing). Being able to disable optional features even when they don’t add dependencies (think of OSS audio support on Linux) is always a good thing for users, who can avoid building extra code they don’t plan to use, and it saves maintainers from doing dirty caching tricks to enable or disable features as their users request.
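A sketch of the correct pattern for a hypothetical --enable-oss switch; note that the third argument runs whenever the option is passed, whether it is --enable-oss or --disable-oss, and the fourth only when it is absent:

```
# configure.ac (fragment)
AC_ARG_ENABLE([oss],
  [AS_HELP_STRING([--enable-oss], [enable OSS audio output (default: auto)])],
  [enable_oss=$enableval],    # --enable-oss or --disable-oss was passed
  [enable_oss=auto])          # no option passed: keep the automatic default

if test "x$enable_oss" != xno; then
  AC_CHECK_HEADERS([sys/soundcard.h], [have_oss=yes], [have_oss=no])
else
  have_oss=no
fi

if test "x$enable_oss" = xyes && test "x$have_oss" = xno; then
  AC_MSG_ERROR([OSS support was explicitly requested but sys/soundcard.h is missing])
fi
```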

While autotools have been a big problem for both developers and maintainers, because there are different incompatible versions that do not get along well together (since they install in the same places, with the same names) and are used in different combinations, their use saves maintainers from all sorts of dirty tricks to get software to compile. If you look at the ebuilds in Gentoo’s Portage, the few that do not use autotools are the more complex ones, as they need to check variables on very different setups (we may or may not have NPTL support; we can be on Linux, FreeBSD, or Mac OS X; we can be using glibc or another libc; and so on), while autotools usually take care of that on their own. It’s also true that many patches applied by maintainers are there to fix broken autotools scripts in upstream sources, but this is a small problem compared to the chaos of special build systems that stop working at the slightest environmental change.

Autotools can be quite tricky for newcomers, but when you start using them on a daily basis you find they’re a lot easier than dealing with hand-written makefiles or other strange build tools such as imake or qmake, or, even worse, special autotools-like build scripts that try to recognize the system they are building on. Autotools make it simple to support new OSes and new hardware platforms, and save maintainers and porters from having to learn how to custom-build a system to fix compilation. By writing the configure script carefully, developers can support new platforms without any changes at all.

Best practices for portable patches

This article was originally published on Linux.com.

One of the things I usually take care of as a Gentoo package maintainer is sending patches to upstream developers. If a patch is applied upstream, we can remove it from future versions of a package, so we have less work to do to maintain the package. Unfortunately, it seems that other distributions and packagers don’t always do the same. This is true not only for Linux distributions such as Debian, Fedora Core, and SUSE, but also for maintainers of packages in places like FreeBSD’s Ports, DarwinPorts, or Fink. Here are some tips for developers on making things easier for yourself and everyone who has to touch your code.

When upstream developers are unaware of the problems their software has on platforms they can’t test (perhaps because they use another distribution, another environment, another operating system, or another hardware platform), they can’t fix them and are likely to introduce more problems in further versions if they assume that things are good as they are. Letting upstream developers know about the problems, filing bugs, and in general reporting issues is one of the best ways to help an open source project, and it’s something even users with no technical skills can do.

When you have the technical ability to fix a bug, though, you should try to provide the upstream developers with a patch. However, not all patches can be applied unconditionally upstream, and that makes it harder to fix a problem in the short term. Some of the errors people make while preparing a patch and sending it upstream can be avoided in a reasonably simple way, but I still see bad patches in many packaging systems (including Gentoo on occasion).

The first thing to take into account when writing a patch is that the environment in which you’re working can be different from other environments. By “environment” I mean all the factors that can affect the behavior of a program, such as the operating system and its version; the distribution, if an operating system has more than one; the drivers used, when there is more than one kind; the versions of the libraries or of the tools; and so on. One of the most common assumptions developers make while creating patches is that the environment they’re using is the “right” one and everything else should follow it; however, even when the environment used is the “right one,” a patch should always be general enough to be applicable to “broken” environments as well.

Using GNU’s Autotools is a good way to allow a C/C++ program to adapt itself to multiple environments. Although they have many problems, and are hated by most of their users, Autotools are currently the best way to handle a multi-platform, adaptable build system. They have facilities to check for headers, libraries, functions, and a lot more. Unfortunately Autotools are quite difficult to learn, and I’m not going into details on how to use them or how to write macros in Autotools’ M4 language, but here’s a brief overview.

In an Autotools-based project, you usually write a configure.ac script in M4, which defines the checks needed on the host machine (the machine where the build is going to be done); this script is then translated into a shell script by autoconf. You also write Makefile.am files, which are used by automake to create the templates for the makefiles that the configure script generates after the checks. Usually the configure script can define conditionals for makefiles and create a config.h file where you can find some C preprocessor macros useful for writing conditional code in a C or C++ source file (using #ifdefs).

For instance, you can’t always assume that a given header is there, even if it’s part of a standard defined for Unix systems. Libraries change, including system libraries, and something you’re writing against now can change in the future. Try to make sure that all the system and non-conventional headers you use are present before using them. You can usually do that with an AC_CHECK_HEADERS() call in configure.ac and by using the config.h generated on the host machine to check whether they are present. Sometimes you get warnings during configure execution stating that a given header “can be preprocessed but fails to compile”: while this is only a warning for now, it is going to be an error in the future, so try to fix it as soon as you can. The fix usually involves adding a couple of needed headers to the prerequisite argument. With this check, for example, you can usually avoid the infamous error about malloc.h on FreeBSD systems (however, malloc.h is actually deprecated; you can use stdlib.h in its place without problems).
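The prerequisite argument is the fourth one of AC_CHECK_HEADERS; the classic example is net/if.h, which on some BSDs cannot be compiled without a couple of other headers included first:

```
# configure.ac (fragment)
AC_CHECK_HEADERS([net/if.h], [], [],
[[#include <sys/types.h>
#include <sys/socket.h>
]])
```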

Kernel and other system-specific headers aren’t portable, so you should avoid them unless you need a kernel service you can’t get in other ways. Try instead to get the information or the services you need through other means; that is usually more portable, even between different versions of the same operating system. It’s not unlikely that different Linux (kernel) versions have incompatible headers that can make some software fail to work.

Libraries, too, can be a problem. Glibc, used by every Linux distribution (apart from the ones for embedded use, which normally use uclibc or dietlibc), provides not only the base functions a normal C library should provide (for example, to interact with the kernel) but also a complete iconv() implementation, a basic gettext implementation, and a getopt_long() function used to accept long parameters when a program uses getopt to parse the arguments given to it by the user. However, other system libraries, like FreeBSD’s, don’t provide all that: iconv() is provided on FreeBSD systems by GNU libiconv, gettext is entirely provided by GNU gettext, and for getopt_long() you need another library only on the 4.x series of FreeBSD, on DragonFly BSD, and probably on other non-Linux systems. Autotools provide the AC_CHECK_LIB() macro to check for the presence of a given function in a library, allowing packages to check whether they must link to libiconv, libdl, libintl, and so on.

The sizes of basic types, like integers, can vary among hardware platforms, as can pointer size. This problem is getting bigger lately, as x86_64 systems can now be considered the first 64-bit hardware platform for mainstream desktop users. Having to care about 64-bit cleanness can be a bit tricky for software developed assuming the usual x86 platform, but the fixes are usually easy enough to make in a couple of minutes. One of the most important ways to ensure that variables have the right size is to use the “standardized integers” found in the sys/types.h or stdint.h headers. These are renamed integer types, with names in the form (u)intSIZE_t, so you have int8_t for 8-bit signed integers and uint64_t for 64-bit unsigned integers. Pointers can’t use those fixed-length types, as their size depends on the architecture; they usually have the same length as long integers, but if you can, try ptrdiff_t (or intptr_t) before using long to store them. Other types, such as size_t and off_t, are defined with the appropriate size on every platform, so they are safe to use as they are. You should never mix fixed-length and “named” types like unsigned int and long, and you should be sure that you’re using the right %-code when passing integers to printf or scanf. The inttypes.h header defines macros that give you the right code for each size, so PRId64 is the equivalent of %d for 64-bit integers, while PRIu32 is the equivalent of %u for 32-bit integers.
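A short, hypothetical example of fixed-width types and the matching printf codes from inttypes.h:

```
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t crc   = 0xDEADBEEFu;
    int64_t  delta = -42;

    /* PRIu32 and PRId64 expand to the right conversion on every platform */
    printf("crc=%" PRIu32 " delta=%" PRId64 "\n", crc, delta);
    return 0;
}
```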

Assembler code has also been a problem lately. Before the introduction of x86_64 processors every hardware platform had its own assembler code, so there weren’t too many problems porting it. It was just a matter of having a non-assembler version of the code, which was maybe slower but portable. With the new architecture, instead, you can have assembler code shared between x86 and x86_64 systems, and this is especially true for multimedia applications that make use of extended instructions like MMX, SSE, or 3DNow!. Although the syntax of assembler is usually the same, there are a few things to think about, such as the size of the registers and the operations run on them. Using “base” operations on 32-bit registers on an x86_64 processor will fail, so you usually have to select the register to use as operands with conditionals.

The discussion about hardware compatibility is of course longer than this and deserves a book of its own, as there are a lot of other tricky conditions to take care of. Those working on non-x86 architectures already know how to fix the problems that come up and are likely to provide patches in cases where the software is broken.

Returning to software compatibility issues: you should take into account the possibility of crosscompiling the software. Letting Autotools-based projects build in crosscompile is usually an easy task, if you have all the dependencies already crosscompiled (and, obviously, a working crosscompiler for the target architecture). Unfortunately, some configure scripts are broken by design and fail to work as they should in crosscompile. The most common error is to use the output of the uname command to select the architecture or platform to compile for; this tells you about the current system, not the target platform. Instead of using that, you can rely on the ${host} and ${build} variables, which contain CHOST-like strings defining the host system (the one you’re compiling for) and the build system (the one you’re compiling on); a ${target} variable also exists, but it matters only when building compilers and similar tools. The CHOST-like strings are structured as tuples, with either three components (arch-vendor-os) or four (arch-vendor-kernel-libc), although the latter is used only when the operating system has no single “native” libc. The arch part of the string defines the hardware platform used: i386, i686, x86_64, ppc, ppc64, sparc, and so on. The vendor part used to refer to the hardware vendor (for example, ibm for their mainframes), but lately it is being used to refer to the software vendor, so you can find redhat, debian, mandrake, or gentoo there (note that Gentoo uses it only on uclibc and Gentoo/*BSD systems); usually, though, it’s just “pc” for x86 and “unknown” for other architectures. The os part is the trickiest one, as it defines the operating system used. Usually it’s “linux-gnu” for GNU/Linux systems, but it can be “linux-uclibc” or just “linux” for other Linux-based systems, or it can be “freebsdX.Y” to refer to a given version of FreeBSD (just “freebsd” is not a valid value). Generally, this value is what you use to check which OS-specific code to enable.

I hope that this introductory article will inspire patch authors to ensure that the changes they make are not going to break other systems. By using the right combination of Autotools checks and conditional compilation to limit a change to the environments where it’s needed, developers can allow upstream maintainers to fix their software, saving maintainers from having to reinvent the wheel every time something in the “common” environment changes.