Libtool archives and their pointless points

Since the discussion about libtool files has resumed, and we’re back to deciding whether to kill them now or “plan” for the next five to ten years, I figure I had better summarise all the information regarding them once more, extending what I have already written about a number of times, and lay out the reasoning for abandoning them as soon as possible.

I’m writing it here since this is what I use as my main reference; I’ll see about sending this to gentoo-dev tomorrow if I have time and feel motivated enough, but if somebody wants to beat me to it, I’ll just be happy, since it’ll mean less work for me.

About the chance of just removing all of them unconditionally. This is one thing I sincerely don’t think is possible, even though some attempts have been made toward that target, for instance with the Portage-Multilib branch. The reasons are multiple; the most obvious one is that a file being named *.la is just not enough: there is at least one package in the tree (in the subset of packages the tinderbox is able to merge, to be precise) that installs files with a .la suffix which are simply not libtool archives at all. So removing all of the files based on file name and path alone is a bad idea.
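To make the point concrete, here is a minimal, hypothetical sketch of the kind of check a blanket removal would have to do first; it assumes that genuine archives carry libtool’s usual header comment, which is true for the files libtool generates but not something an ebuild should blindly rely on:

# hypothetical sketch: only treat a file as a libtool archive if it carries
# libtool's usual "a libtool library file" header comment
is_la_archive() {
    head -n 3 "$1" 2>/dev/null | grep -q 'libtool library file'
}

find /usr/lib* -name '*.la' 2>/dev/null | while read -r f; do
    if is_la_archive "${f}"; then
        echo "libtool archive: ${f}"
    else
        echo "not a libtool archive, leave alone: ${f}"
    fi
done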

About fixing this at the core, libtool. That would be a feasible, albeit very long-term, solution, if it weren’t that I find it unlikely that upstream will accept it. Besides the fact that, rather than coupling autotools more tightly to reduce duplication and version-compatibility issues, they seem to be splitting it further and requiring users to pull in more code that might not be useful to them at all (for instance by deprecating the mktime() checks in favour of using gnulib, as they did with autoconf 2.68), the libtool developers seem to have indicated that they don’t intend to concede to modern systems’ use cases if that might hinder support for much older systems.

While it is feasible to patch libtool not to emit the .la files, and quite a number of projects would rejoice at that, it cannot be done unconditionally, as I’ll explain further along the post. So it would have to be done either conditionally on the -shared flag, or behind a new flag added for the purpose. But even assuming that upstream were to merge such a flag, fixing all of the packages upstream not to emit the libtool archive files is a plan that takes way too many years, with the user inconvenience not stopping until all of it is done. So as it is, it’s something that should be pursued, but it won’t help in the short to mid term.

About the usefulness of libtool archives with plugins, which is something I wrote about in detail over a year ago. I’m not sure I can say “each and every piece of software building plugins with automake uses libtool”, as I don’t know all of them (yet), but I can safely assume that most automake-based build systems out there can only make use of libtool to build plugins, the shared objects that are dynamically loaded into processes to provide extra features. This means that most of them will emit and install, by default, a number of libtool archive files.

But even if you build the plugins with libtool, it does not mean you’re loading them with it as well; you could be using the libltdl library that libtool provides, but sincerely, most of the time even learning about it is a waste of time. The most important feature it provides is a common interface over the various dynamic linker interfaces; indeed there are different dynamic linker interfaces on Linux, Windows and Mac OS X (if I recall correctly, at least). But in this usage mode there is no use for the archive files either; they are not really needed: the interface allows you to load any object file, as long as you use the full basename, including the operating system-specific extension (.so, .dll, .dylib…).

Where the libtool archive files are actually used is dynamic linker emulation, which is another mode of operation of libltdl; instead of accessing the standard dynamic linker (loader) of the system, the host software relies only on static linking, and the libtool archives are used to know what to look for. In this mode, the archives are consulted even when the emulation itself is not in use, since they describe whether the plugin is provided by a shared object anyway. In this case you cannot remove the libtool archives without changing the underlying host software’s architecture considerably.

The result is that for most plugin-loading software you can remove the .la files without thinking twice, once you know that the software uses dlopen() to load the plugins (such is the case of xine-lib, which also uses a nasty Makefile.am hack to remove them altogether). You’ll need a bit more analysis for the software that does use libltdl.

Using libtool archive files for standard libraries is definitely a different, more complex topic, and one I’ll probably have to write a lot about.

The first problem: static archives (and thus static linking) for libraries hosting plugins (like xine-lib, for instance). Most of these host libraries do not support static linking at all, so xine-lib, for instance, never provided a static archive to begin with. This does not mean that there is no way to deal with statically-linked plugins (the so-called built-ins that I talked a lot about; heck, Apache pulls it off pretty nicely as well). But it means that if a plugin links to the host library (and most do, to make sure proper linking is applied), then you cannot have a statically-linked copy of it.

Anything that uses dlopen(), anything that uses features such as PAM or NSS, will thus not have a chance to work correctly with static linking (I admit I’m oversimplifying a bit here, but please bear with me; getting into the proper details would require a different blog post altogether and I’m too tired to start drawing diagrams). Taking that as accepted for now, we can look at what libtool archive files provide for standard libraries.

The libtool archive files were mainly created to overcome the limitations of the original library implementations of GNU and other operating systems, which did not provide ABI version information or dependency data. On some operating systems (but please don’t ask me to track down which), neither shared objects nor static archives provide this information; on most modern operating systems, Unix, Unix-like, or neither, at least shared objects are advanced enough not to require support files to provide metadata: ELF (used by Linux, BSD and Solaris) provides it in the form of sonames and needed entries; Mach-O (used by OS X) and PE (used by Windows and .NET) have their own ways as well. Static libraries are a different matter.

On most Unix and Unix-like systems, static libraries are rather called “static archives”, because that’s what they are: archive files created with ar, to which a number of object files are added. For these, the extra information about static linking is somewhat valuable. It is not, though, as valuable as you might think. While the archive does provide dependency information, there is no guarantee that the dependencies used to create the shared object and those needed to link the static archive are the same; transitive dependencies cover part of the issue, but they don’t tell you which built-ins you’d have to link statically against. Also, the files can only be used by software that in turn uses libtool to link.

With the current tendency to abandon autotools (for good or bad, we have to accept that this is the current trend), the .la files are getting even more useless, especially because the projects that do build libraries with libtool cannot simply rely on their users using libtool on their own, which means they have to provide options to link statically (if that’s supported at all) without using libtool. This usually boils down to using pkg-config to do the deed, which also has the positive effect of working for non-autotools and/or non-libtool based projects.
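As a sketch of what that looks like in practice (the package name foo and its foo.pc file are made up for the example), a consumer that never touches libtool or .la files can get the full static link line from pkg-config alone:

# hypothetical package "foo" shipping a foo.pc file; --static also pulls in
# the private dependencies needed only when linking the static archive
CFLAGS="$(pkg-config --cflags foo)"
LIBS="$(pkg-config --libs --static foo)"
cc -static ${CFLAGS} -o myprog myprog.c ${LIBS}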

Sincerely, the only relatively big subsystem that relied heavily on libtool archive files was KDE 3; since it switched away from autotools (and thus libtool) altogether, the rest of the software stack I know of only considers libtool archives a side effect, most of the time not thinking twice about their presence. A few projects actively try to avoid installing such files, for instance by removing them through an install hook (which is what xine-lib has been doing for years), but that also has a drawback: make uninstall does not work if you do that, because it relies on the presence of the .la files (we don’t need that in ebuilds since we don’t use the uninstall target at all).

Taking all of this into consideration, I reiterate my suggestion that we start removing .la files on a per-ebuild basis (or on a per-subsystem basis, as is the case for X11 and GNOME, since those teams can vouch that their packages don’t do idiotic things); doing it in one big swoop will cause more trouble (as a number of packages that do need them will require the change to be reverted, and a switch to be added); and simply adding a f..ine load of USE flags to enable/disable their installation is just going to be distracting, adding to the maintenance burden, and in general not helpful, not even for the packages that are not in the tree at all.

Repeat after me: nobody sane would rely on libtool so much nowadays.

Using the SHA2 hash family with OpenPGPv2 cards and GnuPG

I’m sure I have said this before, though I don’t remember when or to whom, but most of the time it feels to me like GnuPG only works out of sheer luck, or sometimes fails to work just to my misfortune. Which is why I end up writing stuff down whenever I actually manage to coerce it into behaving as I wish.

Anyway, let’s start with a bit of background: a while ago, the SHA1 algorithm was deemed by most experts to be insecure, which means that relying on it for Really Important Stuff was a bad idea; I still remember reading this entry by dkg that provided a good start for setting up your system to use the SHA2 family (SHA256 in particular).

Unfortunately, when I actually got the FSFe smartcard and created the new key, I noticed (and noted in the post) that only SHA1 signatures worked; I set up the card to use SHA1 signatures and, to be honest, forgot about it. Today, though, I went to sign an email and … it didn’t work; it reported that the created signature was invalid.

A quick check around and it turns out that for some reason GnuPG started caring about the ~/.gnupg/gpg.conf file rather than the key preferences; maybe it was because I had to reset the PIN on the card when I mistyped it on the laptop too many times (I haven’t turned off the backlight since!). The configuration file was already set to use SHA256, so that failed because the card was set to use SHA1.

A quick googling around brought me to an interesting post from earlier this year. The problem as painted there seemed to exist only with GnuPG 1.4 (so not the 2.0 version I’m using) and was reportedly fixed. But the code in the actual sources of 2.0.16 tells a different story: the bug is the same there as it was in 1.4 back in January. What about 1.4? Well, it’s also not fixed in the latest release, but it is on the Subversion branch — I noticed that only afterwards, though, so you’ll see why that solution differs from mine.

Anyway, the problem is the same in the new source file: gpg does not ask the agent (and thus scdaemon) to use any particular encoding other than RMD160, which was correct for the old cards but definitely is not for the OpenPGP v2 cards that FSFE is now providing its fellows with. If you want to fix the problem and you’re a Gentoo user, you can simply install gnupg-2.0.16-r1 from my overlay; if you’re not using Gentoo but build it by hand, or you want to forward it to other distributions’ packages, the patch is also available…

And obviously I sent it upstream and I’m now waiting on their response to see if it’s okay to get it applied in Gentoo (with a -r2). Also remember that you have to edit your ~/.gnupg/gpg.conf to have these lines if you want to use the SHA2 family (SHA256 in this particular case):

personal-digest-preferences SHA256
cert-digest-algo SHA256
default-preference-list SHA512 SHA384 SHA256 SHA224 AES256 AES192 AES CAST5 ZLIB BZIP2 ZIP Uncompressed

Smart Cards and Secret Agents

Update, 2016-11: The following information is fairly out of date, six years later, as now GnuPG uses stable socket names, which is good. Please see this newer post which includes some information on setting up agent forwarding.

I’ve been meaning to write about my adventure in properly setting up authentication using the FSFe Fellowship smartcard for quite a while, and since Markos actually brought the subject up earlier tonight, I guess today is the right time. Incidentally, earlier in my “morning” I had to fight to get it working correctly on Yamato, so it might be useful after all…

First of all, what is the card and what is needed to use it… the FSFe Fellowship card is a smartcard with the OpenPGP application on it; smartcards can have different applications installed, quite a few are designed to support PKCS#11 and PKCS#15, but those are used by the S/MIME signature and encryption framework; the OpenPGP application instead is designed to work with GnuPG. When I went to FOSDEM, I set up my new key using the card itself.

The card provides three keys: a signing key, an encryption key, and an authentication key; the first two are used by GnuPG, as usual; the third is instead something that you usually don’t handle with GnuPG… SSH authentication. The gpg-agent program can actually handle your standard RSA/DSA keys for SSH, but that alone is generally not very useful; combined with the OpenPGP smartcard, though, it becomes very useful.

So first of all you need a compatible smartcard reader; thankfully the CCID protocol is pretty standard and should work fine. I’ve been lucky: three out of three smartcard readers I have work fine; one is from an Italian brand (but most likely built in Taiwan or China), another is a GemAlto PinPad, and the third is the one integrated into my Dell laptop, a Broadcom BCM5880v3. The last one requires an updated firmware and a ccid package capable of recognizing it… the one in Gentoo ~arch is already patched so that it works out of the box. I got mine at Cryptoshop, which seems a decent place to get them in Europe.

In my experience, at least GnuPG seems to have problems dealing with pinpads, and quite a few pinpad-equipped readers seem to have driver problems; so get a cheaper, but just as valid, non-pinpad reader.

On the software side, there isn’t much you need: GnuPG itself could use the CCID readers directly, but I’ve had the best luck using pcsc-lite; just make sure your pcsc-lite does not use HAL but rather has libusb support directly, by setting -hal usb as USE flags for it. GnuPG has to be built with the smartcard USE flag; the pcsc-lite USE flag will pull in the dependency as well, but it does not change the build at all. Update: Matija noted that you also need to install app-crypt/ccid (the userspace driver for CCID-based smartcard readers); for whatever reason I assumed it was already a dependency of the whole set, but that is not the case.

Make sure the pcscd service is started with the system, you’re gonna need it.
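For reference, the Gentoo-side setup just described boils down to something like the following; this is a rough outline rather than a verified recipe, so adjust package names and flags to your own system.

# USE flags as described above
echo "sys-apps/pcsc-lite -hal usb"         >> /etc/portage/package.use
echo "app-crypt/gnupg smartcard pcsc-lite" >> /etc/portage/package.use

emerge -av sys-apps/pcsc-lite app-crypt/ccid app-crypt/gnupg

# have pcscd come up with the system
rc-update add pcscd default
/etc/init.d/pcscd start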

To actually make proper use of the key, you’re going to need to replace ssh-agent with gnupg-agent… more interestingly, GNOME-Keyring also replaces ssh-agent, but if you let it do so, it won’t handle your OpenPGP card’s auth key! So you’re going to have to override that. Since using keyring with this setup seems to be impossible, my solution is to use a simple wrapper which I now release under the CC-BY license.

You have to run this script in every shell, and in your X session as well, for this to work as intended (it is needed in the X session so that it works with libvirt over SSH, otherwise virt-manager will still try to get the key from gnome-keyring). To do so I source the script from both my ~/.shrc file and my ~/.xsession file, and make sure the latter is called; to do so I have this:

# in both ~/.shrc and ~/.xsession:
. /path/to/gpg-agent-wrapper

# in /etc/X11/xinit/xinitrc.d/01-xsession
[ -f ${HOME}/.xsession ] && . ${HOME}/.xsession

The trick in the script is making sure that gpg-agent is not already running and that it does not collide with the current environment, but it also takes care of overriding gnome-keyring (this could also be done by giving ~/.xsession a higher priority than gnome-keyring), and it ensures that SSH agent forwarding works… and yes, it works even if the client uses gpg-agent for SSH, which means it can forward the card’s authentication credentials over a network connection.
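I’m not reproducing the released wrapper here, but a minimal sketch of the same idea, assuming GnuPG 2.0-era options such as --write-env-file, could look roughly like this:

# sketch only: reuse a running gpg-agent if possible, start one otherwise,
# and let its SSH socket win over gnome-keyring's
envfile="${HOME}/.gpg-agent-info"

[ -r "${envfile}" ] && . "${envfile}"

# GPG_AGENT_INFO is socket:pid:protocol; check the recorded pid is alive
agent_pid="${GPG_AGENT_INFO#*:}"; agent_pid="${agent_pid%%:*}"
if [ -z "${agent_pid}" ] || ! kill -0 "${agent_pid}" 2>/dev/null; then
    eval "$(gpg-agent --daemon --enable-ssh-support --write-env-file "${envfile}")"
fi

export GPG_AGENT_INFO
# don't clobber a forwarded agent when logged in over SSH
[ -z "${SSH_CONNECTION}" ] && export SSH_AUTH_SOCK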

So here it is, should be easy enough to set up for anybody interested.

Linux Containers on Gentoo, Redux

I’ve had a few more requests for Linux Containers support in Gentoo, one of which came from Alex, so I’ve decided to take a shot at updating the current status of the stack, and give a decent description of what the remaining problems are.

First of all, the actual software: I’m still not fond of the way upstream is dealing with the lxc package itself; the build system is still trying to be smart and ending up being stupid, particularly where the LXC library is concerned. There are still very few failsafes, and there isn’t really enough there to manage LXC with the default tools as they are. While libvirt should support LXC just fine, I haven’t found the time to try it again and see whether it works; I’m afraid it might only work if you use the forced setup that RedHat uses for LXC… but again, I cannot say much until I find the time to try it out and tweak it where needed.

*A note: as I stated before, a lot of the documentation and tutorials regarding libvirt only apply to RedHat or Fedora. I can’t blame them for that, they do the work and they walk the walk, but it often means that we have to adapt them, or at least find a way to provide the pieces they expect in the right places. It requires a lot of my time to do that.*

I’ve finally added my “custom” init script to the ebuild, with version 0.7.2; it might change further, with or without a revision bump, as I fix reported bugs. It should mostly auto-configure; the only requirement it has is to be symlinked to lxc.container to start the container defined in /etc/lxc/container.conf. It auto-detects the root path (so it won’t force a particular filesystem layout on you), and works with both 32- and 64-bit containers transparently, as long as there is a /sbin/init command (which I might have to change for systemd-based distributions at some point). What I now realise it lacks is support for detecting the network interface the container uses and requiring that it be started; I can add that at some point, in the mean time use /etc/conf.d/lxc.container and add rc_need="net.yourif".
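In practice, and with a made-up container name, the setup for the init script looks roughly like this (the symlink-per-container convention is the one described above; treat the rest as a sketch):

# container defined in /etc/lxc/foo.conf; assuming the ebuild installs
# its init script as /etc/init.d/lxc
ln -s lxc /etc/init.d/lxc.foo
echo 'rc_need="net.br0"' >> /etc/conf.d/lxc.foo   # until auto-detection exists
rc-update add lxc.foo default
/etc/init.d/lxc.foo start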

As far as networking is concerned, last I checked (with the lxc 0.7.1 userspace and kernel 2.6.34) the macvlan system still isolated the host from the guests, which might be what you want but it’s definitely not what I care for. I’m guessing this might actually be by design; at any rate, even though it’s technically slower, I find myself quite comfortable using a Linux-based bridge as the main interface, and bridging the guest’s virtual Ethernet device together with the physical interface(s) of the host. This also works fine with libvirt/KVM, so it’s not a bad decision in my opinion. I just added 0.7.2, but I can’t see how that would make a difference, as macvlan is handled in the kernel.

Thankfully, doing so with Gentoo’s networking system (which Roy wanted to deprecate, tsk!) is a piece of cake: open /etc/conf.d/net, rename config_eth0 to config_br0, then add config_eth0="null" and bridge_br0="eth0" (a sketch of the resulting file is further down), run ln -s net.lo /etc/init.d/net.br0, and use that to bring the network up. Then on the LXC configuration side you have

lxc.network.type = veth
lxc.network.link = br0

and you’re all set. As I said, piece of cake.
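For reference, here is roughly what the /etc/conf.d/net side described above ends up looking like; this is a sketch based on the old Gentoo network scripts, so double-check the option names against your baselayout/openrc version.

# /etc/conf.d/net
config_br0="dhcp"        # whatever config_eth0 used to contain
config_eth0="null"       # the physical NIC is only enslaved to the bridge
bridge_br0="eth0"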

Slightly more difficult is properly handling the TTY devices; some people prefer to make use of the Linux virtual terminals to handle LXC containers; I sincerely don’t want it messing with my real virtual terminals, and prefer using the lxc-console command to access the container without networking. Especially since it messes up a lot if you are using KMS with the Radeon driver (which is what I’ve been doing for the past year or so).

For this to work out, though, I noted two things: the first is that simply using the cgroup access control lists on the devices doesn’t help that much (I actually haven’t tried setting them up properly just yet); on the other hand, LXC can create “pseudo-ttys” that can be used with lxc-console. The default number (9) does not work all that well, because the Gentoo init system sets up twelve virtual terminals by default. So my solution is to use my custom static device tarball and the following snippet in the configuration:

lxc.tty = 12
lxc.pts = 128

This ensures that the TTY devices are all properly set up, so that they don’t mess with your virtual terminals, and lxc-console works like a charm in this configuration.

Now, the sad part: OpenRC is not yet stable, and I haven’t yet fixed the NFS bug I found (stop it inside the container and the host’s NFS exports are destroyed… bad script, bad script!). On the other hand, I’m no longer working with LXC daily, for the simplest of reasons: the tinderbox is already set up as I wish, for the most part, so I have little to no incentive to work more on this. The good news is, I’m up for hire, as I said regarding Ruby. So if you really want to use LXC in production and want me to improve any Gentoo-related area of it, including libvirt, you can just contact me.

Beside that, everything should be in place. Have fun working with LXC!

Removing .la files, for dum^W uncertain people

Since I’m still fighting with the damned .la files, and I’m pretty sure that even though I have explained some use cases most of my colleagues haven’t really applied them, I decided to go with a different approach this time: graphical guides.

Since the post about the tree size got so much feedback, probably because the graphs made an impression on people, this might actually prove useful.

.la files removal flowchart

Note: I first tried to draw the chart with Inkscape, but the connector tool in its code only draws straight lines, which are unusable for stuff like this, and I found no way to anchor lines to an arbitrary point of an object either, so I gave up; dia is tremendously bad to work with; kivio 2 is not in Portage nor available as a binary package for either Windows or OSX; OpenOffice to the rescue: it worked almost flawlessly, although I didn’t want to waste time defining customised colours, so you get the bad and boring default ones in the image.

As you can see from this graph, my idea is that, in the end, every .la file gets removed. Of course this is not immediate and depends on a series of factors; the graph shows at least the basic questions you have to ask yourself when you have to deal with shared libraries. Please note that this does not apply in the same way to plugins; for those I’ll post another, different flowchart.

  • Does the package install internal libraries only? A lot of packages provide convenience libraries to share code between different executable programs (see this post for more information about it); this can be detected easily: there are no include files installed by default, and the library is not in the ld path (it lives somewhere like /usr/lib/packagename). In this case, the .la files are not useful at all, and can be removed straight away.
  • Does the package only install shared objects? The .la files are only meaningful for static libraries, which carry no dependency information of their own; if a package does not install static libraries (.a files), it does not need the .la files.

  • Do the libraries in the package need other libraries? If the libraries are standalone and only depend on the C library (libc.so), then there is no useful dependency information in the .la file, and it can be dropped.

  • Is pkg-config the official way to link to the libraries? When using pkg-config, the dependency information is moved inside the .pc file, so the copy in the .la file is redundant, and thus unnecessary.

  • Is the package new? When adding a new package to Portage, there is no reason to keep the .la files around when the conditions shown above apply. For packages that are already in Portage, the removal of .la files needs to be handled with consideration, or you’ll get the same kind of fire I got for trying to remove some (useless) .la files out of the blue. Not a situation that I like, but such is life.
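If you want a quick, rough way to run through these questions for a package you already have installed, something along these lines works; the package atom is a made-up example and the script only flags candidates, it doesn’t decide for you.

PKG=media-libs/foo    # hypothetical package
for la in $(qlist "${PKG}" | grep '\.la$'); do
    if [ ! -e "${la%.la}.a" ]; then
        echo "${la}: no static archive installed, removable"
    elif grep -q "^dependency_libs=''" "${la}"; then
        echo "${la}: no dependencies beyond libc recorded, removable"
    else
        echo "${la}: check whether pkg-config already covers its dependencies"
    fi
done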

Identifying pointless .la files for plugins

At this point I expect most users to know that .la files are evil, that they are often useless, and that removing them can save you work when packages drop them, either upstream or in Gentoo. Unfortunately, most developers tend to be overly conservative and keep them around even when they are not even remotely needed.

One of the cases for which I have said time and time again that .la files should be removed is plugins; the .la files for plugins are used when you load plugins through libltdl (and even then they are not strictly needed), but are totally pointless when using a straight dlopen() call.

Unfortunately, even when that’s the case, it’s hard for ebuild developers to feel confident that the files are unneeded, so here comes a practical case study to identify when they are not used at all. The first step is to decide which package to test; I’ll go with eog, since I have noticed this before and I know they are not used.

The eog package installs some .la files:

% qlist eog | fgrep .la
/usr/lib64/eog/plugins/libreload.la
/usr/lib64/eog/plugins/libstatusbar-date.la
/usr/lib64/eog/plugins/libfullscreen.la

Now we can see that the eog/plugins directory is where it’ll be looking for plugins, so we’ll start eog through strace and see if it tries to load any of that:

% strace -e open eog |& fgrep eog/plugins
open("/usr/lib64/eog/plugins/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 14
open("/usr/lib64/eog/plugins/reload.eog-plugin", O_RDONLY) = 15
open("/usr/lib64/eog/plugins/statusbar-date.eog-plugin", O_RDONLY) = 15
open("/usr/lib64/eog/plugins/fullscreen.eog-plugin", O_RDONLY) = 15

A quick look at the strace output lets us see that it’s not loading the plugins at all; indeed, in this case eog was started without any plugin enabled, and it only opened the .eog-plugin files, which are ini-like files describing the plugins and their information; I’ll write more about this in the future when I resume my posts about plugins, which I’ve been slacking off from for a while. So let’s enable some plugins (all three of them!) and try again.

% strace -e open eog |& fgrep eog/plugins
open("/usr/lib64/eog/plugins/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 14
open("/usr/lib64/eog/plugins/reload.eog-plugin", O_RDONLY) = 15
open("/usr/lib64/eog/plugins/statusbar-date.eog-plugin", O_RDONLY) = 15
open("/usr/lib64/eog/plugins/fullscreen.eog-plugin", O_RDONLY) = 15
open("/usr/lib64/eog/plugins/libreload.so", O_RDONLY) = 16
open("/usr/lib64/eog/plugins/libstatusbar-date.so", O_RDONLY) = 16
open("/usr/lib64/eog/plugins/libfullscreen.so", O_RDONLY) = 16

Now it looks better: it loads the .so files directly; what does this mean? Simple: the package is most likely using dlopen() and not the libltdl wrapper at all. The .la files are not looked at, at all, so they can be removed without thinking twice.

With this method you can verify that the .la files are just and only a side effect of using libtool, rather than something that is actually used by the software; to do that, though, you’ve got to make sure that the .so is at least loaded, otherwise you might not have loaded plugins at all (see the first output above).

Another common mistake is to consider the .la files needed or not depending on whether the software links to libltdl; this is not true in either direction: software not linking to libltdl might make use of the .la files (kdelibs 3, for instance, has its own modified internal copy of libltdl), and software using libltdl might not need .la files (PulseAudio, for instance, does not install .la files at all in Gentoo).

This post brought to you by having so much work to do that just 2% of my actual mind was free!

Some details about our old friends the .la files

Today, I’m going to write about the so-called “libtool archives” that you might have read about in posts like What about those .la files? or Again about .la files (or why should they be killed off sooner rather than later) (anybody picking up the faint and vague citation here is enough of a TV geek).

Before starting, I’m going to say that I haven’t been reading any public Gentoo mailing list lately, which means that if I bring up points that have been raised already, I don’t care. I’m just writing this for the sake of it, and because Jorge asked me for some clarification about what I wrote some longish time ago. Indeed, the first post is from almost exactly one year ago. I do know that there has been more discussion about whether these files are needed for ebuild-provided stuff, so I’m going to try to be as clear as possible.

The first problem is to identify what these files are: they are simple text files, and they provide metadata about a library, or a pair of static and shared libraries; this metadata includes some obvious and non-obvious stuff like the names of the two types of libraries, the formal name (soname) of the shared library that can be used with dlopen(), and a few more things, including the directory the library is to be found in. The one piece of data that creates a problem for us, though, is the list of dependency libraries that need to be linked against when linking against this library, but I’ll come back to that later. Just please note that it’s there.

Let’s go on with what these files are used for: libtool generates them, and libtool consumes them; they are used when linking with libtool to set the proper runpath if needed (by checking the library’s directory), to choose between the static and shared library versions, and to bring in further dependency libraries. This latter part is controversial and our main issue here: older operating systems had no way to define dependencies between libraries of any kind, and even nowadays on Linux we have no way to define the dependencies of static libraries (archives). So the dependency information is needed when linking statically; unfortunately libtool does not ignore the dependencies when linking against shared objects, which can manage their own dependencies just fine (through the DT_NEEDED entries present in the .dynamic section), and it pollutes the linking line, causing problems that can be solved by --as-needed.
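To see what I mean by shared objects managing their own dependencies, you can just look at the dynamic section of any library; the output below is illustrative (paths and sonames will differ between systems), not copied from a specific box:

% readelf -d /usr/lib/libpcreposix.so | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libpcre.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]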

Another use for libtool archive files is knowing how to load or preload modules (plugins). The idea is that when the operating system provides no way to dynamically load and link further shared objects (that is, a dlopen()-like interface), libtool can simulate that facility by linking the static modules together, using the libltdl library instead. Please note, though, that you don’t need the libtool archives to use libltdl if the operating system does provide dynamic loading and linking.

This is all fine and dandy for theory, what about practicality? Do we need the .la files installed by ebuilds? And slightly related (as we’ll see later) do we need the static libraries? Unsurprisingly, the answer comes from the developer’s mantra: There is no Answer; it Depends.

To simplify my discussion here, I’m going to reduce the scope to the two official (or semi-official) operating systems supported by Gentoo: Linux and FreeBSD. The situation is likely to be different for some of the operating systems supported by Gentoo/Prefix, and while I don’t want to reduce their impact, the situation is likely to become much more complicated to explain by adding them to the picture.

Both Linux and FreeBSD use ELF (which stands exactly for “executable and linkable format”) as their primary executable and linkable format, they both support shared objects (libraries), the dlopen() interface, the DT_NEEDED tag, and they both use ar flat archives for static libraries. Most importantly, they use (for now, since FreeBSD is working toward changing this) the same toolchain for compiling and linking: GCC and GNU binutils.

In this situation, the libltdl “fake dlopen()” is sincerely a huge waste of time, and almost nobody uses it; which means that most people wouldn’t want to use the .la files to open plugins (with the exception, that is, of KDE 3), which makes installing libtool archives of, say, PulseAudio’s plugins, pointless. Since most software is likely not to use libltdl in the first place, like xine or PAM to cite two that I maintain somehow, their plugins also don’t need the libtool archive files to be installed. I have already reported some rogue PAM modules that install pointless .la files (even worse, in the root filesystem). The rule of thumb here is that if the application loads its plugins with standard dlopen() instead of libltdl (or a hacked libltdl, as is the case for KDE 3), the libtool archives for those plugins are futile; this, as far as I know, includes glib’s GModule support (and you can see by using ls /usr/lib*/gnome-*/*.la that there are some installed for probably no good reason).

But this only describes what to do with the libtool archive files for plugins (modules), not with those for libraries; with libraries the situation is a bit more complicated, but not too much, since the rule is even simpler: you can drop all the libtool archives for libraries that are only ever shared (no static archive provided), and for those that have no dependency other than the C library itself (after making sure they don’t simply forget to link their dependencies in). In those cases, the static library is enough on its own, and you don’t need any extra file to tell you what else needs to be linked in. This already takes care of quite a few libtool files: grepping for the string “dependency_libs=''” (which is present in archives for libraries that have no further dependencies beyond the C library) finds 62 files on my system.
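For the record, the check is as simple as this (adjust the library directories for your own layout):

# .la files recording no dependencies beyond the C library
grep -l "^dependency_libs=''" /usr/lib*/*.la 2>/dev/null | wc -l
grep -l "^dependency_libs=''" /usr/lib*/*.la 2>/dev/null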

There is another issue that was brought up last year: libraries whose official discovery method is a -config script, or pkg-config; these libraries can ignore the need to provide dependencies for the static variant, since they provide them themselves. Unfortunately this has two nasty problems: the first is that most likely someone is not using the correct script to find the stuff; I read a blog post a week or two ago from a developer who was disgruntled because pkg-config was used for a library that didn’t provide it, and who suggested not using pkg-config at all (which is actually quite silly). The other problem is that while pkg-config does provide a --static parameter to use different dependency lists for shared and static linking of a library (to avoid polluting the link line), I know of no way to tell autoconf to use that option during discovery at all. There is also something to be said for binutils simply implementing an extension to static archives that actually provides the needed dependency data, but that’s beside the point now, I guess.

So let’s sidestep this issue for now and return to the three known cases where we can assert with relative certainty that the libtool archives are unneeded: non-ltdl-fakeloaded plugins (xine, PAM, GModule, …), libraries with no dependencies other than the C library, and libraries that only install shared objects. While the first two are pretty obvious, there is something else to say about the last one.

By Gentoo policy we’re supposed to always install both the static and shared versions of a library; unless, that is, upstream decides otherwise. The reason for this policy is that static linking is preferred for some mission-critical software that might not allow the system to boot up if the library is somehow broken (think bash and libreadline), and because sometimes, well, you just have to leave the user with the option of statically linking stuff. There have been proposals to add a USE flag to enable/disable building static libraries, but that isn’t in use anywhere yet; one of the problems was linking up the static USE flag with the static-libs USE flag of the package’s dependencies; EAPI 2 USE dependencies can solve that just fine. There are, though, a few cases where you might be better off not providing a static library at all, even if upstream doesn’t say anything outright about it, since most likely they never cared.

This is the case, for instance, for libraries that use, in turn, the dlopen() interface to load their plugins (static linking with those can produce nasty results); that’s why you won’t find a static library for Linux-PAM. There are a few more cases where having static libraries is not advisable at all, and we might actually decide to take them out entirely, with due caution. In those cases you can remove the libtool archive file as well, since shared objects take care of themselves.

Now, case in point: Peter took a lot of flames for changing libpcre; the flames relate partly to him removing the libtool archive, and partly to him removing the static library. I haven’t been part of the flames in any way, because I’m still minding my own health first of all (is there any point in having a sick me not working on Gentoo?), yet here is my opinion: Peter made one mistake, and that was to remove the static library unconditionally. Funnily enough, what most people probably shouted at him for is the removal of the libtool archive, which is simply nothing useful since, as you can guess, the library has no further dependencies besides the C library (it’s different for libpcreposix, though).

My suggestion at this point is for someone to actually finish cleaning up the script that I think I posted to the mailing lists some time ago (it can almost surely be recreated quite quickly), which takes care of fixing the libtool archive files on the system without requiring a full rebuild of everything. Or, even better, get a post-processing task into Portage that replaces the direct references to libtool archives in newly-installed libtool archives with generic library references (so that /usr/lib/libpcre.la would become -lpcre straight away, and so on for all libraries); the result would be that future libtool archive removals wouldn’t cause any breakage at all.
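A very rough sketch of what such a rewrite could look like follows; this is not the script I’m referring to, just an illustration of the idea, and certainly not something to run on a production system as is.

# turn absolute .la references in dependency_libs into plain -l flags
for la in /usr/lib*/*.la; do
    sed -i -e "/^dependency_libs=/ s:/usr/lib[^ ']*/lib\([^ ']*\)\.la:-l\1:g" "${la}"
done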

Again about .la files (or why should they be killed off sooner rather than later)

I’ve been working as an experiment on rewriting xclip to use XCB rather than Xlib. This is mostly because I always have been interested in XCB but I never had time to learn the internals too much.

To make my task easier I ended up using some functions that are not available in the currently-released version of xcb-util, the side package of XCB that contains some higher-level functions which make it easier to replace Xlib.

Besides the fact that xcb-util still hasn’t bumped its version, which makes it impossible to check for the right version with pkg-config, there is one interesting point in using the latest available version through the x11 overlay.

Leaving aside some problems with actually being able to fetch and install the packages I need (Donnie, I’ll send you the patches later if I can polish them a bit), on top of the actual GIT tree there are a few patches applied, coming from Jamey Sharp (an XCB developer), from March 2008. These remove one library (libxcb-xlib) and change the locking method used to make Xlib use the same socket as XCB. These changes not only break the ABI (without changing the soname, alas!) but also make it impossible to build the old libX11 against the new libxcb. Using the live version of libX11 (which is also patched to use the new hand-off mechanism) fixes this problem, but the result is much bigger trouble.

First of all, this is a perfectly good example of what I said about preserve-libs. If you are not using --as-needed, and you had libX11 built with the xcb USE flag enabled, you’ll have libxcb.so.1 linked into almost all X-using binaries on your system; after rebuilding the new libxcb and libX11 (which, in theory, would respectively install libxcb.so.2 and let libX11 link to that), all those binaries will have both the old and the new libxcb in their process space. With different ABIs. And that’s a huge problem in itself.

Then there is the other problem, which is related to the .la files I discussed a few months back. As a huge number of KDE modules (and not only those) linked to Xlib, they also had libxcb-xlib listed in the dependencies of their .la files. Which causes everything to fail to link with libtool, as it goes looking for the now-missing libxcb-xlib.la file.
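Finding the affected files is easy enough; an illustrative check (paths depend on how KDE is installed on your system) would be:

# list installed .la files that still drag in the removed libxcb-xlib
grep -l 'libxcb-xlib' /usr/lib*/*.la /usr/kde/3.5/lib*/*.la 2>/dev/null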

I suppose it’s time to spend some time on a script to fix this situation, but I admit I’m not very motivated at the moment. Especially since my system is pretty slow when it comes to rebuilding stuff for testing, and my employer is not going to pay me anytime soon to allow me to get a newer box.

Once the script is available, it should probably be much, much easier to get rid of .la files in ebuilds, as we could just tell users to run the fixing script and be done with it.

But I admit I was planning on doing some different things in the next days, I had little time for myself lately to begin with, and I’m following way too many things at once. Sigh.

Using Gnulib to improve software portability

This article was originally published on Linux.com.

Many, if not most, free and open source software projects are developed primarily on Linux-based systems using the GNU C Library (glibc). Projects that use glibc are likely to depend on functions that are not available on systems that use different C libraries, such as the different BSD flavors. When packages are built on systems that don’t use glibc they often fail, because the other C libraries are missing functions found in glibc. The GNU Portability Library can help developers with cross-platform programming needs.

In the past there were many different libraries, such as publib, that tried to provide alternatives to the functions that are missing in the main C library. Unfortunately, handling compatibility libraries proved to be difficult. The additional libraries would require additional tests when running configuration scripts prior to compilation, and add dependencies for non-glibc systems.

As the number of new functions provided by the glibc increased, the GNU project started looking at the requirements for portability of programs on operating systems based on different libraries, and eventually created the GNU Portability Library (Gnulib) project.

Normally, a library is code that is compiled as a shared or static file, and then linked into the final executable. Gnulib is a source code library, more similar to a student’s collection of notes than a usual compiled library. Gnulib can’t be compiled separately; the code in it is intended to be copied into the projects using it.

### Using Gnulib

Two requirements limit Gnulib use, one technical and the other legal. First, the software you use it with must also use GNU Autotools, as Gnulib provides tests for replacements of functions and headers written in the M4 language, ready for usage with GNU Autoconf.

Second, it has to be licensed under the GNU General Public License (GPL) or the GNU Lesser General Public License (LGPL), as the code inside Gnulib is mostly (but not entirely) released under the GPL itself. Some of it is also released under the LGPL, and some of it is available as public domain software.

If you’re working on software released under other licenses, such as the BSD license or Apple Public Source License (APSL), it’s better to avoid the use of functions that are not available in a library licensed with more open terms. For example, you could take the functions present in a BSD-licensed C library to replace missing functions in the current library, whichever license the software is using. Alternatively, you could find replacement functions in other BSD-licensed software, or create a “cleanroom” implementation without copying code from GPLed software, leaving the external interface the same but using different code.

It’s usually easy to re-implement functions, or just copy missing functions from another project, when they are not available through another C library, especially when they are simple functions that consist of fewer than 10 lines of code. Unfortunately, many projects depend firmly on GNU extensions and won’t build with replacement functions, or the code is already so complex that adding cases to maintain manually is an extra encumbrance for developers.

What Gnulib provides is not only the source code of the missing functions, but an entire framework to allow a project to depend on GNU extensions, while retaining portability with non-GNU based systems.

The core of the framework is the gnulib-tool script, which is the automated tool for extracting and manipulating source code from Gnulib. Using gnulib-tool, you can see the list of available modules (gnulib-tool --list) or test them (one by one, or all together, using the --test or --megatest options), but more importantly you can automatically maintain the replacement functions for a source tree.

A practical example should help explain the concept. Let’s say that there’s a foofreak package that uses the strndup() function (not available on BSD systems, for instance) and the timegm() function (not available on Solaris). To make the source portable, a developer can run gnulib-tool --import strndup timegm from the source code directory, and the script will copy (into the default directories) the source code and the M4 autoconf tests for strndup(), timegm(), and their dependencies (for example, strndup() depends on strnlen()).

After running, gnulib-tool tells you the few changes you have to make to your code to allow the replacements to be checked for and used when needed. It requires the Makefile in lib/ to be generated by the configure script, so it has to be added to AC_OUTPUT or AC_CONFIG_FILES. At the same time, the lib/ subdirectory has to be added to the SUBDIRS variable in Makefile.am. The M4 tests are not shipped with other packages, so they must be copied into the m4/ directory, and that has to be added as an include directory for aclocal. Finally, two macros have to be called within configure.ac to initialize the checks (gl_EARLY and gl_INIT).
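Condensed, the whole sequence for the (made-up) foofreak example looks something like this; the exact macro names and file lists come from gnulib-tool’s own output, so treat this as a sketch:

cd foofreak
gnulib-tool --import strndup timegm    # copies sources into lib/, tests into m4/

# configure.ac additions:
#   gl_EARLY                           right after AC_PROG_CC
#   gl_INIT                            among the other initialisations
#   AC_CONFIG_FILES([Makefile lib/Makefile])
#
# top-level Makefile.am:
#   ACLOCAL_AMFLAGS = -I m4
#   SUBDIRS = lib .
#
# and link the programs using the replacements against the helper library
# (libgnu by default, see below):
#   foofreak_LDADD = lib/libgnu.a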

You can specify the name of the subdirectories and the prefix of the macros by running gnulib-tool with the parameters --source-base, --m4-base, and --macro-prefix, respectively. It’s also important to note that the replacement functions are built in an auxiliary library called libgnu (by default, but the name can be overridden by using the --lib parameter), so the part of the software using those functions has to be linked against this too.

If later on your project also wants to use the iconv() function, gnulib-tool can detect the currently imported modules and add the required iconv module without rewriting everything from scratch. This makes it simple to add new modules when you use new functions.

The different replacement functions are called “modules” by gnulib-tool, and they consist of some source code, some header files, and an M4 macro test. As some functions depend on the behavior of other functions, the modules depend on one another, so adding a single module can pull in quite a few additional checks and replacements, which make sure that the behavior is safe.

As some modules are licensed under the GPL, while others are licensed under the LGPL, a package licensed under the latter might want to make sure that no GPL modules are pulled in, as that would break the license. To avoid adding GPLed modules, you can use gnulib-tool’s --lgpl option, which forces the use of LGPL modules.

You can also use alternative code to provide a replacement function instead of using the Gnulib modules, and to avoid problems with dependencies. Gnulib-tool has an --avoid option that prevents specified Gnulib modules from being pulled in.

Following the previous example, if foofreak already contains a strnlen() function, used when the system library doesn’t provide one, it would be possible to use that, instead of importing the strnlen module from Gnulib, by issuing the command gnulib-tool --import strndup timegm --avoid strnlen. With this syntax the strnlen module will be ignored and the function already present in foofreak will be used. While this option is provided, it’s usually not advisable to use it unless you really know what you’re doing. A better alternative would be dropping strnlen() from the code where it was used, and using the replacement provided by Gnulib instead.

### Summary

Gnulib is an interesting tool for people working with GPL- or LGPL-licensed software that needs to be portable without dropping the use of GNU extensions, but it has some drawbacks. The major drawback is the license restriction, which requires non-(L)GPL-licensed software to look elsewhere for replacements. It also requires the use of the GNU toolchain with Autotools, as it would be quite difficult to mimic the same tests with something like SCons or Jam.

Finally, the source code sharing between projects breaks one of the basic advantages in the use of libraries: the reuse of the same machine code. When the same function, required by 10 or 20 programs, has to be built inside the executable itself as the system does not provide it, there will be 10 or 20 copies of the same code in memory and on disk, and they may behave in different ways, leading to problems if they are linked inside a library used by third-party software.

Gnulib is worth a try, but you should not use it in critical software or software that might have a limited audience. In those situations, avoid the use of extension functions when possible, and add replacement functions only when they’re actually needed. There’s no point in having a replacement function for something that works on 90% of modern systems and breaks only on obsolete or obscure operating systems or C libraries, especially if the software is written to be run on modern machines.

Best practices for portable patches

This article was originally published on Linux.com.

One of the things I usually take care of as a Gentoo package maintainer is sending patches to upstream developers. If a patch is applied upstream, we can remove it from future versions of a package, so we have less work to do to maintain the package. Unfortunately, it seems that other distributions and packagers don’t always do the same. This is true not only for Linux distributions such as Debian, Fedora Core, and SUSE, but also for maintainers of packages in places like FreeBSD’s Ports, DarwinPorts, or Fink. Here are some tips for developers on making things easier for yourself and everyone who has to touch your code.

When upstream developers are unaware of the problems their software has on platforms they can’t test (perhaps because they use another distribution, another environment, another operating system, or another hardware platform), they can’t fix them, and they are likely to introduce more problems in future versions if they assume that things are good as they are. Letting upstream developers know about the problems, filing bugs, and in general reporting problems is one of the best ways to help an open source project, and it’s something even users with no technical skills can do.

When you have the technical ability to fix a bug, though, you should try to provide the upstream developers with a patch. However, not all patches can be applied unconditionally upstream, and that makes it harder to fix a problem in the short term. Some of the errors people make while preparing a patch and sending it upstream can be fixed in a reasonably simple way, but I still see bad patches in many packaging systems (including Gentoo, on occasion).

The first thing to take into account when writing a patch is that the environment in which you’re working can be different from other environments. By “environment” I mean all the factors that can affect the behaviour of a program, such as the operating system and its version; the distribution, if an operating system has more than one; the drivers used when there is more than one kind; the version of the libraries or of the tools; and so on. One of the most common assumptions developers make while creating patches is that the environment they’re using is the “right” one and everything else should follow it; however, even when the environment used is the “right one,” a patch should always be general enough to be applicable also to “broken” environments.

Using GNU’s Autotools is a good way to allow a C/C++ program to adapt itself to multiple environments. Although they have many problems, and are hated by most of their users, Autotools are currently the best way to handle a multi-platform, adaptable build system. They have facilities to check for headers, libraries, functions, and a lot more. Unfortunately, Autotools are quite difficult to learn, and I’m not going into details on how to use them or how to write macros in Autotools’ M4 language, but here’s a brief overview.

In an Autotools-based project, you usually write a configure.ac script in M4, which defines the checks needed on the host machine (the machine where the build is going to be done); this script is then translated into a shell script by autoconf. You also write Makefile.am files, which are used by automake to create the templates for the makefiles that are created by the configure script after the checks. Usually the configure script can define conditionals for the makefiles and create a config.h file where you can find C preprocessor macros useful for writing conditional code in a C or C++ source file (using #ifdefs).

For instance, you can’t always assume that a given header is there, even if it’s part of a standard defined for Unix systems. Libraries change, including system libraries, and something you’re writing now can change in the future. Try to make sure that all the system and non-conventional headers you use are present before using them. You can usually do that with an AC_CHECK_HEADERS() call in configure.ac and by using the config.h generated on the host machine to check whether they are present. Sometimes you get warnings during configure execution stating that a given header “can be preprocessed but fails to compile”: while this is a warning for now, it is going to be an error in the future, so try to fix it as soon as you can. The fix usually involves adding a couple of needed headers to the prerequisite argument. With this check, for example, you can usually avoid the infamous malloc.h error on FreeBSD systems (however, malloc.h is actually deprecated; you can use stdlib.h in its place without problems).
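A small configure.ac sketch of that pattern; the header list and the prerequisite includes are purely illustrative:

# check for headers before using them; the fourth argument supplies
# prerequisite includes for headers that fail to compile on their own
AC_CHECK_HEADERS([stdlib.h malloc.h sys/param.h], [], [],
                 [[#include <sys/types.h>]])

# then, in the C sources, guard the include with the generated macro:
#   #ifdef HAVE_MALLOC_H
#   #  include <malloc.h>
#   #endif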

System headers aren’t portable, so you should always avoid them unless you need a kernel service you can’t get in any other way. Try instead to get the information or the services you need through other means; that’s usually more portable, even between different versions of the same operating system. It’s not unlikely that different Linux (kernel) versions have incompatible headers that can make some software fail to work.

Libraries, too, can be a problem. Glibc, used by every Linux distribution (apart from the ones for embedded usage, which normally use uclibc or dietlibc), provides not only the base functions a normal C library should provide (for example, to interact with the kernel) but also a complete iconv() implementation, a basic gettext implementation, and a getopt_long() function used to accept long parameters when a program uses getopt to parse the arguments given to it by the user. However, other system libraries, like FreeBSD’s, don’t provide all that; iconv() is provided on FreeBSD systems by GNU libiconv, while gettext is entirely provided by GNU gettext; for getopt_long(), you need another library only on the 4.x series of FreeBSD, on DragonFly BSD, and probably on other non-Linux systems. Autotools provide the AC_CHECK_LIB() macro to check for the presence of a given function in a library, allowing packages to check whether they must link to libiconv, libdl, libintl, and so on.
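As a sketch, using AC_SEARCH_LIBS (a close relative of AC_CHECK_LIB that tries the plain C library first and only adds -l flags when needed); the libgnugetopt name for FreeBSD 4.x is from memory, so double-check it:

# add -lgnugetopt only where getopt_long() isn't in libc
AC_SEARCH_LIBS([getopt_long], [gnugetopt])
# add -ldl only where dlopen() isn't in libc
AC_SEARCH_LIBS([dlopen], [dl])
# AM_ICONV (from the gettext/libiconv macros) handles the libiconv case
AM_ICONV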

The size of basic types, such as integers, can vary among hardware platforms, as can the size of pointers. This problem has been getting bigger lately, as x86_64 systems can now be considered the first 64-bit hardware platform for mainstream desktop users. Having to care about 64-bit cleanness can be a bit tricky for software developed assuming the usual x86 platform, but the fixes are usually easy enough to make in a couple of minutes. One of the most important ways to ensure that variable sizes are right is to use the “standardized integers”, which can be found in the sys/types.h or stdint.h headers. These are renamed integer types, with a name in the form (u)intSIZE_t — so you have int8_t for 8-bit signed integers and uint64_t for 64-bit unsigned integers. Pointers can’t use those fixed-length types, as their size depends on the architecture; they usually have the same length as long integers, but if you can, try ptrdiff_t before using long to store them. Other types, such as size_t and off_t, are defined with the appropriate size on every platform, so they are safe to use as they are. You should never mix fixed-length and “named” types like unsigned int and long, and you should be sure that you’re using the right %-code when passing integers to printf or scanf. You can usually find macros defined that give you the right code for the size, so PRId64 would be the equivalent of %d for 64-bit integers, while PRIu32 would be the equivalent of %u for 32-bit integers.

Assembler code has also been a problem lately. Before the introduction of x86_64 processors, every hardware platform had its own assembler code, so there weren’t too many problems porting it: it was just a matter of having a non-assembler version of the code, which was maybe slower but portable. With the new architecture, instead, you can have assembler code shared between x86 and x86_64 systems, and this is especially true for multimedia applications that make use of extended instructions like MMX, SSE, or 3DNow!. Although the syntax of the assembler is usually the same, there are a few things to think about, such as the size of the registers and the operations run on them. Using “base” operations on 32-bit registers on an x86_64 processor will fail, so you usually have to select which registers to use as operands with conditionals.

The discussion about hardware compatibility is of course longer than this and deserves a book of its own, as there are a lot of other tricky conditions to take care of. Those working on non-x86 architectures already know how to fix whatever problems come up and are likely to provide patches in cases where the software is broken.

Returning to software compatibility issues: you should take into account the possibility of cross-compiling the software. Letting Autotools-based projects build in crosscompile is usually an easy task, if you have all the dependencies already crosscompiled (and, obviously, a working crosscompiler for the target architecture). Unfortunately, some configure scripts are broken by design and fail to work as they should in crosscompile. The most common error is to use the output of the uname command to select the architecture or platform to compile for; this will tell you about the current system, not the platform being compiled for. Instead of using that, you can rely on the ${build} and ${host} variables, which contain CHOST-like strings defining the system you’re compiling on and the one you’re compiling for, respectively (${target} only matters when building compilers and similar tools).

The CHOST-like strings are structured as tuples of either three components (arch-vendor-os) or four (arch-vendor-kernel-libc), although the latter is used only when the operating system has no single “native” libc. The arch part of the string defines the hardware platform used: i386, i686, x86_64, ppc, ppc64, sparc, and so on; the vendor part used to refer to the hardware vendor (for example, ibm for their mainframes) but lately is being used to refer to the software vendor, so you can find there redhat, debian, mandrake, gentoo (note that Gentoo uses it only on uclibc and Gentoo/*BSD systems) and others, but usually it’s just “pc” for x86 and “unknown” for other architectures. The os part is the trickiest one, as it defines the operating system used. Usually it’s “linux-gnu” for GNU/Linux systems, but it can be “linux-uclibc” or just “linux” for other Linux-based systems, or it can be “freebsdX.Y” to refer to a given version of FreeBSD (just “freebsd” is not a valid value). Generally, this value is what you use to check which OS-specific code to enable.
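A minimal configure.ac sketch of the right approach, with an illustrative (not exhaustive) pattern list:

AC_CANONICAL_HOST

# decide on OS-specific code from the canonical host triplet, not from uname
case "${host}" in
    *-*-linux*)   AC_DEFINE([OS_LINUX],   [1], [Building for Linux])   ;;
    *-*-freebsd*) AC_DEFINE([OS_FREEBSD], [1], [Building for FreeBSD]) ;;
    *)            AC_MSG_WARN([untested operating system: ${host}])    ;;
esac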

I hope that this introductory article will inspire patch authors to ensure that the changes they make are not going to break other systems. By using the right combination of Autotools and conditional compilation to check for the right environment where the patch is needed, developers can allow upstream maintainers to fix their software, saving maintainers from having to reinvent the wheel every time something in the “common” environment changes.