More tinderbox notes, just to say

To complete the topic I started with the previous post, I would like to give you some more notes about the way the tinderbox works, in particular about the manual fiddling I have to do to keep it running smoothly, and about the issues that haven’t been tackled yet.

As I said, the tinderbox is a Linux Container; this helps isolate the test environment from my workstation: killall and other unbounded commands are never going to hit my system, which is good. On the other hand, it still has a couple of rough patches to go through. Even with the latest (unreleased) version of lxc, /dev is either statically created or bound: udev does not work properly inside the container, for somewhat obvious reasons. The problem is that if you bind the /dev directory (or mount devtmpfs, which is basically the same thing with a recent kernel), then host and container share a single directory where FIFOs and sockets are created.

This not only causes sysvinit to shut down the host instead of the container if you use the shutdown command, but also makes it impossible to have a running, working syslog within the container. This shouldn’t hinder the tinderbox’s work, but it seems like it does.

Another problem is something all users have to fight with every time: incompatible language updates for Python, Perl, OCaml, Haskell, you name it. Almost all of these languages come with an “updater” script that is intended to rebuild their extensions and libraries so that they are compatible with the new release; failing to run these scripts introduces quite a few failures in the tinderbox that are, obviously, spurious. The same goes for lafilefixer. I’ll probably have to write a maintenance script to improve the flow, so that I don’t forget steps.

Yamato’s hardware is designed to work in parallel (as Mike also found out, the sweet spot for builds seems to be twice the number of cores); so another problem for the tinderbox is that it merges everything sequentially, and making that parallel is quite hard because of the interdependencies between packages. So to speed things up, the build process itself has to be parallel-safe, which, as you probably know, it often is not; that is one of the reasons why I so often fix packages.

One pretty bad example of time wasted because of serial make runs is boost: the merge took almost four hours yesterday, because the tests are built and executed in series. Instead of building all the test binaries first and then executing them in series (which is a good compromise if you cannot run them in parallel), it builds one test, runs it, and only then moves on to the next; the result is obviously pretty bad on my system.

Quite a few times, by the way, the whole situation is exacerbated by the fact that the build failures were already reported, often by me, last year. Yep, we have year-old build failures in the tree that hit users. And guess what? At least a couple of times the proposed solution was “use an overlay”. No, the right solution is not to let software bitrot in the tree!

Anyway, thanks to Genone, who sent me the patch for better collision diagnostics, and thanks to Mauro, who’s working on new bashrcng plugins for the QA tests. Hopefully, some of the tests will also find their way into Portage soon; and again, I’d suggest you consider contributing somehow (if you cannot contribute code or fixes): dealing with the tinderbox might not be extremely difficult, but it sure is time-consuming, and time, well, is money…

For A Parallel World. Case Study n.4: jobserver unavailable

Here comes another case study of fixing parallel make issues. In this case, I’m going to talk about an issue that does not cause the build to abort, but that forces serial make even when parallel make is requested.

If you look closely at the build messages coming out of various packages, you might notice from time to time the warning “jobserver unavailable” coming from make. When that warning is printed, it means that GNU make is unable to properly handle the parallel build, since the sub-make cannot reach the jobserver that coordinates the jobs. For instance, this comes from the build of xfsprogs:

flame@yamato xfsprogs-2.10.1 % make -j16
=== include ===
gmake[1]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.

I have to say that GNU make here is very nice with its messages: it does not simply say that the jobserver is unavailable, it also tells you that it is going to use -j1 and that you should add a plus sign to the “parent make rule”. But I guess most people wouldn’t know how to deal with this. Let’s look deeper.

The build system of xfsprogs is based on autoconf and libtool, but it’s custom made (which by itself caused me quite a few headaches in the past, and which I still loathe). It is also recursive, just like automake-based build systems, but how does it recurse? The main Makefile contains this:

default: $(CONFIGURE)
ifeq ($(HAVE_BUILDDEFS), no)
        $(MAKE) -C . $@
else
        $(SUBDIRS_MAKERULE)
endif

To find SUBDIRS_MAKERULE we have to dig a lot deeper; it is finally found in include/buildmacros:

SUBDIRS_MAKERULE = \
        @for d in $(SUBDIRS) ""; do \
                if test -d "$$d" -a ! -z "$$d"; then \
                        $(ECHO) === $$d ===; \
                        $(MAKEF) -C $$d $@ || exit $$?; \
                fi; \
        done

So it’s serialising the build of the subdirectories; what is the problem here? The problem is that GNU make, to implement parallel builds, needs to pass special options and file descriptors down to the sub-make calls. This happens automatically when the recipe line references $(MAKE) directly, but when the call is indirected through another variable (MAKEF here), make does not recognise it as a recursive call, and the developer has to tell GNU make explicitly to pass the options along.

Now the only problem is to identify which rule you should add the + to, but this is very simple here: the rule already has a @ symbol at its start, so just make it @+ and it’ll be done. A much bigger problem can arise if the rule runs something other than make alongside make (and something more than just test), since the + prefix makes the whole command line run even under make -n, and things might then break badly.
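As a sketch, assuming the macro in include/buildmacros is written with the usual backslash continuations and that MAKEF ultimately expands to a $(MAKE) invocation (neither is shown in full in the excerpts above), the change amounts to a single character:

```makefile
# include/buildmacros (sketch): the only change is "@" becoming "@+".
# "@" still keeps make from echoing the command; "+" marks the line as a
# recursive make call, so GNU make passes the jobserver to the sub-makes.
SUBDIRS_MAKERULE = \
	@+for d in $(SUBDIRS) ""; do \
		if test -d "$$d" -a ! -z "$$d"; then \
			$(ECHO) === $$d ===; \
			$(MAKEF) -C $$d $@ || exit $$?; \
		fi; \
	done
```

The subdirectories are still visited one after the other by the shell loop; what changes is that each sub-make can now run its own targets in parallel within the requested job limit.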

At any rate, after you actually change this rule (as well as the SOURCE_MAKERULE one), xfsprogs can finally build in parallel, taking much less time than it otherwise would. Cool, isn’t it?

For A Parallel World. Case Study n.3: temporary files naming

Today I wish to analyse a far less common problem than the last two I have written about: the failure in media-gfx/sam2p that I reported. I have found similar problems before, so I think it’s another case worth talking about, although the fix is very quick.

The failure in question would be this one:

Created executable file: ps_tiny (size: 47530).
ps_tiny: error at 1.2.1: tag %

The “premature EOF” error message usually means that a file is truncated. With experience, you can tell this is a race condition: either the same broken rule is being run more than once, or two different rules are creating and deleting the same file, and one of them gets to the file after it has already been deleted.

In this case, looking at the original Makefile, it’s not the same broken rule:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@

I didn’t copy over all the rules, but this already shows the problem. All the rules, while not exactly identical (the flags passed to the preprocessor differ depending on the target), follow the same steps and use the same temporary file names. The result is that with parallel make the rules run at the same time and overwrite each other’s files, creating the race condition.

For Gentoo I fixed it in a slightly sub-optimal way, changing all the references to tmp. into $@.tmp. so that each target gets its own temporary files. This is not exactly the nicest way: the correct way would have been to create different rules that generate the various temporary stages, so that they could be executed in parallel as much as possible rather than only sequentially. But as I see very little room for parallelism here, and the build system is a bit of a mess, I thought it was much easier to leave it at that. The result is that the rules above become:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@

The alternative using pipes, for the first rule, would probably be something like:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        perl -pe0 < $< | \
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 | \
        $(PREPROC_STRIP) | \
        ./ps_tiny | \
        $(TTT_QUOTE) $@ > $@

I haven’t changed it into this because I didn’t have much time to look into how much difference it makes, or to test it; I’ve put it on my TODO list for the future, as a possible improvement.

In general, for parallel make, pipes should be preferred to temporary files, and if temporary files are needed, they should have different names for each target, so that they won’t overwrite one another when make is run in parallel.
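For comparison, the “one rule per temporary stage” approach mentioned earlier might be sketched with GNU make pattern rules roughly as follows. This is untested against sam2p and rests on several assumptions: the L1_DEFINE_* variables and the .new suffix are invented for the example, psmlib.psm is assumed to be needed only by the preprocessing step, and any other .pst targets in the real Makefile would need their own handling.

```makefile
# Sketch only: per-target preprocessor switches, looked up through the stem.
# The L1_DEFINE_* names are made up for this example.
L1_DEFINE_l1g8z = -DUSE_A85D=1
L1_DEFINE_l1ghz = -DUSE_HEXD=1
L1_DEFINE_l1gbz = -DUSE_BINARY=1

# Stage 1: pass the source through perl, as the original rules do.
%.tmp.h: l1zip.psm
	perl -pe0 < $< > $@

# Stage 2: preprocess; $* is the stem (l1g8z, l1ghz or l1gbz).
# psmlib.psm is listed here on the assumption that l1zip.psm includes it.
%.tmp.i: %.tmp.h psmlib.psm
	$(CXX) -E $(L1_FLAGS) $(L1_DEFINE_$*) $< > $@

# Stages 3 and 4: strip the preprocessor output, then run ps_tiny on it.
%.tmp.pin: %.tmp.i
	$(PREPROC_STRIP) < $< > $@

%.tmp.ps0: %.tmp.pin ps_tiny
	./ps_tiny < $< > $@

# Final stage: write to a scratch name and rename it, as the original does.
%.pst: %.tmp.ps0
	$(TTT_QUOTE) $@ < $< > $@.new
	mv -f $@.new $@
```

Every file name now carries the stem of the target it belongs to, so the three chains cannot step on each other, and make is free to interleave their stages.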

For A Parallel World. Case Study n.2: misknowing your make rules

Here comes another case study about parallel make failures and fixes. This time I’m going to write about a much less common, and more difficult to understand, type of failure. I have spotted and fixed this failure in gtk# (yes I have it installed).

Let’s see the failure to begin with:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.FileNotFoundException: Could not find file "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll".
File name: "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll"
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenRead (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.6.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.8.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenWrite (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Okay, so there are some failures while calling “alink”; in particular it reports “sharing violations”. I suppose the name of the error is derived from the original .NET, as “sharing violation” is what Windows reports when two applications try to write to the same file at once, or one tries to write to a file that is locked by someone else.

But I want to put some emphasis on something in particular:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
[...]
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
[...]
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
[...]
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

As you can see each policy is reportedly created thrice. If you, like me, know what to look for in a parallel make failure, you’ll also notice that there are three policies being created there. This is quite important and interesting, as it already suggests to an experienced eye what the problem is, but let’s go on step by step.

Once again, we know the software is built with automake so you don’t expect parallel make failures, not from the default rules at least. But C#/Mono is not one of the languages that automake supports out of the box. Which means that almost surely there are custom rules involved.

As they are using custom rules rather than automake’s own, the problem requires knowledge of GNU make (or any other make, but let’s assume GNU for now; it’s the most common in Free Software after all, for good or bad).

Let’s look for the “Creating” line in the Makefile.am file:

$(POLICY_ASSEMBLIES): $(top_builddir)/policy.config gtk-sharp.snk
        @for i in $(POLICY_VERSIONS); do \
          echo "Creating policy.$$i.$(ASSEMBLY)"; \
          sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config; \
          $(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk; \
        done

If you have had to deal with a similar failure before (as I did), you already knew what you were going to find in that rule: I’m referring to the for loop. It’s a common mistake for people who don’t know make well enough to write a rule like this. They expect that declaring multiple targets in a rule means, for make, “build all of these with a single command”, while it actually means “for any of these files, use this command to generate it”.

The result is that, as you’re going to need three different files, make will launch that code three times in parallel. This not only wastes a huge amount of time but also fails, as the three instances might try to access the same resource at once (as is happening here).
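A minimal sketch of the same misunderstanding, with hypothetical file names (a.txt, b.txt, c.txt and input.txt are not from gtk#), may make the difference easier to see:

```makefile
# Hypothetical file names, only to show how make reads a multi-target rule.
OUTPUTS = a.txt b.txt c.txt

all: $(OUTPUTS)

# What the author of such a rule usually means: "this recipe makes all three".
$(OUTPUTS): input.txt
	@for f in $(OUTPUTS); do echo "Creating $$f"; cp input.txt $$f; done

# What make actually understands: three independent rules sharing one recipe,
# as if it had been written out as
#   a.txt: input.txt ; <recipe>
#   b.txt: input.txt ; <recipe>
#   c.txt: input.txt ; <recipe>
# With -j3 and none of the files present, the loop can be started three times
# at once, and each copy writes all three files.
```

In a serial build the first run of the loop already produces all the outputs, so the later targets are found up to date, which is why this kind of rule tends to go unnoticed until somebody builds with -j.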

The solution for this kind of problem is not really obvious, as it often requires rewriting the rules entirely. My usual way of thinking about the problem is that whoever wrote the rule didn’t know make well enough and made a mistake, and that it’s easier to just rewrite the rule.

Let’s decompose the rule then. Ignoring the for loop and the echo line, what we have is these two commands:

sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config
$(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk

Each of these two commands creates a different file: one is intermediate, the policy configuration; the other is the final one. This again shows a lack of understanding of how make is supposed to work; again it is a very common one, so I’m not blaming the developer here: make is a strange language. So there are two dependent steps involved: the final requested result is the policy file, but to generate that you need the policy configuration.

Let’s start with the policy configuration then. The actual generation command is a simple sed call that takes the generic configuration and sets the assembly name and policy version in it. The problem here is obviously to replace the use of $$i (the variable used in the for loop) with the actual policy version. Just so we’re clear, the policy version is the 2.4, 2.6 and 2.8 string we have seen before. Luckily this is a pretty common task for software like make, and there is a construct that comes to our help: pattern rules.

The name of the generated file is always in the format policy.$VERSION.config, and we need to know the $VERSION part to use it in sed. Nothing is better suited to this than a pattern rule. Let’s replace the variable section of the filename with the magic symbol %; make will take care of matching it as needed, and will also provide us with a special variable in the rule, $*, that takes the value matched by % (the stem). The rule then becomes this:

policy.%.config: $(top_builddir)/policy.config
    sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$*/" $(top_builddir)/policy.config > $@

And here we’ve created our policy configuration files in a parallel-build-friendly way: as none of them depends on the others, the three sed commands can easily be executed in parallel.

Now it’s time to create the actual policy assembly. Here we’re going to use a static pattern rule, which applies a pattern to an explicit list of targets, making the best use of the fact that you can also declare dependencies based on the pattern.

Instead of a simple two-entry rule, this is going to be a three-entry rule: the first entry defines the list of targets that this rule applies to, which is the same as it was before ($(POLICY_ASSEMBLIES)); the second and third are the usual ones, defining the target pattern and its dependencies.

While the original rule depended directly on the generic policy config, this one will only depend on the actual final config, as the rule we just wrote for the configuration files will take care of it. So the final rule to generate the wanted assembly will be:

$(POLICY_ASSEMBLIES) : policy.%.$(ASSEMBLY): policy.%.config gtk-sharp.snk
    $(AL) -link:policy.$*.config -out:$@ -keyfile:gtk-sharp.snk

At this point, the same just has to be applied to all the involved Makefile.am files in the package, like I did in the patch I submitted, and the package becomes totally parallel-build friendly.

There is another nice side effect: you’re trading one complex, difficult to read and broken rule for two one-liner rules, which makes the code much more readable and understandable if you’re looking for a mistake.
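To see how the stem is matched without needing sed or the Mono tools, the two rules can be tried in a toy Makefile like the one below. The variable values are assumptions (in particular, how gtk# really defines POLICY_ASSEMBLIES is not shown above), and echo and touch stand in for the real commands:

```makefile
# Toy version: echo/touch stand in for sed and $(AL); values are assumptions.
POLICY_VERSIONS   = 2.4 2.6 2.8
ASSEMBLY          = glib-sharp.dll
POLICY_ASSEMBLIES = $(foreach v,$(POLICY_VERSIONS),policy.$(v).$(ASSEMBLY))

all: $(POLICY_ASSEMBLIES)

# Pattern rule: for policy.2.4.config the stem $* is "2.4".
policy.%.config:
	@echo "config   stem=$* target=$@"
	@touch $@

# Static pattern rule: only the files listed in POLICY_ASSEMBLIES are matched
# against the target pattern; the same stem then names the prerequisite.
$(POLICY_ASSEMBLIES): policy.%.$(ASSEMBLY): policy.%.config
	@echo "assembly stem=$* target=$@ from $<"
	@touch $@
```

Running it with make -j3 in an empty directory should show each configuration and each assembly being created once, in whatever order make schedules them.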

For A Parallel World. Case Study n.1: automake variables misuse

Following my post about parallel builds, I started today to tackle some issues with packages not building properly with parallel make. Most of them end up being quite easy to fix; some of them don’t have to be fixed at all, and just need the -j1 dropped from the ebuild because they already build fine (this is usually due to an older version failing and the ebuild never being revisited).

As I haven’t yet been able to find the time and energy to go back to writing full-fledged guides (the caffeine starvation doesn’t help), I decided to start writing some “case studies”. What I mean is that I’ll try to blog about some common problems I found in a particular package, and show the process of fixing them. Hopefully, this way it’ll be easier for others to fix similar problems in the future. This also goes toward the goal of showing more of what Yamato does (by the way, once again thanks to everybody who contributed, and you are all still able to chip in if you want to help me).

The first case study in the list is for libbtctl (which I think is deprecated, from what I can understand of its author’s comment).

When building with -j8 (and dropping the ebuild serialisation), the build will fail with an error similar to this:

libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR="/usr/share/libbtctl" -DGETTEXT_PACKAGE="libbtctl" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-pymodule.lo -MD -MP -MF .deps/btctl-pymodule.Tpo -c btctl-pymodule.c -o btctl-pymodule.o >/dev/null 2>&1
libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR="/usr/share/libbtctl" -DGETTEXT_PACKAGE="libbtctl" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-py.lo -MD -MP -MF .deps/btctl-py.Tpo -c btctl-py.c -o btctl-py.o >/dev/null 2>&1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btlist] Error 1
make[3]: *** Waiting for unfinished jobs....
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-async-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-discovery-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btsignal-watch] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

It’s an easy error to understand: it cannot find libbtctl.la, piece of cake. It’s more of a problem to find the cause if you don’t know it beforehand.

The first thing to note here is that the build system used is standard autotools; standard autotools, if used with their built-in rules, are not subject to parallel make failures. They don’t build directories in parallel, but they do the rest with as much parallelism as they can. This means that the package is either using a custom rule, or has misused autotools.

Another common cause of “cannot find the library” errors with libtool is that the library is in a different directory, and the order of subdirectories is wrong. This rarely creeps into the distributed tarball if upstream is smart enough to run make distcheck, or at least to build their own tarballs, but you never know; usually you find this while trying to change the way interdependent libraries link against each other so that they can be built with --as-needed.

But there’s a tell-tale sign in the message: the library is not prefixed with any path, so it’s not being built in a different directory but in the same one. This makes it very suspicious.

The first error comes from btlist, so let’s extract the source tarball, and look in src/Makefile.am (because that’s the most likely directory where it is defined, we could have grepped but it’s easier this way):

noinst_PROGRAMS=btlist [...]

[...]

btlist_LDFLAGS = \
        libbtctl.la  $(BTCTL_LIBS) \
        $(BLUETOOTH_LIBS) $(OPENOBEX_LIBS)

What do you know? This is the only property defined for the btlist target, and indeed it doesn’t look right: the LDFLAGS variable should be used to pass flags to the linker (like -Wl,--as-needed), not the names of libraries. Even worse, these are names of libraries that have to be built as prerequisites for the target.

Edit: Rémi pointed out that I didn’t give the actual solution here, for those who don’t know automake so well. The correct variable to pass the libraries in is either LIBADD (for other libraries) or LDADD (for final executables). As btlist is in PROGRAMS, the latter is what we need to use.
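A sketch of what the corrected fragment of src/Makefile.am might look like, assuming the list of libraries is otherwise left unchanged (the actual upstream fix may differ):

```makefile
# src/Makefile.am (sketch): same contents, moved from LDFLAGS to LDADD.
# automake derives the program's dependencies from LDADD, so make learns
# that libbtctl.la has to be built before btlist is linked.
btlist_LDADD = \
	libbtctl.la $(BTCTL_LIBS) \
	$(BLUETOOTH_LIBS) $(OPENOBEX_LIBS)
```

The libraries listed in LDFLAGS were passed to the linker all the same, but make never saw them as prerequisites, which is why the link step could start before libbtctl.la existed.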

And obviously the same mistake is repeated for almost every target in the Makefile.am. But luckily there’s a very active upstream, and the bug can be solved the same day it is reported.

It’s not so difficult once you see how to do it, is it?