For A Parallel World. Case Study n.4: jobserver unavailable

Here comes another case study on fixing parallel make issues. This time I’m going to talk about an issue that does not cause the build to abort, but that forces serial make even when parallel make is requested.

If you look closely at the build messages coming out of various packages, you might notice from time to time the warning “jobserver unavailable” coming from make. When that warning is printed, it means that GNU make cannot run that part of the build in parallel, since the sub-make has no way to coordinate its jobs with the parent. For instance, this comes from the build of xfsprogs:

flame@yamato xfsprogs-2.10.1 % make -j16
=== include ===
gmake[1]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.

I have to say that GNU make here is very nice with its messages: it does not simply say that the jobserver is unavailable, it also tells you that it is going to use -j1 and that you should add a plus sign to the “parent make rule”. But I guess most people wouldn’t know how to deal with this. Let’s look deeper.

The build system of xfsprogs is based on autoconf and libtool, but it’s custom made (which by itself caused me quite a few headaches in the past, and which I still loathe). It is also recursive, just like automake-based build systems, but how does it recurse? The main Makefile contains this:

default: $(CONFIGURE)
ifeq ($(HAVE_BUILDDEFS), no)
        $(MAKE) -C . $@
else
        $(SUBDIRS_MAKERULE)
endif

To find SUBDIRS_MAKERULE we have to dig a lot deeper; we finally find it in include/buildmacros:

SUBDIRS_MAKERULE = \
        @for d in $(SUBDIRS) ""; do \
                if test -d "$$d" -a ! -z "$$d"; then \
                        $(ECHO) === $$d ===; \
                        $(MAKEF) -C $$d $@ || exit $$?; \
                fi; \
        done

So it’s serialising the build of the subdirectories; what is the problem here? The problem is that GNU make, to implement parallel builds, needs special options and file descriptors to be passed down to the sub-make calls. This happens automatically when the recipe line itself invokes the sub-make through $(MAKE), but if the call is hidden behind other variables, as with $(SUBDIRS_MAKERULE) and $(MAKEF) here, it does not happen automatically, and the developer has to tell GNU make to actually pass them along.
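The difference between the two cases can be sketched with a tiny stand-alone Makefile. Everything in it is hypothetical (the `sub` directory, the `SUBMAKE` variable and the target names are made up for illustration), and whether the warning actually appears depends on the GNU make version in use.

```make
# Hypothetical top-level Makefile; "sub" is assumed to be a directory
# with its own Makefile. Recipe lines start with a tab.
SUBMAKE = $(MAKE) -C sub

# $(MAKE) appears literally in the recipe line, so GNU make treats it as a
# recursive call and hands the jobserver down to the sub-make.
direct:
	$(MAKE) -C sub

# Here the recipe line only contains $(SUBMAKE); the call to make is hidden
# inside another variable, which is the situation that can produce the
# "jobserver unavailable" warning under "make -jN indirect".
indirect:
	$(SUBMAKE)

.PHONY: direct indirect
```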

Now the only problem is to identify which rule you should add the + to, but this is very simple here, since the rule already has an @ symbol at its start: just make it @+ and it’s done. A very big problem can arise if the rule executes something other than make together with make (and something more than just test): a line marked with + is also executed under make -n, so stuff might break hugely.
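Applied to the macro shown earlier, the change amounts to a single character. The following is a sketch of the edited definition, assuming the rest of include/buildmacros is left untouched and that $(MAKEF) ends up invoking make, which the file suggests but which is not shown here.

```make
# Sketch only: SUBDIRS_MAKERULE from include/buildmacros with the "+" prefix
# added after the existing "@". Nothing else is changed.
SUBDIRS_MAKERULE = \
	@+for d in $(SUBDIRS) ""; do \
		if test -d "$$d" -a ! -z "$$d"; then \
			$(ECHO) === $$d ===; \
			$(MAKEF) -C $$d $@ || exit $$?; \
		fi; \
	done
```

The prefix characters are read after the variable is expanded, which is why the @ already worked from inside the macro and why the + can live there as well.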

At any rate, after you actually change this rule (as well as the SOURCE_MAKERULE one), xfsprogs can finally build in parallel, taking much less time than it otherwise would. Cool, isn’t it?

About buildsystems and upstreams

Donnie correctly commented that my earlier proposal is not really a solution that can be proposed upstream. It’s true, it isn’t really upstreamable at all. But I don’t think that’s a problem on its own.

Most of these issues are likely to be present in software that is no longer actively developed, for which reaching upstream is nearly impossible. Others are caused by software that simply doesn’t have a build system at all and just ships the .c files (like the piechart tool I found the other day); that is probably unfixable too, since upstream decided not to provide a build system in the first place.

But even if we were to decide to actually go this route for the packages, it doesn’t stop us from contacting upstream and proposing that they fix their build system, either by using autotools for the more complex stuff or by providing a simple sample Makefile that works by our standards for the small stuff. Patching the Makefile in the distribution, though, is likely a waste of time.

Unfortunately, as I wrote recently, even the best coder can write a stupid build system, and that is very true. And some of them, as the Ragel author demonstrated lately, refuse to use automake at all, even when they fail to provide the basic functionality a distribution needs to package their software. The reasoning still baffles me, by the way.

So yeah, I think this is a point that really needs to be faced with an open mind, and a knife between your teeth to use against the most clueless of upstreams!

For A Parallel World. Case Study n.3: temporary files naming

Today I wish to analyse the failure in media-gfx/sam2p that I reported, a far less common problem than the last two I have written about. I have found similar problems before, so I think it’s another case worth talking about, although the fix is very quick.

The failure in question would be this one:

Created executable file: ps_tiny (size: 47530).
ps_tiny: error at 1.2.1: tag %

The “premature EOF” error message usually means a file is truncated. With experience, you can tell this is a race condition: either the same broken rule run more than once, or two different rules, are creating and deleting the same file, and one of them gets to it after the other has already removed or rewritten it.

In this case, looking at the original Makefile, it’s not a single broken rule but several rules stepping on each other:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@

I didn’t copy over all the rules, but this already shows the problem. All the rules, while not exactly identical (the flags passed to the preprocessor differ depending on the target), follow the same steps and use the same temporary file names. The result is that while one rule runs, the others may run too and overwrite its files, creating the race condition.

For Gentoo I fixed it in a slightly sub-optimal way, changing all the references to tmp. into $@.tmp. This is not exactly the nicest way: the correct way would have been to create separate rules that generate the various temporary stages, so that they could be executed in parallel as much as possible rather than only sequentially. But as I see very little room for parallelism here, and the build system is a bit of a mess, I thought it was much easier to leave it at that. The result is that the rules above become:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
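For reference, the split into one rule per stage, described above as the more correct approach, might look roughly like this for the first target. It is an untested sketch: the l1g8z.* intermediate file names are invented for the example, and attaching psmlib.psm to the preprocessing stage is an assumption about where that file is actually used.

```make
# Untested sketch: one rule per stage, for the l1g8z.pst target only.
# Every intermediate file is private to this target, so nothing is shared
# with the l1ghz.pst and l1gbz.pst rules.
l1g8z.h: l1zip.psm
	<$< >$@ perl -pe0

# Assumption: psmlib.psm is pulled in while preprocessing.
l1g8z.i: l1g8z.h psmlib.psm
	$(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 $< >$@

l1g8z.pin: l1g8z.i
	<$< >$@ $(PREPROC_STRIP)

l1g8z.ps0: l1g8z.pin ps_tiny
	<$< >$@ ./ps_tiny

# Written to a temporary name first and then renamed, as in the original.
l1g8z.pst: l1g8z.ps0
	<$< >$@.tmp $(TTT_QUOTE) $@
	mv -f $@.tmp $@
```

Since each stage feeds the next, a single target gains nothing from this; the benefit would only be that the three targets can progress side by side without touching each other’s files.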

The alternative using pipes, for the first rule, would probably be something like:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        perl -pe0 < $< | \
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 | \
        $(PREPROC_STRIP) | \
        ./ps_tiny | \
        $(TTT_QUOTE) $@ > $@

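I haven’t changed it into this because I didn’t have much time to look into how much difference it makes, or to test it; I’ve added it to my TODO list as a possible future improvement. As written, the compiler stage would also need to be told to read from standard input (with GCC, something like -x c++ - at the end of the command).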

In general, for parallel make, pipes should be preferred to temporary files, and if temporary files are needed, they should have a different name for each target, so that they won’t overwrite one another when make is run in parallel.

For A Parallel World. Case Study n.2: misknowing your make rules

Here comes another case study about parallel make failures and fixes. This time I’m going to write about a much less common, and more difficult to understand, type of failure. I have spotted and fixed this failure in gtk# (yes I have it installed).

Let’s see the failure to begin with:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.FileNotFoundException: Could not find file "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll".
File name: "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll"
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenRead (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.6.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.8.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenWrite (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Okay, so there are some failures when calling “alink”; in particular it reports “sharing violations”. I suppose the name of the error is inherited from the original .NET, as “sharing violation” is what Windows reports when two applications try to write to the same file at once, or when one tries to write to a file that is locked by someone else.

But I want to put some emphasis on something in particular:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
[...]
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
[...]
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
[...]
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

As you can see each policy is reportedly created thrice. If you, like me, know what to look for in a parallel make failure, you’ll also notice that there are three policies being created there. This is quite important and interesting, as it already suggests to an experienced eye what the problem is, but let’s go on step by step.

Once again, we know the software is built with automake so you don’t expect parallel make failures, not from the default rules at least. But C#/Mono is not one of the languages that automake supports out of the box. Which means that almost surely there are custom rules involved.

As they are using custom rules rather than automake’s own, the problem requires knowledge of GNU make (or any other make, but let’s assume GNU for now; it’s the most common in Free Software after all, for good or bad).

Let’s look for the “Creating” line in the Makefile.am file:

$(POLICY_ASSEMBLIES): $(top_builddir)/policy.config gtk-sharp.snk
        @for i in $(POLICY_VERSIONS); do \
          echo "Creating policy.$$i.$(ASSEMBLY)"; \
          sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config; \
          $(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk; \
        done

If you have had to deal with a similar failure before (as I did), you already knew what you were going to find in that rule: I’m referring to the for loop. It’s a common mistake for people who don’t know make well enough to write a rule like this. They expect that declaring multiple targets in a rule means, for make, “build all of these with a single command”, while it actually means “for any of these files, use this command to generate it”.

The result is that, as you need three different files, make will launch that code three times in parallel. This not only wastes a huge amount of time but also fails, as the three instances might try to access the same resource at once (as is happening here).
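The same behaviour can be reproduced outside gtk# with a small stand-alone Makefile. This is a hypothetical sketch (the file names and the OUTPUTS variable are made up), meant only to show the shape of the mistake.

```make
# Hypothetical stand-alone Makefile, not taken from gtk#.
# A rule with several explicit targets is shorthand for one rule per target,
# so on a clean tree "make -j3 all" may start the recipe up to three times,
# once for each missing file, and the copies then write the same files.
OUTPUTS = a.out b.out c.out

all: $(OUTPUTS)

$(OUTPUTS):
	@echo "recipe started for $@"
	@for f in $(OUTPUTS); do echo data > $$f; done

.PHONY: all
```

With a serial make the first run of the recipe already creates all three files, so the others are usually found up to date, which is why this kind of mistake tends to stay hidden until someone builds in parallel.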

The solution for this kind of problem is not really obvious, as it often requires rewriting the rules entirely. My usual way of thinking about it is that whoever wrote the rule didn’t know make well enough and made a mistake, and it’s easier to just rewrite the rule.

Let’s decompose the rule then. Ignoring the for loop and the echo line, what we have is these two commands:

sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config
$(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk

Each of these two commands creates a different file: one is intermediate, the policy configuration, and the other is the final one. This again shows a lack of understanding of how make is supposed to work; it is a very common one, so I’m not blaming the developer here, make is a strange language. So there are two dependent steps involved: the final requested result is the policy file, but to generate that you need the policy configuration.

Let’s start with the policy configuration then. The actual generation command is a simple sed call that takes the generic configuration and sets the assembly name and policy version in it. The problem here is obviously to replace the use of $$i (the variable used in the for loop) with the actual policy version. Just so we’re clear, the policy version is the 2.4, 2.6 and 2.8 string we have seen before. Luckily this is a pretty common task for software like make, and there is a construct that comes to our help: pattern rules.

The name of the generated file is always in the format policy.$VERSION.config, and we need to know the $VERSION part to use it in sed. Nothing is better suited for this than a pattern rule. Let’s replace the variable section of the filename with the magic symbol %; make will take care of matching it as needed, and will also provide us with a special variable in the rule, $*, that takes the value the % matched (the stem). The rule then becomes this:

policy.%.config: $(top_builddir)/policy.config
    sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$*/" $(top_builddir)/policy.config > $@

And here we’ve created our policy configuration files in a parallel-build-friendly way: as none of them depends on the others, the three sed commands can easily be executed in parallel.

Now it’s time to create the actual policy assembly. For this we’re going to use a static pattern rule, making the best use of the fact that you can also declare dependencies based on the pattern.

Instead of a simple two-entry rule, this is going to be a three-entry rule: the first entry defines the list of targets that this rule applies to, which is the same as it was before ($(POLICY_ASSEMBLIES)); the second and third are the target pattern and the dependency patterns.

While the original rule depended directly on the generic policy config, this one will only depend on the actual final config, as the rule we just wrote for the configuration files will take care of it. So the final rule to generate the wanted assembly will be:

$(POLICY_ASSEMBLIES) : policy.%.$(ASSEMBLY): policy.%.config gtk-sharp.snk
    $(AL) -link:policy.$*.config -out:$@ -keyfile:gtk-sharp.snk

At this point, the same just has to be applied to all the involved Makefile.am files in the package, like I did in the patch I submitted, and the package becomes totally parallel-build friendly.

There is another nice addition to this: you’re trading one complex, difficult to read and broken rule for two one-liner rules, which makes the code much more readable and understandable if you’re looking for a mistake.
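Put together, the two rules could sit in a Makefile.am roughly as in the sketch below. The variable values are assumptions inferred from the file names in the build log (policy.2.4.glib-sharp.dll and so on), and the way POLICY_ASSEMBLIES is derived here is only one possible definition, not necessarily the one gtk# uses. AL and top_builddir are expected to come from configure and automake, as in the original file.

```make
# Sketch only; values are assumptions based on the names seen in the log.
ASSEMBLY_NAME = glib-sharp
ASSEMBLY = $(ASSEMBLY_NAME).dll
POLICY_VERSIONS = 2.4 2.6 2.8

# Expands to policy.2.4.glib-sharp.dll policy.2.6.glib-sharp.dll ...
POLICY_ASSEMBLIES = $(addsuffix .$(ASSEMBLY), $(addprefix policy., $(POLICY_VERSIONS)))

# Pattern rule: for policy.2.4.config the stem $* is "2.4".
policy.%.config: $(top_builddir)/policy.config
	sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$*/" $(top_builddir)/policy.config > $@

# Static pattern rule: each listed assembly is matched against the target
# pattern, and the same stem is substituted into the prerequisite pattern,
# so policy.2.4.glib-sharp.dll depends on policy.2.4.config.
$(POLICY_ASSEMBLIES): policy.%.$(ASSEMBLY): policy.%.config gtk-sharp.snk
	$(AL) -link:policy.$*.config -out:$@ -keyfile:gtk-sharp.snk
```

Each assembly now has its own chain of two small steps, so make -j can run the three chains side by side without any of them touching another’s files.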