Parallel emerge versus parallel make

Since I now have a true SMP system, it’s obvious that I’m expected to run parallel make to make use of it. Indeed, I set my make.conf to use -j8, where I was using -j1 before. This has a few problems in general, and it’s going to take some more work before it’s properly supported.

But before I get to those problems, I’d like to give a public, extended answer to a friend of mine who asked me earlier today why I’m not using Portage 2.2’s parallel emerge feature.

Well, first of all, parallel emerge is helpful on SMP systems during a first install, a world rebuild (which is actually what I’m doing now) or a long update after some time spent offline; it is of little help when doing daily upgrades, or when installing a new package.

The reason is that you can only merge in parallel packages that are independent of each other. This is not easy to ensure, and to avoid breaking stuff I’m sure Portage takes the safe route and serialises rather than risking brokenness. But even that expects the dependency tree to be complete, and it isn’t: packages built with GCC do not depend on it. The system package set is not listed in the DEPEND variable of each ebuild, as things stand, and in my view this opens a proverbial Pandora’s box of problems. (You can also look up an earlier proposal of mine, and see whether it already made sense back then.)
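To make the "independent of each other" requirement concrete, here is a minimal sketch (not Portage’s actual scheduler) that groups a merge list into batches whose members only depend on earlier batches. The package names and the dependency map are made up for illustration; the point is that a compiler missing from DEPEND ends up scheduled alongside the packages that need it.

```python
def merge_batches(deps):
    """Group packages into batches that could be merged in parallel.

    `deps` maps each package in the merge list to the set of packages it
    depends on. Dependencies outside the merge list are assumed to be
    installed already and are ignored. Every package in a batch depends
    only on packages from earlier batches.
    """
    in_list = set(deps)
    remaining = {pkg: set(d) & in_list for pkg, d in deps.items()}
    batches = []
    while remaining:
        # Packages whose in-list dependencies have all been merged.
        ready = sorted(pkg for pkg, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        batches.append(ready)
        for pkg in ready:
            del remaining[pkg]
        for d in remaining.values():
            d.difference_update(ready)
    return batches


# Hypothetical merge list: dev-libs/foo is compiled with GCC, but, as a
# system package, the compiler is not listed among its dependencies.
deps = {
    "sys-devel/gcc": set(),
    "dev-libs/foo": set(),
    "app-misc/bar": {"dev-libs/foo"},
}
print(merge_batches(deps))
# [['dev-libs/foo', 'sys-devel/gcc'], ['app-misc/bar']]
```

Nothing in the graph says that dev-libs/foo needs the compiler, so it lands in the same batch as sys-devel/gcc and could be built while GCC is being replaced.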

When doing a world rebuild, or a long-overdue update, you’re most likely going to find long chains that can be merged in parallel; I honestly find that risky, though it doesn’t have to be. When installing a new package on a system that has already been installed and worked on for even a few weeks, you’ll be lucky (or unlucky) to find two or three independent chains at all. If you’re doing daily updates, finding parallel chains is also unlikely, as the big updates (GNOME, KDE, …) are usually interdependent.
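One way to see how little room a typical update leaves is to count the groups of packages that share no dependency edges at all, since only such groups could run side by side. This is a small union-find sketch with hypothetical package names, not something Portage exposes.

```python
def independent_chains(deps):
    """Split a merge list into groups that share no dependency edges.

    Two packages end up in the same group if one depends on the other,
    directly or through other packages in the list.
    """
    parent = {pkg: pkg for pkg in deps}

    def find(pkg):
        # Walk up to the group representative, halving the path as we go.
        while parent[pkg] != pkg:
            parent[pkg] = parent[parent[pkg]]
            pkg = parent[pkg]
        return pkg

    for pkg, pkg_deps in deps.items():
        for dep in pkg_deps:
            if dep in parent:  # ignore dependencies that are already installed
                parent[find(pkg)] = find(dep)

    groups = {}
    for pkg in deps:
        groups.setdefault(find(pkg), []).append(pkg)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical daily update: most of it hangs off the same library.
update = {
    "kde-base/kdelibs": set(),
    "kde-base/konsole": {"kde-base/kdelibs"},
    "kde-base/kate": {"kde-base/kdelibs"},
    "app-editors/nano": set(),
}
print(independent_chains(update))
# [['app-editors/nano'], ['kde-base/kate', 'kde-base/kdelibs', 'kde-base/konsole']]
```

Four packages, but only two chains, and one of them holds nearly all the work.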

Although it’s a nice feature to have, I don’t think it’s going to help much in the long run; parallel make is what is going to make a difference in the medium term.

Okay, so what are the problems with using -j8 for ebuilds then?

  • we express the preference in terms of (GNU) make parameters, but not all packages in Portage are built with make, let alone GNU make;
  • ebuilds that use a non-make-compatible build system try to parse the MAKEOPTS variable to find out the number of parallel jobs to execute; this does not always work right, because other options, like -s (which I use), can make parsing difficult;
  • even the -s option could be useful to some non-make-compatible build systems, but having to translate every option is tremendously pointless and boring;
  • some people use a high number of jobs because they have multiple boxes building as a cluster, using distcc or icecream; these won’t help with linking, though, or with non-compile jobs. Forcing non-compile tasks to a single job is going to annoy people using SMP systems, while using a job count based on network hosts for non-compile tasks is going to slow down people with single-CPU, multi-host setups;
  • some tasks are serialised by ebuilds when they could be run in parallel.
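To show why extracting the job count from MAKEOPTS is fragile, here is a minimal sketch, in Python rather than the bash an eclass would use, of what a careful parser has to cope with. The function name parse_make_jobs is hypothetical, and the list of GNU make short options that take an argument is my own and may be incomplete.

```python
import shlex

# GNU make short options that take an argument: whatever follows them in a
# bundled word belongs to them, not to a later -j (list may be incomplete).
ARG_OPTIONS = set("CfIlOoW")


def parse_make_jobs(makeopts, default=1):
    """Return the job count requested in a MAKEOPTS-style string.

    Understands -j8, -j 8, --jobs=8, --jobs 8 and bundles such as -sj8.
    A bare -j (no limit in GNU make) yields None; no -j yields `default`.
    A non-numeric count raises ValueError.
    """
    args = shlex.split(makeopts)
    jobs = default
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        value = None
        if arg == "--jobs" or arg.startswith("--jobs="):
            value = arg.partition("=")[2]
        elif arg.startswith("-") and not arg.startswith("--"):
            # Scan the bundle of short options one letter at a time.
            for pos, letter in enumerate(arg[1:], start=1):
                if letter == "j":
                    value = arg[pos + 1:]
                    break
                if letter in ARG_OPTIONS:
                    break  # the rest of the word is this option's argument
        if value is None:
            continue  # not a -j/--jobs option
        # The count may also be given as the next word: -j 8, --jobs 8.
        if value == "" and i < len(args) and args[i].isdigit():
            value = args[i]
            i += 1
        jobs = int(value) if value else None
    return jobs


print(parse_make_jobs("-j8 -s"))    # 8
print(parse_make_jobs("-s -j 4"))   # 4
print(parse_make_jobs("-sj6"))      # 6
print(parse_make_jobs("--jobs 5"))  # 5
```

Even this leaves out abbreviated long options and the load-average interplay, which is exactly the kind of per-ebuild translation work the list above complains about.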

And this is not yet taking into consideration buildsystem-specific problems!

What should we be doing, then? Well, I think the first point to solve is the way we express the preferences. Instead of expressing them in terms of raw parameters to make, we should express them in terms of number of jobs and of features. For instance, a future version of Portage might have a make.conf like this:


BUILD_OPTIONS="ccache icecream silent"

An ebuild would then call, rather than simply emake, one of two new scripts: elmake or enmake (local and network), which would expand to the right number of jobs for make-compatible build systems. For other build systems, eclasses could deal with it by reading the number of jobs and the features from the same configuration.
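A rough sketch of how elmake and enmake might expand, assuming two hypothetical variables, LOCAL_JOBS and NETWORK_JOBS, next to BUILD_OPTIONS. Neither variable nor the make_args helper exists in Portage; the mapping of the "silent" feature to make’s -s flag is likewise my own guess at how such a translation could look.

```python
def make_args(kind, config):
    """Expand a hypothetical elmake/enmake call into GNU make arguments.

    kind: "local" (elmake: linking and other non-compile work) or
          "network" (enmake: compile jobs distcc/icecream can farm out).
    config: make.conf-style strings; LOCAL_JOBS and NETWORK_JOBS are
            hypothetical variables, BUILD_OPTIONS lists the features.
    """
    if kind not in ("local", "network"):
        raise ValueError("kind must be 'local' or 'network'")
    features = set(config.get("BUILD_OPTIONS", "").split())
    jobs = int(config.get("LOCAL_JOBS", "1"))
    # Only use the network job count when a distributed compiler is enabled.
    if kind == "network" and features & {"icecream", "distcc"}:
        jobs = int(config.get("NETWORK_JOBS", jobs))
    args = ["-j%d" % jobs]
    if "silent" in features:
        args.append("-s")
    return args


config = {
    "LOCAL_JOBS": "8",
    "NETWORK_JOBS": "20",
    "BUILD_OPTIONS": "ccache icecream silent",
}
print(make_args("local", config))    # ['-j8', '-s']
print(make_args("network", config))  # ['-j20', '-s']
```

A single-CPU box with a build cluster would then keep linking at one job while still sending compiles to the network, which is the split the last-but-one bullet above asks for.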

More options could be translated this way without having to parse the make syntax in each ebuild or in each eclass. Ebuilds could also declare a global or per-phase limit on jobs, or a RESTRICT="parallel-make" that would make Portage use a single job.
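The rule for combining the user’s job count with an ebuild’s limits could be as simple as taking the smallest value. In this sketch, RESTRICT="parallel-make" is the restriction proposed above, and the shape of the limits mapping (a "*" key for the global cap plus per-phase keys) is a hypothetical format, not anything Portage reads today.

```python
def effective_jobs(requested, restrict="", limits=None, phase=None):
    """Number of jobs to actually run for one ebuild phase.

    requested: job count from the user's configuration.
    restrict: the ebuild's RESTRICT string; "parallel-make" forces 1 job.
    limits: hypothetical caps, e.g. {"*": 4, "install": 1}, where "*" is
            the global cap and other keys are phase names.
    """
    if "parallel-make" in restrict.split():
        return 1
    limits = limits or {}
    caps = [requested]
    if "*" in limits:
        caps.append(limits["*"])
    if phase in limits:
        caps.append(limits[phase])
    return max(1, min(caps))


limits = {"*": 4, "install": 1}
print(effective_jobs(8, limits=limits, phase="compile"))  # 4
print(effective_jobs(8, limits=limits, phase="install"))  # 1
print(effective_jobs(8, restrict="parallel-make test"))    # 1
```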

The last point is probably the most complex one. Robin already dealt with a similar issue in the .bdf to .pcf conversion of fonts, and solved it by having a new package provide a Makefile with the conversion rules, so that the conversion is parallelised by make instead of being serialised by the ebuild. I think we should do something like this in quite a few cases; the first one I can think of is the elisp compilation for Emacs extensions, and I don’t know whether Python serialises or parallelises bytecode generation when py_compile is given multiple files. And that is just looking at two eclasses I know do something similar. Portage’s own stripping and compressing of files should probably be parallelised too, where there are enough local resources to do so.
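The same idea applied to compressing installed files might look like the following minimal sketch: each file is independent, so they can be processed by a bounded pool sized by the local job count. This is not how Portage compresses files today; the helper names are hypothetical, and threads are used only to keep the example short (a process pool, or a generated Makefile as in Robin’s approach, would be the more conservative choice for CPU-bound work).

```python
import gzip
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def compress_one(path):
    """Gzip a single file, replacing path with path.gz."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return path + ".gz"


def compress_parallel(paths, jobs):
    """Compress independent files with at most `jobs` workers at a time."""
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        # map() preserves input order in its results.
        return list(pool.map(compress_one, paths))


with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name in ("README", "ChangeLog", "NEWS"):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(name * 100)
        files.append(path)
    done = compress_parallel(files, jobs=4)
    print(sorted(os.path.basename(p) for p in done))
    # ['ChangeLog.gz', 'NEWS.gz', 'README.gz']
```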

I guess I have found yet another task I’ll spend my time on, especially once I’m back from the hospital.