It’s been a while since I last wrote about parallel building. This has only to do with the fact that the tinderbox hasn’t been running for a long time (I’m almost set up with the new one!), and not with the many people who complained to me that spending time in getting parallel build systems to work is a waste of time.
This argument has been helped along by the presence of a --jobs option in Portage: they insist that the future lies in Portage building several packages in parallel, so that the whole process takes less time, rather than in shortening each single build. I said before that I didn’t feel it was going to help much, and now I have first-hand experience to tell you that it doesn’t help at all.
The new tinderbox is a 32-way system: it has two 16-core CPUs, and enough RAM for each of them. You can easily build with 64 processes at once, but I’m actually trying to push it further by using the unbounded -j option (this is not proper, I know, but still). While this works nicely, we still have too many packages that force serial building due to broken build systems, and a few, such as lynx, that break in these conditions but would very rarely break on systems with just four or eight cores.
I then tried, during the first two rebuilds of world (one to set my choices in USE flags and packages, the other to build it hardened), running with five jobs in parallel. Between the issue of the huge system set (yes, that’s a 4.24-year-old article), and the fact that it’s much more likely to have many packages depending on one than one depending on many, this still does not saturate the CPUs if the individual packages are still building serially.
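For reference, the setup described above corresponds roughly to a make.conf along these lines. This is only a sketch reconstructed from the description (unbounded make jobs, five emerge jobs), not a copy of the tinderbox’s actual configuration.

```shell
# /etc/portage/make.conf (sketch; values taken from the description above)

# Unbounded make parallelism: -j with no number puts no limit on the
# number of jobs make starts within a single package build.
MAKEOPTS="-j"

# Let emerge build up to five packages at the same time.
EMERGE_DEFAULT_OPTS="--jobs=5"
```

The two settings multiply: each of the five package builds may start as many compiler processes as its build system allows.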
Honestly, seeing such a monstrous system take as long as my laptop, which has a quarter of the cores and a quarter of the RAM, to build the basic system was a bit… appalling.
The biggest trouble seems to be with packages that don’t use make but that could, under certain circumstances, perform parallel building. The main problem there is that we still don’t have a variable that tells us exactly how many build jobs to start; instead we rely on the MAKEOPTS variable. Some ebuilds actually try to parse it to extract the number of jobs, but that fails with configurations such as mine, where -j carries no number. I guess I should propose that addition for the next EAPI version… then we might actually be able to make use of it in the Ruby eclasses to run tests in parallel, which would make testing so much faster.
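To make the parsing problem concrete, here is a minimal sketch of what extracting the job count from MAKEOPTS might look like. The helper name jobs_from_makeopts and its fallback argument are hypothetical, made up for this illustration; it is not an existing Portage or eclass function.

```shell
# Hypothetical helper: print the job count requested in a MAKEOPTS-style
# string. The body runs in a subshell so its variables do not leak.
#   $1 = the options string, $2 = value to print when -j is unbounded
jobs_from_makeopts() (
    fallback=${2:-1}
    jobs=1                      # make's default when no -j is given
    expect_value=no
    for word in $1; do
        if [ "$expect_value" = yes ]; then
            expect_value=no
            case $word in
                *[!0-9]*) ;;                    # not a number: -j was unbounded
                *) jobs=$word; continue ;;      # "-j 8" or "--jobs 8"
            esac
        fi
        case $word in
            -j|--jobs) jobs=unbounded; expect_value=yes ;;
            --jobs=*)  jobs=${word#--jobs=} ;;
            -j*)       jobs=${word#-j} ;;
        esac
    done
    case $jobs in
        ''|*[!0-9]*) echo "$fallback" ;;        # unbounded or unparsable
        *)           echo "$jobs" ;;
    esac
)

jobs_from_makeopts "-j8 -l 6"        # prints 8
jobs_from_makeopts "-j" 32           # prints 32 (the caller's fallback)
```

Even this small version needs a caller-supplied fallback for the unbounded case, which is the argument for having the package manager export the number directly.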
Speaking of parallel testing, the next automake major release (1.13; 1.12 was released but it’s not in the tree yet, as far as I can tell) will execute tests in parallel by default. This has been optional since 1.11 and now it’s going to be the default (you can still opt out, of course). That’s going to be very nice, but we’ll also have to change our src_test defaults, which still use emake -j1, forcing serialisation.
Speaking of which, even if your package does not support parallel testing, you should use parallel make, at least with automake, to call make check. The reason is that the check target also has to build the tests’ utilities and units, and that build can be sped up a lot by running it in parallel, especially for test frameworks that rely on a number of small units instead of one big executable.
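One way to get the parallel build without running the tests themselves in parallel is to split the work in two steps. The fragment below is a sketch for an ebuild of an automake-based package; it assumes the usual automake behaviour where overriding TESTS with an empty value makes the check target build the test programs without running anything.

```shell
# Ebuild fragment (sketch). emake is provided by Portage and calls make
# with the user's MAKEOPTS, so the first step runs in parallel.
src_test() {
    # Assumption: with automake, an empty TESTS makes "check" build the
    # test utilities and units but run no tests.
    emake check TESTS=

    # Run the test suite itself serially, for suites that are not
    # safe to execute in parallel.
    emake -j1 check
}
```

For packages whose test suite is known to be parallel-safe, a single emake check would do.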
Thankfully, for the day there are two more packages fixed to build in parallel: Lynx (which goes down from 110 to 46 seconds to build!) and Avahi (which I fixed so that it will install in parallel fine).
I love parallel building.
TGT="3" # number of physical cores * 1.5
EMERGE_DEFAULT_OPTS="--jobs --load=$(CP5) --verbose --tree --keep-going --with-bdeps=y"
MAKEOPTS="--jobs --load $(CP5)"
Incidentally, this works wonderfully with distcc.
Can’t eutils.eclass’ makeopts_jobs be used for getting the desired number of jobs? It’s pretty naïve in its current state, but can’t it be improved for any case that make will accept instead of a new EAPI?
Can/should we be naming and shaming packages that don’t build in parallel? It sounds like a tinderbox would be good for doing that.
Mike, the load averaging is helpful indeed, but I’ve had bad experiences with it.
Frank, that _should_ work, but I’ve seen it cause more trouble than it’s worth, mostly when used in conjunction with distcc.
And finally, Sean, the tinderbox usually checks for that as a side effect, and I’ve opened bugs for it before; unfortunately, most of the time there isn’t a quick way to fix them, and they are left untouched for months if not years. The Lynx failure has probably been there for a very long time, and it took me a very long time to get it working now.
The problem I have with parallel builds is one of memory. With 4GB and 4 cores, parallel builds usually work well, but occasionally (chromium, for example) the parallel build uses so much RAM that it starts swapping, and then build speed slows right down. I guess it’s impossible to monitor and adapt to this during the build, though.
Flameeyes, I’ve been mostly lurking in your blog for quite some time now. I’ll take this opportunity to say that I find it very … healing that your cares and gripes with system administration, especially as concerns gentoo, are largely the same as mine.
Parallel building is an excellent example. See my experiences, for example https://bugs.gentoo.org/sho…
Could you elaborate on the “bad experiences” you’ve had with load averaging? I’m often also tempted to remove the load cap on emerge and make, but allowing make to build with unlimited jobs causes some large packages to completely clog the machine. I’ve observed literally HUNDREDS of cc1 trying to run in parallel, which will put even a 12 GB RAM machine into swap. Then basically I have to go away and find something else to do until there are no more compilers waiting for swap.
My observations with parallel building also seem to directly touch the “large system set (with large dependencies)” issue. See comment #5 of above-mentioned bug; something looks “over-conservative” with dependency calculation…
I’ll wholeheartedly welcome any and all improvements to gentoo’s build performance in a parallel/distributed environment.
Another common use case for me is to have a not-so-new-anymore machine (or a “surfboard”, i.e. a netbook which is powerful enough for RUNNING a lot of things, but does not really lend itself to building) to do updates on.
(Binary packages don’t really solve the issue for me: the fact that most of the systems have different requirements, and hence different USE flags, is what causes me to run gentoo in the first place.)
I try to have at least the compiles done on my “large” boxen; when there’s a really large performance difference I (would) also like to use distcc’s pump mode.
Using this with portage currently is … not optimal:
- pump can only be applied to a whole emerge run; quite commonly that’s hundreds of packages for me
- pump will degrade when it has problems somewhere within one run (…)
- userpriv/usersandbox cannot be used, since the set*id would be done “inside” the pump by emerge
Now to wind up the rather long post, my objective here is twofold:
1. Since the “distcc pump mode from emerge” issue looks fairly isolated right now, I’ll try to have a look and see whether I can improve, er, change things there. I hope (without wanting to ask for commitment on your part, I know what “real life” is like) that you would be willing to review any patches I come up with, and be able to provide *real* advice on them.
2. I’d like to offer help (as far as my day job / money-earning commitments allow) working on gentoo, especially regarding build and “smooth upgrade” issues. I.e., if there’s a patch that improves things in this area that needs pre-release testing, I’m happy to provide data points (and code review where I feel qualified).
Hm. I’ve had a quick look and did not see anything immediately useful. Is there any documentation or tips for a “hacking portage” workflow? I’d like to have the source code in some kind of revision control, but still have it available for “as immediate execution as possible” for testing. Hm. If it concerns only a few files/directories, perhaps I can come up with some symlink hackery.
Last, but not least: Thank you for your consistent, ongoing work; it is very much appreciated. We appreciators are way too little outspoken.
As a reply to my own comment, two things:
1. “distcc pump mode from emerge” is implemented, I was apparently just too blind to see it. Simply add ‘distcc-pump’ (in addition to ‘distcc’) to your FEATURES.
2. There is useful documentation on how to work on portage, especially using the development version instead of the system-installed instance: http://www.gentoo.org/proj/…