Debunking ccache myths redux

Since my original post from two years ago didn’t reach yet all the users, and some of the developers as well, I would like to reiterate that you should not be enabling ccache unconditionally.

It seems like our own (Gentoo’s) documentation is still reporting that using ccache makes build “10 to 5 times faster”. I’ll call this statement for what it is: bullshit. The rebuild of the same package might have such a hit, but not the normal emerge process of a standard user with Gentoo. If anything at all, the use of ccache will slow your build down, and even add further failure cases and make it difficult to identify errors.

Now, since the approach last time might not have been clear enough, let me try a different one, by describing the steps it takes when you call it:

Now, we can identify three main time-consuming operations: preprocessing, hashing and copying; all of them are executed whether this is a hit or a miss; if it’s a miss you add to that the actual build. How do they fare about the kind of resources used? Hashing, just like compiling, is a CPU-intensive operation; preprocessing is mixed (you got to read the header files from around the disk); copying is I/O-intensive. Given that nowadays most systems have multiple CPU and find themselves slowing down on I/O (the tinderbox taught me that the hard way), the copying of files around is going to slow down the build quite a bit. Even more so when the hit-to-miss ration is high. The tinderbox, when rebuilding the same failing packages over and over again (before I started masking the packages that failed at any given time), had a 40% hit-to-miss ratio and was slowed down by using ccache.

Now, as I already wrote, there is no reason to expect that the same exact code is going to be rebuilt so often on a normal Gentoo system… even if minor updates to the same package were to share most of the source code (without touching the internal header files), for ccache to work you’d have to leave untouched compiler, flags, and all the headers of all the dependent libraries… and this often includes the system header files from linux-headers. And even if all these conditions were to hold true, you’d have to have rebuilt object files for a total size smaller than the cache size, in-between, or the objects would have had expired. If you think that 2GB is a lot, think again, if you were to use -ggdb especially.

Okay now there are some cases where you might care about ccache because you are rebuilding the same package; that includes patch-testing and live ebuilds. In these cases you should not simply set FEATURES=ccache, but you can instead make use of the per-package environment files. You can then choose two options: you can do what Portage does (setting PATH so that the ccache wrappers are found before the compilers themselves) or you can simply re-set the CC variable, such as export CC="ccache gcc". Just set it in /etc/portage/env/$CATEGORY/$PN and you’re done.

Now it would be nice if our terrific Documentation team – instead of deciding once again (the last time was with respect to alsa-drivers) that they know better what the developers should support – would understand that stating in the handbook that ccache somehow magically makes normal updates “5 to 10 times faster” is foolish and should be avoided. Unfortunately upon my request the answer hasn’t been what you’d expect from logic.

Exit mobile version