Debunking ccache myths

Doing support work in #gentoo-it is probably one of the most user-facing tasks I’ve been doing lately. It’s nice because you often get to see common misconceptions about problems and tools.

One of these is related to ccache. Some users seem to think that ccache will improve their compile speed in any situation. This couldn’t be more wrong.

First of all, when the source file is different every time, ccache will make the build take longer than a non-cached build. The reason is pretty simple once you think of it. The cache is indexed by a hash, the md5 of the preprocessed source file, and the content of the cache is the resulting object file. When you build a given source file, ccache has to take an md5 hash of the preprocessor’s result. Then it has to look it up in the cache tree, and if it’s not found, it has to compile the file and write the output twice (once as the output of the build and once in the cache). It might not be a huge overhead, but it’s an overhead nonetheless.
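As a rough illustration of that flow, here is a toy model in Python. It is not ccache's code: the `ToyCache` class, the fake `compile_source` function and the in-memory dictionary are all assumptions made for the sketch, and it keys only on the md5 of the preprocessed text, as described above.

```python
import hashlib


def compile_source(preprocessed: str) -> bytes:
    """Stand-in for the real compiler (hypothetical): returns fake object code."""
    return b"OBJ:" + preprocessed.encode("utf-8")


class ToyCache:
    """Toy model of the lookup described in the post; not ccache's implementation."""

    def __init__(self):
        self.store = {}   # hash -> object file contents
        self.hits = 0
        self.misses = 0
        self.writes = 0   # number of object files written anywhere

    def build(self, preprocessed: str) -> bytes:
        # The hash is always computed, whether the lookup hits or misses.
        key = hashlib.md5(preprocessed.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
            self.writes += 1      # copy the cached object into the build dir
            return self.store[key]
        self.misses += 1
        obj = compile_source(preprocessed)
        self.store[key] = obj     # one write goes into the cache...
        self.writes += 2          # ...and one is the normal build output
        return obj


cache = ToyCache()
cache.build("int main(void) { return 0; }")   # miss: compile plus two writes
cache.build("int main(void) { return 0; }")   # hit: hash and copy only
cache.build("int main(void) { return 1; }")   # different source: miss again
print(cache.hits, cache.misses, cache.writes)
```

In this model a miss always costs more than a plain build (the hash plus the extra write), which is the overhead the paragraph above describes.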

So there has to be a benefit to using ccache for it to be useful. The benefit is that, when you build the same sources twice, the hash, lookup and copy usually take less time than the build. But when do you actually build the same sources twice?
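To put that trade-off in numbers, here is a small back-of-the-envelope model. The function name and every timing in it are invented for illustration, not measurements; it only shows how the expected per-file cost depends on the hit rate.

```python
def expected_time(hit_rate, t_compile, t_hash, t_copy, t_store):
    """Average per-file time with the cache enabled (hypothetical model)."""
    hit_cost = t_hash + t_copy                 # hash, look up, copy object out
    miss_cost = t_hash + t_compile + t_store   # hash, compile, store extra copy
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Invented timings in seconds, chosen only to show the shape of the trade-off.
T_COMPILE, T_HASH, T_COPY, T_STORE = 1.0, 0.05, 0.02, 0.03

for rate in (0.0, 0.05, 0.5, 0.9):
    cached = expected_time(rate, T_COMPILE, T_HASH, T_COPY, T_STORE)
    print(f"hit rate {rate:.0%}: {cached:.3f}s with cache vs {T_COMPILE:.3f}s without")
```

With these made-up numbers the cache only starts to pay off above a hit rate of roughly 8%; real costs will differ, but the break-even point exists for the same reason.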

The first myth to debunk is that it’s helpful for packages using libtool, as they build sources twice (once PIC and once non-PIC). While it’s true they build the same sources twice, they are not compiled in the same way, so the cache is not saving anything. If they were built the same way, there would be no reason to actually build them twice, no?

The second is that ccache helps when you change your CFLAGS. The idea of ccache is that it gives you the exact output the compiler would give you. And this means that changing CFLAGS will change the resulting output too. If it ignored the change in CFLAGS and returned data from the cache, it would be breaking your setup by preventing you from changing CFLAGS. Again, ccache is not helping you.
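A minimal sketch of why different flags cannot share a cache entry, assuming the key mixes the compiler flags with the preprocessed source (a commenter below notes that ccache's man page lists what actually goes into the key). The `toy_key` function and the sample flags are hypothetical, not ccache's real scheme.

```python
import hashlib


def toy_key(flags, preprocessed):
    """Hypothetical cache key: compiler flags plus preprocessed source."""
    h = hashlib.md5()
    for flag in flags:
        h.update(flag.encode("utf-8"))
        h.update(b"\0")   # separator so adjacent flags cannot run together
    h.update(preprocessed.encode("utf-8"))
    return h.hexdigest()


source = "int add(int a, int b) { return a + b; }"

key_o2 = toy_key(["-O2"], source)
key_o3 = toy_key(["-O3", "-march=native"], source)     # changed CFLAGS
key_pic = toy_key(["-O2", "-fPIC", "-DPIC"], source)   # libtool-style PIC build

print(key_o2 == toy_key(["-O2"], source))   # same flags, same source: reusable
print(key_o2 == key_o3)                     # different CFLAGS: cache miss
print(key_o2 == key_pic)                    # PIC vs non-PIC: cache miss
```

The same reasoning covers the libtool case: the PIC and non-PIC builds differ in their command lines, so neither can reuse the other's object file.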

The third myth is that ccache makes changing USE flags a matter of seconds rather than hours. While it’s true that this is a case where you commonly do have an advantage in using ccache, it’s not that simple. Changing USE flags usually means changing the compiled code; there are rare cases (like xorg-server PDEPENDs) that allow you to keep the same exact sources when changing USE flags.

Even then, if you change versions of the libraries used by the software, then the preprocessed sources will change, and we’re back to square one.
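The effect of a library update can be sketched the same way. The `toy_preprocess` function below is a deliberately naive stand-in for the C preprocessor (it only splices in `#include "..."` lines from a dictionary), and the header and function names are invented for the example.

```python
import hashlib


def toy_preprocess(source, headers):
    """Naive stand-in for cpp: replace each #include "x" line with headers[x]."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith('#include "') and stripped.endswith('"'):
            name = stripped[len('#include "'):-1]
            out.append(headers[name])
        else:
            out.append(line)
    return "\n".join(out)


def key_for(source, headers):
    """Hash of the preprocessed text, as in the post's description."""
    return hashlib.md5(toy_preprocess(source, headers).encode("utf-8")).hexdigest()


program = '#include "libfoo.h"\nint main(void) { return foo_version(); }'

headers_v1 = {"libfoo.h": "int foo_version(void);"}
headers_v2 = {"libfoo.h": "int foo_version(void);\nint foo_new_api(int);"}

# The program's own source is byte-for-byte identical in both builds.
same_headers = key_for(program, headers_v1) == key_for(program, headers_v1)
new_headers = key_for(program, headers_v1) == key_for(program, headers_v2)
print(same_headers, new_headers)
```

Even though the program itself did not change, the updated header changes the preprocessed text, so the old cache entry no longer matches.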

All in all, ccache is not bad: it’s helpful in various situations and quite useful for developers. But it’s not a panacea for Gentoo users.

16 thoughts on “Debunking ccache myths”

  1. Are you sure about the part with changing CFLAGS? After all, most switches shouldn’t change the preprocessed source, only the later stages of compiling…

  2. Remove the comment, I remembered why this won’t work and shouldn’t. You DO want to recompile so ccache ignores files with different flags.

  3. A situation where ccache was useful to me was long compilations. Well, mainly OpenOffice.org. Because of other constraints I couldn’t leave my computer on long enough to finish in one go. ccache helped there as I wasn’t completely restarting from scratch and I eventually finished the compilation over several sessions.

  4. As with Francois, I used to use ccache for recompiling packages that hadn’t finished the first time, but for a different reason. At the time, I had overrated memory, and a BIOS without a memory clock option. The system was generally fine, but once in a while would scramble a memory transfer enough to cause a crash. CCache was a LOT of help in that instance, as it’s what allowed me to even install Gentoo at all: there’s no way I could have gotten thru gcc, (monolithic) X, glibc, or the then unsplit KDE compiles, without it.

    Eventually I upgraded memory and that was no longer an issue. However, it’s still useful on a lot of upgrades, since in many cases only a few of the files will have changed. It was also of tremendous help a few months ago when I was following KDE-4 live-SVN builds (from the KDE overlay). With my still relatively new dual dual-core Opteron 290s and compiling into tmpfs, I could compile all of KDE4 from scratch in ~4 hours. With ccache, remerging weekly or so, it was ~2 hours, usually a bit more. When I was able to remerge daily, it went even faster, closer to an hour, but say 90 minutes. So rebuilding daily I was saving well over 50% on merge times, and that included all the stuff that ccache didn’t help with at all.

    Talking about compiling into tmpfs, those of us with PORTAGE_TMPDIR pointed at tmpfs only have to worry about ONE of those writes to disk you mentioned, the ccache write (caching it to disk is sort of the point), since the result written to PORTAGE_TMPDIR is usually only an intermediate result anyway, and deleted before the final qmerge to the live system filesystem on disk. Not having to write and delete all those intermediate results files makes a pretty big difference! =8^)

    Duncan

  5. I think one reason expectations are so high about CCache is this blurb in the handbook:

    In common compilations [using ccache] can result in 5 to 10 times faster compilation times.

    I don’t think it’s quite accurate. I’d expect it to help if your sources change very gradually, i.e. you update often. And of course the cache has to be sizable (classic space/speed tradeoff).

  6. Quoting Mike from his blog (http://multimedia.cx/eggs/c…) ;)

    BTW, I found your ccache blog post while researching this problem. You are correct about all of the myths (true, when one understands how ccache operates, it makes absolutely no sense to make ccache part of Gentoo’s make.conf options). But your stated reasons are not all correct. ccache’s man page describes the uniqueness that goes into the cache determination. Thus, altering the CFLAGS as described in your myth #2 will simply result in cache misses for every single file. Actually, I just realized that part of the uniqueness equation involves size and modification times of the compiler binaries. So keeping the same filenames throughout FATE’s gcc-SVN upgrades would still trigger proper cache misses.

    Also, ccache uses MD4 instead of MD5. I would have posted all of this on the relevant blog post but comments were closed. :-)

  7. It is useful when you revdep-rebuild, emerge @preserved-rebuild, or change USE flags. Ccache will improve compile time, because only some parts of programs need to change (mainly those depending on external libraries that have changed).

  8. No, it is not useful even in those cases.

    If you’re running revdep-rebuild or using the preserved-rebuild set, you’ve changed some of the external libraries used. It’s rare to find programs depending on external libraries that don’t include their header files, so in those cases your hit/miss ratio is probably still very low if you have ccache enabled for _all_ rebuilds.

    And of course, as the cache is limited, you can’t expect that a package you last compiled five months ago will still be in it; never mind that a simple glibc or linux-headers rebuild will make the cache totally useless.

    If you plan to try out multiple USE flag combinations for a given program the same day, it *might* make a difference to enable ccache for those builds, but I sincerely doubt it even then: for almost all major programs it will change config.h and most likely enough of the internal code that it won’t work well either.

  9. I suggest you try to back it with numbers at this point. Unless you do lots of preserved-libs related rebuilds because of automagic deps and you’re testing the same exact software over and over, or you’re using a huge cache, which is a bad thing for the disk space needed. If not in these two cases, you’re just feeling a placebo effect.

  10. All you said is right.

    After having discovered ccache while installing Gentoo and having used it for ~3 years without really going into the details, I found myself with a 4Gb-compressed partition just to handle the cache, which was wasted space.

    Why? Because as you said, even changing a USE flag can trigger automagic deps and cause infinite cache misses. I actually found myself staring at ~250000 cached files and ~10000 cache hits (that is, less than 5%). And if you think that through symlinks, indirection, hashing, etc. you waste more time than you hope to gain (because in more than 95% of cases you’ll have to recompile the source), you can safely disable ccache.

    I find it useful when rebuilding the stages with catalyst (which is a heavily cache-dependent task after all); assuming you get an error in the middle of a stage3, you can skip a lot of rebuilds. Also, it can be beneficial on resource-limited hardware acting as a server (like mine), where you may want to enable just a little feature every now and then but you’re unwilling to wait ~1 hour for php or glibc to compile.

    Setting the CCACHE_UNIFY variable might also be a good shot in some cases.

    Cheers,
    Neo2

  11. What about updates in which only some of the source files were changed but many were not? If 90% of the code is still the same, and that 90% is mostly in its own files, with the 10% in new files that only depend on the old ones?

  12. Arne, that would only happen if upstream releases a quick bugfix release… the cost/benefit ratio is heavily biased toward cost:

    * you need the same exact compiler;
    * you need the same exact flags (as far as I remember, it checks the whole command line, so stuff that defines -DBUILD_DATE on the command line causes a miss);
    * you need the same exact dependent libraries;
    * you need the internal headers not to have changed by a comma, because if they changed even just a prototype, the _whole_ program is invalidated.

    Sure, of course it _might_ happen that a -r1 to fix a single source file appears from today to tomorrow, but how often does that happen? Usually you don’t have day-to-day -r1 updates for huge packages; you may have them for smaller packages, and in that case the benefit of not rebuilding everything (if there is one!) will not outweigh the cost of saving a copy of _all_ the object files that are being built (because that’s what ccache does and trust me, it’s a hefty cost!).

  13. Oh, hey, thanks for linking this again. It prompted me to check and subsequently disable it. Freed up 1.8 GB! For the record, fairly normal user (I think?); stats like so:

    cache hit 370985
    cache miss 1119008
    not a C/C++ file 142402
    files in cache 16606

    A pretty good number of hits, I guess, but probably still not worth it.

  14. An emerge -e --keep-going @world just stopped after 300 packages, and --resume --skipfirst did not work; I had to fix the problem and start over again from the beginning. So I was happy I had ccache enabled. It saved some time in this case: 99% was cached.

    I also like to use ccache for my own projects, because I often recompile them over and over again.

    But for daily Gentoo usage it is probably not really helpful, I agree.
