Upstream, rice it down!

While Gentoo often gets a bad name because of the so-called ricers and upstream developer complains that we allow users to shoot themselves in the foot by setting CFLAGS as they please, it has to be said that not all upstream projects are good in that regard. For instance, there are a number of projects that, unless you enable debug support, will force you to optimise (or even over-optimise) the code, which is obviously not the best of ideas (this does not count in things like FFmpeg that rely on Dead Code Elimination to link properly — in those cases we should be even more careful but let’s leave it alone for now).

Now, what is the problem with forcing optimisation for non-debug builds? Well, sometimes you might not want to have debug support (extra verbosity, assertions, …) but you might still want to be able to fetch a proper backtrace; in such cases you have a non-debug build that needs to turn down optimisations. Why should I be forced to optimise? Most of the time, I shouldn’t.

Over-optimisation is even nastier: when upstream forces stuff like -O3, they might not even understand that the code might easily slow down further. Why is that? Well one of the reasons is -funroll-loops: declaring all loops to be slower than unrolled code is an over-generalisation that you cannot pretend to keep up with, if you have a minimum of CPU theory in mind. Sure, the loop instructions have an higher overhead than just pushing the instruction pointer further, but unrolled loops (especially when they are pretty complex) become CPU cache-hungry; where a loop might stay hot within the cache for many iterations, an unrolled version will most likely require more than a couple of fetch operations from memory.

Now, to be honest, this was much more of an issue with the first x86-64 capable processors, because of their risible cache size (it was vaguely equivalent to the cache available for the equivalent 32-bit only CPUs, but with code that almost literally doubled its size). This was the reason why some software, depending on a series of factors, ended up being faster when compiled with -Os rather than -O2 (optimise for size, the code size decreases and it uses less CPU cache).

At any rate, -O3 is not something I’m very comfortable to work with; while I agree with Mark that we shouldn’t filter or exclude compiler flags (unless they are deemed experimental, as is the case for graphite) based on compiler bugs – they should be fixed – I also would prefer avoiding to hit those bugs in production systems. And since -O3 is much more likely to hit them, I’d rather stay the hell away from it. Jesting about that, yesterday I produced a simple hack for the GCC spec files:

flame@yamato gcc-specs % diff -u orig.specs frigging.specs
--- orig.specs  2010-04-14 12:54:48.182290183 +0200
+++ frigging.specs  2010-04-14 13:00:48.426540173 +0200
@@ -33,7 +33,7 @@
 %(cc1_cpu) %{profile:-p}

 *cc1_options:
-%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}} %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*} %{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase %b}}%{!c:%{!S:-auxbase %b}} %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs} %{v:-version} %{pg:-p} %{p} %{f*} %{undef} %{Qn:-fno-ident} %{--help:--help} %{--target-help:--target-help} %{--help=*:--help=%(VALUE)} %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}} %{fsyntax-only:-o %j} %{-param*} %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants} %{coverage:-fprofile-arcs -ftest-coverage}
+%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}} %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*} %{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase %b}}%{!c:%{!S:-auxbase %b}} %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs} %{v:-version} %{pg:-p} %{p} %{f*} %{undef} %{Qn:-fno-ident} %{--help:--help} %{--target-help:--target-help} %{--help=*:--help=%(VALUE)} %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}} %{fsyntax-only:-o %j} %{-param*} %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants} %{coverage:-fprofile-arcs -ftest-coverage} %{O3:%eYou're frigging kidding me, right?} %{O4:%eIt's a joke, isn't it?} %{O9:%eOh no, you didn't!}

 *cc1plus:

flame@yamato gcc-specs % gcc -O2 hellow.c -o hellow; echo $?   
0
flame@yamato gcc-specs % gcc -O3 hellow.c -o hellow; echo $?
gcc: You're frigging kidding me, right?
1
flame@yamato gcc-specs % gcc -O4 hellow.c -o hellow; echo $?
gcc: It's a joke, isn't it?
1
flame@yamato gcc-specs % gcc -O9 hellow.c -o hellow; echo $?
gcc: Oh no, you didn't!
1
flame@yamato gcc-specs % gcc -O9 -O2 hellow.c -o hellow; echo $?
0
Code language: PHP (php)

Of course, there is no way I could put this in production as it is. While the spec files allow enough flexibility to hit the case for the latest optimisation level (the one that is actually applied), rather than for any parameter passed, they lack an “emit warning” instruction, the instruction above, as you can see from the value of $? is “error out”. While I could get it running in the tinderbox, it would probably produce so much noise and for failing packages that I’d spend each day just trying to find why something failed.

But if somebody feels like giving it a try, it would be nice to ask the various upstream to rice it down themselves, rather than always being labelled as the ricer-distribution.

P.S.: building with no optimisation at all may cause problems; in part because of reliance on features such as DCE, as stated above, and as used by FFmpeg; in part because headers, including system headers might change behaviour and cause the packages to fail.

Exit mobile version