
Mailbox: does GCC miscompile at -O0?

Jeremy asks via email:

I’m curious about the mention of gcc -O0 producing broken code in this article: do you have a reference to a particular bug for that? I think the article would benefit from being more specific about which versions of gcc produce broken code for -O0, otherwise readers will be scared off from using it for ever.

What he refers to is the Gentoo backtracing guide I wrote that gives some indications about using Portage features and ebuild support to get proper backtraces (stack traces), and how to avoid debug code instead.

If you really need a precise backtrace, you should use -O1 (using the optimisations of level 0 is not recommended as there are known issues with gcc building invalid code).

I’ll update the guide today if I can find the time, so you’ll probably find a more correct phrase up there, soonish.

The issue here is actually very complex, which is why I think a reply on the blog is worth the hassle. The first thing to note is that GCC itself does not really miscompile at -O0. The problems lie in the code of the software being built, which makes them much more difficult to deal with: when the problem is in the compiler you can isolate it and wait until it is fixed before relying on the feature; when the problem is in the code being compiled, you cannot really be sure everything works unless you build and test everything, and even then it might not be easy.

Anyway, -O0 does a little more than just disable some optimisations; it changes a lot in how code is handled, first of all by changing the preprocessor definitions:

flame@yamato ~ % diff -u <(echo | gcc -O0 -dM -E -) <(echo | gcc -O1 -dM -E -)
--- /proc/self/fd/11    2010-06-14 02:59:32.973351250 +0200
+++ /proc/self/fd/12    2010-06-14 02:59:32.974351316 +0200
@@ -31,6 +31,7 @@
 #define __UINTMAX_TYPE__ long unsigned int
 #define __linux 1
 #define __DEC32_EPSILON__ 1E-6DF
+#define __OPTIMIZE__ 1
 #define __unix 1
 #define __UINT32_MAX__ 4294967295U
 #define __LDBL_MAX_EXP__ 16384
@@ -89,7 +90,6 @@
 #define __UINT16_MAX__ 65535
 #define __DBL_HAS_DENORM__ 1
 #define __UINT8_TYPE__ unsigned char
-#define __NO_INLINE__ 1
 #define __FLT_MANT_DIG__ 24
 #define __VERSION__ "4.5.0"
 #define __UINT64_C(c) c ## UL

Besides showing the second big problem that I’ll talk about (disabling inline functions), it should remind you of my corner case testing from two years ago: disabling optimisation explicitly causes headers to change, as they hide or show different interfaces, “proper” or “fast”. As you might guess, this makes it somewhat harder to track down a crash, or to get an accurate stack trace, because the (already pre-processed) code itself changes; the crash might very well happen only in the codepath taken when the optimised functions are used.
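To give a concrete idea, here is a minimal sketch of the pattern (the struct and function names are invented for illustration, not taken from any real header): the “proper” interface is an out-of-line function, while the “fast” one only appears when __OPTIMIZE__ is defined, so a bug hiding in the fast path simply cannot be reproduced in a -O0 build.

struct widget { int count; };

/* “proper” interface: an out-of-line function, always declared */
extern int widget_count(const struct widget *w);

#if defined __OPTIMIZE__
/* “fast” interface, only seen by optimised builds: expands in place */
#define widget_count(w) ((w)->count)
#endif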

This gets combined with the lack of inlining: some of the system headers replace the definitions of certain functions with inline wrappers when possible. These inlines are often faster, but they also take care of adding the warnings from _FORTIFY_SOURCE (the feature of recent GCC and GLIBC versions that adds build-time warnings for probably-wrong behaviour of source code: ignoring return values, providing wrong parameters, or causing buffer overflows).
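As a rough sketch of what such a wrapper looks like (simplified, not the actual glibc header source, although __memcpy_chk and __builtin_object_size do exist), the checked path is only exposed when optimisation and _FORTIFY_SOURCE are both in effect:

#include <stddef.h>

/* glibc’s checking variant of memcpy; aborts on detected overflows */
extern void *__memcpy_chk(void *dest, const void *src,
                          size_t len, size_t destlen);

#if defined __OPTIMIZE__ && defined _FORTIFY_SOURCE && _FORTIFY_SOURCE > 0
/* route calls through the checking function, passing along the size of
   the destination buffer as known to the compiler */
#define memcpy(dest, src, len) \
  __memcpy_chk((dest), (src), (len), __builtin_object_size((dest), 0))
#endif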

The two combined make using the system library (GLIBC) quite different between optimised and non-optimised builds; it’s almost like having two different basic C libraries. As you might guess, to know where something is crashing it’s important to use the same C library when getting the trace. In quite a few cases, rebuilding with -O0 can stop something from crashing, but that is almost never caused by the compiler miscompiling something; it’s usually due to the code only working with the “proper” interfaces, and failing when using the “less proper” interfaces that are faster but rely on the rules being followed to the letter.

One example of this are the functions ntohs and ntohl: they are present as functions in the C library, but they are also present as macros in the GLIBC includes, so that they can be expanded at build time (especially when used on constants). This changes the handling of temporary variables quite a bit if you play with pointers and type-punning (a very long story for now), so the behaviour of the ntohs and ntohl calls differs, and upstream is likely to have tested only one behaviour, rather than both.
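To make the hazard concrete, here is a hypothetical snippet of the kind of type-punning this refers to; whether it appears to work can depend on whether ntohs ends up being the out-of-line libc function or the header macro expanded and optimised in place:

#include <arpa/inet.h>
#include <stdint.h>

uint16_t first_port(const char *packet)
{
  /* reading through a cast pointer breaks strict aliasing (and possibly
     alignment); the two forms of ntohs may evaluate this differently */
  return ntohs(*(const uint16_t *)packet);
}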

Finally, there are possible build failures tied to the use of -O0, because, among other things, the Dead Code Elimination pass is hindered. Dead Code Elimination is a safe optimisation, so it’s executed even at -O0; its objective is to remove code that is never going to be executed, such as an if (0) block. Using if (0) rather than sprinkling #ifdef all over the code is a decent way to make sure that the syntax of the code is correct even for the parts that are not enabled (as long as you have declarations of the functions and the like). Unfortunately, without optimisations enabled, DCE only works over explicit if (0) blocks, not indirect ones. So for instance, if you have code like this:

/* declarations are always visible; the definitions of the foo-related
   functions are only compiled in when the feature is enabled */
int is_foo_enabled_runtime(void);
void process_foo_parameters(void);
void process_parameters(void);
void foo(void);
void not_foo(void);

void bar(void) {
#if ENABLE_FOO
  int a = is_foo_enabled_runtime();
#else
  int a = 0;   /* feature disabled: a is a compile-time constant */
#endif

  if (a)
    process_foo_parameters();

  process_parameters();

  if (a)
    foo();
  else
    not_foo();
}

DCE will only drop the if (a) branches when constant propagation is also applied, so when -O0 is used for the build the calls remain and the link fails with undefined references to process_foo_parameters() and foo(). This trick is often used in the wild to write code that is readable and syntax-checked even when features are not enabled, without using a huge quantity of #ifdef statements. And as I have just shown, it can easily break when built with -O0. Among others, FFmpeg uses this kind of code, so it cannot really be compiled at a strict -O0.
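For contrast, here is a sketch of the explicit form mentioned above, assuming ENABLE_FOO is always defined to either 0 or 1: after preprocessing the condition is a literal constant, the front end folds it even at -O0, and no reference to foo() is emitted when the feature is off.

void bar_explicit(void) {
  process_parameters();

  /* literal 0 or 1 after preprocessing: folded even without optimisation */
  if (ENABLE_FOO)
    foo();
  else
    not_foo();
}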

Note: this mostly applies to GCC, as other compilers might do constant propagation even at -O0 given that it is a fairly stable and safe optimisation. Clang, though, behaves the same way.

While it can be argued that software shouldn’t rely on specific features of the C compilers but only on the C language specification, it is quite impossible for us to vouch that all the software out there will build and work properly when built with no optimisation at all, for the reasons I just explained. Gentoo is of course ready to accept bugs for those packages, but don’t expect most of us to go out of our way to fix them in the tree if you don’t provide a patch; alternatively, talk with upstream and get them to fix their code, or work around it somehow. If you find a package that fails to build or (worse) fails to work properly when built with -O0, please open a bug against the tracker after tracking down where the issue lies. If you give us a patch we’ll likely merge it soon; otherwise it is still useful to have the problematic packages tracked down.

Comments 1
  1. That’s very interesting. Back when I was trying to get GCC4 to go on the PS2 (oh the pain!), I did find that things seemed a little less broken with -O0 but it was never perfect. No doubt there were other problems left to fix but I’d never considered that -O0 itself could be a problem.
