In my previous post about control I stated that we want to know about problems generated by compiler flags on packages, and that filtering flags is not a fix, but rather a workaround. I’d like to expand the notion by providing a few more important insights about the matter.
The first thing to note is that there are different kind of flags you can give the compiler; some are not supposed to be tweaked by users, other can be tweaked just as fine. Deciding whether a flag should or should not be touched by the user is a very tricky matter because different persons might have different ideas about them. Myself, I’d like to throw my two eurocents in to show the discretion I use.
The first point is a repeat of what I already expressed about silly flags that can be summed up in “if you’re just copying the flags from a forum post you’re doing it wrong”. If you really know what you’re doing it should be pretty easy for you to never have problems with flags, on the other hand if you just copy what others did, there is a huge chance you’re going to get burned by something one day or the one after that.
Compilers are huge, complex beasts, and being able to understand how they work is not something for the average user. Unfortunately to correctly assess the impact of a flag on the produced code, you do need to know a lot about the compiler. For this reason you often find some of the flags listed as “safe flags”, and briefly explained. Myself, I’m not going to do that, I’m just going to talk abstractly about them.
The first issue comes with understanding that there are “free” and “non-free” optimisations: some optimisation, like almost all the ones enabled at
-O2, don’t force any particular requirement on the code that the language the code is written in does not force before; actually sometimes it also makes it loose up a bit. An example of this is the dead code elimination phase that allows for functions only called in branches that are never executed to remain undefined at the final linking stage (as used by FFmpeg’s libavcodec to deal with optional codecs’ registration).
Before GCC 4.4, at least for x86, the
-O2 level also didn’t enforce (at least not really) some specifications of the C language, like strict aliasing, which reduced the chances for optimisation to loosen up the type of code that was allowed to compile properly. More than an allowance from GCC, though, this was due to the fact that the compiler didn’t have much to exploit by enforcing aliasing on registry-poor architectures like x86. With GCC 4.4, relying on this is no longer possible, though.
Other flags, though, do restrict the type of code that is accepted as proper and compiled, and may cause bugs that are too subtle for the average upstream developer, which then would declare custom flags “unsupported”. Unfortunately this is not some extremely rare case, it’s actually a norm for many upstream we deal with in Gentoo. These flags, with the most prominent example being
-ffast-math, break assumptions in the code; for instance this flag may provide slightly different results in mathematical functions that could lead to huge domino effects over code resolving complex formulae. On a similar note, but not the same note, the
-mfpmath=sse flag allows to generate SSE (instead of i387) code for floating point operations; it’s considered “safer” because it breaks an assumption that is only valid on x86 architecture (the non-standard 80-bit size of the temporary values), and only exploited by very targeted software rather than pure C code. Indeed this is what the x86-64 compiler do by default.
There are then a few flags that only work when the code is designed to make use of them; this is the case of the
-fvisibility flags family, that requires the code to properly declare the visibility of its function to work properly. Similarly, the
-fopenmp flag requires the code to be written using OpenMP, otherwise it won’t magically make your software faster by using parallel optimisation (there are, though, flags that do that as far as I know they are quite experimental for now). Enabling of these flags should be left only to the actual upstream buildsystem and not by users.
Some flags might interfere with hand-written ASM code; for instance the
-fno-omit-frame-pointer (need to get some decent output from kernel-level profilers), which is actually an un-optimisation, can make the ebx x86 register unavailable (when coupled with
-fPIE at least). While I experienced myself problems with
-ftree-vectorize in a single case (on x86-64 at least; on x86 I know it has created faulty code more than once, on whether this is a GCC bug or some assumption, I have no idea): with mplayer’s mp3lib and an hand-written piece of asm code that didn’t use local labels, the flag duplicated a code path and the pasted code from the
asm() block tried to declare twice the same (global) label.
Finally, some flags, like
-fno-rtti for C++ can cause some pretty heavy optimisation, but should never be used if not by upstream. Doing so it will cause some hard to track down issues like the ones that Ardour devs complained about as you’re actually disabling some pretty big parts of the language, in a way that makes the resulting ABI pretty much incompatible between libraries.
And I almost forgot the most important thing to keep in mind: not always the code most optimised for execution speed is faster, which is why on the first models of x86-64 CPUs, the code produced by
-Os sometimes performed better than the code produced by
-O2. In this case, the relatively small L2 cache on the CPU could slow down the execution of the most aggressively optimised code because it was larger and couldn’t fit in the cache. The simplest example to understand this is to think about unrolled loops: a loop is inherently slower than the unrolled code: it needs an iterator that might not be needed otherwise, it requires to jump up the stream, it might require to actually move a cursor of some sorts. On the other hand, especially for big loop bodies (with inline and static functions included of course), unrolling the loop might result in code that requires lots of cache fetches; and on the other hand, smaller loops that can be entirely kept in cache might not take that much time to jump back since the code is already there.
So what is the bottom line of this post? One could argue that the solution is to leave it to the compiler; but as Mike points out repeatedly at least ICC is beating the crap out of GCC on newer chips (at least Intel chips; I also have some concerns about their use of tricks like the ones I said above about unsafe floating point optimisations but I don’t want to go there today). So the compiler might not know really much better.
My suggestion is to leave it to the experts; while I don’t like the idea of making it an explicit USE flag to use your own CFLAGS definition (I also want the control) we shouldn’t be, usually, overriding upstream-provided CFLAGS is they are good. Sometimes though they might require a bit of help; for instance in the case of xine I can remember the original CFLAGS definition to be pretty much crazy, with all the possible optimisations forced on even when they don’t produce that good of a result on average at all. I guess it’s all a bet.
Gentoo maintainer node and help-call: it seems like either my PCI sound card fried up or there is some nasty bug in the ALSA driver for it I don’t really have the time to deal with (I’ll be updating my previous post about it since after a few more tries it turned out not to be related to the hardware outside of Yamato). This already was a problem in the past two months or so since kernel 2.6.29 didn’t work properly, and it starts to be a big deal. My contributions to PulseAudio especially on Gentoo side has been quite hectic because of it, and the package is in serious need of ordinary and extraordinary work on it.
I might just go out one day of this week and fetch a new USB card, but to be honest I’d like to avoid that for now (I had already enough hardware failure for the past months, and a few more hardware bits that I had to replace/buy for other reasons, as well as a scheduled acquisition of one or two eSATA disks to move around data that I have no longer space for). So I added one USB soundcard (as suggested by Lennart to be fine under Linux) to my wishlist (thanks to the fact that Amazon now ships electronics components to Italy, whooo!) but I could just use some old Linux-supported card if somebody had one to give me; my only requirement is for it to support digital ouput (iec958, S/PDIF), it really doesn’t matter whether it uses coaxial or optical cable; I admit coaxial might be a bit nicer (so that the receiver can deal with both Yamato and Merrimac, with the latter only providing optical), but really either are fine.
Yes I know this sounds a lot like a shameless plug – it probably is – but I’ve got over 1300 bugs open in Bugzilla, and Yamato is crunching its hard drives to find the issues before they hit users, I guess you can let me have this plug, can’t you? Thanks.
-ftree-vectorize is enabled in -O3 for newer versions of GCC. This is a very clear statment. Any code breaking due to this flags triggers a bug in GCC or uses undefined behaviour anyway.-O* options are meant to be safe for use by the average user. Wheter -O2/3/s is the best is a different question.