Strangely enough, this post is not prompted by something that happened recently (I usually write in response to things that happen); it's a general piece of advice that I've had to explain to too many people in the past. The most recent thing that might be linked to my writing this is my note about Pidgin crashing more on Fedora, and a little discussion on the matter with a friend of mine.
So, we all know about compiler optimisation flags in Gentoo, and most users trying some exotic ones probably know that a lot of ebuilds tend to filter, strip or otherwise reduce the number of flags actually used at build time. This is, in many cases, a violation of Gentoo policies, and Mark, being both QA and Toolchain master, usually gets upset by it. Since filtering is often abused, I'd like to explain here what the problem is.
First of all, not all compiler flags are the same: there are flags that change the behaviour of the source code, and others that should not change that behaviour. For instance, the -ffast-math flag enables looser mathematical rules; this changes the behaviour of the math code, as its results are no longer strictly accurate. On the other hand, -ftree-vectorize only changes the generated code and not the meaning of the source code, and should therefore count as a safe flag.
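To make the difference concrete, here is a minimal sketch of code whose meaning depends on strict floating-point rules. The function names are made up for the illustration. Kahan summation relies on the compiler not simplifying an expression that is zero on paper; with -ffast-math the compiler is allowed to reassociate floating-point operations, so it may well reduce the correction term to zero and turn the second function into the first.

```c
#include <stddef.h>

/* Plain left-to-right summation: small terms can be rounded away. */
double naive_sum(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Kahan (compensated) summation: c carries the rounding error lost so far. */
double kahan_sum(const double *v, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = v[i] - c;
        double t = sum + y;
        /* Algebraically zero, numerically the error of the addition above.
         * A compiler allowed to treat floats like real numbers can drop it. */
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

Summing 1.0 followed by a handful of values around 1e-16 shows the effect under the default rules on a typical x86-64 build: the naive sum stays at exactly 1.0, while the compensated one does not lose the small terms.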
You can already see the problem here: -ftree-vectorize has been blamed for build and runtime errors over the past few years, so it's often not considered safe at all; indeed, it's often considered one of the less safe flags. But there are a few catches here. The first is that yes, the implementation of the flag might be at fault, and in the past it caused quite a few internal compiler errors, or miscompilation of source code into something that fails at runtime. But both these issues have to be reported to the GCC developers to be fixed, because they are bugs in GCC to begin with; if the issue is just ignored by disabling the flag, they won't be fixed any time soon.
Sometimes, though, the issue is neither a miscompilation nor a bug in GCC, yet the package fails to execute properly or fails to build entirely; the latter happened with mplayer not too long ago. In these cases there's still a bug; it's in the software itself, and it needs to be fixed. In the case of mplayer, for instance, it turned out that the inline assembler code was using global labels rather than local labels like it should have been in the first place. Fixing the code wasn't that hard, compared with filtering the flag.
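For those who haven't hit this class of bug, here is a small sketch of what local labels look like in GCC inline assembly. This is not mplayer's code, just a made-up function for illustration. A named label is emitted once for every copy of the asm block, so as soon as the compiler duplicates the block (by inlining, or through loop transformations) the assembler sees the same symbol defined twice and the build fails. Numeric local labels, referenced as 1b (backward) or 2f (forward), can be repeated as many times as needed.

```c
/* Hypothetical example: count n steps with a tiny assembly loop. */
static inline unsigned long countdown_steps(unsigned long n)
{
    unsigned long steps = 0;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "test %[n], %[n]\n\t"
        "jz 2f\n"              /* nothing to do: jump forward to local label 2 */
        "1:\n\t"               /* local label, safe to emit more than once */
        "inc %[steps]\n\t"
        "dec %[n]\n\t"
        "jnz 1b\n"             /* jump backward to the nearest label 1 */
        "2:\n"
        : [steps] "+r"(steps), [n] "+r"(n)
        :
        : "cc");
#else
    /* Plain C fallback for other architectures. */
    while (n--)
        steps++;
#endif
    return steps;
}
```

Replacing 1: and 2: with names such as loop_start: and loop_end: would still assemble while the function is used only once, which is exactly why this kind of bug can sit unnoticed until some optimisation duplicates the block.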
Now, don't get me wrong, I know there are at least a few issues with the approach I just described. The first is that, as the FFmpeg developers found out, -ftree-vectorize is often not a good idea, and can actually produce slower code on most systems, at least for common multimedia workloads. The second problem is that, with the exception of the mplayer bug, most of the build and runtime failures aren't straightforward to fix, and when the problem is in GCC it might take quite a while before the issue is fixed. How should we handle those situations then, if not by filtering?
Well, filtering works fine as a temporary option, a workaround, a band-aid to hide the problem from users. So indeed we should use filtering; on the other hand, this is a problem akin to those related to parallel make or --as-needed: you should not let the user be bitten by the problem, but at the same time you should accept that you haven't fixed the bug just yet. My advice is thus: keep the bug open if you "solved" it by filtering flags!
I know lots of developers dislike having bugs open at all, but a bug is not really fixed if you just applied a workaround. And if you close it, nobody will ever see it again, and this will result in a phantom bug that will take a much longer time to reproduce, verify, and fix properly. This is for instance the problem when I hit a package that, without any comment in either the ebuild or the change log, has a strip-flags call, which reduces the number of flags passed to the compiler: finding out whether the call is there because of a reported bug, or just because the Gentoo developer involved couldn't be bothered to follow the policy, takes time.
And finally, users, please understand that flags like -fvisibility, which do change the meaning of the source code, should not be set by users; they should rather be applied directly by upstream if they are safe!
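As a rough sketch of what "applied by upstream" means here: a library that is ready for hidden visibility marks its own public interface in the source, so the build system can then pass -fvisibility=hidden safely. All the names below (MYLIB_API, MYLIB_LOCAL, mylib_percent, mylib_clamp) are invented for the example. If a user forces the flag on a library without such annotations, none of its symbols are exported and everything linking against it breaks.

```c
/* Hypothetical export macros; GCC 4 and later understand the attribute. */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define MYLIB_API   __attribute__((visibility("default")))
#  define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
#  define MYLIB_API
#  define MYLIB_LOCAL
#endif

/* Internal helper: not meant to be part of the exported interface. */
MYLIB_LOCAL int mylib_clamp(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Public entry point: stays exported even under -fvisibility=hidden. */
MYLIB_API int mylib_percent(int value)
{
    return mylib_clamp(value, 0, 100);
}
```

Only upstream knows which functions belong on which side of that line, which is why the decision belongs in their source tree and not in a user's CFLAGS.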