x86 and SIMD extensions

The approach I’m taking for the tinderbox is to not use any special optimisation flag in it, even though I could use it to test stuff that can get tricky, such as -ftracer. It is important to note that it doesn’t even set any particular architecture, meaning that the compiler is told to use the default i686 architecture without any SIMD extension enabled.

Why is this relevant? Well, there seems to be some trouble with this idea. A number of packages, multimedia and not, rely on the presence of some of those instructions. The problem is not when they insert literal ASM code using said extensions: that passes right through the compiler and goes directly to the assembler, which, as far as I can tell, always supports the whole set of extensions, at least where x86 is concerned (other architectures seem to have more constrained assemblers).

The problem is that a C compiler such as GCC provides a “higher-level” interface to the processor’s instructions: intrinsics. Intrinsics allow you to write C code that calls instructions directly on C objects, without using volatile ASM literals, and with similar syntax across MMX/SSE and the other extensions. This is not really the most common way to implement SIMD-based optimisations, as most of the developers involved in hard optimisation, such as in libav, prefer writing their code directly in assembly, tweaking it depending on the processor’s design and not just on the extension used. As you can see in libav’s sources, the most recent optimisations have been written directly in assembly source files compiled with yasm, rather than in C files.
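
To give an idea, here is a minimal sketch of mine (not code from any of the packages involved) of what intrinsics look like; building something like this with the tinderbox’s default i686 target is exactly what fails:

#include <xmmintrin.h>

/* Add four pairs of floats at once through SSE intrinsics. On the
 * default i686 target this fails at compile time, since GCC refuses
 * to emit SSE instructions unless -msse (or a -march implying it)
 * is in effect. */
void add4(float *dst, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(dst, _mm_add_ps(va, vb));
}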

Unfortunately, this is hindering my tinderboxing efforts, as it stops me from installing some of the packages in the container, which in turn means that the packages depending on them can’t be installed either. Now, it is true that many times this situation is simply something the developers didn’t consider, and can be fixed; but in a number of projects there is no intention to remove the blocker, as they refuse to support older CPUs without said SIMD extensions, mostly for performance reasons.

I’m not sure how to proceed here: I could either change the tinderbox’s flags so that SSE is enabled, which would reflect your average modern system, or I could keep filing bugs for those packages until we sort out a way to deal with them. Suggestions are definitely welcome.

Tell-tale signs that your Makefile is broken

Last week I sent out a broad last-rites email for a number of gkrellm plugins whose tinderbox warnings show they have been broken for a long time. This has been particularly critical because the current maintainer of all the gkrellm packages, Jim (lack), seems not to be very active on them.

The plugins I scheduled for removal mostly show warnings related to the gdk_string_width() function being called with a completely different object than it should be called with, which results in unpredictable behaviour at runtime (most likely, a crash). A few more were actual buffer overflows, or packages failing because their dependencies changed. If you care about a plugin that is scheduled for removal, you’re advised to look into it yourself and start proxy-maintaining it.

I originally thought I had caught all of the broken packages; but since then, another one appeared with the same gdk_string_width() error, so I decided to run the tinderbox specifically against the gkrellm plugins; that turned up one more that I had missed, and now I have actually found all of them. A few more were reported ignoring LDFLAGS, but nothing especially bad turned up on my tinderbox.

What it did show, though, is that the ignored LDFLAGS are just a symptom of a deeper problem: most of the plugins have broken, very poorly written Makefiles. This can be seen in a number of small things, but the obvious one is the usual “jobserver unavailable” message that I have written about last year.

So here’s a good checklist of things that show your Makefile is broken (a small sketch of a Makefile that gets these right follows the list):

  • you call the make command directly — while this works perfectly fine on GNU systems, where you almost always use the GNU make implementation, this is not the case on most BSD systems, and almost always the Makefile is good enough to work only with the GNU implementation; the solution is to call $(MAKE), which is replaced with the name of the make implementation you’re actually using;
  • it takes you more than one command to run make in a subdirectory (this can also be true for ebuilds, mind you) — things like cd foo && make or even worse (cd foo; make; cd ..; ) are mostly silly to look at and, besides, will cause the usual jobserver unavailable warning; what you might not know here is that make is (unfortunately) designed to allow for recursive builds, and provides an option to do so without requiring you to change the working directory beforehand: make -C foo (which should actually be, taking the previous point into consideration, $(MAKE) -C foo) does just that, changing the working directory only for the make process and its children rather than for the current process as well;
  • it doesn’t use the builtin rules — why keep writing the same rules to build object files? make already knows how to compile .c files into relocatable objects; instead of writing your own rules to inject parameters, just use the CFLAGS variable like make is designed to do! Bonus points if, for final executables, you also use the built-in linking rule (for shared objects I don’t think there is one);
  • it doesn’t use the “standard” variable names — for years I have seen projects written in C++ insisting on using the CPP and CPPFLAGS variables; well, that’s wrong, as here “cpp” refers to the C Pre-Processor; the correct variables are CXX and CXXFLAGS; inventing your own variable names to express parameters that can be passed by the user tends to be a very bad choice, as you break the expectations of the developers and packagers using your software.
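
Putting the points together, a Makefile not much bigger than this (names entirely made up by me) avoids all of the above:

# ?= leaves the user's CFLAGS alone; the built-in %.o: %.c rule then
# compiles each source through $(CC) and $(CFLAGS) with no extra rules.
CFLAGS ?= -O2 -Wall

all: myplugin
	$(MAKE) -C doc        # recurse without cd, and through $(MAKE)

# The built-in link rule turns myplugin.o (plus helper.o) into the
# final executable, picking up $(LDFLAGS) and $(LDLIBS) by itself.
myplugin: myplugin.o helper.o

clean:
	rm -f myplugin *.o

.PHONY: all clean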

Now, taking this into consideration, can you please clean up your packages? Pretty please with sugar on top?

Upstream, rice it down!

While Gentoo often gets a bad name because of the so-called ricers, and upstream developers complain that we allow users to shoot themselves in the foot by setting CFLAGS as they please, it has to be said that not all upstream projects are good in that regard. For instance, there are a number of projects that, unless you enable debug support, will force you to optimise (or even over-optimise) the code, which is obviously not the best of ideas (this does not count things like FFmpeg that rely on Dead Code Elimination to link properly — in those cases we should be even more careful, but let’s leave it alone for now).

Now, what is the problem with forcing optimisation for non-debug builds? Well, sometimes you might not want to have debug support (extra verbosity, assertions, …) but you might still want to be able to fetch a proper backtrace; in such cases you have a non-debug build that needs to turn down optimisations. Why should I be forced to optimise? Most of the time, I shouldn’t.

Over-optimisation is even nastier: when upstream forces stuff like -O3, they might not understand that the code might easily slow down instead. Why is that? Well, one of the reasons is -funroll-loops: declaring all loops to be slower than unrolled code is an over-generalisation that doesn’t hold up if you keep a minimum of CPU theory in mind. Sure, the loop instructions have a higher overhead than just pushing the instruction pointer further, but unrolled loops (especially when they are pretty complex) become CPU cache-hungry; where a loop might stay hot within the cache for many iterations, an unrolled version will most likely require more than a couple of fetch operations from memory.
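
To make the trade-off concrete, here is a hand-written sketch (mine, not actual GCC output) of what unrolling a trivial reduction looks like: the same work in roughly four times the code.

int sum_unrolled(const int *a, int n)
{
    int i, sum = 0;

    /* main body unrolled four times: fewer branches, bigger code */
    for (i = 0; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* leftover iterations */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}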

Now, to be honest, this was much more of an issue with the first x86-64 capable processors, because of their risible cache size (vaguely equivalent to the cache available on the corresponding 32-bit-only CPUs, but with code that almost literally doubled in size). This was the reason why some software, depending on a series of factors, ended up being faster when compiled with -Os rather than -O2 (optimising for size, the code size decreases and it uses less CPU cache).

At any rate, -O3 is not something I’m very comfortable working with; while I agree with Mark that we shouldn’t filter or exclude compiler flags (unless they are deemed experimental, as is the case for graphite) based on compiler bugs – those should be fixed – I’d also prefer to avoid hitting those bugs on production systems. And since -O3 is much more likely to hit them, I’d rather stay the hell away from it. Jesting about that, yesterday I produced a simple hack for the GCC spec files:

flame@yamato gcc-specs % diff -u orig.specs frigging.specs
--- orig.specs  2010-04-14 12:54:48.182290183 +0200
+++ frigging.specs  2010-04-14 13:00:48.426540173 +0200
@@ -33,7 +33,7 @@
 %(cc1_cpu) %{profile:-p}

 *cc1_options:
-%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}} %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*} %{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase %b}}%{!c:%{!S:-auxbase %b}} %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs} %{v:-version} %{pg:-p} %{p} %{f*} %{undef} %{Qn:-fno-ident} %{--help:--help} %{--target-help:--target-help} %{--help=*:--help=%(VALUE)} %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}} %{fsyntax-only:-o %j} %{-param*} %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants} %{coverage:-fprofile-arcs -ftest-coverage}
+%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}} %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*} %{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase %b}}%{!c:%{!S:-auxbase %b}} %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs} %{v:-version} %{pg:-p} %{p} %{f*} %{undef} %{Qn:-fno-ident} %{--help:--help} %{--target-help:--target-help} %{--help=*:--help=%(VALUE)} %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}} %{fsyntax-only:-o %j} %{-param*} %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants} %{coverage:-fprofile-arcs -ftest-coverage} %{O3:%eYou're frigging kidding me, right?} %{O4:%eIt's a joke, isn't it?} %{O9:%eOh no, you didn't!}

 *cc1plus:

flame@yamato gcc-specs % gcc -O2 hellow.c -o hellow; echo $?   
0
flame@yamato gcc-specs % gcc -O3 hellow.c -o hellow; echo $?
gcc: You're frigging kidding me, right?
1
flame@yamato gcc-specs % gcc -O4 hellow.c -o hellow; echo $?
gcc: It's a joke, isn't it?
1
flame@yamato gcc-specs % gcc -O9 hellow.c -o hellow; echo $?
gcc: Oh no, you didn't!
1
flame@yamato gcc-specs % gcc -O9 -O2 hellow.c -o hellow; echo $?
0

Of course, there is no way I could put this in production as it is. While the spec files allow enough flexibility to match only the latest optimisation level (the one that is actually applied), rather than any parameter passed, they lack an “emit warning” instruction: the instruction above, as you can see from the value of $?, is “error out”. While I could get it running in the tinderbox, it would probably produce so much noise and so many failing packages that I’d spend each day just trying to find out why something failed.

But if somebody feels like giving it a try, it would be nice to ask the various upstreams to rice it down themselves, rather than us always being labelled as the ricer distribution.

P.S.: building with no optimisation at all may cause problems; in part because of reliance on features such as DCE, as stated above and as used by FFmpeg; in part because headers, including system headers, might change behaviour and cause the packages to fail.

Garbage-collecting sections is not for production

Some time ago I wrote about using --gc-sections to keep unused functions from creeping into final code. Today, instead, I’d like to show how it can be quite a problem if used indiscriminately.

I’m still using the diagnostics of --gc-sections, at least for some projects, to identify stuff that is unused but still kept around. Today I noticed one bad interaction between it and pulseaudio:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.2/../../../../x86_64-pc-linux-gnu/bin/ld: Removing unused section '.data.__padsp_disabled__' in file 'pulseaudio-main.o'

The __padsp_disabled__ symbol is declared in main.c to keep pulseaudio’s access to OSS devices from being wrapped by the padsp script. When I first saw this, I thought the problem was a missing #ifdef directive: if I didn’t ask for the wrapper, it might still have declared the (unused) symbol. That was not the case.

Looking at the code, I found what the problem was: the symbol is (obviously) never used by pulseaudio itself; it is, rather, checked through dlsym() by the DSP wrapper library. For this reason, to the compiler and linker the symbol looks pretty much unused, and when you explicitly ask for unused sections to be dropped, it is. Since the symbol is looked up by the runtime linker, neither building nor executing pulseaudio shows any problem. Indeed, the only failure would appear when running pulseaudio as a child of padsp, and using the OSS output module (so not on most Linux systems).
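
The pattern looks roughly like this (my simplification; the exact declarations in pulseaudio and padsp differ):

/* main.c: a marker variable the daemon itself never reads */
int __padsp_disabled__ = 1;

/* the LD_PRELOADed wrapper then probes the running program for it: */
#define _GNU_SOURCE
#include <dlfcn.h>

static int padsp_disabled(void)
{
    /* non-NULL only if the main program still exports the marker;
     * --gc-sections silently breaks exactly this check */
    return dlsym(RTLD_DEFAULT, "__padsp_disabled__") != NULL;
}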

This shows how just using -fdata-sections -ffunction-sections -Wl,--gc-sections is not safe at all and why you shouldn’t get excited about GCC and ld optimisations without understanding how they work in detail.

In particular, even I thought it would be easier to work around than it actually seems to be: while GCC provides a used attribute that allows you to declare a variable (or a function) as used even though the compiler can’t tell that by itself (it’s often used together with hand-written inline ASM the compiler doesn’t check), this does not propagate to the linker, so it won’t save the section from being collected. The only solution I can think of is adding one instruction that sets the variable to itself, but that’s probably going to be optimised away. Or giving GCC a way to state explicitly that the section is used.
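
Roughly, what I mean (my reading of the behaviour, sketched):

/* 'used' stops the compiler from discarding the unreferenced
 * variable, but with -fdata-sections it still lands in its own
 * .data.__padsp_disabled__ section, which -Wl,--gc-sections is then
 * free to drop: the attribute never reaches the linker. */
int __padsp_disabled__ __attribute__((used)) = 1;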

There are flags and mlags…

In my previous post about control I stated that we want to know about problems generated by compiler flags on packages, and that filtering flags is not a fix, but rather a workaround. I’d like to expand the notion by providing a few more important insights about the matter.

The first thing to note is that there are different kinds of flags you can give the compiler; some are not supposed to be tweaked by users, others can be tweaked just fine. Deciding whether a flag should or should not be touched by the user is a very tricky matter, because different people might have different ideas about them. Myself, I’d like to throw my two eurocents in to show the discretion I use.

The first point is a repeat of what I already expressed about silly flags, which can be summed up in “if you’re just copying the flags from a forum post you’re doing it wrong”. If you really know what you’re doing, it should be pretty easy for you to never have problems with flags; on the other hand, if you just copy what others did, there is a huge chance you’re going to get burned by something one day or the next.

Compilers are huge, complex beasts, and being able to understand how they work is not something for the average user. Unfortunately, to correctly assess the impact of a flag on the produced code, you do need to know a lot about the compiler. For this reason you often find some of the flags listed as “safe flags” and briefly explained. Myself, I’m not going to do that; I’m just going to talk abstractly about them.

The first issue comes with understanding that there are “free” and “non-free” optimisations: some optimisations, like almost all the ones enabled at -O2, don’t force any particular requirement on the code beyond what the language it’s written in already forces; actually, sometimes they even loosen things up a bit. An example of this is the dead code elimination pass, which allows functions only called in branches that are never executed to remain undefined at the final linking stage (as used by FFmpeg’s libavcodec to deal with optional codecs’ registration).
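
A sketch of that trick, with hypothetical names (not libavcodec’s actual macros):

#define CONFIG_FOO 0            /* codec disabled at configure time */

extern void register_foo(void); /* never defined when disabled */

void register_all(void)
{
    /* at -O1 and above the branch is eliminated, so the undefined
     * symbol is never referenced and the link succeeds; at -O0 the
     * call survives and linking fails */
    if (CONFIG_FOO)
        register_foo();
}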

Before GCC 4.4, at least on x86, the -O2 level also didn’t enforce (at least not really) some specifications of the C language, like strict aliasing, which loosened the type of code that was allowed to compile and run properly. More than an allowance from GCC, though, this was due to the fact that the compiler didn’t have much to gain by exploiting aliasing on register-poor architectures like x86. With GCC 4.4, relying on this is no longer possible, though.
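
The classic textbook case that used to get away with it (my example, not one from a specific package):

/* undefined behaviour in C: accessing a float through an unsigned
 * pointer; pre-4.4 GCC on x86 generally compiled it "as expected",
 * while a compiler exploiting strict aliasing may reorder or cache
 * the accesses and break it (-Wstrict-aliasing warns about this) */
float negate(float f)
{
    unsigned int *p = (unsigned int *)&f;
    *p ^= 0x80000000u;   /* flip the sign bit in place */
    return f;
}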

Other flags, though, do restrict the type of code that is accepted as proper and compiled, and may cause bugs that are too subtle for the average upstream developer, who would then declare custom flags “unsupported”. Unfortunately this is not some extremely rare case; it’s actually the norm for many upstreams we deal with in Gentoo. These flags, the most prominent example being -ffast-math, break assumptions in the code; for instance this flag may produce slightly different results in mathematical functions, which could lead to huge domino effects in code resolving complex formulae. On a similar, but not identical, note, the -mfpmath=sse flag makes GCC generate SSE (instead of i387) code for floating point operations; it’s considered “safer” because it only breaks an assumption that is specific to the x86 architecture (the non-standard 80-bit size of the temporary values), and exploited only by very targeted software rather than pure C code. Indeed, this is what the x86-64 compiler does by default.
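
A tiny example of the kind of assumption -ffast-math breaks (again my example, not a reported bug):

/* -ffast-math implies -ffinite-math-only, so GCC may assume x is
 * never NaN and fold this function to a constant 0, silently
 * breaking any NaN handling built on top of it */
int is_nan(double x)
{
    return x != x;
}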

There are then a few flags that only work when the code is designed to make use of them; this is the case of the -fvisibility family of flags, which requires the code to properly declare the visibility of its functions to work properly. Similarly, the -fopenmp flag requires the code to be written using OpenMP; it won’t magically make your software faster by applying parallel optimisations (there are, though, flags that try to do that; as far as I know they are quite experimental for now). Enabling these flags should be left to the actual upstream build system, not to users.
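
For instance, a minimal sketch of what “written using OpenMP” means:

/* without -fopenmp this pragma is simply ignored and the loop runs
 * serially; the flag only parallelises code that asks for it */
void scale(double *v, int n)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        v[i] *= 2.0;
}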

Some flags might interfere with hand-written ASM code; for instance -fno-omit-frame-pointer (needed to get some decent output from kernel-level profilers), which is actually an un-optimisation, can make the ebx x86 register unavailable (when coupled with -fPIC or -fPIE at least). I myself experienced problems with -ftree-vectorize in a single case (on x86-64 at least; on x86 I know it has created faulty code more than once; whether that is a GCC bug or some broken assumption, I have no idea): with mplayer’s mp3lib and a hand-written piece of asm code that didn’t use local labels, the flag duplicated a code path, and the pasted code from the asm() block tried to declare the same (global) label twice.
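
Schematically, that failure mode looks like this (my reconstruction, not the actual mp3lib code):

void example(void)
{
    /* BAD: if the optimiser duplicates this code path, 'mylabel'
     * gets defined twice and the assembler errors out */
    __asm__("mylabel: nop");

    /* GOOD: local numeric labels may be defined any number of times */
    __asm__("1: nop");
}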

Finally, some flags, like -fno-exceptions and -fno-rtti for C++, can enable some pretty heavy optimisation, but should never be used by anyone but upstream. Doing otherwise will cause hard-to-track-down issues like the ones the Ardour devs complained about, as you’re actually disabling some pretty big parts of the language, in a way that makes the resulting ABI pretty much incompatible between libraries.

And I almost forgot the most important thing to keep in mind: the code most aggressively optimised for execution speed is not always the fastest, which is why on the first models of x86-64 CPUs the code produced by -Os sometimes performed better than the code produced by -O2. In this case, the relatively small L2 cache on the CPU could slow down the execution of the most aggressively optimised code because it was larger and couldn’t fit in the cache. The simplest way to understand this is, again, to think about unrolled loops: a loop is inherently slower than the unrolled code: it needs an iterator that might not be needed otherwise, it requires jumping back up the instruction stream, it might require actually moving a cursor of some sort. On the other hand, especially for big loop bodies (with inlined and static functions included, of course), unrolling the loop might result in code that requires lots of cache fetches, while a smaller loop that can be kept entirely in cache might not take that much time to jump back, since the code is already there.

So what is the bottom line of this post? One could argue that the solution is to leave it to the compiler; but as Mike points out repeatedly, ICC at least is beating the crap out of GCC on newer chips (at least Intel chips; I also have some concerns about their use of tricks like the unsafe floating point optimisations I mentioned above, but I don’t want to go there today). So the compiler might not really know much better.

My suggestion is to leave it to the experts; while I don’t like the idea of making it an explicit USE flag to use your own CFLAGS definition (I want the control too), we shouldn’t usually be overriding upstream-provided CFLAGS if they are good. Sometimes, though, they might require a bit of help; for instance, I remember xine’s original CFLAGS definition being pretty much crazy, with all the possible optimisations forced on even when, on average, they didn’t produce that good a result at all. I guess it’s all a bet.

Gentoo maintainer note and help-call: it seems like either my PCI sound card fried up or there is some nasty bug in the ALSA driver for it that I don’t really have the time to deal with (I’ll be updating my previous post about it, since after a few more tries it turned out not to be related to the hardware outside of Yamato). This has already been a problem for the past two months or so, since kernel 2.6.29 didn’t work properly, and it is starting to be a big deal. My contributions to PulseAudio, especially on the Gentoo side, have been quite hectic because of it, and the package is in serious need of ordinary and extraordinary work.

I might just go out one day this week and fetch a new USB card, but to be honest I’d like to avoid that for now (I’ve had enough hardware failures over the past months, and a few more hardware bits that I had to replace or buy for other reasons, as well as a scheduled acquisition of one or two eSATA disks to move around data that I no longer have space for). So I added one USB soundcard (suggested by Lennart to be fine under Linux) to my wishlist (thanks to the fact that Amazon now ships electronics to Italy, whooo!), but I could just use some old Linux-supported card if somebody had one to give me; my only requirement is for it to support digital output (iec958, S/PDIF); it really doesn’t matter whether it uses a coaxial or optical cable. I admit coaxial might be a bit nicer (so that the receiver can deal with both Yamato and Merrimac, with the latter only providing optical), but really, either is fine.

Yes I know this sounds a lot like a shameless plug – it probably is – but I’ve got over 1300 bugs open in Bugzilla, and Yamato is crunching its hard drives to find the issues before they hit users, I guess you can let me have this plug, can’t you? Thanks.

Bitrotting

When I went looking for unit testing frameworks, I was answered by more than one person (just Tester publicly, though) that I shouldn’t care even if there are no new releases of check or cunit: as long as they do what they have to, it’s normal that they don’t get developed. Sincerely, I don’t agree: projects that don’t get maintained actively start to bitrot.

On a related note, I always find it quite intriguing when you can apply organic concepts to software, like bitrot and softdiversity. It means that even human creations of pure logic abide by laws their creators didn’t explicitly take an interest in.

Anyway, the bitrotting process is very natural, and can easily be seen. If you try to use and build the sources of a project written ten years ago or so, it’ll be very difficult for you to do so. The reason is that dependencies evolve, deprecate interfaces, improve other interfaces; tools also change and try to do a better job, reducing the chances for users to shoot themselves in the foot, and so on. Even software written using no additional dependency but the C library and the compiler will probably not work as intended, unless it relies purely and squarely on standards and was written so that each part of the code was 100% error-free. As you can guess, it’s an inhuman amount of work to make sure that everything is perfect, and it’s rare to be able to provide such a certainty.

But it’s not really limited to this: software is rarely perfect as it is, but hardware also evolves. Software written with a tight limit on the amount of memory to use is going to be suboptimal on a modern system; software written for purely sequential execution is not going to make proper use of modern multi-core systems. Software features also improve, and you obviously want to make the best use of the software around yours, so that performance doesn’t get hindered by obsolete techniques and features (think of fam versus inotify).

Sometimes bitrotting also attacks very commonly used tools, like strace, and just sometimes they get taken over, like dosfstools. Some other times, bitrotting attacks the build system and the syntactic side of the code rather than the actual logic behind it, as is the case for FooBillard (which, if I ever have spare time, I’ll import into git and fix up — it’s the only game I actually would like to play on Yamato, and I cannot because it doesn’t build here).

But bitrotting does not stop at complete, complex software projects; it also applies to small things like ebuilds: a lot of the ebuilds I filed bugs for are broken because nobody has tended to them in a long time: the compiler got stricter and they didn’t get tested. While Patrick already did a few sweeps with his tinderbox on software in the stable tree, there were, and still are, lots of packages in ~arch that didn’t get tested, and so I’m getting to file bugs related to glibc 2.8 and gcc 4.3 once again. But it’s not just that: there are also mistakes in DEPEND/RDEPEND variables, problems with ebuilds not using die to terminate on failure, and so on.

There are of course problems with other things, like packages that don’t use slot dependencies yet (and let’s not even start about use dependencies, which were introduced so recently — by the way, should this mean I can finally have my cxx, perl and python USE flags turned off by default?), but those are quite limited. Instead, I found that the problem with -O0 building that I found quite some time ago is not that common, although I admit I’m not sure whether that’s due to more packages actually knowing to include locale.h, or just to -O0 not being respected.

Hopefully, one day these sweeps will be so common that glibc 2.8 problems will be found and reported within the first week of the library entering Portage, so that developers are also fresh enough to know how to deal with those errors. On the other hand, once I’m done with this and have enough free time (since I’m also working on lscube, and FFmpeg, and so on), I’ll see about fixing the packages in maintainer-needed for which I reported bugs; it’ll help direct people to the right thing to do, I hope.

Respecting CFLAGS and CXXFLAGS, reality testing!

As I have written in my post Flags and flags, I think one way out of the hardened problem would be to actually respect the CFLAGS and CXXFLAGS the user requests, so that they actually apply to the ebuilds. Unfortunately, not all the ebuilds in the tree respect the flags, and finding out which ones do and which ones don’t hasn’t been, up to now, an easy task.

There are many ways to check for this; the most common one is to look at the build output and spot that the compile lines lack your custom flags, but this is difficult to automate. Another option is to inject a fake definition (-DIWASHERE) and grep for it in the build logs, but this gets messed up once you consider that a package might ignore CFLAGS for just a subset of its final outputs.

While I was without Enterprise I spent some time thinking about this, and I came up with a possible solution, which I’m going to experiment with on Yamato, starting tonight (which is Friday the 29th, for what it’s worth).

The trick is that GCC provides a flag, -include, that allows you to prepend the inclusion of an extra file, unknown to the rest of the code. With a properly structured file, you can easily inject a beacon that you can later pick up.

And with a proper beacon injected into the built files, it shouldn’t be a problem to check, using scanelf or similar tools, whether the flags were respected.

The trick here is all in the choice of the beacon and in looking it up; the first requirement for a proper beacon is that it must not intrude on and disrupt the code or the compilation. This means it has to have a name that is not common, so that it does not risk colliding with other pieces of code, and it must not clash between different translation units.

To solve the first problem, the name can just be very long, so that it’s impractical for somebody to have used it for a function or variable name; let’s say we call the beacon cflags_test_cflags_respected. This is the first step, but it still doesn’t solve the problem of clashing translation units. If we were to write it like this:

const int cflags_test_cflags_respected = 1234;

two translation units with that in them, linked together, will cause a linker error that stops the build. This cannot happen, or it’ll make our test useless. The solution is to make the symbol a common symbol. In C, common symbols are usually the ones declared without an initialisation value, like this:

int cflags_test_cflags_respected;

Unfortunately this syntax doesn’t work in C++, as the notion of common symbols hasn’t crossed that language barrier. This means we have to go deeper in the stack of languages to find a way to create the common symbol. It’s not difficult, once you decide to use assembly:

asm(".comm cflags_test_cflags_respected,1,1");

will create a common symbol of size 1 byte. It won’t be perfect, as it might increase the size of the .bss section of a program by one byte, and thus add a .bss section to programs that previously did without one, but we’re interested in the tests rather than in performance, as of this moment.

There is still one little problem, though: the asm construct is not accepted in strict C99 mode, so we’ll have to use the alternative keyword instead: __asm__, which works in just the same way.
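
A quick sanity check that the beacon behaves as intended should look something along these lines (the nm value column simply reflects the common symbol’s size; exact output may vary):

$ echo '__asm__(".comm cflags_test_cflags_respected,1,1");' > beacon.c
$ gcc -c beacon.c
$ nm beacon.o
0000000000000001 C cflags_test_cflags_respected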

But before we go on with this, there is something else to take care of. As I have written in the entry linked at the start of this one, there are packages that mix CFLAGS and CXXFLAGS. While we’re here, it’s easy to add some more test beacons that track down for us whether the package used CFLAGS to build C++ code or CXXFLAGS to build C code. With this in mind, I came to create two files: flagscheck.h and flagscheck.hpp, to be injected through CFLAGS and CXXFLAGS respectively.

flame@yamato ~ % sudo cat /media/chroots/flagstesting/etc/portage/flagscheck.h
#ifdef __cplusplus
__asm__(".comm cflags_test_cxxflags_in_cflags,1,1");
#else
__asm__(".comm cflags_test_cflags_respected,1,1");
#endif
flame@yamato ~ % sudo cat /media/chroots/flagstesting/etc/portage/flagscheck.hpp
#ifndef __cplusplus
__asm__(".comm cflags_test_cflags_in_cxxflags,1,1");
#else
__asm__(".comm cflags_test_cxxflags_respected,1,1");
#endif

And here we are; now it’s just time to inject these in the variables and check the output. But I’m still not satisfied. There are packages that, mistakenly, save their own CFLAGS and propose them to other programs that link against them; to keep these from falsifying our tests, I’m going to make the injection unique at the package level.

Thanks to Portage, we can create two functions in the bashrc file, pre_src_compile and post_src_compile. In the former, we’re going to link the two header files into the ${T} directory of the package (the temporary directory), then mess with the flags variables to insert the -include option. This way, each package gets its own particular path; when a library passes the CFLAGS assigned to itself on to another package, that package will fail to build.

pre_src_compile() {
    # make the check headers available under this package's ${T}
    ln -s /etc/portage/flagscheck.{h,hpp} "${T}"

    # force-include the right beacon in every C and C++ compile
    CFLAGS="${CFLAGS} -include ${T}/flagscheck.h"
    CXXFLAGS="${CXXFLAGS} -include ${T}/flagscheck.hpp"
}

After the build completes, it’s time to check the results; luckily pax-utils contains scanelf, which makes it a piece of cake to check whether one of the four symbols is defined, or whether none is (and thus all the flags were ignored). The one-line function is as follows:

post_src_compile() {
    # look for the beacons in the relocatable objects produced so far
    scanelf "${WORKDIR}" \
        -E ET_REL -R -s \
        cflags_test_cflags_respected,cflags_test_cflags_in_cxxflags,cflags_test_cxxflags_respected,cflags_test_cxxflags_in_cflags
}

At this point you just have to look for the ET_REL output:

ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/tilde/tilde.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/tilde/shell.o 
ET_REL  -  /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/getopt.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/bash.o 
ET_REL  -  /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/getopt1.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/which.o

And it’s time to find out why getopt.o and getopt1.o are not respecting CFLAGS while the rest of the build is.

Flags and flags

This post, and probably a few more that will come, is being written about a day before it’s actually posted. The reason is that, as I’ll probably be hospitalised at the end of August, I want to have something queued up so I don’t need to write during the hospitalisation.

I was reflecting tonight with Mark (Halcy0n) that to have hardened features on GCC 4.x you shouldn’t, in general, need any particular support in the compiler. What hardened would be doing for the modern compilers is creating new “spec files” that tell the compiler which flags to use by default. This would force the compiler to always generate PIE (Position Independent Executable) code and use SSP (Stack Smashing Protection).

In general, to get the same features it would be enough to properly set CFLAGS and CXXFLAGS. The idea is that once you put -fPIE in your flags, all the code that Portage builds would be PIE, and if you set -fstack-protector in your CFLAGS (and not CXXFLAGS, because SSP is known not to cope properly with C++ code), you’d expect your system to be built with stack protector turned on.
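
Concretely, something like this in make.conf (a sketch of the idea, not a recommended configuration):

CFLAGS="-O2 -pipe -fPIE -fstack-protector"
# no -fstack-protector here: as said above, SSP and C++ don't get along
CXXFLAGS="-O2 -pipe -fPIE"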

The problem is, reality and theory don’t seem to coincide: a huge lot of ebuilds ignore your flags entirely; others strip them, and might strip -fPIE and -fstack-protector; and quite a few mix up CFLAGS and CXXFLAGS, using the former to build C++ code and the latter to build C code. The result is that you end up with something different from what you asked for.

Even worse, there are packages that save your CFLAGS in their -config files, letting your custom flags creep into other projects that might not want them.

So if we want to make it much easier for everybody to enable hardened, we should make sure that the behaviour of ebuilds is standardised on a policy of respecting the flags set by the user, not filtering them unless really needed (and even then letting most of the non-optimisation flags through), and of actually using the correct variable depending on the language being built.

What are the problems? The first is obviously upstreams that don’t want users building their code with their own flags (MPlayer, anyone?); then there is at least the problem of broken build systems that either don’t understand the difference between CFLAGS and CXXFLAGS or don’t support custom flags at all.

If you wish to help, there is an easy way to find where the flags are mixed up. As the most obvious problem is CFLAGS being used to build C++ code (rather than the other way around), you can add -Wno-pointer-sign to your CFLAGS. When the variable is misused, this warning turns up:

cc1plus: warning: command line option "-Wno-pointer-sign" is valid for C/ObjC but not for C++

When you see that, it’s time to report it against bug #234011 so that the maintainers know they need to fix something in the build system to keep the two variables separated.

As to how to fix this: on custom build systems it’s difficult to say; on autotools-based systems, the problem might be in configure.ac, if code similar to this is present:

CFLAGS="${CFLAGS} -DSOMETHING"
CXXFLAGS="${CFLAGS} -DSOMETHING"   # bug: seeds CXXFLAGS from ${CFLAGS}
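
The fix is to derive each variable from its own previous value:

CFLAGS="${CFLAGS} -DSOMETHING"
CXXFLAGS="${CXXFLAGS} -DSOMETHING"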

An alternative is that the build system is adding to CXXFLAGS the value of a variable reported by one of those foo-config scripts that are bugged and report the flags used to build the source package, rather than just the flags needed to get their include directories right. In that case the bug lies in a different package, and it is there that it has to be fixed.

Hopefully, this kind of fixes will become routine and new packages won’t be added to the tree if they mix CFLAGS and CXXFLAGS… I can always dream, can’t I?

But yes, this is another point on my checklist when creating an ebuild: if the new ebuild is not needed immediately and upstream fails to understand the difference between CFLAGS and CXXFLAGS, then I avoid adding it at all. I hope other developers will start considering this, too :)

Oh yeah, I’m sorry I’m now filing bugs without providing a fix immediately. The reason I stopped providing fixes right away is, first of all, that I’m opening a huge amount of bugs as I find them, rather than waiting until I have time to debug and fix them; I don’t have enough time to take care of all that stuff myself, and I’d rather explain how to fix the problems and then see them fixed by the actual maintainers. Also, I think I still have bugs with patches waiting on maintainers, so…

Why it is a bad idea to record user settings during compile

So, today I decided to work a bit more on Gentoo than in the past few days. One of the things on my TODO list was the marking of the remaining myspell-* dictionaries, as hunspell is now marked ~amd64 and ~x86-fbsd (it would be useless without the spell checkers). That I did in a batch, using some for loops. I couldn’t actually evaluate the reliability of the spelling, but that’s… something I think only a few people would be able to do.

Anyway, I also decided to update db, as Paul released 4.3 today. I took the opportunity to also fix KDevelop so that it links to the latest db available instead of being pinned to 4.1, which is not needed anymore (at least on my systems). With this, I also had to keyword it ~x86-fbsd.

But during this, I ended up having to rebuild both apr and apr-util, on both Linux and FreeBSD. Why? Well, on Linux they recorded the -Wl,-Bdirect flag; on FreeBSD they had recorded the i686-gentoo-freebsd6.1-gcc compiler name.

So you see why it is a bad idea to record user settings during compile?
Because you don’t frigging ask me to rebuild the whole world to remove a flag.