Compilers’ rant

Be warned that this post is written in the form of a rant, because I’ve spent the past twelve hours fighting with multiple compilers, trying to make sense of them while trying to get the best out of my unpaper fork thanks to their different analyses.

Let’s start with a few more notes about the Pathscale compiler I already slightly ranted about for my work on Ruby-Elf. I know I didn’t do the right thing when I posted that stuff, as I should have reported the issues upstream directly, but I didn’t have much time, I was already swamped with other tasks, and I was going through a very bad personal moment, so I quickly wrote up my feelings without filing proper reports. I have to thank the Pathscale people for accepting the critiques anyway, as Måns reported to me that a couple of the issues I noted, in particular the use of --as-needed and the __PIC__ definition, were taken care of (sorta, see in a moment).

First problem with the Pathscale compiler: by mistake I had been using the C++ compiler to compile C code; rather than screaming at me, it passed through properly, with one little difference: a static constant gets mis-emitted, and this is not a minor issue at all, even though I am using the wrong compiler! Instead of having the right content, the constant is emitted as an empty, zeroed-out array of characters of the right size. I only noticed because Ruby-Elf’s cowstats reported what should have been a constant in the .bss section. This is probably the most worrisome bug I have seen with Pathscale yet!

Of course its impact is theoretically limited by the fact that I was using the wrong compiler, but since the code should be written in a way to be both valid C and C++, I’m afraid the same bug might exist for some properly-written C++ code. I hope it gets fixed soon.
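To make the failure mode concrete, here is a minimal sketch of the kind of unit-static constant I am talking about. The name greeting and the helper greeting_is_intact() are made up for illustration, and this is not the actual declaration from my code; it only shows what "emitted as a zeroed-out array" would mean from inside the program.

```c
#include <string.h>

/* Hypothetical unit-static constant; it has internal linkage in both C and C++. */
static const char greeting[] = "hello";

/* Returns 1 if the constant still holds its initializer,
 * 0 if it was emitted as zeroes (or with the wrong size). */
int greeting_is_intact(void)
{
    return sizeof(greeting) == 6 && memcmp(greeting, "hello", 6) == 0;
}
```

If the symbol survives optimisation, nm on the object file should list it with an r (read-only data); a b would mean it landed in .bss, which is the kind of thing cowstats flagged for me.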

The killer feature of Pathscale’s compiler is supposedly optimisation, though, and there it looks like it is doing quite a nice job: I can see from the emitted assembler that it is finding more semantics in the code than GCC seems to, even though it requires -O3 -march=barcelona to make something useful out of it. In that case, though, you give up debugging information, as the debug sections may reference symbols that were dropped, and the linker will be unable to produce a final executable. This is hit and miss of course, as it depends on whether the optimiser drops those symbols, but it makes it difficult to use -ggdb at all in these cases.

Speaking about optimisations, as I said in my other post, GCC’s missed optimisation is still missed by Pathscale even with full optimisation (-O3) turned on, and with the latest sources. The wrong placement of static constants that I ranted about in that post is also still not fixed.

Finally, as for the __PIC__ definition that Måns referred to as being fixed, well, it isn’t really as fixed as one would expect. Yes, using -fPIC now implies defining __PIC__ and __pic__ as GCC does, but there are two more issues:

  • While this does not apply to x86 and amd64 (but only to m68k, PowerPC and Sparc), GCC supports two modes for emission of position-independent code: one that is limited by the architecture’s global offset table maximum size, and another that overrides such maximum size (I never investigated how it does that, probably through some indirect tables). The two modes are enabled through -fpic (or -fpie) and -fPIC (-fPIE), and define the macros as 1 and 2, respectively; Path64 only ever defines them to 1.
  • With GCC, using -fPIE (which is used to emit Position Independent Executables), or the alternative -fpie of course, implies the use of -fPIC, which in turn means that the two macros noted above are defined; at the same time, two more are defined, __pie__ and __PIE__, with the same values as described in the previous point. Path64 defines none of these four macros when building PIE.
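A quick way to see what a given compiler does here is to ask the preprocessor directly. The sketch below is only an illustration: pic_level(), pie_level() and print_pic_macros() are hypothetical helper names of mine, and the values in the comments are simply the GCC behaviour described in the two points above.

```c
#include <stdio.h>

/* Value of __PIC__ as seen by this translation unit, 0 if undefined.
 * GCC, as described above: 1 for -fpic/-fpie, 2 for -fPIC/-fPIE. */
int pic_level(void)
{
#ifdef __PIC__
    return __PIC__;
#else
    return 0;
#endif
}

/* Value of __PIE__ as seen by this translation unit, 0 if undefined.
 * GCC defines it only when building with -fpie or -fPIE. */
int pie_level(void)
{
#ifdef __PIE__
    return __PIE__;
#else
    return 0;
#endif
}

/* Prints both values so different compilers and flags can be compared. */
void print_pic_macros(void)
{
    printf("__PIC__=%d __PIE__=%d\n", pic_level(), pie_level());
}
```

Building the same file once per flag and once per compiler, then calling print_pic_macros(), is enough to spot a compiler that never reports 2 or that leaves the PIE macros undefined. The result without any flag depends on the distribution’s defaults, so I would not read anything into that case.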

But enough ranting about Pathscale, before they feel I’m singling them out (which I’m not). Let’s rant a bit about Clang as well, the only compiler up to now that properly dropped write-only unit-static variables. I had very high expectations for improving unpaper through its suggestions, but it turns out it cannot really create any executable; at least that’s what autoconf tells me:

configure:2534: clang -O2 -ggdb -Wall -Wextra -pipe -v   conftest.c  >&5
clang version 2.9 (tags/RELEASE_29/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
 "/usr/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name conftest.c -mrelocation-model static -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -target-linker-version 2.21.53.0.2.20110804 -momit-leaf-frame-pointer -v -g -resource-dir /usr/bin/../lib/clang/2.9 -O2 -Wall -Wextra -ferror-limit 19 -fmessage-length 0 -fgnu-runtime -fdiagnostics-show-option -o /tmp/cc-N4cHx6.o -x c conftest.c
clang -cc1 version 2.9 based upon llvm 2.9 hosted on x86_64-pc-linux-gnu
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/bin/../lib/clang/2.9/include
 /usr/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/4.6.1/include
End of search list.
 "/usr/bin/ld" --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o crtbegin.o -L -L/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/../../.. /tmp/cc-N4cHx6.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed crtend.o /usr/lib/../lib64/crtn.o
/usr/bin/ld: cannot find crtbegin.o: No such file or directory
/usr/bin/ld: cannot find -lgcc
/usr/bin/ld: cannot find -lgcc_s
clang: error: linker command failed with exit code 1 (use -v to see invocation)
configure:2538: $? = 1
configure:2576: result: no

What’s going on? Well, Clang doesn’t provide its own crtbegin.o file for the C runtime prologue (while Path64 does), so it relies on the one provided by GCC, which has to be on the system somewhere. Unfortunately, to identify where this file is… it goes by hit and miss.

% strace -e stat clang test.c -o test |& grep crtbegin.o
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.5.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.1/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.5.1/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.5/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.5/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.5/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.4.5/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.4.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.4.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.2.4/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.2.3/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.2.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.1/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.2.1/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/4.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/gcc/x86_64-pc-linux-gnu/4.2/crtbegin.o", 0x7fffc937eff0) = -1 ENOENT (No such file or directory)
stat("/crtbegin.o", 0x7fffc937f170)     = -1 ENOENT (No such file or directory)
stat("/../../../../lib64/crtbegin.o", 0x7fffc937f170) = -1 ENOENT (No such file or directory)
stat("/lib/../lib64/crtbegin.o", 0x7fffc937f170) = -1 ENOENT (No such file or directory)
stat("/usr/lib/../lib64/crtbegin.o", 0x7fffc937f170) = -1 ENOENT (No such file or directory)
stat("/../../../crtbegin.o", 0x7fffc937f170) = -1 ENOENT (No such file or directory)

Yes, you can see that it has a hardcoded list of GCC versions that it looks for, from higher to lower, until it falls back to some generic paths (which don’t make that much sense to me to be honest, but never mind). This means that on my system, which only has GCC 4.6.1 installed, you can’t use clang. This was reported last week and while a patch is available, a real solution is still not there: we shouldn’t be patching and bumping clang each time a new micro version of GCC is released that upstream didn’t list already!
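To spell out why this cannot work, here is a rough sketch of the probing that the strace output suggests. It is not Clang’s actual code: find_crtbegin() and its parameters are names I made up, and the only thing taken from the trace is the pattern of calling stat() on prefix/version/crtbegin.o for each entry of a fixed list.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Probes "<prefix>/<version>/crtbegin.o" for each listed version, in order.
 * Returns the index of the first version whose file exists, or -1 if none
 * does. On success, out holds the path that was found. */
int find_crtbegin(const char *prefix, const char *const versions[], int count,
                  char *out, size_t out_size)
{
    struct stat st;

    for (int i = 0; i < count; i++) {
        int n = snprintf(out, out_size, "%s/%s/crtbegin.o", prefix, versions[i]);
        if (n < 0 || (size_t)n >= out_size)
            continue;           /* candidate path did not fit, skip it */
        if (stat(out, &st) == 0)
            return i;           /* first hit wins, so list order matters */
    }
    return -1;                  /* an unlisted version can never be found */
}
```

With a list that stops at 4.5.2, a directory named 4.6.1 is never even looked at, no matter what is installed in it. Listing the directory and picking whatever versions are actually there would avoid the problem, but that is my suggestion, not something the available patch is known to do.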

Sigh. While GCC sure has its shortcomings, this is not really looking promising either.

Impressions of Path64 compiler

So I noticed today that an ebuild for the Path64 compiler hit Portage; being the ELF nerd that I am, it interested me on the technical level, more than for the optimizations (especially since I’m never happy to hear about “the most sophisticated” anything; claims like that tend to be simply bothersome to me).

Before starting to test the compiler, I have to say that the ebuilds themselves had a bit of trouble. The pre-built binary one (dev-lang/ekopath) changes its install path at each update, which breaks Makefiles and other scripts where you could be using a full path as the compiler (which has to be the case if you wish to target the binary toolchain rather than the custom-built one). The custom-built one (dev-lang/path64) does not check for the validity of the dynamic linker name when trying to gather it from GCC, and breaks when using my customized specs for forced --as-needed, as they change the command line used to call collect2. Both problems are now reported in bugzilla and I hope they’ll be solved soon.

What is my baseline test? Well, let’s start with something simple: Ruby-ELF has a number of tests implemented for multiple compilers, in particular GCC, SunStudio and ICC on Linux/AMD64; adding a new compiler just requires rebuilding some object files, and then adding some lines of code to the testsuite to check those out. There are always a few attributes that need to be adapted, such as the ELF entry points, but that’s beside the point now, and it is expected of compilers to have small variations in their behaviour, otherwise it wouldn’t make sense to have multiple compilers at all.

This test alone made me feel like I’m playing with an alpha version of a compiler rather than something already targeted at production use, which is how it seems to be sold to the public. Given that the test files I use are very small and simplistic, I wasn’t expecting any difference at all, besides the most obvious ones. For instance, I already know that ICC appends a .0 suffix to all the local symbols (unit-static ones), and SunCC uses common symbols rather than BSS symbols for external TLS variables. But all in all, they are very similar. Turns out that Path64 has more semantic differences than the others.

First issue: on a very simple, hello-world type executable, where only one symbol – printf() – is used, all the compilers manage to only link to libc.so.6, which provides that symbol. Path64 instead adds one more dependency on libgcc.so, or rather its own variation of it. This in turn adds a dependency on libm.so, which makes it two extra objects to be loaded for simple executables (yes, it might sound like it is impossible not to load the math library, but there are cases where that actually happens). This is extra nasty because linking to that library also means emitting “weak symbols” used for C++ language support.

Not extremely difficult to work around though: just add -Wl,--as-needed to the command line to make it skip over libgcc.so, as it is really unused. This is what GCC does in its specs files, by the way: it enables as-needed linking, lists its support library, then disables it again, so that the original semantics are restored.

There is one peculiarity of the Pathscale compiler: on static executables, it sets the OS ABI of the ELF file to the code for Linux. Neither GCC nor ICC does so (I’m not sure about SunStudio, as I was unable to produce a static executable out of it last time). Nothing wrong with this, and I’m actually often wondering why compilers never did that.
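For those who want to check this on their own binaries, the OS ABI is a single byte in the ELF identification header. The sketch below reads it by hand; elf_osabi() is a hypothetical helper of mine, and the constants repeat the values that <elf.h> calls EI_NIDENT, EI_OSABI, ELFOSABI_SYSV and ELFOSABI_LINUX, so that the example stands on its own.

```c
#include <stdio.h>
#include <string.h>

/* Same values as EI_NIDENT, EI_OSABI, ELFOSABI_SYSV and ELFOSABI_LINUX in
 * <elf.h>, renamed here to keep the sketch self-contained. */
enum { IDENT_SIZE = 16, OSABI_INDEX = 7, OSABI_SYSV = 0, OSABI_LINUX = 3 };

/* Returns the e_ident[EI_OSABI] byte of the file at path,
 * or -1 if the file cannot be read or is not an ELF file. */
int elf_osabi(const char *path)
{
    unsigned char ident[IDENT_SIZE];
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    got = fread(ident, 1, sizeof ident, f);
    fclose(f);

    /* Every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F'. */
    if (got != sizeof ident || memcmp(ident, "\177ELF", 4) != 0)
        return -1;
    return ident[OSABI_INDEX];
}
```

A return value of OSABI_SYSV (0) is what I would expect from the compilers that leave the field alone, and OSABI_LINUX (3) is the code for Linux that I describe above.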

Next up, the trouble starts for the compiler: one of the tests is designed to make sure that Ruby-ELF can provide the correct nm-style description code for the symbols in the object files. This is the most compiler-specific test of the whole suite, as both the notes I wrote above about ICC and SunStudio come from this one. Path64 is not so much inconsistent as it seems to be buggy in this area, though.

The first difference is that the other three compilers emit, in the relocatable object file, an absolute symbol with the name of the source translation unit. This is not the case for Path64, but it isn’t much of a problem: the symbol is probably helpful during debugging but not for real usage of the object, so it would just be an issue of rewiring the test. Where the problems arise is when it comes to the .data.rel.ro section and Copy-on-Write, which is one of my pet peeves.

The test source file contains a combination of static, exported, and external variables and constants; since the unit is compiled as PIC, it also contains a combination of constants that do and do not need relocation:



char external_variable[] = "foo";
static char static_variable[] tc_used = "foo";

const char external_constant[] = "foo";
static const char static_constant[] tc_used = "foo";

const char *relocated_external_variable = "foo";
const char *const relocated_external_constant = "foo";

static const char *relocated_static_variable tc_used = "foo";
static const char *const relocated_static_constant tc_used = "foo";

All three of the compilers implemented up to now are good and emit the non-relocated constants in the .rodata section, keeping only the relocated ones (i.e., the pointers) in the .data.rel.ro sections that are copy-on-write.

Finally, for those who keep score, the missed optimization I noted back in April is missing in Path64 as well as GCC and ICC. Only clang up to now was able to actually make the best out of that code.

I guess I’ll have some reports to do to PathScale, and I’ll keep an eye on this compiler. On the other hand, please don’t ask for this to be tested in any tinderbox for now. Before I can even just consider this, it’ll need to improve a bit further… and I’ll need a more powerful machine to use for tinderboxing.