Newcomers, Teachers, and Professionals

You may remember I had already a go at tutorials, after listening in on one that my wife had been going through. Well, she’s now learning about C after hearing me moan about higher and lower level languages, and she did that by starting with Harvard’s CS50 class, which is free to “attend” on edX. I am famously not a big fan of academia, but I didn’t think it would make my blood boil as much as it did.

I know that it’s easy to rant and moan about something that I’m not doing myself. After all you could say “Well, they are teaching at Harvard, you are just ranting on a c-list blog that is followed by less than a hundred people!” and you would be right. But at the same time, I have over a decade of experience in the industry, and my rants are explicitly contrasting what they say in the course to what “we” do, whether it is in opensource projects, or a bubble.

I think the first time I found myself boiling and went onto my soapbox was when the teacher said that the right “design” (they keep calling it design, although I would argue it’s style) for a single-source file program is to have includes, followed by the declaration of all the functions, followed by main(), followed by the definition of all the functions. Which is not something I’ve ever seen happening in my experience — because it doesn’t really make much sense: duplicating declarations/definitions in C is an unfortunate chore due to headers, but why forcing even more of that in the same source file?

Indeed, one of my “pre-canned comments” in reviews at my previous employer was a long-form of “Define your convenience functions before calling them. I don’t want to have to jump around to see what your doodle_do() function does.” Now it is true that in 2020 we have the technology (VSCode’s “show definition” curtain is one of the most magical tools I can think of), but if you’re anyone like me, you may even sometimes print out the source code to read it, and having it flow in natural order helps.

But that was just the beginning. Some time later as I dropped by to see how things were going I saw a strange string type throughout the code — turns out that they have a special header that they (later) define as “training wheels” that includes typedef char *string; — possibly understandable given that it takes some time to get to arrays, pointers, and from there to character arrays, but… could it have been called something else than string, given the all-too-similarly named std::string of C++?

Then I made the mistake of listening in on more of that lesson, and that just had me blow a fuse. The lesson takes a detour to try to explain ASCII — the fact that characters are just numbers that are looked up in a table, and that the table is typically 8-bit, with no mention of Unicode. Yes I understand Unicode is complicated and UTF-8 and other variable-length encodings will definitely give a headache to a newcomer who has not seen programming languages before. But it’s also 2020 and it might be a good idea to at least put out the idea that there’s such a thing as variable-length encoded text and that no, 8-bit characters are not enough to represent people’s names! The fact that my own name has a special character might have something to do with this, of course.

It went worse. The teacher decided to show some upper-case/lower-case trickery on strings to show how that works, and explained how you add or subtract 32 to go from one case to the other. Which is limited not only by character set, but most importantly by locale — oops, I guess the teacher never heard of the Turkish Four Is, or maybe there’s some lack of cultural diversity in the writing room for these courses. I went on a rant on Twitter over this, but let me reiterate this here as it’s important: there’s no reason why a newcomer to any programming language should know about adding/subtracting 32 to 7-bit ASCII characters to change their case, because it is not something you want to do outside of very tiny corner cases. It’s not safe in some languages. It’s not safe with characters outside the 7-bit safe Latin alphabet. It is rarely the correct thing to do. The standard library of any programming language has locale-aware functions to uppercase or lowercase a string, and that’s what you need to know!

Today (at the time of writing) she got to allocations, and I literally heard the teacher going for malloc(sizeof(int)*10). Even to start with a bad example and improve from that — why on Earth do they even bother teaching malloc() first, instead of calloc() is beyond my understanding. But what do I know, it’s not like I spent a whole lot of time fixing these mistakes in real software twelve years ago. I will avoid complaining too much about the teacher suggesting that the behaviour of malloc() was decided by the clang authors.

Since there might be newcomers reading this and being a bit lost of why I’m complaining about this — calloc() is a (mostly) safer alternative to allocate an array of elements, as it takes two parameters: the size of a single element and the number of elements that you want to allocate. Using this interface means it’s no longer possible to have an integer overflow when calculating the size, which reduces security risks. In addition, it zeroes out the memory, rather than leaving it uninitialized. While this means there is a performance cost, if you’re a newcomer to the language and just about learning it, you should err on the side of caution and use calloc() rather than malloc().

Next up there’s my facepalm on the explanation of memory layout — be prepared, because this is the same teacher who in a previous lesson said that the integer variable’s address might vary but for his explanation can be asserted to be 0x123, completely ignoring the whole concept of alignment. To explain “by value” function calls, they decide to digress again, this time explaining heap and stack, and they describe a linear memory layout, where the code of the program is followed by the globals and then the heap, with the stack at the bottom growing up. Which might have been true in the ’80s, but hasn’t been true in a long while.

Memory layout is not simple. If you want to explain a realistic memory layout you would have to cover the differences between physical and virtual memory, memory pages and pages tables, hugepages, page permissions, W^X, Copy-on-Write, ASLR, … So I get it that the teacher might want to simplify and skip over a number of these details and give a simplified view of how to understand the memory layout. But as a professional in the industry for so long I would appreciate if they’d be upfront with the “By the way, this is an oversimplification, reality is very different.” Oh, and by the way, stack grows down on x86/x86-64.

This brings me to another interesting… mess in my opinion. The course comes with some very solid tools: a sandbox environment already primed for the course, an instance of AWS Cloud9 IDE with the libraries already installed, a fairly recent version of clang… but then decides to stick to this dubious old style of C, with strcpy() and strcmp() and no reference to more modern, safer options — nevermind that glibc still refuses to implement C11 Annex K safe string functions.

But then they decide to not only briefly show the newcomers how to use Valgrind, of all things. They even show them how to use a custom post-processor for Valgrind’s report output, because it’s otherwise hard to read. For a course using clang, that can rely on tools such as ASAN and MSAN to report the same information in more concise way.

I find this contrast particularly gruesome — the teacher appears to think that memory leaks are an important defect to avoid in software, so much so that they decide to give a power tool such as Valgrind to a class of newcomers… but they don’t find Unicode and correctness in names representation (because of course they talk about names) to be as important. I find these priorities totally inappropriate in 2020.

Don’t get me wrong: I understand that writing a good programming course is hard, and that professors and teachers have a hard job in front of them when it comes to explain complex concepts to a number of people that are more eager to “make” something than to learn how it works. But I do wonder if sitting a dozen professionals through these lessons wouldn’t make for a better course overall.

«He who can, does; he who cannot teaches» is a phrase attributed to George Bernand Shaw — I don’t really agree with it as it is, because I met awesome professors and teachers. I already mentioned my Systems’ teacher, who I’m told retired just a couple of months ago. But in this case I can tell you that I wouldn’t want to have to review the code (or documentation) written by that particular teacher, as I’d have a hard time keeping to constructive comments after so many facepalms.

It’s a disservice to newcomers that this is what they are taught. And it’s the professionals like me that are causing this by (clearly) not pushing back enough on Academia to be more practical, or building better courseware for teachers to rely on. But again, I rant on a C-list blog, not teach at Harvard.

Boosting my morale? I wish!

Let’s take a deep breath. You probably remember I’m running a tinderbox which is testing some common base system packages before they are unmasked (and thus unleashed on users); in particular I use it for testing new releases of GCC (4.7) and GLIBC (2.16).

It didn’t take me much after starting GLIBC 2.16 testing to find out that the previously-latest version of Boost (1.49) was not going to work with it. The problem is that there is a new definition that both of them tried to provide, TIME_UTC (probably relates to C++11/C11). Now unfortunately since it’s an API breakage to replace that definition, it means that it can’t be applied to the older versions, and it means that packages need to be fixed. Furthermore, the new 1.50 version has also broken the temporary compatibility introduced in 1.48 for their filesystem module’s API. This boils down to a world of pain for maintainers of packages using Boost (which includes yours truly, luckily none is directly maintained by me, just proxy).

So I had to add one extra package to the list, and ran the reverse dependencies — the positive side is that it didn’t take long to fill the bug although there are still a few issues with older boost versions not being supported yet. This brought up a few issues though…

First problem is the way Boost build itself, and its tests, is obnoxious: it’s totally serial, no parallelisation at all! The result is that to run the whole testsuite it takes over eight hours on Excelsior! The big issue is that for the testing, it takes some 10-20 times longer to build the test than run it (lovely language, C++), so a parallel build of the tests, even if the tests were executed in series, would mean a huge impact, and would also likely mean that the tests would become meaningful. As they sit, the (so-called) maintainer of the package has admitted to not run them when he revbumps, but only on full new versions.

The second problem is how Boost versions are discovered. The main issue is that Boost, instead of using proper sonames to keep binary compatibility, embeds its major/minor version pair in the library name — although most distributions symlinks the preferred version to the unversioned name (in Gentoo this is handled through the eselect boost tool). This is not extremely far from what most distributions do with Berkeley DB — but it causes problem when you have to find which one you should link to, especially when you consider that sometimes the unversioned name is not there at all.

So both CMake and Autotools (actually Autoconf Archive) provide macros to try a few different libraries. The former does it almost properly, starting from the highest version and then going in descending order — but uses a pre-defined list of versions to try! Which mean that most packages with CMake will try 1.49 first, as they don’t know that 1.50 is out yet! If no known version is found, then it will fallback to the unversioned library, which makes it work quite differently whether you have only one or more than one version installed!

For what concerns the macros from Autoconf Archive, they are quite nasty; from one side they really aren’t portable at all, as they use GNU sed syntax, they use uname (which makes them totally useless during cross-compilation), but most worrisome of all, is that they use ls to find which boost libraries are available and then take the first one that is usable. This means that if you have 1.50, 1.49 and 1.44 installed, it’ll use the oldest! Similarly to CMake, it uses the unversioned library last. In this case, though, I was able to improve the macros by reversing the check order, which makes them work correctly for most distributions out there.

What is even funnier about the AX macros (that were created for libtorrent, and are used by Gource, which I proxy-maintain for Enrico), is that due to the way they are implemented, it is feasible that they end up using the old libraries and the new headers (it was the case for me here with 1.491.50, as it didn’t fail to compile, just to link). As long as the interface used have different names and the linker will error out, all is fine. But if you have interfaces that are source-compatible, linker-compatible, but with different vtables, you have a crash waiting to happen.

Oh well…

Debunking x32 myths

There has been many comments on my previous post about the new x32 ABI; some are interesting, others are more “out there” — the feeling I get is that there is quite a bit of cargo culting, with people thinking “there has to be a reason why is is developed, so it’ll be good for me!” without actually having the technical background to judge the usefulness of all this.

So in the same spirit with which I commented on ccache almost exactly four years ago (wow, I have been keeping a blog for a very long time, haven’t I?), I’ll try to debunk a few of the myths and misconception around this new ABI.

The new x32 ABI has proven to be faster. Not really; what we have right now are a few benchmarks, published by those who actually created the ABI, Of course you’d expect that those who spent time to set it up found it interesting and actually faster, but I honestly have doubts about the results, for reasons that will be clearer by reading the next few entries.

It’s also interesting to note that while the overall benchmarks seem to be positive, the numbers are quite close in general.. and even Intel’s presentation only gives you actual “big” numbers only when comparing with the original x86 ABI — which nobody is saying is better than what x86-64 is!

The data is also coming from a synthetic test, not from an actual overall system usage, and if you have any clue about benchmarks you know that the numbers can easily lie out of their teeth!

The new ABI generates smaller code, which means more instruction will fit in cache, and you’ll have smaller files as well. This is absolutely false. Not only the code generated is generally the same as x86-64 (you’re not changing the instruction set at all, you’re just changing the so-called “data model”, which means you change the size of long (and related types) and of the pointers (and thus the address space).

From one side it is theoretically correct that you’re going to have smaller data structures, which means you can make better use of the data cache (not of the instruction cache, be sure!) — but is this the correct approach? In my informed opinion, it should be a better idea to look into actually writing code that considers the cachelines, if your code is cache-hungry! You can use dev-util/dwarves which is a set of utilities by Arnaldo (acme) — pahole will tell you how your data structures will be split in memory.

Also remember that for compatibility the syscalls are kept the same with x86-64, which means that all the kernel code executed, and all the data structures that are shared with the kernel are the same as x86-64 (which means that a number of data structures won’t even change their size with the new ABI).

Actually, referring again to the same slides you can see on on slide 24 that the x32 code can be longer than x86’s original code — it would have been nice if they included the same code in x86-64, especially since I don’t speak VCISC, but I think it’s just the same code.

It might be of interest to compare the size of the libc.so.6 file itself; this is the output of rbelf-size from my Ruby Elf suite:

        exec         data       rodata        relro          bss     overhead    allocated   filename
     1239436         7456       341974        13056        17784        94924      1714630   /lib/libc.so.6
     1259721         4560       316187         6896        12884        87782      1688030   x32/libc.so.6

The executable code is actually bigger in the x32 variant — the big change is of course in the data sections (data, rodata, relro and bss) as the pointers have been halved — I honestly wonder how’s it possible for the C library to have so many pointers in its own structures, but it’s a question beside the point. Even if these numbers are halved, the difference is not that big, in total you have something along the lines of 30KB less data allocated, which is unlikely to even change the memory map.

The data size reduction is useful. Okay this seems to be a common issue. Sure it is the case that the data structures are smaller with x32, that’s its design after all. The main question would probably be “is this significant?” — I don’t think it is. Even in the example above with the C library, the difference while still “big enough”, is just under 20% of the allocated space … of the C library! A library that is supposed to implement the very minimal interface.

Now if you add up all the possible libraries, you probably can shave off a few megabytes of data of course but … you’ll have to add in all the porting issues that I’m going to discuss soon. Yes it is true that C++ and most VM languages will have less pressure, especially when copying objects, thanks to the reduced pointers’ size, but this is still quite a stretch. Especially since for the most part you’ll have to keep data buffers aligned to at least 8 bytes (64-bit) to make use of the new instructions — you already to align them to 16 bytes (128-bit) to make use of some SIMD sets.

And for those who think that x32 is reducing the size of files on disk — remember that as it is you can’t run a pure-x32 install; what you get is usually going to be a mix of three ABIs: x86, amd64 and x32!

But there is no reason for $application to deal with more than 4GiB memory. Yes of course that might be true, but really, do you care about the pointer size? If you really want to make sure that the application doesn’t use more than a given amount of memory, use system limits! They are definitely less intrusive than building a new ABI altogether.

Interestingly there are two way different, contrasting, applications of a full 64-bit address space on systems with less than 4GiB of RAM: ASLR (Address Space Layout Randomization — which can really load the various objects an application require at widely different addresses), and Prelink (which can then make sure that every unique object on the system is always loaded at the same address, yes that’s really the opposite of what ASLR does!).

Applications use long but they don’t need the full 64-bit space. And of course the solution is to create a new ABI for it, according to some people.

I’m not going to say that there are many applications that still use long without a clue on why they do that; they probably have some very little range of values they want to use and yet they use “big values” such as long, as they probably learnt programming on systems that use it as a synonym for int — or even better they learnt programming on systems where long is 32-bit but int is 16-bit (hello MS-DOS!).

The solution to this is simply to use the standard integers provided by stdint.h such as uint32_t and int16_t — so that you always use the data size you’re expecting and needing! This also has the side-effect of working on many more systems than you expect, and works with FFI and other techniques.

Hand-coded assembly is rare. This is one thing a few people told me after my previous post as I complained about the fact that with the new ABI as it is we’re losing most of the hand-coded assembly. This might strictly be true, but it might be less rare than you think. Even excluding all the multimedia software, crypto software usually makes good use of SIMD as well, and that’s done through hand-coded assembly, not through the compiler’s intrinsics.

There is also another issue with hand-coded assembly in software such as Ruby — while Ruby 1.9 fails to build on x32, it gets much more interesting on Ruby 1.8 because while it builds just file, it_segfaults at runtime_. Reminds you of something?

Furthermore, it’s the C library itself that comes with most of the handcoded assembly — the only reason why you don’t feel the porting pressure is simply that H.J. Lu that takes care of most of those is one of the authors of the new ABI, which means the code is already ported there.

x32 is going to be compatible with x86, if not now in the future. Okay this I didn’t have a comment about before, but it’s one misconception I’ve noticed before being thrown around. Luckily, the presentation comes to help, slide 22 makes it very clear that the ABI are not compatible. Among other things you have to consider that the x32 ABI at least corrects some of the actual mistakes in x86, including the use of 32-bit data types for off_t and similar. Again, something I talked about two years ago.

This is the future of 64-bit processors. No; again refer to the slides in particular slide 10. This has been explicitly designed for closed systems rather than as a replacement for x86-64! How does that feel now?

The porting effort is going to be trivial, you just have to change the few lines of assembler and change the size of pointer arithmetic. This is not the case. The porting requires a number of other issues to be tackled, and handcrafted assembly is just the tip of the iceberg. Breaking the assumption that x86-64 has 64-bit pointers is, by itself, quite a big deal, but not as big as one might assume at first (it’s the same way on Windows), what I think is going to be a big issue is going to be the implementation of FFI style C bindings — remember I said it wasn’t an easy answer?

CPUs perform better on 32-bit operands than 64-bit. Interestingly, the only CPU that Intel admits do perform better on 32-bit on the presentation I already linked a few times, is the Atom — the quote is actually “64bit imul latency is twice of 32bit imul on Atom”.

Now, what the heck is imul? That’s a signed multiply operation. Do you multiply pointers? It doesn’t make sense. Besides, pointers are not signed. Are you telling me that your most concern is for a platform (Atom) that has extra latency on an operation when people use 64-bit data types and they should instead use 32-bit? And your solution for that concerns is to create a new ABI where it’s harder to use 64-bit data types instead of going to fix whatever program is causing the problem?

I guess I should end it here, because this last note about the Atom and imul is probably going to make the day of most people who have half a clue.

Why Foreign Function Interfaces are not an easy answer

With the term FFI you usually refer to techniques related to GCC’s libffi and their various bindings, such as Python’s ctypes. The name should instead encompass a number of different approaches that work in very different ways.

What I’m going to talk about is a subset of FFI techniques that work the way FFI works, which means they also cover .NET’s P/Invoke — which I briefly talked about in an old post.

The idea is that the code for the language you’re writing in, declares the arguments that the foreign language interfaces are expecting. While this works in theory, it has quite a few practical problems, which are not really easy to see, especially for developers whose main field of expertise is interpreted languages such as Ruby, or intermediate ones like C#. This, just because the problems are related to the ABI: Application Binary Interface.

While the ABI for C and C++ is quite different, I’ll start with the worst case scenario, and that is using FFI techniques for C interfaces. A C interface (a function) is exposed only through its name, and no other data; the name does not encode either the number of the type of parameters, which means that you can’t reflectively load the ABI based off the symbols in a shared object.

What you end up doing, in these cases, is declare in the Ruby code (or whatever else, I’ll stick with Ruby because that’s where I usually have experience) the list of parameters to be used with that function. And here it gets tricky: which types are you going to use for parameters? Unless you’re sticking with C99’s standard integers, and mostly pure functions, you’re going to have trouble, sooner or later.

  • the int, long and short types do not have fixed sizes, and depending on the architecture and the operating system they are going to be of different size; Win64 and eglibc’s x32 are making that even more interesting;
  • the size of pointers (void*) depends once again on the operating system and architecture;
  • some types such as off_t and size_t depends not just on the architecture and operating systems but also on the configuration of said system: on glibc/x86, by default they are 32-bit, but if you do enable the so-called largefile support they are 64-bit (the same goes with st_ino as that post suggest);
  • on some architectures, the char type is unsigned, on others it is signed, which is one of the things that made PPC porting quite interesting, if you weren’t using C99’s types;
  • if structures are involved, especially with bitfields, you’re just going to suffer, since the layout of the structure, if not packed, depends on both the size of the fields and the endianness of the architecture — plus you have to factor in the usual chance for difference due to architecture and operating system.

Up until now, the situation doesn’t seem to be unsolvable; indeed it should be quite easy, if not for the structures, if you create type mappings for each and every standard type that could change, and make sure developers use them… of course things don’t stop there.

Rule number one of the libraries’ consumer: ABI changes.

If you’re a Gentoo user you’re very likely to be on not-too-friendly terms with revdep-rebuild or the new preserved libraries feature. And you probably have heard or read that the issue with requiring to rebuild other packages is that one of the dependencies changed ABI. To be precise, what changes in those situation is the soname which is declaring the library changed ABI, which is nice of them.

But most of the changes in ABI are not declared, either by mistake or for proper reasons. In the former case, what you have is a project that didn’t care enough about its own consumers and didn’t make sure that its ABI is compatible one release with the other, and that didn’t follow soname bumping rules, which is actually all too common. In the latter scenario, instead, you have a quite more interesting situation, especially for what FFI is concerned.

There are some cases where you can change ABI, and yet keep binary compatibility. This is usually achieved by two types of ABI changes: new interfaces and versioned interfaces.

The first one is self-explanatory: if you add a new exported function to a library, it’s not going to conflict with the other exposed interfaces (remember I’m talking about C here; this is not strictly true for C++ methods!). Yet that means thast the new versions of the library have functions that are not present in the older ones — this, by the way, is the reason why downgrading libraries is never well-supported, especially in Gentoo (if you rebuilt the library’s consumers, it is possible that they used the newly-available functions — they wouldn’t be there after the downgrade, and yet the soname didn’t change, so revdep-rebuild wouldn’t flag them as bad).

The second option is trickier; I have written something about versioning before, but I never went out of my way to describe the whole handling of it. Suffice to say that by using symbol versioning, you can get an ABI-compatible change, for an API-compatible change that would otherwise break the ABI.

A classical example is moving from uint32_t to uint64_t for the parameters of a function: changing the function declaration is not going to break API because you’re increasing the integer size (and I explicitly referred to unsigned integers so you don’t have to worry about sign extension), so a simple rebuild of the consumer would be enough for the change to be integrated. At the same time, such a change in the C ABI would make the change incompatible, as the size of the parameters on the stack doubled, so calls to the previous API would crash on the new one.

This can be solved – if you used versioning to begin with (due to the bug in glibc I discussed in the article linked earlier) – by keeping a wrapper around the new API which still uses the old one, and making each of them use a new version for the symbol. At that point, the programs built against the old API will keep using the symbol with the original version (the wrapper), while the new ones will build straight to the new API. There you are: compatible API change leads to compatible ABI change.

Yes I know what you’re thinking: you can just add a suffix to the function and use macros to switch consumers to the new interface, without using versioning at all; that’s absolutely true, but I’m not trying to discuss the merits of symbol versioning here, just explaining how it connects to FFI trouble.

Okay, why is it all this relevant then? Well, what the FFI techniques use to load the libraries they wrap around is the dlopen() and dlsym() interfaces; the latter in particular is going to follow the step of the link editor, when a symbol with multiple versions is encountered: it will use the one that is declared to be the “default symbol”, that is, the latest added (usually).

Now return to the example above: you have wrapped through FFI the function to require two parameters as uint32_t, but now dlsym() is loading in its place a function that expects two uint64_t parameters.. there you are, your code has just crashed.

Of course it is possible to override this throught he use of dlvsym() but that’s not optimal because, as far as I can tell, it’s a GNU extension, and most libraries wouldn’t be caring about that at all. At the same time, symbol versioning, or at least this complex and messed up version of it, is mostly exclusive to GNU operating systems, and its use is discouraged for libraries that are supposed to be portable… the blade there is two-sided.

Since these methods are usually just employed by Linux-specific libraries, there aren’t so many that are susceptible to this kind of crash; on the other hand, since most non-Linux systems don’t offer this choice, most Ruby developers (who seem to use OS X or Windows, seeing how often we encountered case-sensitivity issues compared to any other class of projects) would be unaware of its very existence…

Oh and by the way, if your FFI-based extension is loading libfoo.so without any soversion, you’re not really understanding shared objects, and you should learn a bit more about them before wrapping them.

What’s the morale? Be sure about what you want to do: wrapping C-based libraries often is a good choice to avoid reimplement everything, but consider if it might not be a better idea to write the whole thing in Ruby, it might not be so time-critical as you think it is.

Writing C-based extension move the compatibility issues at build-time, which is a bit safer: even if you write tests for each and every function you wrap (which you should be doing), the ABI can change dynamically when you update packages, making install-time tests not much reliable for this kind of usage.

C++ name demangling

I’ve been having some off time (well, mostly time I needed to do something that kept me away from facebooker craziness since that was driving me crazy quite literally), and I decided to get more work to do in Ruby-Elf (which now has its own page on my site — and a Flattr button as well, like this blog and Autotools Mythbuster thanks to Sebastian.)

What I’m working on right now is supporting C++ name demangling, in pure Ruby; the reason for that is that I wanted to try something “easier” before moving on to trying to implement the full DWARF specification in Ruby-Elf. And my reason to wishing for a DWARF parser in there is that Måns confirmed my suspicion that it is possible to statically-analyze the size of the stack used by a function (well, with some limitations, but let’s not dig into that right now). At any rate I decided to take a stab at that because it would come useful for the other tools in Ruby-Elf.

Now, while I could assume that most of my readers already know what a demangler is (and thus what is mangling), I’ll see to introduce it for all the others who would otherwise end up bored by my writing. C++ provides a much harder challenge for linkers and loaders, because the symbols are no longer identified just by their “simple” name; C++ symbols have to encode namespaces and class levels, a number of special functions (the operators), and in the case of functions and operators, the list of parameters (because two functions with the same name and different set of parameters, in C++, are valid). To do so, all compilers implement some kind of scheme to translate the symbols into identifiers that use a very limited subset of ASCII characters.

Different compilers use different schemes to do so; sometimes even differing depending on the operating system they are building on (mostly that’s for compatibility with other “native” compilers on that platform). The scheme that you can commonly find on Linux binaries is the so-called GCC3-mangling that is used, as the name leaves to intend, by GCC 3 and later and by ICC on Linux (and iirc OSX). The complete definition of the mangling scheme is available as part of the Itanium ABI definition (thanks to Luca who found me the link to that document); you can recognize the symbols mangled under this algorithm by their starting with _Z. Other mangling schemes are described among other documentation — link provided by Dark Shikari.

While standardising over the same mangling algorithm has been proposed many times, compilers make use of the difference in mangling scheme to prevent risk of cross-ABI linking. And don’t expect that the compilers will standardise their ABI anytime soon.

At any rate, the algorithm itself looks easy at the first glance; most of the names are encoded by having the length of the name encoded in ASCII and then the name itself; so an object bar in a namespace foo would be encoded as _ZN3foo3barE (N-E delimit the global object name). Unfortunately when you add more details in, the complexity increases, a lot. To avoid repeating the namespace specification time after time, when dealing with objects in the same namespace, or repeating the full type name when accepting objects of its own class, or multiple parameters with the same type, for instance, the algorithm supports “backreferences” (registers); name fragments and typenames (but not function names) are saved into these registers and then recovered with specific character sequences.

Luckily, Ragel helps a lot to parse this kind of specifications; unfortunately I have reached a point where proceeding further require definitely a lot more work than I’d have expected. The problem is, you have recursion within the state machines: a parameter list might contain.. another parameter list (for function pointers); a type might contain a typelist, as part of a template… and doing this in Ragel is far from easy.

There are also shorthands used to replace common name fragments, such as std:: or std::allocator that would be used so many times that the symbol names would be exceedingly huge. All in all, it’s a quite complex operation. And not a perfect one either; different symbols can demangle to the same string; and not even the c++filt shipping with binutils take care of validating the symbol names it finds, so for instance _Zdl is translated to “operator delete” even though there is nothing like that in C++: you need a parameter for that operator to work as intended. I wonder if this can be used to obscure proprietary libraries interfaces…

Now, since Luca also asked about this, I’d have to add, as Måns already confirmed, that having too-long symbol names can slow down the startup of a program; in particular it increases the time needed for bindings; even though generally-speaking the loader will not compare the symbol name itself, but rather its hash on the hash table, the longer the symbol the more work the hash function has to do. And to give an idea of how long a name we’re talking about, take for instance the following real symbol; exported symbol, nonetheless, coming from gnash:


gnash::iterator_find(boost::multi_index::multi_index_container, mpl_::na, mpl_::na>, boost::multi_index::ordered_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator >&, int)

The second line is filtered through the binutils demangler to provide the decorated name of the function for C++. If you cannot guess, one of the problems is that the mangling algorithm contains no shorthand for boost templates, contrarily to what it has for the standard library templates. I know I cannot pretend that this relates directly with the problems I see with C, but it shows two things: the first is that C++ has an unseen amount of complexity that even long-time developers fail to catch properly; the second is that ELF itself doesn’t seem well-designed to handle the way C++ is compiled into code; and this is all without even looking at the recent undocumented changes introduced in GLIBC to make sure that C++ is implemented correctly.

I wonder if I’ll ever be able to complete this demangler, and then start with the other schemes supported in ELF files…

An appeal to upstream KDE developers

While I have stopped using KDE almost two years ago, I don’t hate KDE per-se… I have trouble with some of the directions KDE has been running in – and I fear the same will happen to GNOME, but we’ll see that in a few months – but I still find they have very nice ideas in many fields.

Unfortunately, there is still one huge technical problem with KDE, and KDE-based third party software (well, actually, with any Qt-based software): they are still mostly written in C++.

Now, it’s no mystery that I moved from having a liking at that language to hating it outright, preferring things like C# to it (I still have to find the time to learn ObjectiveC to be honest). One of the problems I have with it is the totally unstable nature of it, that is well shown by the number of problems we have to fix each time a new GCC minor version is released. I blogged last week about GCC 4.5 and what actually blew my mind out is what Harald said about the Foo::Foo() syntax:

The code was valid C++98, and it wasn’t a constructor call. In C++98, in every class C, the name C exists as both a constructor and a type name, but the constructor could only be used in certain contexts such as a function declaration. In other contexts, C::C simply means C. C::C::C::C::C was also a perfectly valid way to refer to class C. Only later did there turn out to be a need to refer to the constructor in more contexts, and at that point C::C was changed from a typename to a constructor. GCC 4.4 and earlier follow the original C++ standard in this respect, GCC 4.5 follows the updated standard.

Harald van Dijk

Now, I thank Harald for clearing this up, but even so I think this aliasing in the original standard was one of the most obnoxious things I ever read… it was actually making much more sense to me that it was an oversight! Anyway, enough with the C++ bashing for its own sake.

What I would ask the upstream KDE developers is to please set up some way to test your code on the newest, unstable, unusable for day-to-day coding upcoming GCC branches. This way we would avoid having the latest, very recent, release of a KDE-related tool to fail the day the new GCC exits beta status and gets released properly. I’m not pretending that you fiddle to make sure that there are no ICEs (Internal Compiler Errors) caused by your code (although finding them will probably help GCC upstream a lot), but at least make sure that the syntax changes won’t break as soon as a new GCC is released, pretty please.

Of course, this kind of continuous integration, even extended to all of the KDE software hosted by KDE.org is not going to be enough as there are huge numbers of packages that are being developed out of that… but it would at least give the idea that there is need for better CI in general in FLOSS!

Tinderbox summary for May 2010: GCC 4.5, Berkeley DB 5.0; libpng 1.4

I’m a bit surprised sincerely, and not exactly in a good way, since I started with GCC 4.5 just over a month ago, that the tinderbox almost caught up with its queue already.

Now, admittedly part of the reason might be related to my optimisation of the filesystems and partitions — especially after last week, since I moved all the stuff around to divide it into three pairs of disks: two 320GB WD RAID Edition disks for the RAID1 with the OS, my home and work stuff; two 500G Samsung disks for the “scratch” partitions (/var/tmp, the tinderbox’s filesystem), and finally two 1TB WD Caviar Green for storage space (multimedia files, including samples, and distfiles, 150GB of them!).

What makes me doubtful regarding the goodness of this situation is that a lot of packages were skipped because of dependencies failing. With GCC 4.5 we have no MySQL; with Berkeley DB 5.0 we have no Apache (because of apr-util). Without those, the tinderbox drops tons and tons of packages, a whole deptree of packages that will not be tested until the roots are fixed.

At any rate, now that I actually went through the packages, I can finally say what the most common problems with GCC 4.5 are. And surprisingly, it comes down to mostly two problems, a nasty runtime one, and one “usual” boring one.

First of all, the nasty one: GCC now seem to provide runtime-based overflow protection, not totally unlike the Stack Smashing Protection that the Gentoo Hardened project used to provide (and thanks to Magnus might come back at providing); this is a good thing from one side, because overflow protection is a nice safety feature (if not a security one), but it also means that we’re going to find a lot of packages failing at runtime because of this, and that stuff is much harder to deal with; one such package is the TCL interpreter, that is overflowing at runtime for so many packages that it’s boring. The problems tied to these features have their own tracker that was started back into 4.3 series already.

The boring problem is, once again, related to C++ (can you see why Luca is so worried now?): for some reason, up until now GCC supported one very strange syntax for it:

Foo::Bar x = Foo::Bar::Bar();

This basically consists of explicitly calling the constructor function of the class, rather than using the constructor through conversion. I would have always considered this syntax invalid, since I started learning the language, but I can tell how it could be typed wrongly; what surprised me is that it was allowed before. Sigh. The fact that Free Software has become a strict GCC monoculture does not help here, it means that instead of actually being tested, the code is just accepted if GCC supports it. It sucks now that LLVM seems to become more interesting.

The Berkeley DB situation is much worse, I’m afraid, in term of time needed to solve it; the main problem there is that a lot of packages that fail with it fall into the mail software categories, and the net-mail team is near non-existant for way too long now. This can be noted by the fact that we have stuff like mailx failing, continuous file collisions that are not being solved (and the tentatives with the “mailwrapper” stuff resulted in a total revert), and generally broken and out of date packages.

We could use a few more “mailmen” working in Gentoo, since I most definitely have just barely enough clue to manage my postfix installs with the help of The Definitive Guide (that’s one of the best thing I could buy from O’Reilly, without that I’d be seriously screwed).

The importance of opaque types

I sincerely don’t remember whether I already discussed about this before or not; if I didn’t, I’ll try to explain here. When developing in C, C++ and other languages that support some kind of objects as type, you usually have two choices for a composited type: transparent or opaque. If the code using the type is able to see the content of the type, it’s a transparent type, if it cannot, it’s an opaque type.

There are though different grades of transparent and opaque types depending on the language and the way they would get implemented; to simplify the topic for this post, I’ll limit myself to the C language (not the C++ language, be warned) and comment about the practises connected to opaque types.

In C, an opaque type is a structure whose content is unknown; this usually is declared in ways such as the following code, in a header:

struct MyOpaqueType;
typedef struct MyOpaqueType MyOpaquetype;

At that point, the code including the header will have some limitations compared to transparent types; not knowing the object size, you cannot declare objects with that type directly, but you can only deal with pointers, which also means you cannot dereference them or allocate new objects. For this reason, you need to provide functions to access and handle the type itself, including allocation and deallocation of them, and these functions cannot simply be inline functions since they would need to access the content of the type to work.

All in all you can see that the use of opaque types tend to be a big hit for what concerns performance; instead of a direct memory dereference you need always to pass through a function call (note that this seems the same as accessor functions in C++, but those are usually inline functions that will be replaced at compile-time with the dereference anyway); and you might even have to pass through the PLT (Procedure Linking Table) which means further complication to get to the type.

So why should you ever use opaque types? Well they are very useful when you need to export the interface of a library: since you don’t know either the size or the internal ordering of an opaque type, the library can change the opaque type without changing ABI, and thus requiring a rebuild of the software using it. Repeat with me: changing the size of a transparent type, or the order of its content, will break ABI.

And this gets also particularly important when you’d like to reorder some structures, so that you can remove padding holes (with tools like pahole from the dwarves package, see this as well if you want to understand what I mean). For this reason, sometimes you might prefer having slower, opaque types in the interface, instead of faster but riskier transparent types.

Another place where opaque types are definitely helpful is when designing a plugin interface especially for software that was never designed as a library and has, thus, had an in-flux API. Which is one more reason why I don’t think feng is ready for plugins just yet.

Another C++ piece hits the dust

You might remember my problem with C++ especially on servers; yes that’s a post of almost two years ago. Well today I was able to go one step further, and kill another piece of C*+ from at least my main system (it’s going to take a little more for it to apply to the critical systems). And that piece is nothing less than groff.

Indeed, last night we were talking in #gentoo-it about FreeBSD’s base system and the fact that, for them similarly to us, their only piece of C++ code in base system is groff; an user pointed out that they were considering switching to something else, so a Google run later I come up with the heirloom project website.

The heirloom project contains some tools ported from the OpenSolaris code base, but working fine on Linux and other OSes; indeed, they work quite well in Gentoo, after creating an ebuild for them, removed groff from profiles, and fixed the dependencies of man and zsh.

A few notes though:

  • the work is not complete yet so pleas don’t start already complaining if something doesn’t look right;
  • man is not configured out of the box; I’m currently wondering what’s the best way to do this;
  • after configuring (manually) man, you should be able to read most man pages without glitches;
  • for some reason, we currently install man pages in different encodings (for instance man’s own man page in Italian is written in latin-1); heirloom-doctools use UTF-8 by default, which is a good thing, I guess; groff does seem to have a lot of problems with UTF-8 (and man as well, since the localised Italian output often have broken encoding!);
  • groff (and man) both have special handling of Japanese for some reason, I don’t know whether the heirloom utilities are better or worse for Japanese, somebody should look into it.

How _not_ to fix GCC 4.4 bugs

So with GCC 4.4 and glibc 2.10, C++ support went, once again, stricter. Now, leaving aside all my possible comments on a language for which there is still an absolute vacuum of actual implementations years after publishing, let just look at what the problem is this time around.

The main issue, in which glibc 2.10 is also related, is that the C-style string functions now return pointers with the same constant modifier as they are given; so if you look for a characters in a constant string, it’ll return a constant string pointer, and vice-versa if you give it a variable string, it’ll return you a variable string pointer (more here if you’re bored).

This means that the following code will build fine with either GCC 4.3 or glibc 2.10 but will fail when both of those (or more recent) versions are used:

#include <cstring>

int main() {
  char *foo = strchr("ciao", 'a');
% g++-4.3.3 test-const.cc -o test-const    
% g++-4.4.0 test-const.cc -o test-const 
test-const.cc: In function ‘int main()':
test-const.cc:4: error: invalid conversion from ‘const char*' to ‘char*'

The error from the compiler is quite real: you’re mixing up different type of variables, although in this particular instance you’re not doing anything wrong with it, but for instance take the following code rather than the one shown earlier:

#include <cstring>

int main() {
  char *foo = strchr("ciao", 'a');
  *foo = 'e';

This code is trying to change something that it shouldn’t; in particular, given the pointer is now pointing inside a literal, which is then inside the .rodata section of the ELF and in a shared, non-writeable area of memory, when executed this will cause a segmentation fault (a crash, for those not used to this terminology). But it can get less obvious and more sneaky, since instead of a literal, you could have a parameter, declared constant.

Of course, whenever you have a variable or a parameter that is declared constant, but is not actually residing in read-only memory areas (like .rodata), you’re just a cast away from having it non-constant. But then you’d be seeing the cast, and that would be like a yellow light sign. On the other hand, with the old method of just having the function cast away the constant modifier, it was less obvious at first sight.

Okay so we know what the problem is, why the error was introduced, let’s go down to business with what the problem is. I have seen more than a few patches out there that, to make software build on GCC 4.4/glibc 2.10 simply cast away the constant modifier, C-style:

#include <cstring>

int main() {
  char *foo = (char*)strchr("ciao", 'a');

This is wrong. You should not do that. Why? Because you’re hiding a problem; in more than half the cases, the solution is simply to change the declaration of the variable:

#include <cstring>

int main() {
  const char *foo = strchr("ciao", 'a');

Of course, this does not cover all the cases; there are a few when the pointer is actually used to change memory areas. In those cases, since fixing the issue might be overkill, I’d highly suggest a different syntax:

#include <cstring>

int main() {
  char *foo = const_cast(strchr("ciao", 'a'));
  *foo = 'e';

This uses the explicit const_cast keyword from C++, and the very fact that it’s an eyesore in the code should be enough to scream “Workaround!”, which is what it is in truth.

So please, don’t just cast it away C-style, give it a bit more thoughts, please!