Some insights on why a linker can be so slow

In the last weeks there has been a lot of talking about gold, a new linker imported in Binutils, developed by Ian Lance Taylor (for Google as far as I know). While I didn’t have time yet to try it out, as Enterprise is still headless and I’m trying not to use it too much remotely, I did read a lot about it and I’m just waiting to try it out.

Taylor’s blog has a lot of insight about gold a about what a linker does. I admit I didn’t read most of it, I just skimmed through entries for now, but it does sound quite interesting.

What I want to write about is just a summary, for those “more than users but less than linker experts” out there who don’t understand why a linker has to be that complex, and especially why C++ takes so much more time to link than C.

Most of modern Unix systems use the ELF format for executables and shared objects (dynamic libraries if you come from Windows land). ELF implements a namespace that is directly compatible with standard Unix dynamic loaders, a totally flat namespace.

Having a flat namespace means that all the symbols fall into it, no matter if they are functions or data (constant or variables). If they are functions, the parameters they have don’t matter either. This works fine with C code, that does not have classes and objects, nor multiple namespaces or other things like that. For C++, Java (JNI) and other languages, this is usually a bad restriction.

This is why C++ code takes the name of a symbol, make sure to know which kind of symbol it is, its type and so on, and then mangle it down to a single string. This translate the int foo(char c, int b) function to _Z3fooci. For each object that is referenced, as well as namespaces, the length of the mangled symbol will increase.

What has this to do with its slowness? Well it relates in many ways:

ELF files have stringtables that are optimized to be used by C (or C-based) languages. They are just a sequence of NULL-terminated strings, and when they are referenced, they are referenced as an offset from the start of the table (so that you can just sum the section address of the table in memory, and have a pointer to the string to use). To try reducing the size of the string table, the linker compacts substrings, so for instance if there are two symbols named foobar and simply bar, the linker will create a single string in the table “foobar” and then refer at the two strings with offsets 0 and 4.

The amount of space that you save may seem trivial, but there is one thing to consider: rather than using -Bsymbolic, the usual trick to avoid self-loop resolving was to define, for a given symbol sym, an internal symbol _sym or __sym with a prefix. So statistically you’ll be sharing the ending of a string lots of times. Just try this: nm -D /lib/libc.so.6 | grep __.

With mangled names, you usually share prefixes, but you can’t share suffixes at all, as all the symbols will start with _Z. For this reason, the duplicates suppression will be totally useless. Executing it, though, is an expensive process, as you’ll have to scan all the strings somehow. I can think of more than a couple of algorithms for the suppression, but they all require scanning a huge amount of strings, which is quite expensive, and quite depressing knowing that the symbol can’t be suppressed in any way.

I admit I’m not sure whether ld standard is good enough to identify whether it can search for duplicates or if it will ignore symbols prefixed with _Z. I’m not even sure if gold does avoid that, it probably could considering it was designed to link C++.

Of course the ones I listed here are not the only reasons why ld is so slow. I think Taylor actually hit the first nail on the head directly: ld is not designed for ELF only, and passing through libbfd it makes it not the best performing Unix linker. I sincerely would like to see what FreeBSD will do as they were talking about designing their own ld, which also would support ELF-only.

Exit mobile version