Some insights on why a linker can be so slow

In the last weeks there has been a lot of talking about gold, a new linker imported in Binutils, developed by Ian Lance Taylor (for Google as far as I know). While I didn’t have time yet to try it out, as Enterprise is still headless and I’m trying not to use it too much remotely, I did read a lot about it and I’m just waiting to try it out.

Taylor’s blog has a lot of insight about gold a about what a linker does. I admit I didn’t read most of it, I just skimmed through entries for now, but it does sound quite interesting.

What I want to write about is just a summary, for those “more than users but less than linker experts” out there who don’t understand why a linker has to be that complex, and especially why C++ takes so much more time to link than C.

Most of modern Unix systems use the ELF format for executables and shared objects (dynamic libraries if you come from Windows land). ELF implements a namespace that is directly compatible with standard Unix dynamic loaders, a totally flat namespace.

Having a flat namespace means that all the symbols fall into it, no matter if they are functions or data (constant or variables). If they are functions, the parameters they have don’t matter either. This works fine with C code, that does not have classes and objects, nor multiple namespaces or other things like that. For C++, Java (JNI) and other languages, this is usually a bad restriction.

This is why C++ code takes the name of a symbol, make sure to know which kind of symbol it is, its type and so on, and then mangle it down to a single string. This translate the int foo(char c, int b) function to _Z3fooci. For each object that is referenced, as well as namespaces, the length of the mangled symbol will increase.

What has this to do with its slowness? Well it relates in many ways:

  • first off, C++ generates a lot symbols, as every class needs a set of symbols (typeinfo, vtable, …); the symbols need to be resolved and linked in the final elf file, they need to be listed as exported and so on, so the increased amount of symbols makes a difference here;
  • the classic System V hash table of symbols can easily be thrown off by the huge amount of symbols with the same prefix (for class methods and namespaces), creating duplicate hashes, which needs to be taken care of, both at build and at run time; this is being solved as now binutils creates a .gnu.hash section with an hash table using a different algorithm, which makes it possible to avoid most of the collisions; unfortunately the default of ld is still to produce both .hash with SysV-style hashes and .gnu.hash, which requires computation of both hashes;
  • hidden visibility reduces the amount of symbols that the linkers has to take care of, which means less symbols to export, less hashes to calculate, but it has a disadvantage, as the build-time linker (ld) has to identify the symbols to hide, bind them statically inside the output file, and then throw them away from the list of symbols to export;
  • but what I think can be a real bottleneck is the duplicates suppression of symbols in the stringtables.

ELF files have stringtables that are optimized to be used by C (or C-based) languages. They are just a sequence of NULL-terminated strings, and when they are referenced, they are referenced as an offset from the start of the table (so that you can just sum the section address of the table in memory, and have a pointer to the string to use). To try reducing the size of the string table, the linker compacts substrings, so for instance if there are two symbols named foobar and simply bar, the linker will create a single string in the table “foobar” and then refer at the two strings with offsets 0 and 4.

The amount of space that you save may seem trivial, but there is one thing to consider: rather than using -Bsymbolic, the usual trick to avoid self-loop resolving was to define, for a given symbol sym, an internal symbol _sym or __sym with a prefix. So statistically you’ll be sharing the ending of a string lots of times. Just try this: nm -D /lib/libc.so.6 | grep __.

With mangled names, you usually share prefixes, but you can’t share suffixes at all, as all the symbols will start with _Z. For this reason, the duplicates suppression will be totally useless. Executing it, though, is an expensive process, as you’ll have to scan all the strings somehow. I can think of more than a couple of algorithms for the suppression, but they all require scanning a huge amount of strings, which is quite expensive, and quite depressing knowing that the symbol can’t be suppressed in any way.

I admit I’m not sure whether ld standard is good enough to identify whether it can search for duplicates or if it will ignore symbols prefixed with _Z. I’m not even sure if gold does avoid that, it probably could considering it was designed to link C++.

Of course the ones I listed here are not the only reasons why ld is so slow. I think Taylor actually hit the first nail on the head directly: ld is not designed for ELF only, and passing through libbfd it makes it not the best performing Unix linker. I sincerely would like to see what FreeBSD will do as they were talking about designing their own ld, which also would support ELF-only.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s