ABI changes keeping backward compatibility

So I’ve written about what ABI is and what breaks it, and about how to properly version libraries, and I left off saying that I would write some further details on how to improve compatibility between library versions. The first trick is using symbol versioning to maintain backward compatibility.

The original Unix symbol namespace (which is still the default one) was flat; that means that a symbol (function, variable or constant) is only ever identified by its name. To make things more versatile, Sun’s Solaris and the GNU C library both implemented “symbol versioning”: a method to attach one extra “namespace” identifier to the same function. The use of this feature is split between avoiding symbol collisions and maintaining ABI compatibility.

In this post I’m focusing on using the feature to maintain ABI compatibility: changing the ABI of a single function, simulating two versions of the same library, and yet allowing software built against the old version to work just fine with the new one. While it’s possible to achieve the same thing on many different systems, I’m going to focus, for ease of explanation, on glibc-based Linux systems, in particular on my usual Yamato Gentoo box.

While it does not sound like a real-life situation, I’m going to use a very clumsy API-compatible function whose ABI can be broken, and which simply reports the ABI version in use:

/* lib1.c */
#include <stdio.h>

void my_symbol(const char *str) {
  printf("Using ABI v.1n");
}

The ABI break will come in the form of the const specifier being dropped; since this un-restricts what the function may do with the parameter, it is a “silent” ABI break (in the sense that it wouldn’t warn the user if we weren’t taking precautions). Indeed, if the new version were written this way:

/* lib2.c */
#include <stdio.h>

void my_symbol(char *str) {
  printf("Using ABI v.2n");
}

the binaries would still execute “cleanly”, reporting the new ABI (of course, if the new version changed the string, and the string passed in was a static constant, that would be a problem; I cannot show that on x86-64 though, since it still does not seem to produce a bus error or anything similar). Let’s assume that instead of reporting the new ABI version the software would crash, because something happened that shouldn’t have.

The testing code we’re going to use is going to be pretty simple:

/* test.c */
#if defined(LIB1)
void my_symbol(const char *str);
#elif defined(LIB2)
void my_symbol(char *str);
#endif

int main() {
  my_symbol("ciao");
}

Now we have to make the thing resistant to ABI breakage, which in our case involves using inline ASM and GNU binutils extensions (because I’m trying to keep things simple for now!). Instead of a single my_symbol function, we’re going to define two public my_symbol_* functions, one per ABI variant, and then alias them:

/* lib2.c */
#include <stdio.h>

void my_symbol_v1(const char *str) {
  printf("Using ABI v.1n");
}

void my_symbol_v2(char *str) {
  printf("Using ABI v.2n");
  str[0] = '';
}

__asm__(".symver my_symbol_v1,my_symbol@");
__asm__(".symver my_symbol_v2,my_symbol@@LIB2");

The two aliases are the ones doing the magic: the first defines the v1 variant as having the default, unnamed version (nothing after the “at” character), while the v2 variant, which is the default used for linking when no version is requested (two “at” characters), gets the LIB2 version. This does not work right away, though: first of all, the LIB2 version is defined nowhere; second, the two v1 and v2 symbols are also exported, allowing other software direct access to the two variants. Since you cannot use the symbol visibility definitions here (you’d be hiding the aliases too), you have to provide the linker with a version script:

/* lib2.ldver */
LIB2 {
    local:
        my_symbol_*;
};

With this script you’re telling the linker that all the my_symbol_* variants are to be considered local to the object (hidden), and you are, at the same time, creating a LIB2 version to assign to the v2 variant.

But how well does it work? Let’s build two binaries against the two libraries, then execute each of them against both:

flame@yamato versioning % gcc -fPIC -shared lib1.c -Wl,-soname,libfoo.so.0 -o abi1/libfoo.so.0.1
flame@yamato versioning % gcc -fPIC -shared lib2.c -Wl,-soname,libfoo.so.0 -Wl,-version-script=lib2.ldver -o abi2/libfoo.so.0.2
flame@yamato versioning % gcc test.c -o test-lib1 abi1/libfoo.so.0.1                              
flame@yamato versioning % gcc test.c -o test-lib2 abi2/libfoo.so.0.2                              
flame@yamato versioning % LD_LIBRARY_PATH=abi1 ./test-lib1       
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi1 ./test-lib2
./test-lib2: abi1/libfoo.so.0: no version information available (required by ./test-lib2)
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi2 ./test-lib1
Using ABI v.1
flame@yamato versioning % LD_LIBRARY_PATH=abi2 ./test-lib2
Using ABI v.2

As you can see, the behaviour of the program linked against the original version does not change when moving to the new library; the program linked against the new version and executed against the old one, instead, warns about the missing version information and falls back to the base symbol (that would be forward compatibility of the library, which is rarely guaranteed).
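
To double-check how the symbols ended up versioned, objdump can show the version attached to each dynamic symbol (the addresses and sizes below are illustrative, not taken from this exact build):

% objdump -T abi2/libfoo.so.0.2 | grep my_symbol
0000000000000690 g    DF .text  0000000000000010  Base        my_symbol
00000000000006b0 g    DF .text  0000000000000021  LIB2        my_symbol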

This actually makes it possible to break ABI (but not API) and still be able to fix bugs. It shouldn’t be abused, since it does not always work that well and it does not work on all the systems out there; but it helps to deal with legacy software that needs to be kept backward-compatible and yet requires fixes.

For instance, if you didn’t follow the advice of always keeping alloc/free interfaces symmetrical, and you wanted to move a structure allocation from standard malloc() to g_malloc() and then to GSlice, you could do that by using something like this:

/* in the header */

my_struct *my_struct_alloc();
#define my_struct_free(x) g_slice_free(my_struct, x)

/* in the library sources */

my_struct *my_struct_alloc_v0() {
  return malloc(sizeof(my_struct));
}

my_struct *my_struct_alloc_v1() {
  return g_malloc(sizeof(my_struct));
}

my_struct *my_struct_alloc_v2() {
  return g_slice_new(my_struct);
}

__asm__(".symver my_struct_alloc_v0,my_struct_alloc@");
__asm__(".symver my_struct_alloc_v1,my_struct_alloc@VER1");
__asm__(".symver my_struct_alloc_v2,my_struct_alloc@@VER2");

Of course this is just a proof of concept: if allocating the structure was all you did, two macros would be fine; expect something more to be done to the newly allocated structure in real code. If versioning weren’t used, software built against the first version, using malloc(), would crash when freeing the memory area with the GSlice deallocator; with this method, instead, each program built against a particular allocator will properly get the matching deallocator, even in future releases.
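
As a hedged sketch (this is my extrapolation, not part of the original example): if my_struct_free had been an exported function rather than a macro, it could be versioned in exactly the same way, so that each binary frees with the same generation of allocator it was built against:

/* my_struct is defined here only as a stand-in so the sketch compiles;
 * the real definition would live in the library's headers */
#include <stdlib.h>
#include <glib.h>

typedef struct { int dummy; } my_struct;

void my_struct_free_v0(my_struct *p) { free(p); }
void my_struct_free_v1(my_struct *p) { g_free(p); }
void my_struct_free_v2(my_struct *p) { g_slice_free(my_struct, p); }

__asm__(".symver my_struct_free_v0,my_struct_free@");
__asm__(".symver my_struct_free_v1,my_struct_free@VER1");
__asm__(".symver my_struct_free_v2,my_struct_free@@VER2");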

Have you seen some gold?

Since I have in my TODO list to work on two binutils problems (the warning on a softer --as-needed and the fix for the PulseAudio build), I also started wondering why I haven’t heard, or rather read, anything about the gold linker.

Saying that I’m disappointed does not really cover it, to be honest, since I don’t really wish to switch to a linker written in C++ any time soon. But I really hoped that it would generate enough momentum to find a solution. Because, yes, the ld linker that ships with binutils is tremendously slow at linking C++ code, and as Linkers & Loaders let me understand, the problem is not just the length of the (mangled) symbol names, but also the way that templates are expanded and linked together.

But still, I think it’s really worth investigating some alternative, which in my opinion need not be written in C++, with all the problems related to that. Saying that the gold linker is fast just because of the language it is written in is absolutely naïve, since the problems lie quite a bit deeper than that.

The main problem is that the current ld implementation is based, like the rest of the binutils tools, upon libbfd, an abstraction that supports multiple binary formats, not just ELF. It basically allows using mostly the same interface on different operating systems with different executable formats: ELF under Linux, BSD and Solaris, Mach-O under Mac OS X, PE under Windows, and more. While this makes for a much more powerful ld command, it’s also a bit of a bottleneck.

Even though the thing is designed well enough not to crumble easily, it is probably a good area to investigate to find out why ld is so slow. Having an alternative, ELF-only linker available for users, Gentoo users especially, would likely be a good test. This would follow the same approach Apple takes on OS X (GCC calls Apple’s linker), as well as Sun under Solaris with their copy of GCC.

While I’m all for generic code, sometimes you need specialised tools if you want to access advanced features of files, or if you want fast, optimised software.

The same thing can be said for the analysis tools provided by binutils: as I’ve written in my post about elfutils, the nm, readelf and objdump tools as provided by binutils, being generic, lack some of the useful defaults and the different interface that elfutils has. Which goes to show why specialised tools could help here. I know that FreeBSD was working on providing replacements for these tools, under the BSD license as usual for them. While that’s certainly an important step, I don’t remember reading anything about a new linker.

As it is, I haven’t gone out of my way to see whether there are already alternative linkers that work under Linux, besides the one provided by Sun’s compiler in Sun Studio Express (which has lots of problems of its own). If there is one already, we should look at where it stands for what concerns features.

What we desire from a specialised linker, besides speed, is proper support for the .gnu.hash section, --as-needed-like features, no text relocations emitted in the code (a problem gold used to have, at least), and possibly better support for garbage collection of unused sections, which could allow using that in production code without the huge impact on performance that -fdata-sections and -ffunction-sections seem to have.

I’m not going to work on this myself, but if somebody is interested in my opinion about using any particular linker in Gentoo, I’d be glad to look at it. I’m not going to spare words, though, so that you know.

A softer --as-needed

Following my previous blog post about un-released autoconf, I wanted to write a bit about an unreleased change in binutils’ ld that Sébastien pointed me at a few days ago. Unfortunately, since things piled up, the code has now actually been released, and I briefly commented about it in the as-needed-by-default bug. The change is only in the un-keyworded snapshot of pre-2.20 binutils, so it hasn’t reached most users yet, which makes it worth commenting on anyway.

The change is as follows:

--as-needed now links in a dynamic library if it satisfies undefined symbols in regular objects, or in other dynamic libraries. In the latter case the library is not linked if it is found in a DT_NEEDED entry of one of the libraries already linked.

If you know how --as-needed works and the ELF-related terms, you should already be able to guess what it’s actually doing. If you’re not in the know about this, you should probably read my old post about it again. Basically, the final result of this is that the first situation:

Messy linking diagram

gets expanded into the desired linking situation:

Hoped linking situation

instead of the broken one that wouldn’t work.

This is all good, you’d expect, no? I have some reservations about it. First of all, the reason for this change is to accommodate the needs of virtual implementation libraries like blas and similar. In particular, the thread refers to the requirement of gsl not to link its blas implementation, leaving that to the user linking the final application. While I agree that’s a desired feature, it has to be noted that all the implementation libraries need to keep the same ABI, otherwise just changing them on the linker call is not going to work. Which means that you can technically change the implementation by using the LD_PRELOAD environment variable to interpose the new symbols at runtime, making it possible to switch implementations without having to relink anything.
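
As an illustration (the library and program names here are made up), switching to a different ABI-compatible implementation at runtime would look like this:

% LD_PRELOAD=/usr/lib64/libotherblas.so ./my-gsl-app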

Of course, using LD_PRELOAD is not very handy, especially if you want to do it on a per-command basis or anything like that. But one could probably wonder “Why on Earth didn’t someone think of a better method for this before?” and then answer himself after a bit of searching: “Someone already did!”. Indeed, a very similar situation arose in the FreeBSD 5 series, since there were multiple PThread implementations available. Since the ABI of the implementations is the same, they can be switched both at link-editing time and at runtime linking. And to make it easier to switch at runtime, they created a way to configure it through the /etc/libmap.conf file.
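
A minimal sketch of such a mapping (the library names and versions here are illustrative):

# /etc/libmap.conf — redirect the generic pthread library
# to a specific implementation at runtime-linking time
libpthread.so.2    libthr.so.2

# the mapping can also be constrained to a single program
[/usr/local/bin/mplayer]
libpthread.so.2    libkse.so.2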

Indeed, the method used to choose between PThread implementations under FreeBSD before libmap.conf was introduced was the same one gsl wants to use. The result of which already showed that --as-needed was unusable on FreeBSD because of a similar problem: libpthread was never added to the dependencies of any library, and was supposed to be linked into the final executable, which might not have any requirement for PThread by itself.

So basically the whole reasoning for softening up --as-needed is to allow working around a missing feature in the GNU runtime linker. Which is what, to me, makes it wrong. Not wrong as in the wrong thing to do, but done for the wrong reason. But it’s not that simple. Indeed, this change means that there will be many fewer build failures with --as-needed, making it much, much more likely to become part of the default options of the compiler once binutils 2.20 is released. On the other hand, I think I’ll submit a patch for ld to warn when the new code is triggered.

My reasoning is quite simple: libraries should, as much as possible, be linked completely, so that all their dependencies are well stated. Leaving dependencies to the final executable to link can create a huge mess: think of a final executable linked on a system where a different, ABI-incompatible dependency is present; it will have subtle problems running, like unexpected crashes and the like. Also, if ten executables need a single library which forgets to state its dependency on, just as an example, libexpat, you get ten links to libexpat that need to be re-created (while the original library will not be picked up at all, by the way, so it will still expect the ABI of the previous version), rather than just one.

Since the softer --as-needed indeed makes it much simpler to enable by default, I don’t think it’s a good idea to revert the new behaviour; but having a warning that says something like “-lexpat salvaged for -lfoo” would make it easy to identify the issue and assess, on a case-by-case basis, whether it is an intended situation or just a bug, so that the latter can be corrected.

On the other hand I also have a case of failure with recursive linking, coming out of the next PulseAudio release, which I need to get fixed, hopefully before PulseAudio is released.

For elves, size matters

This weekend I was supposed to take it slow, relax, and spend it rebuilding my strength, playing and stuff. At the end I wasn’t able to do much of that, and instead I spent it looking at projects I left dangling for a while, like my very own Ruby-Elf.

One beef I’ve always had with ELF analysis tools is with the size command, which is used to get some quick statistics about the internal size distribution of an ELF file. I was interested in it because it’s tightly related to what cowstats already tells you. Indeed, if you look at the following output, the data column is the total size of Copy-on-Write sections in the file.

% size /usr/lib/libxml2.so              
   text	   data	    bss	    dec	    hex	filename
1425219	  35772	   5368	1466359	 165ff7	/usr/lib/libxml2.so

You’d expect me to be happy that there is already a tool that extracts this information out of compiled ELF files, but I really am not, because it does not provide enough granularity for what my targets usually are.

First of all, you have to know that the three column names are not the actual names of the sections involved, even though .text, .data and .bss are three common ELF sections. The actual meanings are: for bss, the sum of the sizes of all the allocated sections that have no content in the file and get zero-filled pages at load time (like .bss and .tbss); for data, the sum of the sizes of all the remaining allocated writable sections; and for text, the sum of the sizes of all the remaining allocated sections. This means that .rodata sections are counted under text rather than data, which can easily be confusing.
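
A rough sketch of that grouping logic, rebuilt on top of readelf output (gawk-specific because of strtonum; treat it as illustrative, not as exactly what size does internally):

% readelf -S -W /usr/lib/libxml2.so | sed 's/^ *\[ *[0-9]*\] *//' | gawk '
    $7 ~ /A/ {                            # only allocated sections count
        size = strtonum("0x" $5)          # Size column is hexadecimal
        if ($2 == "NOBITS")   bss  += size   # allocated, no file content
        else if ($7 ~ /W/)    data += size   # allocated and writable
        else                  text += size   # any other allocated section
    }
    END { printf("text=%d data=%d bss=%d\n", text, data, bss) }'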

I blame this confusion also on the size(1) man page that comes from binutils, since it contains this:

Here is an example of the Berkeley (default) format of output from size:

$ size --format=Berkeley ranlib size
   text    data     bss     dec     hex filename
 294880   81920   11592  388392   5ed28 ranlib
 294880   81920   11888  388688   5ee50 size

This is the same data, but displayed closer to System V conventions:

$ size --format=SysV ranlib size
ranlib  :
section         size      addr
.text         294880      8192
.data          81920    303104
.bss           11592    385024
Total         388392

size  :
section         size      addr
.text         294880      8192
.data          81920    303104
.bss           11888    385024
Total         388688

This would make you expect that the three columns map directly to the three sections of the same name. Instead, if you actually use the System V format for size on libxml2, you get a much different result:

% size --format=SysV /usr/lib/libxml2.so
/usr/lib/libxml2.so  :
section             size      addr
.gnu.hash          13012       456
.dynsym            43464     13472
.dynstr            35235     56936
.gnu.version        3622     92172
.gnu.version_r       128     95800
.rela.dyn          76872     95928
.rela.plt           3216    172800
.init                 24    176016
.plt                2160    176040
.text            1011448    178208
.fini                 14   1189656
.rodata           132592   1189696
.eh_frame_hdr      19508   1322288
.eh_frame          83924   1341800
.ctors                16   3526016
.dtors                16   3526032
.jcr                   8   3526048
.data.rel.ro       28056   3526080
.dynamic             448   3554136
.got                 720   3554584
.got.plt            1096   3555304
.data               5412   3556416
.bss                5368   3561856
.gnu_debuglink        28         0
Total            1466387

It counts all the sections one by one rather than grouping them. I’m still not sure how it decides whether or not to print a given section; I’d have to look at the actual binutils code to say. It’s not counting just the allocated sections (the ones that count towards the vmsize once the file is loaded to spawn a process), otherwise .gnu_debuglink wouldn’t be in the list; but it’s also not showing all of the sections, since .shstrtab is not there at all. Interestingly enough, if you use the eu-size program that comes with elfutils rather than the version coming from GNU binutils, you’ll see that .gnu_debuglink is not in the list, which suggests a bug in binutils.

But it’s not just this inconsistency in naming that bothers me; it’s more that the output is almost entirely pointless for making actual guesses about the behaviour of a program, even though that’s the most common usage of size. The problem is that you cannot distinguish between increases in the size of the code and increases in the size of the constant tables it uses. It also does not let you easily know whether the amount of data is mitigated by prelink or not.

The SysV-style output is even more useless, since if you want the full data about the sections you can just use readelf -S, which reports, among other things, the size of each section and the address it is loaded at, which is just what the SysV style prints. The only differences are that you cannot change the radix and that it does not give you a total, but even then I don’t see the point. By the way, if you think that readelf -S is too difficult to read, try eu-readelf -S: it’s nicer.

So at the end of the day, what am I supposed to do? Well, at this point I’m writing my own script, rbelf-size, which shows a different set of headers, allowing me to actually distinguish between the different sections so that I can judge the changes in software and libraries across versions and patches. I hope to have it available in Ruby-Elf’s repository by the end of next week; maybe it can be useful to others too.

Multiple mini libraries, --as-needed and wrappers

In a previous post of mine, Mart (leio) advocated the use of misdirected linking to enable splitting the non-RSS-hungry libxml2 modules from the ones that create a lot of dirty pages; his concern is very real and I can feel it very well, since libxml2 is indeed a bit of a memory-hungry library. On my firefox instance it reports this:

     vmsize   rss clean   rss dirty   file
      32 kb        0 kb       32 kb   /usr/lib64/libxml2.so.2.7.2
       8 kb        0 kb        8 kb   /usr/lib64/libxml2.so.2.7.2
    1396 kb      336 kb        0 kb   /usr/lib64/libxml2.so.2.7.2

While it is shared, it still has 336KiB of resident memory, which is not too bad but not too good either, after all. But how would one split that library? Well, you’ve got to know the libxml2 interface a bit to understand this fully, so let’s just say that libxml2 has a modular design, and it offers a series of interfaces that are more or less tied together.

For instance, for my daily job I had to write a proprietary utility that uses libxml2’s XPath interface, as well as the writer module that allows for easy writing of XML files through a very nice interface (the work was done under Windows; building and using libxml2 was much easier than trying to get Codegear’s parser to work, or interfacing with Microsoft’s MSXML libraries). I disabled everything that was not needed for this to work, and reduced libxml2 to the minimum amount of needed code.

Software that only needs parsing wouldn’t need the writer module, and not all of it would require DOM, SAX or push parsing, or XPath and XPointer, and so on and so forth. To be able to disable the extra stuff there is a series of ./configure flags, but mapping those to USE flags is not really feasible, since you’d be breaking the ABI; plus, a solution should be found together with upstream, in my opinion.

So what Mart suggested was breaking the library in half, with a “mini” version carrying the non-memory-hungry interfaces and the rest split off. My proposal here would be much bigger, breaking the ABI a lot, but also very, very exhaustive: break up libxml2 into a series of small libraries, each representing one interface. A piece of software needing one of them would link it in and be done with it. Besides breaking the ABI, though, this would also break all the software using libxml2, even when rebuilding it, which is very, very bad. Well, the solution is actually much easier:

OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( AS_NEEDED ( libxml2-sax2.so libxml2-schemas.so libxml2-schematron.so libxml2-writer.so libxml2-xpath.so .... ) )

This is an ldscript, which tells the linker what to do; save it as libxml2.so, and linking with -lxml2 will pull in just the libraries required for the interfaces used by the program. If you look at your /usr/lib, you’ve already got quite a few of these, because Gentoo installs them for the libraries that are moved into /lib instead. This works around the inability to use misdirected linking for wrappers.
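
For reference, one of those Gentoo-installed wrappers looks roughly like this (libz is just an example, and I’ve shortened the comment):

% cat /usr/lib64/libz.so
/* GNU ld script */
GROUP ( /lib64/libz.so.1 )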

Now, of course, this trick does not work with every linker out there; but it works with GNU ld and with Sun’s linker, and those are the two for which --as-needed makes sense. If libxml2 were to break itself into multiple libraries, a configure option could decide whether to install the ldscript wrapper or a library for linkers without as-needed support, so that Linux, FreeBSD and Solaris (and others) would use the ldscript without adding further ELF files, and the others would go with a compatibility method.

Please also note that using pkg-config for library discovery would make this even easier, without needing wrappers at all, as libxml2.pc would just have to list all the interfaces in its Requires: line.
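
A minimal sketch of what such a libxml2.pc could look like (the split module names are hypothetical, following the ldscript above):

# libxml2.pc — hypothetical meta-package for the split libraries
prefix=/usr
includedir=${prefix}/include

Name: libxml2
Description: XML toolkit, split into per-interface libraries
Version: 2.7.2
Requires: libxml2-sax2 libxml2-schemas libxml2-schematron libxml2-writer libxml2-xpath
Cflags: -I${includedir}/libxml2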

OpenSolaris Granularity

If you have followed my blog for a long time, you know I already had to fight a couple of times with Solaris and VirtualBox to get a working Solaris virtual machine to test xine-lib and other software on.

I tried again yesterday to get one working: since innotek was bought by Sun, VirtualBox support for Solaris has improved notably, to the point that they now emulate a different network card by default, one that works with Solaris (that had been the long-standing problem).

So I was able to install OpenSolaris, and thanks to Project Indiana I was able to check which packages were installed, to remove stuff I don’t need and add what I do need. Unfortunately, I think the default granularity is a bit concerning. Compiz on a virtual machine?

The first thing I noticed is that updating a newly-installed system from the last released media requires downloading almost the size of the whole disk in updates: the disk is a simple 650MB CD image, and the updates were over 500MB. I suppose this is to be expected, but at that point, why not point to some updated media by default, considering that updating is far from trivial? Somehow I was unable to perform the update properly with the GUI package manager, and I had to use the command-line tools.

Also, removing superfluous packages is not an easy task, since the dependency tracking is not exactly the best out there: it’s not unusual for a set of packages not to be removed because some of them are dependencies… of others in the same set being removed (this usually seems to be due to plug-ins; even after removing the plugins, it would still cache the broken dependency and stop me from removing the packages).

It’s not all here, of course: for instance, finding the basic development tools in their package manager is a problem of its own. While looking for “automake” will find a package named SUNWgnu-automake, looking for “autoconf” will find nothing; the package is called SUNWaconf. I still haven’t been able to find pkg-config, although the system installs .pc files just fine.

I guess my best bet would be to remove almost everything from their own package manager and try prefixed Portage, but I just haven’t had the will to look into that yet. I hope it would also help with the version of GCC that Sun provides (3.4.3).

I got interested in Solaris again because, after a merge of Firefox 3.0.2, I noticed cowstats throwing an error on an object file, and following up on that, I found out a couple of things:

  • cowstats didn’t manage unknown sections very well;
  • Firefox ships with some testcases for the Google-developed crash handler;
  • one of these testcases is an ELF ET_EXEC file (with .o extension) built for Solaris, that reports a standard SysV ABI (rather than a Sun-specific one), but still contains Sun-specific sections;
  • readelf from Binutils is not as solid as its counterpart from Elfutils.

Now cowstats should handle these corner-cases pretty well, but I want to enrich my testcases with some Solaris objects. Interestingly enough, in ruby-elf probably 80% of the size of an eventual tarball would be taken up by test data rather than actual code. I guess this is a side-effect of TDD, but also exactly why TDD-based code is usually more solid (every time I find an error of mine in ruby-elf, I tend to write a test for it).

Anyway, bottom line: I think Project Indiana would have been better off adapting RPM to their needs rather than inventing the package manager they invented, since theirs doesn’t seem to have any feature that is lacking in Fedora, while it lacks quite a bit of other things.

Some numbers about duplicate strings collapsing in ld

I haven’t had time to test gold yet, but at least Bernard saved me from finding out how to do so :)

I plan on doing that today, as I just finished a script, based again on my ruby-elf, that calculates how much space is saved by collapsing duplicated strings into one.

Interestingly enough, the biggest differences don’t come from glibc as I expected, although it is in third place:

/usr/lib64/libbonobo-2.so: current size 36369, full size 44794 difference 8425
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/auto/SVN/_Core/_Core.so: current size 15464, full size 19296 difference 3832
/lib64/libc-2.7.so: current size 21430, full size 24361 difference 2931
/usr/lib64/libasound.so: current size 37976, full size 40698 difference 2722
/usr/lib64/libMagickWand.so: current size 17056, full size 19740 difference 2684

At least in the case of ALSA, the problem seems to lie in __-prefixed entries. But I see a lot of them having the dlsym name in them, which seems to be a way to mix different ALSA libraries… probably something that hidden visibility could take care of, which would be very nice for a lot of reasons: 38KB of strings seems like a huge amount of memory to me.

For what concerns libMagickWand, it seems to wrap functions from libMagick itself, which explains why they are all duplicated with a Magick prefix added.

In the case of the SVN Perl bindings, which show up with almost all of their modules, it looks to me like very naïve symbol handling on swig’s part: there are lots of _wrap_-prefixed symbols, each followed by the name of the original symbol.

As you will have guessed by now, most of the libraries with huge savings from collapsing strings are those implementing wrappers around other libraries. This is because they usually just add a prefix to the symbol name, and the names of defined and imported symbols are handled in a single pool of strings, so the imported name can be stored as the tail of the wrapped one. If instead of a prefix they added a suffix, there would be no saving at all.

What does all of this tell me? Well, first of all, I think that the extra memory that would be used by disabling the collapsing (I care about the memory, not the on-disk size, unless we’re talking about embedded systems) may well be worth a faster link time; eventually, it would be nice to be able to use the collapsing just for the libraries that implement those wrappers.

On a different level, one could ask why all the _wrap_ functions in Subversion’s Perl bindings are exported at all, as one would probably expect them not to be needed outside the bindings themselves. Perl bindings often seem to suffer from this problem: Net-SSLeay has a similar issue, even though its prefix is XS_Net__SSLeay_ instead.

And one can easily see a probable mistake made by stuff that wraps symbols: adding a suffix rather than a prefix. When using a prefix, in case you need to reduce the size, the collapsing can kick in; when using a suffix, it simply can’t, at least the way ELF files are handled right now. The same goes for changing the name of the original symbol, like capitalising its first letter, or dropping a common prefix to replace it with another, which is what the XML-LibXML binding does (and thus it doesn’t share a lot of strings that it could share, although again the proper solution would be not to export them at all).

Anyway, the only substantial differences I see are mostly in Perl bindings, in a lot of Perl bindings to be honest, so I suppose one way to make the difference unimportant for most desktop/server uses would be to fix XS and swig to hide their symbols. Then making it optional to collapse the duplicates would be a really, really good idea.

Some insights on why a linker can be so slow

In the last weeks there has been a lot of talk about gold, a new linker imported into binutils, developed by Ian Lance Taylor (for Google, as far as I know). While I haven’t had time to try it out yet, as Enterprise is still headless and I’m trying not to use it too much remotely, I did read a lot about it and I’m just waiting to try it.

Taylor’s blog has a lot of insight about gold and about what a linker does. I admit I didn’t read most of it, I just skimmed through the entries for now, but it does sound quite interesting.

What I want to write about is just a summary, for those “more than users but less than linker experts” out there who don’t understand why a linker has to be that complex, and especially why C++ takes so much more time to link than C.

Most modern Unix systems use the ELF format for executables and shared objects (dynamic libraries, if you come from Windows land). ELF implements a namespace that is directly compatible with standard Unix dynamic loaders: a totally flat namespace.

Having a flat namespace means that all the symbols fall into the same one, no matter whether they are functions or data (constants or variables). If they are functions, the parameters they take don’t matter either. This works fine with C code, which does not have classes and objects, nor multiple namespaces or other things like that. For C++, Java (JNI) and other languages, this is usually a bad restriction.

This is why C++ compilers take the name of a symbol, work out which kind of symbol it is, its type and so on, and then mangle it down to a single string. This translates the function int foo(char c, int b) into _Z3fooci. For each object that is referenced, as well as each namespace, the length of the mangled symbol increases.
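
You can reverse the mangling with the c++filt tool that ships with binutils:

% echo _Z3fooci | c++filt
foo(char, int)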

What does this have to do with its slowness? Well, it relates in many ways:

  • first off, C++ generates a lot of symbols, as every class needs a set of them (typeinfo, vtable, …); the symbols need to be resolved and linked in the final ELF file, they need to be listed as exported, and so on, so the increased amount of symbols makes a difference here;
  • the classic System V hash table of symbols can easily be thrown off by the huge amount of symbols with the same prefix (for class methods and namespaces), creating duplicate hashes, which need to be taken care of both at build and at run time; this is being solved now that binutils creates a .gnu.hash section with a hash table using a different algorithm, which avoids most of the collisions; unfortunately, the default for ld is still to produce both .hash, with SysV-style hashes, and .gnu.hash, which requires computing both (see the example after this list);
  • hidden visibility reduces the amount of symbols the linker has to take care of, which means fewer symbols to export and fewer hashes to calculate; but it has a disadvantage, as the build-time linker (ld) has to identify the symbols to hide, bind them statically inside the output file, and then throw them away from the list of symbols to export;
  • but what I think can be a real bottleneck is the duplicate suppression of symbol names in the string tables.
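
As for the hash sections mentioned above, you can check what your ld emits, and override the default, like this (the library name is just a placeholder):

% gcc -fPIC -shared foo.c -o libfoo.so
% readelf -S -W libfoo.so | grep -i hash      # default: both .hash and .gnu.hash
% gcc -fPIC -shared foo.c -Wl,--hash-style=gnu -o libfoo.so   # .gnu.hash only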

ELF files have string tables that are optimised to be used by C (or C-based) languages. They are just a sequence of NULL-terminated strings, and when a string is referenced, it is referenced as an offset from the start of the table (so that you can just add the in-memory address of the table’s section, and have a pointer to the string to use). To try to reduce the size of the string table, the linker compacts substrings: for instance, if there are two symbols named foobar and simply bar, the linker will create a single “foobar” string in the table and then refer to the two strings at offsets 0 and 3.
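
To make the layout concrete, here is a small sketch of such a merged table (in a real ELF string table the first byte is a NUL, so all the offsets shift by one):

/* "bar" is a suffix of "foobar", so one copy serves both names */
#include <stdio.h>

static const char strtab[] = "foobar"; /* f=0 o=1 o=2 b=3 a=4 r=5 */

int main(void) {
    printf("%s\n", strtab + 0); /* prints "foobar" */
    printf("%s\n", strtab + 3); /* prints "bar", same bytes, shared NUL */
    return 0;
}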

The amount of space that you save may seem trivial, but there is one thing to consider: rather than using -Bsymbolic, the usual trick to avoid self-loop resolving was to define, for a given symbol sym, an internal symbol _sym or __sym with a prefix. So statistically you’ll be sharing the ending of a string lots of times. Just try this: nm -D /lib/libc.so.6 | grep __.

With mangled names, you usually share prefixes, but you can’t share suffixes at all, as all the symbols start with _Z. For this reason, the duplicate suppression is totally useless. Executing it, though, is an expensive process, as you have to scan all the strings somehow. I can think of more than a couple of algorithms for the suppression, but they all require scanning a huge number of strings, which is quite expensive, and quite depressing when you know the symbols can’t be merged in any way.

I admit I’m not sure whether standard ld is smart enough to identify whether it can search for duplicates, or whether it will just ignore symbols prefixed with _Z. I’m not even sure whether gold avoids that; it probably could, considering it was designed to link C++.

Of course, the ones I listed here are not the only reasons why ld is so slow. I think Taylor actually hit the nail on the head directly: ld is not designed for ELF only, and passing through libbfd makes it not the best-performing Unix linker. I sincerely would like to see what FreeBSD will do, as they were talking about designing their own ld, which would also support ELF only.

Let’s rebuild the world

I’ve just launched an emerge -e world. Why? Because of the binutils bug I found earlier today, which brought me to downgrade binutils… and now all the libraries built with the newer version are unreadable by the older one:

flame@enterprise ~ % strip `which kopete`
strip: /usr/kde/3.5/bin/kopete: File format not recognized

lovely, isn’t it?

So I’ll be rebuilding the whole system here on defiant, which will take a couple of days at a minimum, and I won’t be able to do much more than that for now. But I want to give a hint of what I’m going to do as soon as this system is back up: I’m going to clean up the aRts mess… I’m not sure whether I’m going to patch kdelibs to allow KNotify to work without arts (the patch I found on genkdesvn is broken!), but I’ll be removing the arts useflag on packages that don’t have any use for it.

After that I’ll probably also be looking at a kdnssd_avahi package, as I’d like to improve the general KDE desktop on Gentoo. Up to now we remained as vanilla as possible, if we exclude the KPDF patch to use Poppler, which I applied to avoid security-bumping it every two weeks; but the result is that too many users use overlays, which end up being a burden to support… I’d rather increase the general changes slightly and try to have more similar setups among our users. It’s also useful to avoid broken patches like the one I found on genkdesvn, and others in the past.

Anyway, now I’m off to rebuild.

My thoughts about the GNU hash style

So, I know I’ve been silent for the past two days, and also today I won’t write too much. I’m still upset after the other night and sad for a goodbye I had to say even if I didn’t want to, which means that I’m in no mood for hard development (and I’m still without a system to work on Gentoo/FreeBSD).

Fortunately, Timothy is now taking care of some of the stuff I left behind, like keywording and some patching. And thanks to Alex now sandbox should be pretty fine on Gentoo/FreeBSD; I’ll be unmasking it as soon as I can test it firsthand (not base-profile-wide, but only for 6.2 profile).

Anyway, today I was reflecting on the new GNU hash style introduced with glibc 2.5 and binutils 2.17, after about a week of usage. The startup speed improvement goes unnoticed, compared to the improvements of direct bindings (although those were anything but safe).

The downside seems to be that, when using the newest binutils, all the ELF files get 2MB of data added, which is very bad, because you end up trading a CPU-intensive task (the symbols’ binding) for I/O-intensive loading. I’m not sure why that happens, but it’s due to the 2.17.50 versions of binutils. With the power poured into CPUs, I won’t accept this trade right now, as my I/O is anything but lightspeed (I’m still using SATA drives, and I have other tasks that are I/O-intensive on their own). Consider that Konqueror links to 43 libraries, which means 86MB of hashes to load; KMail to 65; Amarok to 33, plus the xine plugins on top; and all of this is with as-needed enabled, it gets worse if you don’t use it.
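
If you want to check how big the hash sections are in a given library (the path is just an example), readelf shows their sizes directly:

% readelf -S -W /usr/lib64/libxml2.so | grep -i hash
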

So, for the ricers out there, who are using the snapshot versions of binutils because they think they’ll get better performance… reconsider your thoughts, binutils 2.17.50.5 will decrease your performances, not improve them. I’ll try to see if the next releases fix this that seems to me like a bug, and reconsider hash style then.