Some numbers about duplicate strings collapsing in ld

I haven’t had time to test gold yet, but at least Bernard saved me from finding out how to do so 🙂

I plan on doing it today, as I just finished a script, based again on my ruby-elf, that calculates how much space is saved by collapsing duplicated strings into one.
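To make the two numbers in the output below concrete, here is a minimal Python sketch of the kind of calculation involved. It is not the actual ruby-elf script: the function names, the sample table and the offsets are all made up for illustration, and it assumes that identical names would still be stored only once even without collapsing.

```python
def read_name(strtab: bytes, offset: int) -> bytes:
    """Return the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end]


def collapse_savings(strtab: bytes, offsets):
    """Return (current size, full size, difference) for a string table.

    `strtab` is the raw content of a string table as found in the file,
    `offsets` are the name offsets that symbols use to point into it.
    """
    names = {read_name(strtab, off) for off in offsets}
    names.discard(b"")  # the empty name is the leading NUL byte itself

    current = len(strtab)
    # Without collapsing, every distinct name needs its own bytes plus a
    # terminating NUL, on top of the leading NUL of the table.
    full = 1 + sum(len(name) + 1 for name in names)
    return current, full, full - current


# Hypothetical table: "foo" is not stored on its own, its symbol simply
# points at the tail of "_wrap_foo".
strtab = b"\0_wrap_foo\0bar\0"
offsets = [1, 7, 11]  # "_wrap_foo", "foo", "bar"

current, full, difference = collapse_savings(strtab, offsets)
print(f"current size {current}, full size {full} difference {difference}")
```

The "current size" is simply what is in the file, while the "full size" is what the same names would take if each one had its own copy.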

Interestingly enough, the biggest differences don’t come from glibc as I expected, although it is in third place:

/usr/lib64/ current size 36369, full size 44794 difference 8425
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/auto/SVN/_Core/ current size 15464, full size 19296 difference 3832
/lib64/ current size 21430, full size 24361 difference 2931
/usr/lib64/ current size 37976, full size 40698 difference 2722
/usr/lib64/ current size 17056, full size 19740 difference 2684

At least in the case of ALSA, the problem seems to lie in __-prefixed entries. But I see a lot of them with dlsym in their names, which seems to be a way to mix different ALSA libraries… probably something that hidden visibility could take care of, which would be very nice for a lot of reasons: 38KB of strings seems like a huge amount of memory to me.

As for libMagickWand, it seems to wrap functions from libMagick itself, which explains why they are all copied with a Magick prefix.

In the case of the SVN Perl bindings, which show up with almost all of their modules, it seems to me like very poor handling of symbols on swig’s part: there are lots of _wrap_-prefixed symbols, each followed by the name of the original symbol.

As you have guessed by now, most of the libraries with huge savings from collapsing strings are those implementing wrappers around other libraries. This is because they usually just add a prefix to the symbol, and the names of defined and imported symbols are handled in a single pool of strings: the original name is the tail of the wrapper’s name, so it can share the same bytes. If instead of a prefix they added a suffix, there would be no saving at all.
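The prefix versus suffix difference can be sketched in a few lines of Python. This only models the sizes under the assumption that a string is dropped whenever another string in the pool ends with it; it is not how ld implements the collapsing, and the symbol names are made up.

```python
def full_size(names):
    """Size of a string pool where every distinct name is stored whole."""
    # Leading NUL byte of the table, then each name with its terminator.
    return 1 + sum(len(name) + 1 for name in set(names))


def collapsed_size(names):
    """Size of the pool when names that are tails of other names are shared."""
    unique = set(names)
    # Keep a name only if no other, longer name ends with it.
    kept = [
        name
        for name in unique
        if not any(other != name and other.endswith(name) for other in unique)
    ]
    return 1 + sum(len(name) + 1 for name in kept)


# Hypothetical imported symbols and two ways of naming their wrappers.
originals = ["open_file", "close_file"]
prefixed = originals + ["_wrap_" + name for name in originals]
suffixed = originals + [name + "_wrap" for name in originals]

for label, pool in (("prefix", prefixed), ("suffix", suffixed)):
    saving = full_size(pool) - collapsed_size(pool)
    print(f"{label}: full {full_size(pool)}, "
          f"collapsed {collapsed_size(pool)}, saving {saving}")
```

With the prefix, both original names disappear into the tails of their wrappers; with the suffix, nothing ends with anything else and the saving is zero.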

What does all of this tell me? Well, first of all I think that the extra memory used up by disabling the collapsing (I care about the memory, not the on-disk size, unless we’re talking about embedded systems) may well be worth a faster linking time; ideally, it would be nice to be able to use the collapsing just for the libraries that implement those wrappers.

On a different level, one could ask why all the _wrap_ functions in Subversion’s Perl bindings are exported, as one would expect them to be needed only internally. It seems like Perl bindings often suffer from this problem: Net-SSLeay has a similar one, even though the prefix there is XS_Net__SSLeay_ instead.

And one can easily see a probable mistake made by stuff that wraps symbols: adding a suffix rather than a prefix. With a prefix, if you need to reduce the size, you can do the collapsing; with a suffix you simply can’t, at least the way ELF files are handled right now. The same goes for changing the name of the original symbol, like capitalising its first letter, or dropping a common prefix to replace it with another, which is what the XML-LibXML binding does (and thus it doesn’t share a lot of strings that it could share, although again the proper solution would be not to export them at all).

Anyway, the only substantial differences I see are mostly in Perl bindings, in a lot of Perl bindings to be honest, so I suppose one way to make the difference really unimportant for most desktop/server uses would be to fix XS and swig to hide their symbols. Then making it optional to collapse the duplicates would be a really, really good idea.