So I’ve decided to dust off my link collision script and see what the situation is nowadays. I’ve made sure that all the suppression files use non-capturing groups in their regular expressions, as that should improve the performance of the regexp matching, made the script more resilient to issues within the files (metasploit ELF files are barely valid), and run it through.
Well, it turns out that the situation is bleaker than ever. Besides the obvious amount of symbols with too-common names, there are still a lot of libraries and programs exporting default bison/flex symbols the same way I found them in 2008:
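To illustrate the non-capturing groups mentioned above, here is a minimal sketch in Python. The suppression entries and the `is_suppressed` helper are hypothetical, made up for this example; they are not the format or the code of the actual script. The point is only that `(?:...)` groups alternatives without storing a capture.

```python
import re

# Hypothetical suppression entries: (file pattern, symbol pattern).
# These are illustrative assumptions, not the script's real suppression format.
# (?:...) groups alternatives without asking the engine to store a capture.
SUPPRESSIONS = [
    (r".*/lib(?:tcmalloc|tbbmalloc)[^/]*\.so.*", r"(?:m|c|re)alloc|free"),
    (r".*", r"_(?:init|fini|edata|end)"),
]

# Compile once, so that each pattern is parsed a single time.
COMPILED = [(re.compile(f), re.compile(s)) for f, s in SUPPRESSIONS]


def is_suppressed(path: str, symbol: str) -> bool:
    """Return True if any entry matches both the whole path and the whole symbol."""
    return any(
        f.fullmatch(path) is not None and s.fullmatch(symbol) is not None
        for f, s in COMPILED
    )
```

With these made-up entries, `is_suppressed("/usr/lib64/libtcmalloc.so.4", "realloc")` is true, while the same symbol coming from `/usr/bin/ksh` is not suppressed.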
Symbol yylineno@ (64-bit UNIX - System V AMD x86-64) present 59 times
Symbol yyparse@ (64-bit UNIX - System V AMD x86-64) present 53 times
Symbol yylex@ (64-bit UNIX - System V AMD x86-64) present 49 times
Symbol yy_flush_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_bytes@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_string@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_create_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
Symbol yy_delete_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
[...]
Note that at least one library has to export them for them to be listed in this output; indeed, these symbols are present in quite a long list of libraries. I’m not going to track down each and every one of them, but I guess I’ll keep an eye on that list, so that if problems arise they can easily be tracked down to this kind of collision.
Action item: I guess my next post is going to be a quick way to handle building flex/bison sources without exposing these symbols, for both programs and libraries.
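For readers who want to reproduce this kind of report, the counting step can be sketched as follows. This is a simplified Python illustration, not the script I actually use: the `(path, symbol, abi)` tuples are an assumed input format, and `find_collisions` and `format_report` are hypothetical names.

```python
from collections import defaultdict


def find_collisions(entries, threshold=2):
    """Group exported symbols by (symbol, abi) and keep the duplicated ones.

    entries: iterable of (path, symbol, abi) tuples (an assumed input format).
    Returns a list of (symbol, abi, sorted_paths), most frequent first.
    """
    owners = defaultdict(set)
    for path, symbol, abi in entries:
        owners[(symbol, abi)].add(path)
    dupes = [
        (symbol, abi, sorted(paths))
        for (symbol, abi), paths in owners.items()
        if len(paths) >= threshold
    ]
    # Sort by descending count, then by name to keep the output stable.
    dupes.sort(key=lambda d: (-len(d[2]), d[0]))
    return dupes


def format_report(dupes):
    """Render the collisions in a layout similar to the listings in this post."""
    lines = []
    for symbol, abi, paths in dupes:
        lines.append(f"Symbol {symbol}@ ({abi}) present {len(paths)} times")
        lines.extend(f"  {path}" for path in paths)
    return "\n".join(lines)
```

A set is used per symbol so that the same file indexed twice is only counted once.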
But this is not the only issue. I’ve already mentioned a long time ago that a single installed system already brings in a huge number of redundant hashing functions; on the tinderbox as it was when I scanned it, there were 57 md5_init functions (and this without counting different function names!). Some of this, I’m sure, boils down to gnulib making it available, and to the fact that, unlike the BSD libraries, GLIBC does not have public hashing functions; using libcrypto is not an option for many people.
Action item: I’m not very big on benchmarks myself; I never understood the proper way to go about getting the real data rather than being fooled by the scheduler. Somebody who’s more apt at that might want to gather a bunch of libraries providing MD5/SHA1/SHA256 hashing interfaces, and produce some graphs that can let us know whether it’s time to switch to libgcrypt, or nettle, or whatever else provides us with good performance as well as a widely-compatible license.
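As a rough idea of the shape such a measurement could take, here is a minimal Python sketch. It only times whatever backend Python’s `hashlib` happens to be built against, so it says nothing about libgcrypt or nettle themselves; a real comparison would need a harness calling each library directly. Keeping the best of several runs is one common way to reduce noise from the scheduler, and `best_throughput` is a name made up for this example.

```python
import hashlib
import time


def best_throughput(algorithm: str, size: int = 1 << 20, repeats: int = 5) -> float:
    """Return the best observed throughput, in MiB/s, for one hashlib algorithm.

    The fastest of `repeats` runs is kept: slower runs are more likely to have
    been disturbed by other processes than to reflect the hash itself.
    """
    data = b"\x00" * size
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return (size / (1 << 20)) / best


if __name__ == "__main__":
    # Some builds disable MD5 (for instance in FIPS mode); adjust the list as needed.
    for name in ("md5", "sha1", "sha256"):
        print(f"{name}: {best_throughput(name):.1f} MiB/s")
```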
The presence of duplicates of memory-management symbols such as malloc and company is not that big of a deal, at first sight. After all, we have a bunch of wrappers that use interposing to account for memory usage, plus another bunch to provide alternative allocation strategies that should be faster depending on the way you use your memory. The whole thing is not bad by itself, but when you get one of graphviz’s libraries (libgvpr) to expose malloc, something sounds wrong. Indeed, if even after updating my suppression filter to ignore the duplicates coming from gperftools and TBB I still get 40 copies of realloc(), something sounds extremely wrong:
Symbol realloc@ (64-bit UNIX - System V AMD x86-64) present 40 times
libgvpr
/mnt/tbamd64/bin/ksh
/mnt/tbamd64/bin/tcsh
/mnt/tbamd64/usr/bin/gtk-gnutella
/mnt/tbamd64/usr/bin/makefb
/mnt/tbamd64/usr/bin/matbuild
/mnt/tbamd64/usr/bin/matprune
/mnt/tbamd64/usr/bin/matsolve
/mnt/tbamd64/usr/bin/polyselect
/mnt/tbamd64/usr/bin/procrels
/mnt/tbamd64/usr/bin/sieve
/mnt/tbamd64/usr/bin/sqrt
/mnt/tbamd64/usr/lib64/chromium-browser/chrome
/mnt/tbamd64/usr/lib64/chromium-browser/chromedriver
/mnt/tbamd64/usr/lib64/chromium-browser/libppGoogleNaClPluginChrome.so
/mnt/tbamd64/usr/lib64/chromium-browser/nacl_helper
/mnt/tbamd64/usr/lib64/firefox/firefox
/mnt/tbamd64/usr/lib64/firefox/firefox-bin
/mnt/tbamd64/usr/lib64/firefox/mozilla-xremote-client
/mnt/tbamd64/usr/lib64/firefox/plugin-container
/mnt/tbamd64/usr/lib64/firefox/webapprt-stub
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.memprof/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.prof/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg.debug/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/asm_fast.gc.trseg/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc/libmcurses.so
/mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libcurs.so
/mnt/tbamd64/usr/lib64/mercury/lib/hlc.gc.trseg/libmcurses.so
/mnt/tbamd64/usr/lib64/OpenFOAM/OpenFOAM-1.6/lib/libhoard.so
/mnt/tbamd64/usr/lib64/thunderbird/mozilla-xremote-client
/mnt/tbamd64/usr/lib64/thunderbird/plugin-container
/mnt/tbamd64/usr/lib64/thunderbird/thunderbird
/mnt/tbamd64/usr/lib64/thunderbird/thunderbird-bin
Now it is true that it’s possible, depending on the usage patterns, to achieve a much better allocation strategy than the default coming from GLIBC — on the other hand, I’m also pretty sure that GLIBC’s own allocation improved a lot in the past few years so I’d rather use the standard allocation than a custom one that is five or more years old. Again this could use some working around.
In the list above, Thunderbird and Firefox for sure use (and for whatever reason re-expose) jemalloc; I have no idea if libhoard in OpenFOAM is another memory management library (and whether OpenFOAM is bundling it or not), and Mercury is so messed up that I don’t want to ask myself what it’s doing there. There are, though, a bunch of standalone programs listed as well.
Action item: go through the standalone programs exposing the memory interfaces. Some of them are likely to bundle one of the already-present memory libraries, so just make them use the system copy of it (so that improvements in the library trickle down to the program); for those that use custom strategies, consider making them optional, as I’d expect most not to be very useful to begin with.
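A quick way to triage a single program or library for this is to look at its dynamic symbol table. The sketch below, in Python, parses the text printed by GNU `nm -D --defined-only` and reports which allocator entry points the file defines. The set of names to look for and the `exported_allocator_symbols` helper are assumptions made for this example, not part of my script.

```python
# Names treated as allocator entry points; an illustrative, incomplete list.
ALLOCATOR_SYMBOLS = {"malloc", "calloc", "realloc", "free"}

# nm type letters kept here: T (global, text section) and W (weak).
DEFINED_CODE_TYPES = {"T", "W"}


def exported_allocator_symbols(nm_output: str) -> set:
    """Return the allocator symbols defined in `nm -D --defined-only` output."""
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # not an "address type name" line
        _address, kind, name = fields
        name = name.split("@", 1)[0]  # drop any @VERSION or @@VERSION suffix
        if kind in DEFINED_CODE_TYPES and name in ALLOCATOR_SYMBOLS:
            found.add(name)
    return found
```

The text to parse can be obtained with something like `subprocess.run(["nm", "-D", "--defined-only", path], capture_output=True, text=True, check=True).stdout`, assuming binutils is installed.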
There is another set of functions that are similar to the memory management functions, which is usually brought in by gnulib; these are convenience wrappers that do error checking over the standard functions: xmalloc and friends. A quick check shows that these are exposed a bit too often:
Symbol xmemdup@ (64-bit UNIX - System V AMD x86-64) present 37 times
liblftp-tasks
libparted
libpromises
librec
/mnt/tbamd64/usr/bin/csv2rec
/mnt/tbamd64/usr/bin/dgawk
/mnt/tbamd64/usr/bin/ekg2
/mnt/tbamd64/usr/bin/gawk
/mnt/tbamd64/usr/bin/gccxml_cc1plus
/mnt/tbamd64/usr/bin/gdb
/mnt/tbamd64/usr/bin/pgawk
/mnt/tbamd64/usr/bin/rec2csv
/mnt/tbamd64/usr/bin/recdel
/mnt/tbamd64/usr/bin/recfix
/mnt/tbamd64/usr/bin/recfmt
/mnt/tbamd64/usr/bin/recinf
/mnt/tbamd64/usr/bin/recins
/mnt/tbamd64/usr/bin/recsel
/mnt/tbamd64/usr/bin/recset
/mnt/tbamd64/usr/lib64/lftp/4.4.2/liblftp-network.so
/mnt/tbamd64/usr/lib64/libgettextlib-0.18.2.so
/mnt/tbamd64/usr/lib64/man-db/libman-2.6.3.so
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1obj
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1plus
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/f951
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/jc1
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/lto1
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1obj
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/cc1plus
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/f951
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/jc1
/mnt/tbamd64/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.2/lto1
/mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/cc1
/mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/gnat1
/mnt/tbamd64/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.5/lto1
In this case they are exposed even by the GCC tools themselves! While this brings me again to complain that gnulib should now actually be libgnucompat and be dynamically linked, there is little we can do about these in programs; but the symbols should not creep into system libraries (mandb has the symbols in its private library, which is marginally better).
Action item: check the libraries exposing the gnulib symbols, and make them expose only their proper interface, rather than every single symbol they come up with.
I suppose that this is already quite a bit of data for a single blog post. If you want a copy of the symbols’ index to start working on some of the action items I listed, just contact me and I’ll send it to you; it’s a bit too big to just have it published as is.