Redundant symbols

So I’ve decided to dust off my link collision script and see what the situation is nowadays. I’ve made sure that all the suppression files use non-capturing groups in their regular expressions – as that should improve the performance of the regexp matching – made the script more resilient to issues within the files (Metasploit’s ELF files are barely valid), and ran it through.

Well, it turns out that the situation is bleaker than ever. Besides the obvious number of symbols with too-common names, there are still a lot of libraries and programs exporting the default bison/flex symbols, the same way I found them in 2008:

Symbol yylineno@ (64-bit UNIX - System V AMD x86-64) present 59 times
Symbol yyparse@ (64-bit UNIX - System V AMD x86-64) present 53 times
Symbol yylex@ (64-bit UNIX - System V AMD x86-64) present 49 times
Symbol yy_flush_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_buffer@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_bytes@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_scan_string@ (64-bit UNIX - System V AMD x86-64) present 48 times
Symbol yy_create_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times
Symbol yy_delete_buffer@ (64-bit UNIX - System V AMD x86-64) present 47 times

Note that a library had to export these symbols at all to be listed in this output; indeed they are present in quite a long list of libraries. I’m not going to track down each and every one of them, but I guess I’ll keep an eye on that list, so that if problems arise they can easily be tracked down to this kind of collision.

Action Item: I guess my next post is going to be a quick way to handle building flex/bison sources without exposing these symbols, for both programs and libraries.
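As a preview of that post: flex and bison can already rename their generated symbols at build time, which, combined with hidden visibility, avoids the clashes. A minimal sketch (the myproj_ prefix is just an example name, and the exact bison spelling depends on its version):

```
/* lexer.l */
%option prefix="myproj_"

/* parser.y: older bisons spell this %name-prefix,
   newer ones use %define api.prefix instead */
%name-prefix "myproj_"
```

Even renamed, the symbols still end up in the dynamic symbol table unless the library is also linked with hidden visibility or a version script.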

But this is not the only issue — I already mentioned, a long time ago, that a single installed system brings in a huge number of redundant hashing functions; on the tinderbox, as it was when I scanned it, there were 57 md5_init functions (and that’s without counting different function names!). Some of this, I’m sure, boils down to gnulib making them available, and to the fact that, unlike the BSD C libraries, GLIBC does not have public hashing functions — and using libcrypto is not an option for many people.

Action item: I’m not very big on benchmarks myself; I never understood the proper way to gather real data rather than being fooled by the scheduler. Somebody who’s more adept at that might want to gather a bunch of libraries providing MD5/SHA1/SHA256 hashing interfaces, and produce some graphs that can let us know whether it’s time to switch to libgcrypt, or nettle, or whatever else provides us with good performance as well as a widely-compatible license.

The presence of duplicate memory-management symbols such as malloc and company is not that big a deal, at first sight. After all, we have a bunch of wrappers that use interposing to account for memory usage, plus another bunch that provide alternative allocation strategies which should be faster depending on the way you use your memory. The whole thing is not bad by itself, but when one of graphviz’s libraries (libgvpr) exposes malloc, something sounds wrong. Indeed, when even after updating my suppression filter to ignore the duplicates coming from gperftools and TBB I still get 40 copies of realloc(), something sounds extremely wrong:

Symbol realloc@ (64-bit UNIX - System V AMD x86-64) present 40 times

Now, it is true that, depending on the usage patterns, it’s possible to achieve a much better allocation strategy than the default coming from GLIBC — on the other hand, I’m also pretty sure that GLIBC’s own allocator has improved a lot in the past few years, so I’d rather use the standard allocator than a custom one that is five or more years old. Again, this could use some working on.

In the list above, Thunderbird and Firefox for sure use (and for whatever reason re-expose) jemalloc; I have no idea whether libhoard in OpenFOAM is another memory management library (and whether OpenFOAM is bundling it or not), and Mercury is so messed up that I don’t want to ask myself what it’s doing there. There are, though, a bunch of standalone programs in the list as well.

Action item: go through the standalone programs exposing the memory interfaces — some of them are likely to bundle one of the already-mentioned memory libraries, so just make those use the system copy of it (so that improvements to the library trickle down to the program); for those that use custom strategies, consider making them optional, as I’d expect most not to be very useful to begin with.

There is another set of functions, similar to the memory management ones, which is usually brought in by gnulib; these are convenience wrappers that do error checking on top of the standard functions — xmalloc and friends. A quick check shows that they are exposed a bit too often:

Symbol xmemdup@ (64-bit UNIX - System V AMD x86-64) present 37 times

In this case they are exposed even by the GCC tools themselves! While this brings me once again to complain that gnulib should actually be a dynamically-linked libgnucompat, there is little we can do about these symbols in programs — but they should not creep into system libraries (mandb has them in its private library, which is marginally better).

Action item: check the libraries exposing the gnulib symbols, and make them expose only their proper interface, rather than every single symbol they come up with.
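A sketch of what “expose only their proper interface” means in practice: a linker version script that keeps the gnulib helpers local (library and symbol names here are hypothetical):

```
/* mylib.map, passed to the linker as -Wl,--version-script=mylib.map */
MYLIB_1.0 {
    global:
        mylib_init;
        mylib_process;
    local:
        *;    /* xmalloc, xmemdup and friends stay private */
};
```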

I suppose this is already quite a bit of data for a single blog post — if you want a copy of the symbols’ index to start working on some of the action items I listed, just contact me and I’ll send it to you; it’s a bit too big to just publish as is.

Reduced system set and dependency issues

I first proposed reducing the size of the system set – the minimal set of packages that form a Gentoo installation – back in 2008, with the explicit idea of making the dependency tree more accurate by adding the packages to the dependency lists of those that use them, rather than forcing them all in. While it is not going as far as I’d have liked at the time, Gentoo is finally moving in that direction and, a couple of weeks ago, a bunch of build-time packages were removed from the system set, including autotools themselves.

It is interesting to note that a few people complained that this was a moot point, since they are among the first packages that you get to merge anyway: they are build-time dependencies of probably half the tree, so you can’t have a Gentoo system without them… but reality is a bit happier when you look at it. On a “normal” Gentoo system, indeed, you can’t get rid of autotools easily: since build-time dependencies are, by default, preserved on the system even after the build has completed, they wouldn’t even be removed — on the other hand, they won’t get upgraded until a package that uses them needs to be built, which is still a nice side effect. Where this matters most is on systems built from binary packages, which is what I do with my home router and the two vservers that host this blog and xine’s Bugzilla, so that I can make use of the higher-end hardware of Yamato.

By removing autotools from the system set and instead expecting them to be listed as dependencies (as happens when using autotools.eclass), binpkg-based systems can be set up without installing autotools at all, in most cases. Unfortunately, that is not always the case. The problem lies with the libltdl library, libtool’s wrapper around the dlopen() interface — the so-called dynamic runtime linking, which is implemented in most modern operating systems, but with various, incompatible interfaces. This library is provided by the sys-devel/libtool package, and is used at runtime by some packages such as PulseAudio, OpenSC and ImageMagick, making libtool a runtime dependency of those packages, and not simply a build-time one. And since libtool itself relies mostly on autoconf and automake, this drags those packages into the runtime dependency tree as well, making the system “dirty” again.

Luckily, it appears that libltdl is falling out of favour, and is used by a very limited set of packages nowadays. The most obnoxious one to avoid is ImageMagick, especially on a server. I don’t remember whether GraphicsMagick allows you to forgo libltdl altogether when not using dynamic plug-ins; I think it should, but I wouldn’t bet on it.

More obnoxious are probably the few failures caused by not depending on less commonly used tools such as flex and bison (or their grandparents lex and yacc). While I did some work, at the time I proposed the system set reduction, to identify packages that lacked a dependency on flex, new packages get added, old packages get reworked, and we probably have a number of packages that lack such dependencies. This is not only bothersome for users, who might hit a failure because a package wasn’t installed when it should have been; it is also very annoying for me when running the tinderbox, because I can’t get the complete list of reverse dependencies to test a new version of a package (it has happened before that I needed to test the reverse dependencies of flex; it wasn’t nice).

This begs a question: why isn’t my tinderbox catching these problems? The answer has actually been out there since the end of that same year: my tinderbox is not focusing on minimal dependencies. That is, it runs with all the packages installed at once, which means it checks for collisions and can identify (some) automagic dependencies, but it can rarely tell whether a dependency is missing from the list. Patrick used to run such a tinderbox setup, but I don’t know if he’s still doing so. It sure would be handy to see what broke with the recent changes.

Parsing configuration files

One of the remaining big issues I have with feng, one of the areas that we haven’t completely rewritten since I joined, is the code that parses its configuration file. I wrote about it not too long ago, but I hadn’t received enough comments to move me away from the status quo yet. Since we’re now trying our best to implement whole new features that will require new configuration file options, though, it’s time for the legacy code to be killed.

In the previous post I linked, I was considering the use of an ISC-style configuration file derived from the syntax used by dhcpd.conf; since then I decided to switch to the very similar bind-style syntax used by named.conf – it’s almost the same, with the difference that there are no “top-level” options, and that should make it easier to parse – and then I even tried to look into re-using the parsing code directly from bind.
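To give an idea of why the named.conf style is easy to parse, the whole shape of the file boils down to a handful of rules; this is a rough yacc-style sketch (all the token names are made up, it’s not feng’s actual grammar):

```
config     : /* empty */
           | config section
           ;
section    : SECTION_NAME '{' options '}' ';' ;
options    : /* empty */
           | options option
           ;
option     : IDENTIFIER value ';' ;
value      : STRING | NUMBER | BOOLEAN ;
```

With no top-level options, every statement lives inside a braced section, so the grammar never has to disambiguate between the two.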

Interestingly, the bind distribution contains an almost standalone configuration file parser code, with a prebuilt lexer. Unfortunately, re-using that code directly was not an option: while it was providing a clean, split interface, its backend code is very much entangled with the bind internals; including, but not limited to, their own memory allocators. Trying to reduce these dependencies would require as much work as creating something anew.

So the other night I decided I would find a way to implement the new parser… I spent the whole night, till 10am the morning after, looking for possible alternatives; the obvious choice would have been the good old classic Unix tools, lex & yacc… unfortunately, the tutorials I could find through Google, and the not-really-mine copy of the eponymous O’Reilly book, didn’t help me at all. After deciding that half the stuff I was reading was obsolete, I decided to tackle the problem from another direction.

John Levine – of Linkers & Loaders fame, a cornerstone for linker geeks like me – was one of the authors of the original lex & yacc book, the last edition of which was published in 1992; he has since updated the book by writing flex & bison and describing the modern versions of the tools. Thanks to O’Reilly’s digital distribution buying a copy of the book was both fast and cheap: with the 50% off discount over two books I got both that and Being Geek – a book I have been curious about for a while but I didn’t care just enough to buy it alone – for $20 total. Nice deal.

More importantly, not half an hour after downloading the PDF version of the book I was able to complete the basic parser… which makes it a totally worthwhile investment. Some of the problems I was facing were mostly due to the legacy of the old tools, and with modern documentation at hand, it became trivial to implement what I needed. On the other hand, it turned out to be a verbose and repetitive task, so once again I resorted to my “usual” Rube Goldberg machines, using XML and XSLT… (I really, really have to find a different approach sooner or later).

The current code I’m working on has a single, short XML document that describes the various sections and entries of the configuration file, and a number of (not-too-long) stylesheets that produce the input for flex/bison, plus a header file with the C structures holding the options; add to this a separate source file with pure-C callbacks that commit the parsed configuration data into those structures, presetting defaults where needed and validating the semantics of the configuration.

Right at this moment, feng has the new config parser in place; together with the stuff I developed for it, I also posted patches for the Autoconf Archive that improve the checks for flex and bison themselves (not plain lex/yacc), and a few more fixes there. All in all, it seems to shave many kilobytes off the compiled object code size.

Autotools Mythbuster: A practical case of TMT (Too Many Tests)

I have already written, not too long ago, about the fact that using autoscan is simply a bad idea. Today, on the other hand, I found one case where using (an old version of) autoscan has created a very practical problem. While this post is not going to talk about the pambase problems, there is a connection to my previous posts: it is related to the cross-compilation of Linux-PAM.

It is an unfortunately well-known problem that the Linux-PAM ebuild fails to cross-compile — and there are a number of workarounds that have never been applied in the ebuilds. The reason is relatively simple: I insisted that someone else who had the issue at hand send them upstream. Finally somebody did, and Thorsten fixed the issues with the famous latest release — or so goes the theory. A quick check shows me that the latest PAM version is still not working as intended.

Looking at the build log, it seems like the problem is that the configure script for Linux-PAM, using the original AC_PROG_LEX macro, is unable to identify the correct (f)lex library to link against. Again, the problem is obvious: the cross-building wrappers that we provide in the crossdev package cause dependencies present in DEPEND but not in RDEPEND to be merged just on the root file system. Portage allows us to set the behaviour so that they are merged on the cross-root instead; but even that is wrong, since we need flex to be present both as a (CBUILD) tool and as a (CHOST) library. I’ve asked Zac to provide a “both” solution; we’ll see how we go from there.

Unfortunately, a further problem happens when you try to cross-compile flex itself: it fails with undefined references to the rpl_malloc symbol. You can look it up, and you’ll see that it’s definitely not a problem limited to flex. I know what I’m dealing with when I find these mistakes, but I guess it doesn’t hurt to explain them a bit further.

The rpl_malloc and rpl_realloc symbols are two replacement functions, #define’d during the configure phase by the AC_FUNC_MALLOC and AC_FUNC_REALLOC autoconf macros. They are used to replace the standard functions with two custom ones that paper over non-conforming system behaviour; the fact that they are left dangling is, as I’ll show in a moment, pretty much a sign of overtesting.

Rather than simply checking for the presence of the malloc() function (can you really expect the function to be missing on any non-embedded system at all?), the AC_FUNC_MALLOC macro (together with its sibling AC_FUNC_REALLOC) checks for the presence of a glibc-compatible malloc() function. That “glibc-compatible” note simply means a function that will not return NULL when passed a length argument of 0 (which is the behaviour found in glibc and a number of other systems). It is a corner-case condition, and most of the software I know does not rely on it at all, but it has sometimes been useful to test for; otherwise the macro wouldn’t be there.

Of course, the sheer fact that nobody implemented the compatibility replacement functions in the flex source tree should make it safe to assume that there is no need for that behaviour to be present.

Looking at the original configure code really tells more of a story:

# Checks for typedefs, structures, and compiler characteristics.


# Checks for library functions.

AC_CHECK_FUNCS([dup2 isascii memset pow regcomp setlocale strchr strtol])

The two comments are the usual ones that you’d find in a script created by autoscan; it’s also one of the few cases where you actually find a check for size_t, as most software assumes its presence anyway. More importantly, if you look at the long list of arguments to the AC_CHECK_FUNCS macro call, and then compare it with the actual source code of the project, you realise that the boilerplate checks are there, but their results are never used. That’s true for all the functions in there.

What do we make of it? Well, first of all, it shows that at least part of the autotools-based build system in flex is not really well thought out (and you can add to that some quite idiotic stuff used to express object file dependencies, which then required a patch that produced half-broken results). But it also shows that, in all likelihood, the check for malloc() is there just because, and not because there is any need for the malloc(0) behaviour.
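For contrast, this is roughly what a meaningful use of one of those checks would look like, with the result actually gating code (setlocale picked purely as an example):

```
# configure.ac: keep only the checks whose results are consumed
AC_CHECK_FUNCS([setlocale])

# and then, in the C sources, the result actually gates something:
#   #ifdef HAVE_SETLOCALE
#       setlocale(LC_ALL, "");
#   #endif
```

A check whose HAVE_* macro never appears in the sources is pure boilerplate, and a candidate for deletion.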

A quick fix, consisting of removing the useless checks and regenerating the build system with more modern autotools versions, and you have a perfectly cross-compilable, workaround-free flex ebuild. Voilà!

So why am I blogging about this at all? Well, I’ll be updating the Guide later today; in the meantime I wanted to let you know about this trickery because I sincerely have a problem with having to add tons of workarounds for cross-building: it’s time to tackle them head on, and quite a lot of them are related to this stupid macro usage, so if you find one, please remember this post and get it fixed properly!

Flex and (linking) conflicts, or a possible reason why PHP and Recode are so crashy

So, this past week I was off teaching a course about programming and Linux at a company I’ve been working with for a while now. Some of the insight into what people need to know, and rarely do know, is helping me decide what to focus on in this blog (and not only there) in the future.

Today, though, I want to blog not about something I explained during the course, but about something that was explained to me, about Bison and Flex. It’s related to the output of my linking script:

Symbol yytext@@ (64-bit UNIX System V ABI AMD x86-64 architecture) present 7 times

yytext, together with a few other yy* symbols, is generated by Flex as part of the code for a lexer (for what most of my readers are concerned, these are part of a parser). These symbols are private to a single parser, and should not be exported to other parsers. I wasn’t sure about their privateness, so I haven’t reported them up to now; but now I am sure: they should not be shared between two different parsers.

Both librecode and PHP export their parsers’ symbols, which creates a situation where the two parsers are sharing buffers and… well, let’s just say you wouldn’t want to share a plate between someone eating Nutella and someone eating pasta, would you?

This might actually cause quite a few problems, and as hoffie said, recode is used by PHP and is often broken when used together with other extensions. I can’t be sure the problems are all limited to this, but it is certainly a good point to start from if we want to fix them.

The easy way out would be to make sure that the php executables don’t export symbols that extensions don’t need to use; the proper way out would be to add proper symbol visibility inside recode itself as well, but I wonder if it’s still maintained: release 3.6 is quite old, and even patching it is a hard task, as it doesn’t even recognise AMD64 by default.

New results from my elven script

My work on ruby-elf is continuing, albeit slowly, as I’m currently preparing more envelopes. I’ve improved the nm.rb tool to be a bit more like the real nm utility from binutils, by properly taking Absolute and Common symbols into consideration, and today, while working on the conflicts finder, I also fixed the symbols’ versions parser. Now the parser and the tool should be pretty much solid; too bad that by fixing the parser to actually look into the libraries recursively I ended up making it quite a bit slower (well, it has a lot more libraries to look up now), and it eats more than 70MB for the database alone.

Now, let me first talk about symbol versions in ELF files. This is a GNU extension which is, unsurprisingly, not documented in elf(3), as that man page comes out of BSD. There’s little to no documentation available about it; the only thing that is somewhat reliable is a page from Ulrich Drepper on his home site at Red Hat, which of course is the usual hyped version of the truth, in the usual Drepper style everybody is used to. The elf.h header does not contain much information beside the data structures with a few comments, which are not good enough to actually implement anything without a lot of analysis of the data itself, and some filling in from Drepper’s hyped page. One of the entries, commented as «Unused» in the header, and also in Drepper’s page in the structure definition, carries the most important data of the structure: the index used to retrieve the version information.

Three tables are used, plus .dynstr, which carries the symbols’ names and thus also the versions’ names (they are declared as absolute symbols of value 0); as a symbol can have multiple names, if it obsoleted other symbols, the records are of variable length rather than fixed length, which makes them more difficult to parse. The versions for defined (exported) and for undefined (imported) symbols are declared in different structures, subtly similar enough to confuse you; a third table then tells you which version each symbol refers to. As the «auxiliary» structures for both defined and undefined symbols are not part of the version definition, but are pointed to through an offset, and only carry the name of a version, they could in theory be shared. Now don’t ask me why there should be two different version specifications pointing at the same name (the only reason I’d see would be if defined and undefined symbols had the same version name; but the auxiliary structures are not the same between the two, and are defined in two different sections, so they can’t actually be shared). The only case I found up to now is Berkeley DB, which uses --default-symver.

After implementing an ELF parser that can read symbols’ versions without looking at the glibc source code – because I don’t want to hurt my eyes looking at it, and also because this way I can implement it again without being restricted by the GPL-2 license – I have to say that I don’t like the format, and I hope never to have to rely on it again in my life! It seems to me it would have been possible to spend a few more bytes and make the load process faster by using fixed-length records; there are also flags variables, pretty much unused, that could have marked versions as being for defined or undefined symbols if that mattered so much; but since you find the version through the symbol, rather than the other way around, that’s pretty much pointless: you already know whether the symbol is defined or not!

Anyway after this a-bit-offtopic time, let me show a bit of results coming from the script itself:

  • bugs #178919, #178920, #178921 and #178923: there are three ebuilds (one being Python and two being Python extensions) that bring in their own copy of expat 1.95.8, which is incompatible with the expat 2.0 that has been in ~arch for a while now; Compress-Raw-Zlib instead carries its own copy of zlib, which is a waste, as virtually any system has a copy of zlib in memory already at any given time.
  • I found a few more common libraries in kdepim-kresources that cut three more megabytes from the total installed size of the package on my system; note that this size includes full debug information in DWARF format (-ggdb), but no debug code; the memory reduction should be proportionally similar, though of course not the same amount; still, it’s a sensible gain.
  • I also prepared a patch for kmix to install two more libraries: libkmixprivate and libkmixuiprivate, as kmix, kmixctrl and kmix_applet were sharing half of their code to begin with; through my patch they share it effectively on disk, and during build the files are built once only.
  • Samba could probably make use of a libsambainternal library of its own, as a lot of symbols seem to be shared between the different tools in the Samba package. Note that internal libraries are a useful way of sharing code: you can put in them the functions used by everything, and just declare them private, making sure that users of the project won’t try to use them, or at least will know that the ABI can change from one release to another.

With the more widespread reach of the script, I also had to extend the suppression file quite a lot, as there is a lot of software using plugin-based interfaces nowadays.

One problem I’m facing now, though, is that a common system contains a lot of drop-in replacement libraries, for instance libncurses and libncursesw, or OpenSSL and GnuTLS’s replacement for it, and so on. I need to write a suppression file for those too, so that the symbols shared between those libraries are not reported as actual collisions, but skipped over.

One thing I’m not sure about is the yy* symbols that come out of flex/yacc/whatever it is. There are a lot of libraries exporting them, and I’m not sure whether it’s correct for them to do so, as different flex versions might be incompatible. Unfortunately I have never used flex in my life, so I can’t tell.

If somebody knows how those functions are supposed to work with shared libraries, I’d be very grateful to hear it; and if they are a problem, I’ll see to reporting and fixing all of them.