Tonight feels like the end of a very long day, but it was just half a day spent trying to get to the bottom of a bug saga that, for me, started about a month ago.
It started like this:
postgresql-server started failing: the link editor (BFD-based ld) reported that one of the static archives installed by postgresql-base didn’t have a proper index, which should have been generated by ranlib. But simply executing ranlib on said file didn’t solve the problem.
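For reference, in the GNU/System V ar format the symbol index that ranlib (or ar s) writes is stored as the first archive member, named / (or /SYM64/ for the 64-bit variant). The minimal Python sketch below is not part of any of the tools mentioned here; it assumes that layout, ignores BSD-style __.SYMDEF indexes and thin archives, and only checks that the index member is present, not that ld can actually use it. So on its own it would not have caught a problem that re-running ranlib could not fix.

```python
AR_MAGIC = b"!<arch>\n"  # global header of a regular (non-thin) ar archive
MEMBER_HEADER_SIZE = 60  # fixed size of each member header
# Member names used for the symbol index in the GNU/System V format.
INDEX_NAMES = (b"/", b"/SYM64/")


def has_symbol_index(data: bytes) -> bool:
    """Return True if the archive's first member is a GNU/SysV symbol index."""
    if not data.startswith(AR_MAGIC):
        # This also rejects thin archives, which start with b"!<thin>\n".
        raise ValueError("not a regular ar archive")
    header = data[len(AR_MAGIC):len(AR_MAGIC) + MEMBER_HEADER_SIZE]
    # An empty archive has no members; a valid member header ends with b"`\n".
    if len(header) < MEMBER_HEADER_SIZE or header[58:60] != b"`\n":
        return False
    # The 16-byte name field is padded with spaces.
    name = header[:16].rstrip(b" ")
    return name in INDEX_NAMES
```

It takes the whole archive as bytes, e.g. has_symbol_index(open(path, "rb").read()) for an archive such as /usr/lib64/libudev.a.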
I originally blamed PostgreSQL’s build system, but yesterday, when I launched an emerge -e world to rebuild everything with GCC 4.6, another package failed in the same way: lvm2, linking against /usr/lib64/libudev.a. Since I know the udev build system very well, almost as if I had written it myself, I trusted that the archive was built correctly, so it was time to find out what the real problem was.
After poking around a bit, I found that binutils’s objdump, and at that point even ld, refused to display information for some relocatable objects (ET_REL files). This would have made the issue very difficult to debug if not for two things: first, eu-nm could read the file just fine; second, my own home-cooked nm.rb tool, which I wrote to test Ruby-Elf, reported issues with the file, but without exploding.
flame@yamato mytmpfs % nm dlopen.o
nm: dlopen.o: Bad value
flame@yamato mytmpfs % eu-nm -B dlopen.o
0000000000000000 n .LC0
00000000000001a0 T dlopen_LTX_get_vtable
0000000000000000 t vl_exit
00000000000000a0 t vm_close
0000000000000100 t vm_open
0000000000000040 t vm_sym
0000000000000000 d vtable
0000000000000000 n wt.1af52e75450527ed
0000000000000000 n wt.2e36542536402b38
0000000000000000 n wt.32ec40f73319dfa8
0000000000000000 n wt.442ae951f162d46e
0000000000000000 n wt.90e079bbb773abcb
0000000000000000 n wt.ac43b6ac10ce5688
I don’t have the original output from my tool, since I’ve fixed it in the meantime, but as you can guess from that output, the issues were related to the various wt. symbols at the end of the list. Where do they come from? What does the ‘n’ code they are tagged with mean? And why is BFD failing to deal with them? I set out to find those answers with, well, more than a hunch about what the problem would turn out to be.
So what are those symbols? Google doesn’t help at all here, since searching for “wt”, even in double quotes, only turns up results for “weight”. Yes, I know it’s a common abbreviation of that word, but what the heck, I’m looking for a specific string! The answer, actually, is simple: they are additional debug symbols added by -gdwarf-4, the option that selects the latest revision of the DWARF format. This was implemented in GCC 4.6, and it is supposed to both reduce the size of the debug information, which is generally a good thing, and include more debug information.
It turns out that libbfd (the library that implements all the low-level access for ld and the other utilities) doesn’t like those symbols. I’m not sure whether it’s the sections they are defined in, their type (which is set to STT_NOTYPE), or something else, but it doesn’t like them at all. Interestingly enough, this does not happen with final executables and dynamic libraries, which makes it at least bearable: fewer than 40 packages on my system had to be rebuilt because they had broken static objects; unfortunately, one of those was LibreOffice, d’oh!
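Since only relocatable objects were affected, the quickest way to tell what kind of ELF file you are dealing with is the e_type field of the ELF header: ET_REL for object files (including the members of static archives), ET_EXEC and ET_DYN for executables and shared libraries. The minimal Python sketch below reads just that field from the first bytes of a file; it is an illustration only, not how binutils, elfutils or Ruby-Elf do it, and the function name elf_file_type is hypothetical.

```python
# e_type values from the System V ABI (same as the ET_* macros in <elf.h>).
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_file_type(header: bytes) -> str:
    """Return the symbolic e_type of an ELF file, given its first bytes."""
    # e_ident is 16 bytes long and is followed by the 2-byte e_type field,
    # at offset 16 for both 32-bit and 64-bit ELF files.
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_data = header[5]  # EI_DATA: 1 = little-endian, 2 = big-endian
    if ei_data == 1:
        byteorder = "little"
    elif ei_data == 2:
        byteorder = "big"
    else:
        raise ValueError(f"unknown EI_DATA value {ei_data}")
    e_type = int.from_bytes(header[16:18], byteorder)
    return ET_NAMES.get(e_type, f"unknown ({e_type:#x})")
```

For an object file like the dlopen.o above, elf_file_type(open("dlopen.o", "rb").read(18)) would be expected to report ET_REL.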
Now, let’s go back to the nm issue: when I started writing Ruby-Elf, I decided not to reimplement the whole suite of ELF tools, since there are already quite a few implementations of those out there. But I did write an nm tool to debug my own issues, and it also worked out quite nicely, because implementing access to the codes used by nm allowed me to reuse the same output in my elfgrep tool to show its results. That implementation was never ported to the tools framework I wrote for Ruby-Elf, so it didn’t get installed; it was just part of the repository, for my own use.
But after noticing that my tool was more resilient than binutils’s version, and that it produced more interesting output than elfutils’s, I decided to rework it and make it available as rbelf-nm, writing a man page that lists the codes for the various symbol kinds. Before all this, though, I also rewrote the function that chooses the code. It used to rely on the binding type first and then on section names to produce the right code; now it relies on the symbol type, the binding type, and the section’s flags and type, making it as resilient as elfutils’s and as informative as binutils’s, at least for everything I’ve encountered so far.
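To give an idea of what choosing the code from the symbol type, the binding, and the section’s flags and type can look like, here is a minimal Python sketch. It is not the actual rbelf-nm code (which is Ruby), and the function nm_code and its parameters are hypothetical. The letters roughly follow the conventions documented for binutils nm (T/t text, D/d data, B/b BSS, R/r read-only data, A/a absolute, U undefined, W/w and V/v weak, C common, lowercase for local symbols); mapping symbols in non-allocated sections, such as debug sections, to N/n is an assumption of this sketch.

```python
# ELF constants from the System V gABI (same values as in <elf.h>).
STB_LOCAL, STB_WEAK = 0, 2
STT_OBJECT = 1
SHN_UNDEF, SHN_ABS, SHN_COMMON = 0, 0xFFF1, 0xFFF2
SHT_NOBITS = 8
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def nm_code(st_info: int, st_shndx: int, sh_type: int = 0, sh_flags: int = 0) -> str:
    """Pick an nm-style letter for one symbol (hypothetical sketch).

    sh_type and sh_flags describe the section the symbol is defined in;
    they are ignored for undefined, common, weak and absolute symbols.
    """
    bind = st_info >> 4       # same as ELF64_ST_BIND(st_info)
    sym_type = st_info & 0xF  # same as ELF64_ST_TYPE(st_info)
    is_object = sym_type == STT_OBJECT

    # Symbols without a defining section.
    if st_shndx == SHN_UNDEF:
        if bind == STB_WEAK:
            return "v" if is_object else "w"
        return "U"
    if st_shndx == SHN_COMMON:
        return "C"
    # Defined weak symbols are always uppercase.
    if bind == STB_WEAK:
        return "V" if is_object else "W"

    # Remaining defined symbols: absolute, or classified by section flags/type.
    if st_shndx == SHN_ABS:
        code = "A"
    elif not sh_flags & SHF_ALLOC:
        code = "N"  # assumption: non-allocated sections, e.g. debug info
    elif sh_flags & SHF_EXECINSTR:
        code = "T"  # code
    elif sh_type == SHT_NOBITS:
        code = "B"  # zero-initialized data, e.g. .bss
    elif sh_flags & SHF_WRITE:
        code = "D"  # initialized writable data
    else:
        code = "R"  # read-only data
    # Lowercase marks local symbols.
    return code.lower() if bind == STB_LOCAL else code
```

In a complete tool, st_info and st_shndx come from each symbol table entry, while sh_type and sh_flags come from the section header at index st_shndx; a full implementation would also need to handle SHN_XINDEX (used by objects with a very large number of sections) and decide what to do with STT_SECTION and STT_FILE entries.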
I also released a new version (188.8.131.52, don’t ask!) that includes the new tool, and it is already on RubyGems and Portage if you wish to use it.
Please remember that the project has a Flattr page, so if you like it, your support is definitely welcome.