Using dwarves for binaries’ data mining

I’ve wrote a lot about my linking collisions script that also shown the presence of internal copies of libraries in binaries. It might not be understood that this is just a side-effect, and that the primary scope of my script is not to find the included libraries, but rather to find possible collisions between two software with similar symbols and no link between them. This is what I found in Ghostscript bug #689698 and poppler bug #14451 . Those are really bad things to happen and that was my first reason for writing the script.

One reason why this script cannot be used with discovery of internal copies of libraries as main function is that it will not find internal copies of libraries if they have hidden visibility, which is a prerequisite for properly importing an internal copy of whatever library (skipping over the fact that is not a good idea to do such an import).

To find internal copies of libraries, the correct thing to do is to build all packages with almost full debug information (so -ggdb), and use the dwarf data in them to find the definition of functions. These definitions won’t disappear with hidden visibility so they can be relied upon.

Unfortunately parsing DWARF data is a very complex matter, and I doubt I’ll ever add DWARF parsing support to ruby-elf, not unless I can find someone else to work with me on it. But there is already a toolset that you can use for this: dwarves (dev-util/dwarves). I haven’t written an harvesting and analysis tool yet, and I’m just wasting a lot of CPU cycles to scan all the ELF files for single functions, at the moment, but I’ll soon write something for that.

The pfunct tool in dwarves allows you to find a particular function in a binary file, I ran pfunct over all the ELF files in my system, looking for two functions up to now: adler32 and png_free. The first is a common function from zlib, the latter is, obviously, a common function from libpng. Interestingly enough, I found two more packages that use an internal copy of zlib (one of which is included in an internal copy of libpng): rsync and doxygen.

It’s interesting to see how a base system package like rsync is suffering from this problem. It means that it’s not just uncommon libraries to be bundled by remotely used programs, but also widely known and accepted software to include omnipresent libraries like zlib.

I’m now looking for internal copies of popt, which I’ve also seen imported more than a couple of time by software (cough distcc cough), and is a dependency of system packages already. The problem is that dwarf parsing is slow and takes time for pfunct to scan all the system. That’s why I should use another harvest and an analyse script.

Oh well, more analysis for the future :) And eliasp, when I’ve got this script done, then I’ll likely accept your offer ;)

3 thoughts on “Using dwarves for binaries’ data mining

  1. You know this probably better than I do, but isn’t the use of the “static” keyword in front of global variables or functions one of the ways to hide variables/functions and prevent these collisions??I’ve copied the paragraph below from: http://www.cprogramming.com… and the K&R book states something similar including that it is used to hide functions.The last use of static is as a global variable inside a file of code. In this case, the use of static indicates that source code in other files that are part of the project cannot access the variable. Only code inside the single file can see the variable. (It’s scope — or visibility — is limited to the file.) This technique can be used to simulate object oriented code because it limits visibility of a variable and thus helps avoid naming conflicts. This use of static is a holdover from C.This might be an easier and better way to get these possible collisions fixed upstream as it doesn’t involve renaming functions/variables although I must admit that still the problem is not clear to me.

    Like

  2. Static works fine as long as the function is local to the translation unit. The problem is when the function (or variable, constant) is not local to the unit, but is local to the shared object. For those you use extern with hidden visibility.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s