Library SONAME bumps and .la files: some visual clues

Before going on with the post, I’ll give readers who are confused by its title some pointers on how to decipher it: I have discussed .la files extensively before, and you can find a description of SONAMEs in another post of mine.

Long- and medium-time Gentoo users most likely remember what happened when libpng was last bumped, a year ago, and will probably worry now that I’m telling them that libpng 1.5 is almost ready to be unmasked (I’m building the reverse dependencies in the tinderbox as we speak to see what breaks). Since I’ve already seen it through with the tinderbox, I can tell you that it’s going to hurt: a revdep-rebuild call will ask you to rebuild so many packages because of .la files that I, for one, will probably take the chance to move to the hardened compiler and run an emerge -e world just for kicks.

But why is it this bad? Well, mostly it is the “viral propagation” of dependencies in .la files, which by itself is the reason why .la files are so bad. Since libgtk links to libcairo, and libcairo to libpng, any other library linking with libgtk will be provided with a -lpng entry to link to libpng, no matter whether it uses it or not. Unfortunately, --as-needed does not apply to libtool archives, so they end up overlinking, and only the link editor can drop the unused libraries.
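To make the propagation concrete, here is a minimal Ruby sketch. The dependency table is a made-up, heavily simplified model (real .la files carry a dependency_libs line with many more entries, and the real library names are longer); "foo" is a hypothetical library that only uses GTK.

```ruby
# Simplified model of libtool archives: each library maps to the libraries
# its own .la file lists. Illustrative only, not taken from a real system;
# "foo" is a hypothetical GTK user.
deps = {
  "foo"   => ["gtk"],
  "gtk"   => ["cairo"],
  "cairo" => ["png"],
  "png"   => []
}

# Collect everything that ends up on the link line for +lib+: its direct
# dependencies plus, recursively, whatever their .la files list.
def propagated_deps(lib, table, seen = [])
  table.fetch(lib, []).each do |dep|
    next if seen.include?(dep)
    seen << dep
    propagated_deps(dep, table, seen)
  end
  seen
end

flags = propagated_deps("foo", deps).map { |dep| "-l#{dep}" }
puts flags.join(" ")
```

In this model "foo" gets a -lpng entry even though nothing in it calls into libpng, which is exactly the situation described above.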

For the sake of example, Evolution does not use libpng directly (the graphic files are managed through GTK’s pixbuf interface), but all of its plugins’ .la files will refer to libpng, which in turn means that revdep-rebuild will pick it up to rebuild it. D’oh!

So what about the visual clue? Well, I’ve decided to use the data from the gold-based tinderbox to provide a graph of how many ELF objects actually link to the most common libraries, and how many libtool archives reference them. The data wasn’t easy to extract, mostly because at first glance the .la files seemed to be dwarfed by the actually linked objects... until I remembered that ELF executables can’t have a corresponding .la file, so the fair comparison is against shared objects only.

Figure: library linking histogram

I’m sorry if some browsers fail to display the image properly; please upgrade to a decent, modern browser, as it’s a simple SVG file. The gnuplot script and the raw data file are also available if you wish to look at them.

The graph corroborates what I’ve been saying before: the bump of libraries such as libexpat and libpng is only a problem because of overlinking and .la files. Indeed, you can see that there are about 500 .la files listing either of the two libraries, while fewer than a hundred shared objects reference them. And for zlib it’s even worse: while there are definitely more shared objects using it (348), there are four times as many .la files listing it as one of their dependencies, for no good reason at all.

A different story applies to GLib and GTK+ themselves: the number of shared objects using them is higher than the number of .la files that list them among their dependencies. I guess the reason here is that a number of their users are built with non-libtool-based build systems, and another good amount of .la files are removed by the less lazy Gentoo packagers (XFCE should be entirely .la free nowadays, and yes, it links to GTK+).

Now it is true that the number of .la files and ELF files is not proportional to the number of packages installing them (for instance Evolution installs 24 .la files and 69 ELF objects), so you can’t really say much about the number of packages you’d have to rebuild when one of the three “virulent” libraries (libpng, libexpat, libz) is bumped. It should still be clear, though, that marking five hundred files as broken simply because they list a library that is gone, without their respective binaries actually having anything to do with said library, is not the best approach we can have.
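A check that looks at the binary rather than at the .la file could be sketched roughly as follows. The .la contents and the NEEDED list are invented for illustration; on a real system the NEEDED entries would come from something like readelf -d, and this is not how revdep-rebuild itself works.

```ruby
# Sample contents of a plugin's .la file; made up for illustration.
la_text = <<~LA
  dlname='libexample-plugin.so'
  dependency_libs=' -L/usr/lib /usr/lib/libgtk-x11-2.0.la -lcairo -lpng -lz'
LA

# NEEDED entries of the matching shared object; also made up.
needed = ["libgtk-x11-2.0.so.0", "libc.so.6"]

# Extract bare library names ("cairo", "png", ...) from dependency_libs,
# accepting both -lfoo flags and /path/libfoo.la references.
def la_libraries(la_text)
  line = la_text[/^dependency_libs='([^']*)'/, 1] || ""
  names = line.split.map do |token|
    case token
    when /\A-l(.+)\z/          then Regexp.last_match(1)
    when %r{/lib([^/]+)\.la\z} then Regexp.last_match(1)
    end
  end
  names.compact
end

# Strip the "lib" prefix and the ".so.N" suffix from a SONAME.
def soname_to_name(soname)
  soname.sub(/\Alib/, "").sub(/\.so(\.\d+)*\z/, "")
end

linked   = needed.map { |soname| soname_to_name(soname) }
spurious = la_libraries(la_text) - linked
puts "listed in .la but not linked: #{spurious.join(', ')}"
```

Anything reported here is a library whose bump would flag the .la file as broken even though the binary itself never links to it.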

Dropping the .la file for libcairo (which is where libgtk picks up the libpng reference) should probably make the tree much more resilient to libpng bumps, which have proven to be the nastiest ones. I hope somebody will step up to do so, sooner or later.

Charting and Graphing

You might remember my last post about the Gentoo Portage size, which contained some pie charts showing the proportion of space used by the various pieces that make up Portage itself.

On a related note, yes I know that pie charts are often the wrong tool; I still think that for the use case of that post, the pie chart was the best option because I only had to give proportions and not a comparison; as you’ll see though, I’m going to use it sparingly.

Finding a decent way to plot charts has always been a problem for me. More than once I have wanted to give some graphical representation of improvements and benchmarks, but the tools available all have their own set of quirks, which means that I have to fight with them a lot more than I can afford to:

  • gnuplot (which, by the way, does not make pie charts at all) is tremendously complex, to the point that even the quite good book about it does not make it easy to use if you’re not already an expert in statistical analysis;
  • R, if anything, is even worse, to the point I don’t really want to discuss it!
  • using gruff should allow drawing all the needed charts, provided you can easily represent the values in Ruby; unfortunately it doesn’t work extremely well, and more than once, both for pie charts and bar charts, I found the colours not properly covering one another, with quite shitty results;
  • using Google Docs, with the spreadsheet component, looked almost good, if it weren’t for the fact that lots of people had trouble loading the charts in my previous post; while the Google application is definitely well-designed, especially as far as the user interface and basic functionality are concerned (just as an example, the ability to move the graph to its own dedicated sheet, which I remember being available in Microsoft Excel 97, is not available in OpenOffice, while it is present in Google’s spreadsheet), it also lacks some more “public” features: there is no way to ask for graphs of a given size when exporting (for instance for thumbnails), and at the same time, the auto-generated text in the public, exported chart seems to always be in the locale the generating interface was set to… guess what? I have it in Italian;
  • I ended up reconsidering OpenOffice; it worked great with flowcharts, so I wanted to see the good “old” suite at work on something it was presumably designed for.

Now, since I’m not sure whether I’ll post this before or after the results (I’m writing it before the results’ post, but that doesn’t say much, to be honest, since I have a queue of posts already written, as usual), I cannot really say much about the results themselves, but my area of analysis this time has been the distribution of sizes and overhead with different block sizes (as suggested by Robin). I used Ruby to gather the data, and copied it into a large sheet in Calc (reaching column AL); incidentally, the amount of data to handle is the reason why I didn’t go with Google Docs this time: with Firefox it is definitely too slow to work with; probably it’s designed to be faster with Chrome. Then it was time to condition the data…

OpenOffice definitely has some usability issues in that matter! First of all, when selecting the range of data to plot, there is no easy way to select non-contiguous columns, since once you release the mouse button the interface returns to the chart wizard. The trick is to enter the columns manually, using the form A1:A8,C1:C8 and so on. I again used Ruby to generate the list of columns for me, as otherwise it was definitely a mess… I gave up when I had to redo the graph for the third time because I had missed some columns, so I just used another sheet to copy the information I needed, and then filtered out what I wouldn’t be needing.
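Generating that list of ranges is a few lines of Ruby. This is a sketch of the idea rather than the script I actually used; the method names are made up, and the comma separator simply follows the form quoted above.

```ruby
# Convert a 1-based column index into spreadsheet letters
# (1 => "A", 27 => "AA", 38 => "AL").
def column_letters(index)
  letters = +""
  while index > 0
    index, remainder = (index - 1).divmod(26)
    letters.prepend((65 + remainder).chr)
  end
  letters
end

# Build a range list such as "A1:A8,C1:C8" for the given columns and rows.
def range_list(columns, first_row, last_row)
  ranges = columns.map do |col|
    name = column_letters(col)
    "#{name}#{first_row}:#{name}#{last_row}"
  end
  ranges.join(",")
end

puts range_list([1, 3], 1, 8)       # the example from the text
puts range_list([1, 27, 38], 1, 8)  # columns past Z work as well
```

The result can be pasted straight into the data range field of the chart wizard.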

As I noted above, there is no way, that I could find or Google, to create a sheet that only holds a chart. I’m pretty sure that Microsoft Excel 97 had a feature like that… and I’m definitely certain version 2007 has it (because I have a fully licensed Office 2007 here). Google Docs, as I said, has it as well. The reason I’m upset about the missing feature is that it would have made it a lot easier to export the charts for publication; instead, the only way I found was to copy the chart and paste it into a Draw document. At that point, while the chart was still tweakable to reorder columns and the like, I had to re-tweak it every time I noticed a flaw in the data, since it was disconnected from its original data source.

Another area that OpenOffice definitely has to improve is the handling of colours: wherever you select a colour, you’re not allowed to pick one freely; you have to add it to the OOo palette first… which in turn requires a restart of OOo itself, since sometimes it fails to pick it up in all the instances. This might not be such a huge deal for most users, who just need “a” colour, but it really is upsetting when you know exactly which colour you want. And indeed it reduces the usability of Draw: for a word processor or a spreadsheet, precise colours might not be that important, but for software like Draw (or Impress), the ability to choose an arbitrary colour without having to jump through a long series of hoops is definitely important!

This is definitely something that sometimes upsets me: OpenOffice has almost all the cards ready to be a perfect poker of productivity software, but there is a number, tending toward infinity, of details that need to be fixed (just to quickly add another: the packages, ebuilds included, don’t install the templates by default, and you have to install them from the Sun extensions site, which by the way installs them in your home directory, and I don’t really like that). I really hope that this is going to get fixed in the future, but counting the go-oo split there is really a lot of mess around OpenOffice, as with a lot of other huge projects (OpenJDK/IcedTea, Mozilla/IceCat/IceWeasel, …). Why can’t Free Software developers get along with each other for more than their own itches?

Graphing

If you remember an old post of mine, I was looking for a way to graph some statistics about bindings and related issues. Even though Philipp Janert did send me a PDF copy of his book, I haven’t been able to proceed on that because of my health issues, which set back a lot of stuff in my TODO list (which was long enough at the time).

The other day I wanted to graph something else; in particular, I wanted to graph the EAPI usage in the tree. As a start, a simple pie chart showing the proportion between the three currently used EAPIs was what I had in mind. How difficult could it be to draw a pie chart with free tools, I thought. Well, a bit more difficult than I expected before starting.

For a start, gnuplot does not produce pie charts, and the pie chart tool they suggest is a bit old, rusty and bitrotting. It does not seem to have much drawing logic in it at first glance, though, so I decided to see if I could reimplement it in Ruby. My first choice of target was SVG, but since it’s not so easy to do in SVG (even though Scribus has some documentation about the calculations needed to find the right coordinates for the angles), I ended up doing something much easier and just as interesting: I used Cairo.
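For reference, the calculation that makes the SVG route fiddly looks roughly like this. It is a minimal sketch and not the one from the Scribus documentation; the counts are invented (they are not the real EAPI numbers), and the method names are made up for the example.

```ruby
# Invented counts, just to have three slices to draw.
counts = { "EAPI 0" => 50, "EAPI 1" => 25, "EAPI 2" => 25 }

# Turn counts into [label, start_angle, end_angle] triples (radians),
# starting at 12 o'clock and going clockwise on screen.
def slice_angles(counts)
  total = counts.values.sum.to_f
  angle = -Math::PI / 2
  counts.map do |label, value|
    start = angle
    angle += 2 * Math::PI * value / total
    [label, start, angle]
  end
end

# SVG path data for one slice of a circle centred at (cx, cy), radius r:
# move to the centre, line to the arc start, arc to the arc end, close.
def slice_path(cx, cy, r, start, finish)
  x1 = cx + r * Math.cos(start)
  y1 = cy + r * Math.sin(start)
  x2 = cx + r * Math.cos(finish)
  y2 = cy + r * Math.sin(finish)
  large_arc = (finish - start) > Math::PI ? 1 : 0
  format("M %.2f %.2f L %.2f %.2f A %.2f %.2f 0 %d 1 %.2f %.2f Z",
         cx, cy, x1, y1, r, r, large_arc, x2, y2)
end

slice_angles(counts).each do |label, start, finish|
  puts "#{label}: #{slice_path(100, 100, 80, start, finish)}"
end
```

Each printed string can go into the d attribute of an SVG path element. A single slice covering the whole circle would need special handling, since its start and end points coincide.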

Cairo is a very nice library that somehow reminds me of LOGO, since it also accepts movement and draw commands. I used the Ruby bindings; I could just as well have used the C interface to do the whole thing, but since I don’t need the speed, Ruby works just fine for me. The basic results are acceptable, even though writing text with Pango and Cairo is slightly a mess. I guess it’d be easier to implement some less-primitive graphing support over Cairo, but that’s beside the point here (hey, does anybody know if anybody has tried to write a LOGO interpreter using Cairo for display? It would make a nice example of its usage for matters of average complexity, and I would like to have something like that for my nephew to start playing with a computer).

After writing a very rough script that only handled what I needed, I thought that the easiest way to handle this would be to write a library that could create the pie chart for me, starting from the data, using just Ruby and Cairo. But today, while looking at something entirely different, I found gruff, which seems to do almost what I need. It uses RMagick (and thus ImageMagick) instead of Cairo for the primitives though, and it doesn’t support SVG output yet (which I use to make the thumbnails and would love to have as source for the images).

This brings me to a quite interesting idea for relaxing over the next few days, while shifting almost everything else down the TODO list: porting gruff so that it uses Cairo instead of RMagick. It’s a bit of work, but the primitives are more or less the same (maybe I could just write an RMagick workalike using Cairo), and it would have the nice advantage of SVG and PDF support out of the box.

My main reason for wanting to write this is that it would make my bindings’ grapher script much easier, since then I’d just have to graph something like this:

{ "First try"  => { "Module 1" => [ 12, 144 ], "Module 2" => [ 13, 12445 ] },
  "Second try" => { "Module 1" => [ 0, 12 ], "Module 2" => [ 0, 134 ] } }

in a viewable graph to show the differences.
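Before graphing, a structure like that would probably need regrouping by module, so that the runs of the same module end up next to each other. A minimal sketch, assuming the hash shape shown above (written with balanced braces) and treating the two numbers in each pair as opaque values:

```ruby
# Same shape as the hash in the text: run => module => pair of values.
data = {
  "First try"  => { "Module 1" => [12, 144], "Module 2" => [13, 12445] },
  "Second try" => { "Module 1" => [0, 12],   "Module 2" => [0, 134] }
}

# Regroup run => module => values into module => run => values, which is
# the shape a side-by-side comparison needs (one group per module).
def by_module(data)
  result = Hash.new { |hash, key| hash[key] = {} }
  data.each do |run, modules|
    modules.each { |mod, values| result[mod][run] = values }
  end
  result
end

by_module(data).each do |mod, runs|
  cells = runs.map { |run, values| "#{run}: #{values.join('/')}" }
  puts "#{mod}  #{cells.join('  ')}"
end
```

The by_module name is made up for the example; any charting library would then only have to walk the regrouped hash.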

And similarly, it would help me show other ELF-related statistics, so that what I write on the blog wouldn’t be just numbers any longer, and could be noticed by people. If only there were an easy way to interface Ruby and Python, it would be nice to have access to Portage’s information from my Ruby scripts...

Oh well, a new project to start, maybe.

Dear lazyweb, I need a gnuplot expert

And I end up asking again for help from whoever is around; this time I’m looking for a gnuplot expert :)

As solar suggested, I wanted to prepare a few graphs to show the changes with visibility, with fixing COW pages, and so on. While performance analysis through benchmarks would probably be a good idea too, I wanted to start with something easier, maybe less interesting, but that would help me present the issues with better visual impact. This way the more important stuff will not just look like crap and be dismissed ;)

For now I’ve modified my parser for LD_DEBUG=bindings output to generate data that could be represented visually by gnuplot. Or at least I hope I’ll be able to make it be represented visually by gnuplot.

I’d like to have some clustered, row-stacked histograms: the graph would divide all the objects in a particular program (the shared objects) into clusters of histograms, each containing N histograms for the N runs to compare (no visibility, hidden/default visibility, hidden/protected visibility); each histogram would then have its height split in three (outgoing bindings, incoming bindings, self-fulfilled bindings) to show the changes in those.
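To give an idea of the layout I have in mind, here is a rough Ruby sketch that emits one data block per object and a matching set of gnuplot commands. The object names, run names and counts are all invented, and the gnuplot side reflects my understanding of its rowstacked histogram support (newhistogram, index, xtic) rather than a tested solution.

```ruby
# Made-up binding counts, only to show the layout:
# object => run => [outgoing, incoming, self-fulfilled].
bindings = {
  "libfoo.so" => { "none"      => [120, 80, 300],
                   "default"   => [110, 75, 40],
                   "protected" => [110, 75, 0] },
  "libbar.so" => { "none"      => [60, 20, 90],
                   "default"   => [55, 18, 10],
                   "protected" => [55, 18, 0] }
}

# One data block per object, one row per run. Blocks are separated by two
# blank lines so that gnuplot can select them with "index N".
def data_blocks(bindings)
  blocks = bindings.values.map do |runs|
    runs.map { |run, counts| ([run] + counts).join(" ") }.join("\n")
  end
  blocks.join("\n\n\n") + "\n"
end

# One "newhistogram" cluster per object; columns 2-4 are stacked in each bar.
def plot_clauses(objects, datafile)
  objects.each_with_index.map do |object, block|
    series = [
      "'#{datafile}' index #{block} using 2:xtic(1) notitle",
      "'' index #{block} using 3 notitle",
      "'' index #{block} using 4 notitle"
    ]
    %(newhistogram "#{object}" lt 1, #{series.join(', ')})
  end
end

def plot_script(objects, datafile)
  [
    "set style data histograms",
    "set style histogram rowstacked",
    "set style fill solid border -1",
    "set xtics rotate by -45  # rotated labels, in case they overlap",
    "plot " + plot_clauses(objects, datafile).join(", \\\n     ")
  ].join("\n") + "\n"
end

puts data_blocks(bindings)
puts plot_script(bindings.keys, "bindings.dat")
```

The first part of the output is meant to be saved as bindings.dat (a hypothetical file name) and the second part fed to gnuplot; each cluster is one object, each bar one run, and each bar is split into the three kinds of bindings.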

Unfortunately, what I have up to now shows the graphics just fine, but the labels for the various objects are unreadable. If you want, you can get the package with the data and the script here.

If somebody can help me to get these data graphed in a decent way… that would be very helpful :)