Users: why you should read my blog

I’ve been noticing a drop in feedback since I started writing about the tinderbox and the work going on there. I can, from some points of view at least, understand it, since it really does not say much that is useful for end users to know. Indeed it’s often technicalities, or ebuild guidelines, or QA information: all stuff that the average user of any kind of software likely doesn’t care about.

So why should Gentoo users read my blog? Why should developers, for that matter? Well, I’m not sure I can properly answer these questions; after all, even I don’t know why I do it, so how can I explain to someone else why what I do is important?

Well, I don’t know why I do what I do, but I know what I do. Okay, this is a very intricate phrase, but the bottom line is that I know the topics I write about are important and should really be properly considered. When I speak about the problems I face with the tinderbox, I speak about problems with real ebuilds, ebuilds that, one way or another, got into the tree. By talking about the problems with them, I’m trying to explain what new ebuilds should look like so they don’t rot too easily.

Especially with the widespread usage of sunrise, and the lowered bar for ebuilds to be picked up from there, power users wanting to write their own ebuilds should try to learn from the mistakes of the developers that came before: live ebuilds for static revisions, gems for Ruby packages, and the like are all things that new ebuilds should avoid like the plague. Unfortunately that doesn’t seem to happen.

There are new ebuilds added with bundled libraries, abused blockers, broken autotools, and so on and so forth. While I know that some of what I write about should just be added to the official documentation, I’m afraid I don’t have the time. My blog, which I used to use as reference documentation, is starting to get too big, too chunky, too complex to maintain for that use; I know that, and that’s why I started working on my autotools guide (and I’m tempted to transpose For A Parallel World into an appendix of that guide). But integrating the QA notes into the official documentation is a task I cannot embark on right now.

So please, even though I know my titles don’t look tremendously appealing in general, try to give them a read; I’ll try my best to make them more exciting… maybe somebody can suggest an ebuild for me to review, or some package that needs --as-needed or parallel make fixes, to write a case study about.

Buying sleep (by the minute)

One of the worst things that can happen in somebody’s life is when your dreams scare you out of your own sleep. As it turns out, I’m in one of those situations. A nice period of my life ended just before Christmas, and now I’m in a bit of a pinch, with a late job and no future (stable) job in view. I’m also out of luck with publishers, since the last article I submitted to LWN was not even worth a reply, it seems.

I should at least be happy about my health, one would expect, given that I am feeling better after the surgery and I just need to visit the hospital for some check-ups now. But even that is off schedule, since I was supposed to go in during January, and it’s the middle of February now. The professor I had to reach is unreachable, so I had to go through another doctor on the staff (whom I’m very grateful to for my previous stay too!).

But as they say in Italy, “one Pope dies, a new one is made”; I admit I’m not sure what the English equivalent would be, but I’d expect it to refer to kings.

I’m currently in quite a bluish mood, but it’s going to be just fine as soon as I get some good nights’ sleep; relaxed sleep. The problem, as I said, is that my own dreams, or rather the content and the characters of the dreams I’m having lately, chase me out of bed. Even though I cannot remember the dreams themselves, their general mood follows me when I wake up and, even though they should be pleasant dreams, they upset me very much.

Luckily, I have learnt to fight dreams, and nightmares, since my stay in the hospital. My way of keeping them away from my mind is to listen to something that turns my attention to something much different just before sleeping. Podcasts have helped a lot with that, but sometimes I need more, longer content I haven’t listened to before. This is especially true when, like right now, Bill Maher is not on HBO, so I cannot listen to new episodes of the Real Time podcast. For these times I corrupt a bit of my soul and buy audiobooks from the iTunes Store, yes, with the freedom-hungry DRM on.

I was thus quite pleased when an anonymous donor sent me The Hitchhiker’s Guide to the Galaxy CDs from BBC Radio (and I have to say I envy British people for BBC Radio 4; News Quiz is one of my favourite shows). It did sprout a technical problem for me, though: how to convert the CDs into a format that makes use of 100% of the iPod’s features, using just Free Software? I’m afraid I’m unable to answer that question just yet, but I hope to be able to soon. Also thanks to the (for now unknown, since it hasn’t arrived yet) person who sent me “I’m Sorry I Haven’t a Clue” CDs. I’m not sure what it is, but I find British humour refreshing. Yes, I know this is neither normal nor sane…

The problem is that, the way this is going, I’m unable to rest even when I sleep, and thus I cannot work for more than a few hours on Free Software without my head starting to ache. And it’s difficult to sleep in the first place. While I would like to try cutting down on coffee, it turns out that I’m addicted to caffeine to the point that, twice already in the past three weeks, when I tried to go a day without any I got a migraine so powerful I was unable to crawl out of bed.

Anyway, just so you know: even if I haven’t blogged about it in a while, nor have I opened new bugs, the tinderbox (or tinderflame, to distinguish it from Patrick’s) is still working and crunching data. The new disks do help, since there was one (I’m afraid I know which one; I’ll write about it specifically in the future) that would make the system get stuck on pdflush, which as you might guess is not the nicest of things. Now it seems to be working better.

Anyway, if you wish, I made a special list to see if I can solve my sleep deprivation (although I’m already waiting on a few things I ordered myself, so I should be set for a while), but even more importantly, there are two things I’m going to ask of users and developers reading me alike.

If you’re a user, try to raise concerns with upstream projects about problems like proper --as-needed usage, parallel builds and the like. I know my blog isn’t exactly the nicest place to look up information, but it should have enough to go around for issues like those. Any upstream package that fixes parallel make, --as-needed or autotools by itself is one less package I’ll have to look at when I decide to push forward my agenda of having proper packages around.

If instead you’re a developer, please help me by at least reviewing what I write, correcting me if needed, and especially submitting patches to my projects if you see they are wrong or incomplete. Having people collaborate on my projects is one thing I always miss.

Have you seen some gold?

Since my TODO list includes work on two binutils problems (the warning on the softer --as-needed, and the fix for the PulseAudio build), I also started wondering why I haven’t heard, or rather read, anything about the gold linker.

Saying that I’m disappointed does not really cover much of it, to be honest, since I don’t really wish to switch to a linker written in C++ any time soon. But I really hoped it would generate enough momentum to find a solution. Because, yes, the ld linker that ships with binutils is tremendously slow at linking C++ code, and as Linkers & Loaders has let me understand, the problem is not just the length of the (mangled) symbol names, but also the way templates are expanded and linked together.

But still, I think it’s really worth investigating some alternative, which in my opinion need not be written in C++, with all the problems related to that. Saying that the gold linker is fast just because of the language it is written in is absolutely naïve, since the problems lie quite a bit deeper than that.

The main problem is that the current ld implementation is based, like the rest of the binutils tools, upon libbfd, an abstraction that supports multiple binary formats, not just ELF. It basically allows using mostly the same interface on different operating systems with different executable formats: ELF under Linux, BSD and Solaris, Mach-O under Mac OS X, PE under Windows, and more. While this makes for a much more powerful ld command, it’s also a bit of a bottleneck.

Even though the thing is designed well enough not to crumble easily, it is probably a good area to investigate to find out why ld is so slow. Having an alternative, ELF-only linker available for users, Gentoo users especially, would likely be a good test. This would follow what Apple does on OS X (GCC calls Apple’s linker) as well as what Sun does under Solaris with their copy of GCC.

While I’m all for generic code, sometimes you need specialised tools if you want to access advanced features of files, or if you want fast, optimised software.

The same can be said for the analysis tools provided by binutils: as I’ve written in my post about elfutils, the nm, readelf and objdump tools provided by binutils, being generic, lack some of the useful defaults and the different interface that elfutils has. Which goes to show why specialised tools could help here. I know that FreeBSD was working on providing replacements for these tools, under the BSD licence as usual for them. While that’s certainly an important step, I don’t remember reading anything about a new linker.

As it is, I haven’t gone out of my way to see if there are already alternative linkers that work under Linux, besides the one provided by Sun’s compiler in Sun Studio Express (which has lots of problems of its own). If there is already one, we should look at how it stands feature-wise.

What we want from a specialised linker, besides speed, is proper support for the .gnu.hash section, --as-needed-like features, no text relocations emitted in the code (a problem gold used to have, at least), and possibly better support for garbage collection of unused sections, which could allow using -fdata-sections and -ffunction-sections in production code without the huge performance impact they seem to have now.
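Just to make the garbage collection point concrete, this is how section-based collection is driven with the current GNU toolchain; a minimal sketch, with a made-up source file containing one unused function:

% cat gc.c
int used(void) { return 1; }
int unused(void) { return 2; }
int main(void) { return used(); }
% gcc -ffunction-sections -fdata-sections -Wl,--gc-sections gc.c -o gc
% nm gc | grep unused
%

With each function in its own section, the linker can discard .text.unused entirely, and the symbol disappears from the final executable; the concern above is that compiling with those two flags has a performance cost of its own in production code.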

I’m not going to work on this myself, but if somebody is interested in my opinion about using any particular linker in Gentoo, I’d be glad to look at it; I’m not going to spare words, though, just so you know.

RDEPEND safety

I’m hoping this post is going to be useful for all the devs and devs-to-be who want to be sure their ebuilds have proper runtime dependencies. It sprouted from the fact that at least a few developers seem to have been oblivious to the implications of what I’m going to describe (which I described briefly on gentoo-core a few days ago, without any response).

First of all, I have to put my hands forward and say that I’m going to focus just on binary ELF packages, and that this is far from a complete check for proper runtime dependencies. Scripting code is much more difficult to check, while Java is at least somewhat simpler thanks to the Java team’s script.

So you’ve got a simple piece of software that installs ELF executable files or shared libraries, and you want to make sure all the needed dependencies are listed. The most common mistake is to check the link chain with ldd (which is just a special way to invoke the loader, dumping out the loaded libraries). This will most likely show you a huge amount of false positives:

yamato ~ # ldd /usr/bin/mplayer
    linux-gate.so.1 =>  (0xf7f8d000)
    libXext.so.6 => /usr/lib/libXext.so.6 (0xf7eec000)
    libX11.so.6 => /usr/lib/libX11.so.6 (0xf7dfd000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7de5000)
    libXss.so.1 => /usr/lib/libXss.so.1 (0xf7de1000)
    libXv.so.1 => /usr/lib/libXv.so.1 (0xf7ddb000)
    libXxf86vm.so.1 => /usr/lib/libXxf86vm.so.1 (0xf7dd4000)
    libvga.so.1 => /usr/lib/libvga.so.1 (0xf7d52000)
    libfaac.so.0 => /usr/lib/libfaac.so.0 (0xf7d40000)
    libx264.so.65 => /usr/lib/libx264.so.65 (0xf7cae000)
    libmp3lame.so.0 => /usr/lib/libmp3lame.so.0 (0xf7c37000)
    libncurses.so.5 => /lib/libncurses.so.5 (0xf7bf3000)
    libpng12.so.0 => /usr/lib/libpng12.so.0 (0xf7bcd000)
    libz.so.1 => /lib/libz.so.1 (0xf7bb9000)
    libmng.so.1 => /usr/lib/libmng.so.1 (0xf7b52000)
    libasound.so.2 => /usr/lib/libasound.so.2 (0xf7a9a000)
    libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xf7a13000)
    libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0xf79e6000)
    libmad.so.0 => /usr/lib/libmad.so.0 (0xf79cd000)
    libtheora.so.0 => /usr/lib/libtheora.so.0 (0xf799b000)
    libm.so.6 => /lib/libm.so.6 (0xf7975000)
    libc.so.6 => /lib/libc.so.6 (0xf7832000)
    libxcb-xlib.so.0 => /usr/lib/libxcb-xlib.so.0 (0xf782f000)
    libxcb.so.1 => /usr/lib/libxcb.so.1 (0xf7815000)
    libdl.so.2 => /lib/libdl.so.2 (0xf7810000)
    /lib/ld-linux.so.2 (0xf7f71000)
    libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xf77ef000)
    librt.so.1 => /lib/librt.so.1 (0xf77e6000)
    libexpat.so.1 => /usr/lib/libexpat.so.1 (0xf77bf000)
    libogg.so.0 => /usr/lib/libogg.so.0 (0xf77b9000)
    libXau.so.6 => /usr/lib/libXau.so.6 (0xf77b4000)
    libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xf77ae000)

In this output you can see, for instance, the XCB libraries and Expat listed, so you could assume that MPlayer depends on those. It really doesn’t, though: they are just indirect dependencies, which the loader has to load anyway. To avoid being fooled by that, the solution is to check the file itself for the DT_NEEDED entries in the .dynamic section of the ELF file. This can be achieved by checking the output of readelf -d, or much more quickly by using scanelf -n:

yamato ~ # scanelf -n /usr/bin/mplayer
 TYPE   NEEDED FILE 
ET_EXEC libXext.so.6,libX11.so.6,libpthread.so.0,libXss.so.1,libXv.so.1,libXxf86vm.so.1,libvga.so.1,libfaac.so.0,libx264.so.65,libmp3lame.so.0,libncurses.so.5,libpng12.so.0,libz.so.1,libmng.so.1,libasound.so.2,libfreetype.so.6,libfontconfig.so.1,libmad.so.0,libtheora.so.0,libm.so.6,libc.so.6 /usr/bin/mplayer 

As you can see here, MPlayer does not use any of those libraries directly, which means they should not be in MPlayer’s RDEPEND. There is, though, another common mistake to look out for. If you don’t use --as-needed (especially not forcing it), you’re going to get indirect and misguided dependencies recorded. So you can only trust DT_NEEDED when the system has been built with --as-needed from the start; this is not always the case, and thus you can get polluted dependencies. And given that the linker now silently ignores --as-needed on broken libraries, this is likely to create a bit of a stir.
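To see the pollution for yourself, here is a minimal sketch: a trivial program linked against a library it never uses (Expat in this example) still records it in DT_NEEDED, unless --as-needed is among the flags (if your compiler forces --as-needed through the specs like mine does, prepend GCC_SPECS="" to the first gcc call):

% echo 'int main(void) { return 0; }' > trivial.c
% gcc trivial.c -o trivial -lexpat
% scanelf -n trivial
 TYPE   NEEDED FILE 
ET_EXEC libexpat.so.1,libc.so.6 trivial 
% gcc -Wl,--as-needed trivial.c -o trivial -lexpat
% scanelf -n trivial
 TYPE   NEEDED FILE 
ET_EXEC libc.so.6 trivial 

The same happens, on a much bigger scale, with the overlinking that libtool and badly written configure checks introduce.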

One of the entries in my ever-so-long TODO list (explicit requests for tasks alongside donations help, just so you know) is to write a Ruby-Elf based script that can check the dependencies without requiring the whole system to be built with --as-needed. It would probably be a lot like the script that Serkan pointed me at for Java, but for ELF files.

Once you’ve got the dependencies as seen by the loader right, though, your task is not complete yet. A program has more dependencies than it might appear to have, since it might require data files to be opened, like icon themes and similar, but also more important dependencies in the form of other programs or libraries. And that is not always obvious. While you can check whether the software uses the dlopen() interface to dynamically load further libraries (a quick check is sketched below), that is not going to tell you much, and you have to check the source code. A program can also call another by way of the exec family of functions, or through system(). And even if your program does not call any of these functions, you cannot be sure that you’ve got the complete dependencies right without opening it up.
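The quick check I mean is nothing more than looking for an undefined reference to dlopen among the dynamic symbols; it is a rough first pass, not proof either way (the program name here is of course just an example):

% nm -D /usr/bin/someprogram | grep dlopen
                 U dlopen

A match means you have to go read the sources to find out which plugins or libraries get loaded at runtime; no match is still no guarantee, for the reasons that follow.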

Part of the reason is that libraries add indirection to these things too. The gmodule interface in glib allows for dynamically loading plugins, and can actually load plugins you don’t see and check, and Qt (used to) provide a QProcess class that allows executing other software.

All in all, even for non-scripting programs, you really need to pay attention to the sources to be safe that you’ve got your dependencies right, and you should never, ever rely purely on the output of a script. Which is another reason why I think most work in Gentoo cannot be fully automated, not just yet at least. At any rate, I’m hoping to provide developers with a usable script one day soonish; at least it’ll be a step closer than we are now.

The battles and the losses

In the past years I have picked up more than a couple of “battles” to improve Free Software quality all over. Some of these were controversial, like --as-needed, and some of them have been just lost causes (like trying to get rid of strict C++ requirements on server systems). All of those, though, were fought with the hope of improving the situation all over, and sometimes the few accomplishments were quite a satisfaction by themselves.

I always thought that my battle for --as-needed support was going to be controversial because it makes a lot of software require fixes, but strangely enough, that amount has been reduced a lot. Most newly released software works out of the box with --as-needed, although there are some interesting exceptions, like GhostScript and libvirt. Among the positive exceptions there is, for instance, Luis R. Rodriguez, who made a new release of crda just to apply an --as-needed fix for a failure that was introduced in the previous release. It’s very refreshing to see that nowadays maintainers of core packages like these are concerned with these issues. I’m sure that when I started working on --as-needed, nobody would have made a new point release just to address such an issue.

This makes it much more likely for me to work on adding the warning to the new --as-needed, and even more needed for me to find out why ld fails to link the PulseAudio libraries even though I’d have expected it to.

Another class of changes I’ve been working on that has drawn more interest than I would have expected is my work on cowstats, which, for the sake of self-interest, formed most of the changes in the ALSA 1.0.19 release for what concerns the userland part of the packages (see my previous post on the matter).

On this note, I wish first to thank _notadev_ for sending me Linkers and Loaders, which is going to help me improve Ruby-Elf more and more; thanks! And since I’m speaking of Ruby-Elf, I finally decided its fate: it’ll stay. My reasoning is that, first of all, I was finally able to get it to work with both Ruby 1.8 and 1.9 by adding a single thin wrapper (which is going to be moved to Ruby-Bombe once I actually finish that), and most importantly, the code is there; I don’t want to start from scratch, there is no point in that, and I think that both Ruby 1.9 and JRuby can improve from each other (the first by losing the Global Interpreter Lock and the other by trying to speed up its start time). And I could even decide to find time to write a C-based extension, as part of Ruby-Bombe, that takes care of byteswapping memory, maybe even using OpenMP.

Also, Ruby-Elf has been serving its time a lot with the collision detection script, which is hard to move to something different since it really is a thin wrapper around PostgreSQL queries, and I don’t really like dealing with SQL in C. Speaking of the collision detection script, I stand by my conclusion that software sucks (but proprietary software stinks too).

Unfortunately, while there are good signs on the issue of bundled libraries, like Lennart’s concern with the internal copies of libltdl in both PulseAudio (now fixed) and libcanberra (also staged for removal), the whole issue is not solved yet; there are still packages in the tree with a huge amount of bundled libraries, like Avidemux and Ardour, and more scream to enter (and thankfully they don’t always make it). If you’d like to see the current list of collisions, I’ve uploaded the LZMA-compressed output of my script. If you want, you can clone Ruby-Elf and send me patches to extend the suppression files, to remove further noise from the file.

At any rate, I’m going to continue my tinderboxing efforts while waiting for the new disks, and work on my log analyser again. The problem with that is that I really am slow at writing Python code, so I guess it would be much easier to reimplement, in Ruby, the few extra functions that I’m using out of Portage’s interface, or to find a way to interface with Portage’s Python interface from Ruby. This is probably a good enough reason for me to stick with Ruby; sure, Python can be faster, and sure, I can get better multithreading with C and Vala, but it takes me much less time to write these things in Ruby than it would take me in any of the other languages. I guess it’s a problem of mindset.

And on the other hand, if I have problems with Ruby I should probably just find time to improve the implementation; JRuby is evidence enough that my beef against the Ruby 1.9 runtime not supporting multithreading is an implementation issue and not a language issue.

A softer --as-needed

Following my previous blog post about unreleased autoconf, I wanted to write a bit about an unreleased change in binutils’ ld that Sébastien pointed me at a few days ago. Unfortunately, since things piled up, the code is now actually in a release, and I briefly commented about it in the as-needed-by-default bug. The change is only in the unkeyworded snapshot of pre-2.20 binutils, though, so it hasn’t reached users yet, which makes it worth commenting on beforehand anyway.

The change is as follows:

--as-needed now links in a dynamic library if it satisfies undefined symbols in regular objects, or in other dynamic libraries. In the latter case the library is not linked if it is found in a DT_NEEDED entry of one of the libraries already linked.

If you know how --as-needed works and the ELF-related terms, you should already be able to guess what it’s actually doing. If you’re not in the know, you should probably read my old post about it again. Basically, the final result is that the first situation:

Messy linking diagram

gets expanded in the wished linking situation:

Hoped linking situation

instead of the broken one that wouldn’t work.

This is all good, you’d expect, no? I have some reservations about it. First of all, the reason for this change is to accommodate the needs of virtual implementation libraries like blas and similar. In particular, the thread refers to the requirement of gsl not to link its blas implementation, leaving that to the user linking the final application. While I agree that’s a desired feature, it has to be noted that all the implementation libraries need to keep the same ABI, otherwise just changing the linker call is not going to work. Which means that you can technically change the implementation by using the LD_PRELOAD environment variable to interpose the new symbols at runtime, allowing the implementation to be switched at runtime without having to relink anything.
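Just to illustrate the interposition idea, a run would look something like this; the library path is made up, any ABI-compatible blas build would do:

% LD_PRELOAD=/usr/lib/blas/atlas/libblas.so ./my-gsl-program

Since preloaded objects are searched before the libraries the program was linked against, their symbols win, and the default implementation never gets used.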

Of course, using LD_PRELOAD is not very handy, especially if you want to do it on a per-command basis or anything like that. But one could probably wonder, “Why on Earth didn’t someone think of a better method for this before?” and then answer, after a bit of searching, “Someone already did!”. Indeed, a very similar situation arose in the FreeBSD 5 series, since there were multiple PThread implementations available. Since the ABI of the implementations is the same, they can be switched both at link editing time and at runtime linking. And to make it easier to switch at runtime, they created a way to configure it through the /etc/libmap.conf file.
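For reference, libmap.conf entries are simple origin/replacement pairs, optionally scoped to a single executable; something along these lines (the library version numbers vary between FreeBSD releases, so take this as a sketch):

# system-wide: use the libthr implementation
libpthread.so.2  libthr.so.2

# ... except for one program that stays on KSE
[/usr/local/bin/someprogram]
libpthread.so.2  libkse.so.2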

Indeed, the method used before libmap.conf was introduced, to choose between different PThread implementations under FreeBSD, was the same one gsl wants to use. The result of which already showed that --as-needed was unusable on FreeBSD because of a similar problem: libpthread was never added to the dependencies of any library, and was supposed to be linked into the final executable, which might have no requirement for PThread by itself.

So basically the whole reasoning for softening up --as-needed is to work around a missing feature in the GNU runtime linker. Which is what, to me, makes it wrong. Not wrong in the sense of being the wrong thing to do, but done for the wrong reason. But it’s not that simple. Indeed, this change means that there will be many fewer build failures with --as-needed, making it much, much more likely to become part of the default options of the compiler once binutils 2.20 is released. On the other hand, I think I’ll submit a patch for ld to warn when the new code is triggered.

My reasoning is quite simple: libraries should, as much as possible, be linked completely, so that all their dependencies are well stated, especially since leaving them to the final executable to link can create a huge mess. Think of what happens if the final executable is linked on a system where a different, ABI-incompatible dependency is present: the executable will have subtle problems at runtime, like unexpected crashes and the like. Also, if ten executables need a single library which forgets to state its dependency on, just as an example, libexpat, you get ten links to libexpat that need to be re-created (while the original library will not be picked up at all, by the way, so it will still expect the ABI of the previous version), rather than just one.

Since the softer --as-needed does indeed make it much simpler to enable it by default, I don’t think it’s a good idea to revert the new behaviour; but having a warning saying something like “-lexpat salvaged for -lfoo” would make it easy to identify the issue and assess, on a case-by-case basis, whether it is an intended situation or just a bug, so that the latter can be corrected.

On the other hand I also have a case of failure with recursive linking, coming out of the next PulseAudio release, which I need to get fixed, hopefully before PulseAudio is released.

Multiple mini libraries, --as-needed and wrappers

In a previous post of mine, Mart (leio) advocated the use of misdirected linking to enable splitting the non-RSS-hungry libxml2 modules from the ones that create a lot of dirty pages; his concern is very real and I can feel it very well, since libxml2 is indeed a bit of a memory-hungry library. On my Firefox instance it reports this:

     vmsize   rss clean   rss dirty   file
      32 kb        0 kb       32 kb   /usr/lib64/libxml2.so.2.7.2
       8 kb        0 kb        8 kb   /usr/lib64/libxml2.so.2.7.2
    1396 kb      336 kb        0 kb   /usr/lib64/libxml2.so.2.7.2

While it is shared, it still has 336KiB of resident memory, which is not too bad but not too good either, after all. But how would one split that library? Well, you have to know the libxml2 interface a bit to understand this fully, so let’s just say that libxml2 has a modular design, and it offers a series of interfaces that are more or less tied together.

For instance, for my daily job I had to write a proprietary utility that uses libxml2’s XPath interface as well as the writer module, which allows for easy writing of XML files with a very nice interface (the work was done under Windows; building and using libxml2 was much easier than trying to get Codegear’s parser to work, or to interface with Microsoft’s MSXML libraries). I disabled everything that was not needed for this to work, and reduced libxml2 to the minimum amount of needed code.

Software that only needs parsing wouldn’t need the writer module, and not all of it would require DOM, SAX or push parsing, or XPath and XPointer, and so on and so forth. To be able to disable the extra stuff there is a series of ./configure flags (sketched below), but mapping those to USE flags is not really feasible, since you’d be breaking the ABI; plus, a solution should be found with upstream, in my opinion.
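Just to give an idea of the granularity involved, this is roughly how I trimmed down the Windows build mentioned above; the flag names are from memory, so check ./configure --help for the exact set:

% ./configure --with-minimum --with-output --with-writer --with-xpath

--with-minimum turns everything off, and the flags after it selectively re-enable the interfaces that are actually used.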

So what Mart suggested was breaking the library in half, with a “mini” version holding the non-memory-hungry interfaces, and the rest elsewhere. My proposal would be much bigger, breaking the ABI a lot, but also very, very exhaustive: break up libxml2 into a series of small libraries, each representing one interface. Software needing one of them would link it in and be done with it. Besides breaking the ABI, though, this would also break all the software using libxml2, even after rebuilding it, which is very, very bad. Well, the solution is actually much easier:

OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( AS_NEEDED ( libxml2-sax2.so libxml2-schemas.so libxml2-schematron.so libxml2-writer.so libxml2-xpath.so .... ) )

This is an ldscript, which tells the linker what to do; save it as libxml2.so, and linking with -lxml2 will pull in just the libraries required for the interfaces used by the program. If you look at your /usr/lib, you’ve already got quite a few of these, because Gentoo installs them for the libraries that are moved into /lib. This works around the inability to use misdirected linking for wrappers.

Now, of course, this trick does not work with every linker out there; but it works with GNU ld and with Sun’s linker, and those are the two for which --as-needed makes sense. If libxml2 were to break itself into multiple libraries, it could decide, depending on a configure option, whether to install the ldscript wrapper or a compatibility library for linkers that do not understand AS_NEEDED, so that Linux, FreeBSD and Solaris (and others) would use the ldscript without adding further ELF files, and the rest would go with a compatibility method.

Please also note that using pkg-config for library discovery would make this even easier, without needing wrappers at all, as libxml2.pc would just have to list all the interfaces in its Requires: line.
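A hypothetical libxml2.pc for the split-up library could then look something like this (the module names are invented, matching the ldscript sketch above):

prefix=/usr
includedir=${prefix}/include

Name: libxml2
Description: XML parsing library, wrapper module
Version: 2.7.2
Requires: libxml2-sax2 libxml2-schemas libxml2-writer libxml2-xpath
Cflags: -I${includedir}/libxml2

pkg-config would then resolve the per-interface libraries by itself, and no wrapper, ELF or ldscript, would need to be installed at all.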

The disk problem

My first full tree build with dependencies, to check for --as-needed support, has almost finished. I currently have 1344 bugs open in the “My Bugs” search, which contains reports for packages failing to build or breaking with --as-needed, packages failing to build for other reasons, packages with file collisions that lack blockers around them (there are quite a lot, even between totally unrelated packages), and packages bundling internal copies of libraries such as zlib, expat, libjpeg, libpng and so on.

I can tell you, the number of tree packages not following policies, such as respecting user LDFLAGS, not using bundled libraries, and not installing stuff randomly in /usr, is much higher than one might hope.

I haven’t even started filing bugs for pre-stripped packages, since I have to check whether those have been filed already, by me in a previous run, by Patrick with his tinderbox, or by other people. I also wanted to check this against a different problem: packages installing useless debug information under splitdebug by not passing -g/-ggdb properly to the build system, and thus not including debug information at all. Unfortunately, for this one I need much more free space than I have right now on Yamato. And here start my disk problems.

The first problem is space; I allocated 75GB for the chroots partition, which uses XFS, after extending it a lot; with a lot of packages still missing, I’m reaching into the last 20GB free. I’ll have to extend it more, but to do that I have to get rid of the music and video partitions, after moving them to the external drive that Iomega replaced for me (now running RAID1 rather than JBOD, and HFS+, since I want to share it with the laptop if I need the data while Yamato is off). I also will have to get rid of the Time Machine volume I created in my LVM volume group, and start sharing the copy on the external drive; I set that up so the laptop was still backed up while I waited for the replacement disk.

The distfiles directory has grown past 61GB of data, and this does not include most of the fetch-restricted packages. Of course I already share it between Yamato’s system and all the chroots, like I share the actual synced tree (by the way, I currently have it as /var/portage/distfiles, but I’m considering moving it to /var/cache/portage/distfiles, since that seems to make more sense; maybe I should propose it as the actual default in the future, as using /usr for this does not sound kosher to me). Still, it is a huge amount of data.

Also, I’m not using in-RAM builds, even though I have 16GB of memory in this box. There are multiple reasons for this; the first is that I leave the build running even when I’m doing something else, which might require RAM by itself, and I don’t want the two to disrupt each other so easily. Also, I often go away to watch movies, play games or something while it builds, so I have to look back at the build even a day later; and sometimes colleagues ask me to look at a particular build that happened a few days earlier. Having the build on disk helps me a lot here, especially for the epatch, eautoreconf and econf logs.

Another reason is that the ELF scanning process that scanelf uses is based on memory-mapped files, which is very nice when you have to run a series of scanelf calls on the same set of files, since the first run will cache all of them in memory and the following ones will just have to traverse the filesystem to find them. So I want to keep as much memory free as I can.

So in the end the disks get used a lot, which is not very nice, especially since they are the disks that host the whole system for now. I am starting to fear for their health, and I’m looking for a solution, which does not seem to be too obvious.

First of all, I don’t want to go buying more disks; if possible I’d rather not buy any new hardware for now, since I haven’t finished paying for Yamato yet (even though quite a few users contributed, whom I thank once again; I hope they’re happy to know what Yamato’s horsepower is being used for!), so any solution has to be realised using what I already have in the house, or needs to be funded somehow.

Second, speed is not much of an issue, although it cannot be entirely ignored; the build reached sys-power today at around 6pm, and it started last Friday, so I have to assume that a full build, minus KDE4, is going to take around ten days. This is not optimal yet, since kde-base makes the tinderbox rebuild the same packages over and over, switching between modular and monolithic; the solution would be to use binpkgs to cache the rebuilds, which is going to be especially useful to avoid rebuilds after collision-protect failures and after packages get unmerged due to blockers, but that’s going to slow down the build a notch. I haven’t used ccache either; I guess I could have, but I’d have to change the cache directory to avoid resetting the cache I use for my own projects.

So what is my current available hardware?

  • two Samsung SATA (I) disks, 160GB big; they were the original disks I bought for Enterprise, they currently are one in Farragut (which is lacking a PSU and a SATA controller, after I turned it off last year), and one in Klothos (the Sun Ultra 5 with G/FBSD);
  • one Maxtor 80GB EIDE disk;
  • one Samsung 40GB EIDE disk;
  • just one free SATA port on Yamato’s motherboard;
  • a Promise SATA (I) PCI controller;
  • no free PCI slots on Yamato;
  • one free PCI-E x16 slot;

The most logical solution would be to harness the two Samsung SATA disks in a software RAID0 array and use it as /var/tmp, but I don’t have enough SATA ports; I could set up the two EIDE drives instead, but they are not the same size, so RAID0 would be restricted to the 40GB of the smaller one, which may still be something, since the asneeded chroot’s /var/tmp is currently 11GB.
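For the record, setting up such an array is the easy part; a minimal sketch with mdadm, with device names that are of course examples:

yamato ~ # mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/hda1 /dev/hdb1
yamato ~ # mkfs.xfs /dev/md0
yamato ~ # mount /dev/md0 /var/tmp

The hard part is finding the ports to attach the disks to in the first place.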

Does anybody know of a better solution to my problems? Maybe I should be using external drive enclosures or look for a small network-attached storage system, but those are things I don’t have available, and I’d rather not go buy them until I’ve finished paying for Yamato. By itself, Yamato has enough space and power to handle more disks; I guess I could use a SATA port multiplier too, but I don’t really know about their performance, or brands, or anything, and again that would require buying more hardware.

If I get to have enough money one day, I’m going to consider wiring my garage with gigabit network and setting up a SAN there, with Enterprise or some other box, a lot of HDDs, served through ZFS+iSCSI or something. For now, that’s a mere dream.

Anyway, suggestions, advice and help on how to solve the disk problem are very welcome!

Misguided link and --as-needed

In my previous post I noted that there are some cases where --as-needed stops a program from building even though it’s not because of an indirect link. I like to call this class of failures the “misguided link” failures.

Consider the following diagram showing such a case:

diagram showing the broken relationship between a program, libssl and libcrypto

We have a given piece of software linking to libssl while actually using libcrypto. This is the inverse of the indirect case I wrote about last time, but it still features a link relationship with no use relationship, which is going to be cut by --as-needed. This is one of the most interesting cases, since it’s really difficult to identify without going to check either the source code or the missing symbols. It’s not limited to the OpenSSL libraries (it’s actually pretty common in general), but it happens quite a lot with them, since people forget that OpenSSL is more than just libssl.

So how can we identify this problem? Well, the first step is to identify what can cause it. Let’s say we have a simple program that calculates the MD5 digest of its standard input, something like this:

#include <stdint.h>
#include <stdio.h>
#include <openssl/md5.h>

int main() {
  MD5_CTX md5;
  uint8_t md5digest[MD5_DIGEST_LENGTH];
  int i;

  MD5_Init(&md5);

  while(!feof(stdin)) {
    char buff[4096] = { 0, };
    size_t read = fread(buff, 1, sizeof(buff), stdin);

    MD5_Update(&md5, buff, read);
  }

  MD5_Final(&md5digest[0], &md5);

  for(i = 0; i < sizeof(md5digest); i++)
    printf("%02x", md5digest[i]);

  printf("\n");
  return 0;
}

Now, if we try to compile this on a system without forced --as-needed (and no --as-needed in LDFLAGS), linking it with -lssl, it will work just fine:

% GCC_SPECS="" gcc md5-ssl.c -o md5-ssl -lssl
% scanelf -n md5-ssl 
 TYPE   NEEDED FILE 
ET_EXEC libssl.so.0.9.8,libc.so.6,libcrypto.so.0.9.8 md5-ssl 
% ldd md5-ssl 
	linux-vdso.so.1 =>  (0x00007fff11bfe000)
	libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f070961e000)
	libc.so.6 => /lib/libc.so.6 (0x00007f07092ab000)
	libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f0708f19000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0709870000)
	libdl.so.2 => /lib/libdl.so.2 (0x00007f0708d15000)

but if we try to compile it with forced --as-needed, or even just --as-needed in LDFLAGS, the results are quite different:

% gcc md5-ssl.c -o md5-ssl -lssl   
/tmp/.private/flame/cc8kRKqi.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status
% GCC_SPECS="" gcc -Wl,--as-needed md5-ssl.c -o md5-ssl -lssl 
/tmp/.private/flame/ccVWCirl.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status

A lot of people at this point would be thrown off, since the library is there, after the source files (or object files), and there are no convenience libraries involved, so the linking line should be correct. But instead it fails, and the problem lies in using the wrong library.

As the name tells you, libssl contains the functions used to implement the Secure Sockets Layer; while MD5 is used in that implementation, it’s not part of the interface, and indeed the MD5 functions are not part of the library’s interface.

Now, since even the man page for these functions does not tell you which library to find them in (while most Linux, *BSD and Solaris man pages tell you which library a function comes from), you have to rely on either experience or testing to find the correct library.

Let’s try two different approaches here, just so that people can understand how I end up debugging these things in the first place.

To begin with, let’s check whether libssl provides the symbols we’re missing; we don’t expect it to, since the link failed. The easy way to do this? nm and grep:

% nm -D /usr/lib/libssl.so | egrep 'MD5_(Init|Update|Final)'
%

There is no defined or undefined symbol with those names, which means there is no MD5 interface either defined or used in that library. Which explains why the link failed. Now, since we know the build works without --as-needed, we check which libraries libssl brings in as dependencies:

% ldd /usr/lib/libssl.so
	linux-vdso.so.1 =>  (0x00007fff1dbfe000)
	libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f2d1551d000)
	libc.so.6 => /lib/libc.so.6 (0x00007f2d151aa000)
	libdl.so.2 => /lib/libdl.so.2 (0x00007f2d14fa5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2d15b28000)

The first library is the virtual dynamic shared object of the Linux kernel; let’s ignore it. The last is the dynamic linker (or loader) itself, which we also want to ignore. We can exclude libc, since that’s always brought in, so the link wouldn’t have failed if the symbols were there. We’re left with two candidates: libdl and libcrypto. Now let’s be very dumb, ignore the name “crypto”, as well as the fact that libdl is home to dlopen() and other well-known functions, and look in both of them for the symbols:

% nm -D --defined-only /lib/libdl.so.2 | egrep 'MD5_(Init|Update|Final)'
% nm -D --defined-only /usr/lib/libcrypto.so.0.9.8 | egrep 'MD5_(Init|Update|Final)'
000000000006a5e0 T MD5_Final
000000000006a5a0 T MD5_Init
000000000006a6e0 T MD5_Update

So we found the problem, and indeed you can verify for yourself that requesting -lcrypto directly in the build of the program above will make it work just fine both with and without --as-needed, with the added benefit that libssl is not loaded when running the software.

Now, this is a slightly long and boring approach; the alternative approach, which works just fine in Gentoo, requires just one command:

% scanelf -ql -s +MD5_Init
MD5_Init  /usr/lib64/libcrypto.so.0.9.8
MD5_Init  /usr/lib64/libgnutls-openssl.so.26.11.3

The scanelf call here goes searching for the library that provides the symbol we need, although it might confuse you, since it may report different implementations, or totally unrelated libraries, in case of symbol collisions (which is something I use to identify broken software, by the way). Note that here I targeted just one symbol; the reason is that the current scanelf release, 0.1.18, does not work properly with regex-based searches. With the current CVS version you could use scanelf -gqls 'MD5_(Init|Update|Final)', but it would just find the first match anyway.

Is this easy enough to fix, in your opinion? Also consider that if the software were to use pkg-config, right now it would be listing -lssl -lcrypto -ldl, which would stop --as-needed from breaking it, but that is most likely going to break in the future if libssl.pc is updated to use Requires.private to list libcrypto.

Relationship between --as-needed and --no-undefined, Part 1: what do they do?

I think that after my writeup and Robert’s bug spree, some people might have the wrong idea about the relationship between the --as-needed and --no-undefined flags.

Let’s begin by saying exactly what --no-undefined does: it makes the linker reject building targets that have undefined references not satisfied by any of the libraries they link to, directly or indirectly. The linker already rejects this for final executables, but for a series of reasons the default is to allow undefined references in shared objects. So if you have a library A that calls functions from library B but does not directly link to it, with --no-undefined the linker will refuse to build A entirely; by default, A would still build, and the software SW that is going to use A would be forced to explicitly link in B. The following image shows what I mean in the form of a graph:

a diagram showing the indirect linking problem

In the above image you can see the “use relationship” and the “link relationship” not being balanced, and here comes the problem, since --as-needed has, as its task, to remove the link relationships that are not paired with a use relationship. Now, just to make sure everybody is on the same page, the reason why --as-needed exists is that, thanks to the original conception of libtool, pkg-config and others, we can easily have programs whose linking diagram is something like the following:

a diagram showing a complex and overextended linking of a program

If you look carefully you can see that some linking branches are not actually used at all; this is because the linker, by default, links in whatever you tell it to link, so you can easily link into a program libraries that are never used. This is not just a waste: it costs time and resources at link time, since the linker may have to take care of relocations; it costs time during symbol resolution, because the extra libraries need to be scanned too; and it might take up resources at runtime if the extra libraries have constructor functions, which cause initialisation code to be executed, possibly opening data files, allocating data structures and so on.

Now, what we’d like would be something like the following diagram, which shows the linking reduced to the actual needed parts, still following the same rules as the original:

a diagram showing the hopefully pruned out linking of a program

This would reduce the set of loaded objects to the minimal, needed part, so that no foreign object gets loaded at runtime, or linked in at build time for that matter. Unfortunately, this is a dream: the way the linker works, --as-needed produces different results from what you see here; it produces this:

a diagram showing the as-needed pruned linking of a program

I’ve changed colours for the objects in this diagram: the yellow objects are the ones that lack a linking relationship and thus won’t be loaded; the red objects are the broken ones, since they use an object they don’t link to. You can see that there actually is an exception to that rule: the yellow object in the middle of the graph uses the blue one at the right end of it, but doesn’t link directly to it. While it’s probably a good idea to make sure that all you use is also linked in, that situation is legal, since the link to the higher object is available indirectly through another object.

Now, --no-undefined would have caught the two broken objects at their own build time, rather than when they were used by another project; but as I’ll try to explain in the coming days, this option is not a panacea, though it helps to identify the issues further up the stream. On the other hand, there are some situations where --as-needed finds trouble that --no-undefined wouldn’t identify early on.
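To make the difference concrete, here is a minimal sketch (file and function names made up, error output along these lines): building a shared object with a dangling reference succeeds by default, and only fails once --no-undefined is added:

% cat a.c
extern int b_func(void);
int a_func(void) { return b_func(); }
% gcc -fPIC -c a.c
% gcc -shared a.o -o libA.so
% gcc -shared a.o -o libA.so -Wl,--no-undefined
a.o: In function `a_func':
a.c:(.text+0x7): undefined reference to `b_func'
collect2: ld returned 1 exit status

Without the flag, the problem would only surface when the final executable is linked, in a package possibly entirely unrelated to the one carrying the bug.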

Since people often tell me I write way too much in a single blog entry, I’ll try to wait for this entry to be digested before adding the rest of the content, with the next chapter hopefully in two days. In the meantime, feel free to write any questions you have in the comments, so I can answer, either in the comments (as you may guess, I read them but don’t always have time to address them directly) or in the next posts.

By the way, the speed at which I can write these articles depends directly on the amount of caffeine in my bloodstream, so if you wish to have more content written faster, you can always help me by getting me some good coffee beans; I have never tried java, for instance.