Are you kidding me? Or, why we’ll wait glibc 2.14 for a while

A couple of days ago I noted the move to glibc 2.14 of my tinderbox with the hope to quickly find and fix the packages that depend on the now-removed RPC interface. I didn’t expect this kind of apocalypse, but I’m almost wanting to cry, thinking about the mess this version seems to create.

First of all, it doesn’t seem like it’s just Ruby being hit by memory corruption issues, which makes it likely that the new memcpy() interface noted in the ChangeLog is to blame. I haven’t had time to debug this yet though.

A new scary situation arose as well: wget exits with a segmentation fault when trying to resolve any hostname that is not in /etc/hosts, which in the case of the tinderbox means anything that is not localhost or Yamato (as that’s where the Squid proxy is added that caches requests for the fetched Gentoo data). I’m not sure of the cause yet, as the fault happens not within the executable’s code but directly into libresolv, which would point at a bug in glibc itself.

For what concern RPC, I’m surprised that there are so many packages depending on it, and of the widest variety: multimedia, scientific, network analysis tools, and so on. Now, I was optimist in my previous post, expecting that for most, if not all, of the packages using RPC would be fixed by relying on libtirpc. Ooh boy, how wrong I was.

See the issue is this: libtirpc itself does not build on glibc-2.14, as it relies on one of the NIS/YP headers that has also been removed. Even worse, the latest version (0.2.2) of libtirpc, which I hoped would solve the issue, does not work on any system at all, since a change by our very own Mike (vapier), which was merged upstream just before 0.2.2 release, causes the build to produce a library that lacks a couple of symbols — the source file where they are defined was not added, but even when you add it, you get a couple more symbols being missing. And this release has been out for over a month without any sign of a 0.2.3 coming (upstream repository is still broken, at the time of writing).

Are you freaking kidding me?

Oh and for those who wonder, the issue with base versioning that, as I’ve told, is holding up implementing base version support in gold, is still not fixed. This means that packages such as fuse, included, who wanted to keep binary compatibility with their original unversioned symbols are still not getting any compatibility, even with this version. In my personal opinion it would be a good time to drop the code for that in fuse, but upstream prefers waiting for the new 3.0 version, which is going to get tricky.

With all this considered, it really looks like a very badly broken release, and one that makes me wonder if it wasn’t too inconsiderate to reject the idea of moving to the eglibc patchset/fork like Debian and Ubuntu seems to have done.

Gold readiness obstacle #2: base versioning

This is the second post in the series analysing the obstacles we face if we want to actually make use of gold as system link editor at some point in time in the future. As I said for the previous one, please make your interest in the topic explicit, as it is a draining exercise to me, due to huge lack of interest by many other developers.

I have already noted up in part #1 that I have submitted a patch for gold and it wasn’t merged, which ticked me off a bit. In this post I’ll explain what that patch was about. This is particularly interesting to me, because, while it is in a very commonly used package, this problem wouldn’t be an “obstacle” as much as it is, in my view, if it wasn’t that I was doing paid work to look into it.

I have already written yesterday, and a number of times before, how you use ELF symbol versioning, so I won’t go back to the topic right now. What I’ll repeat here is that there are two main reasons to use symbol versioning: preventing symbol collisions, as it is used by the Berkeley DB slots I wrote about yesterday, or preserving binary compatibility when making incompatible change to functions’ ABIs but wanting to keep the same library ABI (and thus, soname).

For the former task, you can use the same blanked version information for all the symbols, as I noted, while for the latter you need a more surgical approach. What you usually do, when you stabilized the interface the first time around, is marking with the same version string all the functions. When one of those functions need to be replaced, then, you use source-level symbol versioning to provide a new “default” version of the symbol, together with an explicitly-versioned copy of the symbol that abides to the previously-used ABI. For more details about this, you can see the Binutils documentation that shows the example I’m going to pick up here.

     __asm__(".symver original_foo,foo@");
     __asm__(".symver old_foo,foo@VERS_1.1");
     __asm__(".symver old_foo1,foo@VERS_1.2");
     __asm__(".symver new_foo,foo@@VERS_2.0");

The code above is taken verbatim from the latest version of GNU ld (bfd) documentation. What it translates to, is this:

  • the original, replaced/deprecated interface of the foo() function is implemented with the (hidden) symbol original_foo;
  • two further, replaced/deprecated versions of foo() are implemented as old_foo and old_foo1;
  • finally, new_foo implements the most current version of foo().

How does this work in practice? Well, first of all the headers should only declare the newest interface of foo() – that is new_foo – so that new programs only can use that. When linking a new binary, the link editor will know to satisfy foo() references lacking a version with that version, not because the version string is “higher” (the version string has no meaning for link editor and runtime loaders, it’s just a string); but because it is marked as the “default” version (see the double at symbol in the directive. The other interfaces don’t have to be in the headers, and they will be ignored by the link editor, like they weren’t there. Software built against a previous version of the library, where the default version for foo() was VERS_1.2 or VERS_1.1, would still reference those versions; the runtime loader (ld.so) would then look those up, rather than VERS_2.0.

Lovely, isn’t it? You can improve your interface, solve age-old issues, without having to break the ABI, with the sole “little” downside of increasing the size of the library itself… and relying on a feature only available, for what I know, on GLIBC and maybe FreeBSD (you can achieve the same effect on Windows, but their approach is massively different, anyway let’s ignore that for now). Before somebody says that you actually double the size of the code, I’d like to point out that most of the time, the old function can be expressed as a call to the new function, with properly adapted parameters, unless you’re really changing the function to something entirely different.

For those wondering, using this approach with C++ is very complex and I’d probably say impossible: the ABI for C++ libraries includes the vtable for classes; when adding a new function to a class you change the vtable, increasing its size, and causing the ABI to change. It is for this reason that Trolltech used D-pointers in Qt for a long time, and why KDE had many problems introducing new features and fixing old bugs within a major release cycle.

Now let’s go back to our story of gold, fuse, and symbols.

The fuse library is designed to keep as binary compatible as possible with its predecessors, at least when built for GLIBC (it has special rules to not version interfaces when building for uClibc for instance). This is because it is designed to allow proprietary filesystem providers — for instance the Mac version is used by Parallels to provide their shared folders support. Unfortunately it seems like this wasn’t a requirement in their original implementation, which was built wihtout version information for symbols. This happens quite often actually.

The Binutils example code above fortunately shows exactly how to deal with that: you declare a symbol with no version information. This is called the “base version”, and can only be referenced as the sole version in a linker script, or by omitting the version specifier in a .symver directive. This works with the GNU assembler (as) and with the BFD link editor, but when creating a library with a base-versioned symbols with gold, you get an error:

libtool: link: i686-pc-linux-gnu-gcc -shared  .libs/fuse.o .libs/fuse_kern_chan.o .libs/fuse_loop.o .libs/fuse_loop_mt.o .libs/fuse_lowlevel.o .libs/fuse_mt.o .libs/fuse_opt.o .libs/fuse_session.o .libs/fuse_signals.o .libs/cuse_lowlevel.o .libs/helper.o .libs/subdir.o .libs/iconv.o .libs/mount.o .libs/mount_util.o   -lrt -ldl -Wl,--as-needed  -pthread -Wl,--version-script -Wl,./fuse_versionscript -Wl,-O1 -Wl,--hash-style=gnu   -pthread -Wl,-soname -Wl,libfuse.so.2 -o .libs/libfuse.so.2.8.5
/usr/lib/gcc/i686-pc-linux-gnu/4.6.0/../../../../i686-pc-linux-gnu/bin/ld: error: symbol __fuse_exited has undefined version 

It might not be as quick to be said but this message simply means that gold does not support linking objects containing base-versioned symbols. Is it just a missing feature? Not really. I mean, the feature itself is missing, and indeed is simple to implement, to the point I have implemented it, and you can find the patch for it in Sourceware bug #12261 which is still pending.

The problem here is that even though GNU bfd/ld implements that feature, it is a pointless feature to implement, right now. The problem lies not in the link editor but in the runtime loader (ld.so). As you can see from the testcases provided by Ian in the bug linked above, GLIBC does not do what it is expected to.

What you expect is that, when the loader finds an undefined (requested) symbol, without an attached version information, it would look for a symbol with the corresponding name with base version (no version attached to the definition), and failing that it would look for the one in the default version. What actually happens is that the loader simply picks the first symbol it finds with the same name, without caring about the version if it wasn’t specified in the customer. It is just sheer luck if it finds the one that was intended to be found.

What’s the morale? Well, we have one advertised feature that never worked but that a few projects, such as fuse, wanted to rely upon… I don’t disagree with Ian that this should be fixed in GLIBC first, and that for now gold is just exposing code that doesn’t work. Unfortunately Ian’s requests about the feature went unanswered – and due to Drepper just dumping the list of bug numbers without description in the NEWS files I can’t tell if it was addressed in the new 2.14 version – which means we still have no clue whether this is a functionality that will ever be useful or not. I’ll have to try again if fuse project would agree at just dumping the symbols for now, since they cannot be useful with current glibc versions.

Again, expressing your interest on the topic helps me judge how much weight to put on it outside of my dayjob. Thanks in advance.

How to use GIT via SFTP only

GIT is pretty nice for an SCM software, it’s nice to be able to work locally on a project without losing track of the changes before publishing it on a proper server. One problem of it, though, is that you can’t find much hosting space for it, as it requires you either HTTP push capabilities or an SSH access where GIT is installed, as it needs it server side to be able to push.

I’ve worked around this for most of my projects by hosting my own GIT repositories on Farragut, but this limits quite a bit the scope of the projects, and limits also the fact that if I’m offline, my repositories are, too. For this I’ve moved my overlay to overlays.gentoo.org as soon as GIT support was added.

When I started doing some serious work on Rust and Ruby-Hunspell, I decided I couldn’t just leave them to die with my server if my connection is revoked, or if something happens to it, so I’ve looked for an alternative solution. I had already hosting on RubyForge for the ruby-hunspell site but the only access to the site was through SFTP, and thus git wasn’t able to handle it.

Thinking of it, I found a solution that was quite obvious: GIT can push to a local path too, so I just needed to get a copy of the pushed repository and load it there; using SFTP would have been slow, but worked. I first did it that way, but it grew tired easily.

The next step was helped by Timothy, who talked often of Fuse and who handled together with genstef the ebuilds for fuse to run on Gentoo/FreeBSD: using sshfs and fuse, I could push to a «local» path that was instead mounted from the RubyForge SFTP site. And that’s what I’m using now to push both ruby-hunspell and rust repositories, both of which are available to users via HTTP on the two respective sites.

I found one big showstopper with this approach today, but after wasting some of Ferdy’s time, I found a pretty simple workaround for it too: during the second push, GIT tries to rename the master.lock file into master, without removing master of course; on SSHFS this does not work by default and returns an EPERM error (Permission denied); to fix that you just need to enable the rename workaround while mounting the SFTP directory.

It is slow, but it works nicely if you don’t have a server where you can use GIT properly.

For who’s wondering, this is what I’m using currently to mount the RubyForge server:

sshfs -o workaround=rename rubyforge.org:/var/www/gforge-projects/ 
    /media/repos/flame/remote/rubyforge

I hope this post can be useful to someone else too :)

Update (2017-04-22): as you may know, Rubyforge was shut down in 2014. Unfortunately that means that most of the Rust documentation and repository are probably gone (I may have backups but I have not uploaded them anywhere).