Again about glibc 2.14, RPC and modern software

It looks like my previous post on glibc 2.14 made it to reddit, even though it did not make much of an impression on Flattr, and at least one interesting question was asked there: what software is using RPC that I wasn’t expecting?

While it is definitely true that I underestimated the number of systems still using old-style NIS, going by the commenters on my other post about PAM, there is also a long list of packages making use of glibc’s RPC subsystem that I didn’t expect. All of this hardly describes a dying interface that can go without a replacement, contrary to what one commenter wrote:

Except that no one uses Sun RPC for that. It’s only application in modern unixes is NFS, so it does not really belong to libc. nfs-utils and libtirpc should handle that. Same goes for NIS and other remnants from the dark ages. Removing unused bloat from the fundamental system library is actually a good thing.

And for the record, RPC will not be removed from “the fundamental system library”: code for the RPC implementation is still all there, it’s just hidden and disallowed from being linked to, which means that the packages that use the interface will not build, but those that were built before (or the binary packages that come prebuilt) will not fail to run on the new library. No “bloat” removed.

Okay, so what are those packages? Well, for one, let’s look at something I have worked on myself for a while and that is actively developed to this very moment: libvirt. While obviously designed to work well with libtirpc, it can’t be installed on glibc 2.14 (as libtirpc is not fixed yet), and its RPC usage has nothing to do with NFS either. On the other hand, it seems like watchdog, lsof, quota, autofs and possibly tcpdump do need it for NFS support.

I don’t know much about them, but the list of packages requiring RPC includes oc, torque, libcult, libassa, hamlib, lives, xinetd, db (yes, Berkeley DB), libdap, tcb, netkit-rusers, netkit-bootparamd, ogdi, charm, netkit-rwall, gs-assembler, ctdb, perdition, amanda, scilab and R…

I haven’t started fixing any of these myself: I have way too many things on my plate already, and this is not a high enough priority for me to tackle in my free time. At least I can report them and keep tabs on them, which is enough for now, I guess.

About GLIBC 2.14, EGLIBC, and Gentoo

I was originally planning to write about one of my current job tasks tonight, since that was honestly interesting for the Free Software part as well, but since I’ve received a number of comments on the matter, and even a couple of direct email messages, I think it might be a better use of my time to reply about this situation instead.

I have blogged repeatedly about the trouble caused by the new version of GLIBC (2.14) and its developers’ choice to stop allowing access to the RPC implementation that it comes with, in favour of the new, also-broken-by-the-same-update libtirpc library.

It turns out that this situation is becoming so absurd that at least Arch Linux decided to revert the removal of the RPC interface. The same decision seems to have been taken by the EGLIBC developers (which, as far as I can tell, means that Debian and Ubuntu will keep the RPC interface as well). The obvious question people ask me then is “Why isn’t Gentoo doing the same?”

I’m afraid I don’t have a real answer to this: I’m not the GLIBC maintainer, that’s Mike. I’m not in his head and I honestly haven’t asked him to comment on the issue yet. The reason why I’m not pushing him for comments or actions is simple: I see no particular urgency in moving to the new GLIBC version. The news entries for the new release are a bit short to be of immediate interest to me, and the presence of a bug making Ruby not installable (thanks Sergei for tracking down the root cause!) makes it very low-priority for me, as in, no-priority really.

In particular, the last I knew about the EGLIBC situation was that Mike preferred validating the applied patches on their own merit, following the upstream GLIBC developers as closely as possible unless required for particular architectures and situations, which is a choice I respect deeply. The issue there seems to be that Drepper is getting more and more detached from the needs of the ecosystem, and is still a sort-of dictator as far as the C library is concerned. I was also pointed at some suspicions that he’s no longer directly employed by Red Hat, but given that I don’t really care about that, I neither confirmed nor refuted it; make of it what you will.

As for reverting the removal of the RPC interface… I don’t like that choice. I mean, the problem here is not that we lack a replacement for the RPC interface in GLIBC, but rather that the replacement is non-working. Rather than working against the GLIBC developers, that effort would be better spent fixing libtirpc so that it works with GLIBC 2.14, thus leaving us with a properly working RPC implementation.

In particular, I think it might be a good idea now to implement the proper virtual for RPC implementations on GLIBC and other systems:

elibc_glibc? ( || ( net-libs/libtirpc <sys-libs/glibc-2.14 ) )

Using such a virtual would make it easier for me to ignore the packages that are known not to work with glibc-2.14: their dependencies wouldn’t be satisfied, and the tinderbox would then skip over them altogether. I guess I should send an email about this so that it can be discussed and implemented.
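To make the intent of that any-of dependency concrete, here is a minimal sketch in Python of how it would evaluate on a glibc system. It is only an illustration, not how Portage resolves dependencies: the function names are made up for this example, and the version comparison assumes plain dotted numeric versions (no revisions or suffixes).

```python
def version_key(version):
    """Turn a plain dotted version such as "2.14" into a comparable tuple.

    Assumption: numeric components only; real Gentoo versions
    (revisions, suffixes) need a proper comparison.
    """
    return tuple(int(part) for part in version.split("."))


def rpc_virtual_satisfied(installed):
    """Hypothetical check mirroring
    || ( net-libs/libtirpc <sys-libs/glibc-2.14 ).

    `installed` maps package names to version strings.
    """
    # First alternative: libtirpc is available.
    if "net-libs/libtirpc" in installed:
        return True
    # Second alternative: a glibc older than 2.14, which still exposes RPC.
    glibc = installed.get("sys-libs/glibc")
    return glibc is not None and version_key(glibc) < version_key("2.14")
```

With glibc 2.13 the virtual is satisfied by glibc itself. With glibc 2.14 it is satisfied only if libtirpc can be installed, which is exactly the case that would let the tinderbox skip the affected packages for now.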

There is another reason why I’m not so keen on restoring the interfaces that were removed from this version of the C library. A number of people commented on my previous post to correct my first assessment that NIS was dead, but it is still something that most desktops don’t need and that uClibc does not implement, so finding the packages that rely on said interface is still an interesting task to tackle.

In general, I’m afraid to tell you that I’m not going to “solve” the problem, by restoring the symbols, myself. If Mike decides to take that approach, the fallout is just going to be delayed, not avoided. And no, even though I probably would prefer moving away from GLIBC to EGLIBC – not just for this problem but also for things like the base versioning issue that is making gold less useful than it could be – I don’t have the time nor the motivation to step up and become the new C library maintainer in Gentoo. I barely have the time to keep on track with what I’m already supposed to do.

Are you kidding me? Or, why we’ll wait on glibc 2.14 for a while

A couple of days ago I noted my tinderbox’s move to glibc 2.14, made in the hope of quickly finding and fixing the packages that depend on the now-removed RPC interface. I didn’t expect this kind of apocalypse, and I almost want to cry thinking about the mess this version seems to create.

First of all, it doesn’t seem like it’s just Ruby being hit by memory corruption issues, which makes it likely that the new memcpy() interface noted in the ChangeLog is to blame. I haven’t had time to debug this yet though.
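If memcpy() does turn out to be the culprit (which is only a suspicion at this point), the usual way such corruption happens is code calling memcpy() on overlapping regions, which C leaves undefined; memmove() is the call that is safe there. The following toy model in Python is not glibc’s implementation; the function names are invented for illustration. It only shows why the direction in which bytes are copied changes the result when source and destination overlap.

```python
def copy_forward(buf, dst, src, n):
    """Copy n bytes inside buf from low addresses to high addresses."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def copy_backward(buf, dst, src, n):
    """Copy n bytes inside buf from high addresses to low addresses."""
    for i in reversed(range(n)):
        buf[dst + i] = buf[src + i]


def shift_left_by_two(copy):
    """Move "cdef" to the start of "abcdef" using the given copy routine.

    Source (offset 2) and destination (offset 0) overlap by two bytes.
    """
    buf = bytearray(b"abcdef")
    copy(buf, 0, 2, 4)
    return bytes(buf)


if __name__ == "__main__":
    print(shift_left_by_two(copy_forward))   # the result callers expect
    print(shift_left_by_two(copy_backward))  # source overwritten before being read
```

A program that happened to work with one copy direction can silently break when the library switches to the other, without a single line of the program changing.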

A new scary situation arose as well: wget exits with a segmentation fault when trying to resolve any hostname that is not in /etc/hosts, which in the case of the tinderbox means anything other than localhost or Yamato (as that’s where the Squid proxy that caches requests for the fetched Gentoo data runs). I’m not sure of the cause yet, as the fault happens not within the executable’s code but directly in libresolv, which would point at a bug in glibc itself.
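One way to check whether the crash is tied to wget or to the resolver itself is to run a lookup in a throwaway child process and see whether it dies from a signal. The sketch below does that in Python. It assumes the fault is reachable through the C library’s getaddrinfo(), which I have not verified, and the function name is made up for this example.

```python
import signal
import subprocess
import sys

# One-liner executed in the child: resolve the host passed as argv[1].
PROBE = "import socket, sys; socket.getaddrinfo(sys.argv[1], 80)"


def probe_resolver(host):
    """Resolve `host` in a child process and classify what happened."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # A negative return code means the child was killed by a signal,
        # for example SIGSEGV from a crash inside the resolver.
        return "crashed with " + signal.Signals(-result.returncode).name
    # A positive return code: Python raised an error (lookup failed cleanly).
    return "lookup failed cleanly"
```

Running it once against a name from /etc/hosts and once against a name that needs DNS would tell the two code paths apart: a crash only in the second case would be consistent with a problem in libresolv rather than in wget.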

As far as RPC is concerned, I’m surprised that there are so many packages depending on it, and of the widest variety: multimedia, scientific, network analysis tools, and so on. Now, I was optimistic in my previous post, expecting that most, if not all, of the packages using RPC would be fixed by relying on libtirpc. Ooh boy, how wrong I was.

See, the issue is this: libtirpc itself does not build on glibc-2.14, as it relies on one of the NIS/YP headers that has also been removed. Even worse, the latest version (0.2.2) of libtirpc, which I hoped would solve the issue, does not work on any system at all. A change by our very own Mike (vapier), which was merged upstream just before the 0.2.2 release, causes the build to produce a library that lacks a couple of symbols: the source file where they are defined was not added, and even when you add it, a couple more symbols turn out to be missing. And this release has been out for over a month without any sign of a 0.2.3 coming (the upstream repository is still broken at the time of writing).

Are you freaking kidding me?

Oh, and for those who wonder: the issue with base versioning that, as I’ve said, is holding up the implementation of base version support in gold is still not fixed. This means that packages (fuse included) that wanted to keep binary compatibility with their original unversioned symbols are still not getting any compatibility, even with this version. In my personal opinion it would be a good time to drop the code for that in fuse, but upstream prefers waiting for the new 3.0 version, which is going to get tricky.

With all this considered, it really looks like a very badly broken release, and one that makes me wonder whether it wasn’t ill-considered to reject the idea of moving to the eglibc patchset/fork, like Debian and Ubuntu seem to have done.