Restarting a tinderbox

So after my post about glibc 2.17 we got the ebuild in tree, and I’m now re-calibrating the ~amd64 tinderbox to use it. This sounds like an easy task, but it really isn’t. The main problem is that with the new C library you want to start afresh: no pre-compiled dependencies should be left in, or any breakage they’d expose won’t be found; you want the highest coverage possible, and that takes some work.

So how do you re-calibrate the tinderbox? First off you stop the build, and then you have to clean it up. The cleanup sometimes is as easy as emerge --depclean — but in some cases, like this time, the Ruby packages’ dependencies are causing a bit of a stir, so I had to remove them altogether with qlist -I dev-ruby virtual/ruby dev-lang/ruby | xargs emerge -C after which the depclean command actually starts working.
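Put together, the cleanup sequence looks something like this — a sketch only, guarded so it does nothing on a non-Gentoo system; the package atoms are the ones from the text, and --no-run-if-empty keeps xargs from calling emerge with an empty argument list:

```shell
#!/bin/sh
# Tinderbox cleanup sketch: first remove the Ruby packages whose
# dependencies confuse --depclean, then let Portage drop everything
# that is not required by the world set.
if command -v emerge >/dev/null 2>&1; then
    qlist -I dev-ruby virtual/ruby dev-lang/ruby \
        | xargs --no-run-if-empty emerge -C
    emerge --depclean
else
    echo "emerge not found; run this on the tinderbox host itself"
fi
cleanup_sketch=done
```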

Of course it’s not a two-minute command like on any other system, especially when going through the “Checking for lib consumers” step — the tinderbox has 181G of data in its partition (a good deal of which is old logs that I should actually delete at this point — and no, that won’t delete the logs in the reported bugs, as those are stored on S3!), without counting the distfiles (which are shared with its host).

In this situation, if there were automagic dependencies on system/world packages, it would actually bail out and I’d have to go manually clean them up. Luckily for me, there’s no problem today, but I have had this kind of problem before. This is actually one of the reasons why I want to keep the world set in the tinderbox as small as possible — right now it consists basically of: portage-utils, gentoolkit (for revdep-rebuild), java-dep-check, Python 2.7 (it’s an old thing, it might be droppable now, not sure), and netcat6 for sending the logs back to the analysis script. I would have liked to remove netcat6 from the list but last time the busybox nc implementation didn’t work as expected with IPv6.

The unmerge step should be straightforward, but unfortunately it seems to cause more grief than expected, in many cases. What happens is that Portage has special handling for symlinked directories — and after we migrated to use /run instead of /var/run, all the packages that have not been migrated away from using keepdir on it, ebuild-side, will spend much more time at the unmerge stage making sure nothing gets broken. This is why we have a tracker bug, and why I’ve been reporting ebuilds that create the directory, rather than just packages that do not re-create it in the init script. Also, this is when I’m glad I decided to get rid of XFS, as file deletion there was just way too slow.

Even though Portage takes care of verifying the link-time dependencies, I’ve noticed that sometimes things are broken nonetheless, so depending on what one’s target is, it might be a good idea to just run revdep-rebuild to make sure that the system is consistent. In this case I’m not going to waste the time, as I’ll be rebuilding the whole system in the next step, after glibc gets updated. This way we’re sure that we’re running with a stable base. If packages are broken at this level, we’re in quite the pinch, but it’s not a huge deal.
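For those who do want the consistency check, a minimal invocation might look like this (a sketch; revdep-rebuild comes from app-portage/gentoolkit, and --pretend only reports what would be rebuilt without touching anything):

```shell
#!/bin/sh
# List (without rebuilding) the packages whose installed binaries link
# against libraries that are no longer present or no longer consistent.
if command -v revdep-rebuild >/dev/null 2>&1; then
    revdep-rebuild --pretend
else
    echo "revdep-rebuild not found; emerge app-portage/gentoolkit first"
fi
revdep_sketch=done
```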

Even though I’m keeping my world file to the minimum, the world and system set is quite huge once you add up all the dependencies. The main reason is that the tinderbox enables lots and lots of USE flags – as I want to test as much code as possible – so things like GTK get brought in (by GCC, no less), and the cascade effect can be quite nasty. The system rebuild can easily take a day or two. Thankfully, the design of the tinderbox scripts makes it so that the logs are sent through the bashrc file, and not through the tinderbox harness itself, which means that even if I get failures at this stage, I’ll get a log for them in the usual place.

After this is completed, it’s finally possible to resume the tinderbox building, and hopefully then some things will work more as intended — for instance I might be able to get PHP to work again… and I’ll probably change the tinderbox harness to retry building things without USE=doc if they fail, as too many packages right now fail with it enabled or, as Michael Mol pointed out, hit circular dependencies.

So expect me working on the tinderbox for the next couple of days, and then start reporting bugs against glibc-2.17, the tracker for which I opened already, even though it’s empty at the time of writing.

GLIBC 2.17: what’s going to be a trouble?

So LWN reports just today on the release of GLIBC 2.17, which solves a security issue and looks like it was released mostly to support the new AArch64 architecture – i.e. arm64 – but the last entry in the release news is possibly going to be a major headache, and I’d better post about it already so that we have a reference for it.

I’m referring to this:

The `clock_*' suite of functions (declared in <time.h>) is now available directly in the main C library. Previously it was necessary to link with -lrt to use these functions. This change has the effect that a single-threaded program that uses a function such as `clock_gettime' (and is not linked with -lrt) will no longer implicitly load the pthreads library at runtime and so will not suffer the overheads associated with multi-thread support in other code such as the C++ runtime library.

This is in my opinion the most important change, not only because, as it’s pointed out, C++ software gets quite an improvement from not linking to the pthreads library, but also because it’s the only change listed there that I can already foresee trouble with. And why is that? Well, that’s easy. Most of the software out there will do something along these lines to figure out which library to link to when using clock_gettime (unconditionally passing -lrt was never a good idea, because the library doesn’t exist on most other operating systems out there, including FreeBSD and Mac OS X).

AC_SEARCH_LIBS([clock_gettime], [rt])

This is good, because it’ll try either librt, or just without any library at all (“none required”) which means that it’ll work on both old GLIBC systems, new GLIBC systems, FreeBSD, and OS X — there is something else on Solaris if I’m not mistaken, which can be added up there, but I honestly forgot its name. Unfortunately, this can easily end up with more trouble when software is underlinked.
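In essence, AC_SEARCH_LIBS tries to link a dummy program against each candidate in turn, starting with no extra library at all. Hand-rolled, the probe looks roughly like this — a sketch of the mechanism, not the macro’s actual implementation:

```shell
#!/bin/sh
# Minimal re-implementation of the AC_SEARCH_LIBS probe for
# clock_gettime: try linking with no extra library first, then -lrt.
workdir=$(mktemp -d)
cat > "$workdir/conftest.c" <<'EOF'
/* autoconf-style dummy declaration: only link-time resolution matters */
char clock_gettime (void);
int main (void) { return clock_gettime (); }
EOF
result=not-found
for lib in "" "-lrt"; do
    if cc "$workdir/conftest.c" $lib -o "$workdir/conftest" 2>/dev/null; then
        result=${lib:-"none required"}
        break
    fi
done
echo "library containing clock_gettime: $result"
rm -rf "$workdir"
```

On a glibc 2.17 (or FreeBSD/OS X) system the first, library-less attempt already links, which is exactly the “none required” result configure reports.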

With the old GLIBC, it was possible to link software with just librt and have it use the threading functions. Once librt is dropped automatically by configure, threading libraries will no longer be brought in by it, and that might break quite a few packages. Of course, most of these would already have been failing with gold, but as you might remember, I wasn’t able to get through the whole tree with it, and I haven’t set up a tinderbox for it again yet (I should, but it’s trouble enough with two!).

What about --as-needed in this picture? A fully strict implementation would fail on the underlinking, where pthreads should have been linked explicitly, but would also make sure not to link librt when it’s not needed, which would make it possible to improve the performance of the code (by skipping over pthreads) even when the configure scripts are not written properly (for instance if they use AC_CHECK_LIB instead of AC_SEARCH_LIBS). But since it’s not the linking of librt that causes the performance issue, but rather that of pthreads, it actually works out quite well, even if some packages might keep an extra, unused link to librt.

There is a final note that I need to write about, and it honestly worries me quite a bit more than all those above. The librt library has not been dropped — only the clock functions have been moved over to the main C library; the library keeps the asynchronous and list-based I/O operation interfaces (AIO and LIO), the POSIX message queue interfaces, the shared memory interfaces, and the timer interfaces. This means that if you’re relying on a clock_gettime test to bring in librt, you’ll end up with a failing package. Luckily for me, I’ve avoided that situation already on feng (which uses the message queue interface), but as I said, I foresee trouble for at least some packages.

Well, I guess I’ll just have to wait for the ebuild for 2.17 to be in the tree, and run a new tinderbox from scratch… we’ll see what gets us there!

Gentoo Linux Health Report — October 2012 Edition

I guess it’s time for a new post on the status of Gentoo Linux right now. First of all, the tinderbox is munching along as I write. Things are going mostly smoothly, but there are still hiccups due to some developers not accepting its bug reports because of the way logs are linked (as in, not attached).

Like last time that I wrote about it, four months ago, this is targeting GCC 4.7, GLIBC 2.16 (which is coming out of masking next week!) and GnuTLS 3. Unfortunately, there are a few (biggish) problems with this situation, mostly related to the Boost problem I noted back in July.

What happens is this:

  • you can’t use any version of Boost older than 1.48 with GCC 4.7 or later;
  • you can’t use any version of Boost older than 1.50 with GLIBC 2.16;
  • many packages don’t build properly with Boost 1.50 and later;
  • a handful of packages require Boost 1.46;
  • Boost 1.50-r2 and later (in Gentoo) no longer support eselect boost, making most of the packages using Boost not build at all.

This kind of screwup is a major setback, especially since Mike (understandably) won’t wait any longer to unmask GLIBC 2.16 (he waited a month, the Boost maintainers had all the time to fix their act, which they didn’t — it’s now time somebody with common sense takes over). So the plan right now is for me and Tomáš to pick up the can of worms and un-slot Boost, quite soon. This is going to solve enough problems that we’ll all be very happy about it, as most of the automated checks for Boost will then work out of the box. It’s also going to reduce the disk space used by your install, although it might require you to rebuild some C++ packages — I’m sorry about that.

As for GnuTLS, version 3.1.3 is going to hit unstable users at the same time as glibc-2.16, and hopefully the same will be true for stable when that happens. Unfortunately there are still a number of packages not fixed to work with GnuTLS 3, so if you see a package you use (with GnuTLS) in the tracker, it’s time to jump on fixing it!

Speaking of GnuTLS, we’ve also had a smallish screwup this morning when libtasn1 version 3 also hit the tree unmasked — it wasn’t supposed to happen, and it’s now masked, as only GnuTLS 3 builds fine with it. Since upstream really doesn’t care about GnuTLS 2 at this point, I’m not interested in trying to get that to work nicely, and since I don’t see any urgency in pushing libtasn1 v3 as is, I’ll keep it masked until GNOME 3.6 (as gnome-keyring also does not build with that version, yet).

Markos has correctly noted that the QA team – i.e., me – is not maintaining the DevManual anymore. We have now made it a separate project under QA (though I’d rather say it’s shared between QA and Recruiters), and the Git repository is now writable by any developer. Of course, if you play around in there on master without knowing what you’re doing, you’ll be terminated.

There’s also the need to convert the DevManual to something that makes sense. Right now it’s a bunch of files all called text.xml, which makes editing a nightmare. I did start working on that two years ago, but it’s tedious work and I don’t want to do it in my free time — I’d rather not have to do it while being paid for it either, really. If somebody feels like they can handle the conversion, I’d actually consider paying them to do that job. How much? I’d say around $50. The desirable format is something that doesn’t make a person feel like tearing their eyes out when trying to edit it with Emacs (or vim, if you feel generous): my branch used DocBook 5, which I rather fancy, as I’ve used it for Autotools Mythbuster, but RST or Sphinx would probably be okay as well, as long as no formatting is lost along the way. Update: Ben points out he already volunteered to convert it to RST; I’ll wait for that before saying anything more.

Also, we’re looking for a new maintainer for ICU (and I’m pressing Davide to take the spot), as things like the bump to 50 should have been handled more carefully. Especially now that it appears to be breaking a quarter of its dependencies when using GCC 4.7 — both the API and ABI of the library change entirely depending on whether you’re using GCC 4.6 or 4.7, as it leverages C++11 support in the latter. I’m afraid this is just going to be the first of a series of libraries making this kind of change, and we’re all going to suffer through it.

I guess this is all for now.

Boosting my morale? I wish!

Let’s take a deep breath. You probably remember I’m running a tinderbox which is testing some common base system packages before they are unmasked (and thus unleashed on users); in particular I use it for testing new releases of GCC (4.7) and GLIBC (2.16).

It didn’t take me long after starting GLIBC 2.16 testing to find out that the previously-latest version of Boost (1.49) was not going to work with it. The problem is that there is a new definition that both of them try to provide, TIME_UTC (probably related to C++11/C11). Unfortunately, since replacing that definition is an API breakage, the fix can’t be applied to the older versions, which means that packages need fixing instead. Furthermore, the new 1.50 version has also broken the temporary compatibility introduced in 1.48 for the filesystem module’s API. This boils down to a world of pain for maintainers of packages using Boost (which includes yours truly — luckily none is directly maintained by me, just proxy-maintained).

So I had to add one extra package to the list and run the reverse dependencies — the positive side is that it didn’t take long to file the bug, although there are still a few issues with older Boost versions not being supported yet. This brought up a few issues though…

The first problem is that the way Boost builds itself, and its tests, is obnoxious: it’s totally serial, no parallelisation at all! The result is that running the whole testsuite takes over eight hours on Excelsior! The big issue is that building each test takes some 10–20 times longer than running it (lovely language, C++), so a parallel build of the tests, even with the tests themselves executed serially, would be a huge improvement, and would also likely make the tests meaningful to run routinely. As things stand, the (so-called) maintainer of the package has admitted to not running them on revbumps, only on full new versions.

The second problem is how Boost versions are discovered. The main issue is that Boost, instead of using proper sonames to keep binary compatibility, embeds its major/minor version pair in the library name — although most distributions symlink the preferred version to the unversioned name (in Gentoo this is handled through the eselect boost tool). This is not very far from what most distributions do with Berkeley DB — but it causes problems when you have to find which one to link to, especially when you consider that sometimes the unversioned name is not there at all.

So both CMake and Autotools (actually the Autoconf Archive) provide macros to try a few different libraries. The former does it almost properly, starting from the highest version and going in descending order — but it uses a pre-defined list of versions to try! Which means that most packages using CMake will try 1.49 first, as they don’t know that 1.50 is out yet! If no known version is found, it falls back to the unversioned library, which makes it behave quite differently depending on whether you have one or more than one version installed!

As for the macros from the Autoconf Archive, they are quite nasty: on the one hand they really aren’t portable at all, as they use GNU sed syntax and uname (which makes them totally useless during cross-compilation); but most worrisome of all, they use ls to find which Boost libraries are available and then take the first one that is usable. This means that if you have 1.50, 1.49 and 1.44 installed, it’ll use the oldest! Similarly to CMake, the unversioned library is tried last. In this case, though, I was able to improve the macros by reversing the check order, which makes them work correctly on most distributions out there.
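The difference between the two orderings is easy to demonstrate with plain shell: lexicographic sort picks the oldest installed version, while a reversed version sort picks the newest (the library names here are made up for the example):

```shell
#!/bin/sh
# Simulated `ls` output for three installed Boost versions.
libs="libboost_system-1_44.so libboost_system-1_49.so libboost_system-1_50.so"
# What the unpatched macros end up with: the first entry in plain sort order.
oldest=$(printf '%s\n' $libs | sort | head -n 1)
# What the fixed, reversed check order picks: the newest version.
newest=$(printf '%s\n' $libs | sort -rV | head -n 1)
echo "unpatched macros would pick: $oldest"
echo "reversed order picks:        $newest"
```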

What is even funnier about the AX macros (which were created for libtorrent, and are used by Gource, which I proxy-maintain for Enrico) is that, due to the way they are implemented, they can easily end up using the old libraries with the new headers (that was the case for me here with 1.49 and 1.50: it didn’t fail to compile, just to link). As long as the interfaces used have different names and the linker errors out, all is fine. But if you have interfaces that are source-compatible and linker-compatible, but with different vtables, you have a crash waiting to happen.

Oh well…

GNU software (or, gnulib considered harmful and other stories)

The tinderbox right now seems to be having fun trying to play catch-up with changes in GNU software: GCC, Automake, glibc and of course GnuTLS. Okay it’s true that compatibility problems are not a prerogative of GNU, but there are some interesting problems with this whole game, especially for what concerns inter-dependencies.

So, there is a new C library in town, which, if we ignore the whole x32 dilemma, has support for ISO C11 as its major new feature. And what is the major change in this release? The gets() function has finally been removed. This is good, as it was a very nasty thing to use, and nobody in their right mind would use it…

tbamd64 ~ # scanelf -qs -gets -R /bin /opt /sbin /lib /usr
gets  /opt/flexlm/bin/lmutil

Okay, never mind those who actually use it; the rest of the software shouldn’t be affected, should it? You wish. What happens is that gnulib no longer only carries replacements for GNU extensions: it also includes code that is not present in glibc itself, plus extra warnings about the use of deprecated features, so it now comes with its own re-declaration of gets() to emit a prominent warning if it’s used. And of course, that re-declaration fails badly against the new glibc, where the function is gone.

Obviously, this has been fixed in gnulib already, since it was planned for gets() to be removed, but it takes quite a bit of time for a fix in gnulib to trickle down to the packages using it, which is one of my previous main complaints about it. Which means that Gentoo will have to patch the same code over and over again in almost all GNU software, since almost all of it uses gnulib.

Luckily for me, only two packages in the herds I’m part of have been hit (by this problem, at least): GnuTLS and its dependency libtasn1. The former is fixed in version 3, which is also masked but which I’m testing as well, while the latter is fixed in the current ~arch version (I really don’t care about glibc 2.16 in stable yet!), so there is nothing to patch there. The fact that GCC 4.6 itself fails to build with this version of glibc is obviously a different problem altogether, and so is the fact that we need Boost 1.50 for a number of packages to work with the new glibc/gcc combination, as 1.49 is broken with the new C library and 1.48 is broken with the new compiler.

Now to move on to something different: Automake 1.12 was released a couple of months ago and is now in ~arch, causing trouble, although not as bad as it could have been. Interestingly enough, one of the changes in this version was the removal of $(mkdir_p), as I wrote in my guide — but that seems to have been a mistake.

What should have been removed in 1.12 was the default use of AM_PROG_MKDIR_P, while the mkdir_p define should have been kept around until version 1.13. Stefano said he’s going to revert that change in automake 1.12.2, but I guess it’s better if we deal with it right away instead of waiting for 1.13 to hit us….
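If you maintain a package and want to know whether it’s affected, a quick grep for the removed variable does the job; the automake-blessed replacement is $(MKDIR_P), provided by AC_PROG_MKDIR_P (a sketch):

```shell
#!/bin/sh
# Find Makefile.am files still using the removed $(mkdir_p) variable.
offenders=$(grep -rl '$(mkdir_p)' --include='Makefile.am' . 2>/dev/null || true)
if [ -n "$offenders" ]; then
    echo "still using \$(mkdir_p):"
    echo "$offenders"
    # The fix is mechanical:
    #   sed -i 's/$(mkdir_p)/$(MKDIR_P)/g' $offenders
else
    echo "no Makefile.am here uses \$(mkdir_p)"
fi
mkdir_p_check=done
```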

Of course there is a different problem with the new automake as well: GNU gettext hasn’t been updated to support the new automake versions, so using it causes deprecation warnings with 1.12 (and will fail outright with 1.13 if it’s not updated). And of course a number of projects now pass -Werror to automake, as if using it on the source code itself weren’t trouble enough.

And of course the problem with Gentoo, according to somebody, is the fact that my tinderbox bugs (those dozen a day) are filed with linked build logs instead of attached ones. Not the fact that the same somebody commits new versions of critical packages without actually doing anything to test them.

There’s ABI and ABI

With all this talk about x32, there are people who might not know what we’re referring to when we talk about ABI. Indeed this term, much like its sibling API, is so overloaded with meanings that the only way to know what one refers to is understanding which of the many contexts it’s being used in.

It’s not that I haven’t talked about ABI before but I think it’s the first time I talk about it in this context.

Let’s start from the meaning of the two acronyms:

  • API stands for Application Programming Interface;
  • ABI stands for Application Binary Interface.

The whole idea is that the API is what the humans are concerned with, and ABI what the computers are concerned with. But I have to repeat that what these two mean depends vastly on the context you refer to them.

For instance what I usually talk about is the ABI of a shared object which is a very limited subset of what we talk about in the context of x32. In that context, the term ABI refers to the “compiled API”, which often is mistaken for the object’s symbol table although it includes more details such as the ordered content of the transparent structures, the order and size of the parameters in functions’ signatures and the meaning of said parameters and the return value (that’s why recently we had trouble due to libnetlink changing the return values, which caused NetworkManager to fail).

When we call x32 and amd64 ABIs of the x86-64 architecture, instead, we refer to the interface between a few more components… while I don’t know of a sure, all-covering phrase, the interfaces involved in this kind of ABI are those between kernel and userspace (the syscalls), the actual ELF variant used (in this case it’s a 32-bit class, x86-64 arch ELF file), the size of primitive types as declared by the compiler (long, void*, int, …), the size of typedefs from the standard library, the ordered content of the standard transparent structures, and probably most importantly the calling convention for functions. Okay there are a few more things in the mix such as the symbol resolution and details like those, but the main points are here, I think.

Now, among all the things I noted above, there is one that can be changed without touching the whole architecture ABI — the C library ABI: the symbol table, typedefs, the ordered content of transparent structures, and so on. That is the limited concept of shared object ABI applied to the C library object itself. This kind of change still requires a lot of work, among other reasons because of the way glibc works, and would likely require replacing a number of libraries, modules, the loader and more.

Why do I single out this kind of change? Well, while it would have caused trouble with binaries just as the introduction of a new architecture did, there is an interesting “what if” scenario: what if Ryan’s FatELF and Intel’s x32 ABI had happened nine years ago, and people had been keen on breaking the C library ABI of good old x86 at the time?

In such a condition, with the two ABIs being both ILP32 style (which means that int, long and void* are 32-bit), if the rest of the C library ABI was the same between the two, a modified version of Ryan’s FatELF approach – one where the data sections are shared – could have been quite successful!

But let it be clear, this is not going to happen as things stand now. Changing the C library ABI for x86 at this point would be a script worthy of Monty Python, and the new x32 ABI corrects some of the obvious problems present in x86 itself — namely the use of a 32-bit off_t (which restricts the size of files) and time_t (which would cause the Y2K38 bug), the two of which have widely incompatible data structures.

Again about glibc 2.14, RPC and modern software

It looks like my previous post on glibc 2.14 made it to Reddit – even though it made little impression on Flattr – and there is at least one interesting question asked there: what software is using RPC that I wasn’t expecting?

While it is definitely true that I underestimated the number of systems still using old-style NIS, judging by the commenters on my other post about PAM, there is a long list of packages making use of glibc’s RPC subsystem that I didn’t expect. All of this definitely doesn’t make for an interface that is dying without need of a replacement, as one commenter suggested:

Except that no one uses Sun RPC for that. It’s only application in modern unixes is NFS, so it does not really belong to libc. nfs-utils and libtirpc should handle that. Same goes for NIS and other remnants from the dark ages. Removing unused bloat from the fundamental system library is actually a good thing.

And for the record, RPC is not being removed from “the fundamental system library”: the code for the RPC implementation is still all there; it’s just hidden and disallowed from being linked against, which means that packages using the interface will no longer build, but those built before (or binary packages that come prebuilt) will still run on the new library. No “bloat” removed.

Okay, so what are those packages? Well, for once let’s look at something I have worked on myself for a while and that is actively developed to this very day: libvirt. While obviously designed to work well with libtirpc, it can’t be installed on glibc 2.14 (as libtirpc is not fixed yet), and its RPC usage has nothing to do with NFS either. On the other hand, it seems like watchdog, lsof, quota, autofs and possibly tcpdump do need it for NFS support.

I don’t know much about them, but the list of packages requiring RPC includes oc, torque, libcult, libassa, hamlib, lives, xinetd, db (yes Berkeley DB), libdap, tcb, netkit-rusers, netkit-bootparamd, ogdi, charm, netkit-rwall, gs-assembler, ctdb, perdition, amanda, scilab and R….
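A quick way to spot more candidates on an installed system is the same scanelf trick used above for gets(): look for binaries importing one of the classic SunRPC entry points (clnt_create here; a sketch, guarded for systems without pax-utils):

```shell
#!/bin/sh
# Scan installed binaries for undefined references to clnt_create,
# one of the classic SunRPC entry points provided by glibc's RPC code.
if command -v scanelf >/dev/null 2>&1; then
    scanelf -qs -clnt_create -R /bin /sbin /usr/bin /usr/sbin /usr/lib
else
    echo "scanelf not found; emerge app-misc/pax-utils first"
fi
rpc_scan=done
```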

I haven’t started fixing any of these myself — I have way too many things on my plate already, and this is not a high enough priority to tackle in my free time — but at least I can report and keep tabs on them. That’s enough for now, I guess.

About GLIBC 2.14, EGLIBC, and Gentoo

I was originally planning to write tonight about one of my current job tasks, since it is honestly interesting from the Free Software perspective as well, but since I’ve received a number of comments in this regard, and even a couple of direct email messages, I think it’s a better use of my time to address this situation instead.

I have blogged repeatedly about the trouble caused by the new version of GLIBC (2.14) and its developers’ choice to stop allowing access to the RPC implementation that it comes with, in favour of the new, also-broken-by-the-same-update libtirpc library.

Turns out that this situation is becoming so absurd that at least Arch Linux decided to revert the removal of the RPC interface. And the same decision seems to have been taken by the EGLIBC developers (which, as far as I can tell, means that Debian and Ubuntu will keep the RPC interface as well). The obvious question people then ask me is: “Why isn’t Gentoo doing the same?”

I’m afraid I don’t have a real answer to this: I’m not the GLIBC maintainer, that’s Mike. I’m not in his head, and I honestly haven’t asked him to comment on the issue yet; the reason why I’m not pushing him for comments or actions is simple: I see no particular urgency to move to the new GLIBC version. The news entries for the new release are a bit too thin to be of immediate interest to me, and the presence of a bug making Ruby uninstallable (thanks Sergei for tracking down the root cause!) makes it very low-priority for me — as in, no priority at all, really.

In particular, the last I knew about the EGLIBC situation was that Mike preferred validating the applied patches on their own merit, following the upstream GLIBC developers as closely as possible unless particular architectures and situations require otherwise, which is a choice I respect deeply. The issue there seems to be that Drepper is getting more and more detached from the needs of the ecosystem, and is still a sort-of dictator for what concerns the C library. I was also pointed at some speculation that he’s no longer directly employed by Red Hat, but given that I don’t really care, I neither confirmed nor rejected it; make of it what you want.

As for reverting the removal of the RPC interface… I don’t like that choice. I mean, the problem here is not that we lack a replacement for the RPC interface in GLIBC, but rather that the replacement is non-working. Rather than spending effort working against the GLIBC developers, it would be better spent fixing libtirpc so that it works with GLIBC 2.14, leaving us with a properly-working RPC implementation.

In particular, I think it might be a good idea now to implement the proper virtual for RPC implementations on GLIBC and other systems:

elibc_glibc? ( || ( net-libs/libtirpc <sys-libs/glibc-2.14 ) )

Using such a virtual would make it easier for me to ignore the packages that are known not working with glibc-2.14, as the dependencies wouldn’t be satisfied, and the tinderbox would then skip over the package altogether. I guess I should send an email about this so that it can be discussed and implemented.
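As a sketch of what such a virtual might look like — the package name virtual/rpc and the metadata fields are hypothetical, only the dependency string comes from the text above:

```shell
# virtual/rpc-0.ebuild (hypothetical sketch, not an actual tree package)
EAPI=4

DESCRIPTION="Virtual for a Sun RPC implementation"
SLOT="0"
KEYWORDS="~amd64 ~x86"

# glibc before 2.14 ships a usable RPC implementation itself; from 2.14
# onwards, libtirpc has to provide it instead. Other C libraries (e.g.
# FreeBSD's) carry RPC in libc, so no dependency is needed there.
RDEPEND="elibc_glibc? ( || ( net-libs/libtirpc <sys-libs/glibc-2.14 ) )"
```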

There is another reason why I’m not so keen on restoring the interfaces that were removed from this version of the C library: while a number of people commented on my previous post, correcting my first assessment that NIS was dead, it is still something that most desktops don’t need, and that uClibc does not implement; finding the packages relying on said interface is still an interesting task to tackle.

In general, I’m afraid to tell you that I’m not going to “solve” the problem by restoring the symbols myself. If Mike decides to take that approach, the fallout is just going to be delayed, not avoided. And no — even though I probably would prefer moving away from GLIBC to EGLIBC, not just for this problem but also for things like the base versioning issue that is making gold less useful than it could be — I have neither the time nor the motivation to step up and become the new C library maintainer in Gentoo. I barely have the time to keep on top of what I’m already supposed to do.

PAM, glibc 2.14, and NIS

After my list of issues related to glibc-2.14, the situation is somewhat more manageable: Mike fixed his own mistakes in the build of libtirpc, which now builds a working library, but on the other hand resorted to a dirty hack to install the old NIS/YP header files, which are needed by both libtirpc and other packages. Dirty or not, this at least produces a reasonable expectation of a working starting point.

I won’t even quote Mike’s suggestion for how to add support for libtirpc, since that would cause huge trouble on non-Linux platforms, given that, for instance, FreeBSD has a working RPC implementation (which, as far as I can tell, supports IPv6) in its C library itself. This means that what you should have in the ebuild is probably something along these lines:

IUSE="elibc_glibc"

RDEPEND="elibc_glibc? ( || ( net-libs/libtirpc <sys-libs/glibc-2.14 ) )"

The option of creating a virtual package for this is sweet, but unfortunately, supporting libtirpc requires the build system to identify it properly. It takes a bit of work to make it possible to build against that implementation in most cases, so simply adding the virtual is unlikely to help.

With enough support to actually fix the packages, tonight I looked into PAM, which I knew would become troublesome, since I had already tried removing NIS support from it a few years ago; at the time I was unable to follow through with upstream, as it was during the period when I was in and out of hospitals.

At the time I had a patch that allowed building when NIS was unavailable, but it didn’t make it possible to disable NIS when it was available; this is now implemented in version 1.1.3-r1 with the nis USE flag, which I suggest keeping disabled (it is disabled by default). When enabled, it requires either libtirpc (preferred) or an old glibc version.
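For reference, re-enabling it for just this package is the usual one-line Portage configuration; this is plain package.use syntax, nothing specific to this change:

```
# /etc/portage/package.use
# Only if you actually rely on NIS/YP; the default leaves it off.
sys-libs/pam nis
```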

Since NIS/YP is a virtually dead technology, it shouldn’t be surprising that the default is now to keep it disabled; it’s not just a matter of requiring an extra dependency. I doubt anybody is using that stuff on any modern system, and definitely not on users’ desktops or servers, which will probably be happy to drop a bunch of code that was left unused.

For those who wonder how much code we’re talking about: comparing the before and after of the pam_unix module, which is the main consumer of the NIS/YP interfaces, shows the total size decreasing by about 3KiB, one of which is purely executable code, while the rest is distributed across the data and rodata sections (which seem to mostly hold string literals) and the overhead (imported symbol and string tables). It might not sound like a lot, especially on a module with about 50KiB of copy-on-write bss section, but it’s still something you gain by simply removing unused code, without even needing to optimize anything.
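Numbers like these come straight out of binutils’ size(1); for the real comparison you’d point it at the pam_unix.so from each build, but any ELF object shows the same per-section breakdown (here /bin/sh just stands in as an arbitrary example target):

```shell
# Sketch: per-section size breakdown via binutils' size(1).
# Run it on the module from each build (with and without USE=nis)
# and diff the .text/.data/.rodata/.bss numbers.
size -A -d "$(command -v sh)"
```

The `-A` output lists each section (.text, .data, .rodata, .bss, ...) with its size in bytes, which makes before/after comparisons of two builds straightforward.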

Are you kidding me? Or, why we’ll be waiting on glibc 2.14 for a while

A couple of days ago I noted my tinderbox’s move to glibc 2.14, with the hope of quickly finding and fixing the packages that depend on the now-removed RPC interface. I didn’t expect this kind of apocalypse, and I almost want to cry thinking about the mess this version seems to create.

First of all, it doesn’t seem like it’s just Ruby being hit by memory corruption issues, which makes it likely that the new memcpy() behaviour noted in the ChangeLog is to blame. I haven’t had time to debug this yet, though.

A new, scary situation arose as well: wget exits with a segmentation fault when trying to resolve any hostname that is not in /etc/hosts, which in the case of the tinderbox means anything that is not localhost or Yamato (that’s where the Squid proxy that caches the fetched Gentoo data runs). I’m not sure of the cause yet, as the fault happens not within the executable’s code but directly inside libresolv, which would point at a bug in glibc itself.
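One generic way to confirm that the problem sits below wget is to exercise the same NSS/libresolv machinery through getent(1); this is a diagnostic sketch, not something from the tinderbox logs, and the hostname is only illustrative:

```shell
# Sketch: exercise the resolver path without wget's own code in the way.
# /etc/hosts entries go through the "files" NSS module:
getent hosts localhost
# A name not in /etc/hosts forces the DNS path through libresolv;
# if this also misbehaves, the bug is in the C library, not in wget.
getent hosts www.gentoo.org || echo "lookup failed outside wget too"
```

If getent survives where wget crashes, the fault is more likely in how wget drives the resolver; if both fall over on the DNS path, glibc is the prime suspect.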

As far as RPC is concerned, I’m surprised that there are so many packages depending on it, and of the widest variety: multimedia, scientific, network analysis tools, and so on. I was optimistic in my previous post, expecting that most, if not all, of the packages using RPC could be fixed by relying on libtirpc. Boy, how wrong I was.

See, the issue is this: libtirpc itself does not build on glibc-2.14, as it relies on one of the NIS/YP headers that has also been removed. Even worse, the latest version (0.2.2) of libtirpc, which I hoped would solve the issue, does not work on any system at all: a change by our very own Mike (vapier), merged upstream just before the 0.2.2 release, causes the build to produce a library that lacks a couple of symbols — the source file where they are defined was not added to the build, and even when you add it, a couple more symbols turn up missing. This release has been out for over a month without any sign of a 0.2.3 coming (the upstream repository is still broken at the time of writing).
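Missing symbols of this kind are easy to spot with nm(1) from binutils: you list the library’s dynamic symbol table and check what it defines versus what it expects to get from its dependencies. With the broken libtirpc 0.2.2 you’d point it at /usr/lib/libtirpc.so; below, the system C library stands in as an example target, since libtirpc may not be installed:

```shell
# Sketch: inspect a shared library's dynamic symbols.
# " T " lines are defined, exported functions; " U " lines are undefined
# references that must be satisfied by the library's own dependencies.
# A consumer linking against the library fails if a symbol it needs is
# neither T here nor resolvable through those dependencies.
lib=$(ldd "$(command -v sh)" | awk '/libc\.so/ { print $3; exit }')
nm -D "$lib" | grep ' T ' | head -n 3
nm -D "$lib" | grep ' U ' | head -n 3
```

Checking that every symbol declared in the installed headers shows up as `T` is exactly the kind of sanity test that would have caught the 0.2.2 breakage before release.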

Are you freaking kidding me?

Oh, and for those who wonder: the base versioning issue that, as I’ve said, is holding up base version support in gold is still not fixed. This means that packages, fuse included, that wanted to keep binary compatibility with their original unversioned symbols are still not getting that compatibility, even with this version. In my personal opinion it would be a good time to drop that code from fuse, but upstream prefers waiting for the new 3.0 version, which is going to get tricky.

With all this considered, it really looks like a very badly broken release, and one that makes me wonder whether it was too hasty to reject the idea of moving to the eglibc patchset/fork, as Debian and Ubuntu seem to have done.