This Time Self-Hosted

PAM surprises

Preparing the testing ground known as the tinderbox for GCC 4.5 called for a complete cleanup: a mass-unmerge of every installed package that is not part of the world set (that is, everything that is neither essential to the tinderbox process or its management, nor a dependency of those).

Easier said than done: unmerging twelve thousand (12000) packages takes the better part of a day. And that is while keeping 313 packages installed as the world/system trees (it doesn’t help that GCC depends on GTK+, which easily bloats the tree). What I wasn’t expecting was the fun result after the unmerge completed.

It became apparent midway through the removal that Portage’s post-remove hooks are not designed to work well with a mass-unmerge: each TeXLive package that got unmerged caused the whole font cache to be regenerated, and that’s slow; each Python package looked for pyc/pyo files; and each package using mime-types rebuilt the mime cache (complete with pages-long warnings about the chemistry mime types being non-standard).
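To make the cost concrete, here is a minimal sketch of the alternative: collecting the rebuild requests and running each distinct one once at the end. This is not how Portage works; `DeferredHooks`, `unmerge_all` and the hook names are hypothetical, invented for illustration.

```python
from typing import Callable, Dict, List


class DeferredHooks:
    """Collect post-removal tasks and run each distinct one once, at the end."""

    def __init__(self) -> None:
        # dicts keep insertion order, so tasks run in first-requested order
        self._pending: Dict[str, Callable[[], None]] = {}

    def request(self, name: str, task: Callable[[], None]) -> None:
        # A second request under the same name is a duplicate and is dropped.
        self._pending.setdefault(name, task)

    def flush(self) -> List[str]:
        ran = []
        for name, task in list(self._pending.items()):
            task()
            ran.append(name)
        self._pending.clear()
        return ran


def unmerge_all(packages: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each (hypothetical) hook name to the number of times it really ran.

    `packages` maps a package name to the hooks its removal would trigger.
    """
    runs: Dict[str, int] = {}
    queue = DeferredHooks()

    def make_task(hook: str) -> Callable[[], None]:
        def task() -> None:
            runs[hook] = runs.get(hook, 0) + 1
        return task

    for hooks in packages.values():
        for hook in hooks:
            queue.request(hook, make_task(hook))
    queue.flush()  # every distinct hook runs exactly once, after all removals
    return runs
```

With two TeXLive packages both asking for a font cache rebuild, the rebuild would run once instead of twice; with thousands of packages the saving grows accordingly.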

The first problem appeared in the final part of the unmerge: find stopped working (it is used by the Python post-removal hook). Why’s that? Because it automagically links against the SELinux library, which had just been unmerged. The selinux USE flag there only controls the dependency variables; it makes no effort to disable linking against SELinux in the first place. For this reason, all the SELinux-related packages are now masked in our default profiles.
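This kind of breakage shows up in `ldd` output as a `not found` entry next to the library name. As a rough sketch, the helper below (a hypothetical name, `missing_libraries`) picks those entries out of text captured from `ldd`; the sample output in the usage note is made up for illustration.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved lines look like "libfoo.so.1 => /lib/libfoo.so.1 (0x...)",
        # unresolved ones like "libfoo.so.1 => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing
```

Feeding it something like the output of `ldd /usr/bin/find` on a broken system would return the sonames that need to be put back (or linked out) before the binary can run again.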

Much nastier was the second problem I hit. Since I had rebuilt my host system as well as cleaning up the tinderbox, I restarted GNOME to make sure that the old libraries were flushed out of memory and replaced with the newly built copies… so I logged off the tinderbox SSH session, then came back, but the tinderbox refused to let me in! Even worse, lxc-console didn’t let me log in to the system at all. This is most definitely bad.

Luckily (or not so, from one side), the ball fell squarely within my area of expertise, and of what I maintain in Gentoo: PAM. Indeed, by resetting the PAM chain to permit any login for a limited time, I was able to log back into the container and find out what the problem was. Obviously, it was another automatic dependency.

You might or might not know that glibc refuses (last I knew) to support the Blowfish cipher for the crypt() function, while a few distributions have been providing patched implementations to support hashing the system passwords with this algorithm. Gentoo has not been following them here, as we try to stay closer to upstream.

But we’re definitely not sleeping on this; it was in Summer 2008 – while going in and out of hospitals – that I added support for SHA512 hashing to our PAM setup. This is definitely better than the default crypt() algorithm or the previously-used MD5 hashing. And in a similar spirit, I have recently implemented extended hashes in pam-pgsql while doing a work task with that package.

Anyway, given that upstream won’t add Blowfish support, but distributions wanted it for different reasons, how do you solve the problem cleanly? Well, the solution has been to invent a new library, called libxcrypt; I sincerely have no idea who the actual maintainer of the library is, but it is available in Gentoo, fetching the source archive from Debian’s mirrors. This library provides an alternative crypt() function that does support the Blowfish cipher.

Now, given that Linux-PAM seems to be mostly maintained by SUSE, and that they have been interested in using stronger algorithms for a long time, it is logical to expect that Linux-PAM is ready to work with this library, and that’s the case. The problem is that you have no way to tell it not to use it: if the library is found, it will be used. And if you then unmerge the package without having preserve-libs enabled, you lose any chance of logging in to your system properly. Bad!

Luckily, it wasn’t too difficult to patch Linux-PAM to allow an easy opt-out, even though I didn’t make it a configure switch but simply tied it to a cache variable: tell configure that you have no xcrypt.h header and it won’t search for the library (originally it would, causing more fuss than needed, and failing badly at the end of the day). Right now, the two ebuilds of Linux-PAM in the tree both disable xcrypt and thus save you from locking yourself out.
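As a sketch of what “telling configure” means here: Autoconf caches the result of a header check in a variable, and pre-setting that variable in the environment makes configure skip the check. The variable name below, `ac_cv_header_xcrypt_h`, is an assumption derived from Autoconf’s usual naming convention for header checks, and `configure_env_without_xcrypt` is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def configure_env_without_xcrypt(
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return an environment that pre-answers the xcrypt.h header check."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed cache variable name, following Autoconf's ac_cv_header_<name>
    # convention; "no" means "pretend the header is not installed".
    env["ac_cv_header_xcrypt_h"] = "no"
    return env


# Possible use from a build script (not executed here):
#   import subprocess
#   subprocess.run(["./configure"], env=configure_env_without_xcrypt(), check=True)
```

Tying the opt-out to a cache variable rather than a configure switch keeps the patch small, at the price of being less discoverable than a documented option.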

What this shows is one organisational problem within Gentoo: different groups (and individuals) aiming at similar targets are not communicating well enough to understand what each other is doing, and this can have nasty effects on all the users out there. Luckily, the xcrypt thing is relatively rare, and preserve-libs makes it mostly (but not entirely) harmless. What happens if your password is hashed with Blowfish and you then remove the Blowfish hashing capability, which is needed to compare it when you log in? Again, we can feel safer knowing that our default configuration will never use Blowfish; a user mucking about with that configuration is on their own if they do something as silly as rebuilding PAM without Blowfish.

Just so you know, my plan for this is to add an xcrypt USE flag for Linux-PAM (sys-libs/pam) and then a blowfish one for the PAM configuration files (sys-auth/pambase) that depends on it. Yes, the two are different, because one provides an interface (xcrypt) and the other a feature (Blowfish hashing of the password).
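One way to check whether a system is exposed to that last scenario is to look at the scheme prefix of each stored hash: crypt()-style hashes start with `$1$` for MD5, `$2…$` for Blowfish, `$5$` for SHA-256 and `$6$` for SHA-512. The following is a minimal sketch working on shadow-formatted text; the function names are hypothetical, and the sample entries in the usage note are made-up placeholders, not real hashes.

```python
from typing import List

# Scheme identifiers as they appear between the first two '$' characters.
_SCHEMES = {
    "1": "md5",
    "2": "blowfish", "2a": "blowfish", "2b": "blowfish",
    "2x": "blowfish", "2y": "blowfish",
    "5": "sha256",
    "6": "sha512",
}


def hash_scheme(hashed: str) -> str:
    """Classify a crypt()-style password field by its '$id$' prefix."""
    if not hashed.startswith("$"):
        # Classic DES hashes and locked entries ("*", "!") carry no prefix.
        return "traditional-or-locked"
    parts = hashed.split("$")  # "$6$salt$hash" -> ["", "6", "salt", "hash"]
    ident = parts[1] if len(parts) > 2 else ""
    return _SCHEMES.get(ident, "unknown")


def users_needing_blowfish(shadow_text: str) -> List[str]:
    """List the accounts whose stored hash can only be checked with Blowfish."""
    users = []
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and hash_scheme(fields[1]) == "blowfish":
            users.append(fields[0])
    return users
```

Run against the contents of the shadow file before dropping Blowfish support, a non-empty result would mean that those accounts need their passwords reset under another scheme first.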

Now, back to work!

Comments 1
  1. Hey Diego, your issues with massive unmerges got me thinking about what could be done to improve the merge/unmerge efficiency.

     Method 1: Spawn some tasks away from the main Portage process when using parallel builds, or add this as another Portage feature. Portage would wait for the child processes to die and then inform the user that the package has been merged.

     Method 2: Add a persistent message queue to Portage and disallow duplicate entries (some file-based gdbm or even a text file). All tasks that the merge process requires would go into this queue and would be executed at the end of the merge. If the merge process is interrupted (ctrl+c, power outage, …), then a) the next emerge would be required to work through the queue before new packages are processed, and b) during boot, an init script could check whether the queue still contains tasks and perform them to ensure a properly working system.

     Question: Many ebuilds tell users what to do after a merge (e.g. after an xorg-server update, installed drivers must be rebuilt, or do “hash -r” on all terminals after a coreutils update)… Why aren’t (some of) these things done automatically? I guess it’s probably because we don’t want Portage to screw with the system without the user knowing. But most users will want to ‘just update their system’. So, do Portage EAPIs have features such as performing tasks only when the version of a package changes, or when the user tells Portage to do so? E.g. calling “emerge -uDNva --do-stuff-for-me world” would perform a pkg_dostuffforuser() that would be newly defined in ebuilds.
