Anybody hiring me for PAM?

This post might sound like a nasty plug, but I’m really doing this because it seems like the only solution up to this point.

In the past few days some trouble came up on the PAM side again. Let me try to put this into perspective: even if nowadays I can actually find some use in knowing how PAM works, I joined the PAM team four years ago while working on Gentoo/FreeBSD, because I needed the configuration files migrated to a format that worked there as well as on Linux. Since then, Azarah went missing and the whole of PAM was shoved onto my back. Nowadays, I maintain the Linux-PAM package and a bunch of random PAM modules, and I should oversee the general PAM configuration in Gentoo.

Unfortunately, this also requires a lot of coordination skills, and time to do the coordinating: maintainers of other PAM modules, and maintainers of packages that use PAM themselves, should talk with me about the default configurations and the like; instead I’m usually reactive on that matter. And that is, as you might guess, neither the best of experiences nor the easiest of tasks.

I have written before about the need for a new pambase, and it is now obvious that one is needed to actually implement proper support for multiple authentication methods like Kerberos, LDAP, PKCS#11, YubiKey, … I have a few ideas on how to solve this, namely changing the current situation to a few predefined, hidden chains (.gentoo-session-minimal, .gentoo-session-console, .gentoo-session-graphical) wrapped on the system-* series of chains, all generated with M4 rather than the current C preprocessor (which lacks any kind of arithmetic capabilities).

But even more than fixing pambase, there is the need to review the packages that use PAM. A few days ago, Samuli complained to me that ConsoleKit was not being executed properly on login(1); it turned out the problem was that /etc/pam.d/login was not calling back into system-local-login as it was expected to. The root cause was that the modified PAM chain file had been replaced with the previous one (which didn’t use that chain) after the major bump of sys-apps/shadow, when it was picked up by Debian. That change is dated 24 Feb 2008, over two and a half years ago.

While I cannot shake off the blame for missing the revert, why did I miss it? Simple enough: Portage’s confmem feature never told me that /etc/pam.d/login had changed from the one I had before. It assumed that my local version was a modified one and thus accepted it as the good one.

Now this makes the second revision bump and second stable request that I have to take care of to fix PAM-connected trouble; the previous one, back in July (for the bump) and last month (for stable), was related to the chpasswd chain that had been broken for, well, almost the same time as this one.

In fixing another ConsoleKit problem, bug #342345, I found that the GDM and KDM chains are not compatible with pambase, and they both need more fine-grained control over the sessions (console and graphical sessions have different needs; in the latter case we have to skip the motd/mail/lastlog modules).

A quick check around on the tinderbox told me that there are a number of PAM chain files that should be cleaned up, reduced, optimised and so on. And that the number of files there does not correspond with the number of files installed.

Basically, what we need now is an audit of all the PAM-using packages in the tree, besides the improvement of pambase as stated above. We’re talking about a month or two worth of work. Not something I’d do myself in my spare time, and not something I can do during work time as things stand. It’s not simply a matter of repeating what I did for the original pambase; it’s work more in line with the Ruby NG situation, and that one we haven’t completed just yet with three people working on it, including a bunch of work time thrown in by me for a few jobs I took during the year.

So here’s the catch: nobody has helped me with PAM in years, and while Constanze is being promoted to developer status, I know she’s also pretty busy with her thesis, so I cannot easily ask her to commit enough time to take a meaningful amount of work off me. This means that we can either keep the current status quo of just band-aiding through enough troubles to keep it running, or somebody has to help me, either with work or with funding. As I said, I’m already losing money with the tinderbox and I don’t want to lose time, sleep and (possibly) money working on something as ungrateful as PAM.

Don’t get me wrong, I’m not asking for donations here; I’m asking to be paid to do a job, and that job is the auditing and review of (a part of) the PAM-using ebuilds. If you’re using Gentoo (and PAM) in production, you might be interested in hiring me to get this out of the way. I’m not even asking for much: €1500/month, for one to three months (depending on the depth of the work you’d want to fund); you can provide the agreement details and give me a list of priority programs to work on (those that you use in your organisation). On the same terms, I’m willing to help you package new software that is not currently in Portage in the spare time.

Let me know by mail if you’re interested. Extra points if you use GPG-encrypted email because that stuff doesn’t get sent to spam.

Linux Containers on Gentoo, Redux

I’ve got a few further requests for Linux Containers support in Gentoo, one of which came from Alex, so I’ve decided to try to update the current status of the stack, and give a decent description of what the remaining problems are.

First of all, the actual software: I’m still not fond of the way upstream is dealing with the lxc package itself; the build system is still trying to be smart and ending up being stupid where the LXC library is concerned. There are still very few failsafes, and there isn’t really enough room to manage LXC with the default tools as they are. While libvirt should support LXC just fine, I haven’t found the time to try it again and see if it works; I’m afraid it might only work if you use the forced setup that RedHat uses for LXC… but again I cannot say much until I find time to try it out and tweak it where needed.

*A note: as I stated before, a lot of the documentation and tutorials regarding libvirt only apply to RedHat or Fedora. I can’t blame them for that, they do the work, they do the walk, but often it means that we have to adapt them or at least find a way to provide them with the pieces they expect in the right place. It requires a lot of my time to do that.*

I’ve finally added my “custom” init script to the ebuild, with version 0.7.2; it might change further, with or without a revision bump, as I fix reported bugs. It should mostly auto-configure: the only requirement is to symlink it to lxc.container to start the container defined in /etc/lxc/container.conf. It auto-detects the root path (so it won’t force a particular filesystem layout on you), and works with both 32- and 64-bit containers transparently, so long as there is a /sbin/init command (which I might have to change for systemd-based distributions at some point). What I now reckon it lacks is support for detecting the network interface it uses and requiring that it be started; I can add that at some point. In the meantime, use /etc/conf.d/lxc.container and add rc_need="net.yourif" there.
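
A minimal sketch of that setup, under a few assumptions of mine: the container is called guest, its interface is br0, and the init script is installed as /etc/init.d/lxc (all three names are hypothetical, for illustration only). PREFIX defaults to a scratch directory so the commands can be tried safely; run it with PREFIX set to an empty string to act on the real /etc.

```shell
#!/bin/sh
# Sketch only: "guest", "br0" and the script name "lxc" are assumptions,
# not something the ebuild guarantees.
PREFIX="${PREFIX-$(mktemp -d)}"   # scratch directory unless PREFIX is set
name="guest"
iface="br0"

mkdir -p "$PREFIX/etc/init.d" "$PREFIX/etc/conf.d"

# Relative symlink next to the real script, the same pattern Gentoo uses
# for net.lo -> net.<interface>.
ln -sf lxc "$PREFIX/etc/init.d/lxc.$name"

# Make the container service require the interface it depends on.
printf 'rc_need="net.%s"\n' "$iface" > "$PREFIX/etc/conf.d/lxc.$name"

echo "configured lxc.$name under ${PREFIX:-/}"
```

With that in place, the container would be started through the lxc.guest service, which pulls in net.br0 first.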

As far as networking is concerned, last I checked, with lxc 0.7.1 userspace and kernel 2.6.34, the macvlan system still isolated the host from the guests, which might be what you want but is definitely not what I care for. I’m guessing this might actually be by design; at any rate, even though it is technically slower, I find myself quite comfortable with using a Linux-based bridge as the main interface, bridging the Virtual Ethernet device of the guest together with the physical interface(s) of the host. This also works fine with libvirt/KVM, so it’s not a bad decision in my opinion. I just added 0.7.2 but I can’t see how that makes a difference, as macvlan is handled in the kernel.

Thankfully, doing so with Gentoo’s networking system (which Roy wanted to deprecate, tsk!) is a piece of cake: open /etc/conf.d/net, rename config_eth0 to config_br0, then add config_eth0="null" and bridge_br0="eth0". Then run ln -s net.lo /etc/init.d/net.br0, and use that for bringing the network up. On the LXC configuration side you then have

lxc.network.type = veth
lxc.network.link = br0

and you’re all set. As I said, piece of cake.
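
For reference, the /etc/conf.d/net edits described above could look like the following sketch. The "dhcp" value is an assumption of mine; carry over whatever your own config_eth0 contained before the rename.

```shell
# /etc/conf.d/net (sketch)
# Assumption: eth0 used to be configured via DHCP; substitute the value
# that your own config_eth0 had.
config_br0="dhcp"      # this line used to be config_eth0

# The physical interface no longer gets an address of its own...
config_eth0="null"
# ...and is attached to the bridge as one of its ports.
bridge_br0="eth0"
```

The bridge then carries the host’s address, and each guest’s veth device is attached to br0 through the lxc.network.link setting.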

Slightly more difficult is to properly handle the TTY devices; some people prefer to make use of the Linux virtual terminals to handle LXC containers; I sincerely don’t want it to mess with my real virtual terminals, I prefer using the lxc-console command to access it without networking. Especially since it messes up a lot if you are using KMS with the Radeon driver (which is what I’ve been doing for the past year or so).

For this to work out, I noted two things though: the first is that simply using the cgroup access control lists on the devices doesn’t help all that much (I actually haven’t tried to set them up properly just yet); the second is that LXC can create “pseudo-ttys” that can be used with lxc-console. The default number (9) does not work all that well, because the Gentoo init system sets up twelve virtual terminals by default. So my solution is to use my custom static device tarball and the following snippet in the configuration:

lxc.tty = 12
lxc.pts = 128

This ensures that the TTY devices are all properly set up, so that they don’t mess with your virtual terminals, and lxc-console works like a charm in this configuration.

Now, the sad part: OpenRC is not yet stable, and I haven’t yet fixed the NFS bug I found (you stop it in the container and the host’s NFS exports are destroyed… bad script, bad script!). On the other hand, I’m not doing LXC daily any longer, for the simplest reason: the tinderbox is already set up as I wish, for the most part, so I have little to no incentive to work more on this. The good news is that I’m up for hire, as I said regarding Ruby. So if you really want to use LXC in production and want me to improve any Gentoo-related area of it, including libvirt, you can just contact me.

Besides that, everything should be in place. Have fun working with LXC!

Really want Ruby 1.9 generally available? Read on.

Gentoo currently does not make Ruby 1.9 available to users directly; there are a number of reasons for that, and they can be summed up in what Alex described as “not pulling a Python 3 on our users”. Right now, there are next to no packages that need Ruby 1.9, and a lot that do not even work with it. While a minority nowadays, a few won’t even work if it’s installed together with 1.8, let alone configured as the primary provider for Ruby.

Alex, Hans and I have been working for a long time to find a solution, and since last year the definitive solution seems to be Ruby NG, which I originally started in May 2009 after having trouble keeping this very blog alive on the previous vserver, which nowadays only hosts the xine bugzilla.

The road has still been uphill from there, as the three pages of posts tagged with RubyNG on this blog can document: trouble with the ideas and implementations, compatibility problems, a huge web of dependencies between packages, various fixes; all of it makes the road to Ruby 1.9 quite difficult for us packagers. At the same time, we’ve been doing our best to ensure that users are given proper software, of good quality. Maybe it’s because I’m deeply involved with QA, maybe it is because I’m not writing production software daily, but I still think that we shouldn’t be providing half-assed software just for the sake of it.

That means that most of the time we either don’t add support for Ruby 1.9, or we go deeply into fixing the underlying issues to make sure that the software will work upstream, and not just in Gentoo (as otherwise there could be nasty surprises, like some I got, where an application works perfectly fine locally, where software is installed through Portage, and fails on Heroku that uses plain Rubygems). You can tell how that can be a PITA by looking at my github page — it lists mostly Ruby packages that I had to “fork” (branch, actually) to get the fixes in; mostly they have been merged upstream, sometimes they are dead in the water though.

All of this makes the situation quite complex; while I sort-of enjoy working with Ruby and these things, I also noted that it takes a very long time to get the whole dependency web tested and fixed… and it’s the sort of time that, in my personal free time, I just don’t have. I have been packaging (and thus testing and fixing) a few packages that I triaged for a few job tasks, and some that I’m still using, on paid work time, but that doesn’t cut it for every package out there. I guess the same thing goes for Alex, Hans and Gordon.

What’s the bottom line? Well, Hans in particular has been doing a huge amount of work to port the ebuilds from the old gems.eclass to ruby-fakegem.eclass so that they can be installed when Ruby 1.9 is present without messing that up, even though they wouldn’t work with it. This brings the day that we can get it unmasked much nearer. But there are quite a few cases where we can’t just drop the old version so easily, and they mostly relate to non-gem bindings and the usage of Ruby as a scripting engine (rather than adding support for a library to Ruby itself). And this is without counting further issues like bundler not working all too well because it lacks dependency information, or getting Rubygems to refuse messing with the Portage-installed gems altogether (which is now much more feasible than before, since we no longer use the gem command from within Portage to install the stuff).

So what can you do to get this sooner? You can help out by making sure packages work with Ruby 1.9; when they have been positively tested not to work on that version, they are usually marked as such in the ebuild itself. For my part, I always note the problems with a Unicode right-pointing arrow, so running an fgrep command on the tree for “ruby19 →” should give you a very good idea of how many problems (and how many different problems) there are out there.
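
As a rough sketch of that search, assuming GNU grep and a tree checked out at /usr/portage (both assumptions of mine; count_ruby19_notes is a hypothetical helper name), the following counts the ebuilds that carry such a note:

```shell
#!/bin/sh
# count_ruby19_notes [TREE]: print how many ebuilds under TREE contain
# the literal marker "ruby19 →". TREE defaults to /usr/portage.
count_ruby19_notes() {
    tree="${1:-/usr/portage}"
    # -r: recurse; -l: list each matching file once; -F: fixed string,
    # which is what fgrep does; --include is a GNU grep extension.
    grep -rlF --include='*.ebuild' 'ruby19 →' "$tree" | wc -l
}
```

Calling count_ruby19_notes /usr/portage prints a single number; dropping the wc -l stage lists the affected ebuilds instead, which is a better starting point for picking something to fix.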

You have no idea where to start with this thing? There is another option: hire me. Well, I would have liked to say “hire us”, but it turns out that both Alex and Hans, at least, are not looking to be hired for this, while a project of mine is being delivered this week and then I have some extra time for the next few months. I wouldn’t mind being paid to work full-time on getting Ruby 1.9-ready packages in the tree. I’m a registered freelancer in Italy, so I have a European VAT ID and I can issue proper invoices, so it’s going to be all clear in the books. If you’re interested you can contact me to discuss pricing and the amount of work you’re looking for.

Just please, stop harassing the team because we’re not as fast as you’d like us to be… we’re already doing a hell of a job in a hell of a hurry!

Adding to the tree for once

You probably are used to me sending last rites for packages on behalf of QA, and thus removing packages (or in the past just removing packages, without the QA part). It often seems like my final contribution to Gentoo will be negative, in the sense that I remove more than I add.

Right now, though, I’ve been adding quite a few packages to the tree, so I’d like to say that no, I’m not one of those people who just like to remove stuff and who would like you to have a minimal system!

So with some self-advertisement (and a shameless plug while I’m at it…) I’d like to point out some of the things I’ve been working on.

The first package I’d like to single out is the newly-added app-emacs/nxml-libvirt-schemas: as I ranted before, I wanted to be able to have syntax completion for the libvirt XML configuration files (I still maintain they should have had either a namespace, a doctype or a libvirt-explicit document element name), so now that Daniel and I have got the schemas to a point where they can be used to validate the current configuration files, I’ve added the package. It uses the source tarball of libvirt, with the intent of not depending on it. I’m wondering if it’d be better to use the system-installed Relax-NG files to create the specific Relax-NG Compact files, but that’ll have to wait for 0.7.5 anyway (which means, hopefully, next week).

The second set of packages is obviously tied in with the Ruby-NG work and consists of a few new Ruby packages; some are brought in from the testing overlay I’ve built to try out the new packages, others have been brought in as dependencies of packages being ported to the new eclasses, and one (addressable) is a dependency of Typo that I hadn’t installed through Portage until lately. I should probably add that I’m testing the new ebuilds “live” on this blog, so if you find problems with it, I’d be happy to receive a line about that.

The third set of packages instead relates to work I’m currently doing for a long-time customer of mine, a company developing embedded systems; I won’t disclose much about the project, but I’m currently helping them build a new firmware, and I’m doing most of my job through the use of Portage and Gentoo’s own ebuilds. For this reason I have already added an ebuild for gdbserver (the small program that allows for remote debugging) that makes it trivial to cross-compile it for a different architecture, and I’m currently working on a gcc-runtime ebuild (which would also be pretty useful, if I get it right, for remote servers like my own, to avoid having to install the full-blown GCC but still have the libraries needed).

And tied to that same work you’ll probably find a few changes for cross-compilation both in and out of Gentoo, and some other similar changes; I have some GCC patches that I have to send upstream, and some changes for the toolchain eclass (right now you cannot really merge a GCJ cross-compiler, or even build one for non-EABI ARM).

So this is what I’m currently adding to the tree myself. I’m also trying to help the newly cleaned-up virtualization team handle libvirt (and its backports) as well as the GUI programs, and I would be helping Pavel get gearmand into shape if I had more time (I know, I know). And all this is, obviously, on the side of the tinderbox work, which is still going on and on (and identi.ca proves it, since the script is denting away the status continuously), and of the maintainer work for things like PAM (which I bumped recently and need to double-check for uclibc).

So now, can you see why I might forget about things from time to time?

Gentoo developer up for hire

Although I have a job interview this afternoon, for a Java programmer position no less, I’m currently just working as an external contractor, thus I don’t have any kind of job security, as you can guess from my ranting about not having been able to pay for Yamato just yet.

But this is not a request for help; it’s rather an offer, in case anybody is looking to hire me for some job related to Free Software, be it related to Gentoo, embedded software, ELF files and stuff like that. I’ve written up something here.

You can guess that if I’m hired for Gentoo-related jobs, or to extend Free Software projects, which are things I’m experienced with, and good at, it’s going to be a contribution to the community too.

Now, this weekend I’m probably going to write a bit more about Gentoo automation and other topics, so please bear with me through this little “advertisement break” while I finish my other writings.