The unsolved problem of the init scripts

One of probably the biggest problems with maintaining software in Gentoo where a daemon is involved is dealing with init scripts. And it’s not really a problem with just Gentoo, as almost every distribution or operating system has its own way to handle init scripts. I guess this is one of the nice ideas behind systemd: having a single standard for daemons to start, stop and reload is definitely a positive goal.

Even if I’m not sure myself whether I want the whole init system to be collapsed into a single one for every single operating system out there, there is at least a chance that upstream developers will provide a standard command line for daemons, so that init scripts no longer have to carry a hundred lines of pre-start setup code. Unfortunately I don’t have much faith that this is going to change any time soon.

Anyway, let’s leave the daemons themselves alone, as that’s a topic for a post of its own and I don’t care to write it now. What remains is the init script itself. Now, while it seems quite a few people didn’t know about this before, OpenRC has supported almost since forever a more declarative approach to init scripts: you set just a few variables, such as command, pidfile and similar, and the script works as long as the daemon follows the most generic approach. Full documentation for this kind of script is in the runscript man page and I won’t bore you with the details of it here.
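To give an idea of what the declarative style looks like, here is a minimal sketch. The daemon name exampled, its path and its --foreground switch are all made up for illustration; only the variable names (command, command_args, command_background, pidfile) and the depend function come from OpenRC itself.

```shell
# Sketch of a declarative init script for a hypothetical daemon "exampled".
# A real script would start with the line "#!/sbin/runscript".

description="Example daemon started the declarative way"

# What to run and with which arguments (path and flag are assumptions).
command="/usr/sbin/exampled"
command_args="--foreground"

# Let start-stop-daemon put the process in the background and write
# the pidfile on its behalf; this requires pidfile to be set.
command_background="yes"
pidfile="/var/run/exampled.pid"

depend() {
	# Start after a system logger if one is configured, without requiring it.
	use logger
}
```

With just this, the default start and stop functions should be enough, as long as the daemon really behaves in the generic way.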

Besides the declaration of what to start, there are a few more issues that are currently handled to different degrees depending on the init script, rather than in a more comprehensive and seamless fashion. Unfortunately, I’m afraid that this is likely going to stay the same way for a long time, as I’m sure that some of my fellow developers won’t care to implement the trickiest parts that can be implemented, but at least I can try to give a few ideas of what I found out while spending time on said init scripts.

So the number one issue is of course the need to create the directories the daemon will use beforehand, if they are to be stored on temporary filesystems. What happens is that one of the first changes that came with the whole systemd movement was to create /run and use that to store pidfiles, locks and other stateless runtime files, mounting it as tmpfs at runtime. This was something I was very interested in to begin with because I was doing something similar before, on the router with a CF card (through an EIDE adapter) as hard disk, to avoid writing to it at runtime. Unfortunately, more than a year later, we still have lots of ebuilds out there that expect /var/run paths to be maintained from the merge to the start of the daemon. At least now there’s enough consensus about it that I can easily open bugs for them instead of just ignoring them.

For daemons that need /var/run it’s relatively easy to deal with the missing path; while a few scripts do use mkdir, chown and chmod to handle the creation of the missing directories, there is a really neat helper to take care of it, checkpath, which is also documented in the aforementioned man page for runscript. But there have been many other places where the two directories are used, which are not initiated by an init script at all. One of these happens to be my dear Munin’s cron script used by the Master: what to do then?
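As a rough illustration of the checkpath approach, an init script could create its runtime directory right before starting. The exampled name and the exampled:exampled owner are assumptions made for the sketch; checkpath and its -d, -m and -o options are the ones described in the runscript man page.

```shell
# Sketch: recreate the runtime directory on every start, since /var/run
# may live on a tmpfs and be empty after a reboot.
# "exampled" (service, user and group) is a hypothetical name.
start_pre() {
	# -d: make sure it is a directory, -m: mode, -o: owner[:group]
	checkpath -d -m 0755 -o exampled:exampled /var/run/exampled
}
```

This replaces the usual mkdir, chown and chmod sequence with a single call, and it also fixes up mode and ownership if the directory already exists with the wrong ones.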

This has actually been among the biggest issues regarding the transition. It was the original reason why screen was changed to save its sockets in the users’ home instead of the previous /var/run/screen path, with relatively bad results all over, including me deciding to just move to tmux. In Munin, I decided to solve the issue by installing a script in /etc/local.d so that on start the /var/run/munin directory would be created … but this is far from a decent standard way to handle things. Luckily, there actually is a way to solve this that has been standardised, to some extent: it’s called tmpfiles.d and was also introduced by systemd. While OpenRC implements the same basics, because of the differences in the two init systems not all of the features are implemented, in particular the automatic cleanup of the files on a running system; on the other hand, that feature is not fundamental for the needs of either Munin or screen.
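For reference, a tmpfiles.d fragment for this case is a single line. The sketch below writes one into a staging directory; the helper name install_tmpfiles, the file name munin.conf and the munin:munin owner are my assumptions for the example, not what the actual ebuild does.

```shell
# Sketch: drop a tmpfiles.d fragment under a staging root given as $1.
install_tmpfiles() {
	destdir="${1}/usr/lib/tmpfiles.d"
	mkdir -p "${destdir}"
	# Fields: type (d = create directory), path, mode, user, group,
	# age ("-" means no age-based cleanup).
	cat > "${destdir}/munin.conf" <<'EOF'
d /var/run/munin 0755 munin munin -
EOF
}
```

Calling install_tmpfiles with the image directory of the package would leave the fragment in place for the tmpfiles.d implementation to pick up at boot.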

There is an issue with the way these files should be installed, though. For most packages, the correct path to install to would be /usr/lib/tmpfiles.d, but the problem with this is that on a multilib system you’d easily end up with both /usr/lib and /usr/lib64 as directories, causing Portage’s symlink protection to kick in. I’d like to have a good solution to this, but honestly, right now I don’t.

So we have the tools at our disposal, what remains to be done then? Well, there’s still one issue: which path should we use? Should we keep /var/run to be compatible, or should we just decide that /run is a good idea and run with it? My gut says the latter at this point, but it means that we have to migrate quite a few things over time. I actually started now on porting my packages to use /run directly, starting from pcsc-lite (since I had to bump it to 1.8.8 yesterday anyway); Munin will come with support for tmpfiles.d in 2.0.11 (unfortunately, it’s unlikely I’ll be able to add support for it upstream in that release, but in Gentoo it’ll be there). Some more of my daemons will be updated as I bump them, as I already spent quite a lot of time on those init scripts to hone them on some more issues that I’ll delineate in a moment.

For some, but not all!, of the daemons it’s actually possible to decide the pidfile location on the command line. For those, the solution to handle the move to the new path is dead easy, as you just make sure to pass something equivalent to -p ${pidfile} in the script, and then change the pidfile variable, and done. Unfortunately that’s not always an option, as the pidfile can be either hardcoded into the compiled program, or read from a configuration file (the latter is the case for Munin). In the first case, no big deal: you change the configuration of the package, or worst case you patch the software, and make it use the new path, update the init script and you’re done… in the latter case though, we have trouble at hand.
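In the easy case the whole move boils down to something like the following sketch, where exampled and its -p switch are hypothetical and stand for whatever option the real daemon uses to set its pidfile:

```shell
# Sketch: the pidfile path is defined once and handed to the daemon,
# so moving from /var/run to /run means editing a single line.
pidfile="/run/exampled.pid"
command="/usr/sbin/exampled"
# "-p" is the assumed pidfile option of the hypothetical daemon.
command_args="-p ${pidfile}"
```

Since both the init system and the daemon read the same variable, they cannot disagree on where the pidfile is.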

If the location of the pidfile is to be found in a configuration file, even if you change the configuration file that gets installed, you can’t count on the user actually updating the configuration file, which means your init script might get out of sync with the configuration file easily. Of course there’s a way to work around this, and that is to actually get the pidfile path from the configuration file itself, which is what I do in the munin-node script. To do so, you need to see what the syntax of the configuration file is. In the case of Munin, the file is just a set of key-value pairs separated by whitespace, which means a simple awk call can give you the data you need. In some other cases, the configuration file syntax is so messed up, that getting the data out of it is impossible without writing a full-blown parser (which is not worth it). In that case you have to rely on the user to actually tell you where the pidfile is stored, and that’s quite unreliable, but okay.
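For a whitespace-separated key-value file like Munin’s, the awk call can be wrapped in a small function. This is a sketch rather than the exact code in the munin-node script; the function name get_config_value is made up, and I’m assuming the key is called pid_file and the file lives in /etc/munin/munin-node.conf.

```shell
# Sketch: print the value of the last occurrence of a key ($1) in a
# whitespace-separated configuration file ($2). Comment lines do not
# match because their first field is not the bare key.
get_config_value() {
	awk -v key="$1" '$1 == key { value = $2 } END { print value }' "$2"
}

# Possible use in an init script (path and key name are assumptions):
#   pidfile=$(get_config_value pid_file /etc/munin/munin-node.conf)
```

Taking the last occurrence rather than the first is a guess at how a parser would treat repeated keys, so it is worth checking against the daemon at hand.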

There is of course one thing now that needs to be said: what happens when the pidfile changes in the configuration between one start and the stop? If you’re reading the pidfile out of a configuration file it is possible that the user, or the ebuild, changed it in between, causing quite big headaches when trying to restart the service. Unfortunately my users experienced this when I changed Munin’s default from /var/run/munin/munin-node.pid to /var/run/munin-node.pid. The change was possible because the node itself runs as root, and then drops privileges when running the plugins, so there is no reason to wait for the subdirectory, and since most nodes will not have the master running, /var/run/munin wouldn’t be useful there at all. As I said, though, it would cause the started node to use one pidfile path and the init script another, failing to stop the service before starting it anew.

Luckily, William corrected it, although it’s still not out: the next OpenRC release will save some of the variables used at start time, allowing this kind of problem to be nipped in the bud without having to add tons of workarounds in the init scripts. It will require some changes in the functions for graceful reloading, but that’s in retrospect a minor detail.

There are a few more niceties that you could do with init scripts in Gentoo to make them more foolproof and more reliable, but I suppose this would cover the main points that we’re hitting nowadays. I suppose for me it’s just going to be time to list and review all the init scripts I maintain, which are quite a few.

The boot process

One of the things that is obvious, between the mailing lists and the comments to my previous post is that there are quite different expectations of what the boot process involves — which is to be expected, since in Gentoo the boot process, like many other things, is totally customized on a per-user basis.

As Greg and William said before, the whole point of supporting (or not) a split /usr approach is not something that is tied that much to udev itself, but more a matter of what is involved in the boot process at all. Reimar pointed out that in the comments to the other post, and I guess that’s the one thing that right now we have to consider a bit more thoroughly. So let’s see if I can analyse it a bit more closely.

Let me put a foreword here. The one problem that is the biggest regarding udev and split /usr is that, while it’s still possible to select whether to search for rules in the rootfs or /usr, it didn’t, and maybe doesn’t, search both paths at the same time. That is probably the only thing that I count as total nonsense, and it’s breakage for breakage’s sake. And it realistically is one of the things that made many Gentoo users upset with Lennart and Kay: the migration of rules is easy for binary distributions – you just rebuild all the packages installing in the old path – but it’s a pain in the neck for Gentoo users; and the cost of searching both paths is unlikely to be noticeable.

So what do we consider as part of the boot? Well, as I said in the other post, if you expect to be able to log in without /usr, you’re probably out of luck, if you use PAM — while the modules are still available on the rootfs, many of them require libraries in /usr — ConsoleKit, Kerberos, PKCS#11, … This is also one of the reasons why I’m skeptical about just teaching Portage to move dependencies to the rootfs: it’ll probably move a good deal of libraries to the rootfs, especially for a desktop, which will in turn make the “lightweight rootfs” option moot.

Another reason why I don’t think that the automatic move is going to solve the problem, is that while it’s possible to teach Portage to move the libraries, it’s impossible to teach it to move plugins, or the datafiles that those libraries use. More about that in the next paragraphs.

So let’s drop the login issue: we don’t expect to be able to log in to the system without /usr so it’s not an option. The next thing that is going to be a problem is coldplugging (I’ll consider hotplugging during boot as hotplugging but it might actually be more complex). The idea of coldplugging is that you want to start a given piece of software if, at boot, you find a given device connected. As an example you might want to start pcscd if a smartcard reader (be it a CCID one or another driver) is found, or ekeyd if an EntropyKey is connected, without the user having added them to the runlevels manually.

What’s the problem with this then? Well, the coldplugged services might require /usr for both the service and the libraries, which means you can’t run them without /usr. The udev-postmount service was, if I recall correctly, created just to deal with that, with udev actually keeping score of which rules failed to execute, and re-executing them after /usr was mounted; but it relied on udev’s own handling of re-execution of rules, and I forget whether that still exists or not. If not, then that’s a big deal, but not something I want to care about to be honest. An easy way out of this is to say that coldplugging is not supported if your coldplugged services need /usr and you have it split, but it’s still quite hacky.

This blog post was supposed to be a bit longer, and provide among other things a visual representation of the boot-time service dependencies. It turns out now that I left it open for a whole week without being able to complete it as I intended. In particular, the graphical representation is messy because there are so many involved services, that on my laptop it’s seriously unreadable. I’ve been using the representation as a debug method to improve on my service files though, and I’ll write about that. It’s going to enter OpenRC’s git soon.

This said, this “half” post is good enough to read as it is. I’ll write more about it later on.

May I have a network connection, please?

If you’re running ~arch, you probably noticed by now that the latest OpenRC release no longer allows services to “need net” in their init scripts. This change has caused quite a bit of grief because some services no longer started after a reboot, or no longer start after a restart, including Apache. Edit: this only happens if you have corner case configurations such as an LXC guest. As William points out, the real change is simply that net.lo no longer provides the net virtual, but the other network interfaces do.

While it’s impossible to say that this is not annoying as hell, it could be much worse. Among other reasons, because it’s really trivial to work around it until the init scripts themselves are properly fixed. How? You just need to append to /etc/conf.d/$SERVICENAME the line rc_need="!net"; if the configuration file does not exist, simply create it.
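If you have more than a couple of services to fix, the workaround can be scripted. The following is just a sketch: the function name add_conf_override is made up, and the optional third argument exists only so that it can be tried out on a scratch directory instead of the real /etc/conf.d.

```shell
# Sketch: append an override line to a service's conf.d file, creating
# the file if needed and skipping the append if the line is already there.
#   $1 = service name, $2 = line to add, $3 = conf.d directory (optional)
add_conf_override() {
	file="${3:-/etc/conf.d}/$1"
	# -x: match whole lines, -F: fixed string, -q: quiet
	grep -qxF -- "$2" "$file" 2>/dev/null || printf '%s\n' "$2" >> "$file"
}

# Example (not executed here): add_conf_override apache2 'rc_need="!net"'
```

The single quotes around the line matter, so that the shell passes the double quotes and the exclamation mark through untouched.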

Interestingly enough, knowing this workaround also allows you to do something even more useful, that is making sure that services requiring a given interface being up depend on that interface. Okay it’s a bit complex, let me backtrack a little.

Most of the server daemons that you have out there don’t really care how many interfaces you have, which ones they are, or what they are called. They bind either to the “catch-all” address (0.0.0.0 or :: depending on the version of the IP protocol; the latter can also be used as a catch-both for IPv4 and IPv6, but that’s a different story altogether) or to a particular IP address; they can also bind to a particular interface, but that’s quite rare and usually only happens when the actual physical address matters, as with RADVD or DHCP.

Now to bind to a particular IP address, you really need to have the address assigned to the local computer or the binding will fail. So in these cases you have to delay the service start until the network interface with that address is started. Unfortunately, it’s extremely hard to do so automatically: you’d have to parse the configuration file of the service (which is sometimes easy and most of the time not), and then you’d have to figure out which interface will come up with that address … which is not really possible for networks that get their addresses automatically.

So how do you solve this conundrum? There are two ways and both involve manual configuration, but so do defined-address listening sockets for daemons.

The first option is to keep the daemon listening on the catch-all addresses, then use iptables to set up filtering per-interface or per-address. This is quite easy to deal with, and quite safe as well. It also has the nice side effect that you only have one place to handle all the IP address specifications. If you ever had to restructure a network because the sysadmin before you used the wrong subnet mask, you know how big a difference that makes. I’ve found before that some people think that iptables also needs the interfaces to be up to work. Fortunately this is not the case: it’ll accept any interface names as long as they could possibly be valid, and then will only match them once the interface actually comes up (that’s why it’s usually a better idea to whitelist rather than blacklist there).
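A whitelist along those lines could look like the sketch below. The interface name lan0 and port 3306 (MySQL) are only examples, and the function name apply_rules is made up; the iptables options used are the standard ones.

```shell
# Sketch: the daemon listens on the catch-all address, and iptables
# decides which interfaces may actually reach it. The rules can be
# loaded before lan0 exists; they only match once it comes up.
apply_rules() {
	# Always allow local connections.
	iptables -A INPUT -i lo -p tcp --dport 3306 -j ACCEPT
	# Whitelist the one interface that should reach the service ...
	iptables -A INPUT -i lan0 -p tcp --dport 3306 -j ACCEPT
	# ... and drop the same traffic coming from anywhere else.
	iptables -A INPUT -p tcp --dport 3306 -j DROP
}
```

Because the final rule drops by default, a missing or renamed interface fails closed instead of leaving the service exposed, which is the point of whitelisting here.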

The other option requires changing the configuration on the OpenRC side. As I showed above you can easily manipulate the dependencies of the init scripts without having to change those scripts at all. So if you’re running a DHCP server on the lan served by the interface named lan0 (named this way because a certain udev no longer allows you to swap the interface names with the permanent rules that were first introduced by it), and you want to make sure that that network interface is up before dhcpd, you can simply add rc_need="net.lan0" to your /etc/conf.d/dhcpd. This way you can actually make sure that the services’ dependencies match what you expect; I use this to make sure that if I restart things like mysql, php-fpm is also restarted.

So I gave you two ways to work around the current not-really-working-well status, but why am I not complaining about the current situation? Well, the reason so many init scripts have that “need net” line is simply cargo-culting. And the big problem is that there is no really good definition of what “net” is supposed to be. I’ve seen it used (and used it myself!) for at least the following notions:

  • there are enough modules loaded that you can open sockets; this is not really a situation that I’d like to find myself to have to work around; while it’s possible to build both ipv4 and ipv6 as modules, I doubt that most things would work at all that way;
  • there is at least one network interface present on the system; this usually is better achieved by making sure that net.lo is started instead; especially since in most cases for situations like this what you’re looking for is really whether 127.0.0.1 is usable;
  • there is an external interface connected; okay sure, so what are you doing with that interface? because I can assure you that you’ll find eth0 up … but no cable is connected, what about it now?
  • there is Internet connectivity available; this would make sense if it wasn’t for the not-insignificant detail that you can’t really know that from the init system; this would be like having a “need userpresence” that makes sure that the init script is started only after the webcam is turned on and the user face is identified.

While some of these particular notions have use cases, the fact that there is no clear identification of what that “need net” is supposed to be makes it extremely unreliable, and at this point, especially considering all the various options (oldnet, newnet, NetworkManager, connman, flimflam, LXC, vserver, …) it’s definitely a better idea to get rid of it and not consider it anymore. Unfortunately, this is leading us into a relative world of pain, but sometimes you have to get through it.

Busy init… OpenRC, Busybox and inittab

Even though it’s Sunday, I’m going back to writing about my work on using Busybox as the OpenRC shell, and more.

One of the things we’ve decided for the firmware of the device is to reduce the number of projects involved as much as possible. The reason falls into the realm of license auditing, which is also the reason why I made sure that in Gentoo we now have a split hwids package which is available under a more “easygoing” license. The issue is not so much that we don’t want to use GPL’d software, but that the more GPL’d software you have in your firmware, the more careful you have to be to do everything by the letter, as it’s easy to get something wrong even with the best of intentions.

Maybe even if I told myself that I disagree with Donnie’s post I’m actually agreeing with my actions.

To be very clear, once the device is available to the customers, we’re going to publish the whole sources of the open-source components, including those that we’re not required to because they are BSD, and even to people who wouldn’t receive the firmware itself, simply because there is no sense in keeping them private. Moreover, there is no change we made that is not trickling upstream already, either in Gentoo or to the original projects, which are mostly Aravis (which Luca is working on) and the Linux kernel (as I had to write some Super I/O drivers, which I intend to improve over the next few months).

Anyway, since we wanted to reduce the number of projects that we depend on, no matter what the license, we’ve also decided to replace sysvinit with BusyBox itself. This is not as much of a gain as one would expect, but it’s still significant. The main trick is that you can’t use the default /etc/inittab file Gentoo comes with, but you have to replace it with one compatible with BusyBox’s own init binary.

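# run the sysinit runlevel first
::sysinit:/sbin/rc sysinit
# then the boot runlevel, waiting for it to complete
::wait:/sbin/rc boot
# no runlevel given: rc picks softlevel= if set, otherwise default
::wait:/sbin/rc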

Okay, possibly I could have dropped the boot runlevel altogether and just used sysinit and then default, but there is a trick to keep in mind: if you try to mimic the “classic” inittab file, you’ll be using rc default as the last entry; unfortunately if you do it that way, the softlevel= parameter passed to the kernel is not going to work at all. I’m not sure why it works on the standard setup, probably due to the implementation of the System V numbered runlevels, but for sure it doesn’t work with BusyBox, as the code in rc.c takes the default as the name of the runlevel to start. Not giving a particular runlevel makes it decide which one to start depending on the parameter given, and it defaults to, well, default, if none is set explicitly.

Besides this issue, and the small issue with the sysctl init script not working correctly with Busybox’s own sysctl command, it seems like, for the very small set of init scripts we’re dealing with, BusyBox is a godsend.

But this is not all, of course! I’m also creating a few init scripts for Busybox applets themselves, for instance I have a bb.acpid script that is used to handle the acpid replacement within BusyBox, and the next up is going to be a (multiplexed) bb.udhcpd script for its own DHCP server. These will soon arrive in a Gentoo near you, and should be very nice if you’re not looking for a full-fledged DHCP server.

The main issue with this, as far as I can tell, is that there is little to no documentation on the format of the configuration file: as far as acpid is concerned, I had to read through the source and guess what it needed, and that’s not so easy to do after a while as you have both common and non-common parser functions… At any rate, you’ll see more documentation about this appear in this blog and (if I can find more time to do this) probably on the Gentoo Wiki, as I think it would be a good idea to document it somewhere. You can make it more likely for me to write more documentation about the stuff I’m not doing for work by sending a thank you, or you can otherwise wait for me to have more time.

Using BusyBox with OpenRC

As I said in my previous post I’ve decided to use BusyBox to strip down a Gentoo system to a bare minimum for the device I’m working on right now. This actually allows for a lot of things to be merged into a single package, but at the same time it has another very interesting side effect.

One of the reasons why systemd has been developed is that the traditional init systems for Unix use shell scripts for starting and stopping the services, which is slow both due to the use of bash (which is known to be slow and bloated for the task) and the use of the fork/exec model for new commands. While we’ve had some work done to get the init scripts reduced and working with alternative shells, namely dash, what I’m using now is a slightly different variation: everything is done through BusyBox.

This has more than just the advantage of its shell scripting being faster: when using a “fat” build of busybox, which includes most of the utilities used in Linux, including coreutils, which, and so on and so forth, the execution of commands such as cut, sed, and the like no longer happens through fork/exec, but rather through internal calls, as BusyBox is a multi-call binary. And even if the call is so complex it has to fork for it (which doesn’t happen so often as far as I can tell), the binary is already well in memory, relocated and all.

Up to now I only found one issue in OpenRC when booting this way, related to the sysctl init script, so I submitted a patch which is already in OpenRC now. I’m not yet done with removing the separate packages, so there might be more issues, and of course I’m only using a very small subset of init scripts, so I wouldn’t be able to tell whether it would work on a full desktop system.

But this would for sure be a worthy experiment for somebody at some point.

The infamous /run migration

When I saw that a new release of OpenRC was made that finally took the step of migrating our setup to the “new standard” /run directory I was honestly thrilled: for once, the setup I used on my router is no longer a special case but it looks a lot more like the rule: /var/run directories need to be recreated, as the default /run is in a tmpfs filesystem.

The fact that this version of OpenRC has been removed without a replacement did bother me a bit, especially with regard to LXC, because I had to make changes to the cgroups handling: since this version of OpenRC also uses cgroups when using the new init script style (it doesn’t do it if you call start-stop-daemon yourself though), I had to stop mounting /cgroups myself, otherwise it wouldn’t be able to start any guest container, as OpenRC mounts its own copy of said filesystem, trying (and, as far as I can tell, failing) to behave more like systemd.

At any rate I did the migration on most of my systems for the sake of testing it before doing so in production. As far as I can tell none of my init scripts misbehaved, which is positive. On the other hand, there seems to be some trouble with other scripts, in particular munin cron scripts … d’oh!

For the sake of using the lowest possible privilege to do any given job, the munin cron script, the one that generates the actual HTML content that is served by Apache, runs as a mostly unprivileged user; on the other hand, the node daemon usually runs as root and drops to nobody’s privileges when executing most of the plugins. Some require superuser privileges, either on standard systems (for instance smartd-based plugins) or at least on hardened ones (the original if_ plugin, which accessed /proc; the current version, both upstream and in Gentoo, was modified by me to use /sys, which is available to non-privileged users as well, even on hardened).

At any rate this seems to show that something still isn’t properly set up in the way OpenRC prepares /run: the permissions on /run/lock should probably be root:uucp 0775 and that is not the case right now, not on all systems at least … and Munin should use that directory to store the lock file.

Oh well, time to find a better workaround locally, I guess.

Updating init scripts

Hah. I know what you’re thinking: Flameeyes has disappeared! Yeah, I probably wish I was spending vacations in London or somewhere along those lines, but that’s not the case. Alas I’m doing double shifts lately, which is why Gentoo is taking mostly second place. But I shouldn’t complain, in this economy, having too much work is pretty rare.

Besides still operating the tinderbox I’ve decided to spend some time to update the init scripts that come with my packages. The new OpenRC init system has a much more extensive runscript shell that is used to execute the init scripts; this means that new init scripts can be written in a declarative way that makes them shorter and more foolproof.

Indeed, for some init scripts – such as the new netatalk ones – the script boils down to setting the dependencies and declaring which command to start and with which options. This is very good and nice.

I have to thank Samuli for the idea, as he went to update acpid’s init script with the new style, and so pushed me to look at other init scripts. In one case, it was because the package (haveged) was not working on one of my vservers; it seems like it was simply a transient problem, as the latest version worked fine… and now has a new init script as well!

In some cases these updates also come with slight changes in behaviour, mostly in the case of ekeyd that is no longer setting sysctls for you (that’s why you’ve got a sysctl.conf after all!), and in the case of quagga I ended up finally collapsing the two init scripts into a single one (before it was one for zebra and one for the rest, now it’s a single one symlinked for each service).

What this reminds me of, though, is the problem with init scripts that I found with LXC before: a number of init scripts can’t be used on the host if you plan on using the same init script within a container. vtun, ulogd, autoconfig, wmacpimon, nagios, amphetadesk, vdr, portsentry and gnunetd use killall; drqsd, drqmd, ttyd, upsd and irda use non-properly-bound pkill; ncsa, npcd, btpd, nrpe, gift, amuled and amuleweb use non-properly-bound pgrep (and in the case of ncsa, npcd and nrpe, all of which seem to involve nagios, what it’s trying to do is a simple pkill).
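The trouble with killall and unbound pkill or pgrep is that they match by name, and on the host that also catches the processes running inside the guests. A stop function tied to the pidfile avoids that; the sketch below assumes a hypothetical exampled service, while ebegin, eend and start-stop-daemon with --stop and --pidfile are the usual OpenRC pieces.

```shell
# Sketch: stop only the process recorded in this instance's pidfile,
# instead of every process that happens to share the daemon's name.
pidfile="/var/run/exampled.pid"

stop() {
	ebegin "Stopping exampled"
	start-stop-daemon --stop --pidfile "${pidfile}"
	eend $?
}
```

With the declarative style this function is not even needed, since the default stop already works from the pidfile variable.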

Luckily it seems like there aren’t any more scripts installed in /etc/init.d that don’t use runscript, although I’m sure there are a few more that need work, if they are provided by upstream, as the Netatalk case shows us.

Oh well, more work to do then.

Some UPS notes (apcupsd et al)

If you didn’t notice, one of the packages I’ve been maintaining in Gentoo is apcupsd, one of the daemons that can be used to control APC-produced UPS units. But for quite a while, my maintenance of the package was mostly limited to keeping it in a working state (with a wide range of different results, to be honest), since compared to the (messy) ebuild I originally inherited, the current one is quite linear and, in my opinion, elegant.

But in the past two weeks or so, a few things happened that required me to look into the package more closely: a version bump, a customer having two UPSes connected to the same system, and the only remaining non-APC UPS in my home office declaring itself dead.

The version bump was the chance for me to finally fix the strict aliasing issue that is still present; this time, instead of simply disabling strict aliasing (the quick, hacky way) I decided to look at the code, to make it actually strict aliasing compliant. This might not sound like much, but this kind of warning is particularly nasty as you never know when it will cause an issue. Besides, it caused Portage to abort in stricter mode, which is what I use for packages I maintain myself.

Also, while my customer’s needs didn’t really influence my work on apcupsd itself, they caused me to look even more into munin’s apc_nis plugin, as beforehand it was not configurable at all: it only ever used localhost:3551 to connect to the APC NIS interface, which meant that if you wanted to change the port, or make it only listen on an external interface, you were out of luck. The patch to make this configurable is now part of Munin trunk, but I haven’t had time to ask Jeremy to add it to Gentoo as well (the few patches of mine to Munin are all merged upstream now, and Munin 2 will have those, and finally, native IPv6 transport, which means I probably won’t need to use ssh connections to fetch data over NAT, but just properly-configured firewalls).

There is another issue that comes up when having multiple UPSes connected to the same box though: permanence of device names. While the daemon auto-discovers a single connected APC device, when you have multiple devices you need to explicitly tell it to access a given one. To do so, you could use the hiddev device paths, but the kernel does not make those persistent if you connect/disconnect the units. To solve this issue, the new ebuild for apcupsd that I committed today uses udev rules to provide /etc/apcups/usb-${SERIALNO} symlinks that you can use to provide stable references to your apcupsd instances. I sent the rules upstream, hoping that they’ll be integrated in the next release.

A note here: while I’m a fan of autoconfiguration, I’m having trouble considering the idea of having apcupsd auto-started when an APC UPS is connected. The reason is not obvious though: while it would work fine if it was the only UPS and thus the only apcupsd instance to be present, if you had a second instance set up for a different UPS there would be no way to match the two together. This is at a minimum bothersome.

Speaking of init scripts, the powerfail init script currently only works in single-UPS configurations (whereas the main init script works fine in multiple-UPS configurations), and even there it is a bit … broken. The powerfail flag can be written in a number of different places (the default and the Gentoo variants even point to different paths!), but this script does not take that into consideration at all. More to the point, the default, which uses /var/run, might not be available at the shutdown init level, since that would probably have been unmounted by that time. What I should do, I guess, is make it possible for the init script to fetch the configured value from the apcupsd configuration file, and move the default to use /run.
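Fetching the configured value could look roughly like the sketch below. It assumes that the flag directory is set through the PWRFAILDIR directive in apcupsd.conf and that the flag file itself is called powerfail; the function name and the fallback parameter are hypothetical, and none of this is what the current init script does.

```shell
#!/bin/sh
# Sketch only: work out where the powerfail flag lives for one instance.
# ASSUMPTIONS: the directory comes from a "PWRFAILDIR <path>" line in the
# apcupsd configuration file, and the flag file is named "powerfail".
# powerfail_flag_path is a hypothetical helper, not existing code.

powerfail_flag_path() {
    conf=$1          # path to the apcupsd configuration file
    fallback_dir=$2  # directory to use when nothing is configured
    dir=
    if [ -r "$conf" ]; then
        # Commented-out lines never have PWRFAILDIR as their first field,
        # so they are skipped; the last active setting wins.
        dir=$(awk '$1 == "PWRFAILDIR" { d = $2 } END { if (d != "") print d }' "$conf")
    fi
    printf '%s/powerfail\n' "${dir:-$fallback_dir}"
}
```

With one configuration file per UPS, the init script could call this once per instance and check each resulting path, which would also lift the single-UPS limitation.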

The next problem on my list is that apcaccess should not be among the superuser binaries, since it can be run as a regular user just fine. I’ll have to get that cleared with upstream first, though, as moving it in Gentoo only might break some scripts.

Finally, there is the problem that the sources of apcupsd are written with disregard for what many consider “library-only problems” (namely PIC) and have a very nasty copy-on-write scorecard. Unfortunately, some of the issues are so rooted in the design that I don’t feel up to fixing the sources myself, but if somebody wanted a project to follow and optimise, this might be a good choice.

Sigh. I hope to find more time to fix the remaining issues with the scripts soon. For now if you have comments/notes that I might have missed, your feedback is welcome.

PSA: Packages failing to install with new, OpenRC-based stages: missing users and groups

This month Gentoo finally marked baselayout2 and OpenRC stable, which is an outstanding accomplishment, even though it happens quite late in the game, given that OpenRC has existed for a few years by now. Now that these packages are stable, they are also used to build the new stages that are provided for installing new copies of Gentoo.

This has had an unforeseen (but not totally unexpected) problem with users and groups handling, since the new version of baselayout dropped a few users and groups that were previously defined by default, in light of its more BSD-compatible nature. Unfortunately, some of these users and groups were referenced by ebuilds, like in the case of Asterisk, that set its user as part of both the asterisk and dialout groups — the latter is no longer part of the default set of users created by baselayout, so installing Asterisk before last on a new system created from the OpenRC-based stage would have failed.

Okay, so this is a screw-up, and one that we should fix as soon as possible, but why did it happen in the first place? There are two things to consider here: the root cause of the problem, and why it wasn’t caught before this happened. I’ll start with the second one.

When testing OpenRC, we all came to the conclusion that it worked fine. Not only my computers, but even my vservers, my customer’s servers, and most of the developers’ production boxes have been running OpenRC for years. I even stopped caring about providing non-OpenRC-compatible init scripts at some point. Why did none of us hit this problem before? Being a rolling distribution, our main testing process does not involve making a new install altogether: you upgrade to OpenRC and judge whether it works or not.

Turns out this is not such a great idea where critical system packages are concerned (we have seen issues with Python before as well): when upgrading from Baselayout 1 to Baselayout 2, not all files are replaced; users and groups added by Baselayout 1 are kept around, which makes it impossible to identify this class of issues. We should probably document a more stringent stable-marking process for system components, and work with releng to find a way to test a stage so that it actually boots up with a given kernel and configuration (KVM should help a lot there).

As for the root cause of the problem, we have been fighting with this issue since I became a dev, and that’s why there is GLEP27 which is supposed to take care of managing users and groups and assigning them global IDs. Unfortunately this is one of those GLEPs that were defined, but never implemented.

To be honest, there has been work on the issue, which was also funded by the Google Summer of Code program; the end results, however, didn’t make it to Gentoo but rather to another project (which is why I always have doubts about Gentoo’s waste of GSoC funding).

So until we have a properly-implemented GLEP27, which is nothing glamorous, nothing that newcomers seem to feel like tackling, we’re just dancing around a huge number of problems with the handling of users and groups, and those are not going to get any easier with time.

What is my plan here? I’ll probably find some time tonight or so to set up a tinderbox that uses the OpenRC-based stage, and see what might not work out of the box; unfortunately even that is not going to be a complete solution: if two ebuilds use the same group, and they are independent of each other, it is quite possible that the group is added by one and not the other, so whether they install correctly depends on the order of installation. That is simply a bad thing to have, and a difficult one to test for.
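As a rough illustration of the kind of check that could be run on a freshly unpacked stage, the sketch below lists which of the groups a package expects are absent from a group database. The function name and the idea of pointing it at a stage's etc/group file are assumptions for the sake of the example; it does not replace proper handling in the ebuilds.

```shell
#!/bin/sh
# Sketch only: print every group from the argument list that is not
# defined in the given group file (same format as /etc/group).
# ASSUMPTION: missing_groups is a hypothetical helper for manual checks.

missing_groups() {
    group_file=$1
    shift
    for group in "$@"; do
        # Compare the first colon-separated field exactly, so that
        # "dial" does not match "dialout".
        if ! awk -F: -v g="$group" '$1 == g { found = 1 } END { exit !found }' "$group_file"; then
            printf '%s\n' "$group"
        fi
    done
}
```

Run against the group file of a new stage with the groups an ebuild references, any output means the ebuild relies on something that neither the stage nor the ebuild itself creates, regardless of which other packages happen to be installed first.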

In the meantime, please do report any package that fails to build with the new stages. Thank you!

CGROUPS woes

The cgroup functionality that the Linux kernel introduced a few versions ago, while originally almost invisible, is proving to be of interest to quite a wide range of software, which in turn has caused no few headaches for me and other developers.

I originally looked into cgroups because of LXC and then I noticed it being used by Chromium, then libvirt (with its own bugs related to USB devices support). Right now the cgroup functionality is also used by the userland approach to task scheduling to replace the famous 200LOC kernel patch, and by the newest versions of the OpenVZ hypervisor.

While cgroup is a solid kernel technique, its interface doesn’t seem nearly as solid. The basic userland interface is accessible through a special pseudo-filesystem, just like the ones used for /sys and /proc. Unfortunately, the way to use this interface hasn’t really been documented decently, and that results in tons of problems; in my previous post regarding LXC I actually confused the way Ubuntu and Fedora mount cgroups: it is Fedora that uses /sys/fs/cgroup as the base path for accessing cgroups, but as Lennart commented on the post itself, there’s a twist.

In practice there are two distinct interfaces to cgroups. One is a single, all-mixed-in interface, accessed through the cgroup pseudo-filesystem when mounted without options; this is the one you can find mounted in /cgroup (also by the lxc init script in Gentoo) or /dev/cgroups. The other interface gives access to (and thus lets you limit) one particular type of cgroup at a time (such as memory, or cpuset), and has each hierarchy mounted at a different path. That second interface is the one that Lennart designed to be used by Fedora and that has been made “official” by the kernel developers in commit 676db4af043014e852f67ba0349dae0071bd11f3 (even though it is not really documented anywhere but in that commit).
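The difference between the two layouts is easier to see as mount commands. The sketch below only prints the commands rather than running them, since mounting needs root; it reads the controller list in the /proc/cgroups format (name, hierarchy, number of cgroups, enabled flag). The function names and the cgroup_root tmpfs label are assumptions for illustration, and this is not the init script discussed here.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the mount commands for the two
# cgroup layouts. Function names are hypothetical.

# Single all-mixed-in hierarchy: one mount, no controller options.
print_single_cgroup_mount() {
    printf 'mount -t cgroup cgroup %s\n' "$1"
}

# Split layout: a tmpfs at the base path, then one hierarchy per enabled
# controller, each mounted on its own subdirectory.
print_split_cgroup_mounts() {
    controllers_file=$1   # e.g. /proc/cgroups
    base=$2               # e.g. /sys/fs/cgroup
    printf 'mount -t tmpfs cgroup_root %s\n' "$base"
    awk -v base="$base" '
        /^#/ { next }                 # skip the header line
        $4 == 1 {                     # fourth column is the enabled flag
            printf "mkdir -p %s/%s\n", base, $1
            printf "mount -t cgroup -o %s %s %s/%s\n", $1, $1, base, $1
        }
    ' "$controllers_file"
}
```

Calling print_split_cgroup_mounts /proc/cgroups /sys/fs/cgroup on a running system shows what the per-controller layout would involve there, while print_single_cgroup_mount /cgroup shows the one-line equivalent for the mixed interface.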

Now, as I said, the lxc init script doesn’t follow that approach but rather takes the opposite direction; this was not intended as a way to ditch the road taken by the kernel developers or by Fedora, but happened out of necessity: the commit above was added last summer, the Tinderbox had been running LXC for over a year at that point, and of course all the LXC work I did for Gentoo was originally based on the tinderbox itself. But since I did have a talk with Lennart and the new method is the future, I added to my TODO list, just last month, to actually look into making cgroups a supported piece of configuration in Gentoo.

And it came crashing down.

Between yesterday and this morning I actually found the time I needed to write an init script to mount the proper cgroup hierarchy, Fedora style. Interestingly enough, if you unmount the hierarchy after mucking with it, you won’t be able to mount it again, so there won’t be any “stop” for the script anyway. But that’s the least of my problems now. Once you do mount cgroups the way you’re supposed to, following the Fedora approach, LXC stops working.

I haven’t started looking into what the problem could be there, but it seems obvious that LXC doesn’t take it very nicely when its single-access interface for cgroups is instead split into a number of different directories, each with its own little interface to use. And I can’t blame it much.

Unfortunately this is not the only obstacle LXC has to face now; besides the problem with actually shutting down a container (which only works partially and mostly out of sheer luck with my init system), the next version of OpenRC is going to drop support for auto-detecting LXC, both because identifying the cpuset in /proc is not going to work soon (it’s optional in the kernel and considered deprecated) and because it wrongly identifies the newest OpenVZ guests as LXC (since they also started using the same cgroups basics as LXC). These two problems mean that soon you’ll have to use some sort of lxc-gentoo script to set up an LXC guest, which will both configure a switch to shut the whole guest down, and manually configure OpenRC to accept it as an LXC guest.

Where does this leave us? Well, first of all, I’ll have to test whether the current Git master of LXC can cope with this kind of interface. If it can’t, I’ll have to talk with upstream to get the split hierarchies supported, so that LXC can be used with a Gentoo host, as well as a Fedora one, with the new cgroups interface (which can then be made available to users for use with Chromium and other software that might make good use of it). Then it will be time to focus on the Gentoo guests, so I’ll have to evaluate the contributed lxc-gentoo scripts that I know are on the Gentoo Wiki, for a start.

But let me write this again: don’t expect LXC to work nicely for production use, now or anytime soon!