What we need from daemons

In my post of yesterday I noted some things about the init scripts, small niceties that init scripts should do in Gentoo for them to work properly and to solve the issue of migrating pid files to /run. Today I’d like to add a few more notes of what I wish all daemons out there implemented at the very least.

First of all, while some people prefer for the daemon to not fork and background by itself, I honestly prefer it to, as it makes so many things so much easier. But if you fork, wait until the forked process has completed initialization before exiting! The reason why I’m saying this is that, unfortunately, it’s common for a daemon to start up, fork, then load its configuration file and find out there’s a mistake … leading to an init script that thinks the daemon started properly, while no process is left running. In init scripts, --wait allows you to tell the start-stop-daemon tool to wait for a moment to see if the daemon could start at all, but it’s not so nice, because you have to find the correct wait time empirically, and in almost every case you’re going to wait longer than needed.
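A minimal sketch of what I mean, in plain shell rather than in any real daemon’s code: the parent only returns once the backgrounded child has reported whether its initialization worked. The function names (init_daemon, start_daemon) and the FIFO handshake are purely my own illustration, and the “initialization” here is nothing more than checking that the configuration file is readable.

```shell
# Hypothetical stand-in for real initialization: the configuration is
# considered valid if the file exists and is readable.
init_daemon() {
    [ -r "$1" ]
}

# start_daemon CONFIG: background the daemon, but return only after the
# child has reported the outcome of its initialization over a FIFO.
start_daemon() {
    dir=$(mktemp -d) || return 1
    fifo=$dir/ready
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    (
        if init_daemon "$1"; then
            echo ready > "$fifo"
            # the daemon's main loop would run here
        else
            echo failed > "$fifo"
            exit 1
        fi
    ) > /dev/null 2>&1 &

    # Blocks until the child has written its status line.
    read -r state < "$fifo"
    rm -rf "$dir"
    [ "$state" = ready ]
}
```

With this shape the caller gets a non-zero status when the configuration is broken, and nobody has to guess a wait time.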

If you do background by yourself, please make sure that you create a pidfile to tell the init system which process ID to signal to stop the daemon. And if you do have such a pidfile, please do not make it configurable in the configuration file, but set a compiled-in default and possibly allow an override at runtime. The runtime override is especially welcome if your software is supposed to have multiple instances configured on the same box, as a single pidfile would then conflict. Not having it configured in a file means that nobody needs to hack up a parser for the configuration file in the init script to know what the user wants; the script can rely on either the default or the override.
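As a rough sketch of that policy, again in shell as a stand-in for the daemon’s own option handling: one fixed default, and a -p option to override it for additional instances. The default path, the -p flag and the resolve_pidfile name are all hypothetical.

```shell
# Stand-in for the compiled-in default (hypothetical path).
DEFAULT_PIDFILE=/run/exampled.pid

# resolve_pidfile [-p PIDFILE]: print the pidfile path the daemon would use.
resolve_pidfile() {
    pidfile=$DEFAULT_PIDFILE
    OPTIND=1
    while getopts p: opt; do
        case $opt in
            p) pidfile=$OPTARG ;;   # runtime override, e.g. for a second instance
            *) return 2 ;;
        esac
    done
    printf '%s\n' "$pidfile"
}
```

The init script then only needs to know the default, or the value it passed itself with -p.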

Also, if you do intend to support multiple instances of the same daemon, make sure that you allow different configuration files to be passed in on the command line. This simplifies the whole handling of multiple instances a lot, and should be mandatory in that situation. Make sure you don’t re-use paths between instances in that case either.

If you have messages you should also make sure that they are sent to syslog — please do not force, or even default, everything to log files! We have tons of syslog implementations, and at least the user does not have to guess which one of many files is going to be used for the messages from your last service start — at this point you probably guessed that there are a few things I hope to rectify in Munin 2.1.

I’m pretty sure that there are other concerns that could come up, but for now I guess this would be enough for me to have a much simpler life as an init script maintainer.

The unsolved problem of the init scripts

One of probably the biggest problems with maintaining software in Gentoo where a daemon is involved is dealing with init scripts. And it’s not really a problem with just Gentoo, as almost every distribution or operating system has its own way to handle init scripts. I guess this is one of the nice ideas behind systemd: having a single standard for daemons to start, stop and reload is definitely a positive goal.

Even if I’m not sure myself whether I want the whole init system to be collapsed into a single one for every single operating system out there, there at least is a chance that upstream developers will provide a standard command line for daemons, so that init scripts no longer have to contain a hundred lines of pre-start setup code. Unfortunately I don’t have much faith that this is going to change any time soon.

Anyway, leaving the daemons themselves alone, as that’s a topic for a post of its own and I don’t care about writing it now, what remains is the init script itself. Now, while it seems quite a few people didn’t know about this before, OpenRC has supported almost since forever a more declarative approach to init scripts: you set just a few variables, such as command, pidfile and similar, and the script works as long as the daemon follows the most generic approach. Full documentation for this kind of script is present in the runscript man page and I won’t bore you with the details of it here.
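Just to give an idea of how short such a script can be, here is a sketch for an imaginary daemon. The exampled name, its paths and its --pidfile option are made up for the example; command, command_args, pidfile and depend are the runscript names described in the man page. A real script would also start with the #!/sbin/runscript interpreter line, which I left out so the fragment can be read as plain shell.

```shell
# Declarative init script for a hypothetical daemon called "exampled".
# No start() or stop() functions: runscript derives them from the variables.

command="/usr/sbin/exampled"
pidfile="/run/exampled.pid"
command_args="--pidfile ${pidfile}"

depend() {
    # Start after the system logger, if one is configured.
    use logger
}
```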

Besides the declaration of what to start, there are a few more issues that are currently handled to different degrees depending on the init script, rather than in a more comprehensive and seamless fashion. Unfortunately, I’m afraid that this is likely going to stay the same way for a long time, as I’m sure that some of my fellow developers won’t care to implement the trickiest parts that can be implemented, but at least I can try to give a few ideas of what I found out while spending time on said init scripts.

So the number one issue is of course the need to create the directories the daemon will use beforehand, if they are to be stored on temporary filesystems. What happens is that one of the first changes that came with the whole systemd movement was to create /run and use that to store pidfiles, locks and other stateless runtime files, mounting it as tmpfs at runtime. This was something I was very interested in to begin with because I was doing something similar before, on the router with a CF card (through an EIDE adapter) as hard disk, to avoid writing to it at runtime. Unfortunately, more than a year later, we still have lots of ebuilds out there that expect /var/run paths to be preserved from the merge to the start of the daemon. At least now there’s enough consensus about it that I can easily open bugs for them instead of just ignoring them.

For daemons that need /var/run it’s relatively easy to deal with the missing path; while a few scripts do use mkdir, chown and chmod to handle the creation of the missing directories, there is a really neat helper to take care of it, checkpath, which is also documented in the aforementioned man page for runscript. But there have been many other places where the two directories are used that are not initiated by an init script at all. One of these happens to be my dear Munin’s cron script used by the Master. What to do then?
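For the init script case, the checkpath call usually ends up in start_pre, roughly like this. The exampled user, group and directory are placeholders of mine, and the mode is just a plausible choice; only checkpath and its -d, -m and -o options come from OpenRC.

```shell
# start_pre is run by runscript right before the daemon is started.
start_pre() {
    # Create the runtime directory if it is missing, and make sure it has
    # the expected mode and owner even if it already exists.
    checkpath -d -m 0755 -o exampled:exampled /var/run/exampled
}
```

This replaces the mkdir, chown and chmod sequence with a single call.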

This has actually been among the biggest issues regarding the transition. It was the original reason why screen was changed to save its sockets in the users’ home instead of the previous /var/run/screen path, with relatively bad results all over, including me deciding to just move to tmux. In Munin, I decided to solve the issue by installing a script in /etc/local.d so that on start the /var/run/munin directory would be created … but this is far from a decent standard way to handle things. Luckily, there actually is a way to solve this that has been standardised, to some extent: it’s called tmpfiles.d and was also introduced by systemd. While OpenRC implements the same basics, because of the differences between the two init systems not all of the features are implemented, in particular the automatic cleanup of the files on a running system; on the other hand, that feature is not fundamental for the needs of either Munin or screen.
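For reference, a tmpfiles.d entry for the Munin directory can be as small as a single line. The sketch below just writes such a file from shell so that its content is visible; the munin.conf file name, the munin:munin ownership and the 0755 mode are assumptions of mine, not necessarily what the ebuild installs.

```shell
# write_tmpfiles_conf DIR: write a tmpfiles.d fragment into DIR.
write_tmpfiles_conf() {
    cat > "$1/munin.conf" <<'EOF'
# Fields: type, path, mode, user, group, age
d /var/run/munin 0755 munin munin -
EOF
}
```

The d type asks for the directory to be created at boot if it does not exist; the trailing dash means no age-based cleanup, which is the part OpenRC does not implement anyway.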

There is an issue with the way these files should be installed, though. For most packages, the correct path to install to would be /usr/lib/tmpfiles.d, but the problem with this is that on a multilib system you’d end up easily with having both /usr/lib and /usr/lib64 as directories, causing Portage’s symlink protection to kick in. I’d like to have a good solution to this, but honestly, right now I don’t.

So we have the tools at our disposal, what remains to be done then? Well, there’s still one issue: which path should we use? Should we keep /var/run to be compatible, or should we just decide that /run is a good idea and run with it? My guts say the latter at this point, but it means that we have to migrate quite a few things over time. I actually started now on porting my packages to use /run directly, starting from pcsc-lite (since I had to bump it to 1.8.8 yesterday anyway) — Munin will come with support for tmpfiles.d in 2.0.11 (unfortunately, it’s unlikely I’ll be able to add support for it upstream in that release, but in Gentoo it’ll be). Some more of my daemons will be updated as I bump them, as I already spent quite a lot of time on those init scripts to hone them down on some more issues that I’ll delineate in a moment.

For some, but not all, of the daemons it’s actually possible to decide the pidfile location on the command line. For those, the solution to handle the move to the new path is dead easy: you just make sure to pass something equivalent to -p ${pidfile} in the script, then change the pidfile variable, and you’re done. Unfortunately that’s not always an option, as the pidfile can be either hardcoded into the compiled program, or read from a configuration file (the latter is the case for Munin). In the first case, no big deal: you change the configuration of the package, or in the worst case you patch the software, make it use the new path, update the init script and you’re done… in the latter case though, we have trouble at hand.

If the location of the pidfile is to be found in a configuration file, even if you change the configuration file that gets installed, you can’t count on the user actually updating the configuration file, which means your init script might get out of sync with the configuration file easily. Of course there’s a way to work around this, and that is to actually get the pidfile path from the configuration file itself, which is what I do in the munin-node script. To do so, you need to see what the syntax of the configuration file is. In the case of Munin, the file is just a set of key-value pairs separated by whitespace, which means a simple awk call can give you the data you need. In some other cases, the configuration file syntax is so messed up, that getting the data out of it is impossible without writing a full-blown parser (which is not worth it). In that case you have to rely on the user to actually tell you where the pidfile is stored, and that’s quite unreliable, but okay.
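The awk call in question is along these lines. This is a sketch and not a copy of the munin-node init script: the get_pidfile name is mine, and I’m assuming the key is spelled pid_file and that the first occurrence is the one that counts.

```shell
# get_pidfile CONFIG: print the value of the pid_file key from a
# whitespace-separated key/value configuration file.
get_pidfile() {
    # Commented-out lines do not match, since their first field is not
    # exactly "pid_file".
    awk '$1 == "pid_file" { print $2; exit }' "$1"
}
```

In the init script the result would then be assigned to the pidfile variable, so that the script follows whatever the user configured.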

There is of course one thing now that needs to be said: what happens when the pidfile changes in the configuration between one start and the stop? If you’re reading the pidfile out of a configuration file it is possible that the user, or the ebuild, changed it in between causing quite big headaches trying to restart the service. Unfortunately my users experienced this when I changed Munin’s default from /var/run/munin/munin-node.pid to /var/run/munin-node.pid — the change was possible because the node itself runs as root, and then drops privileges when running the plugins, so there is no reason to wait for the subdirectory, and since most nodes will not have the master running, /var/run/munin wouldn’t be useful there at all. As I said, though, it would cause the started node to use a pidfile path, and the init script another, failing to stop the service before starting it new.

Luckily, William corrected it, although the fix is still not out: the next OpenRC release will save some of the variables used at start time, allowing this kind of problem to be nipped in the bud without having to add tons of workarounds in the init scripts. It will require some changes in the functions for graceful reloading, but that’s in retrospect a minor detail.

There are a few more niceties that you could do with init scripts in Gentoo to make them more fool proof and more reliable, but I suppose this would cover the main points that we’re hitting nowadays. I suppose for me it’s just going to be time to list and review all the init scripts I maintain, which are quite a few.

May I have a network connection, please?

If you’re running ~arch, you probably noticed by now that the latest OpenRC release no longer allows services to “need net” in their init scripts. This change has caused quite a bit of grief because some services no longer started after a reboot, or no longer start after a restart, including Apache. Edit: this only happens if you have corner case configurations such as an LXC guest. As William points out, the real change is simply that net.lo no longer provides the net virtual, but the other network interfaces do.

While it’s impossible to say that this is not annoying as hell, it could be much worse. Among other reasons, because it’s really trivial to work around it until the init scripts themselves are properly fixed. How? You just need to append the line rc_need="!net" to /etc/conf.d/$SERVICENAME; if the configuration file does not exist, simply create it.
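If you have to do this on more than a couple of services, a tiny helper avoids adding the line twice. This is only a sketch; add_conf_line is a name I made up, and it simply appends the exact line when it is not already present.

```shell
# add_conf_line FILE LINE: append LINE to FILE unless an identical line is
# already there; FILE is created if it does not exist.
add_conf_line() {
    touch "$1" || return 1
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For a service called myservice (a placeholder), the call would be add_conf_line /etc/conf.d/myservice 'rc_need="!net"'.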

Interestingly enough, knowing this workaround also allows you to do something even more useful, that is making sure that services requiring a given interface being up depend on that interface. Okay it’s a bit complex, let me backtrack a little.

Most of the server daemons that you have out there don’t really care how many interfaces you have, which ones they are, or what they are named. They bind either to the “catch-all” address (0.0.0.0 or :: depending on the version of the IP protocol; the latter can also be used as a catch-both for IPv4 and IPv6, but that’s a different story altogether), or to a particular IP address; they can also bind to a particular interface, but that’s quite rare, and usually only has to do with the actual physical address, as with RADVD or DHCP.

Now to bind to a particular IP address, you really need to have the address assigned to the local computer or the binding will fail. So in these cases you have to stagger the service start until the network interface with that address is started. Unfortunately, it’s extremely hard to do so automatically: you’d have to parse the configuration file of the service (which is sometimes easy and most of the times not), and then you’d have to figure out which interface will come up with that address … which is not really possible for networks that get their addresses automatically.

So how do you solve this conundrum? There are two ways and both involve manual configuration, but so do defined-address listening sockets for daemons.

The first option is to keep the daemon listening on the catch-all addresses, then use iptables to set up filtering per-interface or per-address. This is quite easy to deal with, and quite safe as well. It also has the nice side effect that you only have one place to handle all the IP address specifications. If you ever had to restructure a network because the sysadmin before you used the wrong subnet mask, you know how big a difference that makes. I’ve found before that some people think that iptables also needs the interfaces to be up to work. This is not the case, fortunately, it’ll accept any interface names as long as they could possibly be valid, and then will only match them when the interface is actually coming up (that’s why it’s usually a better idea to whitelist rather than blacklist there).

The other option requires changing the configuration on the OpenRC side. As I showed above, you can easily manipulate the dependencies of the init scripts without having to change those scripts at all. So if you’re running a DHCP server on the LAN served by the interface named lan0 (named this way because a certain udev no longer allows you to swap the interface names with the permanent rules that were first introduced by it), and you want to make sure that this network interface is up before dhcpd, you can simply add rc_need="net.lan0" to your /etc/conf.d/dhcpd. This way you can actually make sure that the services’ dependencies match what you expect. I use this to make sure that if I restart things like mysql, php-fpm is also restarted.

So I gave you two ways to work around the current not-really-working-well status, but why did I not complain about the current situation? Well, the reason for which so many init scripts have that “need net” line is simply cargo-culting. And the big problem is that there is no really good definition of what “net” is supposed to be. I’ve seen it used (and used it myself!) for at least the following notions:

  • there are enough modules loaded that you can open sockets; this is not really a situation that I’d like to find myself to have to work around; while it’s possible to build both ipv4 and ipv6 as modules, I doubt that most things would work at all that way;
  • there is at least one network interface present on the system; this usually is better achieved by making sure that net.lo is started instead; especially since in most cases for situations like this what you’re looking for is really whether 127.0.0.1 is usable;
  • there is an external interface connected; okay sure, so what are you doing with that interface? because I can assure you that you’ll find eth0 up … but no cable is connected, what about it now?
  • there is Internet connectivity available; this would make sense if it wasn’t for the not-insignificant detail that you can’t really know that from the init system; this would be like having a “need userpresence” that makes sure that the init script is started only after the webcam is turned on and the user face is identified.

While some of these particular notions have use cases, the fact that there is no clear identification of what that “need net” is supposed to be makes it extremely unreliable, and at this point, especially considering all the various options (oldnet, newnet, NetworkManager, connman, flimflam, LXC, vserver, …) it’s definitely a better idea to get rid of it and not consider it anymore. Unfortunately, this is leading us into a relative world of pain, but sometimes you have to get through it.

Busy init… OpenRC, Busybox and inittab

Even though it’s Sunday, I’m going back to write about my work on using Busybox as OpenRC shell, and not only.

One of the things we’ve decided for the firmware of the device is to reduce the number of projects involved as much as possible. The reason falls into the realm of license auditing, which is also the reason why I made sure that in Gentoo we now have a split hwids package which is available under a more “easygoing” license. The issue is not so much that we don’t want to use GPL’d software, but that the more GPL’d software you have in your firmware, the more careful you have to be to do everything by the letter, as it’s easy to get something wrong even with the best of intentions.

Maybe, even if I told myself that I disagree with Donnie’s post, I’m actually agreeing with it through my actions.

To be very clear, once the device is available to the customers, we’re going to publish the whole sources of the open-source components, including those that we’re not required to because they are BSD, and even to people who wouldn’t receive the firmware itself, simply because there is no sense in keeping them private. Moreover, there is no change we made that is not trickling upstream already, either in Gentoo or to the original projects, which are mostly Aravis (which Luca is working on) and the Linux kernel (as I had to write some Super I/O drivers, which I intend to improve over the next few months).

Anyway, since we wanted to reduce the amount of projects, no matter what the license, that we depend on, we’ve also decided to replace sysvinit with BusyBox itself. This is not as much of a gain as one would expect, but it’s still significant. The main trick is that you can’t use the default /etc/inittab file Gentoo comes with, but you have to replace it with one compatible with BusyBox’s own init binary.

::sysinit:/sbin/rc sysinit
::wait:/sbin/rc boot
::wait:/sbin/rc

Okay possibly I could have dropped the boot runlevel altogether and just used sysinit and then default, but there is a trick to keep in mind: if you try to mimic the “classic” inittab file, you’ll be using rc default as the last entry: unfortunately if you do it that way, the softlevel= parameter to be passed at the kernel is not going to work at all. I’m not sure why it works on the standard setup, probably due to the implementation of the System V numbered runlevels, but for sure it doesn’t work with BusyBox, as the code in rc.c takes the default as the name of the runlevel to start — not giving a particular runlevel makes it decide which one to start depending on the parameter given, and defaults to, well, default, if none is set explicitly.
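The selection logic described above can be sketched in a few lines of shell. To be clear, this is not the code from rc.c, just my illustration of the behaviour: take the runlevel from a softlevel= parameter on the kernel command line if there is one, and fall back to default otherwise. The pick_runlevel name is made up, and the function takes the command line file as a parameter (normally /proc/cmdline).

```shell
# pick_runlevel CMDLINE_FILE: print the runlevel to start when rc is called
# without an explicit runlevel name.
pick_runlevel() {
    level=default
    # Unquoted on purpose: split the command line into words.
    for arg in $(cat "$1"); do
        case $arg in
            softlevel=*) level=${arg#softlevel=} ;;
        esac
    done
    printf '%s\n' "$level"
}
```

Calling rc default explicitly skips this lookup altogether, which is why the last inittab entry has to be a bare /sbin/rc.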

Beside this issue, and the small issue with sysctl init script not working correctly with Busybox’s own sysctl command, it seems like, for the very small set of init scripts we’re dealing with, BusyBox is a godsend.

But this is not all, of course! I’m also creating a few init scripts for Busybox applets themselves, for instance I have a bb.acpid script that is used to handle the acpid replacement within BusyBox, and the next up is going to be a (multiplexed) bb.udhcpd script for its own DHCP server. These will soon arrive in a Gentoo near you, and should be very nice if you’re not looking for a full-fledged DHCP server.

The main issue with this, as far as I can tell, is that there is little to no documentation on the format of the configuration files: as far as acpid is concerned I had to read through the source and guess what it needed, and that’s not so easy to do after a while as you have both common and non-common parser functions… At any rate, you’ll see more documentation about this appear in this blog and (if I can find more time to do this) probably on the Gentoo Wiki, as I think it would be a good idea to document it somewhere. You can make it more likely for me to write more documentation about the stuff I’m not doing for work by sending a thank you, or you can otherwise wait for me to have more time.

The infamous /run migration

When I saw that a new release of OpenRC was made that finally took the step of migrating our setup to the “new standard” /run directory I was honestly thrilled: for once, the setup I used on my router is no longer a special case but it looks a lot more like the rule: /var/run directories need to be recreated, as the default /run is in a tmpfs filesystem.

The fact that this version of OpenRC has been removed without a replacement did bother me a bit, especially with regard to LXC, because I had to make changes to the cgroups handling. Since this version of OpenRC also uses cgroups when using the new init script style (it doesn’t do it if you call start-stop-daemon yourself though), I had to stop mounting /cgroups myself, otherwise it wouldn’t be able to start any guest container, as OpenRC mounts its own copy of said filesystem, trying (and, as far as I can tell, failing) to behave more like systemd.

At any rate I did the migration on most of my systems for the sake of testing it before doing so in production. As far as I can tell none of my init scripts misbehaved, which is positive. On the other hand, there seems to be some trouble with other scripts, in particular munin cron scripts … d’oh!

For the sake of using the lowest possible privilege to do any given job, the munin cron script, the one that generates the actual HTML content that is served by Apache is running as a mostly unprivileged user — on the other hand, the node daemon is usually running as root and dropping to nobody’s privileges when executing most of the plugins; some require super user privileges, on either standard systems (for instance smartd based plugins) or at least on hardened (the original if_ plugin, that accessed /proc; the current version, both upstream and in Gentoo, was modified by me to use /sys which is available to non-privileged users as well, even on hardened).

At any rate this seems to show that there still is something not properly set up in the way OpenRC prepares /run: the permissions of /run/lock should probably be root:uucp 0775 and that is not the case right now, not on all systems at least. And Munin should use that directory to store the lock file.

Oh well, time to find a better workaround locally, I guess.

Updating init scripts

Hah. I know what you’re thinking: Flameeyes has disappeared! Yeah, I probably wish I was spending vacations in London or somewhere along those lines, but that’s not the case. Alas I’m doing double shifts lately, which is why Gentoo is taking mostly second place. But I shouldn’t complain, in this economy, having too much work is pretty rare.

Beside still operating the tinderbox I’ve decided to spend some time to update the init scripts that come with my packages. The new OpenRC init system has a much more extensive runscript shell that is used to execute the init scripts; this means that new init scripts can be written in a declarative way, that makes them shorter and more fool proof.

Indeed, for some init scripts – such as the new netatalk ones – the script boils down to setting the dependencies and declaring which command to start and with which options. This is very good and nice.

I have to thank Samuli for the idea, as he went to update acpid’s init script with the new style, and so pushed me to look at other init scripts — in one case, it was because the package (haveged) was not working on one of my vservers, it seems like it was simply a transient problem, the latest version worked fine… and now has a new init script as well!

In some cases these updates also come with slight changes in behaviour, mostly in the case of ekeyd that is no longer setting sysctls for you (that’s why you got a sysctl.conf after all!), and in the case of quagga I ended up finally collapsing the two init scripts in a single one (before it was one for zebra and one for the rest, now it’s a single one symlinked for each service).

What this reminds me of, though, is the problem with init scripts that I found with LXC before: a number of init scripts can’t be used on the host if you plan on using the same init script within a container: vtun, ulogd, autoconfig, wmacpimon, nagios, amphetadesk, vdr, portsentry and gnunetd use killall; drqsd, drqmd, ttyd, upsd and irda use non-properly-bound pkill; ncsa, npcd, btpd, nrpe, gift, amuled and amuleweb use non-properly-bound pgrep (and in the case of ncsa, npcd and nrpe, all of which seem to involve nagios, what it’s trying to do is a simple pkill).
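The safer pattern is to signal only the process recorded in the pidfile at start time, since name-based matching on the host can also hit the processes running inside the containers. In an init script this is what start-stop-daemon --stop --pidfile does for you; the plain shell sketch below, with a function name of my own choosing, only shows the idea.

```shell
# stop_by_pidfile PIDFILE: signal exactly the process recorded at start,
# instead of every process that happens to share the daemon's name.
stop_by_pidfile() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    kill "$pid" 2>/dev/null || return 1
    rm -f "$1"
}
```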

Luckily it seems like there aren’t any more scripts installed in /etc/init.d that don’t use runscript, although I’m sure there are a few more that need work, if they are provided by upstream, as the Netatalk case shows us.

Oh well, more work to do then.

Some UPS notes (apcupsd et al)

If you didn’t notice, one of the packages I’ve been maintaining in Gentoo is apcupsd, one of the daemons that can be used to control APC-produced UPS units. But for quite a while, my maintenance of the package was mostly limited to keeping it in a working state (with a wide range of different results, to be honest), since compared to the (messy) ebuild I originally inherited, the current one is quite linear and, in my opinion, elegant.

But in the past two weeks or so, a few things happened that required me to look into the package more closely: a version bump, a customer having two UPSes connected to the same system, and the only remaining non-APC UPS in my home office declaring itself dead.

The version bump was the chance for me to finally fix the strict aliasing issue that is still present; this time, instead of simply disabling strict aliasing (quick, hacky way) I decided to look at the code, to make it actually strict aliasing compliant. This might not sound like much, but this kind of warnings is particularly nasty as you never know when it will cause an issue. Besides, it caused Portage to abort in stricter mode, that is what I use for packages I maintain myself.

Also, while my customer’s needs didn’t really influence my work on apcupsd itself, it caused me to look even more into munin’s apc_nis plugin as beforehand it was not configurable at all: it only ever used localhost:3551 to connect to APC NIS interface, which meant that if you wanted to change the port, or make it only listen on an external interface, you were out of luck. The patch to make this configurable is now part of Munin trunk, but I haven’t had time to ask Jeremy to add it to Gentoo as well (the few patches of mine to Munin are all merged upstream now, and Munin 2 will have those, and finally, native IPv6 transport, which means I probably won’t need to use ssh connections to fetch data over NAT, but just properly-configured firewalls).

There is another issue that comes up when having multiple UPS connected to the same box though: permanence of device names. While the daemon auto-discovers a single connected APC device, when you have multiple devices you need to explicitly tell it to access a given one. To do so, you could use the hiddev device paths, but the kernel does not make those persistent if you connect/disconnect the units. To solve this issue, the new ebuild for apcupsd that I committed today uses udev rules to provide /etc/apcups/usb-${SERIALNO} symlinks that you can use to provide stable reference to your apcupsd instances. I sent the rules upstream, hoping that they’ll be integrated in the next release.

A note here: while I’m a fan of autoconfiguration, I’m having trouble considering the idea of having apcupsd auto-started when an APC UPS is connected. The reason is not obvious though: while it would work fine if it was the only UPS and thus the only apcupsd instance to be present, if you had a second instance set up for a different UPS there would be no way to match the two together. This is at a minimum bothersome.

Speaking about init scripts, the powerfail init script currently only works in single-UPS configurations (whereas the main init script works fine in multiple-UPS configurations), and even there it is a bit … broken. The powerfail flag can be written in a number of different places – the default and the Gentoo variants also point to different paths! – but this script does not take that into consideration at all. More to the point, the default, which uses /var/run, might not be available at the shutdown init level since that would probably have been unmounted by that time. What I should do, I guess, is make it possible for the init script to fetch the configured value from the apcupsd configuration file, and move the default to use /run.

Next problem in my list is that apcaccess should not be among the superuser binaries, since it can be run from user just fine, but I’ll have to get that cleared with upstream first, it might break some scripts to move it in Gentoo only.

Finally, there is the problem that the sources of apcupsd are written with disregard for what many consider “library-only problems” – namely PIC – and has a very nasty copy-on-write scorecard. Unfortunately, some of the issues are so rooted into the design that I don’t feel up to fix the sources myself, but if somebody wanted a project to follow and optimise, that might be a good choice in this respect.

Sigh. I hope to find more time to fix the remaining issues with the scripts soon. For now if you have comments/notes that I might have missed, your feedback is welcome.

About the new Quagga ebuild

A foreword: some people might think that I’m writing this just to banter about what I did; my sincere reason to write, though, is to point out an example of why I dislike 5-minutes fixes as I wrote last December. It’s also an almost complete analysis of my process of ebuild maintenance so it might be interesting for others to read.

For a series of reasons that I haven’t really written about at all, I need Quagga in my homemade Celeron router running Gentoo — for those who don’t know, Quagga is a fork of an older project called Zebra, and provides a few daemons for route advertisement protocols (such as RIP and BGP). Before yesterday, the last version of Quagga in Portage was 0.99.15 (and the stable is an old 0.98 still), but there was recently a security bug that required a bump to 0.99.17.

I was already planning on getting Quagga a bump to fix a couple of personal pet peeves with it on the router; since Alin doesn’t have much time, and also doesn’t use Quagga himself, I’ve added myself to the package’s metadata, and started polishing the ebuild and its support files. The alternative would have been for someone to just pick up the 0.99.15 ebuild, update the patch references, and push it out with the 0.99.17 version, which would have qualified as a 5-minutes-fix and wouldn’t have solved a few more problems the ebuild had.

Now, the ebuild (and especially the init scripts) make a point that they were contributed by someone working for a company that used Quagga; this is a good start, from one point: the code is supposed to work since it was used; on the other hand companies don’t usually care for the Gentoo practices and policies, and tend to write ebuilds that could be polished a bit further to actually be compliant with our guidelines. I like them as a starting point, and I got used to doing the final touches in those cases. So if you have some ebuilds that you use internally and don’t want to spend time maintaining them forever, you can also hire me to clean them up and merge them in tree.

So I started from the patches; the ebuild applied patches from a tarball, three unconditionally and two based on USE flags; both of the latter had URLs tied to them that pointed out that they were unofficial feature patches (a lot of networking software tends to have similar patches). I set out to check the patches; one was changing the detection of PCRE, one was obviously a fix for --as-needed, and one was a fix for an upstream bug. All five of them were on a separate patchset tarball that had to be fetched from the mirrors. I decided to change the situation.

First of all, I checked the PCRE patch; actually the whole PCRE logic inside configure is long-winded and difficult to grok properly; on the other hand, a few comments and the code itself show that the libpcreposix library is only needed on non-GNU systems, as GLIBC provides the regcomp/regexec functions. So instead of applying the patch and having a pcre USE flag, I tied the use of PCRE to the elibc_glibc implicit USE flag; one less patch to apply.

The second patch I looked at was the --as-needed-related patch that changed the link order of the libraries so that the linker wouldn’t drop them out; it wasn’t actually as complete as I would have made it. Since libtool handles transitive dependencies fine, if the libcap library is used in the convenience library, it only has to be listed there, not also in the final installed library. Also, I like to take the chance to remove unused definitions in the Makefile while I’m there. So I reworked the patch on top of the current master branch in their GIT, and sent it upstream hoping to get it merged before the next release.

The third patch is a fix for an upstream bug that hasn’t been merged in a few releases already, so I kept it basically the same. The two feature patches had new versions released, and the Gentoo version seems to have gone out of sync with the upstream ones a bit; for the sake of reducing Gentoo-specific files and process, I decided to move to the feature patches that the original authors release; since they are only needed when their USE flags are enabled, they are fetched from the original websites conditionally. The remaining patches are too small to be part of a patchset tarball, so I first simply put them in files/ as they were, with mine a straight export from GIT. Thinking about it a bit more, I decided today to combine them in a single file, and just properly handle them on Gentoo GIT (I started writing a post detailing how I manage GIT-based patches).

Patches done, the next step is clearing out the configuration of the program itself; the ipv6 USE flag handles the build and installation of a few extra daemons specific to the IPv6 protocol; the rest are more or less direct mappings from the remaining flags. For some reason, the ebuild used --libdir to change the installation directory of the libraries, and then later installed an env.d file to set the linker search path, which is generally a bad idea. I guess the intention was just to follow that advice, and not push non-generic libraries into the base directory, but doing it that way is mostly pointless. Note to self: write about how to properly handle internal libraries. My first choice was to see if libtool set rpath properly, and in that case leave it to the loader to deal with it. Unfortunately it seems like there is something bad in libtool: while rpath worked on my workstation, it didn’t work on the cross-build root for the router; I’m afraid it’s related to the lib vs lib64 paths, sigh. So after testing it out on the production router, I ended up revbumping the ebuild already to unhack it; if libtool can handle it properly, I’ll get that fixed upstream so that the library is always installed, by default, as a package-internal library; in the meantime it gets installed vanilla as upstream wrote it. It makes even more sense given that there are headers installed that suggest the library is not an internal library after all.

In general, I found the build system of quagga really messed up and in need of an update; since I know how many projects are sloppy about build systems, I’d probably take a look. But sincerely, before that I have to finish what I started with util-linux!

While I was at it, I fixed the installation to use the more common emake DESTDIR= rather than the older einstall (which means that it now installs in parallel as well); and installed the sample files among the documentation rather than in /etc (reasoning: I don’t want to back up sample files, nor do I want to copy them to the router, and it’s easier to move them away directly). I forgot the first time around to remove the .la files, but I did so afterwards.

What remains is the most important stuff actually: the init scripts! Following my own suggestions, the scripts had to be mostly rewritten from scratch; this was also needed because the previous scripts had a non-Gentoo copyright owner and I wanted to avoid that. Also, there were something like five almost identical init scripts in the package, where “almost” is due to the name of the service itself; this also means that there was more than one file without any real reason. My solution is to have a single file for all of them, and symlink the remaining ones to that one; the SVCNAME variable defines the name of the binary to start up. The one script that differs from the others, zebra (it has some extra code to flush the routes), I also rewrote to minimise the differences between the two (this is good for compression, if not for deduplication). The new scripts also take care of creating the /var/run directory if it doesn’t exist already, which solves a lot of trouble.
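To make the symlink idea a bit more concrete, here is a minimal sketch of what such a shared script could look like; it is not the script that ended up in the tree. It assumes the daemons live in /usr/sbin, read /etc/quagga/<name>.conf and accept the -d, -f and -i options; the quagga_rundir helper and the QUAGGA_RUNDIR variable are names made up for this example. Installed as a real init script it would also need the usual interpreter line on top, and ebegin, eend and start-stop-daemon are provided by the init system.

```shell
# Sketch of one init script shared by several daemons through symlinks,
# for instance /etc/init.d/ospfd -> /etc/init.d/ripd. The init system sets
# SVCNAME to the name the script was invoked under, so the same code
# starts a different binary for each symlink.

# Hypothetical helper: where the pidfiles live (the path is an assumption).
quagga_rundir() {
	echo "${QUAGGA_RUNDIR:-/var/run/quagga}"
}

depend() {
	# The routing daemons need zebra; zebra keeps its own script.
	need net zebra
}

start() {
	# Recreate the runtime directory if it does not exist already.
	mkdir -p "$(quagga_rundir)" || return 1

	ebegin "Starting ${SVCNAME}"
	start-stop-daemon --start --quiet \
		--exec "/usr/sbin/${SVCNAME}" \
		--pidfile "$(quagga_rundir)/${SVCNAME}.pid" \
		-- -d -f "/etc/quagga/${SVCNAME}.conf" \
		-i "$(quagga_rundir)/${SVCNAME}.pid"
	eend $?
}

stop() {
	ebegin "Stopping ${SVCNAME}"
	start-stop-daemon --stop --quiet \
		--pidfile "$(quagga_rundir)/${SVCNAME}.pid"
	eend $?
}
```

Adding a daemon then only takes one more symlink; nothing in the script itself names a specific service.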

Now, as I said, I committed the first version after trying it locally, and then revbumped it last night after trying it on production; I reworked it a bit more thoroughly. Besides the change in the libraries’ installation, I decided to add a readline USE flag rather than force the readline dependency (there really isn’t much that is readline-capable on my router, since it’s barely supposed to have me connected); this also showed me that the PAM dependency was strictly related to the vtysh optional component. And while I looked at PAM, (Updated) I actually broke it (and fixed it back in r2); the code is calling pam_start() with a capital-case “Quagga” string; but Linux-PAM puts it in all lower case… I didn’t know that, and I was actually quite sure that it was case sensitive. Turns out that OpenPAM is case-sensitive, Linux-PAM is not; that explains why it works with one but not the other. I guess the next step in my list of things to do is check out if it might be broken with Turkish locale. (End of update)

Another thing that I noticed there is that by default Quagga has been building itself as a Position Independent Executable (PIE); as I have written before using PIE on a standard kernel, without strong ASLR, has very few advantages, and enough disadvantages that I don’t really like to have it around; so for now it’s simply disabled; since we do support proper flags passing, if you’re building a PIE-complete system you’re free to; and if you’re building an embedded-enough system, you have nothing else to do.

The result is a pretty slick ebuild, at least in my opinion: fewer files installed, smaller, Gentoo-copyrighted (I rewrote the scripts practically entirely). It handles the security issue but also another bunch of “minor” issues, it is closer to upstream and it has a maintainer that’s going to make sure that the future releases will have an even slicker build system. It’s nothing exceptional, mind you, but it’s what it takes to fix an ebuild properly after a few years spent with bump-renames. See?

Afterword: a few people, seemingly stirred up by a certain other developer, seem to have started complaining that I “write too much”, or pretend that I actually have an uptake about writing here. The main uptake I have is not having to repeat myself over and over to different people. Writing posts costs me time, and keeping the blog running, reachable and so forth takes me time and money, and running the tinderbox costs me money. Am I complaining? Not so much; Flattr is helping, but trust me that it doesn’t even cover the costs of the hosting, up to now. I’m just not really keen on the slandering because I write out explanations of what I do and why. So from now on, you bother me? Your comments will be deleted. Full stop.

Some more notes about Linux Containers

I’ve been playing around more with Linux Containers after my post about init scripts and I’m starting to think they are quite close to working for Gentoo. I hope once I come back from my vacations to get them in the tree together with Tiziano.

Right now the problems we have are:

  • the standard stage3 needs to be heavily tweaked or it’ll be a massacre when started;
  • if I do set ttys in the configuration, at start time it moves me to the real tty1, which causes a domino effect with X11 that is annoying, although not critical;
  • OpenRC needs to be tweaked to add support for Linux-Containers; there are quite a few things that could be eased up by having a working OpenRC that ignores some init scripts when running in containers; the code seems to be in src/librc/librc.c (look for openvz) and it should be easy to check whether we’re running in a container, by checking the running cgroup (/proc/self/cgroup);
  • the current ebuild in Tiziano’s overlay creates a /var/lib/lib directory and installs the lxc- binaries in /usr/bin, both of which shouldn’t happen in Gentoo once installed;
  • running rc shutdown inside the container will stop all the services properly, but will not kill init and thus not kill the vserver, I’m not sure why; running kill 1 also seems not to work. I have yet to check whether sending the kill signal from outside will properly shut down the rc inside; if yes, then it’ll be a good way to shut down the container.
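As for the cgroup check mentioned in the OpenRC item above, here is a rough sketch of the idea. It is written as a shell function for illustration only (librc itself is C), in_container is a made-up name, and it leans on one assumption: processes on the host sit in the root cgroup (/), while those inside a container show the container’s cgroup path. Setups that place host processes in non-root cgroups would fool it.

```shell
# Heuristic container check based on /proc/self/cgroup.
# Each line of that file has the form hierarchy-ID:controllers:path.
# Assumption: on the host every path is "/", inside a container it is not.
# The optional first argument overrides the file to read, which makes the
# function easy to try out on sample data.
in_container() {
	cgroup_file=${1:-/proc/self/cgroup}

	# No cgroup information available: assume we are on the host.
	[ -r "${cgroup_file}" ] || return 1

	while IFS=: read -r _id _controllers path; do
		if [ -n "${path}" ] && [ "${path}" != "/" ]; then
			return 0
		fi
	done < "${cgroup_file}"

	return 1
}
```

An init script that makes no sense inside a container (setting the hardware clock, for instance) could then bail out early when in_container succeeds.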

Once I’m back and working on the init scripts, they’ll be in a separate package, kinda like mysql’s, since what I have for now is slightly more complex, will add a few more standard locations (for instance they’ll use a /var/log/lxc directory that is not part of the standard install of lxc) and will require a couple of packages that are not part of lxc.

Linux Containers and the init scripts problem

Since the tinderbox is now running on Linux containers I’m also experimenting with making more use of those. Since containers are, as the name implies, self-contained, I can use them in place of chroots for testing stuff that I’d prefer not to contaminate my main system with; for instance, I can use them instead of the Python virtualenv to get a system where I can use easy_install to copy in the stuff that is not packaged in Portage as a temporary measure.

But after some playing around I came to the conclusion that we have essentially two problems with init scripts. Two very different problems actually, and one involves more than just Linux Containers, but I’ll just state both here.

The first problem is specific to Linux Containers and relates to one limitation I think I wrote of before; while the guest (tinderbox) cannot see the processes of the host (yamato) the opposite is not true, and indeed the host cannot really distinguish between its processes and the ones from the guest. This isn’t much of a problem, since the start and stop of daemons is usually done through pidfiles that list the started process id, rather than doing a search and destroy over all the processes.

But the “usually” part here is the problem: there are init scripts that use the killall command (which as far as I can tell does not take namespaces into consideration) to identify which process to send signals to. It’s not just a matter of using it to kill processes; most of the time, it seems to be used to send signals to the daemon (like SIGHUP for reloading configuration or stuff like that). This was probably done in response to changes to start-stop-daemon that asked for it not to be used for that task. Fortunately, there is a quick way to fix this: instead of using killall we can almost always use kill and take the PID to send the signal to from the pidfile created either by the daemon itself or by s-s-d.
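A minimal sketch of that replacement, assuming the pidfile holds the PID on its first line; signal_pidfile is a name made up for this example, not something the init system provides.

```shell
# Send a signal to the process recorded in a pidfile, instead of letting
# killall pick every process with a matching name (including those that
# belong to a container guest).
# Usage: signal_pidfile PIDFILE [SIGNAL]   (SIGNAL defaults to HUP)
signal_pidfile() {
	pidfile=$1
	signal=${2:-HUP}

	# Refuse to guess when the pidfile is missing or unreadable.
	[ -r "${pidfile}" ] || return 1
	pid=$(head -n 1 "${pidfile}")

	# Only accept a plain decimal PID.
	case ${pid} in
		''|*[!0-9]*) return 1 ;;
	esac

	kill -s "${signal}" "${pid}"
}
```

A reload function in an init script would then boil down to something like signal_pidfile /var/run/foo.pid HUP, where the pidfile path is whatever the daemon or s-s-d wrote.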

Hopefully this won’t require especially huge changes, but it brings up the issue of improving the quality assurance over the init scripts we currently ship. I found quite a few that depended on services that weren’t in the dependencies of the ebuild (either because they are “sample configurations” or because they lacked some runtime dependencies), a few that had syntax mistakes in them (some due to the new POSIX-correctness introduced by OpenRC, but not all of them), and quite a few that run commands in global scope, slowing down the dependencies regeneration. I guess this is something else that we have to decide upon.

The other problem with init script involves KVM and QEmu as well. While RedHat has developed some tools for abstracting virtual machine management, I have my doubts about them as much now as I had some time ago for what concerns both configuration capabilities (they still seem to bring in a lot of unneeded stuff – to me – like dnsmasq), and now code quality as well (the libvirt testsuite is giving me more than a few headaches to be honest).

Luca already proposed some time ago that we could just write a multiplex-capable init script for KVM and QEmu so that we could just configure the virtual machines like we do for the network interfaces, and then use the standard rc system to start and stop them. While it might sound trivial, this is no simple task: starting is easy, but stopping the virtual machine? Do you just shut it down, detaching the virtual power cord? Or do you go stopping the services internal to the VM as you should? And how do you do that, with ACPI signals, with SSH commands?

The same problem applies to Linux containers, but with a twist: trying to run shutdown -h now inside a Linux container seems to stop the host rather than the guest! And there you cannot rely on ACPI signals either.

If somebody has a suggestion, they are very welcome.