The unsolved problem of the init scripts

One of probably the biggest problems with maintaining software in Gentoo where a daemon is involved, is dealing with init scripts. And it’s not really that much of a problem with just Gentoo, as almost every distribution or operating system has its own to handle init scripts. I guess this is one of the nice ideas behind systemd: having a single standard for daemons to start, stop and reload is definitely a positive goal.

Even if I’m not sure myself whether I want the whole init system to be collapsed into a single one for every single operating system out there, there at least is a chance that upstream developers will provide a standard command-line for daemons so that init scripts no longer have to write a hundred lines of pre-start setup code commands. Unfortunately I don’t have much faith that this is going to change any time soon.

Anyway, leaving the daemons themselves alone, as that’s a topic for a post of its own and I don care about writing it now. What remains is the init script itself. Now, while it seems quite a few people didn’t know about this before, OpenRC has been supporting since almost ever a more declarative approach to init scripts by setting just a few variables, such as command, pidfile and similar, so that the script works, as long as the daemon follows the most generic approach. A whole documentation for this kind of scripts is present in the runscript man page and I won’t bore you with the details of it here.

Beside the declaration of what to start, there are a few more issues that are now mostly handled to different degrees depending on the init script, rather than in a more comprehensive and seamless fashion. Unfortunately, I’m afraid that this is likely going to stay the same way for a long time, as I’m sure that some of my fellow developers won’t care to implement the trickiest parts that can implemented, but at least i can try to give a few ideas of what I found out while spending time on said init scripts.

So the number one issue is of course the need to create the directories the daemon will use beforehand, if they are to be stored on temporary filesystems. What happens is that one of the first changes that came with the whole systemd movements was to create /run and use that to store pidfiles, locks and other runtime stateless files, mounting it as tmpfs at runtime. This was something I was very interested in to begin with because I was doing something similar before, on the router with a CF card (through an EIDE adapter) as harddisk, to avoid writing to it at runtime. Unfortunately, more than an year later, we still have lots of ebuilds out there that expects /var/run paths to be maintained from the merge to the start of the daemon. At least now there’s enough consensus about it that I can easily open bugs for them instead of just ignore them.

For daemons that need /var/run it’s relatively easy to deal with the missing path; while a few scripts do use mkdir, chown and chmod to handle the creation of the missing directories , there is a real neat helper to take care of it, checkpath — which is also documented in the aforementioned man page for runscript. But there has been many other places where the two directories are used, which are not initiated by an init script at all. One of these happens to be my dear Munin’s cron script used by the Master — what to do then?

This has actually been among the biggest issues regarding the transition. It was the original reason why screen was changed to save its sockets in the users’ home instead of the previous /var/run/screen path — with relatively bad results all over, including me deciding to just move to tmux. In Munin, I decided to solve the issue by installing a script in /etc/local.d so that on start the /var/run/munin directory would be created … but this is far from a decent standard way to handle things. Luckily, there actually is a way to solve this that has been standardised, to some extents — it’s called tmpfiles.d and was also introduced by systemd. While OpenRC implements the same basics, because of the differences in the two init systems, not all of the features are implemented, in particular the automatic cleanup of the files on a running system —- on the other hand, that feature is not fundamental for the needs of either Munin or screen.

There is an issue with the way these files should be installed, though. For most packages, the correct path to install to would be /usr/lib/tmpfiles.d, but the problem with this is that on a multilib system you’d end up easily with having both /usr/lib and /usr/lib64 as directories, causing Portage’s symlink protection to kick in. I’d like to have a good solution to this, but honestly, right now I don’t.

So we have the tools at our disposal, what remains to be done then? Well, there’s still one issue: which path should we use? Should we keep /var/run to be compatible, or should we just decide that /run is a good idea and run with it? My guts say the latter at this point, but it means that we have to migrate quite a few things over time. I actually started now on porting my packages to use /run directly, starting from pcsc-lite (since I had to bump it to 1.8.8 yesterday anyway) — Munin will come with support for tmpfiles.d in 2.0.11 (unfortunately, it’s unlikely I’ll be able to add support for it upstream in that release, but in Gentoo it’ll be). Some more of my daemons will be updated as I bump them, as I already spent quite a lot of time on those init scripts to hone them down on some more issues that I’ll delineate in a moment.

For some, but not all!, of the daemons it’s actually possible to decide the pidfile location on the command line — for those, the solution to handle the move to the new path is dead easy, as you just make sure to pass something equivalent to -p ${pidfile} in the script, and then change the pidfile variable, and done. Unfortunately that’s not always an option, as the pidfile can be either hardcoded into the compiled program, or read from a configuration file (the latter is the case for Munin). In the first case, no big deal: you change the configuration of the package, or worse case you patch the software, and make it use the new path, update the init script and you’re done… in the latter case though, we have trouble at hand.

If the location of the pidfile is to be found in a configuration file, even if you change the configuration file that gets installed, you can’t count on the user actually updating the configuration file, which means your init script might get out of sync with the configuration file easily. Of course there’s a way to work around this, and that is to actually get the pidfile path from the configuration file itself, which is what I do in the munin-node script. To do so, you need to see what the syntax of the configuration file is. In the case of Munin, the file is just a set of key-value pairs separated by whitespace, which means a simple awk call can give you the data you need. In some other cases, the configuration file syntax is so messed up, that getting the data out of it is impossible without writing a full-blown parser (which is not worth it). In that case you have to rely on the user to actually tell you where the pidfile is stored, and that’s quite unreliable, but okay.

There is of course one thing now that needs to be said: what happens when the pidfile changes in the configuration between one start and the stop? If you’re reading the pidfile out of a configuration file it is possible that the user, or the ebuild, changed it in between causing quite big headaches trying to restart the service. Unfortunately my users experienced this when I changed Munin’s default from /var/run/munin/munin-node.pid to /var/run/munin-node.pid — the change was possible because the node itself runs as root, and then drops privileges when running the plugins, so there is no reason to wait for the subdirectory, and since most nodes will not have the master running, /var/run/munin wouldn’t be useful there at all. As I said, though, it would cause the started node to use a pidfile path, and the init script another, failing to stop the service before starting it new.

Luckily, William corrected it, although it’s still not out — the next OpenRC release will save some of the variables used at start time, allowing for this kind of problems to be nipped in the bud without having to add tons of workarounds in the init scripts. It will require some changes in the functions for graceful reloading, but that’s in retrospective a minor detail.

There are a few more niceties that you could do with init scripts in Gentoo to make them more fool proof and more reliable, but I suppose this would cover the main points that we’re hitting nowadays. I suppose for me it’s just going to be time to list and review all the init scripts I maintain, which are quite a few.

Exit mobile version