Linux Containers and the init scripts problem

Since the tinderbox is now running in Linux containers, I’m also experimenting with making more use of them. Because containers are, as the name implies, self-contained, I can use them in place of chroots for testing things I’d rather not let contaminate my main system. For instance, I can use them instead of a Python virtualenv to get a system where I can use easy_install to pull in software that is not yet packaged in Portage, as a temporary measure.

But after some playing around I came to the conclusion that we have essentially two problems with init scripts. They are two very different problems, and one involves more than just Linux Containers, but I’ll state both here.

The first problem is specific to Linux Containers and relates to a limitation I think I wrote about before: while the guest (tinderbox) cannot see the processes of the host (yamato), the opposite is not true, and indeed the host cannot really distinguish between its own processes and those of the guest. This usually isn’t much of a problem, since daemons are started and stopped through pidfiles that record the process ID of the started daemon, rather than by doing a search-and-destroy over all running processes.

But the “usually” part is the problem: there are init scripts that use the killall command (which, as far as I can tell, does not take PID namespaces into consideration) to identify which processes to send signals to. It’s not just a matter of using it to kill processes; most of the time it seems to be used to send signals to a daemon (like SIGHUP to reload its configuration, or similar). This was probably done in response to changes to start-stop-daemon that asked for it not to be used for that task. Fortunately, there is a quick fix: instead of using killall we can almost always use kill, taking the PID to signal from the pidfile created either by the daemon itself or by s-s-d.
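A minimal sketch of that quick fix; here a background sleep stands in for the daemon, and the pidfile path is made up for illustration (a real init script would use the pidfile written by the daemon or by start-stop-daemon):

```shell
# "mydaemon" is simulated by a background sleep writing its pidfile
sleep 300 &
echo $! > /tmp/mydaemon.pid

# Fragile: `killall sleep` would signal every matching process,
# including any visible from a guest container.
# Namespace-safe alternative: signal only the PID recorded at start.
kill -TERM "$(cat /tmp/mydaemon.pid)"
wait "$(cat /tmp/mydaemon.pid)" 2>/dev/null || true
rm -f /tmp/mydaemon.pid
```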

Hopefully this won’t require especially huge changes, but it brings up the issue of improving the quality assurance of the init scripts we currently ship. I found quite a few that depended on services that weren’t in the dependencies of the ebuild (either because they are “sample configurations” or because the ebuild lacked some runtime dependencies), a few that had syntax mistakes in them (some due to the new POSIX-correctness introduced by OpenRC, but not all of them), and quite a few that run commands in global scope, which slows down dependency regeneration. I guess this is something else we have to decide upon.

The other problem with init scripts involves KVM and QEmu as well. While Red Hat has developed some tools for abstracting virtual machine management, I have my doubts about them, as much now as I had some time ago, concerning both their configuration capabilities (they still seem to bring in a lot of stuff that is unneeded – to me – like dnsmasq) and now their code quality as well (the libvirt testsuite is giving me more than a few headaches, to be honest).

Luca already proposed some time ago that we could just write a multiplex-capable init script for KVM and QEmu, so that we could configure the virtual machines like we do network interfaces, and then use the standard rc system to start and stop them. While it may sound trivial, this is no simple task: starting is easy, but how do you stop the virtual machine? Do you just shut it down, detaching the virtual power cord? Or do you stop the services inside the VM first, as you should? And how do you do that: with ACPI signals, or with SSH commands?
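As a rough sketch of what such a multiplexed script could look like under OpenRC – with stop() answered via an ACPI powerdown request through a QEMU monitor socket – note that the service name, paths and variables here are all assumptions for illustration, not an actual implementation:

```shell
#!/sbin/runscript
# Symlinked as e.g. /etc/init.d/kvm.vm0 -> kvm; each symlink is one VM,
# the same way net.eth0 multiplexes the net scripts.
VM=${SVCNAME#kvm.}
MONITOR=/var/run/kvm/${VM}.monitor

start() {
	ebegin "Starting KVM guest ${VM}"
	kvm ${KVM_OPTS} -daemonize \
		-monitor unix:${MONITOR},server,nowait
	eend $?
}

stop() {
	ebegin "Stopping KVM guest ${VM} (ACPI powerdown)"
	# Ask the guest for a clean ACPI shutdown through the monitor socket
	echo system_powerdown | socat - UNIX-CONNECT:${MONITOR}
	eend $?
}
```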

The same problem applies to Linux containers, but with a twist: trying to run shutdown -h now inside a Linux container seems to stop the host, rather than the guest! And there you cannot rely on ACPI signals either.

If somebody has a suggestion, they are very welcome.

8 thoughts on “Linux Containers and the init scripts problem”

  1. I believe Linux-VServer meets all your requirements for process separation. I’m not aware of any problems using init scripts within a VServer, although sometimes you need to grant access explicitly to any unusual entries in /dev or /proc. With a default config, ‘shutdown -h now’ and ‘halt’ seem to have no effect on either the guest or the host. ‘killall -9’ is limited to the guest, and leaves processes in the host and other VServer contexts unaffected.

  2. I guess the problem there is that VServer requires a special kernel, if I recall correctly; while Linux Containers, with the only problem of shutdown, work just fine with the standard 2.6.30 kernel.

  3. The “keyword nojail noopenvz noprefix novserver” line in init scripts already does something; find a way to extend that to suit your needs. VServers already had that problem, now solved. In case you decide to give VServers a try, Portage has sys-kernel/vserver-sources-2.3.0.36.14, which is 2.6.30 plus the VServer patches (I’m using 2.6.30.1-vs2.3.0.36.14-pre4). As an alternative, replace shutdown with your own script which does the right thing.

  4. The ‘shutdown -h now’ stopping the host instead of the guest is quite interesting. Without any insight into lxc, I believe that this is either a bug or an isolation problem that should someday be resolved in new Linux kernels.

     Luca’s idea of a multiplex-capable init script for KVM and QEmu sounds really great. Libvirt’s way of managing networks breaks (at least for me) Gentoo’s networking scripts, so I’ve written my own start-stop snippets that currently need LOTs of love. But I could actually help with that, at least with basic ideas and possibly some testing.

     ACPI signals are by FAR the best way to handle this. KVM provides a rather sane way of dealing with it: if you start a VM with -daemonize -monitor unix:/var/run/vm/0.monit,server,nowait then to shut it down you only need: echo “system_powerdown” | socat - UNIX-CONNECT:/var/run/vm/0.monit

     Using SSH is not the way to go. Firstly, you’d enforce networking on guests; secondly, what if guests run Windows? Plus, assume a server with virtual machines dedicated to different users: a user accidentally deletes the SSH account Gentoo uses for managing VMs, and management of the VMs is screwed. The same counts for the even more insane options like echo “poweroff” | netcat 127.0.0.1 50000

  5. Btw, networking and KVM is an interesting issue. To the best of my knowledge, switching ethX to promiscuous mode, creating a bridge and using tap is the most scalable and sane way to handle networking with multiple VMs running. For some reason beyond my understanding, shutting down a KVM guest (both the ACPI way and by kill -9) disables the tap interface that the VM used, and the interface has to be deleted and created again. This breaks the current networking scripts, so I had to use little hacks. But I guess that my tests so far might be useful to someone of more skill than myself. So Diego, if you or Luca want to work on this, just drop me a note.
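For reference, the bridge-plus-tap setup described in this comment can be sketched like this; the interface names are illustrative, and it assumes root plus the bridge-utils and tunctl tools:

```shell
# Put the physical interface in promiscuous mode and bridge it with a tap
brctl addbr br0                  # create the bridge
brctl addif br0 eth0             # enslave the physical NIC
ip link set eth0 promisc on
tunctl -t tap0                   # create a persistent tap device
brctl addif br0 tap0             # enslave the tap
ip link set br0 up
ip link set tap0 up
# then start the VM attached to it, e.g.:
#   kvm ... -net nic -net tap,ifname=tap0,script=no
```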

  6. Pavel, just a quick note: yes, that problem with tap happened to me too; I solved it by using vde, and bridging vde0 with eth0 (the interface to the rest of the office).

  7. The manual of VServer says (http://linux-vserver.org/ut…): “You must use “reboot -f” or “halt -f” to restart or shut down from within the guest.” “vserver myguest stop” works fine from the host. I also know that Windows can detect when you press the “power” button and shut itself down. It must be the “system_powerdown” trick Pavel mentioned.

  8. Here’s a way for the host to trigger the guest to shut itself down gracefully, non-interactively, and without RPC/ssh/rsh etc.: put a line in the container’s inittab that does init 0 in response to powerfail, then send a SIGPWR from the host to the container’s init PID. Get the init PID from lxc-ps -C init -opid, ignoring the header and the host’s own PID 1. The lxc-user mailing list came up with it; I made a set of integrated scripts out of it for openSUSE in this RPM: http://download.opensuse.or…

     You have opened my eyes to the possibility of killall commands on the host killing processes in the containers. I’ll have to go looking over all the system init scripts now. Holy cow, what a mess that could make, if it wasn’t for the fact that at this early stage I’m only using lxc containers for backup servers, not direct production. I’m still not sure what happens with iptables either; that’s another possibly sticky area.
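A minimal sketch of the SIGPWR approach described in this comment; it assumes the lxc-ps output format described above (a header line followed by PIDs), and the inittab entry is an assumption about a classic sysvinit container:

```shell
# In the container's /etc/inittab, map the powerfail event to a clean halt:
#   pf::powerfail:/sbin/init 0
#
# On the host, find each container's init PID and send it SIGPWR,
# skipping the header line and the host's own init (PID 1).
for pid in $(lxc-ps -C init -opid | tail -n +2); do
    [ "$pid" = "1" ] && continue   # skip the host's init
    kill -PWR "$pid"
done
```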
