The cgroup functionality that the Linux kernel introduced a few versions ago, while originally being almost invisible, is proving itself having a quite wide range of interests, which in turn caused not few headaches to myself and other developers.
I originally looked into cgroups because of LXC and then I noticed it being used by Chromium, then libvirt (with its own bugs related to USB devices support). Right now the cgroup functionality is also used by the userland approach to task scheduling to replace the famous 200LOC kernel patch, and by the newest versions of the OpenVZ hypervisor.
While cgroup is a solid kernel technique, its interface doesn’t seem so much. The basic userland interface is accessible through a special pseudo-filesystem, just like the ones used for /sys
and /proc
. Unfortunately, the way to use this interface hasn’t really been documented decently, and that results in tons of problems; in my previous post regarding LXC I mistakenly inverted the cgroup-files I actually confused the way Ubuntu and Fedora mount cgroups; it is Fedora to use /sys/fs/cgroup
as the base path for accessing cgroups, but as Lennart commented on the post itself, there’s a twist.
In practice there are two distinct interfaces to cgroups; one is through a single, all-mixed-in interface, that is accessed through the cgroup pseudo-filesystem when mounted without options; this is the one you can find mounted in /cgroup
(also by the lxc init script in Gentoo) or /dev/cgroups
. The other interface allows access (and thus limit) to one particular type of cgroup (such as memory, or cpuset), and have each hierarchy mounted at a different path. That second interface is the one that Lennart designed to be used by Fedora and that has been made “official” by the kernel developers in commit 676db4af043014e852f67ba0349dae0071bd11f3 (even though it is not really documented anywhere but in that commit).
Now as I said the lxc init script doesn’t follow that approach but rather it takes the opposite direction; this was not intended as a way to ditch the road taken by the kernel developers or by Fedora, but rather out of necessity: the commit above was added last summer, the Tinderbox has been running LXC for over an year at that point, and of course all the LXC work I did for Gentoo was originally based on the tinderbox itself. But since I did have a talk with Lennart and the new method is the future, I added to my TODO list, last month still, to actually look into making cgroups a supported piece of configuration in Gentoo.
And it came crashing down.
Between yesterday and this morning I actually found the time I needed to get to write an init script to mount the proper cgroup hierarchy the Fedora style. Interestingly enough, if you were to umount the hirarchy after mucking with it, you’re not going to mount it anymore, so there won’t be any “stop” for the script anyway. But that’s the least of my problems now. Once you do mount cgroups the way you’re supposed to, following the Fedora approach, LXC stops working.
I haven’t started looking into what the problem could be there; but it seems obvious that LXC doesn’t seem to take it very nicely when its single-access interface for cgroups is instead split in a number of different directories, each with its own little interface to use. And I can’t blame it much.
Unfortunately this is not the only obstacle LXC have to face now; beside the problem with actually shutting down a container (which only works partially and mostly out of sheer luck with my init system), the next version of OpenRC is going to drop support ofr auto-detecting LXC, both because identifying the cpuset in /proc
is not going to work soon (it’s optional in kernel and considered deprecated) and because it wrongly identify the newest OpenVZ guests as LXC (since they also started using the same cgroups basics as LXC). These two problems mean that soon you’ll have to use some sort of lxc-gentoo
script to set up an LXC guest, which will both configure a switch to shut the whole guest down, and configure OpenRC to accept it as an LXC guest manually.
Where does this leave us? Well, first of all, I’ll have to test if the current GIT master of LXC can cope with this kind of interface. If it doesn’t, I’ll have to talk with upstream to see that they would actually be supported so that LXC can be used with a Gentoo host, as well as a Fedora one, with the new cgroups interface (so that it can be made available to users for use with chromium and other software that might make good use of them). Then it would be time to focus on the Gentoo guests, so I’ll have to evaluate the contributed lxc-gentoo
scripts that I know are on the Gentoo Wiki, for a start.
But let me write this again: don’t expect LXC to work nice for production use, now or anytime soon!
hey diego, interesting post. i am the original lxc-gentoo author, there’s a few more people contributing now.it seems the main problem with lxc in terms of usability is that lxc-create is a naive method of setup. prompting the user for additional configuration information after lxc-create but before real execution (so things like network device setup, mac address assignment/dhcp config) has to be done manually.basically, host config modification is not touched upon. however, it is necessary to get a working guest in most cases.attempting to do the above kind of steps on the domain of libvirt, which is a much larger project.personally i think we need to take a step back and think about how to resolve this.there is some interesting related work on OVF that maybe we can attempt to use.
see also recent discussion of libvirt’s lxc support here: https://www.redhat.com/arch…the big problem is that you need a host-side and guest-side config linkup to get a functional environment. you also want to maintain portability. you also might want growth for other options, like daemon configs (eg: specify dependent service location, such as dns or file server or database server or something) which OVF already tackles.i am checking out OVF vs. libvirt’s XML and trying to see what looks to be the best way forward.