More about Linux Resource Containers and Gentoo

Flameeyes

15 years ago

I have written before that I strongly object to the LXC userland being considered production-ready, and now I have another example for you.

Let’s not even dig into the fact that the build system for the userland tools is quite objectionable, as it “avoids” using libtool through silly hacks in Makefile.am. Let’s not spend much time, either, on the fact that they have no announcement-only mailing list, and that they stopped using the SF.net File Release System (which has an RSS feed of the changes, and a number of mirrors) in favour of a simple download directory; that’s just project administration gone wrong.

The one big issue is that there is no documentation of changes between one release and the next. Either you follow Git or, if you only look at the tarball itself, you’re left wondering what the heck is going on. Then again, the commit messages alone don’t carry enough information either, so you have to read the code itself to understand what changed.

So let’s begin with what brought me here: I use LXC heavily in place of testing chroots, since it’s easier to SSH into an already set-up instance than to set up a new one each time there is something new to test. Besides a number of “named” containers, I started keeping a number of “buildyards” which I only use to test particular services. Case in point: I wanted to test Squid together with the new SquidClamav, which I need for a customer of mine, so I copied over one of my previous buildyards and fired it up…

Unfortunately, the results weren’t very good: I didn’t get a Portage tree bound. Quickly checking around, 0.7.2 worked fine and 0.7.3 didn’t. After looking at the code, it became apparent that the root filesystem mountpoint that was introduced in this release as a configuration option is used not only for loop-device-backed images (which are now supported, and weren’t before), but also for standard directory-based containers. If you add this to one issue I have described before (the fact that lxc does not validate that the mount paths provided for bind mounts are within the new rootfs tree), you may start to understand the issue here.

If you haven’t seen it yet, here’s the breakdown:

my container’s rootfs is located at /media/chroots/lxc-buildyard4;
the /etc/lxc/buildyard4.conf file used to bind-mount the portage directory as /media/chroots/lxc-buildyard4/usr/portage;
with 0.7.2, the pivot_root system call was invoked on /media/chroots/lxc-buildyard4 and all was fine;
with 0.7.3, before pivoting, /media/chroots/lxc-buildyard4 was bind-mounted to a different path (let’s assume /usr/lib/lxc/rootfs, but it was a bit more messed up);
when I accessed /usr/portage within the container, I was actually accessing the path found at /usr/lib/lxc/rootfs/usr/portage, not the one the Portage tree had been bound on.
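The mismatch in the breakdown above is the kind of thing a simple containment check on the bind-mount target would catch. What follows is only a minimal sketch of such a check, not anything lxc itself does; the function name `is_under` is made up for illustration, and the comparison is purely lexical (it does not resolve symlinks).

```python
import posixpath


def is_under(root, path):
    """Return True if `path` is lexically inside `root` (or equal to it).

    Hypothetical helper: it normalises both paths and compares prefixes,
    without touching the filesystem or resolving symlinks.
    """
    root = posixpath.normpath(root)
    path = posixpath.normpath(path)
    # Append a slash so that /a/b does not match /a/bc.
    return path == root or path.startswith(root.rstrip("/") + "/")


target = "/media/chroots/lxc-buildyard4/usr/portage"

# 0.7.2: the container root is the directory the target lives in.
print(is_under("/media/chroots/lxc-buildyard4", target))

# 0.7.3: the container root is the new mount path (assumed here to be
# /usr/lib/lxc/rootfs, as in the breakdown), so the target is outside it.
print(is_under("/usr/lib/lxc/rootfs", target))
```

With the paths from the breakdown, the first check passes and the second fails, which is exactly the situation where the bind mount lands somewhere the container never sees.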

Okay, so it’s a bit murky: if you think of bind mounts the same way you think of symlinks, the first bind mount should have exposed all the sub-bind-mounts as well, but that wasn’t the case, because the first wasn’t a recursive bind mount. This means you really have to change all your configuration files to use the new rootfs mount path, and they don’t seem to have made that very clear in any news file.

Besides, the default path depends on the libdir setting, which means you’d have different paths between 32-bit and 64-bit systems (symlinks are ignored in part, remember). To avoid that, I’ve revision-bumped lxc in the tree, and it now uses /usr/lib/lxc/rootfs directly, ignoring multilib altogether.
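Changing the configuration files by hand gets tedious with more than a couple of containers, so here is a rough sketch of how the rewrite could be scripted. It assumes the bind mounts are written as fstab-style `lxc.mount.entry` lines, which the configuration file described above may or may not use; the function name and the sample configuration are made up for illustration.

```python
def rebase_mount_entries(config_text, old_rootfs, new_rootfs):
    """Point lxc.mount.entry targets under old_rootfs at new_rootfs instead.

    Assumes fstab-style values: source, target, fstype, options, dump, pass.
    Rewritten lines have their whitespace normalised; other lines are
    passed through untouched.
    """
    old = old_rootfs.rstrip("/")
    new = new_rootfs.rstrip("/")
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "lxc.mount.entry":
            fields = value.split()
            if len(fields) >= 2:
                target = fields[1]
                if target == old or target.startswith(old + "/"):
                    # Keep the part of the path below the old rootfs.
                    fields[1] = new + target[len(old):]
                    line = "lxc.mount.entry = " + " ".join(fields)
        out.append(line)
    trailing = "\n" if config_text.endswith("\n") else ""
    return "\n".join(out) + trailing


# Made-up sample configuration using the paths from this post.
sample = (
    "lxc.utsname = buildyard4\n"
    "lxc.mount.entry = /usr/portage "
    "/media/chroots/lxc-buildyard4/usr/portage none bind 0 0\n"
)

print(rebase_mount_entries(sample,
                           "/media/chroots/lxc-buildyard4",
                           "/usr/lib/lxc/rootfs"))
```

Only targets below the old rootfs are touched, so entries that already point somewhere else are left alone.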

On a different note, I’m still planning on writing a post detailing how cgroups work, since both LXC and libvirt (two projects I follow) make use of them, as do Chrome/Chromium and, now, Lennart’s userland implementation of the 200-line kernel patch. But before doing that, I want to compare how the other distributions solve the mountpoint problem:

the cgroup filesystem has to be mounted to be used, like /sys, /dev, /proc and so on… but OpenRC currently ignores it;
the LXC init script accepts an already-mounted cgroup filesystem or mounts it over /cgroup;
as far as I can tell, Fedora uses /dev/cgroup, but that mountpoint, like /dev/pts and /dev/shm, needs to be created after udev is started, as /dev is a virtual filesystem itself (that’s what /etc/init.d/devfs does);
while on the other hand Ubuntu seems to rely on /sys/fs/cgroup, which is an empty directory on the /sys pseudo-filesystem, created when cgroup is enabled in the kernel.

Honestly, my preferred solution right now is the last one, since it requires no special code, only needs /sys to be mounted, and is much more similar to how fusectl is mounted (on /sys/fs/fuse/connections). If you have any comments, feel free to have a say here.
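For comparison, the “accept an already-mounted cgroup filesystem, otherwise pick a default” behaviour described for the LXC init script can be sketched in a few lines. This is only an illustration and not the init script’s actual code: it assumes input in the /proc/mounts layout (device, mountpoint, filesystem type, options, …), matches the filesystem type `cgroup` only, and the function names are made up.

```python
def find_cgroup_mount(mounts_text):
    """Return the first mountpoint of type 'cgroup', or None if there is none.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated fields.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "cgroup":
            return fields[1]
    return None


def choose_cgroup_mountpoint(mounts_text, fallback="/sys/fs/cgroup"):
    """Reuse an existing cgroup mount if present, else suggest `fallback`."""
    return find_cgroup_mount(mounts_text) or fallback


# Made-up sample in /proc/mounts format; on a real system the text would
# come from reading /proc/mounts instead.
sample_mounts = (
    "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
    "sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0\n"
    "cgroup /cgroup cgroup rw 0 0\n"
)

print(choose_cgroup_mountpoint(sample_mounts))
print(choose_cgroup_mountpoint("sysfs /sys sysfs rw 0 0\n"))
```

The fallback defaults to /sys/fs/cgroup here simply because that is the option preferred above; passing `fallback="/cgroup"` would mirror the init script’s choice instead.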
