More about Linux Resource Containers and Gentoo

I have written before that I strongly object to the LXC userland being considered production-ready, and now I have another example for you.

Let’s not even dig into the fact that the build system for the userland tools is quite objectionable, as it “avoids” using libtool through silly hacks in Makefile.am. Let’s not even spend much time on the fact that they have no announcement-only mailing list, or that they stopped using the SF.net File Release System (which has an RSS feed of the changes, and a number of mirrors) in favour of a simple download directory; that’s just project administration gone wrong.

The one big issue is that there is no documentation of changes between one release and the next. Either you follow Git, or, if you only look at the tarball, you’re left wondering what the heck is going on. Then again, the commit messages don’t carry enough information either, so you have to read the code itself to understand what changed.

So let’s begin with what brought me here: I use LXC heavily in place of testing chroots, since it’s easier to SSH into an already set-up instance than to set up a new one each time there is something new to test. Besides a number of “named” containers, I started keeping a number of “buildyards” which I only use to test particular services. Case in point: I wanted to test Squid together with the new SquidClamav, which I need for a customer of mine, so I copied over one of my previous buildyards and fired it up…

Unfortunately, the results weren’t very good: I didn’t get a portage tree bound… quickly checking around, 0.7.2 worked fine while 0.7.3 didn’t. After looking at the code it became apparent that the root filesystem mountpoint, introduced in this release as a configuration option, is used not only for loop-device backed images (which are now supported, and weren’t before), but also for standard directory-based containers. If you add this to an issue I have described before (the fact that lxc does not validate that the mount paths provided for bind-mounts are within the new rootfs tree), you may start to understand the problem here.

If you haven’t seen it yet, here’s the breakdown:

  • my container’s rootfs is located at /media/chroots/lxc-buildyard4;
  • the /etc/lxc/buildyard4.conf file used to bind-mount the portage directory as /media/chroots/lxc-buildyard4/usr/portage;
  • with 0.7.2, the pivot_root system call was invoked on /media/chroots/lxc-buildyard4 and all was fine;
  • with 0.7.3, before pivoting, /media/chroots/lxc-buildyard4 was bind-mounted to a different path (let’s assume /usr/lib/lxc/rootfs but it was a bit more messed up);
  • when I accessed /usr/portage within the chroot I was actually accessing the path to be found at /usr/lib/lxc/rootfs/usr/portage.

Okay, so it’s a bit murky: if you think of bind mounts the same way you think of symlinks, the first bind mount should have exposed all the sub-bind-mounts as well, but that wasn’t the case because the first one wasn’t a recursive bind mount. This means you really have to change all your configuration files to use the new rootfs mount path, and they didn’t make that very clear in any news file.
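To make the configuration change concrete, here is a minimal Python sketch of re-rooting a bind-mount target from the old rootfs directory onto the new rootfs mount path. The function names are hypothetical, the entries are assumed to be fstab-style lines (source, target, type, options…), and /usr/lib/lxc/rootfs is the path assumed earlier in this post rather than anything lxc guarantees. As a side effect, it refuses targets outside the old rootfs, which is the validation lxc itself does not perform.

```python
from pathlib import PurePosixPath


def retarget_mountpoint(target, old_rootfs, new_rootfs):
    """Return target re-rooted from old_rootfs onto new_rootfs.

    Raises ValueError when target does not live inside old_rootfs.
    """
    # relative_to() raises ValueError if target is outside old_rootfs.
    relative = PurePosixPath(target).relative_to(PurePosixPath(old_rootfs))
    return str(PurePosixPath(new_rootfs) / relative)


def retarget_fstab_line(line, old_rootfs, new_rootfs):
    """Rewrite the target (second field) of one fstab-style mount line.

    Blank lines and comment lines are returned unchanged.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0].startswith("#"):
        return line
    fields[1] = retarget_mountpoint(fields[1], old_rootfs, new_rootfs)
    return " ".join(fields)
```

With the paths from the breakdown above, retarget_fstab_line("/usr/portage /media/chroots/lxc-buildyard4/usr/portage none bind 0 0", "/media/chroots/lxc-buildyard4", "/usr/lib/lxc/rootfs") gives a line whose target is /usr/lib/lxc/rootfs/usr/portage.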

Besides, the default configuration depends on the libdir setting, which means you’d have different paths between 32-bit and 64-bit systems (symlinks are ignored in part, remember). To avoid that, I’ve revision-bumped lxc in tree, and it now uses /usr/lib/lxc/rootfs directly, ignoring multilib altogether.

On a different note, I’m still planning on writing a post detailing how cgroups work, since both LXC and libvirt (two projects I follow) make use of them, as well as Chrome/Chromium and, now, the Lennart userland implementation of the 200loc kernel patch. But before doing that I want to compare how the other distributions solve the mountpoint problem:

  • the cgroup filesystem has to be mounted to be used, like /sys, /dev, /proc and so on… but OpenRC currently ignores it;
  • the LXC init script accepts an already-mounted cgroup filesystem or mounts it over /cgroup;
  • as far as I can tell, Fedora uses /dev/cgroup, but that mountpoint, like /dev/pts and /dev/shm, needs to be created after udev has started, as /dev is a virtual filesystem itself (that’s what /etc/init.d/devfs does);
  • while on the other hand Ubuntu seems to rely on /sys/fs/cgroup, which is an empty directory on the /sys pseudo-filesystem created when cgroup is enabled in the kernel.

Honestly, my preferred solution right now is the last one, since it requires no special code, just needs /sys mounted, and is much closer to how fusectl is mounted (on /sys/fs/fuse/connections). If you have any comments, feel free to have your say here.
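Since an init script has to cope with whichever of these mountpoints is already in use, here is a small Python sketch that lists where cgroup filesystems are currently mounted. It assumes input in the /proc/mounts format (device, mountpoint, filesystem type, options…); the function name is made up for this example, and matching cgroup2 alongside cgroup is my addition for newer kernels.

```python
def cgroup_mountpoints(mounts_text):
    """Return the mountpoints of all cgroup filesystems in mounts_text.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    mountpoints = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The third field is the filesystem type.
        if len(fields) >= 3 and fields[2] in ("cgroup", "cgroup2"):
            mountpoints.append(fields[1])
    return mountpoints
```

On a running system you would feed it the contents of /proc/mounts; an empty result means nothing has mounted the cgroup filesystem yet, so the script is free to pick its own mountpoint.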

6 thoughts on “More about Linux Resource Containers and Gentoo”

  1. Well, last I checked /etc/init.d/devfs was also creating /dev/null and /dev/console, since it was still necessary and udev didn’t create them. As for chroot implications, I have so little exp. I cannot comment. I found this out since I thought udev was supposed to do this and removed it from the runlevel; that was a couple of kernels ago, though.


  2. All systemd systems (such as F15) mount a tmpfs to /sys/fs/cgroup, and then the cgroup hierarchies beneath that. That’s the official kernel-mandated place to mount these things now, everything else is a thing of the past. Older Fedora versions used to mount it to /cgroup btw, never /dev/cgroup.


  3. great! thanks for the info. im hoping to replace linux-vserver sometime with lxc. while i read the cgroup docs from the kernel, it would be cool to read it written by another person again.


  4. So, Flameeyes, reading Lennart’s comment you’ve already leaned towards the right solution. If Ubuntu _and_ Fedora do it, it’ll probably be good for 6 months :-).


  5. Michael, I’m quite sure there has been a proposal of that before and it was turned down, but I have no idea why, to be honest. Anyway, thanks for sending that in! I hope it might show there is some sanity left in LXC. But with the notes from Lennart above, I’m starting to think that LXC might have to be restructured rather soon… it’s likely to not work at all with that configuration. I need the time to test it…
