Hard Containers — LXC and GrSecurity

Okay so Excelsior is here, and it’s installed, and here starts the new list of troubles, which seems to start, as usual, with my favourite system ever: LXC which is the basis the Tinderbox work upon.

The first problem is not strictly tied to LXC, but one of the dependencies required: the lynx browser fails to build in parallel if there are enough cores available (which there certainly are here!). This bug should be relatively easy to fix but I haven’t had time to look into it just yet.

A second issue came up due to the way I was proceeding to do the install, outside of office hours, and is that the ebuild is using the kernel sources, I think to identify which capabilities are available on the system. This should be fixed as well, so that it checks the capabilities on the installed linux-headers instead of the sources, which might not be the same.

The third issue is funny: Excelsior is going to use an hardened kernel. The reason is relatively simple to understand: it’s going to run a lot of code of unknown origins, it’ll allow other people in, one wants to be as as possible… unfortunately it seems like this is not, by default, a good configuration to use with LXC.

In particular, grsecurity is designed by default to limit what you can do within a chroot, by applying a long list of restrictions. This is fine, if not for the fact that LXC also chroots to start its own set up process. I’m now fixing the ebuild to warn about those options that you have to (or might) want to disable in your GrSec setup to use LXC.

Interestingly, it’s not a good idea to disable all of them, since a few are actually very good if you want to use LXC, such as the mknod restriction, which is very good in particular if you want to make sure that only a subset of the devices are accessible (even when counting in the allowed/non-allowed access of the devices cgroup).

In particular, these have to be disabled:

  • Deny pivot_root in chroot (CONFIG_GRKERNSEC_CHROOT_PIVOT)
  • Capability restrictions (CONFIG_GRKERNSEC_CHROOT_CAPS)

while the double-chroot would be counter-synergistic as it would disallow services within the container to further chroot to allow a defense-in-depth approach.

Then there is another issue. Before starting to set up the actual tinderbox, I wanted to prepare another container, which is the one I’ll be using for my own purposes, including bumping of Ruby packages and stuff like that. Since the system is supposed to stay very locked down, this time I want to mount the block device straight into the container, which is a supported configuration…. but it turns out that the configuration parser, trying to workaround old issues (yes that’s a one and a half years’ old post) will ignore any mount request that doesn’t have the destination rootfs prefixed.

Unfortunately when you mount a block device, it means that you’ll end up with something along the lines of /dev/sdb1/usr/portage. This also collides with the documentation in man lxc.conf:

If the rootfs is an image file or a device block and the fstab is used to mount a point somewhere in this rootfs, the path of the rootfs mount point should be prefixed with the /usr/lib/lxc/rootfs default path or the value of lxc.rootfs.mount if specified.

Anyway this should be fixed in 0.8.0_rc2-r2 which is now in tree, I’ve not been able to test it thoroughly yet, so holler at me if something doesn’t work.