Linux Containers on Gentoo, Redux

I’ve got a few further requests for Linux Containers support in Gentoo, one of which came from Alex, so I’ve decided to give it a try to update the current status of the stack, and give a decent description of what the remaining problems are.

First of all, the actual software: I’m still not fond of the way upstream is dealing with the lxc package itself; the build system is still trying to be smart and happening to be stupid, related to the LXC library. There are still very little failsafes, and there isn’t really enough space to manage LXC with the default tools as they are. While libvirt should support LXC just fine, I haven’t found the time to try it again and see if it works; I’m afraid it might only work if you use the forced setup that RedHat uses for LXC… but again I cannot say much until I find time to try it out and tweak it where needed.

*A note, as I stated before a lot of the documentation and tutorials regarding libvirt only apply to RedHat or Fedora. I can’t blame them for that, they do the work, they do the walk, but often it means that we have to adapt them or at least find a way to provide them with the pieces they expect in the right place. It requires a lot of my time to do that.*

I’ve finally added my “custom” init script to the ebuild, with version 0.7.2; it might change further with or without revision bump as I fix bugs reported; it should mostly auto-configure, the only requirement it has is to symlink to lxc.container to start the container defined in /etc/lxc/container.conf; it auto-detects the root path (so it won’t force you a particular filesystem layout), and works with both 32- and 64-bit containers transparently, so long there is a /sbin/init command (which I could have to change for systemd-based distributions at some point). What I now reason it lacks is support for detecting the network interface it uses and require that started up; I can add that at some point, in the mean time use /etc/conf.d/lxc.container and then add rc_need="net.yourif".

For what concerns networking, last I checked with lxc 0.7.1 userspace, and kernel 2.6.34, the macvlan system still isolated the host from the guests, which might be what you want but it’s definitely not what I care for. I’m guessing this might actually be by design; at any rate, even though technically slower, I find myself quite comfortable with using a Linux-based bridge as main interface, and bridge together the Virtual Ethernet device of the guest with the physical interface(s) of the host. This also works fine with libvirt/KVM, so it’s not a bad decision in my opinion. I just added 0.7.2 but I can’t see how that makes a difference, as macvlan is handled in kernel.

Thankfully, doing so with Gentoo’s networking system (which Roy wanted to deprecate, tsk!) is piece of cake: open /etc/conf.d/net, rename config_eth0 with config_br0, then add config_eth0="null" and bridge_br0="eth0".. exec ln -s net.lo /etc/init.d/net.br0, and use that for bringing the network up. Then on the LXC configuration side you got

lxc.network.type = veth
lxc.network.link = br0

and you’re all set. As I said, piece of cake.

Slightly more difficult is to properly handle the TTY devices; some people prefer to make use of the Linux virtual terminals to handle LXC containers; I sincerely don’t want it to mess with my real virtual terminals, I prefer using the lxc-console command to access it without networking. Especially since it messes up a lot if you are using KMS with the Radeon driver (which is what I’ve been doing for the past year or so).

For this to work out, I noted two things though: the first is that simply using the cgroup access control lists on the devices don’t help out that much (I actually haven’t tried to set them up properly just yet); on the other hand, LXC can create “pseudo-ttys” that can be used with lxc-console; the default number (9) does not work all that well, because the Gentoo init system set up twelve virtual terminals by default. So my solution is to use my custom static device tarball and the following snippet in the configuration:

lxc.tty = 12
lxc.pts = 128

This ensures that the TTY devices are all properly set up, so that they don’t mess with your virtual terminals, and lxc-console works like a charm in this configuration.

Now, the sad part: OpenRC is not yet stable, and I haven’t fixed yet the NFS bug I found (you stop it into the container and the host’s NFS exports are destroyed.. bad script, bad script!). On the other hand, I’m not doing LXC daily any longer for the simplest reason: the tinderbox is set up as I wish already, for the most part, so I have little to no incentive to work more on this; the good news is, I’m up for hire as I said for what concerns Ruby. So if you really want to use LXC in production and want me to improve whatever area Gentoo-related to it, including libvirt, you can just contact me.

Beside that, everything should be in place. Have fun working with LXC!

Kerberos and libvirt

Do you remember my latest libvirt ranting and the recent post about Kerberos and NFSv4 don’t you? Well, let’s tie the two up and consider a couple of good and bad things related to both.

First of all, as Daniel Berrange pointed out, QEmu does support IPv6; unfortunately it doesn’t seem to work just as he supposed it to: even though my hostname resolves to both IPv4 and IPv6, QEmu by default only listens to v4. The same goes if you don’t provide a listening socket (such as “”), and again the identical same happens with the default setting provided by libvirt (0.0.0.0). You can force it to listen to v6 by either providing a v6-only hostname, a v6 IP address or the v6 catch-all [::] which makes it work on both v6 and v4, lovely, isn’t it?

Then, about libvirt-remote, as many pointed out it is possible to use it with SSH as user, but there are two catches there: the first is that with the way the arguments are passed down from virt-manager down to libvirt, to ssh and zsh on the other side, something goes funky; it works fine with bash because it splits the parameters again, but with zsh as login shell for my user it tries to call a binary called nc -U ... which as you might have guessed is not correct. The second problem is that even if you set the unix socket access for your user, it won’t let it work if you are using SSH and the system is configured with PolicyKit. I guess this was designed to work in two distinct configuration (desktop and server) and trying to mix the two creates a bit of trouble.

This does not solve two problems though: the dangling connections that are kept alive even after closing virt-manager and its inability to provide diagnostic more human-readable than the Python exceptions. This became tremendously obvious today as I went to consider the idea of using Kerberos for the authentication of libvirt itself, given that it can do that via SASL. It would make more sense, since I’ll be having a Kerberos install anyway at this point, to use the Kerberos credentials for more than a couple of services.

Using Kerberos for libvirt actually makes quite a bit of sense: you can set up properly TLS support for the connection and have an user-based authentication (rather than the whole host-based authentication that is supported with the TLS-only login). Setting up libvirt itself is not difficult, if it wasn’t for the single problem that most of the documentation tells you to use /etc/libvirt/krb5.keytab while it’ll be looking only at /etc/krb5.keytab by default — maybe it’s worth for Gentoo to change the init script so that it searches for the one documented. After that, I can properly login on the libvirt-remote access with virt-manager and Kerberos…. but I still am having trouble with QEmu and VNC this time around.

Now a little note regarding pambase: as I’ve been brought to note the default configuration used by pambase with the kerberos USE flag enabled might not be well suited for all the sites using Kerberos right now. I know that, but Gentoo never pretended to give perfect defaults, or defaults that suit everybody; on the other hand I think it’s important to give a default for Kerberos in our packaging. I’ll have to talk with Robin or someone else for integrating a default regarding pam_ldap as well, since the LDAP guide we provide is hinting at the wrong solution for the PAM configuration, if the system also want to be a desktop.

Having found a decent way to provide multiple optional login systems for users is actually finally paving the way to provide token-based login that I talked about last year.

Ranting on: libvirt remote

*This is a rant. This is not meant to be constructive, so it is not. If Rich is reading this, he might find some points he can work on; if I had the time, I would be working on them myself; if you feel like you agree with my rant and would like to get the stuff implemented, you can hire me to work on this. But for the rest, this is a rant, so if you’re not in the mood to read my rants, you might want to skip over this.*

The laptop from hell is finally shaping up; the smartcard reader works, after editing the ccid files (yes I have to publish the patches up there). Thanks to this I finally wanted to take one further step to make use of it to augment my productivity. One of these things, which I couldn’t feasibly do with OS X, is handling my virtual machines park via virt-manager.

Now, libvirt is designed to be a client-server system, and the server might not be local. There are three transport options to use remote servers: clear-text, TLS, and SSH tunnelling. Now, the clear-text is an obvious bad choice; SSH would be a good choice, if it wasn’t that.. it only works with the root user. There is no way to configure which user to use for the ssh tunnel, and I really don’t want to enable SSH root logins.

TLS is an interesting choice. With this transport, the authentication on the server side is done similarly to what OpenVPN does: you have a personal certification authority (CA), one key/certificate pair for the server, and then one pair per client. All the certificates are signed by the CA itself, and that validates the client to connect to the server. It’s a very nice approach for hosting providers I guess, since you can have a number of workstations that have the certificates set up properly to connect to the farm of virtualisation servers. Unfortunately it has design flawswhen you want to use it with something like a laptop.

First of all, while the server’s certificates and key files’ paths are configurable in the libvirtd.conf file, the client’s files are not configurable, they are hardcoded at build-time based on the system configuration directory (for Gentoo, that’s /etc). They are also only used host-global, as it does not even check for an override in the user’s directory. And this is a double-problem because the certificate has to be passwordless! Or, to put it in a different way, it has to be insecure, lacking a password protection. And since I just said that it does not allow for per-user overrides, you cannot even rely purely on encrypting your home directory. Alternative option is to symlink them from the /etc paths to your home directory but that’s not elegant at all.

It gets even a little worse: to access the display of the virtual machines libvirt tries to do a smart thing, by using the VNC protocol rather than reinventing the wheel. Now, a lot of comments can be written regarding the choice of using the VNC protocol itself, but the fact that it doesn’t reinvent a new one is positive. Unfortunately, when accessing a remote server via TLS, instead of muxing the VNC protocol over the same connection, it simply tries to connect to the VNC display on the remote host. And of course it’s a separate configuration to tell qemu to open the VNC on the correct host. D’oh!

Okay so I get to configure qemu to open the VNC on all interfaces, all IPs… but here’s the catch: for TLS to work correctly, you need to provide correct hostnames, stable hostnames; to do so I decided to use IPv6, since my boxes’ IPv6 are already forward-confirmed, thanks to the latest Hurricane Electric service (providing name server hosting without asking me to maintain my own). Unfortunately, it seems just like qemu does not support IPv6 at all, which means that .. the connection will not work because the hostname will find no hit.

It really doesn’t look too difficult to implement at least part of these features, like choosing the certificate files path, or connecting as a different SSH user. Sure, if you were to shoot high, you could probably consider using a single SCTP socket with multiple channels to multiplex the libvirt protocol together with the VNC connections, but that’s not needed at all. It really just need a few touches here and there to make it much more usable.

Reinventing lots of little wheels

You might remember my quick review of the Secure Programming with Static Analysis book. While on the overall I was expecting a much more practical view on how to maximise the gain from static analysis (like how to make sure that trying to get rid of false positive does not end up cluttering both the source and the produced object code), it had some quite important insights that I think are worth the read, and the money of the book.

One of these insight is an explanation on why Microsoft’s “secure” interfaces differ from the standard POSIX ones. Having a status code returned, to check whether the action completed, failed, or completed-with-truncation, is definitely more useful than being returned one of the two pointers that are already provided as input. Similarly, the book shows some of the “secure wrappers” commonly used for replacing inherently insecure functions such as readlink().

Now, on the whole, this is all good, but I noticed one thing while following the libvirt development mailing lists: people end up reinventing tons of little wheels all around. While I like the idea behind gnulib, and I even wrote an article about its use a long time ago, it starts to show a couple of shortcomings in my view. The first is that the same source code has to be bundled to a number of projects; while it’s usually ignored for the most part on modern systems that have the functions available, it’s still source code that is shipped around multiple times and that might have nasty problems. The second problem is that both on modern systems (when wrappers are involved) and on less-modern systems (or systems that comply with older versions of the various standards, such as Solaris, or AIX) the same object code is added to multiple binaries, instead of shared among them, increasing both the on-disk and in-memory sizes. It also adds the burden of verification, and replacement, of interface to the single programs rather than centralising it in a project.

Why bother, given that then you might as well just port a subset of the GNU C library (or just use a ported uClibc), and at that point you might as well not use that operating system at all? Well one of the problems with the current approach is felt even by users running Linux, Gentoo users in primis as they feel the slowness of running ./configure and having to check for the same features every time (compare this old post of mine — the best way to make a configure script faster is to reduce the number of tests it has to perform!). Shouldn’t it be enough to assume that the interfaces are present, and leave it to the user to provide a replacement library if they are not?

This is after all the favourite approach of the FFmpeg project: if POSIX or C99 mandates the presence of an interface, then FFmpeg can use it; if it’s not available, it’s up to the user/developer/packager to provide the proper flags, include paths, extra libraries to have them available. Non-standard compiler features used are a different matter, of course.

But even if this would solve the problem by having some sort of libgnucompat or libposixcompliant library to deal with other operating systems it does not solve another problem that I’ve noticed applies to libvirt: reinventing wrappers, be them security-wrappers or not. Indeed if you look at the symbols exported by libvirt.so, you’ll easily see that there are sixteen functions with virFile prefix that seems to be just convenience and security wrappers around common file operations. This reduces the amount of boilerplate code that libvirt developers have to write each time they have to use that particular feature, but then you think that similar code is written by many other projects as well to deal with the same situation; this is where convenience libraries come into being, stuff like glib, for instance.

Unfortunately, since there’s more than one way to skin a cat, there is no drought of convenience libraries, even conflicting convenience libraries, out there. And nobody seems to agree on what’s the right way to do them (for instance, I can actually appreciate very well the hatred on glib’s use of g-prefixed basic types , such as guint8 and gpointer rather than keeping with the standard types that are available in C99 such as uint8_t. While these are not always available, it’d make much more sense to make those available rather than inventing your own, no? But let’s not keep on that topic for now.

Some of the most widely common wrappers are also getting slowly into the C libraries and the actual standards, although sometimes with not-too-bright results (the getline() function really could have used a nicer, less un-specific name), and other times with huge feuds between implementers (anybody has seen strl*() functions on POSIX yet? or glibc?).

With all the defects in it as well as the other autotools, libtool has probably done one of the best wrappers out there: libltdl. With all its possible problems (and there are many), that library is well designed enough to be usable in at least three widely different configurations — as described — including the ability to bundle a copy of the library but still use the system copy if so asked (or even by default). Too bad this does not seem to happen with any other kind of wrappers’ library.

This seems to be the opposite of the system when compared with the situation happening within the Ruby community; maybe because creating and publishing a gem is so easy (especially much easier compared to the standard track of release publishing for C-based libraries and packages — or any other language that is compiled, mostly), we have a huge number of “code fragments” gems, that provide one or two source files, with either a couple or classes or a handful of useful functions that are then reused on multiple packages by the same author. Not that the Ruby way here is perfect (but it surely is better than other Ruby ways I ranted on about before), and one of the biggest problems is that many time you have multiple gem solving the same problem once and again, like for testing systems.

I don’t hold much hope that developers can sit along and decide to write on a single implementation of anything, but it sure would be so nice if it happened. You’d then have the same code shared among all processes, with no duplication, with a lot of eyes to look at the possible faults and solving them, and so on so forth. Yes it’s definitely an utopian point of view. Alas.

Autotools Mythbuster: Why autoconf is not getting any friendlier

You know that I’m an “autotools lover” in the sense that i prefer having to deal with autotools-based buildsystem that most custom build-system that are as broken as they can possibly be. Or with cmake.

But I’m definitely not saying that autotools are perfect, far from it. I’m pretty sure that I said before that there are so many non-obvious tricks around that confuse most people and cause lots of confusion out there. The presence of these non-obvious tricks is due partly to the way autoconf tarted in the first place, partly into the fact that it has to generate standard sh scripts for compatibility with older systems, and partly because upstream is not really helping the whole stack to maturate.

One of the most non-obvious, but yet common, macros out there is definitely the AC_ARG_ENABLE (and its cousin AC_ARG_WITH obviously), as I’ve blogged and documented before. The main issue with those macros is that most people expect them to be boolean (--enable and --disable; --with and --without) even though it’s quite freeform (--with-something=pfx or --enable-errors=critical). But also the sheer fact that the second parameters need to be returned by another different macro (AS_HELP_STRING) is quite silly if you think of it. And, if you look at it, the (likely more recent) AC_ARG_VAR macro does not require a further macro call for the help string.

It gets even wilder: a lot of the AC_ARG_WITH calls are, further down, wrappers around calls to PKG_CHECK_MODULES (a pkg-config support macro). How is this worse? Well, you end up repeating the same code between different projects to produce, more or less, the same output. To at least reduce the incidence of this, Luca wrote a set of macros that wrap around and perform these common tasks. The result is that you have to provide a lot less details; you lose some flexibility of course, but it produces a much neater code in your configure.ac.

Now, as I said, autoconf upstream is partly responsible for this: while the new versions have been trying to reduce the possible misuse of macros, they are bound to a vast amount of compatibility. You don’t see the kind of changes that happened with the 2.1x → 2.5x version jump, nowadays. The result of this is that you cannot easily provide new macros, as they wouldn’t be compatible with older versions of autoconf, and that would be taken as mostly bad.

And the reason for that can be found in the reason why libvirt’s configure is still stuck with compatibility toward autoconf 2.59. While autotools are designed to not require their presence on either the host or build system (unlike CMake, imake, qmake, scons, …), developing for, and checking compatibility with, older systems require to have the tools at hand to rebuild configure and company. I solve this myself by just using NFS to pass the code around the various (real and virtual) machines, after regenerating it on my Gentoo box, but I admit it’s definitely not sleek enough. In this case, libvirt is also developed against RHEL-5, and in that system the only autoconf available is 2.59.

One problem to solve this kind of problem would be to have a single, main repository of macros, that everybody could use to perform the same task. Of course this would also mean standardising on some sort of interface for macros, and for the build systems themselves, and this is easier said than done. While we’re definitely not to the point Ruby is we still aren’t really that standard. Many packages handled by more or less the same people tend to re-use the same code over and over, but that only builds a multiple amount of similar, but still half-custom build systems.

There are only two, as far as I know, efforts that ever tried extending autoconf in a “massive” way: KDE (and the results are, well, let’s just say that I can’t blame for trying to forget about KDE3’s build system), and GNOME. The latter is still alive, although it isn’t concerned with giving more generic macro interfaces to common tasks.

There is the autoconf macro archive but there is one catch with it: some of the macros submitted there have GPL or GPL-affine licenses (as long as they are compatible); that means that you might not be able to use them in MIT-licensed systems. While it even says that FSF does suggest using more open licenses for macro files, it does not require the submissions to be. Which can get quite messy on the long term, I’m afraid, for those projects.

At any rate, this post should show that I don’t really think that autotools are error-safe, but at the same time, I don’t think that creating a totally new language to express these things (like CMake does) is the solution. If only Rake was parallel-capable (which is unlikely to happen as long as Ruby cannot seriously multithread), then it would probably be, in my opinion, a better replacement for autotools than what we have now. That is, if the presence of Ruby is not a problem.

Adding to the tree for once

You probably are used to me sending last rites for packages on behalf of QA, and thus removing packages (or in the past just removing packages, without the QA part). It often seems like my final contribution to Gentoo will be negative, in the sense that I remove more than I add.

Right now, though, I’ve been adding quite a few packages to the tree, so I’d like to say that no, I’m not one of those people who just like to remove stuff and who would like you to have a minimal system!

So with some self-advertisement (and a shameless plug while I’m at it…) I’d like to point out some of the things I’ve been working on.

The first package I’d like to separate is the newly-added app-emacs/nxml-libvirt-schemas: as I ranted on I wanted to be able to have syntax completion for the libvirt (a)XML configuration files (I still maintain they should have had either a namespace, a doctype or a libvirt-explicit document element name), so now that me and Daniel got the schemas to a point where they can be used to validate the current configuration files, I’ve added the package. It uses the source tarball of libvirt, with the intent of not depending on it, I’m wondering if it’d be better to use the system-installed Relax-NG files to create the specific Relax-NG Compact files, but that’ll have to wait for 0.7.5 anyway (which means, hopefully next week).

The second set of packages is obviously tied in with the Ruby-NG work and consists of a few new Ruby packages; some are brought in from the testing overlay I’ve built to try out the new packages, others have been brought in as dependency of packages being ported to the new eclasses, one instead (addressable) is a dependency of Typo that I didn’t install through Portage lately. I should probably add that I’m testing the new ebuilds “live” on this blog, so if you find problems with it, I’d be happy to receive a line about that.

The third set of packages instead relates to a work I’m currently doing for a long-time customer of mine, a company developing embedded systems; I won’t disclose much about the project, but I’m currently helping them build a new firmware, and I’m doing most of my job through the use of Portage and Gentoo’s own ebuilds. For this reason I have already added an ebuild for gdbserver (the small program that allows for remote debugging) that makes it trivial to cross-compile it for a different architecture, and I’m currently working on a gcc-runtime ebuild (which would also be pretty useful, if I get it right, for remote servers, like my own, to void having to install the full-blown GCC, but still have the libraries needed).

And tied to that same work you’ll probably find a few changes for cross-compilation both in and out of Gentoo, and some other similar changes; I have some GCC patches that I have to send upstream, and some changes for the toolchain eclass (right now you cannot really merge a GCJ cross-compiler, or even build one for non-EABI ARM).

So this is what I’m currently adding to the tree myself, I’m also trying to help the newly-cleaned up virtualization team to handle libvirt (and its backports) as well as the GUI programs, and I should be helping Pavel getting gearmand into shape if I had more time (I know, I know). And this is to the obvious side of the tinderbox work which is still going on and on (and identi.ca proves it since the script is denting away the status continuously), or the maintainer work for things like PAM (which I bumped recently and I need to double-check for uclibc).

So now, can you see why I might forget about things from time to time?

Backports, better access to

One of the tasks that distributions have to deal with on a daily basis is ensuring that the code they package is as free of bug as humanly possible without having to run after upstream continuously. To solve this problem the usual solution is to use the upstream-provided releases, but also apply over those patches to fix issues and, most importantly, backports.

Backports are patches that upstream already accepted, and merged into the current development tree, applied over an older version (usually, the latest released version). Handling these together with external patches tends to get tricky, from one side you need to track down for the new release which have been merged already (just checking those that apply don’t do the trick, since you would probably find non-merged patches that don’t apply because source files changed between the two versions), from the other, they often apply with fuzz, which has been proven a liability with the latest GNU patch version.

Now, luckily, with time better tools have been created to handle patching: quilt is a very good one when you have to deal with generic packages, but even better than that is the git source control manager software. When you have an upstream repository in git, you can clone it, then create a new branch stemming from the latest release tag, and apply your patches, committing them right away exactly how they are. And thanks to the cherry-pick and cherry commands, handling backports (especially verifying whether they have been merged upstream) is piece of cake.

It even gets better when you use the format-patch command to generate the patchset since they will be ordered, and described, right away; the only thing that it lacks is creating a patchset tarball right out of that, not that it’s overly difficult to do though (I should probably write a script and publish that). Add then tags, and the ability to reset branches and you can see how dealing with distribution patching via git gets much nicer than it was before the coming of git.

But even this has a relative usefulness when you keep the stuff just locally-available. So to solve this problem, and especially to remove myself as a single point of failure (given my background that’s quite something — and even more considering lately I had to spend most of my time on paid work projects, as life became suddenly, and sadly, quite complicated), I’ve decided to prepare and push out branches.

So you can see there is one repository for lxc (thanks to Adrian and Sven for the backports), and one for libvirt (thanks to Daniel that after daring me went on to merge the patches to the schema I made; two out of three are in, the last one is the only reason why nxml integration for libvirt files is not yet in).

Now, there is one other project I’d like to publish the patches of this way, and that project is iscsitarget; unfortunately upstream is still using Subversion so I’ve been using a git-svn bridge which is far from nice. On the other hand my reasoning to publish that is that I dropped iscistarget! — which means that if you’ve been relying on it, from next kernel release onward you’ll probably encounter build failures (I already fixed it to build with 2.6.32 beforehand since I was using release candidates for a while). Myself, I’ve moved to sys-block/tgt (thanks to Alexey) which does not require an external kernel module, but rather uses the SCSI target module that Linux already provides out of the box.

Again on libvirt’s XML configuration files

So, Daniel Velliard – among the other things the maintainer of libvirt – dared me to give further voice to my concerns about libvirt’s configuration files being “aXML, almost XML”, given, and I quote him here:

I’m also on the XML standard group at W3C and the main author of libxml2, I would have no troubles debunking your argument very quickly

I guess I’ll then have to cross-post this entry to the libvir-list to make sure that I’m not, to paraphrase him, “hiding” on my blog (do I hide on a blog with at least 15K visitors a month, with no comment moderation, and syndicated on the homepage of Gentoo’s website ?).

First of all, I’m not really undermining Daniel’s technical preparation; I’m pretty sure he generally knows what he’s doing. But his being the author of libxml2 or being part of W3C’s standard group does not mean that he’s perfect. Nor it means that somebody should refrain from commenting on his ideas. This really reminds me of what people said about Ryan when I criticised his idea and just wanted me to shut up because he did a good job at porting games before. My original title for this post was reading “Being part of groups or having written libraries does not make you infallible” but it wasn’t catchy enough.

So, this is the preamble for my blog’s readers only, since it’s definitely not relevant to the libvirt project. The rest is for the list as well.

Hello,

In a recent post on my blog I ranted on about libvirt and in particular I complained that the configuration files look like what I call “almost XML”. The reasons why I say that are multiple, let me try to explain some.

In the configuration files, at least those created by virt-manager there is no specification of what the file should be (no document type, no namespace, and, IMHO, a too generic root element name); given that some kind of distinction is needed for software like Emacs’s nxml-mode to know how to deal with the file, I think that’s pretty bad for interaction between different applications. While libvirt knows perfectly well what it’s dealing with, other packages might not. Might not sound a major issue but it starts tickling my senses when this happens.

The configuration seem somewhat contrived in places like the disk configuration: if the disk is file-backed it require the file attribute to the <source> element, while it needs the dev attribute if it’s a block device; given that it’s a path in both cases it would have sounded easier on the user if a single path attribute was used. But this is opinable.

The third problem I called out for in the block is a lack of a schema for the files; Daniel corrected me pointing out that the schemas are distributed with the sources and installed. Sure thing, I was wrong. On the other hand I maintain that there are problems with those schemas. The first is that both the version distributed with 0.7.4 and the git version as of today suffer from bug #546254 (secret.rng being not well formed) so it means nobody has even tested them as of lately; then there is the fact that they are never referenced by the human-readable documentation which is why I didn’t find it the first time around; add also to that some contrived syntax in those schema as well that causes trang to produce a non-valid rnc file out of them (nxml-mode uses rnc rather than rng).

But I guess the one big problem with the schemas is that they don’t seem to properly encode what the human-readable documentation says, or what virt-manager does. For instance (please follow me with selector-like syntax), virt-manager creates /domain/os/type[@machine='pc-0.11'] in the created XML; the same attribute seem to be documented: “There are also two optional attributes, arch specifying the CPU architecture to virtualization, and machine referring to the machine type”. The schema does not seem to accept that attribute though (“element type: Relax-NG validity error : Invalid attribute machine for element type” with xmllint, just to make sure that it’s not a bug in any other piece of software, this is Daniel’s libxml2).

Now after voicing my opinions here, as Daniel dared me to do, I’d like to explain a second why I didn’t post this on the list in the first place: of what I wrote here, my beefs for calling this aXML, the only things that can be solved easily are the schemas; schemas that, at the time I wrote the blog, I was unable to find. The syntax, and the lack of a “safe” identification of the files as libvirt’s are the kind of legacy problems one has to deal with to avoid wasting users’ time with migrations and corrections, so I don’t really think they should be addressed unless a redesign of the configuration is intended.

Just my two cents, you’re free to take them as you wish, I cannot boast a curriculum like Daniel’s, but I don’t think I’m stepping out of place to point out these things.

I know you missed them: virtualisation ranting goes on!

While I started writing init scripts for qemu I’ve been prodded again by Luca to look at libvirt instead of reinventing the wheel. You probably remember me ranting about the whole libvirt and virt-manager suite quite some time ago as it really wasn’t my cup of tea. But then I gave it another try.

*On a very puny note here, what’s up with the lib- prefix here? libvirt, libguestfs, they don’t look even remotely like libraries to me… sure there is a libvirt library in libvirt, but then shouldn’t the daemon be called simply virtd?*

The first problem I found is that the ebuild still tries to force dnsmasq and iptables on me if I have the network USE flags enabled; turns out that neither is mandatory so I have to ask Doug to either drop them or add another USE flag for them since I’m sure they are pain in the ass for other people beside me. I know quite a bit of people ranted about dnsmasq in particular.

Sidestepped that problem I first tried, again, to use the virt-manager graphical interface to build a new VM interface. My target this time was to try re-installing OpenSUSE, this time, though, using the virtio disk interface.

A word of note about qemu vs. qemu-kvm: at first I was definitely upset by the fact that the two cannot be present on the same system, this is particularly nasty considering the fact that it takes a little longer to get the qemu-kvm code bumped when a new qemu is released. On the other hand, after finding out that, yeah, qemu allows you to use virtio for disk device but no, it doesn’t allow you to boot from them, I decided that upstream is simply going crazy. Reimar maybe you should send your patches directly to qemu-kvm, they would probably be considered I guess.

The result of the wizard was definitely not good; the main problem was that the selection for the already-present hard disk image silently failed; I had to input the LVM path myself, which at the time felt a minor problem (although another strange thing was that it could see just one out of the two volume groups I have in the system); but the result was … definitely not what I was looking for.

First problem was that the selection dialog that I thought was not working was working alright… just on the wrong field, so it replaced the path to the ISO image to use for installing with that of the disk again (which as you might guess does not work that well). The second problem was that even though I set explicitly that I wanted to use a Linux version with support for virtio devices, it didn’t configure it to use virtio at all.

Okay, time to edit the configuration file by hand; I could certainly use virt-manager to replace vinagre to access the VNC connections (over unix path instead of TCP/IP to localhost), so that would be enough to me. Unfortunately the configuration file declares it to be XML; if you know me you know I’m not one of those guys who just go away screaming as soon as XML is involved, even though I dislike it as a configuration file it makes probably quite a bit of sense in this case, I found by myself trying to make the init script above usable that the configuration for qemu is quite complex. The big bad turn down for me is that *it’s not XML, it’s aXML (almost XML)!

With the name aXML I call all those uses of XML that are barely using the syntax but not the features. In this particular case, the whole configuration file, while documented for humans, is lacking an XML declaration as well as any kind of doctype or namespace that would tell a software like, say, nxml, what the heck is it dealing with. And more to the point, I could find no Relax-NG or other kind of schema for the configuration file; with one of those, I could make it possible for Emacs to become a powerful configuration file editor: it would know how to validate the syntax and allow completion of elements. Lacking it, it’s quite a task for the human to look at.

Just to make things harder, the configuration file, which, I understand, has to represent very complex parameters that the qemu command line accepts, is not really simplified at all. For instance, if you configure a disk, you have to choose the type between block and file (which is normal operation even for things like iSCSI); unfortunately to configure the path the device or file is found you don’t simply have a <source>somepath</source> element but you need to provide <source dev="/path" /> or <source file="/path" /> — yes, you have to change the attribute name depending on the type you have chosen! And no, virsh does not help you by telling you that you had an invalid attribute or left one empty; you have to guess by looking at the logs. It doesn’t even tell you that the path to the ISO image you gave is wrong.

But okay, after reconfiguring the XML file so that the path is correct, that network card and disks are to use virtio and all that stuff, as soon as you start you can see a nice -no-kvm in the qemu command line. What’s that? Simple: virt-manager didn’t notice that my qemu is really qemu-kvm. Change the configuration to use kvm and surprise surprise: libvirtd crashes! Okay to be fair it’s qemu that crashes first and libvirtd follows it, but the whole point is that if qemu is the hypervisor, libvirtd should be the supervisor and not crash if the hypervisor it launched doesn’t seem to work.

And it goes even funnier: if I launch as root the same qemu command, it starts up properly, without network but properly. Way to go libvirt; way to go. Sigh.

Linux Containers and the init scripts problem

Since the tinderbox is now running on Linux containers I’m also experimenting with making more use of those. Since containers are, as the name implies, self contained, I can use them in place of chroots for testing stuff that I’d prefer wouldn’t contaminate my main system, for instance I can use them instead of the Python virtualenv to get a system where I can use easy_install to copy in the stuff that is not packaged in Portage as a temporary measure.

But after some playing around I came to the conclusion that we got essentially two problems with init scripts. Two very different problems actually, and one involves more than just Linux Containers, but I’ll just state both here.

The first problem is specific to Linux Containers and relates to one limitation I think I wrote of before; while the guest (tinderbox) cannot see the processes of the host (yamato) the opposite is not true, and indeed the host cannot really distinguish between its processes and the ones from the guest. This isn’t much of a problem, since the start and stop of daemons is usually done through pidfiles that list the started process id, rather than doing a search and destroy over all the processes.

But the “usually” part here is the problem: there are init scripts that use the killall command (which as far as I can tell does not take namespaces into consideration) to identify which process to send signals to. It’s not just a matter of using it to kill processes; most of the times, it seems to be used to send signals to the daemon (like SIGHUP for reloading configuration or stuff like that). This was probably done in response to changes to start-stop-daemon that asked for it not to be used for that task. Fortunately, there is a quick way to fix this: instead of using killall we can almost always use kill and take the PID to send the signal to through the pidfile created either by the daemon itself or by s-s-d.

Hopefully this won’t require especially huge changes, but it brings up the issue of improving the quality assurance over the init scripts we currently ship. I found quite a few that dependent on services that weren’t in the dependencies of the ebuild (either because they are “sample configurations’ or because they lacked some runtime dependencies), a few that had syntax mistakes in them (some due to the new POSIX-correctness introduced by OpenRC, but not all of them), and quite a bit of them which run commands in global scope that slow down the dependencies regeneration. I guess this is something else that we have to decide upon.

The other problem with init script involves KVM and QEmu as well. While RedHat has developed some tools for abstracting virtual machine management, I have my doubts about them as much now as I had some time ago for what concerns both configuration capabilities (they still seem to bring in a lot of unneeded stuff – to me – like dnsmasq), and now code quality as well (the libvirt testsuite is giving me more than a few headaches to be honest).

Luca already proposed some time ago that we could just write a multiplex-capable init script for KVM and QEmu so that we could just configure the virtual machines like we do for the network interfaces, and then use the standard rc system to start and stop them. While it should sound trivial, this is no simple task: starting is easy, but stopping the virtual machine? Do you just shut it down, detaching the virtual power cord? Or do you go stopping the services internal to the VM as you should? And how do you do that, with ACPI signals, with SSH commands?

The same problem applies to Linux containers, but with a twist: trying to run shutdown -h now inside a Linux container seem to rather stop the host, rather than the guest! And there you cannot rely on ACPI signals either.

If somebody has a suggestion, they are very welcome.