So, Daniel Veillard – among other things the maintainer of libvirt – dared me to give further voice to my concerns about libvirt’s configuration files being “aXML, almost XML”, given, and I quote him here:
I’m also on the XML standard group at W3C and the main author of libxml2, I would have no troubles debunking your argument very quickly
I guess I’ll then have to cross-post this entry to the libvir-list to make sure that I’m not, to paraphrase him, “hiding” on my blog (do I hide on a blog with at least 15K visitors a month, with no comment moderation, and syndicated on the homepage of Gentoo’s website?).
First of all, I’m not really undermining Daniel’s technical preparation; I’m pretty sure he generally knows what he’s doing. But his being the author of libxml2 or being part of W3C’s standard group does not mean that he’s perfect. Nor does it mean that somebody should refrain from commenting on his ideas. This really reminds me of what people said about Ryan when I criticised his idea: they just wanted me to shut up because he had done a good job at porting games before. My original title for this post read “Being part of groups or having written libraries does not make you infallible” but it wasn’t catchy enough.
So, this is the preamble for my blog’s readers only, since it’s definitely not relevant to the libvirt project. The rest is for the list as well.
Hello,
In a recent post on my blog I ranted on about libvirt, and in particular I complained that the configuration files look like what I call “almost XML”. There are several reasons why I say that; let me try to explain some.
In the configuration files, at least those created by virt-manager, there is no specification of what the file should be (no document type, no namespace, and, IMHO, a too generic root element name); given that some kind of distinction is needed for software like Emacs’s nxml-mode to know how to deal with the file, I think that’s pretty bad for interaction between different applications. While libvirt knows perfectly well what it’s dealing with, other packages might not. It might not sound like a major issue, but it starts tickling my senses when this happens.
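To make the identification problem a bit more concrete, here is a minimal Python sketch (standard library only) of what a third-party tool has to go on when it opens such a file. The sample document is a trimmed-down, hypothetical domain file, not actual virt-manager output, and the helper name describe_root is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain file; real files carry many more elements.
SAMPLE = """<domain type='kvm'>
  <name>example</name>
</domain>"""


def describe_root(xml_text):
    """Return (namespace, local_name) of the document's root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}local"; a bare tag has no namespace.
    if root.tag.startswith("{"):
        uri, local = root.tag[1:].split("}", 1)
        return uri, local
    return None, root.tag


namespace, name = describe_root(SAMPLE)
print(namespace, name)
```

With no namespace and no document type, the bare root element name is all an editor or validator can key on to guess which schema applies.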
The configuration seems somewhat contrived in places, such as the disk configuration: if the disk is file-backed it requires the file attribute on the <source> element, while it needs the dev attribute if it’s a block device; given that it’s a path in both cases, it would have been easier on the user if a single path attribute were used. But this is debatable.
The third problem I called out in the blog post is the lack of a schema for the files; Daniel corrected me, pointing out that the schemas are distributed with the sources and installed. Sure thing, I was wrong. On the other hand, I maintain that there are problems with those schemas. The first is that both the version distributed with 0.7.4 and the git version as of today suffer from bug #546254 (secret.rng not being well formed), which means nobody has tested them lately; then there is the fact that they are never referenced by the human-readable documentation, which is why I didn’t find them the first time around; add to that some contrived syntax in those schemas that causes trang to produce an invalid rnc file out of them (nxml-mode uses rnc rather than rng).
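For what it’s worth, well-formedness is the cheapest property to check, since any XML parser will refuse a file that is not well formed. A minimal Python sketch, assuming only the standard library: the two fragments below are hypothetical stand-ins, not the real contents of secret.rng, and this does not check Relax-NG validity, only that the text parses as XML at all.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments, not the real contents of secret.rng.
good = "<grammar><start/></grammar>"
bad = "<grammar><start></grammar>"  # <start> is never closed

print(is_well_formed(good), is_well_formed(bad))
```

Running something along these lines over the installed schema files as part of a test suite would be enough to catch this class of breakage.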
But I guess the one big problem with the schemas is that they don’t seem to properly encode what the human-readable documentation says, or what virt-manager does. For instance (please follow me with selector-like syntax), virt-manager creates /domain/os/type[@machine='pc-0.11'] in the created XML; the same attribute seems to be documented: “There are also two optional attributes, arch specifying the CPU architecture to virtualization, and machine referring to the machine type”. The schema does not seem to accept that attribute though (“element type: Relax-NG validity error : Invalid attribute machine for element type” with xmllint; just to make sure that it’s not a bug in any other piece of software, this is Daniel’s libxml2).
Now, after voicing my opinions here, as Daniel dared me to do, I’d like to explain for a second why I didn’t post this on the list in the first place: of what I wrote here, my beefs for calling this aXML, the only things that can be solved easily are the schemas; schemas that, at the time I wrote the blog post, I was unable to find. The syntax, and the lack of a “safe” identification of the files as libvirt’s, are the kind of legacy problems one has to live with to avoid wasting users’ time with migrations and corrections, so I don’t really think they should be addressed unless a redesign of the configuration is intended.
Just my two cents, you’re free to take them as you wish, I cannot boast a curriculum like Daniel’s, but I don’t think I’m stepping out of place to point out these things.
The reason I dislike XML is that it is usually used based on this concept (from the reply to your mail): “basically in normal use of libvirt nobody should have to look at the XML, the virt-viewer/virt-install etc… tools should generate and handle those for you.”
And my experience is just that those tools you are supposed to use just never work correctly and/or support exactly what you need, so you almost always still end up editing the file by hand.
That does not have to be like this, but it will be unless you place the priority of the configuration tools (far?) _above_ that of your main project, which _very_ few are willing to do.
This is my main gripe with people using XML. I love XML and personally I think it’s great as a configuration file syntax. But I make a point of ensuring that I keep up-to-date and very strict schemas, which I can then use to validate the configuration file myself. Personally I see keeping up-to-date schemas as a pre-requisite to using XML…
I’m an Amanda developer, and we’ve started using XML, too – not for config files, but for metadata storage and for parsable output from reporting tools.
In getting started on this project, I found an incredible lack of advice on how to go about this. A few of the issues we’ve encountered:
* a schema has a hard time capturing all of the detail needed to describe the meaning of a file to a human; balancing that with prose documentation is hard
* it’s not clear when to use an attribute and when to use a nested element
* parsers and generators are huge amounts of code even for simple schemas, especially in C
So I’d be interested in how well we’ve done so far, and I’d love to see a post or two about the *correct* way to do these things. Easily my favorite thing about Gentoo is that it puts a high priority on “doing it right” and on pushing that right-ness upstream.