Modernising XML technologies

While I’m at the hospital I have decided not to waste my time all day. The article on daydreaming that Donnie linked is quite interesting indeed, but staying here costs money. It’s not the cost of the hospitalisation per se, because the health service (SSN) pays for that, but there are added costs: from the puny crossword magazines, to the phone calls (which we try to share in the family, calling in turn), to the hostel fees for my family to stay here in Verona for the weekend (and then, once the surgery is near, for my mother to stay here until they release me). And you can guess that I’ll spend my convalescence buying stuff to pass the time.

So, while I cannot feasibly work on my main programming jobs (the hard disk of the laptop is not big enough to keep all the development tools in the Windows partition on Boot Camp, nor the virtual machine with the data in it), I’m working on writing, which is something I can easily do from here even without a network connection. The article I wrote for the Italian edition of Linux Journal two years ago was written entirely offline, on the iBook, and refined just a few days before I sent it off to the editor.

I used to use LaTeX for my articles, but it’s not really a good option when the article has to be published online rather than printed; even tex4ht, which is not a bad tool at all, does not make it pleasant to translate LaTeX documents to HTML. For this reason I switched to DocBook a few months ago, starting with “Implications of pure and constant functions”. I’m still learning to use it and I’m not really good at it yet, but it’s nice. And with the recent release of the fifth major version, it also starts to feel much more like XML than it did before. That is a bad thing for many people, but a good one for me; I do like XML when it’s used in the right way for the right idea.

But the new DocBook release made me think quite a bit. Right now, the documentation for Gentoo is written using GuideXML, which is neither a subset nor a superset of DocBook; it’s a totally standalone custom format that has just a few similarities to DocBook. It would be nice if we could implement a new format for documentation as an extension to DocBook 5 (which, through the use of namespaces, would be quite easy to implement and maintain, in my opinion). We would keep all the positive sides of DocBook, including its widespread usage and knowledge, while still adding the things we need that are present in GuideXML.
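To make the namespace idea a bit more concrete, here is a minimal sketch in Python of what such an extended document could look like to a tool. The DocBook 5 namespace is the real one; the Gentoo namespace URI and the g:package element are purely hypothetical, invented here for illustration and not part of any existing or proposed format. The point is only that a tool can tell standard elements from extension elements by their namespace.

```python
import xml.etree.ElementTree as ET

# The official DocBook 5 namespace.
DOCBOOK_NS = "http://docbook.org/ns/docbook"
# Hypothetical namespace for Gentoo-specific additions (an assumption).
GENTOO_NS = "http://example.org/ns/gentoo-doc"

# A DocBook 5 article that uses one made-up extension element, g:package.
SAMPLE = f"""<article xmlns="{DOCBOOK_NS}" xmlns:g="{GENTOO_NS}" version="5.0">
  <title>Installing a package</title>
  <para>Install it with <command>emerge</command>
  <g:package>app-editors/vim</g:package>.</para>
</article>"""


def split_by_namespace(xml_text):
    """Return (standard, extension) lists of local element names."""
    root = ET.fromstring(xml_text)
    standard, extension = [], []
    for el in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if not isinstance(el.tag, str) or not el.tag.startswith("{"):
            continue
        ns, _, local = el.tag[1:].partition("}")
        if ns == DOCBOOK_NS:
            standard.append(local)
        elif ns == GENTOO_NS:
            extension.append(local)
    return standard, extension


if __name__ == "__main__":
    standard, extension = split_by_namespace(SAMPLE)
    print("DocBook elements:", standard)
    print("Extension elements:", extension)
```

A stylesheet or schema could make the same split: everything in the DocBook namespace is handled by the stock tooling, and only the elements in the extra namespace need custom rules.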

This of course would require a fair amount of work, as it means writing new stylesheets, new schemas and new conversion code, and converting a huge amount of documentation. Why do I think it should be considered despite these drawbacks? Well, I have a few reasons.

The first is that, as I said before, DocBook is much more widely used than GuideXML, which in turn means there are many more people used to working with DocBook than with GuideXML. Even if DocBook is more complex, writing Gentoo documentation would require less Gentoo-specific knowledge, and that would allow more contributions. I know that GuideXML is much less complex than DocBook, but human nature is lazy, and if you have users who already know DocBook, it’s more likely that they would contribute if the format used in Gentoo were an extension of it rather than an independent standalone format.

The second is that, honestly, I find GuideXML too limited in some regards; in the guides I have written, like the as-needed fixing guide and the backtracing guide, I ended up using the <c> element for command parameters. The problem here, in my opinion, is that it tries to be both meat and fish: it’s semantic in some aspects but stylistic in others (<e>), and it’s HTML-compatible in some regards (<p> and <pre>) but not in others (again <e>). The result is that I feel like I’m missing something from time to time. I suppose I would have the same feeling with standard DocBook too, which is why I would expect us to use an extension of it.
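As a rough illustration of the conversion code this would involve, the sketch below renames the four GuideXML elements mentioned above to DocBook elements. The mapping is my own assumption, not an agreed one, and namespaces are left out to keep the example short. It also shows the limitation just described: a <c> that holds a parameter becomes <command> all the same, because the source markup cannot tell the two apart, while DocBook has separate <option> and <parameter> elements for that.

```python
import xml.etree.ElementTree as ET

# Assumed GuideXML -> DocBook element mapping (illustrative only).
TAG_MAP = {
    "p": "para",
    "c": "command",
    "e": "emphasis",
    "pre": "programlisting",
}


def to_docbook(fragment):
    """Rename the mapped GuideXML elements in a well-formed fragment."""
    root = ET.fromstring(fragment)
    for el in root.iter():
        # Elements not in the mapping are left untouched.
        if el.tag in TAG_MAP:
            el.tag = TAG_MAP[el.tag]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    guide = "<p>Run <c>emerge --sync</c> <e>before</e> upgrading.</p>"
    print(to_docbook(guide))
```

A real converter would need a person, or some heuristics, to decide which <c> elements are commands and which are options or parameters.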

There’s also an extra advantage in this: with the necessary work, it would be quite easy to produce a PDF version of each guide, which would probably be quite welcome to people interested in a printed version (printer-friendly versions aren’t that printer-friendly after a while).

It would be nice to look at this together with the previous proposal from Tiziano for moving the definition of the XML formats we use from DTD to Relax NG. It’s important to remember that the XML technologies are designed to be extensible, and that they are still evolving: many of them (like DTD) are now obsolete, as they are not expressive enough and were kept only for compatibility with previous SGML-based markup languages. The same holds true for the older versions of DocBook.