This Time Self-Hosted

Modernising XML technologies

While I’m in hospital I have decided not to waste my time all day; although the article on daydreaming that Donnie linked is quite interesting indeed, staying here costs money. It’s not the cost of the hospitalisation per se, because the health service (SSN) pays for it, but there are added costs: from the puny crossword magazines, to the phone calls (which we try to share in the family, calling in turn), to the hostel fees for my family to stay here in Verona for the weekend (and then, once the surgery is near, for my mother to stay here until they release me). And you can guess that I’ll spend my convalescence buying stuff to pass the time.

So, while I cannot feasibly work on my main programming jobs (the hard disk of the laptop is not big enough to keep all the development tools in the Windows partition on Boot Camp, nor the virtual machine with the data in it), I’m working on writing, which is something I can easily do from here even without a network connection (the article I wrote for the Italian edition of Linux Journal, two years ago, I wrote entirely offline on the iBook, and refined it just a few days before sending it off to the editor).

While I used to use LaTeX for my articles, it’s not really a good option when the article has to be published online rather than printed; indeed, even tex4ht, which is not a bad tool at all, does not make it easy to translate LaTeX documents to HTML. For this reason I switched to DocBook a few months ago, starting with Implications of pure and constant functions. I’m still learning to use it, and I’m not really good at it yet, but it’s nice. And with the recent release of the fifth major version, it also starts to feel much more like XML than it did before. That is a bad thing for many people, but good for me; I do like XML when it’s used in the right way for the right idea.

But the new DocBook release made me think quite a bit. Right now, the documentation for Gentoo is written using GuideXML, which is neither a subset nor a superset of DocBook; it’s a totally standalone custom format that has just a few similarities to DocBook. It would be nice if we could implement a new format for documentation as an extension to DocBook 5 (which, through the use of namespaces, would be quite easy to implement and maintain, in my opinion); we could keep all the positive sides of DocBook, including its widespread usage and knowledge, while still adding the things we need that are present in GuideXML.
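To make the namespace idea a bit more concrete, here is a minimal sketch in Python, using only the standard library. The DocBook 5 namespace URI is the real one; the Gentoo namespace URI and the `useflag` element are purely hypothetical, made up for illustration, and are not part of any existing format.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
# Hypothetical namespace for Gentoo-specific additions (an assumption).
GENTOO_NS = "http://example.org/ns/gentoo-doc"

# Serialise DocBook as the default namespace and the extension as "g:".
ET.register_namespace("", DOCBOOK_NS)
ET.register_namespace("g", GENTOO_NS)


def build_article():
    """Build a tiny DocBook 5 article containing one extension element."""
    article = ET.Element(f"{{{DOCBOOK_NS}}}article", {"version": "5.0"})
    title = ET.SubElement(article, f"{{{DOCBOOK_NS}}}title")
    title.text = "Example guide"
    para = ET.SubElement(article, f"{{{DOCBOOK_NS}}}para")
    para.text = "Rebuild the package with "
    # Hypothetical extension element living in its own namespace.
    flag = ET.SubElement(para, f"{{{GENTOO_NS}}}useflag")
    flag.text = "debug"
    flag.tail = " enabled."
    return article


def split_by_namespace(root):
    """Return (standard, extension) lists of local element names."""
    standard, extension = [], []
    for el in root.iter():
        ns, _, local = el.tag[1:].partition("}")
        (standard if ns == DOCBOOK_NS else extension).append(local)
    return standard, extension


if __name__ == "__main__":
    root = build_article()
    print(ET.tostring(root, encoding="unicode"))
    print(split_by_namespace(root))
```

The point of the sketch is only that a tool can tell the stock DocBook elements apart from the additions by looking at the namespace, so the standard part stays processable with standard stylesheets.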

This of course would require a fair amount of work, as it means writing new stylesheets, new schemas, new conversion code, and converting a huge amount of documentation. Why, in my opinion, should it be considered despite these drawbacks? Well, I have a few reasons.
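As a rough idea of what the conversion code could look like, here is a minimal sketch in Python. The tag mapping is my assumption, limited to the four GuideXML elements named in this post (`<p>`, `<pre>`, `<c>`, `<e>`); a real converter would need to cover the whole format, and would more likely be written in XSLT.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Assumed GuideXML -> DocBook mapping; covers only the elements
# mentioned in this post, not the whole GuideXML vocabulary.
TAG_MAP = {
    "p": "para",
    "pre": "programlisting",
    "c": "command",
    "e": "emphasis",
}


def convert(element):
    """Return a DocBook copy of a GuideXML element tree.

    Attributes are dropped in this sketch; unmapped tags raise
    ValueError instead of being passed through silently.
    """
    if element.tag not in TAG_MAP:
        raise ValueError(f"no DocBook mapping for <{element.tag}>")
    new = ET.Element(f"{{{DOCBOOK_NS}}}{TAG_MAP[element.tag]}")
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(convert(child))
    return new


if __name__ == "__main__":
    source = ET.fromstring("<p>Run <c>emerge --sync</c> <e>first</e>.</p>")
    ET.register_namespace("", DOCBOOK_NS)
    print(ET.tostring(convert(source), encoding="unicode"))
```

Failing loudly on unknown tags is a deliberate choice here: when converting a large body of documentation, it is better to find out which elements have no mapping than to lose them quietly.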

The first is that, as I said before, DocBook is much more widely used than GuideXML, which, in turn, means there are many more people used to working with DocBook than with GuideXML. This makes it interesting because, even if it’s more complex, it would require less specific knowledge to be able to write Gentoo documentation, which would allow more contributions. I know that GuideXML is much less complex than DocBook, but human nature is lazy, and if you have users who know DocBook already, it’s more likely that they’d contribute if the format used in Gentoo was an extension rather than an independent standalone format.

The second is that, sincerely, I find GuideXML too limited in some regards; in the guides I have written, like the as-needed fixing guide and the backtracing guide, I ended up using the <c> element for command parameters. The problem here, in my opinion, is that it tries to be both meat and fish: it’s semantic in some aspects but stylistic in others (<e>), and it’s HTML-compatible in some regards (<p> and <pre>) but not in others (again <e>). The result is that I sincerely feel like I miss something from time to time, a feeling that I suppose I would have with standard DocBook too, which is why I would expect us to use an extension of it.

Also, there’s an extra advantage in this: with due work it would be quite easy to have a PDF version of the guide, which would probably be quite welcome to the people interested in a printed version (printer-friendly versions aren’t that printer-friendly after a while).

It would be nice to look at this together with the previous proposal from Tiziano for moving the definition of the XML formats we use from DTD to Relax NG. It’s important to remember that the XML technologies are designed to be extensible, and that they are not yet stable. In particular, many of them (like DTD) are now obsolete: they are not expressive enough, and they were kept for compatibility with previous SGML-based markup languages. The same holds true for the older versions of DocBook.

Comments 9
  1. Well, Diego, as far as I know you haven’t discussed this with any member of the GDP at all. Not once. So…this is somewhat of a surprise, you might say. In no particular order, I’d like to respond to some points of your post. First, yes, GuideXML is easier than DocBook. This gets more contributors in the door, especially from the larger non-developer community. If folks had to know docbook, with its dozens (hundreds?) of extra tags and such, we’d get far, far fewer patches or new documentation. Even with the easy XML format we use, we don’t get enough. Yeah, docbook is more widely used. But it’s nowhere near as pretty on-screen, nor as user-friendly to navigate. And it’s extremely bloated; our XSLT would have to increase by a few orders of magnitude given all the extra useless tags that we’d still have to account for. What in Gentoo actually uses docbook? The only thing that comes to mind is the devmanual, and that doesn’t get all that many patches, either. I don’t think you’ve made a sufficient argument that it’s easier to get contributors, too. Perhaps you’d get a few more contributors from within the Gentoo developer pool, but that’s about it. Heck, most Gentoo devs don’t want to have anything to do with writing docs as it is, much less asking them to do it in a complex format like docbook! :) Second, you should be using the <c> tag for commands. That’s what it’s for: C for Command. Whether that’s a command parameter or the command itself. :) Third, you’re right that both GuideXML and docbook aren’t always about pure semantics. I think that’s just human nature creeping into their design: we want to go with what feels intuitive/easy/quick/whatever, which is not always what’s semantically correct. This holds true for both XML flavors, so there’s little to be gained by just switching. Fourth, I’ve worked with both GuideXML and docbook, and while I appreciate that it really is easier to convert docbook into various non-XML formats, since there are a lot of existing tools, just because it’s easier to convert doesn’t mean that we should switch. The GDP doesn’t really get requests from folks that want PDFs of our stuff, or anything. Actually, we used to generate PDFs of the installation handbooks and place them on the LiveCDs, but at some point we stopped doing that; I think something in our XSL changed. The script got broken somehow, at any rate. For random fun, you might want to check out this article drobbins wrote on creating GuideXML (at least, in its earliest form, before swifT and neysx started improving it): http://www.gentoo.org/doc/e

  2. I actually think I did discuss it (albeit, I admit, not on the lists) with someone from the doc team before. But it was before DocBook 5, and it was easy to see there wasn’t enough cross-over to make the two compatible. I still think it’s worth investigating the feasibility. In particular, I think we have “too many formats” for documentation and other stuff, and that reducing them would be an option worth investigating. Still, not going to act on this until I’m healthy again 🙂 Just trying to make people think, here 😉

  3. I’ve been a member of GDP for more than three years, and was contributing for at least a year before that. During that time, I haven’t seen a single user submission in Docbook. The GDP has been announcing for ages that they accept contributions in any format. We have plenty of minions who can convert stuff from your favorite text format to GuideXML. So it seems that all folks in the GDP are pretty happy with the current state of affairs, and the problem is other Gentoo developers who somehow don’t like GuideXML. This is of course a pretty valid point. However, there’s a simple solution: you can specify your own XSLT stylesheet in the XML file, so you can just go ahead, write a custom XSLT for Docbook handling and be happy ever after. If this stylesheet has more than one user, I’m sure you can persuade us to put it into our CVS and extend our support to XML markups other than GuideXML.

  4. I’ve spent some time testing DocBook 5 while working on my Gentoo book. It’s possible to cut it down to a very small subset of allowed tags. You can do this using a Relax-NG file to set notAllowed for anything you don’t want in there. If you want to try, I could send over some of my stuff. That also enables emacs’s tab completion to work nicely on the allowed subset.

  5. The problem is that it’s not just GDP that uses GuideXML. And there are quite a few guides (like the ones I linked to, which I wrote myself) that are not under GDP at all. Contributions for those are quite rare from what I can see, and I’ve been asked/told before that DocBook would have been preferred. Sincerely, my reason to wonder is to actually assess the feasibility of avoiding specific knowledge by deprecating GuideXML in favour of an extended/reduced DocBook. If it’s proven unfeasible, my second choice would be to try reducing the gap between the two (like adding a namespaced GuideXML, using xml:id and XLink, like DocBook is doing now). I’m not saying that GuideXML is bad per se, but it’s specific knowledge that seems to say “NIH syndrome” if there is a feasible alternative using a standard and more widely used format. I’m unable to try much from here, and will probably be unable to test much before surgery (as I’m not leaving for home before that); afterward it’ll have to be seen, but I sincerely want to assess this objectively, and eventually start planning what could be done to reduce the gap between the two. Sure, GuideXML is much simpler, but why should two different skill sets be required to write the two formats? I’d sincerely be quite happy if I could forget GuideXML to just focus on DocBook; they still confuse me sometimes. Even if we won’t be using full-fledged DocBook 5, it’d be an option to use a slimmed-down set.

  6. Sure, go ahead and write the XSLT. If there are more and more documents being written in the new markup format, we can revisit our decision and deprecate GuideXML. Let me repeat myself once again: if you (or anybody else) are having trouble editing a GuideXML document, the GDP has plenty of monkeys to do the job. Just drop by IRC or file a bug and we’re happy to help. You seriously aren’t suggesting to convert all our docs to some custom version of Docbook, are you? I don’t care why Daniel Robbins didn’t use any of the existing XML-based documentation formats; what matters to me is the sheer amount of docs that *are* in GuideXML. We don’t *force* you to use GuideXML, please don’t force us to convert our stuff to something more shiny.

  7. Actually, converting GuideXML to Docbook would be a lot less difficult than converting Docbook to GuideXML, largely because GuideXML has a small number of supported entities/tags, so writing an XSLT that maps those onto Docbook structures shouldn’t be all that difficult. GuideXML however has support for variables (such as those used by the handbooks for fast release documentation updates) which, afaik, aren’t supported in Docbook. And extending Docbook with additional (namespaced) tags makes this discussion somewhat moot, as you’re then discussing GuideXML versus custom-Docbook… Otoh, I’ve been writing in Docbook for a while now (although, I must admit, the 4-series) and appreciate all the resources surrounding the format.

  8. I find Docbook a PITA to write. The tags are just too long. GuideXML looks much prettier in a text editor to me. But I agree that a Docbook subset would be the optimal route for documentation functionality-wise. If no one else has the problem with the long tags, I think Donnie should post his Docbook subset, everyone can switch to that, and I can write XSLT that maps one-character tags to the relevant Docbook elements. 🙂
