Documentation: remake versus download

One of the things I like a lot about Gentoo is that you can easily install the complete documentation for almost every library out there, be it API references, tutorials, or anything of the sort.

This, unfortunately, comes at a price: most of the time you need both the time and the tools to build this documentation. And sometimes the tools you need to install are almost overkill compared to the library that uses them. While most software with generated man pages ships them prebuilt in the tarball (thanks to automake, the whole thing can be done quite neatly), there are packages that don’t, either because they have no clean way to tar them up at release or because they are not released at all (ruby-elf is guilty of this too, since it’s only available from the repository for now).

For those, the usual solution is to bring in some extra packages such as, for the ruby-elf case above, the docbook-ns stylesheets used to produce the final man page from the DocBook 5 sources. But a package might not use DocBook at all; there are quite a lot of different ways to build man pages: Perl scripts, compiled tools, custom XML formats, you name it.

And this is just for man pages, which are usually updated explicitly by their authors; API documentation, which is usually extracted directly from the source code, is rarely generated when creating the final release distribution. This goes from C/C++ libraries that use doxygen or gtk-doc, to Java packages that use JavaDoc, to Ruby extensions that use RDoc (indeed, the original idea for this post came to me when I was working on the Ruby-ng eclass and noticed that almost all the Ruby extensions I packaged required me to rebuild the API documentation at build time).

Now, when it comes to API documentation, it’s obvious we don’t really want to “waste” time generating it for non-developers: they would never care about reading it in the first place. This is why we have USE flags, after all. But sometimes even this does not seem to be enough control. The first problem is: which format do we use for the documentation? For those of you who don’t know, Doxygen can generate documentation in many forms, including but not limited to HTML, PDF (through LaTeX) and Microsoft Compressed HTML (CHM). There are packages that build all the available formats; some autodiscover the available tools, others try to use the tools even when they are not installed on the system.
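
To make the “autodiscover the available tools” approach concrete, here is a minimal Python sketch of how a build script might decide which formats to generate. It is only an illustration: the `REQUIRED_TOOLS` mapping and the function names are my own assumptions, not taken from any real package or ebuild. The only Doxygen-specific parts are the `GENERATE_HTML` and `GENERATE_LATEX` configuration options.

```python
import shutil

# Assumed mapping from output format to the executables it needs.
# The tool lists are illustrative; a real package may need more or fewer.
REQUIRED_TOOLS = {
    "html": ["doxygen"],
    "pdf": ["doxygen", "pdflatex", "makeindex"],
}


def select_formats(wanted, which=shutil.which):
    """Return the wanted formats whose tools are all installed.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    selected = []
    for fmt in wanted:
        tools = REQUIRED_TOOLS.get(fmt)
        if tools is None:
            raise ValueError(f"unknown documentation format: {fmt}")
        if all(which(tool) is not None for tool in tools):
            selected.append(fmt)
    return selected


def doxyfile_overrides(formats):
    """Build Doxyfile settings that enable only the selected formats."""
    def yes_no(flag):
        return "YES" if flag else "NO"

    return [
        f"GENERATE_HTML = {yes_no('html' in formats)}",
        f"GENERATE_LATEX = {yes_no('pdf' in formats)}",
    ]


if __name__ == "__main__":
    formats = select_formats(["html", "pdf"])
    print("\n".join(doxyfile_overrides(formats)))
```

A format whose tools are missing is skipped rather than attempted, which avoids the failure mode of packages calling tools that are not installed.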

We should probably do some kind of selection, but it has to be said that it’s not obvious how, especially when upstream adds proper targets to rebuild the documentation but designs them only for its own use: generating the documentation and publishing it on its site or somewhere similar. Since we install the documentation for the user of the system, we should probably focus on what can be displayed on screen, which would steer us toward installing HTML files, because they are browsable and easy to read on screen. But I’m sure there are people who would rather have the PDFs at hand, so if we were to focus on HTML alone, people would complain. Not that I’m aiming for a 100% experience at this point; I’d rather have a good experience for 90% of people, maybe 95%.

I do remember that quite a few packages try to use LaTeX to rebuild their documentation, because there have been quite a few sandbox problems with the font cache being regenerated during Portage builds. Unfortunately, I don’t have any numbers at hand because, silly me, the tinderbox strips documentation away to save space (maybe I should remove that quirk; the RAID1 volumes have quite a bit of free space by now). I can speak, recently, for Ragel, where I’ve moved away from rebuilding the documentation: inspired first by the FreeBSD ports, which downloaded the pre-built PDF version from Ragel’s site (I did the same for version 6.4, under the doc USE flag), and then sidestepping the issue altogether, since upstream now ships the PDF in the source tarball.

But this is also bugging me as upstream for a few projects: what is best for my users? The online API documentation is useful when you don’t want to rebuild the documentation locally, and it can be indexed by search engines much more easily, but is that enough? What about offline users? Users with restricted bandwidth? Servers with restricted bandwidth? Of course offline users can regenerate the documentation, but is that the best option? Should the API documentation be shipped within the source tarball? That could make the tarball much, much bigger than the sources alone; it can even double in size.

Downloadable documentation, Python-style, looks to me like one of the best options. You get the source tarball and the documentation tarball; you install the latter if the doc USE flag is enabled. But how do you generate them? I guess that adding one extra target to the Makefiles (or the equivalent for your build system) may very well be an option. I’ll probably work on that for lscube, with a ready recipe showing how to make the tarball during make dist (and of course documenting it somewhere easier to reach than my blog).
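
As a rough idea of what such a recipe could call, here is a minimal Python sketch that packs an already-generated documentation directory into a separate tarball. It assumes the HTML has been built beforehand; the function name, its parameters, and the `NAME-VERSION-docs-html.tar.bz2` naming scheme are hypothetical choices of mine, not an established convention of lscube or any other project.

```python
import tarfile
from pathlib import Path


def make_doc_tarball(doc_dir, name, version, out_dir="."):
    """Pack a generated documentation directory into its own tarball.

    The archive name and top-level directory follow an assumed
    NAME-VERSION-docs-html scheme, chosen only for illustration.
    """
    doc_dir = Path(doc_dir)
    if not doc_dir.is_dir():
        raise FileNotFoundError(f"documentation not built: {doc_dir}")

    top = f"{name}-{version}-docs-html"
    out_path = Path(out_dir) / f"{top}.tar.bz2"

    # Store everything under a single versioned top-level directory,
    # as source tarballs conventionally do.
    with tarfile.open(out_path, "w:bz2") as tar:
        tar.add(doc_dir, arcname=top)
    return out_path
```

A dist target could run the documentation generator first and then invoke something like `make_doc_tarball("doc/html", "mypackage", "1.0")`, where the path, package name and version are placeholders. The ebuild would then fetch the second tarball only when the doc USE flag is enabled.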

The only problem with this is that it does not take advantage of improved generation by newer versions of the software. For instance, if one day Doxygen, JavaDoc, RDoc and the like finally decide to agree on a single, compatible XML/XHTML format for documentation, to be accessed with an application integrating a browser and an index system, prebuilt documentation would be left behind (I’d like to say that both Apple and Microsoft provide applications that seem to do that; I haven’t used them long enough to tell how well they work, but they are designed to do that).

But at least let this be the start of a discussion: should we really rebuild PDF documentation when installing packages on Gentoo, even under the doc USE flag, or should we stick with more display-oriented formats?

6 thoughts on “Documentation: remake versus download”

  1. Why not have some eselect module to select the type of doc you want to prefer? (That is, to set the type of output for doxygen, simple or multi, to have browsable and portable …)

  2. An eselect module alone wouldn’t cut it; you need a way to export the setting to the ebuilds, so USE flags or USE_EXPAND’d variables. While feasible, I’m not really sure we want to go that way. Sure, it’s possible, but… I don’t see a huge explicit demand for that.

  3. Yeah, this is definitely an area with room for improvement IMHO. I think consistency in USE=doc would be a nice start. Some packages use this flag only to regenerate the documentation, but still install it regardless because it’s part of the tarball (a lot of gtk-doc related stuff, for example, is prebuilt and already included in the release tarball). Others use it as expected and install documentation only when it is set. And then you also have FEATURES=nodoc, which is mostly unrelated to API documentation. As for building documentation targets that upstream does not support, it usually involves tinkering with Makefiles, Doxyfiles, etc. So if upstream doesn’t want to adapt, it will probably become a real maintenance burden. Personally, Google is my friend in these areas, but I definitely like something like devhelp for these queries too.

  4. It would be nice if HTML could be generated for the GNU stuff that uses texinfo right now. Any alternative to info would be good.

  5. As upstream for Amanda and Buildbot, I see the other side of this problem. The particular set of docbook requirements for our manpages is quite difficult for users to install (except on Gentoo..), so we go to great lengths to pre-build the documentation and avoid the possibility of a user’s build system trying to re-build it. I consider Gentoo the most authoritative downstream, since Gentoo devs usually know *more* about the build process for my own apps than I do(!), and also have the breadth of experience to be able to say “this is the normal way to do it.” So I’d be interested to hear any feedback you might have for Amanda or Buildbot.

  6. From a user’s perspective, I would like to have many more USE or USE_EXPAND flags to control the documentation type: there is no need to have the same documentation installed simultaneously as pdf, ps, dvi, html, info, man page or even more formats, especially on laptops where disk space is an issue. Moreover, it is also a time issue, if I think about e.g. asymptote, which had a new release almost every day, and where building each of the formats took really long (eventually they switched to fewer formats). But the only convenient alternative, USE=”-doc”, i.e. no documentation at all, is often not what you want either.
