This Time Self-Hosted
dark mode light mode Search

This is crazy, man!

And here, the term “man” refers to the manual page software.

I already noted that I was working on fixing man-db to work with heirloom-doctools but I didn’t go in much details about the problems I faced. So let me try to explain it a bit in more detail, if you’re interested.

The first problem is, we have two different man implementations in portage: sys-apps/man and sys-apps/man-db. The former is the “classical” one used by default, while the latter is a newer implementation, which is supposedly active and maintained. I say supposedly because it seems to me like it’s not really actively maintained, nor tested, at all.

The “classic” implementation is often despised because, among other things, it’s not designed with UTF-8 in mind at all, and even its own output, for locales where ASCII is not enough, is broken. For instance, in Italian, it outputs a latin1 bytestreams even when the locale is set to use UTF-8. Confusion ensures. The new implementation should also improve the caching by not using flat-files for the already-processed man pages, but rather using db files (berkeley or gdbm).

So what’s the problem with man-db, the new implementation? It supports UTF-8 natively, so that’s good, but the other problem is that it’s only ever tested coupled with groff (GNU implementation of the (n)roff utility), and with nothing else. How should this matter? Well, there are a few features in groff that are not found anywhere else, this includes, for instance, the HTML output (eh? no I don’t get it either, I can tell that the nroff format isn’t exactly nice, and I guess that having the man pages available as HTML makes them more readable for users, but why adding that to groff, and especially to the man command? no clue), as well as some X-specific output. Both those features are not available in heirloom, but as far as I can see they are so rarely used that it really makes little sense to depend on groff just because of those; and the way man-db is designed, when a nroff implementation different from groff is found, the relative options are not even parsed or considered; which is good to reduce the size of the code, but also requires to at least compile-test the software with a non-groff implementation, something that upstream is not currently doing it seems.

More to the point, not only the code uses broken (as in, incomplete) groff-conditionals, but the build system does depend on groff as well: the code to build the manual called groff directly, instead of using whatever the configure script found, and the testsuite relied on the “warnings” feature that is only enabled with groff (there was one more point to that: previously it returned error when the --warnings option was passed, so I had to fix it so that it is, instead, ignored).

But it doesn’t go much better from here on: not only man-db tries to use two output terminals that are no defined by the heirloom nroff (so I hacked that around temporarily on heirloom itself, the proper fix would be having a configuration option for man-db), but it also has problems handling line and page lengths. This is quite interesting actually, since it took me a long time (and some understanding of how nroff works, which is nowhere near nice, or human).

Both nroff and groff (obviously) are designed to work with paged documents (although troff is the one that is usually used to print, since that produces PostScript documents); for this reason, by default, their output is restricted width-wise, and has pagebreaks, with footers and headers, every given number of lines. Since the man page specification (command, man section, source, …) are present in the header and footer, the page breaks can be seen as spurious man page data in-between paragraphs of documentation. This is what you read on most operating systems, included Solaris, where the man program and the nroff program are not well combined. To avoid this problem, both the man implementations set line and page breaks depending on the terminal: the line break is set to something less than the width of the terminal, to fill it with content; the page break is set as high as possible to allow smooth scrolling in the text.

Unfortunately, we got another “groff versus the world” situation here: while the classic nroff implementation sets the two values with the .ll and .pl roff commands (no, really you don’t want to learn about roff commands unless you’re definitely masochist!), groff uses the LL register (again… roff… no thanks), and thus can be set with the command-line parameter -rLL=$width; this syntax is not compatible with heirloom’s nroff, but I’m pretty sure you already guessed that.

The classic man implementation, then, uses both the command and the register approach, but without option switches; instead it prepends to the manpage sources the roff commands (.ll and .nr LL that sets the register from within the roff language), and then gets the whole output processed; this makes it well compatible with the heirloom tools. On the other hand, man-db uses the command-line approach which makes it less compatible; indeed when using man-db with the heirloom-doctools, you’re left with a very different man output, which looks like an heirloom in itself, badly formatted and not filling the terminal.

Now we’re in quite a strange situation: from one side, the classic man implementation sucks at UTF-8 (in its own right), but it works mostly fine with heirloom-doctools (that supports UTF-8 natively); from the other side we got a newer, cleaner (in theory) implementation, that should support UTF-8 natively (but requiring a Debian-patched groff), but has lock-ins on groff (which definitely sucks with UTF-8). The obvious solution here would be to cross-pollinate the two implementations to get a single pair of programs that work well together and has native UTF-8; but it really is not easy.

And at the end… I really wish the roff language for man pages could die in flames, it really shows that it was designed for a long-forgotten technology. I have no difficulty to understand why GNU wanted to find a new approach to documentation with info pages (okay the info program suck; the documentation itself doesn’t suck as much, you can use either pinfo or emacs to browse it decently). Myself, though, I would have gone to plain and simple HTML: using docbook or some other format, it’s easy to generate html documentation (actually, docbook supports features that allows for the same documentation page to become both a roff-written man page, and an HTML page), and with lynx, links, and graphical viewers, the documentation would be quite accessible.

Comments 3
  1. Texinfo reminds me a little of AmigaGuide-files (which are even older) – the main “advantage” was that you can put multiple pages into one file which can save a few KB of your precious disc space.

  2. The killer app for me is man-db’s improved whatis/apropos. I preferred man 1.6 for most things, not least because I could easily patch the ASCII output file for cleaner results, but the far superior apropos command means that the old man isn’t really practical any more.Will

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.