This Time Self-Hosted

Texinfo to Kindle, an odyssey

This should be my last week in Los Angeles for the moment. Tomorrow Excelsior will be connected directly to the Internet, with its own IP (v4) and an IPv6 tunnel ready. I’ll catch a plane next week to go back to Italy to take care of a few things, while it crunches numbers.

Since I’m expecting long plane rides in my future, I hope to be able to read much more. In particular, I want to finally find the time to learn enough Elisp to write my own Emacs modes. I really miss a decent ActionScript mode while I’m working on Flash code (don’t ask).

So I set out to find a way to produce a standard ePub file; from that, converting to a Kindle-compatible Mobi file is just a matter of using Calibre.

I found this post from one and a half years ago, which describes the situation pretty nicely. While I’m currently ignoring the TOC issue that the author describes (probably simply because I haven’t been able to load the result on my Kindle and judge it yet), I found a different one: makeinfo will generate invalid XML.

The problem lies in the id= attribute, whose format is tightly specified by XML: it has to start with one of only a few characters, and only a few more are allowed after that. It can’t start with a number, for instance, nor can it contain a slash character. While makeinfo already had a function to (partially) escape an XML id, it wasn’t using it for the docbook output. The function itself, then, wasn’t handling all the necessary escapes, so even when it was called the output would still be invalid if the texi sources contained non-alphanumeric characters.
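To make the constraint concrete, here is a minimal sketch of this kind of escaping. It is not the code from makeinfo or from the patch: the function names, the `_XXXX` hex escape and the `id-` prefix are all made up for illustration, and only the ASCII subset of the characters XML allows in a name is handled.

```python
import re

# ASCII-only approximation of what XML allows in an id:
# first character a letter or underscore, then letters, digits, '_', '.', '-'.
_VALID_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.-]")


def is_valid_xml_id(candidate: str) -> bool:
    """Return True if candidate fits the (ASCII-only) id rules above."""
    return _VALID_ID.fullmatch(candidate) is not None


def escape_xml_id(raw: str) -> str:
    """Turn an arbitrary node name into something usable as an id.

    Hypothetical scheme: each disallowed character becomes '_' followed by
    its code point in hex, and a prefix is added when the result would not
    start with a letter or underscore. Uniqueness is NOT guaranteed.
    """
    escaped = _DISALLOWED.sub(lambda m: "_%04x" % ord(m.group()), raw)
    if not is_valid_xml_id(escaped):
        # Only the first character can still be wrong (digit, '.', '-',
        # or an empty string), so a prefix is enough to fix it.
        escaped = "id-" + escaped
    return escaped
```

With this scheme a node name containing a space and a slash, such as `Invoking emacs/2`, comes out as `Invoking_0020emacs_002f2`, and a name starting with a digit gets the prefix. Two different names can still collide after escaping, so a real implementation would need to deal with that too.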

So now I have a patch for texinfo which should work; too bad I have to get a copyright assignment for this as well, and I don’t know if I’ll have to wait till I get home to sign and send it back or not. The important part is having the patch though. I also fixed the issue with setfilename being added to the output when creating docbook.

Then there is another issue: the dbtoepub script. In Gentoo this script is installed by docbook-xsl-stylesheets and docbook-xsl-ns-stylesheets within /usr/share; the problem is that it was never made easy to execute, and its dependencies weren’t considered. I took the chance of a bump of the stylesheets to add a USE flag for Ruby to the package (the script is written in Ruby) that will either remove the script or also install an executable wrapper so that it can be executed.

While I was at it, I also made the two ebuilds, which install two variants of the same basic content, almost identical: they differ only in the directory where the content is installed, and the remaining differences are keyed on $PN. The one exception is the keywords, since the namespaced version is not used as much; most of the time it’s just me who likes it.

After I got the epub file, it was time to make sure it complied with the specs; I’ve been burnt before by invalid or simply non-standard epub files. Luckily, Adobe of all people released an open source (BSD-licensed) tool to audit the files; epubcheck version 1.1 is now in tree as app-text/epubcheck. I’m hoping somebody who knows more Java than me can get a new version of jing in tree so I can bring epubcheck 3 into the tree: it uses a much newer jing than the one available right now, and that’s bad. The new version is designed to support the new version of the epub standard (which is supported by the 1.77.0 release of the stylesheets as well, and should be relatively easy to use even without Ruby), which I don’t need yet, so I’m fine with version 1.1 for now.
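Put together, the chain described so far (Texinfo to DocBook with makeinfo, DocBook to ePub with dbtoepub, ePub to Mobi with Calibre) could be sketched roughly as follows. This is only a sketch under assumptions, not the exact commands used here: the file names and function names are hypothetical, the `-o` options and Calibre’s `ebook-convert` command line are assumed to be how the tools get invoked, and dbtoepub is assumed to be on the PATH through the executable wrapper mentioned above. The epubcheck step is left out.

```python
import subprocess
from pathlib import Path


def build_pipeline(texi_path: str) -> list[list[str]]:
    """Return the commands for texinfo -> docbook -> epub -> mobi.

    Output names are derived from the input name by swapping the suffix.
    """
    texi = Path(texi_path)
    docbook = texi.with_suffix(".xml")
    epub = texi.with_suffix(".epub")
    mobi = texi.with_suffix(".mobi")
    return [
        # Texinfo sources to DocBook XML.
        ["makeinfo", "--docbook", "-o", str(docbook), str(texi)],
        # DocBook XML to ePub (Ruby script from the DocBook XSL stylesheets).
        ["dbtoepub", "-o", str(epub), str(docbook)],
        # ePub to a Kindle-compatible Mobi file, via Calibre.
        ["ebook-convert", str(epub), str(mobi)],
    ]


def run_pipeline(texi_path: str) -> None:
    """Run each step in order, stopping at the first one that fails."""
    for command in build_pipeline(texi_path):
        subprocess.run(command, check=True)
```

Calling `run_pipeline("elisp.texi")` would then leave `elisp.xml`, `elisp.epub` and `elisp.mobi` next to the source, provided all three tools are installed.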

Anyway, all the tools I’ve been using should now be in tree (I’m testing the texinfo patch as we speak), and soon enough I should be able to start reading that manual on my Kindle. Expect some Emacs modes from me afterwards.

Comments 1
  1. I’ve found that using texinfo -> HTML -> ebook-convert -> mobi worked better (at least for my texinfo-on-kindle odyssey) than the texinfo -> docbook -> tex2mobi -> ebook-convert -> mobi described in the original link. Calibre’s ebook-convert apparently uses XHTML as an intermediate language anyway, so it’s not like you are gaining much by going to docbook.
