The nice thing about standards is that you have so many to choose from.
Andrew S. Tanenbaum
The quote from Tanenbaum is a classic, something that most developers will have to face at some point in their careers. But I’d like to expand on it, taking Open Standards into consideration as well. Most Free Software developers (and, argh, advocates!) will agree that Open Standards are a very good thing: make sure they are fully documented, let people develop royalty-free implementations, and you’ve got a win.
Or do you? As the title of this post lets you know, there is one further problem with the standards to choose from: their implementations. I’ve already delved into a number of problems related to standards and their implementations; for instance the KWord vs OpenOffice problem, where the two (at the time they started boasting OpenDocument support) used two completely different, non-interoperable methods to define bulleted lists. And again with the inconsistent SVG implementations that make the same file appear in vastly different ways in different programs, without even reporting an error.
And eBooks are no different; let’s leave aside the problem of their formatting (for instance, O’Reilly books are easily readable, but for me they are formatted “randomly” compared to others; or The Dragon Reborn, which probably went through an OCR pass, given that Thom sometimes became Torn). I’ve already ranted about DTBook ebooks, but this time I’m seriously pissed.
Let me first explain the whole DTBook problem again, because it provides the basic context for the trouble that follows. I have a PRS-505 Sony Reader; when I bought it, it only supported PDFs (sort of) and Sony’s own BBeB/LRF format. Thankfully, Sony updated the firmware to add support for the ePub format, which is supposedly an open standard with a number of working implementations on various operating systems and hardware devices. Apple’s iPad, among others, is supposed to read ePub files. So what’s the catch?
Well, first of all, since I brought up Apple’s iPad, there is the problem of DRM: ePub by itself does not really define a DRM scheme. O’Reilly does not use any DRM in their electronic media (bless them), and Apple does not support DRM-locked ePub files either (as far as I know they provide no DRM for their own files, but I don’t have a device to test it myself). On the other hand, most online bookstores, and devices such as the Sony Reader or Kobo’s eReader, support Adobe’s DRM scheme, technically called ADEPT but marketed as “Adobe Digital Editions”. Of course, as far as I know at least, there is no open source software that can deal with ADEPT-locked files, although there is code out there that allows you to unlock the files once you fetch your personal encryption key out of an enabled system.
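You can at least tell whether a given ePub is locked without any key: the ePub container format (OCF) reserves `META-INF/encryption.xml` to list encrypted files and `META-INF/rights.xml` for rights information, and Adobe’s scheme uses both. Below is a rough Python sketch of that heuristic. It does not decrypt anything, `encryption_report` is a hypothetical helper, and the two font-obfuscation algorithm URIs are listed as an assumption because many DRM-free books carry an `encryption.xml` only to obfuscate embedded fonts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML Encryption namespace used inside META-INF/encryption.xml.
ENC_NS = {"enc": "http://www.w3.org/2001/04/xmlenc#"}

# Assumed font-obfuscation algorithm URIs (IDPF and Adobe's variant); they also
# show up in DRM-free books, so on their own they should not count as DRM.
FONT_OBFUSCATION = {
    "http://www.idpf.org/2008/embedding",
    "http://ns.adobe.com/pdf/enc#RC",
}


def encryption_report(data: bytes) -> dict:
    """Heuristically guess whether an ePub (held in memory) is DRM-locked."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        has_rights = "META-INF/rights.xml" in names
        encrypted = {}
        if "META-INF/encryption.xml" in names:
            root = ET.fromstring(zf.read("META-INF/encryption.xml"))
            for item in root.iterfind("enc:EncryptedData", ENC_NS):
                method = item.find("enc:EncryptionMethod", ENC_NS)
                ref = item.find("enc:CipherData/enc:CipherReference", ENC_NS)
                if method is not None and ref is not None:
                    # Map each encrypted file to the algorithm protecting it.
                    encrypted[ref.get("URI")] = method.get("Algorithm")
    # Anything encrypted with an algorithm other than font obfuscation hints at DRM.
    content_encrypted = any(alg not in FONT_OBFUSCATION for alg in encrypted.values())
    return {
        "has_rights_xml": has_rights,
        "encrypted": encrypted,
        "likely_drm": has_rights or content_encrypted,
    }
```

It is only a guess: a publisher could in principle use some other layout, so treat `likely_drm` as a hint rather than a verdict.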
Okay, let’s leave DRM aside now and speak about the format itself: ePub files are ZIP files, not tremendously different from an OpenDocument file; they actually come with the same META-INF directory and mimetype file. Within that, you have a series of XML files with the metadata of the book, the Table of Contents, the filename of the cover, and the list of files with the book’s actual content. A note here: The Dragon Reborn, at least, appears to be a corrupted ZIP file to both unzip and the inept script, but it is read fine by the Reader Library and by the Reader itself.
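To make the container layout a bit more concrete, here is a minimal Python sketch that checks the pieces an ePub’s ZIP is expected to have: a `mimetype` entry that comes first and is stored uncompressed, and a `META-INF/container.xml` pointing at the package (OPF) file. `inspect_epub` is a hypothetical helper for illustration only; an unreadable archive makes it raise `zipfile.BadZipFile`, and a missing `container.xml` raises `KeyError`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml as defined by the OCF specification.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
EPUB_MIMETYPE = b"application/epub+zip"


def inspect_epub(data: bytes) -> dict:
    """Return basic facts about an ePub container held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        first = zf.infolist()[0]
        # OCF wants "mimetype" as the first entry, stored without compression.
        mimetype_ok = (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and zf.read("mimetype").strip() == EPUB_MIMETYPE
        )
        # container.xml names the OPF file(s) holding metadata, manifest and spine.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfiles = [
            rf.get("full-path")
            for rf in container.iterfind("c:rootfiles/c:rootfile", CONTAINER_NS)
        ]
        # testzip() checks every member's CRC; it returns the first bad name, or None.
        bad_member = zf.testzip()
    return {
        "mimetype_ok": mimetype_ok,
        "rootfiles": rootfiles,
        "bad_member": bad_member,
        "entries": len(names),
    }
```

Readers are evidently more lenient than this: the Reader Library happily opens files that unzip rejects, so a failed check here says more about the file than about whether a given device will display it.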
The content files can be in different formats; the most common case is (X)HTML, which, as you might expect, is the easiest to support, given the wide range of software rendering HTML out there. But a different format, called DTBook, was designed to support text-to-speech reading of audiobooks. A file can easily be labelled ePub even though its actual content is DTBook, which most devices and software don’t support; neither the Reader nor Calibre supports that format, so neither can read the copy of The Salmon of Doubt I bought (sigh!).
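Whether a given ePub actually carries XHTML or DTBook content can be read from the manifest of its OPF package file, which lists every content document with a media type, while the spine gives their reading order. A rough Python sketch, assuming an ePub 2 package in the `http://www.idpf.org/2007/opf` namespace and the OPS 2.0 core media types `application/xhtml+xml` and `application/x-dtbook+xml`; `content_formats` and `uses_dtbook` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Content-document media types from the OPS 2.0 core media types list.
XHTML = "application/xhtml+xml"
DTBOOK = "application/x-dtbook+xml"


def content_formats(opf_xml: bytes) -> dict[str, list[str]]:
    """Group the spine's content documents by their declared media type."""
    package = ET.fromstring(opf_xml)
    # Manifest: id -> (href, media-type) for every file in the book.
    manifest = {
        item.get("id"): (item.get("href"), item.get("media-type"))
        for item in package.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    formats: dict[str, list[str]] = {}
    # Spine: the documents actually shown to the reader, in order.
    for itemref in package.iterfind("opf:spine/opf:itemref", OPF_NS):
        href, media_type = manifest[itemref.get("idref")]
        formats.setdefault(media_type, []).append(href)
    return formats


def uses_dtbook(opf_xml: bytes) -> bool:
    """True if any document in the reading order is DTBook rather than XHTML."""
    return DTBOOK in content_formats(opf_xml)
```

Feeding it the OPF named in `container.xml` is enough to tell, before loading anything on a device, whether you bought an XHTML book or a DTBook one.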
Something even stranger happened when I bought Sourcery by Terry Pratchett (with a $2 discount, since this time it worked). I started the series a year or two ago, but rather than getting the books, I got the audiobook versions to help me sleep (I’m still doing the same thing over a year and a half later: whenever I don’t have my iPod on during the night, I wake up feeling worse than when I went to sleep, because of bad nightmares). Sourcery is the only one I haven’t been able to listen to in its entirety since I started (well, I also didn’t listen to Mort, having already read it as an eBook instead). Unfortunately the downloaded ePub, even though unzip doesn’t report it as corrupt, cannot be viewed on the Reader: just like the DTBook version, it reports a “Page error”, shows no Table of Contents, and lists a start and end page of 1.
After unlocking the file with inept, I could load it into Calibre and... it actually reads fine. So the file is a valid ePub book; why on earth would the Reader not read it at all? Obviously, that’s not something I can answer without access to the sources. Luckily, at least this time I can read my book, since Calibre could process it and create a new ePub copy that the Reader actually seems to load and read.
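The re-conversion step can be scripted with Calibre’s command-line `ebook-convert` tool, which takes an input and an output file and picks the formats from their extensions, so ePub to ePub works. A small Python wrapper sketch, assuming `ebook-convert` is on the `PATH` and the input has already been unlocked; `reconvert_command` and `reconvert_epub` are hypothetical helper names, and no conversion options are passed, so Calibre’s defaults apply.

```python
import shutil
import subprocess
from pathlib import Path


def reconvert_command(src: Path, dst: Path) -> list[str]:
    """Build the ebook-convert invocation; formats come from the file extensions."""
    return ["ebook-convert", str(src), str(dst)]


def reconvert_epub(src: Path, dst: Path) -> None:
    """Rewrite an (already unlocked) ePub through Calibre into a fresh copy."""
    if src.resolve() == dst.resolve():
        # Never write over the original; keep it around in case the copy is worse.
        raise ValueError("output must differ from input")
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("Calibre's ebook-convert is not on PATH")
    # check=True raises CalledProcessError if Calibre reports a failure.
    subprocess.run(reconvert_command(src, dst), check=True)
```

Something like `reconvert_epub(Path("Sourcery.epub"), Path("Sourcery-calibre.epub"))` would produce the new copy; whether the Reader then accepts it is, as above, down to its firmware.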
Alas. I really have nothing else I could possibly say.
À propos text-to-speech, the Internet Archive (and its Open Library) uses this nifty open format called Daisy: http://www.daisy.org
That _is_ the DTBook stuff I talked about…