The second best thing about standards: different implementations

The nice thing about standards is that you have so many to choose from.
Andrew S. Tanenbaum

The quote from Tanenbaum is a classic, something that most developers will have to face at some point in their career. But I’d like to expand on that, taking Open Standards into consideration as well. Most Free Software developers (and, argh, advocates!) will agree that Open Standards are a very good thing: make sure that they are fully documented, let people develop royalty-free implementations, and you’ve got a win.

Or do you? As the title of this post lets you know, there is one further problem with the standards to choose from: their implementation. I’ve already delved into a number of problems related to standards and their implementations; for instance the KWord vs OpenOffice problem, with the two using (at the time they started boasting OpenDocument support) two completely different, non-interoperable methods to define bullet lists. And again with the inconsistent SVG implementations that cause the same file to appear in vastly different ways in different software, without even an error being reported.

And eBooks are no different; let’s leave aside the problem with formatting them (for instance, O’Reilly books are easily readable, but are actually formatted “randomly” for me, compared to others; or The Dragon Reborn, which probably underwent an OCR pass, given that Thom sometimes became Torn). I’ve already ranted about DTBook eBooks, but this time I’m seriously pissed.

Let me explain again the whole DTBook problem first, because it provides a basic context for the trouble that follows right now. I have a PRS-505 Sony Reader; when I bought it, it only supported PDFs (sort-of) and Sony’s own BBeB/LRF format. Thankfully, Sony updated the firmware to add support for the ePub format, which is supposedly an open standard and should have a number of working implementations, on various operating systems and hardware devices. Apple’s iPad among others is supposed to read ePub files. So what’s the catch?

Well, first of all, since I brought up Apple’s iPad, there is the problem of DRM. ePub by itself does not really define a DRM scheme; O’Reilly does not use any DRM in their electronic media (bless them), and Apple does not support DRM-locked ePub files either (and as far as I know they provide no DRM for their own files, but I don’t have a device to test it myself). On the other hand, most online bookstores, and devices such as the Sony Reader or Kobo’s eReader, support Adobe’s DRM scheme, technically called ADEPT, but marketed as “Adobe Digital Editions”. Of course, as far as I know at least, there is no open source software that can deal with ADEPT-locked files, although there is code out there that allows you to unlock the files once you fetch your personal encryption key out of an enabled system.

Okay, let’s leave DRM out now, and speak about the format itself: ePub files are ZIP files, not tremendously different from an OpenDocument file; they actually come with the same META-INF directory and mimetype file. Within that, you have a series of XML files, with the metadata of the book, the Table of Contents, a filename for the cover file, and the list of files with the actual book’s content. A note here: at least The Dragon Reborn seems to be a corrupted ZIP file for both unzip and the inept script, but is read fine by the Reader Library and by the Reader itself.
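To make the layout a bit more concrete, here is a minimal sketch in Python that opens an ePub as a plain ZIP file, reads the mimetype entry, and looks up the package file listed in META-INF/container.xml. It assumes an ePub 2-style container; the function names and the tiny in-memory sample book are made up for illustration, and a real book will have many more entries.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml (shared with OpenDocument).
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical minimal container file, only for the sample below.
SAMPLE_CONTAINER = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_package_file(epub):
    """Return (mimetype, package file path) for an ePub.

    `epub` can be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return mimetype, rootfile.get("full-path")


def build_sample_epub():
    """Build a toy ePub skeleton in memory (no actual book content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", SAMPLE_CONTAINER)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(find_package_file(build_sample_epub()))
```

Note that Python's zipfile module may be stricter or more lenient than unzip or the Reader, so a file that one of them rejects might still open here, and the other way around.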

The content files can be in different formats; the most common case is (X)HTML, which as you might expect is the easiest to support, given the wide range of software rendering HTML out there. But a different format, called DTBook, was designed to support text-to-speech reading of audiobooks. Files can easily be called ePub even though the actual content is in DTBook, which is not supported by most devices and software; neither the Reader nor Calibre supports that format, and thus neither can read the copy I bought of The Salmon of Doubt (sigh!).
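For what it's worth, the format of the content files is declared in the book's package (OPF) file, so it can be checked before even trying to load the book on a device. The following is only a sketch, assuming an ePub 2-style package file where each manifest item carries a media-type attribute; the sample package and the helper names are hypothetical, not taken from any of the books mentioned here.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
XHTML = "application/xhtml+xml"
DTBOOK = "application/x-dtbook+xml"

# Hypothetical package file mixing a DTBook chapter and an XHTML one.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="ch1.xml" media-type="application/x-dtbook+xml"/>
    <item id="ch2" href="ch2.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def spine_media_types(opf_xml):
    """List the media type of each content file, in reading order."""
    root = ET.fromstring(opf_xml)
    # Map manifest ids to their declared media types.
    types_by_id = {
        item.get("id"): item.get("media-type")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine references manifest items by id.
    return [
        types_by_id.get(ref.get("idref"))
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def uses_dtbook(opf_xml):
    """True if any content file in the spine is declared as DTBook."""
    return DTBOOK in spine_media_types(opf_xml)


if __name__ == "__main__":
    print(spine_media_types(SAMPLE_OPF))
    print(uses_dtbook(SAMPLE_OPF))
```

This only tells you what the book claims to contain; whether a given reader actually renders it is a separate question.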

Something even stranger happened when I bought (with a $2 discount, as this time it worked) Sourcery by Terry Pratchett… I started the series a year or two ago, but rather than getting the books, at the time I got the audiobook versions to get some sleep (I’m still doing the same thing, over a year and a half later… whenever I don’t have my iPod on during the night, I wake up feeling worse than when I went to sleep, because of bad nightmares…). Sourcery is the only one that I haven’t been able to listen to in its entirety since I started (well, I also didn’t listen to Mort, and rather read it as an eBook already). Unfortunately the downloaded ePub, even though it is not corrupt as far as unzip is concerned, cannot be viewed on the Reader: just like the DTBook version, it reports a “Page error”, shows no Table of Contents, and lists a start and end page of 1.

After unlocking the file with inept, I could load it in Calibre and… it actually reads fine. So the file is a valid ePub book; why on earth would the Reader not read it at all? Not something I can answer without having access to the sources, obviously. Luckily, at least this time I can read my book, since Calibre could process it and create a new ePub copy that the Reader actually seems to load and read.

Alas. I really have nothing else I could possibly say.

Cooling down about eBooks excitement

So I have written a few posts regarding eBooks in the past month or so, since I finally went to use my Sony eReader full time. Unfortunately, it failed for me yesterday, on the train back from Milan – where I was with a friend to show off his game – as I wanted to read The Salmon of Doubt which I bought from Kobo at the start of the month.

It failed me with a quite unimpressive “page error” so I thought the file was corrupted on the Memory Stick (or even the Memory Stick started to fail — they are not eternal, and this one has been passed down from a friend of mine to me for PSPs, and is now in the Reader, since digital distribution of PSP games called for something bigger than 1GB). I uploaded it to the Reader anew, and it still failed; I then decided to convert it with Calibre but it also failed (although, at least giving me an idea about what the problem was in the first place!).

The problem, as it turns out, is that the ePub specification is, like ODT, SVG and MP4/ISO Media, a specification that includes so much more than any single implementation will ever support. One issue that lately has been noted by many is that Apple’s iBooks application for the iPad, which supports ePub books, surprisingly does not support DRM’d files (well, at least not those DRM’d with Adobe Digital Editions), but it’s not the only one. In this case, while the Sony Reader supports Adobe Digital Editions files, it does not support DTBook files. And that is what my ePub file is, deep within.

Now, there are tools that supposedly convert one format to the other, yet they don’t seem to produce much of a good result, so I wasn’t able to get it to appear properly just yet. And this also requires me to tinker quite a bit with raw files I don’t know a thing about.

This starts to make me wary about eBooks… one out of fifteen up to now doesn’t spell trouble, but there are cases where it might not be so good to have them around. Add the fact that there is basically no content I could find in Italian as eBook, and I start to get afraid I can only partly replace dead-tree books for a long time still. Sucks!

Refreshing Gentoo Work

After a few months spent mostly working on lscube, I’ve been ignoring most of the non-basic Gentoo work for a while. Between last night (before going to sleep) and this morning, though, I started the catch-up work.

First of all, Tim released a new version of libarchive that required some testsuite fixing (and I hadn’t noticed the first time around that it now wrongly uses -Werror, since I have -Wno-error in my CFLAGS to avoid wasting time). Thankfully, Tim is a dream upstream to work with and the most important fix is already upstreamed.

Then I have been active in the Ruby area, both because I needed to work on the new Typo and because a few more packages are bundled with Typo’s code (you’re going to find a git branch with no third party code bundled in my git repositories when I’m done), and I got some new tasks to work on.

The gems problem, which is hopefully going to be solved after the Summer of Code, is for now just being sidestepped; indeed, I’ve ended up adding the will_paginate library with a fake gemspec, which actually works pretty nicely, without having the usual side effects of gems (no object files installed, no extra dependency on rake, no installed testsuite) and with the obvious advantages of the tarballs, including working testsuites (and tested), documentation built on request and installed, as well as examples. This package, and probably a few more before the end of the month, will be tested directly here on the blog, if you’re interested in the outcome.

I still have a few things that I was supposed to have done in the past month, among which are updating calibre (I’ve been using an old version on OSX up to now), figuring out why libcdio-0.81 freezes during install, and stuff like that. Hopefully I’ll also be able to find time for those now that my job is a bit safer than it was before.

Looking for comments about Sony’s Reader (PRS-505) and Linux

Dear lazyweb, this time I am asking for opinions and comments, rather than writing my own :)

Today I’ve been trying to clean up my home office, and I saw how many reference books I have that I could have downloaded, or at least bought, in PDF format, rather than printed or bought in solid paper form. While I’m quite a bookworm and would probably continue buying novels in solid paper form, I’m considering an electronic paper device for technical references (which are also those that take up the most space in my library at the moment).

The Sony Reader was suggested to me more than once. While I don’t really consider myself a Sony fan, I can see they are quite advanced in what they do, and being able to read standard PDFs should be good enough for me, as I just need it for reference manuals (for now). I also saw there is already a project to get the Reader to work on Linux, and it seems well developed, so I’m trusting I would get it to work on Linux quite easily.

But as I like first-hand impressions, does anybody have any comment to give me about this?