The second best thing about standards: different implementations

The nice thing about standards is that you have so many to choose from.
Andrew S. Tanenbaum

The quote from Tanenbaum is a classic, something that most developers will have to face at some point in their career. But I’d like to expand on that, taking Open Standards into consideration as well. Most Free Software developers (and, argh, advocates!) will agree that Open Standards are a very good thing: make sure that they are fully documented, let people develop royalty-free implementations, and you’ve got a win.

Or do you? As the title of this post lets you know, there is one further problem with the standards to choose from: their implementations. I’ve already delved into a number of problems related to standards and their implementation; for instance the KWord vs OpenOffice problem, with the two using (at the time they started boasting OpenDocument support) two completely different, non-interoperable methods to define bullet lists. And again with the inconsistent SVG implementations that cause the same file to appear in vastly different ways across different programs, without even an error being reported.

And eBooks are no different either; let’s leave aside the problem of formatting them (for instance, O’Reilly books are easily readable, but are actually formatted “randomly” for me, compared to others; or The Dragon Reborn, which probably underwent an OCR pass, given that Thom sometimes became Torn). I’ve already ranted about DTBook ebooks, but this time I’m seriously pissed.

Let me explain the whole DTBook problem again first, because it provides the basic context for the trouble that follows. I have a PRS-505 Sony Reader; when I bought it, it only supported PDFs (sort of) and Sony’s own BBeB/LRF format. Thankfully, Sony updated the firmware to add support for the ePub format, which is supposedly an open standard and should have a number of working implementations, on various operating systems and hardware devices. Apple’s iPad, among others, is supposed to read ePub files. So what’s the catch?

Well, first of all, since I brought up Apple’s iPad, there is the problem of DRM; ePub by itself does not really define a DRM scheme; O’Reilly does not use any DRM in their electronic media (bless them), and Apple does not support DRM-locked ePub files either (and as far as I know they provide no DRM for their files either, but I don’t have a device to test it myself). On the other hand, most online bookstores, and devices such as the Sony Reader or Kobo’s eReader, support Adobe’s DRM scheme, technically called ADEPT but marketed as “Adobe Digital Editions”. Of course, as far as I know at least, there is no open source software that can deal with ADEPT-locked files, although there is code out there that lets you unlock the files once you fetch your personal encryption key out of an enabled system.

Okay, let’s leave DRM out now and talk about the format itself. ePub files are ZIP files, not tremendously different from an OpenDocument file; they actually come with the same META-INF directory and mimetype file. Within that, you have a series of XML files, with the metadata of the book, the Table of Contents, a filename for the cover file, and the list of files with the actual book’s content. A note here: The Dragon Reborn, at least, appears to be a corrupted ZIP file to both unzip and the inept script, but is read fine by the Reader Library and by the Reader itself.
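For the curious, the container layout described above can be checked with a few lines of Python’s standard library. This is a minimal sketch, assuming an ePub 2 (OCF) container: it verifies that `mimetype` is the first entry and holds `application/epub+zip`, runs zipfile’s CRC check (`testzip()`) over every member, and reads `META-INF/container.xml` to find the OPF package file. `inspect_epub` is a hypothetical helper name, and passing these checks does not explain why a given reader or unzip might still choke on a file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in OCF (ePub 2) containers.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def inspect_epub(data: bytes) -> dict:
    """Return basic facts about an ePub archive, given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        # testzip() returns the name of the first member with a bad CRC, or None.
        bad_member = zf.testzip()
        # "mimetype" should be the first entry and contain exactly this string.
        mimetype_ok = (
            bool(names)
            and names[0] == "mimetype"
            and zf.read("mimetype") == b"application/epub+zip"
        )
        # container.xml points at the OPF file holding metadata and manifest.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path") if rootfile is not None else None
    return {"mimetype_ok": mimetype_ok, "bad_member": bad_member, "opf_path": opf_path}
```

Usage is a matter of passing the raw bytes of an unlocked file, e.g. `inspect_epub(open("book.epub", "rb").read())`.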

The content files can be in different formats; the most common case is (X)HTML, which, as you might expect, is the easiest to support, given the wide range of software rendering HTML out there. But a different format, called DTBook, was designed to support text-to-speech reading of audiobooks. Files can easily be called ePub even though the actual content is in DTBook, which most devices and software don’t support; neither the Reader nor Calibre supports that format, and thus they can’t read the copy I bought of The Salmon of Doubt (sigh!).
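Since the file extension doesn’t tell you which content vocabulary is inside, one hedged way to spot a DTBook book before loading it on a device is to look at the `media-type` attributes in the OPF manifest: OPS 2.0 lists `application/xhtml+xml` for XHTML and `application/x-dtbook+xml` for DTBook content documents. The sketch below assumes you have already extracted the OPF text (its path comes from `META-INF/container.xml`); `content_formats` and `uses_dtbook` are made-up names for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Two of the content-document media types OPS 2.0 allows.
XHTML = "application/xhtml+xml"
DTBOOK = "application/x-dtbook+xml"


def content_formats(opf_xml: str) -> dict:
    """Count manifest items per content-document media type."""
    root = ET.fromstring(opf_xml)
    counts = {XHTML: 0, DTBOOK: 0}
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        media_type = item.get("media-type")
        if media_type in counts:
            counts[media_type] += 1
    return counts


def uses_dtbook(opf_xml: str) -> bool:
    """True if at least one content document is DTBook rather than XHTML."""
    return content_formats(opf_xml)[DTBOOK] > 0
```

Stylesheets, images and fonts are simply ignored, since only the two content vocabularies matter for this check.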

Something even stranger happened when I bought Sourcery by Terry Pratchett (with a $2 discount, as this time it worked). I started the series a year or two ago, but rather than getting the books, at the time I got the audiobook versions to help me sleep (I’m still doing the same thing, over a year and a half later: whenever I don’t have my iPod on during the night, I wake up feeling worse than when I went to sleep, because of bad nightmares). Sourcery is the only one that I haven’t been able to listen to in its entirety since I started (well, I also didn’t listen to Mort, and read it as an eBook instead). Unfortunately the downloaded ePub, even though unzip doesn’t report it as corrupt, cannot be viewed on the Reader: just like the DTBook version, it reports a “Page error”, shows no Table of Contents, and lists a start and end page of 1.

After unlocking the file with inept, I could load it in Calibre and... it actually reads fine. So the file is a valid ePub book; why on earth would the Reader not read it at all? That’s not something I can answer without access to the sources, obviously. Luckily, at least this time I can read my book, since Calibre could process it and create a new ePub copy that the Reader actually seems to load and read.

Alas. I really have nothing else I could possibly say.

The impact of free in everyday life


When I was hoping to make use of Kobo’s offer (and failing), I bought the book Free, by Chris Anderson, out of curiosity. I haven’t read The Long Tail yet, but I’m now curious about it; too bad it isn’t available as ePub anywhere!

It is a very interesting read, especially for those of us who work on Free Software and services, and it made me think a lot. For instance, it ties in with what I wrote about Sony and their way of getting even more of my money by offering “free” games with the PlayStation Plus subscription.

But it also ties in with what O’Reilly does with their Ebook of the Day deals (you can get them through their Twitter feed), where they sell for $9.99 books that would otherwise cost quite a lot more; I actually used one of their special offers, where they let you get one of their books at that price... and I went for the otherwise too expensive CJKV Information Processing (it’s sold at $47.99).

How much does it cost O’Reilly to “give away” their books at a quarter of the price? Probably not much, given that they’re likely going to sell many more copies. For instance, besides the CJKV book, which I was planning to buy anyway at some point, I actually went out on a limb and bought Inside Cyber Warfare ($31.99) during an Ebook of the Day deal, just because it vaguely interested me and it was a third of the price. And the other day I was tempted to get Being Geek for about the same reasons.

D’uh! indeed.

But it doesn’t stop there; Anderson goes on to provide a few ways to “compete with free”, and brings up a point that Jürgen wrote about (sorry, I can’t seem to find the correct post): you can make people pay for something that is available for free by making it easier or faster to procure. I noted before, when I complained about the Mininova shutdown, that what I actually end up downloading “unauthorized” and not paying for is mostly stuff for which I would have to jump through way too many hoops to get legitimately. Real Time with Bill Maher is an example: I would very much prefer to have a (paid) feed that automatically downloads the episodes so that I can watch them on Saturday morning, rather than having to wait until somebody who recorded them in the US or Canada uploads them to Demonoid or some other place, so that I can fetch them.

A similar issue happens with the Japanese music I love. I have a number of original albums there as well: some I bought via the iTunes Store (kudos to Apple where it’s due: they dropped DRM and made it possible for me to buy Hikaru Utada’s music without going through illegal channels), one or two actually reached the European market, and the others... I made a single order on Amazon JP and pretty much regret it: on an €80 order I ended up paying €30 for shipping and then over €50 in customs, which were calculated not only on the value of the order but also on the shipping, with VAT then applied on top of the customs charges... it’s ludicrous.

In a similar fashion, I’m generally happy to pay for (or receive as gifts) CDs of Metal music such as Blind Guardian, Rhapsody of Fire or Avantasia, both because I can hear the difference in the compressed iTunes Store versions and because Nuclear Blast, the current label of all three, is actually providing a nice package with their special editions. Take At The Edge of Time, the latest album by Blind Guardian, released on August 2nd: the special edition was priced at £13 at pre-order; it’s a 2-disc edition, with a very nice boxset and a “special online code” that provided access to a “making of” video, the “Sacred Worlds” CGI video from Sacred 2, and a demo mp3... all without DRM. The Avantasia double-album set was a similar case. In Italy, these albums would probably have been priced at no less than €50, maybe even €70... and at that price, I wouldn’t have bought them at all.

But if up to now it’s just been comparing “unauthorized” copies with paid copies, what about content that is already free, in some if not all senses of the word? When I first blogged about the Reader supporting ePub, many people suggested I rely on Project Gutenberg, since they provide ePub-format books. Well, I tried, and I actually read The Picture of Dorian Gray this way; while it was a bit cumbersome, it was an acceptable ePub book. When I tried again with Nathaniel Hawthorne’s The Scarlet Letter, I was disappointed: of the two versions, one with illustrations and one without, neither had a decent layout. I could have spent some time trying to fix it so that it flowed properly on the Reader, but... the alternative was to pay $3 and get a properly-formatted copy from KoboBooks, so I went with the second option.

As another example of how free (as in gratis) content can bring sales for the paid equivalent, I can point to BBC’s NewsQuiz: the show is available for free as a BBC Podcast on the day it airs, but only the latest episode is. On the other hand, the BBC has published a number of CDs with selected past episodes... mostly thanks to caring users, I have almost the full collection, and thanks to Amazon’s recommendations I also discovered I’m Sorry I Haven’t a Clue. Once again, a free download was the direct cause of a sale for the BBC.

So even if the content is free, I’m happy to pay for the container... if it’s worth it. And the same is likely true for other people. At the end of the day, this is another thing you can make people pay for... and I don’t think this should be considered “bad” by anyone who truly cares about freedom, rather than just wanting to “stick it to the man”...

An open letter to Kobo

Kobo is an interesting website that sells ePub e-books compatible with Adobe Digital Editions (so, DRM’d), which I have written about before; unfortunately, a couple of interactions with the website and its features lately have been slightly upsetting. So I’d like to simply express my opinion about it...

Dear Kobo,

Looking for a website that would sell me ePub books I could read on my Sony Reader device, I have to say that yours is the most appealing; the other decent option for novels and non-fiction (non-technical) books is the British WHSmith, but their website is a bit difficult to grok and feels a bit... old style.

It also helps that the Dollar is still lower than the Euro, while the Pound is higher, which means that in general (minus strange cases like the sixth Hitchhiker’s book, “And Another Thing”, which seems to cost more on your website than at other stores) the price is quite good for me. I’m also pretty happy that I can get books released in America but not yet in Europe, as happened with Cyber War, which I thoroughly enjoyed.

Unfortunately, things started to get strange when I bought “The Salmon of Doubt”... and was unable to read it on my Reader; it turned out that I had to work around the DRM to find the cause of the problem: the internal format of all the other books I bought is XHTML, while that one uses DTBook. Somehow, both the Sony software and your website handle it, but it fails badly both on the Reader and in your own Android application. Which probably means you didn’t plan for it either. I wonder if your eReader device actually reads those.

The other problem I noticed is that, besides a number of books not available from you but available, again, on WHSmith (about which I obviously can’t say much: either you have them or you don’t, like any other book store), there is an annoying gap in the volumes of book series. For instance, I could get Jim Butcher’s Grave Peril from you, but then I had to turn to WHSmith for Summer Knight, Death Masks and Blood Rites, since you don’t have them. Similarly, yesterday I noticed that while I got Robin Hobb’s “Assassin’s Apprentice”, the second volume (Royal Assassin) is not available, but the third (Assassin’s Quest) is.

It is an annoyance – especially since I prefer getting books on your site also because they are available on the web to read with my Linux laptop, and on my Android phone, while WHSmith’s books I can only load on the Reader or read with the official applications, minus breaking the DRM again – but a minor one at that.

It got a bit worse when I received a promotional email last week with a $2 discount: not a lot, but since I’m quite far through The Dragon Reborn (on my second try; I abandoned the first one two years ago, in the hospital), I thought it would be a good chance to find something new to read after that. I tried it a couple of nights ago with a few of the books I had bookmarked, but... for all of them it reported the code as expired or not applicable.

Tonight, you replied to me on Twitter saying that only a subset of books is eligible, and provided me with a list... but once again, applying the code to two books in that list reports that it is expired or not available. It certainly wasn’t expired, since the mail said it would expire at “29 June 2010 11:59pm EST”, and at the time it was something like 17:00 EDT; at the same time, the mail made no reference to the list I was given, nor to the discount applying only to a subset of the books (well, that was understandable anyway).

In the end, I bought the one book from that list that I didn’t know already and that looked interesting and relevant to my areas of interest; for the curious, it was Free by Chris Anderson, and even without the discount it was at a decent price. But all of this feels like quite the kerfuffle.

I think it would be good for both you and your customers if you could get these things sorted out properly; as I said, I’m very happy to continue being your customer, and I don’t even care about promotional codes (after all, $2 is almost nothing), but this doesn’t feel right.

Oh, and if you happen to be able to... could you please make a Linux application to download Adobe Digital Editions files? Thank you!

Again on procuring eBooks

I know that most of you who read my blog daily don’t care about my toying with eBooks and only read it for the technical articles; on the other hand, I feel I can at least talk a bit about this, given that most of my personal life is uninteresting and I rarely write about it at all.

Anyway, you might remember I had some trouble finding where to buy eBooks, and in the end I settled on WHSmith and Kobo (for non-technical books, that is), as they both sell Adobe Digital Editions ePub books. Finding mainstream non-DRM ePubs seems to be impossible; maybe Apple’s iBooks store has them, but that still doesn’t warrant getting an iPad to try. Still, if you have an iPad or iPhone and can tell me whether that’s the case, I’d be curious: a second-hand old-generation iPhone shouldn’t be too expensive, and if it can get me access to mainstream non-DRM’d ePubs it might be worth it.

Anyway, the two sites above give me enough access that I don’t miss most of what I usually read; indeed, Kobo has provided me with a few curious readings that I might as well try. Also, even though the Dollar is rising again, buying the books from Kobo is, for me, slightly cheaper than from WHSmith.

Also, the fact that they are not just a simple eBook store makes them more intriguing. I’m not that enticed by their eReader (given I already have my PRS-505 and I’m not going to drop it any time soon), but I like the fact that they have applications available for a number of platforms (but not Linux, dang it! If they had one, and it supported activation of Adobe DRM’d ePubs, they would be so great that I could consider getting the eReader if only to fund them further). Even if I will probably not use those, I can still enjoy the fact that they let me read the books I buy on the web with any browser, on my Reader in ePub format (and thus anywhere the ePub format can be read!) and, for a few days now, also on my Milestone thanks to their Android application.

A word about the DRM here. As I said already, I’m one of those people who prefer to abide by restrictions as long as they are an acceptable tradeoff (for instance, the audiobook DRM on iTunes is acceptable because the audiobooks cost a lot less than in unencumbered form). While I can understand why most publishers won’t even consider not using DRM on the files, and I accept that at least this way I can get eBooks at all, I don’t think the tradeoff is useful to the user in this case. Indeed, given that not all devices using ePub support Adobe Digital Editions, it can be quite harsh to have it applied. Add to that the fact that not all ePubs are the same, so you might have to access the content of the archive to change it into something usable, and you get the picture. Luckily, the ADEPT DRM has long been broken, so it’s not difficult to get clean files.

Anyway, as I said, Kobo looks like a nice choice to me because of the additional applications (just to put it into perspective: while I’m not considering buying an iPad, were I to, I could still read the books I bought from Kobo without going around the DRM, as they have an iPad application); for instance, I could easily read The Salmon of Doubt from my browser, even though the ePub version uses the infamous DTBook format mentioned above. Unfortunately they don’t have *everything*... not yet at least.

Anyway, last night I didn’t sleep, so I could finish reading Assassin’s Apprentice (somebody suggested it to me a few years back, but I decided to read it now because some friends and I went to a fair where the author was too). Nice book indeed, just a bit “slow” (it took me almost a month to read it fully, and it’s just 400 pages). Next, though, I wanted to come back to the Dresden Files; Butcher’s style is enchanting. Three books in, I was up to Summer Knight; I got Grave Peril from Kobo, so I assumed they had the next one as well; somehow, they don’t. So in the end I got it from WHSmith; it makes little difference, but it still strikes me as odd.

And in all this, there seems to be no shop for Italian eBooks; sigh. If only ChiareLettere had ePubs available... their books are quite bulky, and I would love to give them away and trade them for digital copies. I wonder if I should acquire more (technical) skills in this kind of publishing and offer to handle that kind of stuff myself. I would also know where to start, maybe.

Parsers, tokenizers and state machines

You might remember that I have been having fun with Ragel in the past, since Luca proposed using it for LScube projects (which, as far as I am concerned, came down to using it in feng). The whole idea sounded overly complicated at first, but then we were able to iron out most, if still not all, of the kinks.

Right now, feng uses Ragel to parse the combined RTSP/HTTP request line (with a homebrew backtracking parser, which became the most difficult feature to implement), to split the headers into a manageable hash table (converting them from string into enumerated constants at the same time, another kinky task), to convert both the Range and Transport headers for RTSP into lists of structures that can be handled in the code directly, and for splitting and validating URLs provided by the clients.
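To make the “lists of structures” part concrete, here is a rough sketch of what splitting an RTSP `Transport` header (RFC 2326 syntax: comma-separated transport specs, each a protocol followed by `;`-separated parameters) into structures looks like. It uses plain Python string splitting rather than Ragel, and `TransportSpec` and `parse_transport` are hypothetical names, not feng’s actual types; it also does no validation of parameter names or port ranges.

```python
from dataclasses import dataclass, field


@dataclass
class TransportSpec:
    protocol: str                               # e.g. "RTP/AVP" or "RTP/AVP/TCP"
    params: dict = field(default_factory=dict)  # parameter name -> value or True


def parse_transport(value: str) -> list:
    """Split an RTSP Transport header value into a list of TransportSpec."""
    specs = []
    for chunk in value.split(","):
        parts = [p.strip() for p in chunk.split(";") if p.strip()]
        if not parts:
            continue
        spec = TransportSpec(protocol=parts[0])
        for param in parts[1:]:
            # Flags such as "unicast" carry no value; the rest are key=value.
            key, sep, val = param.partition("=")
            spec.params[key] = val if sep else True
        specs.append(spec)
    return specs
```

A generated state machine would do the same splitting in a single pass over the bytes, rejecting malformed input as it goes instead of after the fact.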

While the Range, Transport and URL parsing is done directly with Ragel, and without an excessive amount of blunt force, both the parsing of the request line and the tokenizing of header names are “helped” by a huge, complex and absolutely ludicrous (if I may say so) Rube Goldberg machine that uses XML and XSLT to produce Ragel files, which then produce C code, which is finally compiled. You might note here that we have at least three sources and four formats: XML → Ragel → C → relocatable object.
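For comparison with the generated C, the header-name tokenizing step boils down to walking a small state machine built from a fixed list of names and landing on an enumerated constant. The following sketch builds a character trie in Python and walks it case-insensitively (header names are case-insensitive in both HTTP and RTSP); the `Header` enum and its members are a hypothetical subset, not feng’s real list, and a Ragel-generated machine would be compiled tables or gotos rather than nested dicts.

```python
from enum import Enum


class Header(Enum):
    # Hypothetical subset of header names, for illustration only.
    UNKNOWN = 0
    CSEQ = 1
    CONTENT_LENGTH = 2
    CONTENT_TYPE = 3
    SESSION = 4
    TRANSPORT = 5
    RANGE = 6


NAMES = {
    "CSeq": Header.CSEQ,
    "Content-Length": Header.CONTENT_LENGTH,
    "Content-Type": Header.CONTENT_TYPE,
    "Session": Header.SESSION,
    "Transport": Header.TRANSPORT,
    "Range": Header.RANGE,
}


def build_trie(names: dict) -> dict:
    """Build a character trie (a small DFA); the None key marks an accepting state."""
    root: dict = {}
    for name, token in names.items():
        node = root
        for ch in name.lower():
            node = node.setdefault(ch, {})
        node[None] = token
    return root


TRIE = build_trie(NAMES)


def tokenize_header(name: str) -> Header:
    """Walk the trie one character at a time, as a generated state machine would."""
    node = TRIE
    for ch in name.lower():
        node = node.get(ch)
        if node is None:
            return Header.UNKNOWN  # fell off the machine: not a known header
    return node.get(None, Header.UNKNOWN)  # a known prefix alone is not a match
```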

The one thing that we aren’t doing with Ragel right now is the configuration file parser. Originally, Luca intended the configuration file for feng to be compatible with lighttpd’s, in particular allowing bits and pieces to be shared between the two. As it happens, feng really needs much less configuration, right now, than lighttpd ever did, and the only conditional we support is based on the $SOCKET variable, to produce something akin to vhost support. The lighttpd compatibility is taking a huge toll in the amount of code needed to actually parse the file, as well as in the time and space wasted parsing it into a tree of tokens and then running through them to set the parameters. And this is without going into the problems caused by actually wanting to use the “lemon” tool.

Unfortunately, neither replacing the parser (for the same format) with Ragel nor inventing a new format (keeping the same features) to parse with Ragel looks interesting. The problem is that Ragel provides a very low-level interface for parsers, and does not suit the kind of parsing a configuration file needs very well. There is a higher-level parser generator based upon Ragel, by the same author, called Kelbt, but it’s barely documented and seems to produce only C++ code (which, as you probably know already, I dislike).

I definitely don’t intend to write my own Ragel-based parser generator like I did for RTSP (and HTTP); I also don’t want to lose support for “variables” (since many paths are created by appending sub-directories to root paths), so I don’t think switching to plain INI would be a good idea. I considered using the C preprocessor to convert a user-editable key-value pair file into a much simpler one (without conditionals, vhosts and variables) and then parse that, but it’s still a nasty approach, as it requires a further pass after each edit and depends on the presence of a C preprocessor anyway.
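As a sketch of the “variables” requirement on its own, here is a tiny key-value parser where a value is a `+`-concatenation of double-quoted strings and previously defined keys, loosely modeled on lighttpd’s `var.` style. The syntax is an assumption for illustration, not feng’s format: there are no conditionals and no vhosts, and it does not handle `#` or `+` inside quoted strings.

```python
import re

# Hypothetical line format: key = term (+ term)*, where a term is either a
# double-quoted string or the name of a previously defined key.
LINE = re.compile(r'^\s*([\w.-]+)\s*=\s*(.+?)\s*$')
TERM = re.compile(r'"([^"]*)"|([\w.-]+)')


def parse_config(text: str) -> dict:
    values = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0]  # strip comments ('#' inside strings unsupported)
        if not line.strip():
            continue
        m = LINE.match(line)
        if not m:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        key, expr = m.groups()
        parts = []
        for term in expr.split("+"):  # '+' inside strings unsupported
            t = TERM.fullmatch(term.strip())
            if not t:
                raise ValueError(f"line {lineno}: bad term {term.strip()!r}")
            literal, ref = t.groups()
            if literal is not None:
                parts.append(literal)
            elif ref in values:
                parts.append(values[ref])
            else:
                raise ValueError(f"line {lineno}: undefined variable {ref!r}")
        values[key] = "".join(parts)
    return values
```

Expanding variables at parse time, rather than building a token tree and walking it later, is what keeps this small; the trade-off is that a variable must be defined before it is used.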

I also don’t intend to use XML for the configuration file! While I have used XML in the above-mentioned Goldberg machine, it smells funny to me as well as to everyone else. Even though I find delusional those people who judge anything using XML as “crap”, I don’t think XML is designed for user-editable configuration files (and I seriously abhor non-user-editable configuration files, such as those of Cherokee, to name one).

Mauro, some time ago, suggested I look into Antlr3; unfortunately, at the time I had to stop because of the lack of documentation on how to do things with it. I had already had enough trouble getting proper information out of the Ragel documentation (which looks like the thesis document it is). Looking up Wikipedia earlier for a couple of documentation links, it seems the only source of good documentation is a book; my time to read another technical book right now is a bit limited, and spending €20 just to decide whether I want Antlr or not doesn’t look appealing either.

Anyway, setting aside the Antlr option for a moment, Luca prodded me about one feature of Ragel I had overlooked, maybe because it was introduced after I started working on the parser: scanners. The idea of a scanner is that it reduces the work needed to properly assign priorities to the transitions, so that the code attached to the longest-matching pattern is the one executed. This would be nice, wouldn’t it? Maybe it would even allow us to drop the Goldberg machine, since the original reason to have it was, indeed, the need to get those priorities right without having to change all the code repeatedly until it worked properly.
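The longest-match rule that scanners implement can be illustrated without Ragel. In this Python sketch, using the standard `re` module, every rule is tried at the current offset, the longest match wins, and on a tie the rule listed first wins, which is how I understand Ragel’s scanner semantics. The token names and patterns are hypothetical, and a real scanner compiles all rules into one state machine instead of trying them one by one.

```python
import re

# Rules in priority order: on equal-length matches the earlier rule wins.
RULES = [
    ("METHOD", re.compile(r"DESCRIBE|SETUP|PLAY|PAUSE|TEARDOWN|OPTIONS|GET")),
    ("WORD", re.compile(r"[A-Za-z_-]+")),
    ("SPACE", re.compile(r"[ \t]+")),
    ("URL", re.compile(r"rtsp://\S+")),
    ("VERSION", re.compile(r"RTSP/\d\.\d")),
]


def scan(text: str) -> list:
    """Split text into (rule name, lexeme) pairs using longest-match semantics."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (length, rule name) of the longest match so far
        for name, rx in RULES:
            m = rx.match(text, pos)
            length = m.end() - pos if m else 0
            # Strictly longer only: ties keep the earlier rule.
            if length > 0 and (best is None or length > best[0]):
                best = (length, name)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        length, name = best
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens
```

With these rules, `PLAY` is a `METHOD` (same length as `WORD`, listed first), while `PLAYER` is a `WORD`, because the longer match beats the earlier rule.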

Unfortunately, to use Ragel scanners I first had to change the Goldberg machine, splitting it further into another file, since scanners can only be called into, rather than included in, the state machine (and having the machine defined in the same file seems to trigger the need to declare all the possible variables at the same time, which is definitely not something I want to do for every state machine). This was bad enough (making the sources more disorganised, and the build system more complex), but after I was able to build the final binary, I also found an increase of over 3KB of executable code, without optimisations turned on (and since this uses state tables rather than goto statements, an increase in code size is a very bad sign).

Now, it is very possible that the increase in code size gets us a speed-up in parsing (I sincerely doubt it, though), and maybe alternative state machine types (such as the above-mentioned goto-based one) would provide better results still. Unfortunately, assessing this gets tricky. I already tried writing some sort of benchmark of possible tokenizers, mostly because I fear that gperf, belying its name, is far from perfect and produces suboptimal code in too many cases (and gperf is used by many, many projects, even at a basic system level). The problem with that is that I suck at both benchmarking and profiling, so it started.
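On the benchmark side, here is a hedged sketch of one possible harness shape: feed the same inputs (hits and misses) to each candidate lookup, check that they agree, then take the best of several timed runs with `timeit`. This is Python, so absolute numbers say nothing about gperf or Ragel output in C; the keyword list and both lookup functions are hypothetical stand-ins for a `strcmp()` chain and a hash table.

```python
import timeit

# Hypothetical keyword table standing in for a tokenizer's header list.
KEYWORDS = ["Accept", "Bandwidth", "Blocksize", "CSeq", "Connection",
            "Content-Length", "Content-Type", "Range", "Session", "Transport"]
TABLE = {k.lower(): i for i, k in enumerate(KEYWORDS)}


def linear_lookup(name: str) -> int:
    """strcmp()-style chain: compare against each keyword in turn."""
    lowered = name.lower()
    for i, k in enumerate(KEYWORDS):
        if k.lower() == lowered:
            return i
    return -1


def hashed_lookup(name: str) -> int:
    """Single hash-table probe, roughly the role gperf's output plays in C."""
    return TABLE.get(name.lower(), -1)


def bench(func, inputs, repeat=5, number=2000) -> float:
    """Best-of-`repeat` wall time for running func over all inputs `number` times."""
    timer = timeit.Timer(lambda: [func(s) for s in inputs])
    return min(timer.repeat(repeat=repeat, number=number))
```

The agreement check matters as much as the timing: a faster tokenizer that returns the wrong token is useless, and taking the minimum of several runs filters out some of the noise from the rest of the system.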

Anyway, comments are open: if you have any ideas on how I can produce the benchmark, or suggestions about the configuration file parsing and so on, please let me know. Oh, and there are actually two further custom file formats (domain-specific languages to all effects) that are currently parsed in the most suboptimal of ways (strcmp()), and we would need at least one more. If we could get one high-level but still efficient (very efficient, if possible) parser generator, we would definitely gain a lot in terms of managing sources.

Cooling down about eBooks excitement

So I have written a few posts regarding eBooks in the past month or so, since I finally started using my Sony Reader full time. Unfortunately, it failed me yesterday, on the train back from Milan (where I went with a friend to show off his game), as I wanted to read The Salmon of Doubt, which I bought from Kobo at the start of the month.

It failed with a quite unimpressive “page error”, so I thought the file was corrupted on the Memory Stick (or even that the Memory Stick had started to fail; they are not eternal, and this one was passed down to me by a friend for the PSP, and is now in the Reader, since digital distribution of PSP games called for something bigger than 1GB). I uploaded it to the Reader anew, and it still failed; I then decided to convert it with Calibre, but that failed too (although at least it gave me an idea of what the problem was in the first place!).

The problem, as it turns out, is that the ePub specification is, like ODT, SVG and MP4/ISO Media, a specification that includes so much more than any single implementation will ever support. One issue that many have noted lately is that Apple’s iBooks application for the iPad, which supports ePub books, surprisingly does not support DRM’d files (well, at least not those DRM’d with Adobe Digital Editions), but it’s not the only one. In this case, while the Sony Reader supports Adobe Digital Editions files, it does not support DTBook files. And that is what my ePub file is, deep down.

Now, there are tools that supposedly convert one format to the other, yet they don’t seem to produce very good results, so I haven’t been able to get it to display properly just yet. And this also requires me to tinker quite a bit with raw files in a format I don’t know a thing about.

This is starting to make me wary of eBooks... one out of fifteen so far doesn’t spell trouble, but there are cases where it might not be so good to have them around. Add the fact that I could find basically no content in Italian as eBooks, and I’m starting to fear I can only partly replace dead-tree books for a long time still. Sucks!

Technical eBooks? Scarcer than I’d have said!

Do you remember I went back to using the Reader with proper (ePub) content? It also turned out pretty well when I could get a newly-released book even before its release in Europe (and for a much lower price).

A month after resuming this, I have to accept an absurd reality: it’s much easier to find novels than technical books in ePub format! Now, it is true that most of the O’Reilly catalogue is available in ePub format (one exception being CJKV Information Processing which, because of the complexity of the scripts it covers, is only available as PDF; and it would still have been pretty expensive had I not made it to the 1-day offer the other day of getting any eBook for $10; for that price, even just a PDF is good enough), but they seem to be an exception.

Indeed, Addison Wesley does not seem to have their catalogue available as eBook at all! And they tend to have some very interesting books – some of which I read thanks to the gifts received – if they had them available as eBook, I would probably be buying a few more of them!

Tonight I was also looking at MIT Press, since I would like to convert my current shelf to eBooks, for those titles that I’m still interested in having around and which are available as eBooks, obviously; Using OpenMP is one of them. My idea was to find out how much they would cost me as eBooks, which is usually a fraction of their original price, and “sell” the paper copies for the same price to interested friends. While they do have an eBook store, it doesn’t have their older titles, and it leaves a lot to be desired.

My reason for wanting to convert what I already have is that I’m getting ready to pack and get the hell away from home; lots of things are going on and I’m in the middle of a very nasty family situation. I’ll be looking for my luck elsewhere, most likely in Turin, hopefully later this year. But before leaving, I’m trying to get rid of some baggage, both psychological and physical; books are something that, while I’d be sad to part with them, I cannot afford to bring with me when I move.

Incidentally, I’m in a bit of a pinch with CDs and DVDs as well... I already rip all my CDs and music DVDs to bring them with me more easily on the iPod, but I don’t want to get rid of the originals; I guess that once I move I might still keep some “physical storage space” here for them. I’ve already moved to buying music digitally (through the iTunes Store; thankfully they don’t have DRM any more!), but audiobooks are still “protected”, as they tell you (crippled, as I’d put it), and metal loses some edges when encoded. Let’s not even get into digitally-distributed movies. And yes, I’m the kind of person who gladly pays for content.

On the other hand, for fiction and non-fiction books there are quite a few possible stores, such as Kobo and Waterstones (I originally listed WHSmith here too, then struck it out); the only problem I have with them is that none of them supports a wishlist. I’d love to replace the one I have now on Amazon with one for eBooks: they’d be cheaper and I’d have less trouble bringing them around.

Anyway, I’m still baffled by the lack of vast archives of technical eBooks.

Book Review: Cyber War

Since the Sony Reader has now become much more useful, I decided to put it to good use right away. Not only did I finally finish the Time Management book from O’Reilly, I was also able to read The God Delusion (shame on me for not having read it before! Not that I needed it to “convert”: I’ve been sure about my atheism for over half my life). Reading on the device, even non-technical books, is definitely pleasant, so in the future I’ll try to get an electronic edition of any kind of book before asking for a copy on dead trees.

A little note here, since I read some nasty comments about my atheism from the usual clique of “Free Software Advocates”. Not only are you lame for attacking a developer who spends almost all of his free time working on free software, but if you decide to attack on these grounds, you’re beneath me. On that note, I’m very tempted to just add tracking for visitors coming from these cliques, and refuse their comments on my blog altogether.

But back to the topic: eBooks have this interesting property of feeding my impulse shopping. You want to read something in particular? Look it up, and in less than half an hour you can start reading, and that includes registering on the site if it’s the first time you drop by. So, last Saturday I was watching the latest episode of Real Time with Bill Maher (which, I’ll keep admitting, I’m downloading illegally, as HBO won’t even let me pay for it here!), and the special guest actually surprised me.

When he was announced, Richard A. Clarke sounded to me like yet another of the usual government pinheads who complain about the way the Internet is (in Italy, that’s basically the entire political class, but I also clearly remember an Obama comment about the Internet when asked about marijuana, again on Bill Maher some time ago). But the book he presented, Cyber War, definitely struck a nerve with me.

I found the book at Kobo, even though it’s still (obviously) unavailable on Amazon UK. Nine euros later I was reading, and it was three in the morning! I finished the book today, reading on and off every time I was too “cooked” to work. It gave me the creeps and hope at the same time.

Despite the evocative “cyber” prefix attached to everything, which reminds me more of Neuromancer than of a non-fiction, state-of-affairs book, Clarke seems to know what he’s talking about. Most of his points are, realistically, more tied to the United States of America than to the rest of the world, and he admits that more than a couple of times, but it got me thinking.

I’m relatively uninterested in the military, warfare and conflict aspects that he obviously talks about extensively; that’s what the book is about to begin with. But reflecting upon the sheer number of interconnections between the “wild Internet” and critical systems scares the crap out of me. You would expect important systems like power grids and railways not to interface with the Internet, and most likely they don’t, directly, but there certainly is a “hop connection” (like the six degrees of Kevin Bacon) which, even for me, working more or less in the field, isn’t obvious at first.

What woke me up while reading that book was the discussion of rail services. In the USA, I guess most rail services are freight transport rather than public transportation nowadays; in Italy, trains are mostly people-oriented as far as I can tell, which makes them less of a logistical target and more of a civilian one. The most obvious Internet-connected part of a transportation system is the reservation system, since you usually order tickets online. But there are others, closer to the actual operation of the trains. Last time I was in Milan, I was able to check on my cellphone how much delay my train had… what scares me to think about now is that the moment the train passed through a station, even without stopping, the website would have told me: somewhere along the line, the systems tracking the trains are connected to the public Internet.

Similarly, the Italian power company has converted most houses to electronic meters; this way they don’t have to send personnel once a year to read the data. This works to the convenience of us users most of the time (before, you paid based on what they expected you to consume until they checked the real usage… you either paid extra the whole year, or you’d have to pay a huge amount to cover what you had consumed unexpectedly), but on the other hand, there is now a connection between your house and the power company’s system… I just hope they don’t have enough bandwidth for that to be really problematic.

The book doesn’t try to sell us the kind of “hacking” seen on TV in shows like CSI, which still seem to think that an electric outlet can get you into the FBI databases; instead, with due handwaving for a non-technical book, it explains what the main problem is: you only need to trigger a pre-defined event. In the case of the devices I named above, it would “just” take a malicious piece of software (which the book consistently calls a “logic bomb”) that waits for a precise sequence of inputs to trigger a cascade failure… you could feed those inputs through those devices and be nearly untraceable (this reminds me of the slot-machine code in Ocean’s Thirteen, but I digress once again).

Clarke repeats over and over that one of the main problems to solve is making sure that you can trust the code that runs on your systems, and on that note, while he doesn’t make it too explicit, I think he would welcome all critical systems running on open source software (he mentions open source as a tool in passing a couple of times, and rants about Microsoft for over an entire page!). Open source obviously doesn’t guarantee that there is no malicious code (it would be very, very difficult to audit all of it and be sure that no intentional trigger sequence is hidden in it), but it certainly makes such problems easier to spot than closed-source software does. He definitely seems to understand that security by obscurity is of no use, especially as he admits that both the US government and others (mostly China) have been scouring the private “intellectual property” of manufacturers: it might be obscure to you and me, but sure as hell it is not obscure to a military force that wants to exploit it!

In the light of this book, efforts such as SELinux and the Coverity funding for auditing Linux and other components of Free operating systems make total sense. We can all make use of those, but they exist primarily because the government wants to be able to trust the code. Some of the ideas Clarke gives in the book seem a bit much to me, though.

While the “back to mainframes” idea he suggests (specifically noting that Vint Cerf wouldn’t agree with it) might sound like the strangest option, I actually think he has a point there. Total separation might be impossible for some tasks, and we’re definitely too used to being able to do everything and the kitchen sink with the same software, but it might be worth considering, in some domains at least.

The one I find far-fetched is the idea of having Artificial Intelligence write the software from scratch, to avoid the “human factor” introducing bugs by mistake and backdoors by design. It’s not pure fear that makes me cringe there, but rather a chicken-and-egg problem: who writes the AI in the first place? We have already had examples of software programmed to hide malicious code within other software (cf. Reflections on Trusting Trust by Ken Thompson).

At any rate, this is one of those books I’d suggest you all read. In the end, it gives you so many interesting points of view that you’ll really resent the way “cyber crime” is portrayed by CSI or, to a lesser extent even, by Dan Brown.

Good news for my Readers

Sony Reader PRS-505 v1.1

I’ve been told I’m overly negative and critical about everything, be it Free Software or not… it’s probably true; the problem is that in my line of work I always end up assuming the worst, and that shows in my general behaviour. I’d rather be proven wrong when I expect something to break and it works fine, than be proven wrong when I expect something to work and it crashes in flames.

But for once I am happy about something, and I’m going to write about it. And, going against the current trend, I’m also going to start with a “thanks, Sony!”, although this has nothing to do with the PlayStation 3, of course.

Do you remember that two years ago I bought a Reader (Sony PRS-505)? I’ve been using it on and off since then, although I ended up using it mostly for smaller things, such as plain text converted to LRF, rather than full-blown books, because of the problems with viewing PDF files.

Well, yesterday I received an advert from O’Reilly, where I bought some of the PDF books I have, and I noticed this line: “PDF, .epub, Kindle-compatible .mobi, and Android .apk” (with a link to a blog post announcing ePub support on the Reader… as it happens, the announcement came out right when I was spending most of my time with the Reader, and yet it escaped me, probably because that was also when I ended up in the hospital).

The reason I actually read through this O’Reilly advert (I admit I don’t usually look at that stuff much; I don’t send it to spam, but most promotional messages get filed into a different folder/label that is cleared out pretty soon) is that I am probably going to buy CJKV Information Processing as soon as I have time to read it (I was almost going to buy it right away because I needed it to finish a job, but since I now consider that customer “at risk”, that’s no longer the case).

Anyway, having read this particular note, I went to look at the O’Reilly website; after logging in, the “Electronic Media” section wasn’t working properly (I only got a “Data Error” while loading), but a mail to O’Reilly (and about two hours) later, I could finally see the books I bought in June 2008, available in many different formats. Not all of them were available in all formats, but all of them were available in PDF (which I already had) and ePub (which is what the Reader reads). I downloaded them and moved on to the next step: getting them onto the Reader itself.

The problem here was that… it had been months since I last used the Reader, so its battery was totally drained. That wouldn’t be such a problem, except that it doesn’t charge via USB when it’s drained; it needs an external charger. While it has a connector for that, it didn’t come with a charger (and even that would have been of limited use to me, given that this model wasn’t sold in Europe, so in the best of cases it would have come with an American plug). But this time the connector looked familiar, with its yellow colour-code. A quick search later, it turns out that Sony did something good: it uses the same charger cable as the PSP (PlayStation Portable)… which I’ve had since last year, as a Christmas present from my sister.

I actually wondered a bit about the relationship between the Reader and the PSP: not only do they have the same charger, but they both use Memory Stick Pro Duo cards (and since I have used the SD card for other things, namely photos, since the last time I used the Reader, I set it up with a Memory Stick this time; I had a 1GB one floating around since I upgraded the PSP’s to 4GB to store digitally distributed games). On the other hand, Sony seems to have had enough common sense to support SD cards on the Reader as well.

I charged the Reader up and copied the ePub over with Calibre… and the Reader didn’t find it, because I hadn’t updated the firmware when it was released. Okay, time to look for the firmware update, which can, obviously, be found on the Sony product page… but only for Windows. Now, this is not nice, but it’s all too common… and, just as with Nokia, there is an absurdity: the management software is available for Mac OS X as well, but the firmware updater isn’t. Sigh. This is why I keep Windows virtual machines around.

After the upgrade, and after uploading all the books to the Reader, I was finally able to try it out, and the result is sweet as hell! The O’Reilly ePubs look much, much nicer than the sample books shipped with the Reader itself, and are very readable even at the “small” size. I’m most likely going to get the CJKV book named above from O’Reilly at this point, to read on the Reader (the reasoning being that I’m scheduled to travel a lot in the next month, and carrying around a 900-page book on East Asian languages, in English, in Italy, would really mark me as too much of a nerd… besides being clumsy).

This is a photo of my Reader displaying a page from the book Time Management for System Administrators, which I had last been struggling to read with the “previous Reader” (which is actually the same hardware, but it really feels like something new now that I can use ePub). You can see from the photo that it actually reads quite nicely.

Sure, this model is now considered quite obsolete: it has no touchscreen, it’s slow, it has no wireless connection, and all that stuff, but it feels quite solid, and even two years later, the firmware update really gave it a new lease of life. And of course, given that this version is basically “end-of-lifed”, there is an alternative Free firmware (given that the original firmware is based on Linux, this is not surprising), which seems to fill some of the feature gaps.

Now, can somebody point me to some other publishers of ePub material? I don’t expect much fiction to be available in ePub format, but maybe there is something out there. From what I can tell after a quick search, WHSmith has an eBook store, and it seems to have quite a few things I’d be interested in (for instance the Wheel of Time series)… there are two problems with that: the books are only available as “Adobe Digital Editions” (so, DRM-locked), and I have no clue whether they sell to Italian customers. I could probably give it a try; the reduced price can be acceptable for some stuff at least. Unless, of course, the DRM can be cleaned up, in which case I’d probably give a lot of money to WHSmith…

Looking for PDF books suppliers

So, after wondering whether to get a Sony Reader, I actually ordered one today, on eBay (without laser engraving), as Sony’s shop doesn’t have it available anymore, and I’d rather do it sooner than later: you never know what might come up when you plan too far ahead.

The main use will certainly be reading the usual PDF reference documentation, as I have plenty of that, and I often end up either printing it or not using it at all. Give me a few months and the paper I save will be worth the money I spent on the Reader ;)

But there are other books too: Pragmatic Bookshelf sells its books as PDFs, and they have quite a few interesting titles, so that’s also quite a big improvement.

The only thing missing would be O’Reilly books (I don’t have many, but I’m interested in some from time to time, like the GNU Make book I linked before). Sure, I can live without them as PDFs, but if possible I’d like that option too :) From what I can see, they don’t sell whole books as PDFs on the standard store; you can buy chapters at $4 each, but that adds up to far too much for a whole book.

Luca told me to look into the subscription option, which I suppose is Safari Books Online; it sounds interesting, but considering its cost, I’d rather first make sure that subscription is what I need. So this is a hoosgot question for anyone who already has a Safari Books Online subscription: are the books in the library downloadable as PDF? Or are chapters downloadable one by one?

Thanks in advance for the info.