Reverse Engineering is just the first step

Last year I said that reverse engineering obsolete systems is useful, giving as an example adding Coreboot support for very old motherboards, which are simpler and whose components are more likely to have been described somewhere already. One thing I realized I didn’t make very clear in that post is that there is an important step in reverse engineering: documenting. As you can imagine from this blog, I think that documenting the reverse engineering process and its results is important, but I found out that this is definitely not the case for everybody.

On the particularly good side, going to 33c3 left a positive impression on me. Talks such as The Ultimate GameBoy Talk were excellent: Michael Steil did an awesome job at describing a lot of the unknown details of Nintendo’s most popular handheld. He also did a great job at showing practical matters, such as which tricks various games used to implement things that at first sight would look impossible. And this is only one of his talks; he has a series going on year after year. I’ve watched his talk about the Commodore 64, and the only reason it’s less enjoyable to watch is that the recording quality suffers from its age.

In other posts I have already referenced Micah’s videos. These have also been extremely nice to get into, as she does a great job at explaining complex concepts, and even the “stream of consciousness” streams are very interesting and a good way to pick up new tricks. What attracted me to her content, though, is the following video:

I have been using Wacom tablets for years, and I had no idea how they really worked behind the scenes. Not only does she give a great explanation of the technology in general, but the teardown of the mouse is also awesome, with full schematics and an explanation of the small components. No wonder I signed up for her Patreon right away: she deserves to be better known and to have a bigger following. And if funding her means spreading more knowledge around, well, then I’m happy to do my bit.

For the free software, open source and hacking communities, reverse engineering is only half the process. The endgame is not for one person to know exactly how something works, but rather for the collective to gain more insight into things, so that more people have access to the information and can even improve on it. The community needs not only to help with that but also to prioritise projects that share information. And that does not just mean writing blogs about things. I have said this before: blogs don’t replace documentation. You can see blogs as Micah’s shop-streaming videos, while documentation is more like her video about the tablets: the latter synthesizes the findings into an actually usable form, rather than just throwing information around.

I have a similar problem of course: my blog posts are usually a bit of a stream of consciousness, and they do not serve a useful purpose in capturing the factual state of information. Take for example my post about reverse engineering the OneTouch Verio and its rambling on, then compare it with the proper protocol documentation. The latter is the actual important product, compared to my ramblings, and the one I can be proud of. I would also argue that documenting these things in an easily consumable form is more important than writing tools that implement them, as those only cover part of the protocol and, in particular, can only leverage my skills, which do not include statistical, pharmaceutical or data visualisation expertise.

Unfortunately there are obstacles to this idea, of course. Sometimes, reverse engineering documentation is attacked by manufacturers even more than code implementing the same information. So for instance, while I have some information I still haven’t posted about a certain gaming mouse, I already know that the libratbag people do not want documentation of the protocols in their repository or wiki, because it causes them more headaches than the code. And then of course there is the problem of hosting this documentation somewhere.

I have been pushing my documentation to GitHub, hoping nobody causes a stink, but the good thing about using git rather than a wiki or similar tools is exactly that you can just move it around without losing information. This is not always the case: a lot of documentation is still, nowadays, only available either as part of the code itself, or on various people’s homepages. And at least two things can happen with that. The first is the most obvious and morbid one: the author of the documentation dies, and the documentation disappears once their domain registration expires, or whatever else happens to it. And if the homepage is hosted at a university or other academic institution, it may very well disappear even before the person does.

I know a few other alternatives for storing this kind of data have been suggested, including a common wiki, akin to Wikipedia but allowing for original research, but I am still uncertain that would be very helpful. The most obvious thing I can think of is making sure this information can actually be published in books. And I think that at least No Starch Press has been doing a lot for this, publishing extremely interesting books including Designing BSD Rootkits and, more recently, Rootkits and Bootkits, which is still in Early Access. A big kudos to Bill for this.

From my side, I promise I’ll try to organize my findings on anything I work on to the best of my ability, and possibly present them in a form other than just a blog, because the community deserves better.

Choosing a license for my static website framework

You might remember that some time ago I wrote a static website generator and that I wanted to release it as Free Software at some point in time.

Well, right now I’m using that code to create three different websites – mine, the one for my amateur director friend, and the one for a Metal band – which is not too shabby for something that started as a way to avoid repeating the same code over and over again, and it actually grew bigger than I expected at first.

Right now, it not only generates the pages, but also the sitemap and, to some extent, the robots.txt (by providing, for instance, the link to the sitemap itself). It can generate pages that link to Flickr photos and albums, including providing descriptions and having a gallery-like showcase page, and it also has some limited support for YouTube videos (the problem there is that YouTube does not have a RESTful API; I can implement REST calls through XSLT, but I don’t think I would be able to implement the GData protocol with that).
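
To give an idea of why the REST side is trivial, here is a minimal sketch (in Python, not part of the framework) of what a Flickr call boils down to: a plain GET request that returns XML, so the very same URL can also be handed to XSLT’s document() function. The API key and photoset id are placeholders.

    import urllib.request
    import xml.etree.ElementTree as ET

    API_KEY = "..."      # placeholder: a Flickr API key
    PHOTOSET_ID = "..."  # placeholder: a photoset (album) id

    # Flickr's REST endpoint is a plain GET that answers with XML, which is
    # exactly why it is easy to consume from XSLT as well: the same URL can
    # be used as an argument to document().
    url = (
        "https://api.flickr.com/services/rest/"
        "?method=flickr.photosets.getPhotos"
        "&api_key=" + API_KEY + "&photoset_id=" + PHOTOSET_ID
    )

    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)

    # Each <photo> element carries the attributes needed to build image URLs.
    for photo in tree.iter("photo"):
        print(photo.get("id"), photo.get("title"))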

Last week I was cleaning up the code a bit more, because I’m soon going to use it for a new website (for a game – not a video game – that a friend of mine invented and is producing), and I ended up finding some interesting documentation from Yahoo! on providing semantic information for their search engine (and, I guess, to some extent for Google as well).

This brought up two questions for me:

  • is it worth continuing to work on this framework based on XSLT alone? As I said, Flickr support was a piece of cake, because the API they use is REST-based, but YouTube’s GData-based API definitely requires something “more”. And at the same time, even wrapping the Flickr galleries has been a bit of a problem, because I cannot really paginate properly using XSLT 1.0 (and libxslt does not support XSLT 2.0, where the iterators I needed are implemented). While I like the consistent generation of code, I’m starting to feel like it needs something to pre-process the data before handing it over; for instance, a small program could find the references to YouTube videos, download an XML description of each with GData, and then let XSLT handle that (see the sketch after this list). Or it could cache the Flickr photos (which would be very good to avoid requesting all the photos’ details every time the website is updated);
  • I finally want to publish FSWS to the public; even if – or maybe especially if – I want to discontinue it or part of it, or morph it into something less “pure” than what I have now. What I’m not sure about is which license to use. I don’t want to make it just GPL, as that implies you can modify it and never give anything back, since you won’t be redistributing the framework, only the results; AGPL-3 sounds more like it, but I don’t want the pages generated by the framework to fall under that license either. Does anybody have an idea?
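
As a concrete example of the pre-processing idea from the first question above, here is a rough sketch of a script that scans a source document for YouTube references, downloads an XML description of each video, and drops it into a cache directory where the stylesheets can later pick it up with document(). The <youtube> element name, the cache layout and the feed URL are all assumptions for the sake of the example, not something the framework does today.

    import os
    import urllib.request
    from lxml import etree

    CACHE_DIR = "cache/youtube"  # hypothetical cache layout
    # Hypothetical per-video feed URL; whatever GData endpoint applies goes here.
    FEED_URL = "http://gdata.youtube.com/feeds/api/videos/{}"

    def cache_youtube_metadata(source_file):
        """Find <youtube id="..."/> references (an assumed element name) and
        store each video's XML description where XSLT's document() can read it."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        document = etree.parse(source_file)
        for ref in document.iter("youtube"):
            video_id = ref.get("id")
            target = os.path.join(CACHE_DIR, video_id + ".xml")
            if os.path.exists(target):
                continue  # already cached: no network request on every rebuild
            with urllib.request.urlopen(FEED_URL.format(video_id)) as response:
                with open(target, "wb") as out:
                    out.write(response.read())

The same approach would cover the Flickr side: cache the photos’ details locally and have the stylesheets read the cached XML instead of hitting the API on every rebuild.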

I’m also open to suggestions on how something like this should work. I really would prefer if the original content were written simply in XML: it’s close enough to the output format (XHTML/HTML5) and shouldn’t be much trouble to write. The least vague idea I have on the matter is to use multiple steps of XML conversion; the framework already uses a rather nasty two-pass conversion of the input document (it splits it into N branches depending on the configured languages, then processes those branches almost independently to produce the output), and since some content is generated by the first pass, it’s also difficult to make sure that all the references are there for links and the like.
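
If the conversion keeps growing in steps, one option is to drive the passes from a small wrapper around libxslt rather than chaining everything inside the stylesheets. A minimal sketch using lxml (which wraps libxslt, so everything stays XSLT 1.0 underneath); the stylesheet and file names are stand-ins, not the framework’s actual layout.

    import os
    from lxml import etree

    # Stand-in stylesheet names: pass 1 splits the document into per-language
    # branches, pass 2 renders each branch into the final pages.
    split_languages = etree.XSLT(etree.parse("pass1.xsl"))
    render_pages = etree.XSLT(etree.parse("pass2.xsl"))

    source = etree.parse("site.xml")  # placeholder input document
    os.makedirs("out", exist_ok=True)

    for lang in ("en", "it"):  # the configured languages, as an example
        branch = split_languages(source, lang=etree.XSLT.strparam(lang))
        page = render_pages(branch)
        with open("out/index." + lang + ".html", "wb") as out:
            out.write(etree.tostring(page, pretty_print=True))

Driving the passes from the outside would also make it easier to insert the pre-processing step (the YouTube/Flickr caching) between them.
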

It would be easier if I could write my own xslt functions: I could just replace an element referring to a youtube video with a reference to a (cached) XML document, and similarly for Flickr photos. But to do so I guess I’ll either have to use JavaScript and an XSLT processor that supports it, or I should write my own libxslt-based processor that can understand some special functions to deal with GData and similar.