Report from SCaLE13x

This year I was not able to visit FOSDEM. Funnily enough this confirms the trend of me visiting FOSDEM only on even-numbered years: I previously skipped 2013 because I was just out for my first and only job interview, and 2011 because of contract-related timing. Since I still care about going to an open source conference early in the year, I opted instead for SCaLE, the timing of which fit perfectly with my trip to Mountain View. It also allowed me to walk through Hermosa Beach once again.

So Los Angeles again it was, which meant I was able to meet with a few Gentoo developers, a few VideoLAN developers who also came all the way from Europe, and many friends whom I have met at various previous conferences. It is funny how I end up meeting some people more often through conferences than I meet my close friends from back in Italy. I guess this is the life of the frequent traveler.

While my presence at SCaLE was mostly a way to meet some of the Gentoo devs I had not met before, and to see Hugo and Ludovic from VideoLAN, whom I had missed at the past two meetings, I did pay some attention to the talks — I wish I had had the energy to go to more of them, but I was coming off three weeks straight of training, during which I sat for at least two hours a day in a room listening to talks on various technologies and projects… doing that in my free time too sounded like a bad idea.

What I found intriguing in the program, and in at least one of the talks I was able to attend, was that I could find at least a few topics that I wrote about in the past. Not only are containers now all the rage, through Docker and other plumbing, but there was also a talk about static site generators, which I wrote about in 2009 and have been using for much longer than that, out of necessity.

All in all, it was a fun conference, and meeting my usual conference friends and colleagues is a great thing. And meeting the other Gentoo devs is what sparked my designs around TG4, which is good.

I would also like to thank James for suggesting I use TweetDeck during conferences, as it was definitely nicer to be able to keep track of what happened on the hashtag as well as the direct interactions and my personal stream. If you're an occasional conference-goer you probably want to look into it yourself. It is also the most decent way to look at Twitter during a conference on a tablet, as it does not require you to jump around between search pages and interactions (on a PC you can at least keep multiple tabs open easily).

When (multimedia) fiefdoms crumble

Mike coined the term multimedia fiefdoms recently. He points to a number of different streaming, purchase and rental services for video content (movies, TV series) as the new battleground for users (consumers, in this case). There are of course a few more sides to this battle, including music and books, but the idea is still perfectly valid.

What he didn’t get into the details of is what happens when one of those fiefdoms capitulates, declares itself won over, and goes away. It’s not a fun situation to be in, but we actually have plenty of examples of it, and these, more than anything else, should drive the discourse around and against DRM, in my opinion.

For some reason, the main examples of failed fiefdoms are to be found in books, and I lived through (and recounted) a few of those instances. For me personally, it all started four years ago, when I discovered Sony had given up on their LRF format and decided to adopt the “industry standard” ePub by supporting the Adobe Digital Editions (ADEPT) DRM scheme on their devices. I was slow on the uptake; the announcement had come two years earlier. For Sony, this meant tearing down their walled garden, even though they kept supporting the LRF format and their store for a while – they may still do, I stopped following two years ago when I moved onto a Kindle – while for the user it meant they were now free to buy books from a number of stores, including some publishers, bookstores with an online presence, and dedicated ebookstores.

But things didn’t always go smoothly: two years later, WHSmith partnered with Kobo, and essentially handed the latter their entire online ebook market. When I read the announcement I was actually happy, especially since I could no longer buy books off WHSmith as they had started requiring UK billing addresses. Unfortunately it also meant that only a third of the books I had bought from WHSmith were going to be ported over to Kobo, due to an extreme cock-up with global rights even to digital books. Had I not gone and broken the DRM off all my ebooks for the sake of it, I would have lost four books and had to buy them anew. Given this was not a case of the seller going bankrupt but of them selling out their customers, their refusal to compensate people was hard to justify. Luckily, they did port The Gone-Away World, which is one of my favourite books.

Fast forward another year, and the Italian bookstore LaFeltrinelli decided to go the same way, with a major exception: they decided to keep users on both platforms — that way, if you want to buy a digital version of a book you’ll still buy it on the same website, but it’ll be provided by Kobo and end up in your Kobo library. And it seems like they at least got a better deal regarding books’ rights, as they appear to have ported over most books anyway. But of course it did not work out as well as it should have, throwing an error in my face and forcing me to call up Kobo (Italy) to have my accounts connected and the books ported.

That same year I ended up buying a Samsung Galaxy Note 10.1 2014 Edition, which is a pretty good tablet with a great digitizer. Samsung ships Google Play in full (Store, Movies, Music, Books) but at the same time installs its own App, Video, Music and Book store apps, which is not surprising. But it did not take them six months to decide that this was not their greatest idea: in May this year, Samsung announced the shutdown of their Music and Books stores — outside of South Korea at least. In this case there is no handover of the content to other providers, so any content bought on those platforms is simply gone.

It was not completely in vain, though; if you still have access to a Samsung device (and if you don’t, well, you had no access to the content anyway), a different kind of almost-compensation kicks in: the Korean company partnered with Amazon of all bookstores — surprising, given that Samsung is behind the new «Nook Tablet» by Barnes & Noble. Besides a branded «Kindle for Samsung» app, they provide one book out of a choice of four every month — the books are taken from Amazon’s KDP Select pool as far as I can tell, which is the same pool used as a base for the Kindle Owners’ Lending Library and the Kindle Unlimited offerings; they are not great, but some of them are enjoyable enough. Amazon is also keeping it honest and does not force you to read the books on your Samsung device — I indeed prefer reading from my Kindle.

Now the question is: how does all this loop back to multimedia? Sure, books are entertaining, but they are by definition a single medium, unless you refer to the Kindle Edition of American Gods. Well, for me it’s still the same problem of fiefdoms that Mike referred to; indeed, every store used to be a walled garden for a long while, then Adobe came and conquered most of them with ePub and ADEPT — but then, between Apple with iBooks (which uses its own, incompatible DRM) and Amazon with the Kindle, the walls started crumbling down. Nowadays plenty of publishers allow you to buy a book in ePub, and usually many other formats at the same time, without DRM, because the publishers don’t care which device you want to read your book on (a Kindle, a Kobo, a Nook, an iPad, a Sony Reader, an Android tablet …); they only want you to read the book, get hooked, and buy more books.

Somehow the same does not seem to work for video content, although it did work to an extent, for a while at least, with music. But this is a different topic.

The reason why I’m posting this right now is that just today I got an email from Samsung saying that they are shutting down their video store too — now their “Samsung Hub” platform only gets to push you games and apps, unless you happen to live in South Korea. It’s interesting to see how the battles between giants are causing small players to just leave the playing field… but at the same time they take their toys with them.

Once again, there is no compensation: if you rented something, watch it by the end of the year; if you bought something, sorry, you won’t be able to access it after the new year. It’s a tough world. There is a lesson, somewhere, to be learnt from this.

Small differences don’t matter (to unpaper)

After my challenge with the fused multiply-add instructions I managed to find some time to write a new test utility. It’s written ad hoc for unpaper but it can probably be used for other things too. It’s trivial and stupid but it got the job done.

What it does is simple: it loads both a golden and a result image file, compares their size and format, and then goes through all the bytes to identify how many differences there are between them. If less than 0.1% of the image surface changed, it considers the test a pass.

It’s not a particularly nice system, especially as it requires me to bundle some 180MB of golden files (they compress to just about 10 MB so it’s not a big deal), but it’s a strict improvement compared to what I had before, which is good.

This change actually allowed me to explore a change that I had abandoned before because it resulted in non-pixel-perfect results. In particular, unpaper now uses single-precision floating point all over, rather than doubles. This is because the slight imperfections caused by this change are not relevant enough to warrant the ever-so-slight loss in performance due to the bigger variables.

But even up to here, there is very little gain in performance. Sure, some calculations can be faster this way, but we’re still using the same set of AVX/FMA instructions. This is unfortunate: unless you start rewriting the algorithms used for searching for edges or rotations, there is no gain to be made by changing the size of the data types. When I converted unpaper to use libavcodec, I decided to make the code as simple and stupid as I could, as that meant I had a baseline to improve from, but I’m not sure what the best way to improve it is, now.

I still have a branch that uses OpenMP for the processing, but since most of the filters applied are dependent on each other it does not work very well. Per-row processing gets slightly better results, but they are really minimal as well. I think the most interesting low-hanging fruit for parallel processing would be to process the two pages in parallel after splitting them from a single sheet of paper. Unfortunately, the loops used for that processing right now are so complicated that I’m not looking forward to touching them for a long while.

I tried some basic profile-guided optimization execution, just to figure out what needs to be improved, and compared with codiff a proper release and a PGO version trained after the tests. Unfortunately the results are a bit vague and it means I’ll probably have to profile it properly if I want to get data out of it. If you’re curious here is the output when using rbelf-size -D on the unpaper binary when built normally, with profile-guided optimisation, with link-time optimisation, and with both profile-guided and link-time optimisation:

% rbelf-size -D ../release/unpaper ../release-pgo/unpaper ../release-lto/unpaper ../release-lto-pgo/unpaper
    exec         data       rodata        relro          bss     overhead    allocated   filename
   34951         1396        22284            0        11072         3196        72899   ../release/unpaper
   +5648         +312         -192           +0         +160           -6        +5922   ../release-pgo/unpaper
    -272           +0        -1364           +0         +144          -55        -1547   ../release-lto/unpaper
   +7424         +448        -1596           +0         +304          -61        +6519   ../release-lto-pgo/unpaper

It’s unfortunate that GCC does not give you any diagnostics on what it’s trying to achieve when doing LTO; it would be interesting to see if you could steer the compiler to produce better code without it as well.

Anyway, enough with the micro-optimisations for now. If you want to make unpaper faster, feel free to send me pull requests for it, I’ll be glad to take a look at them!

The subtlety of modern CPUs, or the search for the phantom bug

Yesterday I released a new version of unpaper, which is now in Portage, even though its dependencies are not exactly straightforward after making it use libav. But when I packaged it, I realized that the tests were failing — even though I had been sure to run the tests all the time while making changes, so as not to break the algorithms which (as you may remember) I have neither designed nor written — I don’t really have enough math to figure out what’s going on with them. I was able to simplify a few things, but I needed Luca’s help for the most part.

It turned out that the problem only happened when building with -O2 -march=native, so I decided to restrict the tests and look into it again in the morning. Indeed, on Excelsior, using -march=native would cause the tests to fail, but on my laptop (where I had been running them after every single commit), they would not. Why? Furthermore, Luca was also reporting test failures on his laptop with OS X and clang, but I had not tested there to begin with.

A quick inspection of one of the failing tests’ outputs with vbindiff showed that the diffs were quite minimal: one bit off at some non-obvious interval. It smelled like a very minimal change. After complaining on G+, Måns pushed me in the right direction: some instruction set that differs between the two machines.

My laptop uses the core-avx-i arch, while the server uses bdver1. They have different levels of SSE4 support – AMD having their own SSE4a implementation – and different extensions. I should probably have paid more attention here and noticed that Bulldozer has FMA4 instructions, but I did not; it’ll prove important later.

I decided to start disabling extensions in alphabetical order, mostly expecting the problem to be in AMD’s implementation of some instructions pending some microcode update. When I disabled AVX, the problem went away — AVX has essentially a new encoding of instructions, so enabling AVX causes all the instructions otherwise present in SSE to be re-encoded, and is a dependency for FMA4 instructions to be usable.
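For the record, the bisection amounted to toggling -mno-* flags one at a time on top of the target arch — illustrative commands, not the exact sequence I typed:

```
gcc -O2 -march=bdver1 …              # baseline: tests fail
gcc -O2 -march=bdver1 -mno-avx …     # tests pass
gcc -O2 -march=bdver1 -mno-fma4 …    # also passes, as found out later
```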

The next problem was reducing the code enough to be able to figure out whether the issue was a bug in the code, in the compiler, in the CPU, or just in my assumptions. Given that unpaper is over five thousand lines of code and comments, I needed to reduce it a lot. Luckily, there are ways around that.

The first step is to look at which part of the code the problem appears in. Luckily, unpaper is designed as a bunch of functions that run one after the other. I started disabling filters and masks, and I was able to limit the problem to the deskewing code — which is where most of the problems have happened before.

But even the deskewing code is a lot — and it depends on at least some part of the general processing to be run, including loading the file and converting it to an AVFrame structure. I decided to try to reduce the code to a standalone unit calling into the full deskewing code. But when I copied over and looked at how much code was involved, between the skew detection and the actual rotation, it was still a lot. I decided to start looking with gdb to figure out which of the two halves was misbehaving.

The interface between the two halves is well-defined: the first returns the detected skew, and the latter takes the rotation to apply (the negative of what the first returned) and the image to apply it to. It’s easy. A quick look through gdb at the call to rotate() in both a working and a failing setup told me that the returned value from the first half matched perfectly; this was great, because it meant the surface to inspect was heavily reduced.

Since I did not want to have to test all the code to load the file from disk and decode it into a RAW representation, I looked into the gdb manual and found the dump command, which allows you to dump part of the process’s memory into a file. I dumped the AVFrame::data content and decided to use that as an input. At first I decided to just compile it into the binary (you only need to use xxd -i to generate C code that declares the whole binary file as a byte array), but it turns out that GCC is not designed to efficiently compile a 17MB binary blob passed in as a byte array. I then opted for just opening the raw binary file and fread()ing it into the AVFrame object.

My original plan involved using creduce to find the minimal set of code needed to trigger the problem, but it was tricky, especially when trying to match a complete file output to the md5. I decided to proceed with the reduction manually, starting from all the conditionals for pixel formats that were not exercised… and then I realized that I could split the code in two once more. Indeed, while the main interface is only rotate(), there were two logical parts of the code in use: one translating the coordinates before and after the rotation, and the interpolation code that would read the old pixels and write the new ones. This latter part also depended on all the code that sets a pixel in place starting from its components.

By writing as output the calls to the interpolation function, I was able to restrict the issue to the coordinate translation code, rather than the interpolation one, which made it much better: the reduced test case went down to a handful of lines:

void rotate(const float radians, AVFrame *source, AVFrame *target) {
    const int w = source->width;
    const int h = source->height;

    // create 2D rotation matrix
    const float sinval = sinf(radians);
    const float cosval = cosf(radians);
    const float midX = w / 2.0f;
    const float midY = h / 2.0f;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const float srcX = midX + (x - midX) * cosval + (y - midY) * sinval;
            const float srcY = midY + (y - midY) * cosval - (x - midX) * sinval;
            externalCall(srcX, srcY);
        }
    }
}

Here externalCall is a simple function to extrapolate the values; the only thing it does is print them on the standard error stream. In this version there are still references to the input and output AVFrame objects, but as you can see they are never used, which means that the testcase is now self-contained and does not require any input or output file.

Much better, but still too much code to go through. The inner loop over x was simple to remove: hardwiring it to zero still let the compiler reproduce the problem. But if I hardwired y to zero as well, the compiler would trigger constant propagation and just pre-calculate the right value, whether or not AVX was in use.

At this point I was able to execute creduce; I only needed to check that the first line of the output matched the “incorrect” version, and no input was requested (the radians value was fixed). Unfortunately it turns out that using creduce with loops is not a great idea, because it is entirely possible for it to reduce away the y++ statement or the y < h exit comparison, and then you’re in trouble. Indeed, it got stuck in infinite loops on my code multiple times.

But it did help a little bit in simplifying the calculation. And with, again, a lot of help from Måns on making sure that the sinf()/cosf() functions would not return different values – they don’t; they are actually collapsed by the compiler into a single call to sincosf(), so you don’t even have to write ugly code to leverage that! – I brought the code down to

extern void externCall(float);
extern float sinrotation();
extern float cosrotation();

static const float midX = 850.5f;
static const float midY = 1753.5f;

int main() {
    const float srcX = midX * cosrotation() - midY * sinrotation();
    externCall(srcX);
}

No external libraries, not even libm. The external functions are in a separate source file, and besides providing fixed values for sine and cosine, the externCall() function only calls printf() with the provided value. Oh, if you’re curious: the radians parameter became 0.6f because 0, 1 and 0.5 would not trigger the behaviour, but 0.6 — the truncated version of the actual parameter coming from the test file — would.

Checking the generated assembly code for the function then pointed out the problem, at least to Måns, who actually knows Intel assembly. Here follows a diff of the code above, built with -march=bdver1 and with -march=bdver1 -mno-fma4 — because it turns out the instruction causing the problem is not an AVX one but an FMA4 one; more on that after the diff.

        movq    -8(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmovss  -20(%rbp), %xmm2
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovss  -20(%rbp), %xmm1
+       vfmsubss        %xmm0, .LC0(%rip), %xmm1, %xmm0
        leave
        .cfi_remember_state
        .cfi_def_cfa 7, 8
-       vsubss  %xmm0, %xmm1, %xmm0
        jmp     externCall@PLT
 .L6:
        .cfi_restore_state

It’s interesting that it changes the order of the instructions as well as the constants — for this diff I have manually swapped .LC0 and .LC1 on one side, as they would otherwise just end up with different names due to instruction ordering.

As you can see, the FMA4 version has one instruction less: vfmsubss replaces both one of the vmulss instructions and the vsubss instruction. vfmsubss is an FMA4 instruction that performs a Fused Multiply and Subtract operation — and midX * cosrotation() - midY * sinrotation() indeed has a multiply and a subtract!

Originally, since I was disabling the whole AVX instruction set, all the vmulss instructions would end up replaced by mulss, which is the SSE version of the same instruction. But when I realized that the missing correspondence was vfmsubss and I googled for it, it was obvious that FMA4 was the culprit, not the whole of AVX.

Great, but how does that explain the failure on Luca’s laptop? He’s not so crazy as to use an AMD laptop — nobody would be! Well, it turns out that Intel also has its own Fused Multiply-Add instruction set, just with three operands rather than four, starting from Haswell CPUs — which include… Luca’s laptop. A quick check on my NUC, which also has a Haswell CPU, confirms that the problem also exists for the core-avx2 architecture, even though the code diff is slightly less obvious:

        movq    -24(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmovd   %ebx, %xmm2
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovd   %ebx, %xmm1
+       vfmsub132ss     .LC0(%rip), %xmm0, %xmm1
        addq    $24, %rsp
+       vmovaps %xmm1, %xmm0
        popq    %rbx
-       vsubss  %xmm0, %xmm1, %xmm0
        popq    %rbp
        .cfi_remember_state
        .cfi_def_cfa 7, 8

Once again I swapped .LC0 and .LC1 afterwards for consistency.

The main difference here is that the instruction for fused multiply-subtract is vfmsub132ss and a vmovaps is involved as well. If I read the documentation correctly this is because it stores the result in %xmm1 but needs to move it to %xmm0 to pass it to the external function. I’m not enough of an expert to tell whether gcc is doing extra work here.

So why is this instruction causing problems? Well, Måns knew, and pointed out that the result is now more precise, thus I should not work around it. Wikipedia, as linked before, also points out why this happens:

A fused multiply–add is a floating-point multiply–add operation performed in one step, with a single rounding. That is, where an unfused multiply–add would compute the product b×c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire sum a+b×c to its full precision before rounding the final result down to N significant bits.

Unfortunately this does mean that we can’t have bit-exactness of images for CPUs that implement fused operations. Which means my current test harness is not good, as it compares the MD5 of the output with the golden output from the original test. My probable next move is to use cmp to count how many bytes differ from the “golden” output (the version without optimisations in use), and if the number is low — say, less than 1‰ — accept it as valid. It’s probably not ideal and could let some variation in output slip through, but it might be a good start.

Optimally, as I said a long time ago, I would like to use a tool like pdiff to tell whether there are actual changes in the pixels, and identify things like a 1-pixel translation in any direction, which would be harmless… but until I can figure something out, it’ll be an imperfect testsuite anyway.

A huge thanks to Måns for the immense help, without him I wouldn’t have figured it out so quickly.

Conferencing

This past weekend I had the honor of hosting the VideoLAN Dev Days 2014 in Dublin, in the headquarters of my employer. This is the first time I have organized a conference (or rather, helped organize it; Audrey and our staff did most of the heavy lifting), and I made a number of mistakes, but I think I can learn from them and do better the next time I try something like this.

(Photo credit: me)

Organizing an event in Dublin has some interesting and non-obvious drawbacks, one of which is the need for a proper visa for people who reside in Europe but are not EEA citizens, due to the fact that Ireland is not part of Schengen. I was expecting at least UK residents not to need any scrutiny, but Derek proved me wrong, as he had to get an (easy) visa on entry.

Getting just shy of a hundred people into a city like Dublin — which is by no means a metropolis the way Paris or London are — is an interesting exercise: yes, we had the space for the conference itself, but finding hotels and restaurants for that many people became tricky. A very positive shout-out is due to Yamamori Sushi, which hosted the whole lot of us without a fixed menu and without a hitch.

As usual, meeting in person with the people you work with in open source is a perfect way to improve collaboration — knowing how people behave face to face makes it easier to understand their behaviour online, which is especially useful if their attitude can be a bit grating online. And given that many people, including me, are known as proponents of Troll-Driven Development – or Rant-Driven Development, given that people like Anon, redditors and 4channers have given Troll an even worse connotation – it’s really a requirement, if you are genuinely interested in being part of the community.

This time around, I was even able to stop myself from gathering too much swag! I decided not to pick up a hoodie and to leave it to people who would actually use it, although I did pick up a Gandi VLC shirt. I hope I’ll be able to do the same at LISA, as I’m bound there too; last year I came back with way too many shirts and other swag.

Ramblings on audiobooks

In one of my previous posts I noted that I’m an avid audiobook consumer. I started when I was in the hospital, because I didn’t have the energy to read — and most likely because my blood sugar was out of control after coming back from the ICU: it turns out that blood sugar changes can make your eyesight go crazy; at some point I had to buy a pair of €20 glasses simply because my doctor prescribed me a new treatment and my eyesight ricocheted out of control for a week or so.

Nowadays, I have trouble sleeping if I’m not listening to something, and I end up with the Audible app installed on all my phones and tablets, with at least a few books preloaded whenever I travel. Of course, as I said, I keep the majority of my audiobooks on the iPod, and the reason is that while most of my library is on Audible, not all of it is. There are a few books that I bought on iTunes before finding out about Audible, and then there are a few I received in CD form, including The Hitchhiker’s Guide To The Galaxy Complete Radio Series, which is among my favourite playlists.

Unfortunately, to be able to convert these from CD to a format the iPod could digest, I ended up having to buy a piece of Mac software called Audiobook Builder, which allows you to rip CDs and build M4B files out of them. What’s M4B? It’s the usual MP4 container, just with an extension that makes iTunes consider it an audiobook, and with chapter markings in the stream. At the time I first ripped my audiobooks, ffmpeg/libav had no support for chapter markings, so that was not an option. I’ve been told that said support is there now, but I have not tried getting it to work.
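For the curious, the modern ffmpeg route — which I have not verified myself, so treat it as a sketch — goes through an FFMETADATA file listing the chapters, with made-up timestamps here:

```
;FFMETADATA1
title=Example Audiobook

[CHAPTER]
TIMEBASE=1/1000
START=0
END=1500000
title=Chapter 1

[CHAPTER]
TIMEBASE=1/1000
START=1500000
END=3000000
title=Chapter 2
```

which would then be muxed with something like ffmpeg -i input.m4a -i chapters.txt -map_metadata 1 -codec copy output.m4b.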

Indeed, what I need to find out is how to build an audiobook file out of a string of mp3 files, and I have no idea how to do that now that I no longer have access to my personal iTunes account on a Mac to re-download Audiobook Builder and process them. In particular, the mp3s I’m looking to merge together are the 2013 and 2014 runs of BBC’s The News Quiz, to which I’m addicted and listen continuously. Being able to join them all together so I can listen to them in a multi-day-running playlist is one of the very few things that still lets me sleep relatively calmly — I say relatively because I really can’t remember the last time I slept soundly; it’s been about a year by now.

Essentially, what I’d like is for Audible to let me sideload some content (the few books I did not buy from them, and the News Quiz series that I stitch together from the podcast) and create playlists — then, as far as I’m concerned, I wouldn’t have to use an iPod at all. Well, besides the fact that I’d have to find a way to silence notifications while playing audiobooks: having Dragons of Autumn Twilight interrupted by the Facebook pop notification is not something I look forward to, most of the time. And in some cases I have even had a background update disrupt my playback, so there is definitely space for improvement.

Did Apple lose its advantage?

Readers of my blog for a while probably already know that I’ve been an Apple user over time. What is not obvious is that I have scaled down my (personal) Apple usage over the past two years, mostly because of my habits, and partly because of Android and Linux getting better and better. One component of it, though, is that some of the advantages of using Apple started to disappear for me.

I think that for me the start of the problems is to be found in the release of iOS 7. Besides not liking the taste of the new flashy UI, what I found is that it did not perform as well as previous releases. I think this is the same effect others have reported. In particular, the biggest problem with it for me had to do with the way I started using my iPad while in Ireland. Since I now have access to a high-speed connection, I started watching more content via streaming. In particular, thanks to my multiple trips to the USA over the past year, I got access to more video content on the iTunes store, so I wanted to watch some of the new TV series through it.

It turned out that for a few versions — and I mean a few months — iOS was keeping the streamed content in the cache, not accounting for it anywhere, and never cleaning it up. The result was that after streaming half a series, I would get errors telling me the iPad storage was full, but there was no way from the device itself to clear the cache. Either you had to do a factory reset, dropping all the content off the device, or you had to use a Windows application to remove the cache files manually. Not very nice.

Another very interesting problem with streaming content: it can be slow. Not always, but it can. One night I wanted to watch The LEGO Movie, since I did not see it at the cinema. It’s not available on the Irish Netflix, so I decided to rent it off iTunes. It took the iPad four hours to download it. It made no sense. And no, the connection was not hogged by something else, and running a SpeedTest from the tablet itself showed it had all the network capacity it needed.

The iPad is not, though, the only Apple device I own; I also bought an iPod Touch back in LA when my Classic died, even though I was not really happy with downgrading from 80GB down to 64GB. But it’s mostly okay, as my main use for the iPod is to listen to audiobooks and podcasts when I sleep — which recently I have been doing through Creative D80 Bluetooth speakers, which are honestly not great but at least don’t force me to wear earphones all night long.

I had no problem switching the iPod from one computer to the next before, as I moved from the iMac to a Windows install on my laptop. But when I decided to just use iTunes on the one Windows desktop I keep around (mostly to play games), a few things stopped working as intended. It might be related to me dropping the iTunes Match subscription, but I’m not sure about that. What happened is that only a single track from each album was being copied to the iPod, and nothing else.

I tried a factory reset, cable and wireless sync, and deleting the iTunes data on my computer to force it to treat the iPod as new; the situation I’m in now is only partially working: the audiobooks have been synced, but without cover art and without the playlists — some of the audiobooks I have are part of a series, or are split into multiple files because I bought them before Audible started providing single-file downloads. This is of course not very good when the audio only lasts three hours, after which I start having nightmares.

It does not help that I can’t listen to my audiobooks with VLC for Android, because it thinks that the chapter art is a video stream, and thus pauses the stream as soon as I turn off the screen. I should probably write a separate rant about the lack of proper audiobook tools for Android. Audible has an app, but it does not allow you to sideload audiobooks (i.e. stuff I ripped from my original CDs, or that I bought on iTunes), nor does it allow you to build a playlist of books, say for all the books in a series.

As I write this, I have asked iTunes again to sync all the music to my iPod Touch as 128kbps AAC files (as otherwise it does not fit on the device); iTunes is now copying 624 files, yet I’m sure my collection contains more than 600 albums — and I would venture to say that more than half of them I own on physical media, mostly because no store allows me to buy metal in FLAC or ALAC. And before somebody suggests Jamendo or other similar services: yes, great, I actually bought lots of jazz on Magnatune before it became a subscription service and I loved it, but that is not a replacement for mainstream content. Also, Magnatune has terrible security practices, don’t use it.

Sorry Apple, but given these small-but-not-so-small issues with your software recently, I’m not going to buy any more devices from you. If any of the two devices I have fails, I’ll just get someone to build a decent audiobook software for me one way or the other…

unpaper and libav status update

The other day I wrote about unpaper and the fact that I was working on making it use libav for file input. I have now finished converting unpaper (in a branch) so that it does not use its own image structure, but rather the same AVFrame structure that libav uses internally and externally. This meant not only supporting strides, but also using the libav allocation functions and pixel formats.

This also enabled me to use libav for file output as well as input. While for the input I decided to add support for formats that unpaper did not read before, for output at the moment I’m sticking with the same formats as before. Mostly because the one type of output file I’d like to support is not currently supported by libav properly, so it’ll take me quite a bit longer to be able to use it. For the curious, the format I’m referring to is multipage TIFF. Right now libav only supports single-page TIFF and it does not support JPEG-compressed TIFF images, so there.

Originally, I planned to drop compatibility with previous unpaper versions, mostly because dropping the internal structure meant losing the input format information for 1-bit black and white images. In the end I was actually able to reimplement the same feature in a different way, and so I restored that support. The only compatibility issue right now is that the -depth parameter is no longer present, mostly because it and -type constrained the same value (the output format).

To reintroduce the -depth parameter, I want to support 16-bit grayscale. Unfortunately, to do so I need to make more fundamental changes to the code, as right now it expects pixel values to fit in at most 24 bits — and I’m not sure how to scale 16-bit grayscale to 24-bit RGB while maintaining proper values.
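For the scaling itself, the usual trick is to map 0xFFFF to 0xFF by dividing by 257 (since 65535 = 255 × 257), which keeps black and white exact. A minimal sketch in plain C, not taken from unpaper’s code:

```c
#include <stdint.h>

/* Scale a 16-bit grayscale sample down to an 8-bit channel value,
 * rounding to nearest, then replicate it across R, G and B.
 * Illustrative sketch only, not unpaper's actual implementation. */
static uint8_t gray16_to_8(uint16_t v)
{
    /* 0x0000 maps to 0x00 and 0xFFFF to 0xFF exactly,
     * because 65535 = 255 * 257 */
    return (uint8_t)((v + 128) / 257);
}

static void gray16_to_rgb24(uint16_t v, uint8_t rgb[3])
{
    uint8_t g = gray16_to_8(v);
    rgb[0] = rgb[1] = rgb[2] = g;
}
```

Rounding before the division keeps mid-gray values from being systematically truncated.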

While I had to add almost as much code to support the libav formats and their conversion as there was before to load the files, I think this is still a net win. The first point is that there is no format-parsing code left in unpaper, which means that as long as the pixel format is something I can process, any file that libav supports now or will support in the future will do. Then there is the fact that I ended up making the code “less smart” by removing codepath optimizations such as “input and output sizes match, so I won’t touch the data; instead I’ll copy one structure on top of the other”, which means that yes, I probably lost some performance, but I also gained some sanity. The code was horribly complicated before.

Unfortunately, as I said in the previous post, there are a couple of features that I would have preferred if they were implemented in libav, as that would mean they’d be kept optimized without me having to bother with assembly or intrinsics. Namely pixel format conversion (which should be part of the proposed libavscale, still not reified), and drawing primitives, including bitblitting. I think part of this is actually implemented within libavfilter but as far as I know it’s not exposed for other software to use. Having optimized blitting, especially “copy this area of the image over to that other image” would be definitely useful, but it’s not a necessary condition for me to release the current state of the code.
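For reference, the unoptimized version of that operation is trivial, which is exactly why it would benefit from living in a shared, SIMD-optimized library rather than in every downstream project. A naive sketch, assuming packed pixels with per-image byte strides (a hypothetical helper, not unpaper’s actual code):

```c
#include <stdint.h>
#include <string.h>

/* Naive "copy this area of one image over to another" blit.
 * Assumes packed pixels with bpp bytes per pixel and per-image
 * row strides in bytes; no clipping is performed. Illustrative
 * only -- the optimized path would be the one libavfilter keeps
 * internal. */
static void blit_rect(uint8_t *dst, int dst_stride,
                      const uint8_t *src, int src_stride,
                      int x_dst, int y_dst, int x_src, int y_src,
                      int w, int h, int bpp)
{
    for (int y = 0; y < h; y++) {
        memcpy(dst + (y_dst + y) * dst_stride + x_dst * bpp,
               src + (y_src + y) * src_stride + x_src * bpp,
               (size_t)w * bpp);
    }
}
```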

So the current work in progress is to support grayscale TIFF files (PAL8 pixel format), and then I’ll probably turn to libav and try to implement JPEG-encoded TIFF files, if I can find the time and motivation to do so. What I’m afraid of is having to write conversion functions between YUV and RGB; I really don’t look forward to that. In the meantime, I’ll keep playing Tales of Graces f, because I love that kind of game.

Also, for those who are curious, the development of this version of unpaper is done fully on my ZenBook — I note this because it’s the first time I’ve used a low-power device to work on a project that actually requires some processing power to build, and the results are not bad at all. I only had to make sure I had swap enabled: 4GB of RAM are no longer enough to keep Chrome open with a dozen tabs and a compiler running in the background.

unpaper and libav

I’ve resumed working on unpaper, since I have been using it more than a couple of times lately and there have been a few things that I wanted to fix.

What I’ve been working on now is a way to read input files in more formats; I was really aggravated by the fact that unpaper implemented its own loading for a single set of file formats (the PPM “rawbits”). I went on to look into libraries that abstract access to image formats, but I couldn’t find one that would work for me. In the end I settled on libav, even though it’s not exactly known for being an image processing library.

My reason for choosing libav was mostly that, while it does not support all the formats I’d like unpaper to handle (PS and PDF come to mind), it does support the formats unpaper reads now (PNM and company), and I know the developers well enough that I can get bugs fixed and features implemented as needed.

I now have a branch that can read files using libav. It’s a very naïve implementation though: it reads the image into an AVFrame structure and then converts that into unpaper’s own image structure. It does not even free the AVFrame, mostly because I’d eventually like to use AVFrame instead of unpaper’s structure altogether. Not only to avoid copying memory when it’s not required (libav has functions to shallow-copy frames and mark them as writable when needed), but also because the frames themselves already contain all the needed information. Furthermore, libav 12 is likely going to include libavscale (or so Luca promised!), so the on-load conversion can also be offloaded to the library.

Even with the naïve implementation that I put together in half an afternoon, unpaper now supports not only the same input files as before, but also PNG (24-bit non-alpha colour files load the same way as PPM; 1-bit black and white is inverted compared to PBM; 8-bit grayscale is actually decoded as 16-bit, with half of it defining the alpha channel) and very limited TIFF support (1-bit is the same as PNG; 8-bit is paletted, so I have not implemented it yet; and for colour, I found out that libav does not currently support JPEG-compressed TIFF – I’ll work on that if I can – but otherwise it is supported, as it’s simply 24bpp RGB).
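As an illustration of the gray-plus-alpha case: the decoded buffer interleaves one luma byte and one alpha byte per pixel, so getting back to a plain 8-bit grayscale plane is just a matter of dropping every other byte. A hypothetical helper, not part of unpaper or libav:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an interleaved gray+alpha buffer (two bytes per pixel,
 * luma first) into a plain 8-bit grayscale plane, discarding the
 * alpha channel. Illustrative sketch only. */
static void ya8_to_gray8(uint8_t *gray, const uint8_t *ya, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        gray[i] = ya[2 * i];   /* keep luma, drop alpha */
}
```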

What also needs to be done is to write out the file using libav too. While I don’t plan to allow writing files in any random format with unpaper, I wouldn’t mind being able to output through libav. Right now the code does explicit conversion back and forth between black/white, grayscale and colour at save time; this is no different from the conversion that happens at load time, and should rather be part of libavscale once that exists.

Anyway, if you feel like helping with this project, the code is on GitHub and I’ll try to keep it updated.

XBMC part 2

I have posted about setting up a new box for XBMC, and here is a second part to that post, now that I have arrived in Dublin and actually set it up in my living room as part of my system. There are a few things that are better described in some detail.

The first problem I had was how to set up the infrared receiver for the remote control. I originally intended to use my Galaxy Note, as it has an IR blaster for who knows what reason; but then I realized I had a better option.

While the NUC does not, unfortunately, support CEC input, my receiver, a Yamaha RX-V475, comes with a programmable remote control, which – after a very quick check cat-ing the event input device node – appeared to send signals at the right frequency for the built-in IR sensor to pick up. So the question was how to map the buttons on the remote to actions in XBMC.

Important note: a lot of the documentation out there tells you that the nuvoton driver is buggy and requires playing with /sys files and the DSDT tables. This is outdated: just make sure you use kernel version 3.15 or later and it works perfectly fine.

The first obvious option, which I have seen documented basically everywhere, is to use lirc. Now that’s a piece of software that I know a little too well for comfort. Not everybody knows this, both because I went by a different nickname at the time, and because it happened a long time before I joined Gentoo, and definitely before I started keeping a blog. But as things are, back in the days when Linux 2.5 was a thing, I did the initial port of the lirc driver to the newer kernel, mostly as an external patch to apply on top of it. I even implemented devfs support, since while I was doing that I finally moved to Gentoo, and I needed devfs to use it.

I wanted to find an alternative to lirc for this and other reasons. Among other things, the last time I used it was on a computer that was not dedicated as an HTPC, so this looked like a much easier task now, with a single user-facing process in the system. After looking around quite a bit I found that you can make the driver emit X-compatible key events instead of IR events by loading the right keymap. While there are multiple ways to do this, I ended up using ir-keytable, which comes with v4l-utils.

The remote control only had to be set to send codes for a DVR of brand “Microsoft” — which I assume puts it in a mode compatible with Windows XP Media Center Edition. Funnily enough, they actually have a separate section for Apple TV codes. After that, the RC6/MCE table can be used, and it will send proper keypresses for things like the arrows and the number buttons.

I only had to change a couple of keys, namely Enter and Exit, to send KEY_RETURN and KEY_BACKSPACE respectively, so that they map to actions in XBMC. It would probably be simple enough to change the bindings in XBMC directly, but I find it more reliable to have the remote send a different key altogether. The trick is to edit /etc/rc_keymaps/rc6_mce to change the key that is sent, then re-run ir-keytable -a /etc/rc_maps.cfg, and the problem is solved (udev rules are in place so that the map is loaded at reboot).
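For illustration, the edit to the keymap looks something like this (the scancodes are from my recollection of the stock rc6_mce table, so double-check them against your own file before trusting them):

```
# /etc/rc_keymaps/rc6_mce (excerpt)
# Enter button: the stock table maps this scancode to KEY_OK
0x800f0422 KEY_RETURN
# Exit button: the stock table maps this scancode to KEY_EXIT
0x800f0423 KEY_BACKSPACE
```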

And this is one more problem solved: I’m now actually watching things with XBMC, so it seems to be working fine.