The subtlety of modern CPUs, or the search for the phantom bug

Yesterday I have released a new version of unpaper which is now in Portage, even though is dependencies are not exactly straightforward after making it use libav. But when I packaged it, I realized that the tests were failing — but I have been sure to run the tests all the time while making changes to make sure not to break the algorithms which (as you may remember) I have not designed or written — I don’t really have enough math to figure out what’s going on with them. I was able to simplify a few things but I needed Luca’s help for the most part.

Turned out that the problem only happened when building with -O2 -march=native so I decided to restrict tests and look into it in the morning again. Indeed, on Excelsior, using -march=native would cause it to fail, but on my laptop (where I have been running the test after every single commit), it would not fail. Why? Furthermore, Luca was also reporting test failures on his laptop with OSX and clang, but I had not tested there to begin with.

A quick inspection of one of the failing tests’ outputs with vbindiff showed that the diffs would be quite minimal, one bit off at some non-obvious interval. It smelled like a very minimal change. After complaining on G+, Måns pushed me to the right direction: some instruction set that differs between the two.

My laptop uses the core-avx-i arch, while the server uses bdver1. They have different levels of SSE4 support – AMD having their own SSE4a implementation – and different extensions. I should probably have paid more attention here and noticed how the Bulldozer has FMA4 instructions, but I did not, it’ll show important later.

I decided to start disabling extensions in alphabetical order, mostly expecting the problem to be in AMD’s implementation of some instructions pending some microcode update. When I disabled AVX, the problem went away — AVX has essentially a new encoding of instructions, so enabling AVX causes all the instructions otherwise present in SSE to be re-encoded, and is a dependency for FMA4 instructions to be usable.

The problem was reducing the code enough to be able to figure out if the problem was a bug in the code, in the compiler, in the CPU or just in the assumptions. Given that unpaper is over five thousands lines of code and comments, I needed to reduce it a lot. Luckily, there are ways around it.

The first step is to look in which part of the code the problem appears. Luckily unpaper is designed with a bunch of functions that run one after the other. I started disabling filters and masks and I was able to limit the problem to the deskewing code — which is when most of the problems happened before.

But even the deskewing code is a lot — and it depends on at least some part of the general processing to be run, including loading the file and converting it to an AVFrame structure. I decided to try to reduce the code to a standalone unit calling into the full deskewing code. But when I copied over and looked at how much code was involved, between the skew detection and the actual rotation, it was still a lot. I decided to start looking with gdb to figure out which of the two halves was misbehaving.

The interface between the two halves is well-defined: the first return the detected skew, and the latter takes the rotation to apply (the negative value to what the first returned) and the image to apply it to. It’s easy. A quick look through gdb on the call to rotate() in both a working and failing setup told me that the returned value from the first half matched perfectly, this is great because it meant that the surface to inspect was heavily reduced.

Since I did not want to have to test all the code to load the file from disk and decode it into a RAW representation, I looked into the gdb manual and found the dump commands that allows you to dump part of the process’s memory into a file. I dumped the AVFrame::data content, and decided to use that as an input. At first I decided to just compile it into the binary (you only need to use xxd -i to generate C code that declares the whole binary file as a byte array) but it turns out that GCC is not designed to compile efficiently a 17MB binary blob passed in as a byte array. I then opted in for just opening the raw binary file and fread() it into the AVFrame object.

My original plan involved using creduce to find the minimal set of code needed to trigger the problem, but it was tricky, especially when trying to match a complete file output to the md5. I decided to proceed with the reduction manually, starting from all the conditional for pixel formats that were not exercised… and then I realized that I could split again the code in two operations. Indeed while the main interface is only rotate(), there were two logical parts of the code in use, one translating the coordinates before-and-after the rotation, and the interpolation code that would read the old pixels and write the new ones. This latter part also depended on all the code to set the pixel in place starting from its components.

By writing as output the calls to the interpolation function, I was able to restrict the issue to the coordinate translation code, rather than the interpolation one, which made it much better: the reduced test case went down to a handful of lines:

void rotate(const float radians, AVFrame *source, AVFrame *target) {
    const int w = source->width;
    const int h = source->height;

    // create 2D rotation matrix
    const float sinval = sinf(radians);
    const float cosval = cosf(radians);
    const float midX = w / 2.0f;
    const float midY = h / 2.0f;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const float srcX = midX + (x - midX) * cosval + (y - midY) * sinval;
            const float srcY = midY + (y - midY) * cosval - (x - midX) * sinval;
            externalCall(srcX, srcY);

Here externalCall being a simple function to extrapolate the values, the only thing it does is printing them on the standard error stream. In this version there is still reference to the input and output AVFrame objects, but as you can notice there is no usage of them, which means that now the testcase is self-contained and does not require any input or output file.

Much better but still too much code to go through. The inner loop over x was simple to remove, just hardwire it to zero and the compiler still was able to reproduce the problem, but if I hardwired y to zero, then the compiler would trigger constant propagation and just pre-calculate the right value, whether or not AVX was in use.

At this point I was able to execute creduce; I only needed to check for the first line of the output to match the “incorrect” version, and no input was requested (the radians value was fixed). Unfortunately it turns out that using creduce with loops is not a great idea, because it is well possible for it to reduce away the y++ statement or the y < h comparison for exit, and then you’re in trouble. Indeed it got stuck multiple times in infinite loops on my code.

But it did help a little bit to simplify the calculation. And with again a lot of help by Måns on making sure that the sinf()/cosf() functions would not return different values – they don’t, also they are actually collapsed by the compiler to a single call to sincosf(), so you don’t have to write ugly code to leverage it! – I brought down the code to

extern void externCall(float);
extern float sinrotation();
extern float cosrotation();

static const float midX = 850.5f;
static const float midY = 1753.5f;

void main() {
    const float srcX = midX * cosrotation() - midY * sinrotation();

No external libraries, not even libm. The external functions are in a separate source file, and beside providing fixed values for sine and cosine, the externCall() function only calls printf() with the provided value. Oh if you’re curious, the radians parameter became 0.6f, because 0, 1 and 0.5 would not trigger the behaviour, but 0.6 which is the truncated version of the actual parameter coming from the test file, would.

Checking the generated assembly code for the function then pointed out the problem, at least to Måns who actually knows Intel assembly. Here follows a diff of the code above, built with -march=bdver1 and with -march=bdver1 -mno-fma4 — because turns out the instruction causing the problem is not an AVX one but an FMA4, more on that after the diff.

        movq    -8(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmovss  -20(%rbp), %xmm2
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovss  -20(%rbp), %xmm1
+       vfmsubss        %xmm0, .LC0(%rip), %xmm1, %xmm0
        .cfi_def_cfa 7, 8
-       vsubss  %xmm0, %xmm1, %xmm0
        jmp     externCall@PLT

It’s interesting that it’s changing the order of the instructions as well, as well as the constants — for this diff I have manually swapped .LC0 and .LC1 on one side of the diff, as they would just end up with different names due to instruction ordering.

As you can see, the FMA4 version has one instruction less: vfmsubss replaces both one of the vmulss and the one vsubss instruction. vfmsubss is a FMA4 instruction that performs a Fused Multiply and Subtract operation — midX * cosrotation() - midY * sinrotation() indeed has a multiply and subtract!

Originally, since I was disabling the whole AVX instruction set, all the vmulss instructions would end up replaced by mulss which is the SSE version of the same instruction. But when I realized that the missing correspondence was vfmsubss and I googled for it, it was obvious that FMA4 was the culprit, not the whole AVX.

Great, but how does that explain the failure on Luca’s laptop? He’s not so crazy so use an AMD laptop — nobody would be! Well, turns out that Intel also have their Fused Multiply-Add instruction set, just only with three operands rather than four, starting from Haswell CPUs, which include… Luca’s laptop. A quick check on my NUC which also has a Haswell CPU confirms that the problem exists also for the core-avx2 architecture, even though the code diff is slightly less obvious:

        movq    -24(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmovd   %ebx, %xmm2
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovd   %ebx, %xmm1
+       vfmsub132ss     .LC0(%rip), %xmm0, %xmm1
        addq    $24, %rsp
+       vmovaps %xmm1, %xmm0
        popq    %rbx
-       vsubss  %xmm0, %xmm1, %xmm0
        popq    %rbp
        .cfi_def_cfa 7, 8

Once again I swapped .LC0 and .LC1 afterwards for consistency.

The main difference here is that the instruction for fused multiply-subtract is vfmsub132ss and a vmovaps is involved as well. If I read the documentation correctly this is because it stores the result in %xmm1 but needs to move it to %xmm0 to pass it to the external function. I’m not enough of an expert to tell whether gcc is doing extra work here.

So why is this instruction causing problems? Well, Måns knew and pointed out that the result is now more precise, thus I should not work it around. Wikipedia, as linked before, points also out why this happens:

A fused multiply–add is a floating-point multiply–add operation performed in one step, with a single rounding. That is, where an unfused multiply–add would compute the product b×c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire sum a+b×c to its full precision before rounding the final result down to N significant bits.

Unfortunately this does mean that we can’t have bitexactness of images for CPUs that implement fused operations. Which means my current test harness is not good, as it compares the MD5 of the output with the golden output from the original test. My probable next move is to use cmp to count how many bytes differ from the “golden” output (the version without optimisations in use), and if the number is low, like less than 1‰, accept it as valid. It’s probably not ideal and could lead to further variation in output, but it might be a good start.

Optimally, as I said a long time ago I would like to use a tool like pdiff to tell whether there is actual changes in the pixels, and identify things like 1-pixel translation to any direction, which would be harmless… but until I can figure something out, it’ll be an imperfect testsuite anyway.

A huge thanks to Måns for the immense help, without him I wouldn’t have figured it out so quickly.


This past weekend I had the honor of hosting the VideoLAN Dev Days 2014 in Dublin, in the headquarters of my employer. This is the first time I organize a conference (or rather help organize it, Audrey and our staff did most of the heavy lifting), and I made a number of mistakes, but I think I can learn from them and be better the next time I’ll try something like this.

Photo credit: me

Organizing an event in Dublin has some interesting and not-obvious drawbacks, one of which is the need for a proper visa for people who reside in Europe but are not EEA citizens, thanks to the fact that Ireland is not part of Schengen. I was expecting at least UK residents not to need any scrutiny, but Derek proved me wrong as he had to get an (easy) visa at entrance.

Getting just shy of a hundred people in a city like Dublin, which is by far not a metropolis like Paris or London would be is an interesting exercise, yes we had the space for the conference itself, but finding hotels and restaurants for the amount of people became tricky. A very positive shout out is due to Yamamori Sushi that hosted the whole of us without a fixed menu and without a hitch.

As usual, meeting in person with the people you work with in open source is a perfect way to improve collaboration — knowing how people behave face to face makes it easier to understand their behaviour online, which is especially useful if the attitudes can be a bit grating online. And given that many people, including me, are known as proponent of Troll-Driven Development – or Rant-Driven Development given that people like Anon, redditors and 4channers have given an even worse connotation to Troll – it’s really a requirement, if you are really interested to be part of the community.

This time around, I was even able to stop myself from gathering too much swag! I decided not to pick up a hoodie, and leave it to people who would actually use it, although I did pick up a Gandi VLC shirt. I hope I’ll be able to do that at LISA as I’m bound there too, and last year I came back with way too many shirts and other swag.

Ramblings on audiobooks

In one of my previous posts I have noted I’m an avid audiobook consumer. I started when I was at the hospital, because I didn’t have the energy to read — and most likely, because of the blood sugar being out of control after coming back from the ICU: it turns out that blood sugar changes can make your eyesight go crazy; at some point I had to buy a pair of €20 glasses simply because my doctor prescribed me a new treatment and my eyesight ricocheted out of control for a week or so.

Nowadays, I have trouble sleeping if I’m not listening to something, and I end up with the Audible app installed in all my phones and tablets, with at least a few books preloaded whenever I travel. Of course as I said, I keep the majority of my audiobooks in the iPod, and the reason is that while most of my library is on Audible, not all of it is. There are a few books that I have bought on iTunes before finding out about Audible, and then there are a few I received in CD form, including The Hitchhiker’s Guide To The Galaxy Complete Radio Series which is my among my favourite playlists.

Unfortunately, to be able to convert these from CD to a format that the iPod could digest, I ended up having to buy a software called Audiobook Builder for Mac, which allows you to rip CDs and build M4B files out of them. What’s M4B? It’s the usual mp4 format container, just with an extension that makes iTunes consider it an audiobook, and with chapter markings in the stream. At the time I first ripped my audiobooks, ffmpeg/libav had no support for chapter markings, so that was not an option. I’ve been told that said support is there now, but I have not tried getting it to work.

Indeed, what I need to find out is how to build an audiobook file out of a string of mp3 files, and I have no idea how to fix that now that I no longer have access to my personal iTunes account on a mac to re-download the Audiobook Builder and process them. In particular, the list of mp3s that I’m looking forward to merge together are the years 2013 and 2014 of BBC’s The News Quiz, to which I’m addicted and listen continuously. Being able to join them all together so I can listen to them with a multi-day-running playlist is one of the very few things that still let me sleep relatively calmly — I say relatively because I really don’t remember when was the last time I have slept soundly in about an year by now.

Essentially, what I’d like is for Audible to let me sideload some content (the few books I did not buy from them, and the News Quiz series that I stitch together from the podcast), and create a playlist — then for what I’m concerned I don’t have to use an iPod at all. Well, beside the fact that I’d have to find a way to shut up notifications while playing audiobooks. Having Dragons of Autumn Twilight interrupted by the Facebook pop notification is not something that I’m looking forward for most of the time. And in some cases I even have had some background update disrupting my playback so there is definitely space for improvement.

Did Apple lose its advantage?

Readers of my blog for a while probably know already that I’ve been an Apple user over time. What is not obvious is that I have scaled down my (personal) Apple usage over the past two years, mostly because my habits, and partly because of Android and Linux getting better and better. One component is, though, that some of the advantages to be found when using Apple started to disappear for me.

I think that for me the start of the problems is to be found in the release of iOS 7. Beside the taste of not liking the new flashy UI, what I found is that it did not perform as well as previous releases. I think this is the same effect others have had. In particular the biggest problem with it for me had to do with the way I started using my iPad while in Ireland. Since I now have access to a high-speed connection, I started watching more content in streaming. In particular, thanks to my multiple trips to the USA over the past year, I got access to more video content on the iTunes store, so I wanted to watch some of the new TV series through it.

Turned out that for a few versions, and I mean a few months, iOS was keeping the streamed content in the cache, not accounting for it anywhere, and never cleaning it up. The result was that after streaming half a series, I would get errors telling me the iPad storage was full, but there was no way from the device itself to clear the cache. EIther you had to do a factory reset to drop off all the content of the device, or you had to use a Windows application to remove the cache files manually. Not very nice.

Another very interesting problem with the streaming the content: it can be slow. Not always but it can. One night I wanted to watch The LEGO Movie since I did not see it at the cinema. It’s not available on the Irish Netflix so I decided to rent it off iTunes. It took the iPad four hours to download it. It made no sense. And no, the connection was not hogged by something else, and running a SpeedTest from the tablet itself showed it had all the network capacity it needed.

The iPad is not, though, the only Apple device I own; I also bought an iPod Touch back in LA when my Classic died. even though I was not really happy with downgrading from 80G down to 64G. But it’s mostly okay, as my main use for the iPod is to listen to audiobooks and podcasts when I sleep — which recently I have been doing through Creative D80 Bluetooth speakers, which are honestly not great but at least don’t force me to wear earphones all night long.

I had no problem before switching the iPod from one computer to the next, as I moved from iMac to a Windows disk for my laptop. When I decided to just use iTunes on the one Windows desktop I keep around (mostly to play games), then a few things stopped working as intended. It might have been related to me dropping the iTunes Match subscription, but I’m not sure about that. But what happens is that only a single track for each of the albums was being copied on the iPod and nothing else.

I tried factory reset, cable and wireless sync, I tried deleting the iTunes data on my computer to force it to figure out the iPod is new, and the current situation I’m in is only partially working: the audiobooks have been synced, but without cover art and without the playlists — some of the audiobooks I have are part of a series, or are split in multiple files if I bought them before Audible started providing single-file downloads. This is of course not very good when the audio only lasts three hours, and then I start having nightmares.

It does not help that I can’t listen to my audiobooks with VLC for Android because it thinks that the chapter art is a video stream, and thus puts the stream to pause as soon as I turn off the screen. I should probably write a separate rant about the lack of proper audiobooks tools for Android. Audible has an app, but it does not allow you to sideload audiobooks (i.e. stuff I ripped from my original CDs, or that I bought on iTunes), nor it allows you to build a playlist of books, say for all the books in a series.

As I write this, I asked iTunes again to sync all the music to my iPod Touch as 128kbps AAC files (as otherwise it does not fit into the device); iTunes is now copying 624 files; I’m sure my collection contains more than 600 albums — and I would venture to say more than half I have in physical media. Mostly because no store allows me to buy metal in FLAC or ALAC. And before somebody suggests Jamendo or other similar services: yes, great, I actually bought lots of Jazz on Magnatune before it became a subscription service and I loved it, but that is not a replacement for mainstream content. Also, Magnatune has terrible security practices, don’t use it.

Sorry Apple, but given these small-but-not-so-small issues with your software recently, I’m not going to buy any more devices from you. If any of the two devices I have fails, I’ll just get someone to build a decent audiobook software for me one way or the other…

unpaper and libav status update

The other day I wrote about unpaper and the fact that I was working on making it use libav for file input. I have now finished converting unpaper (in a branch) so that it does not use its own image structure, but rather the same AVFrame structure that libav uses internally and externally. This meant not only supporting stripes, but using the libav allocation functions and pixel formats.

This also enabled me to use libav for file output as well as input. While for the input I decided to add support for formats that unpaper did not read before, for output at the moment I’m sticking with the same formats as before. Mostly because the one type of output file I’d like to support is not currently supported by libav properly, so it’ll take me quite a bit longer to be able to use it. For the curious, the format I’m referring to is multipage TIFF. Right now libav only supports single-page TIFF and it does not support JPEG-compressed TIFF images, so there.

Originally, I planned to drop compatibility with previous unpaper version, mostly because to drop the internal structure I was going to lose the input format information for 1-bit black and white images. At the end I was actually able to reimplement the same feature in a different way, and so I restored that support. The only compatibility issue right now is that the -depth parameter is no longer present, mostly because it and -type constrained the same value (the output format).

To reintroduce the -depth parameter, I want to support 16-bit gray. Unfortunately to do so I need to make more fundamental changes to the code, as right now it expects to be able to get the full value at most at 24 bit — and I’m not sure how to scale a 16-bit grayscale to 24-bit RGB and maintain proper values.

While I had to add almost as much code to support the libav formats and their conversion as there was there to load the files, I think this is still a net win. The first point is that there is no format parsing code in unpaper, which means that as long as the pixel format is something that I can process, any file that libav supports now or will support in the future will do. Then there is the fact that I ended up making the code “less smart” by removing codepath optimizations such as “input and output sizes match, so I won’t be touching it, instead I’ll copy one structure on top of the other”, which means that yes, I probably lost some performance, but I also gained some sanity. The code was horribly complicated before.

Unfortunately, as I said in the previous post, there are a couple of features that I would have preferred if they were implemented in libav, as that would mean they’d be kept optimized without me having to bother with assembly or intrinsics. Namely pixel format conversion (which should be part of the proposed libavscale, still not reified), and drawing primitives, including bitblitting. I think part of this is actually implemented within libavfilter but as far as I know it’s not exposed for other software to use. Having optimized blitting, especially “copy this area of the image over to that other image” would be definitely useful, but it’s not a necessary condition for me to release the current state of the code.

So current work in progress is to support grayscale TIFF files (PAL8 pixel format), and then I’ll probably turn to libav and try to implement JPEG-encoded TIFF files, if I can find the time and motivation to do so. What I’m afraid of is having to write conversion functions between YUV and RGB, I really don’t look forward to that. In the mean time, I’ll keep playing Tales of Graces f because I love those kind of games.

Also, for those who’re curious, the development of this version of unpaper is done fully on my ZenBook — I note this because it’s the first time I use a low-power device to work on a project that actually requires some processing power to build, but the results are not bad at all. I only had to make sure I had swap enabled: 4GB of RAM are no longer enough to have Chrome open with a dozen tabs, and a compiler in the background.

unpaper and libav

I’ve resumed working on unpaper since I have been using it more than a couple of times lately and there has been a few things that I wanted to fix.

What I’ve been working on now is a way to read input files in more formats; I was really aggravated by the fact that unpaper implemented its own loading of a single set of file formats (the PPM “rawbits”); I went on to look into libraries that abstract access to image formats, but I couldn’t find one that would work for me. At the end I settled for libav even though it’s not exactly known for being an image processing library.

My reasons to choose libav was mostly found in the fact that, while it does not support all the formats I’d like to have supported in unpaper (PS and PDF come to mind), it does support the formats that it supports now (PNM and company), and I know the developers well enough that I can get bugs and features fixed or implemented as needed.

I have now a branch can read files by using libav. It’s a very naïve implementation of it though: it reads the image into an AVFrame structure and then convert that into unpaper’s own image structure. It does not even free up the AVFrame, mostly because I’d actually like to be able to use AVFrame instead of unpaper’s structure. Not only to avoid copying memory when it’s not required (libav has functions to do shallow-copy of frames and mark them as readable when needed), but also because the frames themselves already contain all the needed information. Furthermore, libav 12 is likely going to include libavscale (or so Luca promised!) so that the on-load conversion can also be offloaded to the library.

Even with the naïve implementation that I implemented in half an afternoon, unpaper not only supports the same input file as before, but also PNG (24-bit non-alpha colour files are loaded the same way as PPM, 1-bit black and white is inverted compared to PBM, while 8-bit grayscale is actually 16-bit with half of it defining the alpha channel) and very limited TIFF support (1-bit is the same as PNG; 8-bit is paletted so I have not implemented it yet, and as for colour, I found out that libav does not currently support JPEG-compressed TIFF – I’ll work on that if I can – but otherwise it is supported as it’s simply 24bpp RGB).

What also needs to be done is to write out the file using libav too. While I don’t plan to allow writing files in any random format with unpaper, I wouldn’t mind being able to output through libav. Right now the way this is implemented, the code does explicit conversion back or forth between black/white, grayscale and colour at save time, and this is nothing different than the same conversion that happens at load time, and should rather be part of libavscale when that exists.

Anyway, if you feel like helping with this project, the code is on GitHub and I’ll try to keep it updated soon.

XBMC part 2

I have posted about me setting up a new box for XBMC and here is a second part to that post, now that I arrived to Dublin and I actually set it up on my living room as part of my system. There are a few things that needs better be described.

The first problem I had was how to set up the infrared receiver for the remote control. I originally intended to use my Galaxy Note as it has an IR blaster for I have no idea what reason; but then I realized I have a better option.

While the NUC does not, unfortunately, support CEC input, my receiver, a Yamaha RX-V475 comes with a programmable remote controller, which – after a very quick check cat-ing the event input device node – appeared sending signals in the right frequency for the built-in IR sensor to pick it up. So the question was to find a way to map the buttons on the remote to action to XBMC.

Important note: a lot of the documentation out there tells you that the nuvoton driver is buggy and requires to play with /sys files and the DSDT tables. This is outdated, just make sure you use kernel version 3.15 or later and it works perfectly fine.

The first obvious option, which I have seen documented basically everywhere, is to use lirc. Now that’s a piece of software that I know a little too well for comfort. Not everybody knows this, both because I went by a different nickname at the time, and because it happened a long time before I joined Gentoo, and definitely before I started keeping a blog. But as things are, back in the days when Linux 2.5 was a thing, I did the first initial port of the lirc driver to a newer kernel, mostly as an external patch to apply on top of the kernel. I even implemented devfs, since while I was doing that I finally moved to Gentoo, and I needed devfs to use it.

I wanted to find an alternative to using lirc for this and other reasons. Among other things, last time I have used it, I was using it on computer that was not dedicated as an HTPC, so this looked like a much easier task with a single user-facing process in the system. After looking around quite a bit I found that you can make the driver output X-compatible key events instead of IR events by loading the right keymap. While there are multiple ways to do this, I ended up using ir-keytable which comes in v4l-utils.

The remote control only had to be set to send codes for a VDR for the brand “Microsoft” — which I assume puts it in a mode compatible with Windows XP Media Center Edition. Funnily enough they actually put a separate section for Apple TV codes. After that, the RC6/MCE table can be used, and that will send proper keypresses fr things like the arrows and the number buttons.

I only had to change a couple of keys, namely Enter and Exit to send KEY_RETURN and KEY_BACKSPACE respectively, so that they map to actions in XBMC. It would probably be simple enough to change the bindings to XBMC directly, but I find it more reliable for it to send a different key altogether. The trick is to edit /etc/rc_keymaps/rc6_mce to change the key that is sent, and then re-run ir-keytable -a /etc/rc_maps.cfg, and the problem is solved (udev rules are in place so that the map is loaded at reboot).

And this is one more problem solved, now I’m actually watching things with XBMC so it seems to be working fine.

A new XBMC box

A couple of months ago I was at LinuxTag in Berlin with the friends from VIdeoLAN and we shared a booth with the XBMC project. It was interesting to see the newest version of XBMC running, and I decided that it was time for me to get a new XBMC box — last time I used XBMC was on my AppleTV and while it was not strictly disappointing it was not terrific either after a while.

At any rate, we spoke about what options are available nowadays to make a good XBMC set up, and while the RaspberryPi is all the rage nowadays, my previous experience with the platform made it a no-go. It also requires you to find a place where to store your data (the USB support on the Pi is not good for many things) and you most likely will have to re-encode animes to the Right Format™ so that the RPi VideoCore can properly decode them: anything that can’t be hardware-accelerated will not play on such a limited hardware.

The alternative has been the Intel NUC (Next Unit of Computing), which Intel sells in pre-configured “barebone” kits, some of which include wifi antennas, 2.5” disk bays, and a CIR (Consumer Infrared Receiver) that allows you to use a remote such as the one for the XBox 360 to control the unit. I decided to look into the options and I settled on the D54250WYKH which has a Core i5 CPU, space for both a wireless card (I got the Intel 7260 802.11ac which is dual-radio and supports the new 11ac protocol, even though my router is not 11ac yet), and a mSATA SSD (I got a Transcend 128GB one), as well the 2.5” bay that allows me to use a good old spinning-rust harddrive to store the bulk of the data.

Be careful and don’t repeat my mistake! I originally ordered a very cool Western Digital Caviar Green 2TB HDD but while it is a 2.5” HDD, it does not fit properly in the provided cradle; the same problem used to happen with the first series of 1TB HDDs on PlayStation 3s. I decided to keep the HDD and bring it with me to Ireland, as I don’t otherwise have a 2TB HDD, instead I opted for a HGST 1.5TB HDD (no link for this one as I bought it at Fry’s the same day I picked up the rest, if nothing else because I had no will to wait, and also because I forgot I needed a keyboard).

While I could have just put OpenELEC on the device, I decided instead to install my trusted Gentoo — a Core i5 with 16GB of RAM and a good SSD is well in its ability to run it. And since I was finally setting something up that needs (for myself) to turn on very quickly, I decided to give systemd a go (especially as Robbins is now considered a co-maintainer for OpenRC which drains all my will to keep using it). The effect has been stunning, but there are a few issues that needs to be ironed out; for instance, as far as I can tell, there is no unit for rngd which means that both my laptop (now converted to systemd) and the device have no entropy, even though they both have the rdrand instruction; I’ll try to fix this lack myself.

Another huge problem for me has been getting the audio to work; while I’ve been told by the XBMC people that the NUC are perfectly well supported, I couldn’t for the sake of me get the audio to work for days. At the end it was Alexander Patrakov who pointed out to intel_iommu=on,igfx_off as a kernel option to get it to work (kernel bug #67321 still unfixed). So if you have no HDMI output on your NUC, that’s what you have to do!

Speaking about XBMC and Gentoo, the latest version as of last week (which was not the latest upstream version, as a new one got released exactly while I was installing the box), seem to force you to install FFmpeg over libav – I honestly felt a bit sorry for the developers of XBMC at LinuxTag while they were trying to tell me how the multi-threaded h264 decoder from FFmpeg is great… Anton, who wrote it, is a libav developer! – but even after you do that, it seems like it does not link it in, preferring a bundled copy of it instead. Which also doesn’t seem to build support for multithread (uh?). This is something that I’ll have to look into once I’m back in Dublin.

Other than that, there isn’t much to say; the one remaining big issue is to figure out how to properly have XBMC start up at boot without nasty autologin hacks on systemd. And of course finding a better way than using a transmission user to start the Transmission daemon, or at least find a better way to share the downloads with XBMC itself. Probably separating the XBMC and Transmission users is a good idea.

Expect more posts on what’s going on with my XBMC box in the future, and take this one as a reference about the NUC audio issue.

Mail, SSL and Postfix

In my previous post, I delineated a few reasons why I care about SSL for my blog and the xine bugzilla. What I did not talk about was about the email infrastructure for both. The reason is that, according to the very same threat model that I delineated in that post, it’s not as important for me to secure that part of the service.

That does not mean, though, that I never considered it, just I considered it not important enough yet. But after that post I realized it’s time to fix that hole and I’ve now started working on securing the email coming out of my servers as well as that coming through the xine server. But before going into details about that, let’s see why I was not as eager to secure the mail servers compared to the low-hanging fruit of my blog.

As I said in the previous post, what you have to identify two details: what information is important to defend, and who the attackers would be. In the case of the blog, as I said, the information was the email addresses of the commenters, and the attackers the other users of open, unencrypted wifi networks in use. In the case of email, the attackers in particular change drastically; the only people in a position to get access to the connections’ streams are the people in the hosting and datacenter companies, and if they made mistakes, your neighbours in the same datacenter. So for sure it’s not the very easy prey of the couple sitting at the Starbucks next to you.

The content, well, is a bit more interesting information. We all know that there is no real way to make email completely opaque to service providers unless we use end to end encryption such as GnuPG, so if you really don’t want your server admin to ever be able to tell what’s in your email, that’s what you should do. But even then, there is something that (minus protocol-level encryption) is transmitted in cleartext: the headers, the so-called metadata, that stirred the press so much last year. So once again it’s the address of the people you contact that could be easily leaked, even with everything else being encrypted. In the case of xine, the mail server handles mostly bugzilla messaging, and it is well possible that it sends over, without encryption, the comments on security bugs, so reducing the risk of that information leaking is still a good idea.

Caveat emptor in all of this post, though! In the case of the xine mail server, the server handles both inbound and outbound messages, but at the same time it does not ever let users access their mailbox; the server itself is a mail router, rather than a full mail service. This is important, becuase otherwise I wouldn’t be able to justify my sloppiness on covering SSL support for the mail! If your server hosts mailboxes or allows direct mail submission (relay), you most definitely need to support SSL as then it’s a client-server connection which is attackable by the Starbucks example above.

So what needs to be done to implement this? Well, first you need to remember that a mail router like the one I described above requires SSL in two directions: when it receive a message it should be able to offer SSL to the connecting client, and when it sends a message it has to request SSL to the remote server too. In a perfect set up, the client also offers a certificate to prove who it is. This means that you need a certificate that works both as a server and as a client certificate; thankfully, StartSSL supports that for Class 2 certificates, even if they are named for web servers, they work just fine for mail servers too.

Unfortunately, the same caveat that apply to HTTP certificates, apply to mail servers: ciphers and protocol versions combinations. Unfortunately, while Qualys has SSL Labs to qualify the SSL support in your website, I know of no similar service for mail routers, and coming up with one is not trivial, as you would want to make sure not to become a relay-spammer by mistake, and the only way to judge the message pushing of the server is to trick it into sending a message to your own service back, which should not be possible on a properly non-open relay of a server.

So the good news is that of all of the xine developers with an alias on the domain have a secure server when routing mail to them, so the work I’ve been done is not for nothing. The other note is that a good chunk of the other users in Bugzilla uses GMail or similar big hosting providers. And unlike others I actually find this a good thing, as it’s much more likely that the lonely admin of a personal mail server (like me for xine) would screw up encrypion, compared to my colleagues over at GMail. But I digress.

The bad news is that not only there is no way to assess the quality of the configuration of a mail server, but at least for the case of Postfix you have only a ternary setting about TLS: yes always, yes if client requests it (or if the server provides the option, when submitting mail), or not at all. There is no way to set up policy so that e.g. gmail servers don’t get spoofed and tricked into sending the messages over a clear text connection. A second bad news is that I have not been able to get Postfix to validate the certificates either as server or client, likely caused by the use of opportunistic TLS rather than forcing TLS support. And the last problem is that servers connecting to submit mail will not fallback to cleartext if the TLS can’t be negotiated (either because of cipher or protocols), and will instead keep trying to negotiate TLS the same way.

Anyway, my current configuration for this is:

smtpd_tls_cert_file = /etc/ssl/postfix/server.crt
smtpd_tls_key_file = /etc/ssl/postfix/server.key
smtpd_tls_received_header = yes
smtpd_tls_loglevel = 1
smtpd_tls_security_level = may
smtpd_tls_ask_ccert = yes
smtpd_tls_mandatory_protocols = !SSLv2, !SSLv3

smtp_tls_cert_file = /etc/ssl/postfix/server.crt
smtp_tls_key_file = /etc/ssl/postfix/server.key
smtp_tls_security_level = may
smtp_tls_loglevel = 1
smtp_tls_protocols = !SSLv2, !SSLv3

If you have any suggestions on how to make this more reliable and secure, please do!

Heartbleed and xine-project

This blog comes in way too late, I know, but there has been extenuating circumstances around my delay on clearing this through. First of all, yes, this blog and every other websites I maintain were vulnerable to Heartbleed. Yes they are now completely fixed: new OpenSSL first, new certificates after. For most of the certificates, though, there is no revocation issued, as they are issued through StartSSL which means that they are free to issue, and expensive to revoke. The exception to this has been the certificate used by xine’s bugzilla which was revoked, free of charge, by StartSSL (huge thanks to the StartSSL people!)

If you have an account on xine’s Bugzilla, please change your passwords NOW. If somebody knows a way to automatically reset all passwords that were not changed before a given date in Bugzilla, please let me know. Also, if somebody knows whether Bugzilla has decent support for (optional) 2FA, I’d also be interested.

More posts on the topic will follow, this is just an announcement.