Ramblings on audiobooks

In one of my previous posts I noted that I'm an avid audiobook consumer. I started when I was in the hospital, because I didn't have the energy to read — and most likely because my blood sugar was out of control after coming back from the ICU: it turns out that blood sugar swings can make your eyesight go crazy; at some point I had to buy a pair of €20 glasses simply because my doctor prescribed me a new treatment and my eyesight ricocheted out of control for a week or so.

Nowadays, I have trouble sleeping if I'm not listening to something, and I end up with the Audible app installed on all my phones and tablets, with at least a few books preloaded whenever I travel. Of course, as I said, I keep the majority of my audiobooks on the iPod, and the reason is that while most of my library is on Audible, not all of it is. There are a few books that I bought on iTunes before finding out about Audible, and then there are a few I received in CD form, including The Hitchhiker's Guide To The Galaxy Complete Radio Series, which is among my favourite playlists.

Unfortunately, to be able to convert these from CD to a format that the iPod could digest, I ended up having to buy a piece of software called Audiobook Builder for Mac, which allows you to rip CDs and build M4B files out of them. What's M4B? It's the usual mp4 container format, just with an extension that makes iTunes consider it an audiobook, and with chapter markings in the stream. At the time I first ripped my audiobooks, ffmpeg/libav had no support for chapter markings, so that was not an option. I've been told that said support is there now, but I have not tried getting it to work.

Indeed, what I need to find out is how to build an audiobook file out of a string of mp3 files, and I have no idea how to do that now that I no longer have access to my personal iTunes account on a Mac to re-download Audiobook Builder and process them. In particular, the mp3s that I'm looking to merge together are the 2013 and 2014 years of BBC's The News Quiz, to which I'm addicted and listen continuously. Being able to join them all together so I can listen to them in a multi-day-running playlist is one of the very few things that still lets me sleep relatively calmly — I say relatively because I really don't remember the last time I slept soundly; it's been about a year by now.

Essentially, what I'd like is for Audible to let me sideload some content (the few books I did not buy from them, and the News Quiz series that I stitch together from the podcast) and create a playlist — then as far as I'm concerned I wouldn't have to use an iPod at all. Well, besides the fact that I'd have to find a way to shut up notifications while playing audiobooks. Having Dragons of Autumn Twilight interrupted by the Facebook notification pop is not something I look forward to most of the time. And in some cases I have even had background updates disrupting my playback, so there is definitely room for improvement.

unpaper and libav status update

The other day I wrote about unpaper and the fact that I was working on making it use libav for file input. I have now finished converting unpaper (in a branch) so that it does not use its own image structure, but rather the same AVFrame structure that libav uses internally and externally. This meant not only supporting strides, but also using the libav allocation functions and pixel formats.
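To give an idea of what using the libav allocation functions looks like in practice, here is a minimal sketch (not the actual code in the branch; the helper name is mine) of how a grayscale frame gets allocated through libavutil:

#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>

/* Minimal sketch, not the actual code in the branch: allocate an 8-bit
 * grayscale frame through libavutil, so that strides and alignment follow
 * libav's own rules. */
static AVFrame *new_gray_frame(int width, int height)
{
    AVFrame *frame = av_frame_alloc();
    if (!frame)
        return NULL;

    frame->width  = width;
    frame->height = height;
    frame->format = AV_PIX_FMT_GRAY8;

    /* 32-byte alignment, a safe value for SIMD routines */
    if (av_frame_get_buffer(frame, 32) < 0) {
        av_frame_free(&frame);
        return NULL;
    }

    return frame;
}

Pixel access then has to go through frame->data[0] and frame->linesize[0] rather than assuming width bytes per row, which is what supporting strides is all about.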

This also enabled me to use libav for file output as well as input. While for the input I decided to add support for formats that unpaper did not read before, for output at the moment I’m sticking with the same formats as before. Mostly because the one type of output file I’d like to support is not currently supported by libav properly, so it’ll take me quite a bit longer to be able to use it. For the curious, the format I’m referring to is multipage TIFF. Right now libav only supports single-page TIFF and it does not support JPEG-compressed TIFF images, so there.

Originally, I planned to drop compatibility with previous unpaper versions, mostly because dropping the internal structure meant losing the input format information for 1-bit black and white images. In the end I was actually able to reimplement the same feature in a different way, and so I restored that support. The only compatibility issue right now is that the -depth parameter is no longer present, mostly because it and -type constrained the same value (the output format).

To reintroduce the -depth parameter, I want to support 16-bit grayscale. Unfortunately, to do so I need to make more fundamental changes to the code, as right now it expects the full pixel value to fit in at most 24 bits — and I'm not sure how to scale a 16-bit grayscale value to 24-bit RGB while maintaining proper values.
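To make the problem concrete, here is a sketch of the naive conversion I have in mind; the names and the rounding choice are mine rather than settled code:

#include <stddef.h>
#include <stdint.h>

/* Sketch of the conversion in question: 16-bit grayscale samples down to
 * 24-bit packed RGB.  Keeping the high byte is the quick approximation; the
 * commented-out line is the exact rescaling with rounding. */
static void gray16_to_rgb24(const uint16_t *src, uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint8_t value = src[i] >> 8;
        /* uint8_t value = (src[i] * 255 + 32767) / 65535; */
        dst[3 * i] = dst[3 * i + 1] = dst[3 * i + 2] = value;
    }
}

Either way the low byte is thrown away, which is exactly why 16-bit support needs deeper changes than a simple conversion at the edges.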

While I had to add almost as much code to support the libav formats and their conversion as there was before to load the files, I think this is still a net win. The first point is that there is no format-parsing code left in unpaper, which means that as long as the pixel format is something I can process, any file that libav supports now, or will support in the future, will do. Then there is the fact that I ended up making the code “less smart” by removing codepath optimizations such as “input and output sizes match, so I won't be touching it, instead I'll copy one structure on top of the other”, which means that yes, I probably lost some performance, but I also gained some sanity. The code was horribly complicated before.

Unfortunately, as I said in the previous post, there are a couple of features that I would have preferred to see implemented in libav, as that would mean they'd be kept optimized without me having to bother with assembly or intrinsics. Namely, pixel format conversion (which should be part of the proposed libavscale, still not reified), and drawing primitives, including bit blitting. I think part of this is actually implemented within libavfilter, but as far as I know it's not exposed for other software to use. Having optimized blitting, especially “copy this area of the image over to that other image”, would definitely be useful, but it's not a necessary condition for me to release the current state of the code.
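Just to show what kind of primitive I mean, here is a naive sketch of a blit between two packed-format frames (no bounds checking, the function name is mine); it is exactly the sort of loop I'd like to see hand-optimized inside libav instead:

#include <string.h>
#include <libavutil/frame.h>
#include <libavutil/pixdesc.h>

/* Naive blit between two packed-pixel frames (GRAY8, RGB24, ...): copy a
 * w*h rectangle from (sx,sy) in src to (dx,dy) in dst, one row at a time.
 * No bounds checking, no support for planar or bit-packed formats. */
static void blit_rect(AVFrame *dst, int dx, int dy,
                      const AVFrame *src, int sx, int sy, int w, int h)
{
    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(src->format);
    int bpp = av_get_bits_per_pixel(desc) / 8;

    for (int y = 0; y < h; y++)
        memcpy(dst->data[0] + (size_t)(dy + y) * dst->linesize[0] + (size_t)dx * bpp,
               src->data[0] + (size_t)(sy + y) * src->linesize[0] + (size_t)sx * bpp,
               (size_t)w * bpp);
}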

So the current work in progress is to support grayscale TIFF files (PAL8 pixel format), and then I'll probably turn to libav and try to implement JPEG-encoded TIFF files, if I can find the time and motivation to do so. What I'm afraid of is having to write conversion functions between YUV and RGB; I really don't look forward to that. In the meantime, I'll keep playing Tales of Graces f, because I love those kinds of games.

Also, for those who are curious, the development of this version of unpaper is done fully on my ZenBook — I note this because it's the first time I've used a low-power device to work on a project that actually requires some processing power to build, and the results are not bad at all. I only had to make sure I had swap enabled: 4GB of RAM is no longer enough to have Chrome open with a dozen tabs and a compiler running in the background.

unpaper and libav

I've resumed working on unpaper since I have been using it more than a couple of times lately, and there have been a few things that I wanted to fix.

What I've been working on now is a way to read input files in more formats; I was really aggravated by the fact that unpaper implemented its own loading of a single set of file formats (the PPM “rawbits”). I went on to look into libraries that abstract access to image formats, but I couldn't find one that would work for me. In the end I settled on libav, even though it's not exactly known for being an image processing library.

My reason for choosing libav was mostly that, while it does not support all the formats I'd like unpaper to handle (PS and PDF come to mind), it does support the formats unpaper reads now (PNM and company), and I know the developers well enough that I can get bugs fixed and features implemented as needed.

I now have a branch that can read files using libav. It's a very naïve implementation though: it reads the image into an AVFrame structure and then converts that into unpaper's own image structure. It does not even free up the AVFrame, mostly because I'd actually like to be able to use AVFrame instead of unpaper's structure. Not only to avoid copying memory when it's not required (libav has functions to do shallow copies of frames and mark them as writable when needed), but also because the frames themselves already contain all the needed information. Furthermore, libav 12 is likely going to include libavscale (or so Luca promised!) so that the on-load conversion can also be offloaded to the library.
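For the curious, the loader in the branch has roughly the following shape; this is a trimmed-down sketch using the libav API of the time, with error handling and the copy into unpaper's own structure left out:

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

/* Rough shape of the naive loader: open the file through libavformat, decode
 * its single image stream through libavcodec, and hand back the AVFrame.
 * Error handling and the copy into unpaper's own structure are omitted. */
static AVFrame *load_image(const char *filename)
{
    AVFormatContext *fmt = NULL;
    AVFrame *frame = av_frame_alloc();
    AVPacket pkt;
    int got_frame = 0;

    av_register_all();
    if (avformat_open_input(&fmt, filename, NULL, NULL) < 0)
        return NULL;
    avformat_find_stream_info(fmt, NULL);

    /* image files expose a single video stream */
    AVCodecContext *avctx = fmt->streams[0]->codec;
    avctx->refcounted_frames = 1; /* keep the frame alive after closing the demuxer */
    avcodec_open2(avctx, avcodec_find_decoder(avctx->codec_id), NULL);

    av_read_frame(fmt, &pkt);
    avcodec_decode_video2(avctx, frame, &got_frame, &pkt);

    av_free_packet(&pkt);
    avformat_close_input(&fmt);
    return got_frame ? frame : NULL;
}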

Even with the naïve implementation that I put together in half an afternoon, unpaper not only supports the same input files as before, but also PNG (24-bit non-alpha colour files are loaded the same way as PPM, 1-bit black and white is inverted compared to PBM, while 8-bit grayscale actually comes in as 16 bits per pixel, with half of them defining the alpha channel) and very limited TIFF support (1-bit is the same as PNG; 8-bit is paletted, so I have not implemented it yet; and as for colour, I found out that libav does not currently support JPEG-compressed TIFF – I'll work on that if I can – but otherwise it is supported, as it's simply 24bpp RGB).
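Put into code, my reading of the format mapping looks more or less like this (a sketch for illustration, with the pixel format names from libavutil/pixfmt.h):

#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>

/* Sketch only: which decoded pixel formats map onto something unpaper can
 * work with, following the notes above. */
static int input_format_supported(const AVFrame *frame)
{
    switch (frame->format) {
    case AV_PIX_FMT_RGB24:     /* PPM, 24-bit PNG, uncompressed colour TIFF */
    case AV_PIX_FMT_GRAY8:     /* PGM */
        return 1;
    case AV_PIX_FMT_MONOWHITE: /* PBM: a set bit means black */
    case AV_PIX_FMT_MONOBLACK: /* 1-bit PNG/TIFF: a set bit means white, hence the inversion */
        return 1;
    case AV_PIX_FMT_Y400A:     /* 8-bit grayscale PNG, which comes with an alpha byte */
        return 1;
    case AV_PIX_FMT_PAL8:      /* paletted (grayscale) TIFF: not handled yet */
    default:
        return 0;
    }
}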

What also needs to be done is writing out the file using libav too. While I don't plan to allow writing files in any random format with unpaper, I wouldn't mind being able to output through libav. Right now, the way this is implemented, the code does explicit conversions back and forth between black/white, grayscale and colour at save time; this is no different from the conversion that happens at load time, and should rather be part of libavscale once that exists.

Anyway, if you feel like helping with this project, the code is on GitHub, and I'll try to keep it up to date.

A new XBMC box

A couple of months ago I was at LinuxTag in Berlin with the friends from VideoLAN, and we shared a booth with the XBMC project. It was interesting to see the newest version of XBMC running, and I decided that it was time for me to get a new XBMC box — the last time I used XBMC was on my AppleTV, and while it was not strictly disappointing, it was not terrific either after a while.

At any rate, we spoke about what options are available nowadays to make a good XBMC set-up, and while the RaspberryPi is all the rage, my previous experience with the platform made it a no-go. It also requires you to find a place to store your data (the USB support on the Pi is not good for many things), and you will most likely have to re-encode anime to the Right Format™ so that the RPi VideoCore can properly decode it: anything that can't be hardware-accelerated will not play on such limited hardware.

The alternative has been the Intel NUC (Next Unit of Computing), which Intel sells in pre-configured “barebone” kits, some of which include wifi antennas, 2.5” disk bays, and a CIR (Consumer Infrared Receiver) that allows you to use a remote such as the one for the Xbox 360 to control the unit. I decided to look into the options and settled on the D54250WYKH, which has a Core i5 CPU, space for both a wireless card (I got the Intel 7260 802.11ac, which is dual-radio and supports the new 11ac protocol, even though my router is not 11ac yet) and an mSATA SSD (I got a Transcend 128GB one), as well as the 2.5” bay that allows me to use a good old spinning-rust harddrive to store the bulk of the data.

Be careful and don't repeat my mistake! I originally ordered a very cool Western Digital Caviar Green 2TB HDD, but while it is a 2.5” HDD, it does not fit properly in the provided cradle; the same problem used to happen with the first series of 1TB HDDs on PlayStation 3s. I decided to keep the HDD and bring it with me to Ireland, as I don't otherwise have a 2TB HDD; instead I opted for an HGST 1.5TB HDD (no link for this one, as I bought it at Fry's the same day I picked up the rest, if nothing else because I had no will to wait, and also because I forgot I needed a keyboard).

While I could have just put OpenELEC on the device, I decided instead to install my trusted Gentoo — a Core i5 with 16GB of RAM and a good SSD is well within its ability to run it. And since I was finally setting something up that needs (for my own sake) to turn on very quickly, I decided to give systemd a go (especially as Robbins is now considered a co-maintainer for OpenRC, which drains all my will to keep using it). The effect has been stunning, but there are a few issues that need to be ironed out; for instance, as far as I can tell, there is no unit for rngd, which means that both my laptop (now converted to systemd) and the device have no entropy, even though they both have the rdrand instruction; I'll try to fix this lack myself.

Another huge problem for me has been getting the audio to work; while I've been told by the XBMC people that the NUCs are perfectly well supported, I couldn't for the life of me get the audio to work for days. In the end it was Alexander Patrakov who pointed me to intel_iommu=on,igfx_off as a kernel option to get it to work (kernel bug #67321, still unfixed). So if you have no HDMI audio on your NUC, that's what you have to do!

Speaking of XBMC and Gentoo, the latest version as of last week (which was not the latest upstream version, as a new one got released exactly while I was installing the box) seems to force you to install FFmpeg over libav – I honestly felt a bit sorry for the developers of XBMC at LinuxTag while they were trying to tell me how great the multi-threaded h264 decoder from FFmpeg is… Anton, who wrote it, is a libav developer! – but even after you do that, it seems like it does not link it in, preferring a bundled copy of it instead. Which also doesn't seem to build with multithreading support (uh?). This is something that I'll have to look into once I'm back in Dublin.

Other than that, there isn't much to say; the one remaining big issue is to figure out how to properly have XBMC start up at boot without nasty autologin hacks on systemd. And of course I need to find a better way than using a transmission user to start the Transmission daemon, or at least a better way to share the downloads with XBMC itself. Probably separating the XBMC and Transmission users is a good idea.

Expect more posts on what’s going on with my XBMC box in the future, and take this one as a reference about the NUC audio issue.

It’s that time of the year again…

Which time of the year? The time when Google announces the new Summer of Code!

Okay, so you know I'm not always very positive about the outcome of Summer of Code work, even though I'm extremely grateful to Constanze (and Mike, who got it in tree now!) for the work on filesystem-based capabilities — I'm pretty sure at this point that it has also been instrumental for the Hardened team to have their xattr-based PaX marking (I'm tempted to re-consider Hardened for my laptops now that Skype is no longer a no-go, by the way). Other projects (many of which centred around continuous integration, with no results) ended up in much worse shape.

But since being always overly negative is not a good way to proceed in life, I'm going to propose a few possible things that could be useful to have, both for Gentoo Linux and for libav/VLC (whichever is going to be part of GSoC this year). Hopefully, if something comes out of them, it's going to be good.

First of all, a re-iteration of something I've been asking of Gentoo for a while: a real alternatives-like system. Debian has a very well implemented tool for selecting among alternative packages providing the same tool. In Gentoo we have eselect — and a bunch of modules. My laptop counts 10 different eselect packages installed, and for most of them, the overhead of having another package installed is bigger than the eselect module itself! This also does not really work that well, as for instance you cannot choose the tar command, and pkg-config vs pkgconf requires you to make a single selection by installing one or the other (or disabling the flag from pkgconf, but that defeats the point, doesn't it?).

Speaking of eselect and similar tools, we still have gcc-config and binutils-config as their own tools, without using the framework that we use for about everything else. Okay, the last person who tried to tackle these bit off more than he could chew, and the result has been abysmal, but the reason is likely that the target was set too high: re-doing the whole compiler handling so that it could support non-GCC compilers. The eselect migration itself might actually be too small a project for GSoC, but it might work as a qualification task, similar to the ones we've had for libav in the past.

Moving on to libav, one thing that I was discussing with Luca, J-B and other VLC developers was the possibility of integrating at least part of the DVD handling that is currently split between libdvdread and libdvdnav into libav itself. VLC already forked the two libraries (and I rewrote the build system) — Luca and I were already looking into merging them back into a single libdvd library… but a rewrite, and especially one that can reuse code from libav, or introduce new code that can be shared, would probably be a good thing. I haven't looked into it, but I wouldn't be surprised if libdvbpsi could get the same treatment.

Finally, another project that could sound cool for libav would be to create a library, API- and ABI-compatible with xine, that only uses libav. I'm pretty sure that if most of the internals of xine are dropped (including the configuration file and the plugin system), it would be possible to have a shallow wrapper around libav instead of a full-blown project. It might lose support for some inputs, such as module files and DVDs, but it would probably be a nice proof of concept and would show what we still need… and the moment we can deal with those formats straight in libav, we know we have something better than simply libav.

On a similar note, one of the last things I worked on in xine was the "audio out conversion branch"; see for instance this very old post — it is no more, no less than what we now know as libavresample, just done much worse. Indeed, libavresample actually has the SIMD-optimized routines I never found out how to write, which makes it much nicer. Since xine, at the moment I left it, was already using libavutil quite nicely, it might be interesting to see what happens if all the audio conversion code is killed and replaced with libavresample.
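To give an idea of why this would be mostly a drop-in replacement, setting up libavresample looks roughly like the following sketch, written from memory of the API, so double-check it against the headers before trusting it:

#include <libavresample/avresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/opt.h>
#include <libavutil/samplefmt.h>

/* Sketch of a libavresample setup replacing xine's own conversion code:
 * 5.1 48kHz S16 in, stereo 44.1kHz S16 out.  Error handling omitted. */
static AVAudioResampleContext *setup_resampler(void)
{
    AVAudioResampleContext *avr = avresample_alloc_context();

    av_opt_set_int(avr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
    av_opt_set_int(avr, "in_sample_rate",     48000,                0);
    av_opt_set_int(avr, "in_sample_fmt",      AV_SAMPLE_FMT_S16,    0);
    av_opt_set_int(avr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
    av_opt_set_int(avr, "out_sample_rate",    44100,                0);
    av_opt_set_int(avr, "out_sample_fmt",     AV_SAMPLE_FMT_S16,    0);

    avresample_open(avr);
    return avr;
}

/* In the playback loop the conversion itself is then a single call:
 *
 *   avresample_convert(avr, out_planes, out_plane_size, out_samples,
 *                      in_planes, in_plane_size, in_samples);
 */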

So these are my suggestions for this season of GSoC, at least for the projects I'm involved in… maybe I'll even have time to mentor them this year, as with a bit of luck I'll have stable employment by the time this happens (more on this to follow, but not yet).

Postmortem of a patch, or how do you find what changed?

Two days ago, Luca asked me to help him figure out what was going on with a patch for libav which he knew to be the right thing, but which was acting up in a fashion he didn't understand: on his computer, it increased the size of the final shared object by 80KiB — while this number is certainly not outlandish for a library such as libavcodec, it does seem odd at first glance that a patch removing source code would increase the final size of the executable code.

My first wild guess, which (spoiler alert) turned out to be right, was that removing branches from the functions let GCC optimize them further and decide to inline them. But how to actually be sure? It's time to get the right tools for the job: dev-ruby/ruby-elf, dev-util/dwarves and sys-devel/binutils enter the battlefield.

We’ve built libav with and without the patch on my server, and then rbelf-size told us more or less the same story:

% rbelf-size --diff libav-{pre,post}/avconv
        exec         data       rodata        relro          bss     overhead    allocated   filename
     6286266       170112      2093445       138872      5741920       105740     14536355   libav-pre/avconv
      +19456           +0         -592           +0           +0           +0       +18864 

Yes, there's a bug in the command, I noticed. So there is a total increase of around 20KiB; how is it split up? Given this is a build that includes debug info, it's easy to find out through codiff:

% codiff -f libav-{pre,post}/avconv
[snip]

libavcodec/dsputil.c:
  avg_no_rnd_pixels8_9_c    | -163
  avg_no_rnd_pixels8_10_c   | -163
  avg_no_rnd_pixels8_8_c    | -158
  avg_h264_qpel16_mc03_10_c | +4338
  avg_h264_qpel16_mc01_10_c | +4336
  avg_h264_qpel16_mc11_10_c | +4330
  avg_h264_qpel16_mc31_10_c | +4330
  ff_dsputil_init           | +4390
 8 functions changed, 21724 bytes added, 484 bytes removed, diff: +21240

[snip]

If you wonder why it's adding more code than we expected, it's because there are other places where the patch deleted functions, causing reductions elsewhere. Now we know that the three functions that the patch deleted did remove some code, but five other functions grew by over 4KiB each. It's time to find out why.

A common way to do this is to generate the assembly files (which GCC usually does not keep around) and compare the two — due to the size of the dsputil translation unit, this turned out to be completely pointless: just the changes in the jump labels cause the whole file to be rewritten. So we rely instead on objdump, which allows us to get a full disassembly of the executable section of the object file:

% objdump -d libav-pre/libavcodec/dsputil.o > dsputil-pre.s
% objdump -d libav-post/libavcodec/dsputil.o > dsputil-post.s
% diff -u dsputil-{pre,post}.s | diffstat
 unknown |245013 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 125163 insertions(+), 119850 deletions(-)

As you can see, trying a diff between these two files is going to be pointless, first of all because of the size of the disassembled files, and secondly because each instruction has its address offset prefixed, which means that every single line will be different. So what to do? Well, first of all it's useful to isolate just one of the functions, so that we reduce the scope of the changes to check — I found out that there is a nice way to do so, and it involves relying on the way the function is labelled in the disassembled file:

% fgrep -A3 avg_h264_qpel16_mc03_10_c dsputil-pre.s
00000000000430f0 <avg_h264_qpel16_mc03_10_c>:
   430f0:       41 54                   push   %r12
   430f2:       49 89 fc                mov    %rdi,%r12
   430f5:       55                      push   %rbp
--
[snip]

While it takes a while to come up with the correct syntax, it’s a simple sed command that can get you the data you need:

% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-pre.s > dsputil-func-pre.s
% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-post.s > dsputil-func-post.s
% diff -u dsputil-func-{pre,post}.s | diffstat
 dsputil-func-post.s | 1430 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 1376 insertions(+), 54 deletions(-)

Okay, that's much better — but it's still a lot of code to sift through; can't we reduce it further? Well, actually… yes. My original guess was that some function was inlined, so let's check for that. If a function is not inlined, it has to be called, and the instruction for that, in this context, is callq. So let's check if there are changes in the calls that happen:

% diff -u =(fgrep callq dsputil-func-pre.s) =(fgrep callq dsputil-func-post.s)
--- /tmp/zsh-flamehIkyD2        2013-01-24 05:53:33.880785706 -0800
+++ /tmp/zsh-flamebZp6ts        2013-01-24 05:53:33.883785509 -0800
@@ -1,7 +1,6 @@
-       e8 fc 71 fc ff          callq  a390 
-       e8 e5 71 fc ff          callq  a390 
-       e8 c6 71 fc ff          callq  a390 
-       e8 a7 71 fc ff          callq  a390 
-       e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 00 00 00 00          callq  43261 
+       e8 00 00 00 00          callq  8e670 
+       e8 71 bc f7 ff          callq  a390 
+       e8 52 bc f7 ff          callq  a390 
+       e8 33 bc f7 ff          callq  a390 
+       e8 14 bc f7 ff          callq  a390 
+       e8 00 00 00 00          callq  8f8d3 

Yes, I do use zsh — on the other hand, now that I look at the code above I note that there’s a bug: it does not respect $TMPDIR as it should have used /tmp/.private/flame as base path, dang!

So the quick check shows that avg_pixels8_l2_10 is no longer called — but does its size account for the whole increase? Let's see if the function itself changed:

% nm -S libav-{pre,post}/libavcodec/dsputil.o | fgrep avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10

The size is the same and it’s 274 bytes. The increase is 4330 bytes, which is around 15 times more than the size of the single function — what does that mean then? Well, a quick look around shows this piece of code:

        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        4c 89 e7                mov    %r12,%rdi
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8d b4 24 80 00 00    lea    0x80(%rsp),%rsi
        00 
        49 8d 7c 24 10          lea    0x10(%r12),%rdi
        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        48 89 ea                mov    %rbp,%rdx
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8b 84 24 b8 04 00    mov    0x4b8(%rsp),%rax
        00 
        64 48 33 04 25 28 00    xor    %fs:0x28,%rax
        00 00 
        75 0c                   jne    4325c 

This is just a fragment but you can see that there are two calls to the function, followed by a pair of xor and jne instructions — which is the basic harness of a loop. Which means the function gets called multiple times. Knowing that this function is involved in 10-bit processing, it becomes likely that the function gets called twice per bit, or something along those lines — remove the call overhead (as the function is inlined) and you can see how twenty copies of that small function per caller account for the 4KiB.

So my guess was right, but incomplete: GCC not only inlined the function, but it also unrolled the loop, probably doing constant propagation in the process.

Is this it? Almost — the next step was to get some benchmark data when using the code, which was mostly Luca's work (and I have next to no info on how he did that, to be entirely honest); the results on my server have been inconclusive, as the 2% loss that he originally registered was gone in further testing and would, anyway, be vastly within the margin of error of a non-dedicated system — no, we weren't using full-blown profiling tools for that.

While we don't have any sound numbers about it, what we're worried about is cache-starved architectures, such as Intel Atom, where the unrolling and inlining can easily cause a performance loss rather than a gain — which is why all of us developers facepalm in front of people using -funroll-all-loops and similar. I guess we'll have to find an Atom system to do this kind of run on…

Crashes and DoS, what is it with them anyway?

During the recent Gentoo mudslinging about libav and FFmpeg, one of the contention points is the fact that FFmpeg boasts more “security fixes” than libav over time. Any security-conscious developer would know that assessing the general reliability of a piece of software requires much more than just counting CVEs — as they only get assigned when bugs are reported as security issues by somebody.

I ended up learning this first-hand. In August 2005 I was just fixing a few warnings in xine-ui, with nothing in mind but cleaning up the build log — that patch ended up in Gentoo, but no new release was made for xine-ui itself. Come April 2006, a security researcher marked them as a security issue — we were already covered, for the most part, but other distros weren't. The bug had been fixed upstream, but not released, simply because nobody had considered them security issues up to that point. My lesson was that issues that might lead to security problems are always better looked at by a security expert — that's why I originally started working with oCERT on verifying issues within xine.

So which kinds of issues are considered security issues? In this case the problem was a format string — this is obvious, as it can theoretically allow, under given conditions, writing to arbitrary memory. The same is true for buffer overflows, obviously. But what about unbounded reads, which in my experience form the vast majority of crashes out there? I would say that there are two widely different problems with them which can be categorized as security issues: information disclosure (if the attacker can decide where to read and can get useful information out of said read — such as the current base address of the executable or the libraries of the process, which can be used later), and good old crashes — which for security purposes are called DoS: Denial of Service.

Not all DoS are crashes, and not all crashes are DoS, though! In particular, you can DoS an app without it crashing, by having it deadlock, or otherwise exhaust all of one scarce resource — this is the preferred method for DoS on servers; indeed, this is how the Slowloris attack on Apache worked: it used up all the connection handlers and caused the server to not answer legitimate clients; a crash would be much easier to identify and recover from, which is why DoS attacks on servers are rarely full-blown crashes. Crashes cannot realistically be called DoS when they are user-initiated without a third party intervening. It might sound silly, and remind you of an old joke – “Doctor, doctor, if I do this it hurts!” “Stop doing that, then!” – but it's the case: if going to the app's preferences and clicking something causes the app to crash, then there's a bug which is a crash but is not a DoS.

This brings us to one of the biggest problems with calling something a DoS: it might be a DoS in one use case and not in another — let's use libav as an example. It's easy to argue that any crash in the libraries while decoding a stream is a DoS, as it's common to download a file and try to play it; said file is the element in the equation that comes from a possible attacker, and anything that can happen due to its decoding is a security risk. Is it possible to argue that a crash in an encoding path is a DoS? Well, from a client's perspective, it's not common: it's still very possible that an attacker can trick you into downloading a file and re-encoding it, but it's a less common situation, and in my experience most of the encoding-related crashes are triggered only with a given subset of parameters, which makes them more difficult for an attacker to exploit than a decoder-side DoS. Also, if the crash only happens when using avconv, it's hard to declare it a DoS, taking into consideration that at most it should crash the encoding process, and that's about it.

Let's now turn the tables, and instead of being the average user downloading movies from The Pirate Bay, we're a video streaming service, such as YouTube, Vimeo or the like — but without the right expertise, which means that a DoS on your application is actually a big deal. In this situation, assuming your users control the streams that get encoded, you're dealing with an input source that is untrusted, which means that you're vulnerable to crashes in both the decoder and the encoder as real-world DoS attacks. As you can see, what earlier required explicit user interaction and was hard to consider a full-blown DoS now becomes much more important.

This kind of issue is why languages like Ada were created, and why many people out there insist that higher-level languages like Java, Python and Ruby are more secure than C, thanks to the existence of exceptions for error handling, making it easier to have fail-safe conditions, which should solve the problem of DoS — the fact that there are just as many security issues in software written in high-level languages as in low-level ones shows how false that notion is nowadays. While it does save you from some kinds of crashes, it also creates issues through the sheer increase in the area of exposure: the more layers, the more code is involved in execution, and that can actually increase the chance of somebody finding an issue in it.

Area of exposure is important also for software like libav: if you enable every possible format under the sun for input and output, you're enabling a whole lot of code, and you can suffer from a whole lot of vulnerabilities — if you're targeting a much more reduced use case, for instance using it on a device that has to output only H.264 video and Speex audio, you can easily turn everything else off and reduce your exposure many times over. You can probably see now why, even when using libav or ffmpeg as a backend, Chrome does not support all the input files that they support; it would just be too difficult to validate all the possible code out there, while it's feasible to validate a subset of it.

This should have established the terms of what to consider a DoS and when — so how do you handle this? Well, the first problem is to identify the crashes; you can either wait for an attack to happen and react to that, or proactively try to identify crash situations, and obviously the latter is what you should do most of the time. Unfortunately, this requires the use of many different techniques, and none yields a 100% positive result; even the combined results are rarely sufficient to argue that a piece of software is 100% safe from crashes and other possible security issues.

One obvious thing is that you just have to make sure the code is not allowing things that should not happen, like incredibly high values or negative ones. This requires manual work and analysis of the code, which is usually handled through code reviews – on the topic there is a nice article by Mozilla's David Humphrey – at least for what concerns libav. But this by itself is not enough, as many times it's values that are allowed by the specs, but not handled properly, that cause the crashes. How to deal with them? A suggestion would be to use fuzzing, a technique in which a program is executed receiving, as input, a file that has been corrupted starting from a valid one. A few years ago, a round of FFmpeg/VLC bugs was filed after Sam Hocevar released, and started using, his zzuf tool (which should be in Portage, if you want to look at it).

Unfortunately, fuzzing, just like using particular exemplars of attacks in the wild, has one big drawback – one that we could call “zenish” – you can easily forget that you're looking at a piece of code that is crashing on invalid input, and you just go and resolve that one small issue. Do you remember the calibre security shenanigan? It's the same thing: if you only fix the one bit that is crashing on you without looking at the whole situation, an attacker, or a security researcher, can just look around and spot the next piece that is going to break on you. This is the one issue that Luca, the others in the libav project and I get vocal about when we're told that we don't pay attention to security, only because it takes us a little longer to come up with a (proper) fix — well, this, and the fact that most of the CVEs that are marked as resolved by FFmpeg we have had no way to verify for ourselves, because we weren't given access to the samples for reproducing the crashes; this changed after the last VDD, at least for those coming from Google. If I'm not mistaken, at least one of them ended up with a different, complete fix rather than the partial bandaid put in by our peers at FFmpeg.

Testsuites for valid configurations and valid files are not useful for identifying these problems, as those are valid files and should not cause a DoS anyway. On the other hand, just using a completely shot-in-the-dark fuzzing technique like zzuf may or may not help, depending on how much time you can pour into looking at the failures. Some years ago, I read an interesting book, Fuzzing: Brute Force Vulnerability Discovery by Sutton, Greene and Amini. It was a very interesting read, although last I checked, the software it pointed to was mostly dead in the water. I should probably get back to it and see whether there are newer forks of that software that we could use to help get there.

It's also important to note that it's not just a matter of causing a crash: you need to save the sample that caused the issue, and you need to make sure that it's actually crashing. Even an “all okay” result might not actually be a pass, as in some cases a corrupted file could cause a buffer overflow that, in a standard setup, lets the software keep running — hardened setups, and other tools, make it nicer to deal with that kind of issue at least…

Unpaper (un)planning

So, after just shy of a year since I forked unpaper to save it from the possible shutdown of BerliOS, and to update it to fit better in a 2012 distribution, I guess it's time to have some ideas on how to move forward.

Now, while I haven't done as much work on it as I'd have liked to begin with, we have a new series of packages, version 0.4.x, which has been packaged in Gentoo since day one (obviously), and is now available in Debian as well; these versions have a proper, Autotools-based build system, source code split into multiple files and cleaned up a little bit, and rewritten option parsing, although still an imperfect one.

The main issue with the option parsing is that the original parameters are not compatible with your average Unix long options — the end result is that I had to come up with enough hacks to have them parsed properly, and I can’t get the command to behave like a Unix command unless I also break compatibility with its original parameters. Which is something that might very well happen.

But there is another issue in all of this: the code still uses a handmade parser for netpbm-style image files, which is quite nasty to be honest, and likely prone to issues (I haven't tried to see if I could overflow it, but I wouldn't be surprised if it were possible; error handling also needs lots of work). It's obvious that what I would like to do is replace the whole image loading and saving with an external library of some kind, which also means that it would be possible to support more input and output formats at the same time — which is an especially good thing because right now, during the testing phase, I have to convert some PNGs to PNM to process them.

My first try has been gdk-pixbuf: the reason is that it's already a commonly-used library, it has lately been split off from Gtk, and using it and the rest of GLib means that unpaper needs very few utility functions of its own, as GLib provides most of them: file access, option parsing and, most importantly, thread pools, which I wanted to use to implement --jobs inside unpaper itself. Unfortunately the gdk-pixbuf library doesn't seem to be as well suited for image processing as it is for decoding; saving files is clumsy, and even the in-memory representation doesn't feel particularly easy to play with.

The other solution I had in mind was to use libav for loading and saving — for sure its decoders and encoders are as optimised as it makes sense for them to be, and will stay that way; and it's not so difficult to foresee that, if the code is rewritten to be compatible with libav itself, it could just become a piece of libavfilter, making unpaper just a frontend. The problem is that this still requires a lot of work, I'm pretty sure.

And there is one more issue here to say something about: for a while I won't have as much motivation to work on unpaper, for the simple reason that I won't have a scanner! Since I couldn't get a US visa at this time, I won't be able to actually move to the United States anytime soon, which puts a hold on all my plans to ship or sell my stuff here and then buy what remains there. For this reason I won't have as much use for unpaper as I have had up to now. I don't doubt that at some point I'll have a scanner again, and the need to archive data (actually I'm quite sure this will happen sooner rather than later), so I will keep looking into improving unpaper in the meantime, but it might take a backseat to other things.

Is x32 for me? Short answer: no.

For the long answer keep reading.

So thanks to Anthony and the PaX team, yesterday I was able to set up the first x32 testing system — something I lamented last week I was unable to get a hold of. The reason why it wasn't working with LXC was actually quite simple: the RANDMMAP feature of the hardened kernel, responsible for the stronger ASLR, was changing the load address of x32 binaries to be outside the 32-bit range of x32 pointers. Version 3.4.3 finally solves the issue, so I could set it up properly.

The first step was the hard one: trying out libav. This is definitely an interesting and valuable test, simply because libav has so many hand-crafted assembly routines that it really shouldn't suffer much from the otherwise slower amd64 ABI, and at the same time, Måns was sure that it wouldn't have worked out of the box — which is indeed the case. On one side, YASM still doesn't support the new ABI, which means that everything relying on it in libav, x264 and other projects won't work; on the other side, the inline asm (which is handled by GCC) is now a completely different “third way” from the usual 32-bit x86 and 64-bit amd64.

While I was considering setting up a tbx32 instance to see what would actually work, the answer right now is “way too little”; heck, even Ruby 1.9 doesn't work right now, because it's using some inline asm that no longer works.

More interesting is that the usual ways to discern which architecture one is on are going to fail, badly:

  • sizeof(long) and sizeof(void*) are both 4 (which means that both types are 32-bit), like on x86;
  • __x86_64__ is defined just like on amd64 and there is no define specific to x32 alone; the best you can do is to check for __x86_64__ and __ILP32__ at the same time (a quick sketch of the check follows this list) — edit: yes, I knew that there had to be one define more specific than the __SIZEOF_LONG__ == 4 check I originally suggested here, I just didn't bother to look it up before; the point of needing two checks still stands, mmkay?
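In practice the check ends up looking like the following sketch, using nothing but the compiler's predefined macros:

/* Sketch of the detection discussed above: x32 means amd64 instructions with
 * 32-bit pointers, so both __x86_64__ and __ILP32__ are defined at once. */
#if defined(__x86_64__) && defined(__ILP32__)
#  define ARCH_NAME "x32"
#elif defined(__x86_64__)
#  define ARCH_NAME "amd64"
#elif defined(__i386__)
#  define ARCH_NAME "x86"
#else
#  define ARCH_NAME "unknown"
#endif

Neither macro is enough on its own: __x86_64__ alone also matches amd64, and __ILP32__ alone is not specific to x32 either; hence the need for both checks.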

What does this mean? It simply means that, considering it took us years to have a working amd64 system, which was in many ways a “pure” new architecture, easily discerned from the older x86, we're going to spend some more years trying to get x32 working… and all for what? To have a smaller address range and thus smaller pointers, to save on memory usage and memory bandwidth… by the time x32 is ready, I'd be ready to bet that neither concern will be that important — heck, I don't have a single computer with less than 8GB of RAM right now!

It might be more interesting to know why x32 is so important to enough people that they work on it; to me, it seems like the main reason is that it saves a lot of memory in C++ programs, simply because every class and every object has so many pointers (functions, virtual functions and so on and so forth) that the change from 32 to 64 bits has been a big enough hit. Given that there is so much software still written in C++ (I'm unconvinced as to why, to be honest), it's likely that there is enough interest in working on this to improve performance.

But at the end of the day, I'm really concerned that this might not be worth the effort: we're calling upon ourselves another big problem with this kind of multilib (considering we never really solved multilib for the “classic” amd64, and that Debian is still working hard to get their multiarch method out the door), plus a number of software porting problems that will keep a lot of people busy for years to come. The effort would probably be better directed at improving the current software and moving everything to pure 64-bit.

On a final note, of course libav works if you disable all the hand-written assembly code, as everything is also available in pure C. But many routines can be as much as 10 or more times slower in pure C than using AVX, for instance, which means that even if you have a noticeable improvement going from amd64 to x32, you’re going to lose more by losing the assembly.

My opinion? It’s not worth it.

Tinderboxing again

Finally Excelsior is doing what it was bought to do: tinderboxing. Well, not really at full capacity yet, but I've got enough pieces of the puzzle in place that it should soon start building regularly.

What the tinderbox set-up is doing now is actually a limited run: it's building the reverse dependencies of virtual/ffmpeg to find out which packages actually require libpostproc, so that we can fix their dependencies with libav — to follow up on Tomáš's post, I'm also strongly attached to libav, being, together with Luca, part of the project's development team.

Speaking of libav, one of the things Excelsior has been doing since a couple of days ago is running FATE, so as to cover both AMD's Bulldozer architecture and Gentoo Hardened. For those who do not know, FATE is an automated testing environment that configures, builds, and runs the tests for libav and then sends the results to a central server for analysis. The whole build takes three minutes on Excelsior, thanks to libav's highly parallelized makefiles.

There is only one problem with the tinderbox, and it is once again log analysis. To be honest, this time I have some idea of how to proceed: the first step is to replace the grep command I have used before with a script that produces HTML output from the log file. Yes, many people have said before that HTML is not a good idea for this kind of thing; but since nobody else has helped me write a better log analysis tool, that's going to be enough.

So what has to happen is reading the input log line by line, then creating an HTML file with (numbered) lines, where error lines are marked via a specific CSS class that makes the row red. As I said before, it would be nice to have some kind of javascript to jump from one error line to the next. Until this is all set up, though, just creating the HTML file would be enough.

The next step would probably be getting the HTML output onto S3 (for easy access), and writing it to a database that actually gives you an index of the error logs — for those who wonder, using s3fs as PORT_LOGDIR is not a good idea at all. I guess tomorrow I might try to find some time to work on this. I could find some time over the weekend, given that my original idea (biking around with Luca) is off the table due to me falling and (lightly) injuring myself last night.