unpaper and libav status update

The other day I wrote about unpaper and the fact that I was working on making it use libav for file input. I have now finished converting unpaper (in a branch) so that it does not use its own image structure, but rather the same AVFrame structure that libav uses internally and externally. This meant not only supporting stripes, but using the libav allocation functions and pixel formats.

This also enabled me to use libav for file output as well as input. While for the input I decided to add support for formats that unpaper did not read before, for output at the moment I’m sticking with the same formats as before. Mostly because the one type of output file I’d like to support is not currently supported by libav properly, so it’ll take me quite a bit longer to be able to use it. For the curious, the format I’m referring to is multipage TIFF. Right now libav only supports single-page TIFF and it does not support JPEG-compressed TIFF images, so there.

Originally, I planned to drop compatibility with previous unpaper version, mostly because to drop the internal structure I was going to lose the input format information for 1-bit black and white images. At the end I was actually able to reimplement the same feature in a different way, and so I restored that support. The only compatibility issue right now is that the -depth parameter is no longer present, mostly because it and -type constrained the same value (the output format).

To reintroduce the -depth parameter, I want to support 16-bit gray. Unfortunately to do so I need to make more fundamental changes to the code, as right now it expects to be able to get the full value at most at 24 bit — and I’m not sure how to scale a 16-bit grayscale to 24-bit RGB and maintain proper values.

While I had to add almost as much code to support the libav formats and their conversion as there was there to load the files, I think this is still a net win. The first point is that there is no format parsing code in unpaper, which means that as long as the pixel format is something that I can process, any file that libav supports now or will support in the future will do. Then there is the fact that I ended up making the code “less smart” by removing codepath optimizations such as “input and output sizes match, so I won’t be touching it, instead I’ll copy one structure on top of the other”, which means that yes, I probably lost some performance, but I also gained some sanity. The code was horribly complicated before.

Unfortunately, as I said in the previous post, there are a couple of features that I would have preferred if they were implemented in libav, as that would mean they’d be kept optimized without me having to bother with assembly or intrinsics. Namely pixel format conversion (which should be part of the proposed libavscale, still not reified), and drawing primitives, including bitblitting. I think part of this is actually implemented within libavfilter but as far as I know it’s not exposed for other software to use. Having optimized blitting, especially “copy this area of the image over to that other image” would be definitely useful, but it’s not a necessary condition for me to release the current state of the code.

So current work in progress is to support grayscale TIFF files (PAL8 pixel format), and then I’ll probably turn to libav and try to implement JPEG-encoded TIFF files, if I can find the time and motivation to do so. What I’m afraid of is having to write conversion functions between YUV and RGB, I really don’t look forward to that. In the mean time, I’ll keep playing Tales of Graces f because I love those kind of games.

Also, for those who’re curious, the development of this version of unpaper is done fully on my ZenBook — I note this because it’s the first time I use a low-power device to work on a project that actually requires some processing power to build, but the results are not bad at all. I only had to make sure I had swap enabled: 4GB of RAM are no longer enough to have Chrome open with a dozen tabs, and a compiler in the background.

unpaper and libav

I’ve resumed working on unpaper since I have been using it more than a couple of times lately and there has been a few things that I wanted to fix.

What I’ve been working on now is a way to read input files in more formats; I was really aggravated by the fact that unpaper implemented its own loading of a single set of file formats (the PPM “rawbits”); I went on to look into libraries that abstract access to image formats, but I couldn’t find one that would work for me. At the end I settled for libav even though it’s not exactly known for being an image processing library.

My reasons to choose libav was mostly found in the fact that, while it does not support all the formats I’d like to have supported in unpaper (PS and PDF come to mind), it does support the formats that it supports now (PNM and company), and I know the developers well enough that I can get bugs and features fixed or implemented as needed.

I have now a branch can read files by using libav. It’s a very naïve implementation of it though: it reads the image into an AVFrame structure and then convert that into unpaper’s own image structure. It does not even free up the AVFrame, mostly because I’d actually like to be able to use AVFrame instead of unpaper’s structure. Not only to avoid copying memory when it’s not required (libav has functions to do shallow-copy of frames and mark them as readable when needed), but also because the frames themselves already contain all the needed information. Furthermore, libav 12 is likely going to include libavscale (or so Luca promised!) so that the on-load conversion can also be offloaded to the library.

Even with the naïve implementation that I implemented in half an afternoon, unpaper not only supports the same input file as before, but also PNG (24-bit non-alpha colour files are loaded the same way as PPM, 1-bit black and white is inverted compared to PBM, while 8-bit grayscale is actually 16-bit with half of it defining the alpha channel) and very limited TIFF support (1-bit is the same as PNG; 8-bit is paletted so I have not implemented it yet, and as for colour, I found out that libav does not currently support JPEG-compressed TIFF – I’ll work on that if I can – but otherwise it is supported as it’s simply 24bpp RGB).

What also needs to be done is to write out the file using libav too. While I don’t plan to allow writing files in any random format with unpaper, I wouldn’t mind being able to output through libav. Right now the way this is implemented, the code does explicit conversion back or forth between black/white, grayscale and colour at save time, and this is nothing different than the same conversion that happens at load time, and should rather be part of libavscale when that exists.

Anyway, if you feel like helping with this project, the code is on GitHub and I’ll try to keep it updated soon.

Frontends to command-line or libraries?

I know I’m still convalescing so I should be resting, not thinking about development problems, but this is something that ended up in my mind because I have one thing to absolutely do (scan the release documents from the hospital to send to my GP), and for which I miss an easy interface I could instruct my mother, or my sister, to use.

Don’t get me wrong, I know there are a few SANE frontends, starting from xsane itself, but the problem is that I don’t usually stop at scanimage when I do it from the console. What I usually do is to launch a batch scan to TIFF format, then use tiffcp to join the different TIFF files in a single file with multiple pages, and then use tiff2pdf to convert it to a PDF file, which is opened by any computer I might need to send the data to (and it also is quite smaller than the original TIFF file). Lately, I’ve started trying to add to the chain also unpaper, a tool that removes some of the defects usually found in scanning pages from books (like black borders and similar), which works on PNM (thus requiring a change in the scanning command, and a further conversion later on).

But I don’t want to start fleshing down how to actually write such a tool, or I might actually start writing it right now, which is not what I’m supposed to do while I’m convalescing.

What I want to think about is that here comes one huge debate between writing frontends for command-line tools, which just interfaces with them as process calling, or writing a full-fledged program that interfaces with the low-level libraries to provide similar functionalities.

I already happen to discuss that quite often since xine and mplayer embody the two spirits: xine has its library, frontends interface with that, and a single process is used; mplayer instead has a command interface, and frontends execute a new process for playing videos.

There are of course advantages and disadvantages, one easy to spot disadvantage to xine’s approach is that a crash or freeze in xine results in a crash or freeze of the frontend, which is something Amarok users have been unfortunately familiar with.

In the case of the scanning toolchain, though, I guess it’d be probably easier to use a frontend for the tools, as re-implementing all the different functionalities would be a non-trivial work.

The disadvantages of doing it this way, though, is that you’ll have to make sure that the tools don’t change their parameters between versions, otherwise it’d be a problem to ensure the correct functionality of the tool. Also, the tools need to provide enough options to control with granularity the execution of their task, which is something unpaper actually does, but in turn makes them almost unusable for final users.

I know I’m not going much anywhere with this post, I’m afraid, but I just wanted to reflect on the fact that to have a command line tool designed to be used by frontends, you almost certainly make its syntax so complex that users would fail to grasp the basic concepts, and in turn, you’d need a command line interface to the tool too… which is why there are so many scripts interfacing to ffmpeg for converting videos, I guess.

On the other hand, one can easily write such a frontend using scripting languages, even if it’s graphical, such as with ruby-gtk2 (think RubyRipper).