This Time Self-Hosted

Unpaper (un)planning

It has been just shy of a year since I forked unpaper to save it from the possible shutdown of BerliOS and to update it to fit better in a 2012 distribution, so I guess it’s time to come up with some ideas on how to move forward.

Now, while I haven’t done as much work on it as I’d have liked to begin with, we have a new series of releases, version 0.4.x, which has been packaged in Gentoo since day one (obviously) and is now available in Debian as well. These versions have a proper, Autotools-based build system; the source code has been split into multiple files and cleaned up a little; and the option parsing has been rewritten, although it is still imperfect.

The main issue with the option parsing is that the original parameters are not compatible with your average Unix long options. The end result is that I had to come up with quite a few hacks to have them parsed properly, and I can’t get the command to behave like a Unix command unless I also break compatibility with its original parameters, which is something that might very well happen.
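To make the kind of hack involved a bit more concrete, here is a minimal sketch in Python (not unpaper's actual C code). It assumes, purely for illustration, a legacy tool that accepts multi-letter flags behind a single dash; a getopt-style parser would read such a flag as a cluster of one-letter options, so the sketch rewrites the legacy spellings into long options before parsing. The flag names `-bs`, `-nd`, `--blur-size` and `--no-deskew` are made up for the example and are not taken from unpaper.

```python
import argparse

# Hypothetical legacy spellings mapped to GNU-style long options.
# These names are invented for illustration; they are not unpaper's flags.
LEGACY_ALIASES = {
    "-bs": "--blur-size",
    "-nd": "--no-deskew",
}


def normalize_args(argv):
    """Rewrite known legacy flags to long options, leaving the rest alone."""
    out = []
    for i, arg in enumerate(argv):
        if arg == "--":
            # Everything after "--" is an operand; copy it through verbatim.
            out.extend(argv[i:])
            break
        out.append(LEGACY_ALIASES.get(arg, arg))
    return out


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--blur-size", type=int, default=0)
    parser.add_argument("--no-deskew", action="store_true")
    parser.add_argument("files", nargs="*")
    return parser


def parse(argv):
    """Parse a command line that may mix legacy and long-option spellings."""
    return build_parser().parse_args(normalize_args(argv))
```

Calling `parse(["-bs", "3", "-nd", "in.pbm"])` then behaves the same as spelling out the long options. The catch is that the alias table has to be kept forever, which is the compatibility cost described above.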

But there is another issue in all of this: the code still uses a handmade parser for netpbm-style image files, which is quite nasty to be honest, and likely prone to issues (I haven’t tried to see if I could overflow it, but I wouldn’t be surprised if it were possible; error handling also needs lots of work). What I would obviously like to do is replace the whole image loading and saving code with an external library of some kind, which would also make it possible to support more input and output formats at the same time. That would be especially good because right now, during the testing phase, I have to convert some PNGs to PNM before I can process them.
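For a sense of what careful error handling in such a parser involves, here is a minimal sketch of a netpbm header reader in Python. It is an illustration only, not unpaper's C parser: the names `parse_pnm_header` and `PnmError`, and the `MAX_DIMENSION` limit, are assumptions made for the example. The point is that every field is checked (magic number, digits only, sane ranges, truncation) before anything is trusted.

```python
MAX_DIMENSION = 1 << 16  # arbitrary sanity limit chosen for this sketch


class PnmError(ValueError):
    """Raised when the header is malformed or truncated."""


def _read_token(data, pos):
    """Skip whitespace and '#' comments, then return (token, new_pos)."""
    n = len(data)
    while pos < n:
        ch = data[pos:pos + 1]
        if ch == b"#":
            # A comment runs to the end of the line.
            while pos < n and data[pos:pos + 1] not in (b"\n", b"\r"):
                pos += 1
        elif ch.isspace():
            pos += 1
        else:
            break
    start = pos
    while pos < n and not data[pos:pos + 1].isspace() and data[pos:pos + 1] != b"#":
        pos += 1
    if start == pos:
        raise PnmError("truncated header")
    return data[start:pos], pos


def parse_pnm_header(data):
    """Return (magic, width, height, maxval, raster_offset) for PNM bytes."""
    magic = data[:2]
    if magic not in (b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"):
        raise PnmError("not a PNM file")
    pos = 2
    # Bitmaps (P1/P4) carry no maxval field; the other types do.
    count = 2 if magic in (b"P1", b"P4") else 3
    fields = []
    for _ in range(count):
        token, pos = _read_token(data, pos)
        if not token.isdigit() or len(token) > 10:
            raise PnmError("invalid numeric field")
        fields.append(int(token))
    width, height = fields[0], fields[1]
    maxval = fields[2] if count == 3 else 1
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise PnmError("unreasonable image size")
    if not 0 < maxval < 65536:
        raise PnmError("invalid maxval")
    # A single whitespace byte separates the header from the raster.
    if not data[pos:pos + 1].isspace():
        raise PnmError("truncated header")
    return magic, width, height, maxval, pos + 1
```

A caller would still have to check that the remaining bytes are enough for `width * height` samples before reading the raster. An external image library does all of this bookkeeping already, which is a large part of its appeal.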

My first try has been gdk-pixbuf. It is already a commonly used library, it has lately been split off from Gtk, and using it together with the rest of Glib would leave unpaper with very few utility functions of its own, since Glib provides most of them: file access, option parsing and, most importantly, thread pools, which I wanted to use to implement --jobs inside unpaper itself. Unfortunately, gdk-pixbuf seems better suited to decoding images than to processing them; saving files is clumsy, and even the in-memory representation doesn’t feel very easy to work with.
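The --jobs idea itself is simple enough to sketch. The Python below uses the standard library's thread pool rather than Glib's, and `process_sheet` and `run_jobs` are hypothetical stand-ins: the per-sheet function here only derives an output name, where a real tool would load, clean up and save the image.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sheet(path):
    """Stand-in for the per-sheet work; it only derives an output name."""
    return path.rsplit(".", 1)[0] + ".out.pnm"


def run_jobs(paths, jobs=1):
    """Process every input with at most `jobs` workers, keeping input order."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in the order of `paths`, not completion order.
        return list(pool.map(process_sheet, paths))
```

Since each sheet is independent of the others, the pool needs no coordination beyond collecting the results, which is why having a ready-made thread pool in the utility library is attractive.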

The other solution I had in mind was to use libav for loading and saving. Its decoders and encoders are surely as optimised as makes sense, and will stay that way, and it’s not hard to foresee that if the code were rewritten to be compatible with libav itself, it could simply become a part of libavfilter, making unpaper just a frontend. The problem is that this would still require a lot of work, I’m pretty sure.

And there is one more issue to mention here: for a while I won’t have as much motivation to work on unpaper, for the simple reason that I won’t have a scanner! Since I couldn’t get a US visa at this time, I won’t be able to move to the United States anytime soon, which has put on hold all my plans to ship or sell my stuff here and then buy the rest there. For this reason I won’t have as much use for unpaper as I have had up to now. I don’t doubt that at some point I’ll again have a scanner and the need to archive data (actually, I’m quite sure this will happen sooner rather than later), so I will keep looking into improving unpaper in the meantime, but it might take a back seat to other things.

Comments 2
  1. My little scanning GUI project, gscan2pdf, has provided a frontend for unpaper for some time. You can scan, unpaper, OCR, and save as PDF, DjVu, or whatever, with a couple of clicks.

  2. Dear Diego, good to read that you are taking care of unpaper. I think unpaper is a great tool, but the original author apparently abandoned it, and there is also room for improvement, of course. I hope that your visa issue is solved soon! My best wishes, Gerry.
