Eight Years Later, unpaper-7.0.0 Is Released

You may remember that many, many years ago (it will be 11 in August) I took over unpaper, because the original author was not interested in maintaining it and I was using it to pre-process all of my scanned documents. I have since stopped scanning from Linux due to… multiple factors, but I have more or less kept maintaining unpaper in the meantime.

Due to difficulties with maintaining the project while working for my previous employer, I had let it fall significantly behind. The previous release was over eight years ago, and it was nearly impossible to build and run it with modern FFmpeg/libav versions. And while I was hoping to rework its interface, I decided to settle, for now, for just a new build system.

I did a horrible job of making the release, to be honest. I forgot to update the NEWS file, I can’t get a GPG key to work to save my life, and I have no idea how to even announce the new release to anyone out there, which is why this is ending up (with a significant delay) on the blog instead. If anyone needed more proof that automation is good, you probably want to add this to the list. Thankfully, my Python packages are published automatically when I push a tag, which leaves a lot fewer things for me to get wrong.

So let’s start with a round of thanks: Hannes Franke, Julien Danjou, Jussi Pakkanen, Marcel Metz, Oleg Pudeyev, Stefan Weil, Victor Song, and a1346054 contributed to this release. Without them there wouldn’t be support for FFmpeg 5.0, and there wouldn’t be much of a working build system either.

In terms of features, there shouldn’t be much: there’s some realignment between the documentation (including the man page) and the code, and some general quality improvements (quality of the code, not of the output). A number of bugfixes were actually triggered by improvements in compilers’ diagnostics, which surfaced pre-existing issues in the code.

The future of unpaper is still… clear as mud. Not only am I not currently using it myself in any workflow, meaning I have no feel for how well or badly it behaves, but I also don’t even pretend to understand how the algorithms behind it work. They are basically all the way they were originally written before I took it over, as for the most part I only cared about fixing its input and output routines.

I think that in the medium term I might just spend some time writing that programmatic interface I wrote about. I’m not sure whether it should be a simple JSON interface with a schema or a more complex protocol buffers interface; after all, what matters in all of this is knowing the semantics of the interface. This is likely the kind of work that is well suited to doing on Twitch, for some scoping out and for deciding how to address the problem. But as usual, finding a good time to stream is not going to be easy, particularly as I have lately resumed going to the office, and that means evenings are busier.

Also, preparing the release took me a while, and was not straightforward at all. Not only had I not made a release in a long while, but I also found myself unable to sign it, as my NUC is having trouble with PC/SC, and I got confused because Meson’s dist command does not appear to be very sophisticated at all. I think that in the future I’ll just tie the releases to the Git repository and be done with it, which, unfortunately, is exactly what I always disliked others doing.