Flattr for comments

You probably already know that my blog uses Flattr for microdonations, both for the blog as a whole and for the individual articles posted here. For those who don’t know, Flattr is a microdonation platform that splits a monthly budget into equal parts to share among your content creators of choice.

I’ve been using, and musing about, Flattr for a while, and I’ve sometimes ranted a little about how things have been moving in their camp. One of the biggest problems with the service is its relatively scarce adoption: I’ve got a ton of “pending flattrs”, as described on their blog, mostly for Twitter and Flickr users.

Ramping up adoption of the service is key for it to be useful to both content creators and consumers: the former only get something out of the system if their content is liked by enough people, and the latter only care about adding money to the system if they find great content to donate to. Or if they use Socialvest to get the money back while spending it somewhere else.

So last night I did my part in trying to increase the usefulness of Flattr: I added it to the comments of my blog. If you leave a comment and fill in the email field, that email will be used, hashed, to create a new “thing” on Flattr, whether you’re already registered or not — if you’re not registered, the thing will be kept pending until you register and associate the email address. This is not much different from what I’ve already been doing with Gravatar, which uses the same method (the hashed email address).
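
For reference, the hash in question is the same one Gravatar computes: an MD5 digest of the trimmed, lowercased address. A minimal sketch in Ruby (the helper name is mine, not Typo’s):

require 'digest/md5'

# Gravatar-style identifier: MD5 of the normalized email address.
# Flattr can later match it against the hash of a registered user's email.
def email_hash(email)
  Digest::MD5.hexdigest(email.strip.downcase)
end

email_hash('Commenter@Example.org ') # => 32-character hex digest, stable per address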

Even though the parameters needed to integrate Flattr for comments are described in the partnership interface, there doesn’t seem to be a need to be registered as a partner – indeed you can see in the pages’ sources that there is no revenue key present – and assuming you are already loading the Flattr script for your articles’ buttons, all you have to add is the following code to the comment template (this is for Typo; other languages and engines will differ slightly, of course!):

<% if comment.email != "" -%>
<%# … the Flattr button markup itself has been lost to time; see the update below … %>
<% end -%>

Update (2017-07-20): No, I’m not sure where the code for this one ended up, sorry :(

So if I’m not making money with the partner site idea, why am I bothering with these extra buttons? Well, people have often helped me out a lot in comments, pointing out obvious mistakes I made or things I missed… and I’d like to be able to easily thank the commenters when they do… and now I can. Also, since this requires a valid email field, I hope more people will fill it in, so that I can contact them if I want to ask or tell them something in private (I’ve sometimes wished to contact people who didn’t leave an easy way to reach them).

At any rate, I encourage you all to read the comments on the posts, and to Flattr those you find important, interesting or useful. Think of it like a +1 or a “Like”. And of course, if you’re not signed up with Flattr, do so! You never know what other people might like among the things you’ve posted!

Stop inventing a new ontology for each service!

Last month I wrote a post noting who makes use of semantic data on the web, pointing out in particular that Facebook, Google, Readability and Flattr all use different ways to provide context to content: OpenGraph, Schema.org, hNews and their own version of microformats, respectively.

Well, NewsBlur – which, even though I criticized its HTTP implementation, is still my best suggestion for a Google Reader replacement, if only because it’s open source even though it’s a premium service – seems to have come up with its own way to get semantic data.

The FAQ for publishers states that you can use one of a number of possible selectors to give NewsBlur an idea of how your content is structured — completely ignoring the fact that schema.org already describes all of that structure, and it would be relatively easy to get the data explicitly. Even better, since NewsBlur has a way to show public comments within its interface, it could display the comments on the posts themselves, as they are also tagged and structured with the same ontology. I’ve opened an idea about it — hopefully somebody, if not the author, will feel like implementing it.
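
To make the point concrete, here is roughly what that shared structure looks like as schema.org microdata; this is a hand-written sketch using the current vocabulary, not Typo’s or NewsBlur’s actual markup:

<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Post title</h1>
  <div itemprop="articleBody">The post content…</div>
  <!-- the comments carry the same ontology, so a reader could render them too -->
  <div itemprop="comment" itemscope itemtype="http://schema.org/Comment">
    <span itemprop="author">A. Commenter</span>
    <div itemprop="text">The comment content…</div>
  </div>
</article>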

But this is by far not limited to NewsBlur! While Readability added a special case for my blog so that it actually gets the right data out of it, their content guide still only describes support for the hNews format, even though Schema.org carries all the same data and more. And Flattr, well, still does not seem to care about getting data via semantic information — the closest match would be support for the link relation in feeds, which can be autodiscovered, but then I don’t really have an idea of where Flattr would find the metadata to create the “thing” on their side.

Please, all you guys who work on services — can we all get behind the same ontology, so that we don’t have to add the same information to our pages four times over, increasing their size for no advantage? Please!

Who consumes the semantic web?

In my previous post I’ve noted that I was adding support for the latest fad method for semantic tagging of data on web pages, but it was obviously not clear who actually consumes that data. So let’s see.

In the midst of the changes to Typo that I’ve been sending to support a fully SSL-compatible blog install (mine is not entirely there yet, mostly because most of the internal links from one post to the next are not currently protocol-relative), I’ve added one commit to provide a bit more OpenGraph insight — OpenGraph being used, almost exclusively, by Facebook. The only metadata I provide through that protocol, though, is an image for the blog – since I don’t have a logo, I’m sending my Gravatar – the title of the single page, and the global site title.

Why that? Well, mostly because this way, if you post a link to my blog on Facebook, it will appear with the title of the post itself instead of the one visible on the page. This solves the problem of whether the blog’s own title should be dropped from the <title> tag.
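
Concretely, the tags boil down to something like this in the page’s head element (the values here are placeholders; the image URL is Gravatar’s standard avatar endpoint):

<meta property="og:title"     content="Title of the single page" />
<meta property="og:site_name" content="Global site title" />
<meta property="og:image"     content="https://secure.gravatar.com/avatar/<email-hash>" />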

As far as Google is concerned, instead, the most important piece of metadata you can provide seems to be authorship tagging, which uses Google+ to connect content by the same author. Is this going to be useful? Not sure yet, but at least it shows up in a less anonymous way in the search results, and that can’t be bad. Unlike what they say on the link, it’s possible to use an invisible <link> tag to connect the two, which is why you won’t find a G+ logo anywhere on my blog.
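
The invisible tag in question is just a link relation pointing at the Google+ profile (profile ID elided here):

<link rel="author" href="https://plus.google.com/[profile-id]" />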

What else do search engines do with the remaining semantic data? Not sure; they don’t seem to explain it, and since I don’t know what happens behind the scenes, it’s hard for me to give a proper answer. But I can guess, and hope, that they use it to reduce the redundancy of the current index. For instance, pages that are actually lists of posts, such as the main index, the categories/tags and the archives, will now properly declare that they describe blog postings whose URLs are, well, somewhere else. My hope is that search engines will then know to link to the declared blog post’s URL instead of the index page, and possibly boost the results for the posts that prove more popular (given that they can then count the comments). What I’m surely counting on is for descriptions in search results to become more human-centered.

In the case of Google, you can use their Rich Snippet testing tool, which gives you an idea of what it finds. I’m pretty sure they take all this data with a grain of salt, though, seeing how many players there are in the “SEO” world, with people trying to game the system altogether. But at least I can hope that things will move in the right direction.

Interestingly, when I first implemented the new semantic data, Readability did not support it, and would show my blog’s title instead of the post’s title when reading the articles from there — after some feedback on their site, they added a workaround for my case, so you can enjoy their app with my content just fine. Hopefully, in time, the microformat will be supported in the general sense.

On the other hand, Flattr still shows no improvement in using metadata, as far as I can see. They require you to add a button manually, repeating the kind of metadata (content type, language, tags) that is already easily inferred from the microformats. I’d like to reiterate my plea to the Flattr developers to listen to OpenGraph and other microformat data, and at least use it to augment the manually-inserted buttons. Supporting the schema.org format, by the way, should make it relatively easy to add per-fragment buttons — i.e., I wouldn’t mind having a per-comment Flattr button to reward constructive comments, like they have on their own blog, but without the overhead that adding them manually incurs.

Right now this is all the semantic data that, as far as I could figure out, is being used. Hopefully things will become more useful in the future.

What I’d like from my blog

My blog is, at this point, a vital part of my routine. I use my blog to write about my personal projects, I write about the non-restricted parts of my jobs, and I write about the work that goes into Gentoo Linux and other projects I follow.

I have over 2100 posts accumulated over time, especially thanks to the recent import of my original blog on Gentoo infrastructure. I don’t really know if that’s a lot, but sometimes Typo seems to struggle with it. Unfortunately I’m also running an older version of Typo, because I haven’t switched that virtual server to Ruby 1.9 yet, as one of my customers is running a version of Radiant that is not going to work otherwise.

Said customer also bitched and screamed about not keeping the site on my server, but as it happens, the new webmasters who were supposed to take over the website, and who should have been cheaper and faster than me… have been working since June and still delivered nothing. Hopefully they’ll be done soon and I can kick said customer off the server.

Anyway, at this point there are a few things that I’d like to get out of my blogging platform in the future, which might require me to fork Typo and create my own version. That version is likely going to be stripped down, as there are many things in there that I really don’t care about, like the short URLs — I might just export those, as I think I used them at some point, but I would then handle them through mod_rewrite rather than on the Rails side.

So let’s see what I don’t like about the current Typo I’m using:

  • The database access is more than a bit messed up; it probably has to do with the fact that upstream only cares about MySQL, while I want to run it on PostgreSQL, and this causes more than a couple of problems — have you noticed that sometimes my posts end up password-protected? Well, what happens is that the settings for single posts are serialized to YAML and de-serialized, but sometimes something bad happens, the YAML becomes invalid, and the password protection kicks in. I know there is an ActiveRecord extension that allows key-value pairs to be stored in PostgreSQL-specific column types instead of having to (de)serialize them all the time (see the sketch after this list), but again, this wouldn’t be something upstream would use.
  • Alternatively, I’ve been toying with the idea of using MongoDB as a backend. Even with the issues that I have pointed out before, I think it might work well for a blog, especially since the comments would then be tied to the post itself, rather than living in the currently connected tables.
  • There is a problem with the tag handling, again something upstream doesn’t seem to care about – at some point I remember reading that they were mostly interested in making every single word in a post a tag, to cross-connect posts with the same words; it’s one of the reasons why I’m not sure I want to update. If I change the title of one of the tags to make it more descriptive, and then edit a post that has that tag, it creates one more tag for each word in that title, instead of preserving the older tags. I really should clean up the tags I have right now.
  • I would also like the “new post” page to create the post right away and then drop me into editing it — this is important to me because sometimes, if I have to restart Chromium or suspend the laptop, something goes very wrong and multiple drafts get created for the same post. And cleaning them up is a long task.
  • A better implementation of notifications for new posts, and integration with Flattr, would also be very good. While IFTTT makes it easy to post new entries to Twitter and LinkedIn, its lack of Flattr integration is a major pain, and the fact that, to use auto-submit right now, I have to duplicate part of the content in the HTML of the pages is also a problem. Being able to create a “Flattr thing” the moment I actually post something would be a major plus for me.
  • Since I’m actually quite paranoid, another thing I would like would be either two-factor authentication with Google Authenticator on a cellphone or (actually, in addition to that) certificate-based authentication for the admin interface. Having a safe way to make sure that I’m the only one logging in would let me remove some of the administrative-interface rules in ModSecurity, which would in turn let me write posts from public WiFi networks, sidestepping the problem I posted about the other day.
  • Scheduled posting. This used to be supported, but it’s been completely broken for years at this point. It was very useful to me a long time ago, since I would just write a bunch of posts and schedule them to be published once a day. I suppose this should now be changed so that planned posts are only actually published once a process verifies that their scheduled time has elapsed… but again, this is something that I’d like to have, and you readers would probably enjoy it, as it would make for more and better content overall.
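
As a sketch of the first point above: modern ActiveRecord (Rails 4 and later, so after this post was written) can keep those per-post settings in a native PostgreSQL hstore column instead of a serialized YAML blob. The class and column names here are hypothetical, not Typo’s actual schema:

# Migration: enable the extension and add a key-value column.
class AddSettingsToArticles < ActiveRecord::Migration
  def change
    enable_extension 'hstore'
    add_column :articles, :settings, :hstore, default: {}, null: false
  end
end

class Article < ActiveRecord::Base
  # Exposes settings['password'] as a plain attribute; no YAML
  # (de)serialization step that could leave invalid data behind.
  store_accessor :settings, :password
end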

I definitely do not want to go with WordPress. I just wish I had the time to write my own Typo fork and make it more usable for what I do, rather than hoping that upstream Typo development does not go in a direction I don’t like at all. Maybe somebody else has the same requirements and would like to join me in this project; if so, send me an email… maybe it’ll finally be the time I decide to start on the fork itself.

Tinderbox and expenses

I’ve promised some insight into how much running the tinderbox has actually cost me. And since today marks two months since Google AdSense’s crazy blacklisting of my website, I guess it’s as good a time as any.

So let’s start with the obvious first expense: the hardware itself. My original tinderbox was running on the box I called Yamato, which cost me some €1700 and change, without the hard drives, back in 2008 — and about half of that was paid with donations from users. Over time, Yamato had to have its disks replaced a couple of times (and sometimes that cost, too, came out of donations). That computer has been used for other purposes as well, including as my primary desktop for a long time, so I can’t really complain about the parts that I had to pay for myself. Other devices, connectivity, and all those things ended up being shared between my tinderbox efforts and my freelancing job, so I don’t complain about those in the least either.

The new tinderbox host is Excelsior, which was bought through the Pledgie that left me paying only some $1200 out of pocket, the rest coming in from contributors. The space, power and bandwidth have been offered by my employer, which solved quite a few problems. Since I now don’t have to pay for power, and since the last time I went back to Italy (in June) I turned off, and got rid of, most of my hardware (the router was already having some trouble; Yamato’s motherboard was failing anyway, so I kept its hard drive while deciding what to do with it, and sold the NAS to a friend of mine), I can assess how much I was spending on the power bill for all of that.

My usual power bill was somewhere around €270 — which obviously includes all the usual household power consumption as well as my hardware and, due to the way power is billed in Italy, an advance on the next bill. The bill for the months between July and September, the first period when I was fully out of my house, was for -€67 — and no, that’s not a typo, it was a negative bill! Calculator at hand, the actual difference between the previous bills and the new one is around €50 a month; assuming that only a third of that was due to the tinderbox hardware, that makes it around €17 per month spent on power. It’s not much, but it adds up. Connectivity — that’s hard to assess, so I’d rather not even go there.

With the current setup there is of course one expense that wasn’t there before: AWS. The logs that the tinderbox generates are stored on S3, since they need to be accessible, and there are lots of them. And one of the reasons why Mike is behaving like a child about me just linking the build logs instead of attaching them is that he expects me to delete them, because they would be too expensive to keep indefinitely. So, how much does the S3 storage cost me? Right now, a whopping $0.90 a month. Yes, you read that right: it costs me less than one dollar a month for all that storage. I guess the reason is that the logs are not stored for high reliability or high-speed access, and they are highly compressible (even though they are not compressed by default).

You can probably guess at this point that I’m not going to clear the logs out of AWS for a very long time. Although I would like for some logs not to be so big for nothing — like the sdlmame one, which used to pass the -v switch to GCC, making every call print a long run of internal data that is rarely useful in a default log output.

Luckily for me (and for the users relying on the tinderbox output!) those expenses are well covered by the Flattr revenue from my blog’s posts — and thanks to Socialvest I no longer have to wonder whether I should keep the money or use it to flattr others: I currently have over €100 ready for the next six or seven months’ worth of flattrs. Before this, between my freelancing jobs, Flattr, and the ads on the blog, I was also able to cover at least the cost of the server (and barely the cost of the domains — but that’s partly my fault for having… a number of them).

Unfortunately, as I said at the top of the post, there are no longer any Google-served ads on my blog. Why? Well, a month and a half ago I received a complaint from Google saying that one post of mine, in which I namechecked a famous adult website in the context of an at-the-time recent perceived security issue, is adult material, and that it goes against AdSense policies to serve ads on a website with adult content. I would still argue that merely namechecking a website shouldn’t be considered adult content, but while I did submit an appeal to Google, a month and a half later I have no response at hand. They didn’t blacklist the whole domain, though; they only blacklisted my blog, so ads are still shown on Autotools Mythbuster (on which I plan to resume working almost full time pretty soon). The result is bleak nonetheless: I went from €12-€16 a month down to a low €2 a month, which is no longer able to cover the server expense by itself.

This does not mean that anything will change in the future, immediate or not. This blog has more value to me than the money I can get back from it, as it’s a way for me to showcase my abilities and, to a point, get employment — but you can understand that the way they handled that particular issue still upsets me a liiiittle bit.

How Flattr grew back for me

I’ve written about Flattr more than a couple of times in the past. In particular, I’ve complained about the fact that its system makes it difficult for people not to take their money out, as Flattr takes a continuous 10% cut out of everyone’s revenue, monthly. Also, the revenue from Flattr, at least for me, has for a while been just a notch above that from Google’s AdSense, which does not require direct interaction from users to begin with.

But one of the things they started this year has made it possible to significantly increase (well, depending on your habits) the amount of money that flows into the system. Socialvest is a very neat service that uses the various affiliate systems to gather funds that you can then donate straight to a non-profit (including Flattr itself!), and if you link it with your Flattr account, you’ll also see that money transferred to your Flattr funds, which you can then use to flattr others.

For the user it’s extremely simple, actually: you install a browser extension and then go about your online shopping as usual. Some websites will show a ribbon telling you that you can use Socialvest with them, in which case the extension injects the needed affiliate code into the order forms so that you get your “rebate”. Considering that Amazon pays a 4% affiliate fee, it’s extremely interesting, as I do most of my shopping on Amazon (ThinkGeek should also be supported, but when I tried, it didn’t seem to work as intended, unfortunately). The nicest part is that it seems to work fine with gift cards as well.

Using Socialvest hasn’t really changed my spending habits — although it did change my preference for where to buy TV series and music, from Apple’s iTunes Store to Amazon’s stores. This was helped by my getting a Kindle Fire, by Amazon releasing an Instant Video app for iPad, and now by Amazon launching the MP3 Store in Italy as well. Furthermore, Amazon’s J-Pop catalogue seems to be quite a bit bigger than Apple’s, and that’s good news for me.

So go on: if you’re using Flattr, head to Socialvest to have more funds with which to flattr the content you care about. There’s nothing to lose, in my opinion.

More Flattr downsides

I’ve written a bit about Flattr before, and it was just three months ago that I last complained about seeing it dwindle.

Even though the developers have started working to push more features out, and to find more content to flattr… it seems to me like it hasn’t spurred enough enthusiasm in people to add more money into the system. At the end of the month, the dozen euros that used to come into my Flattr account, which I would then reinvest in more flattrs to others, have been cut down to half.

A few months ago, seeing the trend, I gave myself a rule: since I’m on Flattr also because I make content, and most of the time said content is not something I’m paid to make, I expect to have a positive balance out of it. If in any given month the revenue is lower than my monthly amount, I won’t be using it the following month. This is helped by the fact that you can now add funds even in the middle of the month, so I can just wait for the new revenue to be finalised, and then move it into my funds.

This happened this past month. The current month seems to be better, but not by much. By comparison, Google AdSense – which runs on this blog and on my Autotools Mythbuster, and does not require a direct donation – has consistently brought me a couple of euros more than Flattr for the past six months.

What’s the issue? I guess the main problem is that Flattr still takes 10% of each month’s revenue, which means that, given my monthly budget of €5, I have to receive at least €5.56 in donations to have that budget available, and my targets will in turn only see €4.50 of what I give out. On top of that, they take a fee (5%, if I recall correctly) when you add funds directly, and then PayPal takes another fee if you want to withdraw the revenue.

While this was designed as an easy way to handle microdonations, the impact of these fees starts to be big enough that what was once considered a huge chunk of money eaten away by PayPal now doesn’t look too bad. Myself, if Flattr goes down the drain, I think I’ll give the people I feel deserve something Amazon gift certificates, the same way I list those as the primary way to donate to me as well — especially since I can use them to buy books for the Kindle.

Flattr, please, try to get more people involved by reducing the percentage fee you take, or by charging it when funds are added and withdrawn, rather than every time they are passed around to others.

Flattr and funding

Last month I observed on my Twitter feed that Flattr looked like it was losing traction, compared with something as “old school” as Google’s AdSense service. At the time I was confronted by one angry user, who seemed to think I don’t know what I’m talking about. Given that I’m probably one of the early adopters (although not too early, admittedly), that didn’t make much sense.

I repeated the same concerns a few days ago, after coming back from my (long-needed) vacation, as I could compare the Flattr revenue with the AdSense one. Flattr did come out on top, but by less than half a euro. Not really an indication of Flattr performing any better; if anything, considering that I have much more content with Flattr buttons than I have with ads, it is performing relatively worse.

Turns out I’m not the only one concerned with Flattr’s well-being — and I remember Michal being one of the early adopters of the idea as well.

Why does this happen? It’s a very tough question to answer, but I can theorize a few reasons that make sense to me — your mileage may vary, though.

First of all, Flattr wanted to expand its reach and removed the first barrier it had, which required you to spend money in order to receive money. While such a requirement made it a closed circle, which Flattr didn’t want to be, it also ensured that there was no “money black hole”. Nowadays, you can be flattr’d without ever flattr’ing anyone. That can easily be seen as right from one point of view, but it doesn’t mean it’s the best choice. It also tends to ignore one detail: if you have things that are being flattr’d, you never had to keep adding money to your balance — you could just convert the revenue into means. I think I myself only added the original €10 to the account, and I’ve been flattr’ing through the means ever since…

This leads me straight into another issue that probably makes Flattr a non-starter for many: the fees. As Michal points out, the 10% fee that Flattr takes is… hefty. But it wouldn’t be so much trouble if the fee were applied to the funds you add to your account. Instead, the fee is applied, each month, to the revenue you receive. Which means that, once I transfer the funds into the means, they’ll be cut by another 10% when they are transferred to my flattr’d targets. Honestly, it bothers me; not enough to stop using it, but it does bother me.

Then there is the most obvious problem, which most people, including me and Michal, have already noted: it is hard to find flattr’able content! It’s not that there isn’t much flattr’able content out there (there is quite a bit), but for people like me who like to use Google Reader rather than Twitter to read news (i.e. using the feed and not a link to the blog itself), it’s difficult to know when the post you just read, which saved you a ton of time, comes from an author who uses Flattr. It’s not so much a technical issue – it is true that Typo does not allow me to automatically append content to posts, but that wouldn’t stop me – as a matter of most Planets (which are what I use to find posts, for what it’s worth) seeming to frown upon such “advertisement”.

At this point… I’m honestly doubtful about its well-being. So if one day you no longer see a Flattr button on this blog… you know why.

Gems using hydra development model

As I said on Twitter, I think a part of me is pretty much a masochist, as I’ve had the nice idea of trying out MongoDB for a personal experiment (a webapp to manage clients’ hardware configuration registrations — stuff for my usual “day job”), so I started looking into packaging Ruby (and Rails) support for it.

Luckily, Rails 3 started abstracting the ORM as well, allowing an almost drop-in replacement of ActiveRecord with a MongoDB adapter instead. There is a nice guide that tells you almost everything that is needed in the basic case (of course it could be more automated, but that’s beside the point here). The core of that guide is the mongo_mapper gem, which depends on a thin layer (plucky) that is in turn built on top of the driver (mongo), which depends on bson, providing a pure Ruby interface as well as a JRuby version… and there is a separate gem (bson_ext) for the C-based extension. If your head is spinning, or you’re wondering whether I became a linkspammer, I can’t blame you.
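
To give an idea of what the top of that stack looks like in use, here is a minimal MongoMapper example with the API of the time; the model, fields and database name are made up for illustration:

require 'mongo_mapper'

# Point MongoMapper at a local mongod instance (1.x-era driver API).
MongoMapper.connection = Mongo::Connection.new('localhost')
MongoMapper.database   = 'hardware_registry'

class Machine
  include MongoMapper::Document

  # Typed keys replace ActiveRecord's schema-backed columns.
  key :hostname, String, :required => true
  key :cpu,      String
  timestamps!
end

Machine.create(:hostname => 'client-box-01', :cpu => 'Opteron 6128')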

The end result is that, at the very least, I got to package:

  • bson
  • mongo
  • plucky
  • mongo_mapper

Not too shabby, but I’ve been in far worse situations before, such as the dependency web of Bones and its extensions. What I didn’t expect was the way these packages are developed, rather than the way they are packaged.

The gems, as usual, are shallow, and contain no tests, no Rakefile, nor any of the other files we need to package them properly as ebuilds. So I had to rely on the GitHub repositories; thankfully we handle that mostly transparently in the Ruby eclasses (as it’s way too common for us to have to rely on snapshots). Unfortunately, while for plucky and mongo_mapper the situation was clear (the homepage of each gem points to GitHub, and the two have separate repositories), it gets complex for the mongo driver and the related gems.

First, let’s set aside the issue of bson versus bson_ext (the C-based extension)… that is stuff we can generally deal with in Gentoo itself (the C extension will be built for all the targets supporting it; JRuby will get its own extension; and all of it in the same package). The problem is that, after digging through the MongoDB website – finding the list of repositories is definitely not easy – the repository for both mongo and bson (and bson_ext!) turns out to be the same… and not in the way Rails 2.3 was all in one repository (with each gem having its own subdirectory), but with the contents of the three gems merged into the same structure. Saying that it’s messy is not enough.

I have no doubt that I’ll handle it just fine at some point in the future, but it really makes me wonder how they can consider this a good design and development practice. It’s a mystery. For now I don’t think I’ll spend much more time on this issue, as I have other tasks for my job to take care of, which moves this to the back burner…

And on that topic, I’d like to try again an experiment to gauge the interest in packaging these gems: my blog is Flattr enabled – even though lately AdSense, on this blog and on Autotools Mythbuster, is getting me more money than Flattr – and so is this post. If you’ve got an account (or feel like opening one), and you’re interested in seeing ebuilds for the Mongo Ruby driver in Gentoo, give a flattr to this post. If I see that count increasing in the next two days, I’ll use the Christmas weekend to work on it.

Unpaper fork, part 2

Last month I posted a call to action hoping for help with cleaning up the unpaper code, as the original author has not updated it since 2007 and it has a few issues. While I have seen some interest in said fork and cleanup, nobody has stepped up to help, so it is proceeding, albeit slowly.

What is available now in my GitHub repository is mostly cleaned up, although still not dramatically more optimised than the original — I actually removed one of the “optimisations” I had added since the fork: the use of the sincosf() function. As Freddie pointed out in the other post’s comments, the compiler has a better chance of optimising this itself; indeed, both GCC and PathScale’s compiler optimise two sin and cos calls with the same argument into a single sincos call, which is good. And using two separate calls allows the temporaries that store the results to be declared constant.
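
In practice that means writing the obvious code and letting the compiler do its thing. A simplified sketch (the function is illustrative, not the actual unpaper code), where the compiler is free to merge the two calls into one sincosf() where the C library provides it:

#include <math.h>

static void rotate_point(float angle, float *x, float *y) {
    /* Two plain calls with the same argument; the temporaries stay const. */
    const float s = sinf(angle);
    const float c = cosf(angle);
    const float nx = c * *x - s * *y;
    const float ny = s * *x + c * *y;
    *x = nx;
    *y = ny;
}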

And indeed, today I started rewriting the functions so that temporaries are declared as constant as possible, and with the most limited scope applicable to them. This was important to me for one reason: I want to try making use of OpenMP to improve unpaper’s performance on modern multicore systems. Since most of the processing is applied independently to each pixel, it should be possible for many of the iterative cycles to be executed in parallel.
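
As a sketch of what that could look like for one of the per-pixel filters (the function and filter here are hypothetical, not unpaper’s actual code), a single pragma is enough to spread the rows across cores:

#include <stdint.h>
#include <stddef.h>

/* Build with: gcc -fopenmp ... */
void apply_threshold(uint8_t *pixels, size_t width, size_t height,
                     uint8_t threshold) {
    /* Each pixel is independent, so rows can be processed in parallel. */
    #pragma omp parallel for
    for (long y = 0; y < (long)height; y++) {
        for (size_t x = 0; x < width; x++) {
            const size_t i = (size_t)y * width + x;
            pixels[i] = (pixels[i] > threshold) ? 0xff : 0x00;
        }
    }
}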

It would also be a major win in my book if the processing of input pages could happen in parallel as well: my current scan script has to process the scanned sheets in parallel itself, calling many copies of unpaper, just to get through the sheets faster (sometimes I scan tens of sheets, such as bank contracts and the like). I just wonder whether it makes sense to simply start as many threads as possible, each one handling one sheet, or whether that would risk hogging the scheduler.

Finally, there is the problem of testing. Freddie also pointed me at the software I remembered for checking the differences between two image files: pdiff — which is used by the ChromiumOS build process, by the way. Unfortunately, I then remembered why I didn’t like it: it uses the FreeImage library, which bundles a number of other image-format libraries, and whose upstream refuses to apply sane development practices to it.

What would be nice here would be either to modify pdiff to use a different library – such as libav! – to access the image data, or to find or write something similar that does not require such stupidly-designed libraries.

Speaking of image formats, it would be interesting to get unpaper to support other image formats besides PNM; that way you wouldn’t have to keep converting from and to other formats when processing. One idea that Luca gave me was to make use of libav itself to handle that part: it already supports PNM, PNG, JPEG and TIFF, so it would provide most of the features needed.

In the meantime, please let me know what you think of how this is going — and remember that this blog, this post, and I are all Flattr enabled!