Tiny Tiny RSS: don’t support Nazi sympathisers

XKCD #1357 — Free Speech

After complaining about the lack of cache hits from feed readers, figuring out why NewsBlur (which was doing the right thing) was affected, and then fixing the problem, I started looking at which other readers remained unfixed. It turned out that about a dozen people used to read my blog using Tiny Tiny RSS, a PHP-based personal feed reader for the web. I say “used to” because, as of 2017-08-17, TT-RSS is banned from accessing anything from my blog via a ModSecurity rule.

The reason why I went to this extent is not merely technical, which is why this post is titled the way it is. But it all started with me filing requests to support modern HTTP features for feeds, particularly regarding the semantics of permanent redirects, but also about the lack of If-Modified-Since, which allows a significant reduction in the bandwidth usage of a blog[1]. Now, the first response I got about the permanent redirect request was disappointing, but it was a technical answer, so I provided more information. After that?

After that, the responses stopped being focused on the technical issues, and rather appeared to be (and that’s not terribly surprising in FLOSS, of course) “not my problem”. Except, the answers also came from someone with a Pepe the Frog avatar.[2] And this is August of 2017, when America has shown it has a real Nazi problem, and willingly associating yourself with the alt-right effectively makes you a Nazi sympathiser. The tone of the further answers also shows that it is no mistake or misunderstanding.

You can read the two bugs here: and . Trigger warning: extreme right and ableist views ahead.

While I try not to spend too much time on political activism on my blog, there is a difference between debating whether universal basic income (or even universal health care) is a right or not, and arguing for ethnic cleansing and the death of part of a population. So no, there is no way I’ll refrain from commenting on, or throwing a light on, this kind of toxic behaviour from developers in the Free Software community. Particularly when they are not even keeping these beliefs to themselves, but effectively flaunting them by using a loaded avatar on their official support forum.

So what can you do about this? If you get to read this post, and have subscribed to my blog through TT-RSS, you now know why you don’t get any updates from it. I would suggest you look for a new feed reader. I will, as usual, suggest NewsBlur, since its implementation is the best one out there. You can set it up by yourself, since it’s open source. Not only will you be cutting your support to Nazi sympathisers, but you will also save bandwidth for the web as a whole, by using a reader that actually implements the protocol correctly.
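For those wondering what “implementing the protocol correctly” amounts to, here is a minimal sketch in Python of the bookkeeping a feed reader needs for conditional requests and permanent redirects. It is not how NewsBlur or any other reader does it; the names FeedState, request_headers and apply_response are made up for illustration, and the response headers are assumed to arrive as a plain mapping with canonical capitalisation.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urljoin


@dataclass
class FeedState:
    """What a reader remembers about one subscription between polls."""
    url: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def request_headers(state: FeedState) -> Dict[str, str]:
    """Build the validators to send along with the next poll."""
    headers: Dict[str, str] = {}
    if state.etag:
        headers["If-None-Match"] = state.etag
    if state.last_modified:
        headers["If-Modified-Since"] = state.last_modified
    return headers


def apply_response(
    state: FeedState, status: int, headers: Mapping[str, str]
) -> Tuple[FeedState, bool]:
    """Return the updated state and whether there is a new body to parse.

    Assumption: the HTTP client is configured NOT to follow redirects
    silently, otherwise the 301 never reaches this function.
    """
    if status in (301, 308):
        # Permanent redirect: store the new URL for all future polls.
        # The old validators belonged to the old URL, so drop them.
        new_url = urljoin(state.url, headers["Location"])
        return FeedState(url=new_url), False
    if status == 304:
        # Not modified: nothing was transferred, nothing to parse.
        return state, False
    if status == 200:
        # Fresh content: remember the validators for next time.
        fresh = FeedState(
            url=state.url,
            etag=headers.get("ETag"),
            last_modified=headers.get("Last-Modified"),
        )
        return fresh, True
    # Anything else (temporary redirects, errors): keep the state as-is.
    return state, False
```

The point of keeping the two functions separate is that the reader only ever pays for a full download when apply_response says there is something to parse; every other poll costs a handful of headers.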

Update (2017-08-06): as pointed out in the comments by candrewswpi, FreshRSS is another option if you don’t want to set up NewsBlur (which admittedly may be a bit heavy). It uses PHP, so it should be easier to migrate to, given the same or a similar stack. It supports at least proper caching, but I’m not sure about the permanent redirects; that needs testing.

You could of course, as the developers said on those bugs, change the User-Agent string that TT-RSS reports, and keep using it to read my blog. But in that case, you’d be supporting Nazi sympathisers. If you don’t mind doing that, may I ask you a favour: stop reading my blog altogether. And maybe reconsider your life choices.

I’ll repeat here that the reason why I’m going to this extent is that there is a huge difference between the political opinions and debates that we can all have, and supporting Nazis. You don’t have to agree with my political point of view to read my blog, and you don’t have to agree with me to talk with me or to be my friend. But if you are a Nazi sympathiser, you can get lost.

  1. you could try to argue that in this day and age there is no point in worrying about bandwidth, but then you don’t get to ever complain about the existence of CDNs, or the fact that AMP and similar tools are “undemocratizing” the web.
  2. Update (2017-08-03): as many people have asked: no, it’s not just any frog or any Pepe that automatically makes you a Nazi sympathiser. But the avatar was not one of the original illustrations, and the attitude of the commenter made it very clear what their “alignment” was. I mean, if they were fans of the original character, they would probably have the funeral scene as their avatar instead.

Fake “Free” Software hurts us all

Brian Krebs, the famous information security reporter, posted today (well, at the time of writing) an article on security issues with the gSOAP library. I found this particularly interesting because I remembered seeing the name before. Indeed, Krebs links to the project’s SourceForge page, which is something to see. It has a multi-screen long list of “features”, including a significant number of variants and options for the protocol, which ends in the following:

Licenses: GPLv2, gSOAP public license (for engine and plugins), commercial non-GPL license available upon request (software is 100% in-house developed, no third-party GPL contributions included)

Ah, there we go. “Free” Software.

You may remember my post from just a few days ago, arguing that for Free Software to actually make a difference in society the way Stallman prophesies, it needs a mediating agency, and at least right now that agency is companies and the free market. I argued that making your software usable by companies that provide services or devices is good, as it makes for healthy, usable and used projects, increases competition and reduces the cost of developing new solutions. So is gSOAP the counterexample? I really don’t think so. For me, gSOAP is the perfect example to match my previous rant about startups and fake communities.

The project at first looks like the poster child for FSF-compatible software, since it’s licensed under GPL-2, and it clearly has no CLA (Contributor License Agreement), though the company provides a “way out” from GPL-2 obligations by buying a commercial license. This is not, generally speaking, a bad thing. I have seen many examples, including in the open multimedia cliques, of using this trick to foster the development of open solutions while making money to pay salaries or build new Free Software companies.

But generally, those projects allow third party contributions with a CLA or similar agreement that allows the “main” copyright holders to still issue proprietary licenses, or enforce their own rights. You can for instance see what Google’s open source projects do about it. Among other things, this contribution method also disallows re-licensing the software, as that requires agreement from all copyright holders. In the case of gSOAP, that is not the case: as they say their software is «100% in-house developed».

They are very well proud of this situation, because it gives them all the power: if you want to use gSOAP without paying, you’re tied to the GPL, which may or may not become a compliance problem. And if you happen to violate the license, they have all the copyright to sue you or just ask you to settle. It’s a perfect situation for copyright trolls.

But, because of this, even though the software is on paper “Free Software” according to the FSF, it’s a piece of proprietary software. Sure, you can fork the library and build your own GPL-2 version instead, as you have the freedom to fork, but that does not make it a community, or a real Free Software project. And it also means you can’t contribute patches to it to make it more secure, safer, or better for society. You could report bugs, including security bugs, but what’s the likelihood that you would actually care to do so, given that one of the first things they make clear on their “project” page is that they are not interested in your contributions? And we can clearly see that this particular library could have used some care from the community, given its widespread adoption.

What this means to me is that gSOAP is a clear example that just releasing something under GPL-2 is not enough to make it Free Software, and that even “Free” Software released under GPL-2 can be detrimental to society. It also touches on the other topic I brought up recently: you need to strike a balance between making code usable to companies (because they will use it, and thus very likely help you extend or support your project) and keeping it as a real community or a real project. Clearly in this case the balance was totally off. If gSOAP were available with a more liberal license, even LGPL-2, they would probably lose a lot in license fees, as for most companies just using this as a shared library would be enough. But it would then allow developers, both hobbyists and those working for companies, to contribute fixes so that they trickle down to everybody’s devices.

Since I do not know what the proprietary license from the company behind gSOAP requires its customers to agree to, I cannot say whether there is any incentive for said companies to provide fixes back to the project, but I assume that if they were to do so, they clearly wouldn’t be contributing them under GPL-2. What I can say is that for the companies I worked for in the past, actually receiving the library under GPL-2 and being able to contribute the fixes back would have been a better deal. The main reason is that as much as a library like this can be critical to connected devices, it does not by itself contain any of the business logic. And there are ways around linking GPL-2 code into the business logic application, which usually involve some kind of RPC between the logic and the frontend. And being able to contribute the changes back to the open source project would allow them to not have to maintain a long set of patches to sync release after release; I have had the unfortunate experience of having to deal with something in that space before.

My advice is, once again, to try figuring out what you want to achieve by releasing a piece of software. Are you trying to make a statement, sticking it to the man, by forcing companies not to use your software? Are you trying to make money by forcing companies interested in using your work to buy a special license from you? Or are you contributing to Free Software because you want to improve society? In the latter case, I would suggest you consider a liberal license, to make sure that your code (that can be better than proprietary, closed-source software!) is actually available for those mediating agencies that transform geeky code into usable gadgets and services.

I know, it would be oh so nicer, if by just releasing something awesome under GPL-2 you’d force every company to release all their firmware as Free Software as well, but that’s not going to happen. Instead, if they feel they have to, they will look for worse alternatives, or build their own (even worse) alternatives, and keep them to themselves, and we’ll all be the poorer for it. So if in doubt, consider MIT or Apache 2 licenses. The latter in particular appears to be gaining more and more traction as both Google and Microsoft appear to be fond of the license, and Facebook’s alternative is tricky.

Some of you may consider me a bit of a hypocrite, since I have released a number of projects under more restrictive Free licenses (including AGPL-3!) before I came to the conclusion that’s actually bad for society. Or at least before I came back to that idea, as I was convinced of that back in 2004, when I wrote (in Italian) about why I think MySQL is bad for Free Software (I should probably translate that post, just for reference). But what I decided is that I’ll now try to do my best to re-license the projects for which I own the full copyright under the Apache 2 license. This may take a little while until I figure out all the right steps, but I feel it is the right thing to do.

Free Idea: Free Software stack for audiobooks

This post is part of a series of free ideas that I’m posting on my blog in the hope that someone with more time can implement. It’s effectively a very sketched proposal that comes with no design attached, but if you have time you would like to spend learning something new, but no idea what to do, it may be a good fit for you.

This is clearly not a new idea, as I posted about something very similar over eight years ago. At the time I was looking for a way of encoding audiobooks coming from audio CDs in a format that was compatible with the iPod Classic. Since then, Apple appears to have done their best to make the audiobook experience on iOS the worst possible, to the point that I don’t really use my iPod Touch as my primary audiobook player any more.

As an aside to the free idea, which can probably give a bit more context for you all, let me describe the problems I have with the current approach to audiobooks by Apple. A few iOS major versions ago, they decided to move the audiobooks handling from the Music app to the iBooks app; this would be reasonable, given that they are books, and it was always a bit strange to have them in a separate application, but it also meant you lost the ability to build playlists with them.

Playlists with audiobooks are great, because they allow you to “stitch” multiple books of the same series, so that you can play them for hours on end, for instance if you need them to sleep. I used to have a playlist for the Hitchhikers’ Guide to the Galaxy radio series and one for the books, one for Dresden Files, and one for the News Quiz, including both the collected editions in CD by BBC, my own “audiobooks” built out of the podcasts, and the more recent podcast episodes that I have not collected into audiobook files yet.

So what is the idea? There are two components that, as far as I can see, are currently heavily lacking in the FLOSS world. The first is a way to generate audiobook files, which is what I complained about eight years ago. Indeed, if you look even at a random sample on Project Gutenberg, the audiobook is actually a ton of files (47!), each with a chapter in it. A proper audiobook file would be a single file, with chapter markers, and per-chapter metadata (chapter title, and in that case, the performer).
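To make the “single file with chapter markers” part a bit more concrete, here is a rough Python sketch that turns a list of chapters into the plain-text metadata format that FFmpeg documents for describing chapters (the one starting with ;FFMETADATA1). The function names and the sample data are invented for the example, and whether a per-chapter artist tag survives into the final container, or is shown by any player, is an assumption I have not verified.

```python
from typing import Iterable, Tuple


def escape(value: str) -> str:
    """Backslash-escape the characters the FFmpeg metadata format reserves."""
    out = []
    for ch in value:
        if ch in "=;#\\\n":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def chapters_metadata(
    book_title: str, chapters: Iterable[Tuple[str, str, int]]
) -> str:
    """Render chapter markers for one book.

    Each chapter is (title, performer, duration in milliseconds); the
    start of a chapter is simply the end of the previous one.
    """
    lines = [";FFMETADATA1", f"title={escape(book_title)}"]
    start = 0
    for title, performer, duration_ms in chapters:
        end = start + duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape(title)}",
            # Assumption: per-chapter performer stored as an "artist" tag.
            f"artist={escape(performer)}",
        ]
        start = end
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(chapters_metadata(
        "Sample Book",
        [("Chapter 1", "Reader A", 600_000), ("Chapter 2", "Reader B", 540_000)],
    ))
```

The missing half, of course, is concatenating the per-chapter audio into one stream and muxing this metadata into it, which is exactly the tooling I’d like to see somebody build properly.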

It’s more than just a matter of having a single file to move around. While of course the hardware improvements made a number of these points moot, the original reason to have a single big file over multiple small files was to avoid having to seek to a different point in the disk in-between chapters. It also allows the decoder to keep going, between chapters, as there is no “end of stream” but rather just a marker that at a given point in time some different metadata applies. Again, as I said this is no longer as relevant as it used to be, but it’s also not entirely gone.

The other component that is currently lacking, is a good playback solution. While VLC can obviously play those files right now, and if I’m not mistaken it also extracts the per-chapter metadata correctly, it lacks two features that make enjoying audiobooks possible. The first is possibly complicated, and relates to the ability to store bookmarks and current-playing time. While supposedly VLC supports the feature for resuming from last playback, I have heard it’s still sometimes unreliable (I have no idea how it’s implemented), plus it does not support just bookmarking a given time in a file/book. Bookmarking is particularly important when listening to non-novel audiobooks, as you may want to go back to it afterwards, to re-listen to advice or take a reference to further details.

The other feature is basically UI-heavy, and mostly involves the mobile UI (at least the Android one): the ability to scan backward and forward in the file. You have probably seen this in other players, including Netflix’s own app, that allow you to scan back 30 seconds. In audiobooks it’s also useful to scan forward 30 seconds, particularly when considering the bookmarks above.
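Neither feature needs much of a data model. As a sketch only (this has nothing to do with how VLC stores its resume position, which I have not looked at), the per-book state could be as small as the following Python, with BookState and the JSON layout being names and choices I made up on the spot:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class BookState:
    """Playback state for one audiobook file, all times in seconds."""
    duration_s: float
    position_s: float = 0.0
    bookmarks: List[float] = field(default_factory=list)

    def skip(self, delta_s: float) -> float:
        """Scan forward (positive) or backward (negative), staying in range."""
        self.position_s = min(max(self.position_s + delta_s, 0.0), self.duration_s)
        return self.position_s

    def add_bookmark(self) -> None:
        """Remember the current position, keeping the list sorted and unique."""
        if self.position_s not in self.bookmarks:
            self.bookmarks.append(self.position_s)
            self.bookmarks.sort()


def dump_library(library: Dict[str, BookState]) -> str:
    """Serialise the state of every book, keyed by a file identifier."""
    return json.dumps(
        {key: asdict(state) for key, state in library.items()},
        indent=2,
        sort_keys=True,
    )


def load_library(text: str) -> Dict[str, BookState]:
    """Inverse of dump_library."""
    return {key: BookState(**raw) for key, raw in json.loads(text).items()}
```

The 30-second buttons are then just skip(30) and skip(-30), and resuming is a matter of loading the library and seeking to position_s. The hard part is the UI and keeping the stored state reliable, not the arithmetic.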

As usual for Free Ideas I have no time to work on this myself. I can give the idea details out, and depending on things I may be able to contribute to a bounty on it, but otherwise, no code I can share about this yet.

The overengineering of ALSA userland

This is a bit of an interesting corner case of a rant. I have not written this when I came up with it, because I came up with it many years ago when I actively worked on multimedia software, but I have only given it in person to a few people before, because at the time it would have gained too much unwanted attention by random people, the same kind of people who might have threatened me for removing XMMS out of Gentoo so many years ago. I have, though, spoken about this with at least one of the people working on PulseAudio at the time, and I have repeated this at the office a few times, while the topic came up.

For context you may want to read this rant from almost ten years ago by Mike Melanson, who was at the time working for Adobe on Flash Player for Linux. It’s a bit unfortunate that the drawings from the post are missing (but maybe Mike has a copy?) but the whole gist is that the Linux audio APIs were already bloody confusing at the time, and this was before PulseAudio came along to stay. So where are we right now?

Well, the good news is that for the most part things got simpler: aRTs and ESounD are now completely gone, eradicated in favour of PulseAudio, which is essentially the only consumer sound daemon currently in use. Jack2 is still the standard for the pro-audio crowd, but even those people seem to have accepted that multimedia players are unlikely to care for it, and that it should be limited to pro-audio software. On the kernel driver side, the actually fairly important out-of-kernel drivers are effectively gone, in favour of development happening as a separate branch of the Linux kernel itself (Git was not a thing at the time, oh how things have changed!) and OSS is effectively gone. I don’t even know if it’s still available in the kernel, but the OSS4 fanboys have been quiet for long enough that I assume they gave up too.

ALSA itself hasn’t really changed much in all this time, either in the kernel or as userland. In the kernel, it got more complex for supporting things like jack sense, as HDA started supporting soft-switching between speaker and headphones output. In the userland, the plugins interface that was barely known before is now a requirement to properly use PulseAudio, both in Gentoo and in most other distributions. Which effectively makes my rant not only still relevant, but possibly more relevant. But before I go into details, I should take a step back and explain what the whole thing with userland and drivers is, with ALSA. I’ll try to simplify the history and the details, so if you know this very well you may notice I may skip some details, but nobody really cares that much about those.

The ALSA project was born back when Linux was in version 2.4 — and unlike today, that version was the version for a long time. Indeed up until version 3.0, a “minor” version would just be around forever; the migration from 2.4 to 2.6 was a massive amount of work and took distributions, developers and users alike a lot of coordination. In Linux 2.4, the audio drivers were based off the OSS interface, which essentially meant you had /dev/dspX and /dev/mixerX, and you were done — most of the time mixer0 matched a number of dspX devices, and most devices would have input and output capabilities, but that’s about all you knew. Access to the device was almost always exclusive to one process, except if the soundcard had multiple hardware mixer channels, in which case you could open the device multiple times. If you needed processes to share the device, your only option was to use a daemon such as the already named aRTs or ESounD. The ALSA project aimed to replace the OSS interface (that by then became a piece of proprietary software in its newer versions) with a new, improved interface in the following “minor” version (2.5, which stabilized as 2.6), as well as on the old one through additional kernel modules — the major drawback from my point of view, is that this new interface became Linux-specific, while OSS has been (and is) supported by most of the BSDs as well. But, sometimes you have to do this anyway.
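To give an idea of how little there was to the OSS model from an application’s point of view, here is a small Python sketch: generate some samples, open the device file, write. I’m assuming the historical default format of /dev/dsp (8-bit unsigned, mono, 8000 Hz), which a real application would have set explicitly through ioctl() calls; the helper names are mine, and on a modern system the device most likely does not exist at all.

```python
import math
from typing import BinaryIO

# Assumed default /dev/dsp format: 8-bit unsigned, mono, 8000 Hz.
SAMPLE_RATE = 8000


def tone(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Generate a sine wave as unsigned 8-bit PCM, centred around 128."""
    count = int(rate * seconds)
    samples = []
    for n in range(count):
        value = 127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * n / rate)
        samples.append(min(255, max(0, int(round(value)))))
    return bytes(samples)


def play(device: BinaryIO, pcm: bytes) -> int:
    """'Playing' is nothing more than writing bytes to the open device."""
    return device.write(pcm)


def play_on_oss(path: str = "/dev/dsp") -> None:
    """Hypothetical usage; fails if another process holds the device."""
    with open(path, "wb") as dsp:
        play(dsp, tone(440.0, 1.0))
```

Note that the exclusive-access problem is visible right in that open() call: there is nobody in between to arbitrate, so whoever opens the device first wins.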

The ALSA approach provides a much more complex device API, but mostly for good reason, because sound cards are (or were) a complex interface, and are not consistent among themselves at all. To make things simpler for application developers who previously only had to use open() and similar functions, ALSA provided a userland library, shipped in a package called alsa-lib, but more often known by its filename: libasound. While the interface of the library is not simple either, it does provide a bit of wrapping around the otherwise very low-level APIs. It also abstracts away some of the problems of figuring out which cards are present and which mixer refers to which. The project also provided a number of tools and utilities to configure the devices, query for information or play back raw sound, and even a wrapper for applications implementing OSS access only, in the form of a preloadable library catching accesses to /dev/dsp to convert them to ALSA API calls, not unlike the similar utilities provided by arts, esd or PulseAudio.

In the original ALSA model, access to the device was still limited to one process per channel, but as soundcards with more than one hardware channel quickly became obsolete (particularly as soundcards kind-of standardized over AC’97, then HDA) the need for sharing access arose again. Since both arts and esd had their limits (and PulseAudio was far from ready), the dmix interface arrived: in this setup, the first process opening the device would actually have access, and would also set up a shared memory area for other processes to provide their audio, which would then be mixed together in userland, specifically in the process space of the first process opening the device. This had all sorts of problems, particularly when sharing across users, or when sharing with processes that only used sound for a limited amount of time.
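The mixing itself is the easy part. As an illustration of the arithmetic only (the real dmix works on a shared memory ring buffer and is quite a bit more involved, so take this as my simplification rather than a description of its code), summing signed 16-bit samples from several clients in Python looks like this:

```python
import array
from typing import Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: Sequence[array.array]) -> array.array:
    """Sum signed 16-bit streams sample by sample, saturating on overflow.

    Streams may have different lengths; a stream that has run out simply
    contributes silence from that point on.
    """
    length = max((len(stream) for stream in streams), default=0)
    mixed = array.array("h", [0] * length)
    for i in range(length):
        total = sum(stream[i] for stream in streams if i < len(stream))
        # Clamp instead of wrapping around, to avoid loud glitches.
        mixed[i] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed
```

What made dmix painful was never this loop, but rather deciding which process runs it and what happens to everybody else when that process goes away.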

What dmix actually used was the ability of ALSA to provide “virtual” devices, which can be configured for alsa-lib to see. Another feature that got more of the spotlight thanks to the shrinking featureset of soundcards, particularly with the HDA standard, is the ability to provide plugins to extend the functionality of alsa-lib. For a while the most important one was clearly the libsamplerate-based resampling plugin, which almost ten years ago was the only way to get non-crackling sound out of an HDA soundcard. These plugins included other features, such as a plugin providing a virtual device for encoding to Dolby AC3, so that you could use S/PDIF pass-through to a surround decoder. Nowadays, the really important plugin is the PulseAudio one, which allows any ALSA-compatible application to talk to PulseAudio, by configuring a default virtual device.

Okay, now that the history lesson is complete, let me try to write down what I think is a problem with our current, modern setup. I’ll exclude pro-audio workstations from my discussion, as these have clearly different requirements from “mainstream” and their users most likely would still argue (from a different point) that the current setup is overengineered. I’ll also exclude most embedded devices, including Android, since I don’t think PA ever won over the phone manufacturers outside of Nokia, although I would expect that a lot of them actually do rely on PulseAudio a bit, and so the discussion would apply.

In a current Linux desktop, your multimedia applications end up falling into two main categories: those that implement PulseAudio support and those that implement ALSA support. They may use some wrapper library such as SDL, but at the end of the day, these are the two APIs that allow you to output sound on modern Linux. A few rare cases of (proprietary, probably) apps implementing OSS can be ignored, as they would then use either aoss or padsp to preload the right library to provide support for whichever stack you prefer. Whichever distribution you’re using, both of these classes of apps are extremely likely to be coming out of your speakers through PulseAudio. If the app only supports ALSA, the distribution is likely providing a configuration file so that the default ALSA device is a virtual device pointing at the PulseAudio plugin.

When the app talks to PulseAudio directly, it’ll use its API through the client library, that then IPCs through its custom protocol to the PulseAudio Daemon, which will then use alsa-lib through its API, ignoring all the virtual devices configured, which in turn will talk with the kernel drivers through its device files. It’s a bit different for Bluetooth devices, but you get the gist. This at first sight should sound just fine.

If you look at an app that only supports ALSA interfaces, it’ll use the alsa-lib API to talk to the default device, which uses the PulseAudio client library to IPC to the PulseAudio daemon, and so on as above. In this case you have alsa-lib on both sides: the source application and the sink daemon. So what am I complaining about? Well, here is the thing: the parts of ALSA that the media application uses and the parts of ALSA that the PulseAudio daemon uses are almost entirely distinct. One only provides access to the virtual devices configured, and the other only gives access to the raw hardware. The fact that they share the API barely matters, in my opinion.

From my point of view, a better solution would be for libasound to be provided by PulseAudio directly, implementing a subset of the ALSA API that either shows the devices as the sinks configured in PulseAudio or, if PA wants to maintain the stream/sink abstraction itself, just a single device that is PulseAudio. No configuration files, no virtual devices, no plugins whatsoever, but if the application supports ALSA, it gets automatically promoted to PulseAudio. Then on the daemon side, PulseAudio can either fork alsa-lib, or have alsa-lib provide a simpler library that only provides access to the hardware devices, and removes support for configuration files and plugins (after all, PulseAudio already has its own module system). Last I heard, there actually is an embedded version of libasound that implements only the minimal amount of features needed to access a sound device through ALSA. This should not only reduce the amount of “code at play” (pardon the pun), but also reduce the chance that you can misconfigure ALSA to do the wrong thing.

Misconfiguring ALSA is probably the most common reason for your sound not working the way you expect on Linux. The configuration files and options, defaults and so on kept changing, and things are so different from ten years ago that you’re likely to find very bad, old advice out there. And it’s not always clear that you should not follow it. For instance, for the longest time Adobe Flash, thinking it was doing the right thing, would not actually abide by the default ALSA configuration, and would rather try to access the hardware device itself (mostly because of nasty bugs with dmix), which meant that PulseAudio wouldn’t be able to access it anymore. The quickly sketched architecture above would solve that problem, as the application would not actually be able to tell the difference between the hardware device and the PulseAudio virtual device: the former would just not be there!

And just to close up my ALSA rant, I would like to remind you all that alsa-lib still comes with its own LISP interpreter: the ALISP dialect was meant to provide even more configurability to the sound access interface, and most distributions, as far as I know, still have it enabled. Gentoo provides a (default-off) alisp USE flag, so you’re at least spared that part in most cases.

On the conference circuit

You may remember that I used not to be a fan of travel, and that for a while I was absolutely scared by the idea of flying. This has clearly not been the case in a while, given that I’ve been working for US companies and traveling a lot of the time.

One of the side effects of this is that I enjoy the “conference circuit”, to the point that I’m currently visiting three to four conferences a year, some of which for VideoLAN and others for work, and in a few cases for nothing in particular. This is an interesting way to keep in touch with what’s going on in the community and in the corporate world out there.

Sometimes, though, I wish I had more energy and skills to push through my ideas. I find it curious how nowadays it’s all about Docker and containers, while I jumped on the LXC bandwagon quite some time ago thanks to Tiziano, and because of that need I made Gentoo a very container-friendly distribution from early on. Similarly, O’Reilly now has a booklet on static site generators which describes things not too far from what I’ve been doing since at least 2006 for my website, and for xine’s later on. Maybe if I hadn’t been so afraid of traveling at the time I would have had more impact on this, but I guess (to use a flying metaphor) I lost my slot there.

To focus a bit more on SCaLE14x in particular, and especially on Cory Doctorow’s opening keynote, I have to say that the conference is again a good load of fun. Admittedly I rarely manage to go and listen to talks, but the number of people going in and out of the expo floor, and the random conversations struck up there, are always useful.

In the case of Doctorow’s keynote, while he’s (as many) a bit too convinced, in my opinion, that he has most if not all the answers, his final argument was a positive one: don’t try to be “pure” (as FSF would like you to be), instead hedge your bets by contributing (time, energy, money) to organizations and projects that work towards increasing your freedom. I’ve been pleasantly surprised to hear Cory name, earlier in that talk, VLC and Handbrake, although part of the context in which he namechecked us is likely going to be a topic for a different post, once I have something figured out.

My current trip brings me to San Francisco tonight, for Enigma 2016, and on this note I would like to remind conferencegoers that, while most of us are aiming for a friendly and relaxed atmosphere, there is some opsec you should be looking into. I don’t have a designated conference laptop (just yet, I might get myself a Chromebook for it) but I do have at least a privacy screen. I’ve seen more than a couple of corp email interfaces running on laptops while walking the expo floor this time.

Finally, I need to thank TweetDeck for their webapp. The ability to monitor hashtags, and particularly multiple hashtags from the same view is gorgeous when you’re doing back-to-back conferences (#scale14x, #enigma2016, #fosdem.) I know at least one of them is reading, so, thanks!

Report from SCaLE13x

This year I have not been able to visit FOSDEM. Funnily enough this confirms the trend of me visiting FOSDEM only on even-numbered years, as I previously skipped 2013 as I was just out for my first and only job interview, and 2011 because of contract related timing. Since I still care for going to an open source conference early in the year, I opted instead for SCaLE, the timing of which fit perfectly my trip to Mountain View. It also allowed me to walk through Hermosa Beach once again.

So Los Angeles again it was, which meant I was able to meet with a few Gentoo developers, a few VideoLAN developers who also came all the way from Europe, and many friends who I have met at various previous conferences. It is funny how I end up meeting some people more often through conferences than I meet my close friends from back in Italy. I guess this is the life of the frequent travelers.

While my presence at SCaLE was mostly a way to meet some of the Gentoo devs that I had not met before, and see Hugo and Ludovic from VideoLAN who I missed at the past two meetings, I did pay some attention to the talks — I wish I could have had enough energy to go to more of them, but I was coming from three weeks straight of training, during which I sat for at least two hours a day in a room listening to talks on various technologies and projects… doing that in the free time too sounded like a bad idea.

What I found intriguing in the program, and in at least one of the talks I was able to attend, was that I could find at least a few topics that I wrote about in the past. Not only are containers now all the rage, through Docker and other plumbing, but there was also a talk about static site generators, which I wrote about in 2009 and have been using for much longer than that, out of necessity.

All in all, it was a fun conference and meeting my usual conference friends and colleagues is a great thing. And meeting the other Gentoo devs is what sparked my designs around TG4 which is good.

I would like to also thank James for suggesting that I use TweetDeck during conferences, as it was definitely nicer to be able to keep track of what happened on the hashtag as well as the direct interactions and my personal stream. If you’re the occasional conferencegoer you probably want to look into it yourself. It also is the most decent way to look at Twitter during a conference on a tablet, as it does not require you to jump around between search pages and interactions (on a PC you can at least keep multiple tabs open easily.)

When (multimedia) fiefdoms crumble

Mike coined the term multimedia fiefdoms recently. He points to a number of different streaming, purchase and rental services for video content (movies, TV series) as the new battleground for users (consumers in this case). There are of course a few more sides in this battle, including music and books, but the idea is still perfectly valid.

What he didn’t get into the details of is what happens when one of those fiefdoms capitulates, declaring itself won over, and goes away. It’s not a fun situation to be in, but we actually have plenty of examples of it, and these, more than anything else, should drive the discourse around and against DRM, in my opinion.

For some reason, the main examples of failed fiefdoms are to be found in books, and I lived through (and recounted) a few of those instances. For me personally, it all started four years ago, when I discovered Sony gave up on their LRF format and decided to adopt the “industry standard” ePub by supporting the Adobe Digital Editions (ADEPT) DRM scheme on their devices. I was slow on the uptake: the announcement had come two years earlier. For Sony, this meant tearing down their walled garden, even though they kept supporting the LRF format and their store for a while – they may even do still, I stopped following two years ago when I moved onto a Kindle. For the user, it meant they were now free to buy books from a number of stores, including some publishers, bookstores with online presence and dedicated ebookstores.

But things didn’t always go smoothly: two years later, WHSmith partnered with Kobo, and essentially handed the latter all their online ebook market. When I read the announcement I was actually happy, especially since I could not buy books off WHSmith any more as they started asking for UK billing addresses. Unfortunately it also meant that only a third of the books that I bought from WHSmith were going to be ported over to Kobo, due to an extreme cock-up with global rights, even for digital books. If I had not broken the DRM off all my ebooks for the sake of it, I would have lost four books and had to buy them again. Given this was not a case of the seller going bankrupt but of a sell-out of their customers, it was not understandable that they refused to compensate people. Luckily, it did port The Gone-Away World which is one of my favourite books.

Fast forward another year, and the Italian bookstore LaFeltrinelli decided to go the same way, with a major exception: they decided they would keep users on both platforms. That way if you want to buy a digital version of a book you’ll still buy it on the same website, but it’ll be provided by Kobo and in your Kobo library. And it seems like they at least have a better deal regarding books’ rights, as they seem to have ported over most books anyway. But of course it did not work out as well as it should have, throwing an error in my face and forcing me to call up Kobo (Italy) to have my accounts connected and the books ported.

The same year, I ended up buying a Samsung Galaxy Note 10.1 2014 Edition, which is a pretty good tablet and has a great digitizer. Samsung ships Google Play in full (Store, Movies, Music, Books) but at the same time installs its own App, Video, Music and Book store apps, which is not surprising. But it did not take six months for them to decide that it was not their greatest idea: in May this year, Samsung announced the turndown of their Music and Books stores, outside of South Korea at least. In this case there is no handover of the content to other providers, so any content bought on those platforms is just gone.

Not completely in vain; if you still have access to a Samsung device (and if you don’t, well, you had no access to the content anyway), a different kind of almost-compensation kicks in: the Korean company partnered with Amazon of all bookstores, which is surprising given that they are behind the new “Nook Tablet” by Barnes & Noble. Besides a branded «Kindle for Samsung» app, they provide one out of a choice of four books every month. The books are taken from Amazon’s KDP Select pool as far as I can tell, which is the same pool used as a base for the Kindle Owners’ Lending Library and the Kindle Unlimited offerings; they are not great but some of them are enjoyable enough. Amazon is also keeping honest and does not force you to read the books on your Samsung device; I indeed prefer reading from my Kindle.

Now the question is: how do you loop all this back to multimedia? Sure, books are entertaining but they are by definition a single medium, unless you refer to the Kindle Edition of American Gods. Well, for me it’s still the same problem of fiefdoms that Mike referred to; indeed every store used to be a walled garden for a long while, then Adobe came and conquered most with ePub and ADEPT. But then between Apple and their iBooks (which uses its own, incompatible DRM), and Amazon with the Kindle, the walls started crumbling down. Nowadays plenty of publishers allow you to buy the book, in ePub and usually many other formats at the same time, without DRM, because the publishers don’t care which device you want to read your book on (a Kindle, a Kobo, a Nook, an iPad, a Sony Reader, an Android tablet …), they only want you to read the book, and get hooked, and buy more books.

Somehow the same does not seem to work for video content, although it did work to an extent, for a while at least, with music. But this is a different topic.

The reason why I’m posting this right now is that just today I got an email from Samsung that they are turning down their video store too: now their “Samsung Hub” platform gets to only push you games and apps, unless you happen to live in South Korea. It’s interesting to see how the battles between giants are causing small players to just get off the playing field… but at the same time they take their toys with them.

Once again, there is no compensation; if you rented something, watch it by the end of the year, if you bought something, sorry, you won’t be able to access it after new year. It’s a tough world. There is a lesson, somewhere, to be learnt about this.

Small differences don’t matter (to unpaper)

After my challenge with the fused multiply-add instructions I managed to find some time to write a new test utility. It’s written ad hoc for unpaper but it can probably be used for other things too. It’s trivial and stupid but it got the job done.

What it does is simple: it loads both a golden and a result image file, compares their size and format, and then goes through all the bytes to count how many differences there are between them. If less than 0.1% of the image surface changed, it considers the test a pass.
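The core of such a comparison can be sketched in a few lines of C. This is only an illustration and not the actual unpaper test utility: the function names are made up for the example, and it assumes both images have already been decoded into byte buffers of the same size and pixel format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes that differ between two
 * equally sized, already decoded image buffers. */
static size_t count_differing_bytes(const unsigned char *golden,
                                    const unsigned char *result,
                                    size_t size) {
    assert(golden != NULL && result != NULL);
    size_t differing = 0;
    for (size_t i = 0; i < size; i++) {
        if (golden[i] != result[i])
            differing++;
    }
    return differing;
}

/* Hypothetical helper: pass when fewer than 0.1% of the bytes changed. */
static bool images_match(const unsigned char *golden,
                         const unsigned char *result,
                         size_t size) {
    const size_t differing = count_differing_bytes(golden, result, size);
    /* differing / size < 1 / 1000, kept in integer arithmetic;
     * fine for image-sized buffers, where the product cannot overflow. */
    return differing * 1000 < size;
}
```

With a 2000-byte buffer, one differing byte (0.05%) still passes, while two differing bytes (exactly 0.1%) fail, since the threshold is strict.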

It’s not a particularly nice system, especially as it requires me to bundle some 180MB of golden files (they compress to just about 10 MB so it’s not a big deal), but it’s a strict improvement compared to what I had before, which is good.

This change actually allowed me to explore one change that I abandoned before because it resulted in non-pixel-perfect results. In particular, unpaper now uses single-precision floating point all over, rather than doubles. This is because the slight imperfections caused by this change are not relevant enough to warrant the ever-so-slight loss in performance due to the bigger variables.

But even up to here, there is very little gain in performance. Sure, some calculations can be faster this way, but we’re still using the same set of AVX/FMA instructions. This is unfortunate: unless you start rewriting the algorithms used for searching for edges or rotations, there is no gain to be made by changing the size of the code. When I converted unpaper to use libavcodec, I decided to make the code as simple and stupid as I could, as that meant I could have a baseline to improve from, but I’m not sure what the best way to improve it is, now.

I still have a branch that uses OpenMP for the processing, but since most of the filters applied are dependent on each other it does not work very well. Per-row processing gets slightly better results but they are really minimal as well. I think the most interesting parallel processing low-hanging fruit would be to execute processing in parallel on the two pages after splitting them from a single sheet of paper. Unfortunately, the loops used to do that processing right now are so complicated that I’m not looking forward to touching them for a long while.

I tried some basic profile-guided optimization execution, just to figure out what needs to be improved, and compared with codiff a proper release and a PGO version trained after the tests. Unfortunately the results are a bit vague and it means I’ll probably have to profile it properly if I want to get data out of it. If you’re curious here is the output when using rbelf-size -D on the unpaper binary when built normally, with profile-guided optimisation, with link-time optimisation, and with both profile-guided and link-time optimisation:

% rbelf-size -D ../release/unpaper ../release-pgo/unpaper ../release-lto/unpaper ../release-lto-pgo/unpaper
    exec         data       rodata        relro          bss     overhead    allocated   filename
   34951         1396        22284            0        11072         3196        72899   ../release/unpaper
   +5648         +312         -192           +0         +160           -6        +5922   ../release-pgo/unpaper
    -272           +0        -1364           +0         +144          -55        -1547   ../release-lto/unpaper
   +7424         +448        -1596           +0         +304          -61        +6519   ../release-lto-pgo/unpaper

It’s unfortunate that GCC does not give you any diagnostic on what it’s trying to achieve when doing LTO; it would be interesting to see if you could steer the compiler to produce better code without it as well.

Anyway, enough with the micro-optimisations for now. If you want to make unpaper faster, feel free to send me pull requests for it, I’ll be glad to take a look at them!

The subtlety of modern CPUs, or the search for the phantom bug

Yesterday I released a new version of unpaper which is now in Portage, even though its dependencies are not exactly straightforward after making it use libav. But when I packaged it, I realized that the tests were failing, even though I had been sure to run the tests all the time while making changes, to make sure not to break the algorithms which (as you may remember) I have not designed or written. I don’t really have enough math to figure out what’s going on with them. I was able to simplify a few things but I needed Luca’s help for the most part.

It turned out that the problem only happened when building with -O2 -march=native, so I decided to restrict the tests and look into it again in the morning. Indeed, on Excelsior, using -march=native would cause it to fail, but on my laptop (where I have been running the test after every single commit), it would not fail. Why? Furthermore, Luca was also reporting test failures on his laptop with OSX and clang, but I had not tested there to begin with.

A quick inspection of one of the failing tests’ outputs with vbindiff showed that the diffs would be quite minimal, one bit off at some non-obvious interval. It smelled like a very minimal change. After I complained on G+, Måns pushed me in the right direction: some instruction set that differs between the two.

My laptop uses the core-avx-i arch, while the server uses bdver1. They have different levels of SSE4 support – AMD having their own SSE4a implementation – and different extensions. I should probably have paid more attention here and noticed how the Bulldozer has FMA4 instructions, but I did not; it’ll prove important later.

I decided to start disabling extensions in alphabetical order, mostly expecting the problem to be in AMD’s implementation of some instructions pending some microcode update. When I disabled AVX, the problem went away — AVX has essentially a new encoding of instructions, so enabling AVX causes all the instructions otherwise present in SSE to be re-encoded, and is a dependency for FMA4 instructions to be usable.

The problem was reducing the code enough to be able to figure out if the problem was a bug in the code, in the compiler, in the CPU or just in the assumptions. Given that unpaper is over five thousand lines of code and comments, I needed to reduce it a lot. Luckily, there are ways around it.

The first step is to look in which part of the code the problem appears. Luckily unpaper is designed with a bunch of functions that run one after the other. I started disabling filters and masks and I was able to limit the problem to the deskewing code — which is when most of the problems happened before.

But even the deskewing code is a lot — and it depends on at least some part of the general processing to be run, including loading the file and converting it to an AVFrame structure. I decided to try to reduce the code to a standalone unit calling into the full deskewing code. But when I copied over and looked at how much code was involved, between the skew detection and the actual rotation, it was still a lot. I decided to start looking with gdb to figure out which of the two halves was misbehaving.

The interface between the two halves is well-defined: the first returns the detected skew, and the latter takes the rotation to apply (the negative of the value the first returned) and the image to apply it to. It’s easy. A quick look through gdb on the call to rotate() in both a working and failing setup told me that the returned value from the first half matched perfectly. This is great because it meant that the surface to inspect was heavily reduced.

Since I did not want to have to test all the code to load the file from disk and decode it into a RAW representation, I looked into the gdb manual and found the dump commands that allow you to dump part of the process’s memory into a file. I dumped the AVFrame::data content, and decided to use that as an input. At first I decided to just compile it into the binary (you only need to use xxd -i to generate C code that declares the whole binary file as a byte array) but it turns out that GCC is not designed to efficiently compile a 17MB binary blob passed in as a byte array. I then opted for just opening the raw binary file and calling fread() to load it into the AVFrame object.
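Loading such a dump back is only a few lines of plain C. The following is a sketch under assumptions rather than the code used for the actual testcase: load_raw_dump() is a hypothetical helper, and it fills a caller-provided byte buffer instead of a real AVFrame, so that it does not depend on libavutil.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read exactly `size` bytes of a raw memory dump
 * from `path` into `buffer`. Returns false if the file cannot be
 * opened or is shorter than expected. */
static bool load_raw_dump(const char *path, unsigned char *buffer, size_t size) {
    assert(path != NULL && buffer != NULL);

    FILE *dump = fopen(path, "rb");
    if (dump == NULL)
        return false;

    /* One element per byte, so the return value is the byte count. */
    const size_t bytes_read = fread(buffer, 1, size, dump);
    fclose(dump);

    return bytes_read == size;
}
```

In the real testcase the destination would be the frame's pixel data rather than a standalone buffer, and the size would come from the frame's dimensions and pixel format.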

My original plan involved using creduce to find the minimal set of code needed to trigger the problem, but it was tricky, especially when trying to match a complete file output to the md5. I decided to proceed with the reduction manually, starting from all the conditional for pixel formats that were not exercised… and then I realized that I could split again the code in two operations. Indeed while the main interface is only rotate(), there were two logical parts of the code in use, one translating the coordinates before-and-after the rotation, and the interpolation code that would read the old pixels and write the new ones. This latter part also depended on all the code to set the pixel in place starting from its components.

By writing as output the calls to the interpolation function, I was able to restrict the issue to the coordinate translation code, rather than the interpolation one, which made it much better: the reduced test case went down to a handful of lines:

void rotate(const float radians, AVFrame *source, AVFrame *target) {
    const int w = source->width;
    const int h = source->height;

    // create 2D rotation matrix
    const float sinval = sinf(radians);
    const float cosval = cosf(radians);
    const float midX = w / 2.0f;
    const float midY = h / 2.0f;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const float srcX = midX + (x - midX) * cosval + (y - midY) * sinval;
            const float srcY = midY + (y - midY) * cosval - (x - midX) * sinval;
            externalCall(srcX, srcY);
        }
    }
}

Here externalCall is a simple function to extract the values: the only thing it does is print them on the standard error stream. In this version there is still a reference to the input and output AVFrame objects, but as you can notice they are not actually used, which means that the testcase is now self-contained and does not require any input or output file.

Much better but still too much code to go through. The inner loop over x was simple to remove, just hardwire it to zero and the compiler still was able to reproduce the problem, but if I hardwired y to zero, then the compiler would trigger constant propagation and just pre-calculate the right value, whether or not AVX was in use.

At this point I was able to execute creduce; I only needed to check for the first line of the output to match the “incorrect” version, and no input was requested (the radians value was fixed). Unfortunately it turns out that using creduce with loops is not a great idea, because it is well possible for it to reduce away the y++ statement or the y < h comparison for exit, and then you’re in trouble. Indeed it got stuck multiple times in infinite loops on my code.

But it did help a little bit to simplify the calculation. And with, again, a lot of help from Måns on making sure that the sinf()/cosf() functions would not return different values – they don’t, also they are actually collapsed by the compiler to a single call to sincosf(), so you don’t have to write ugly code to leverage it! – I brought the code down to:

extern void externCall(float);
extern float sinrotation();
extern float cosrotation();

static const float midX = 850.5f;
static const float midY = 1753.5f;

void main() {
    const float srcX = midX * cosrotation() - midY * sinrotation();
    externCall(srcX);
}

No external libraries, not even libm. The external functions are in a separate source file: sinrotation() and cosrotation() return fixed values for sine and cosine, and externCall() only calls printf() with the provided value. Oh, if you’re curious, the radians parameter became 0.6f, because 0, 1 and 0.5 would not trigger the behaviour, but 0.6, which is the truncated version of the actual parameter coming from the test file, would.

Checking the generated assembly code for the function then pointed out the problem, at least to Måns who actually knows Intel assembly. Here follows a diff of the code above, built with -march=bdver1 and with -march=bdver1 -mno-fma4, because it turns out the instruction causing the problem is not an AVX one but an FMA4 one; more on that after the diff.

        movq    -8(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmovss  -20(%rbp), %xmm2
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovss  -20(%rbp), %xmm1
+       vfmsubss        %xmm0, .LC0(%rip), %xmm1, %xmm0
        .cfi_def_cfa 7, 8
-       vsubss  %xmm0, %xmm1, %xmm0
        jmp     externCall@PLT

It’s interesting that it’s changing the order of the instructions as well as the constants. For this diff I have manually swapped .LC0 and .LC1 on one side of the diff, as they would just end up with different names due to instruction ordering.

As you can see, the FMA4 version has one instruction less: vfmsubss replaces both one of the vmulss and the one vsubss instruction. vfmsubss is an FMA4 instruction that performs a Fused Multiply and Subtract operation, and midX * cosrotation() - midY * sinrotation() indeed has a multiply and subtract!

Originally, since I was disabling the whole AVX instruction set, all the vmulss instructions would end up replaced by mulss which is the SSE version of the same instruction. But when I realized that the missing correspondence was vfmsubss and I googled for it, it was obvious that FMA4 was the culprit, not the whole AVX.

Great, but how does that explain the failure on Luca’s laptop? He’s not so crazy as to use an AMD laptop (nobody would be!). Well, it turns out that Intel also has its own Fused Multiply-Add instruction set, just with three operands rather than four, starting from Haswell CPUs, which include… Luca’s laptop. A quick check on my NUC, which also has a Haswell CPU, confirms that the problem exists also for the core-avx2 architecture, even though the code diff is slightly less obvious:

        movq    -24(%rbp), %rax
        xorq    %fs:40, %rax
        jne     .L6
-       vmulss  .LC1(%rip), %xmm0, %xmm0
-       vmovd   %ebx, %xmm2
-       vmulss  .LC0(%rip), %xmm2, %xmm1
+       vmulss  .LC1(%rip), %xmm0, %xmm0
+       vmovd   %ebx, %xmm1
+       vfmsub132ss     .LC0(%rip), %xmm0, %xmm1
        addq    $24, %rsp
+       vmovaps %xmm1, %xmm0
        popq    %rbx
-       vsubss  %xmm0, %xmm1, %xmm0
        popq    %rbp
        .cfi_def_cfa 7, 8

Once again I swapped .LC0 and .LC1 afterwards for consistency.

The main difference here is that the instruction for fused multiply-subtract is vfmsub132ss and a vmovaps is involved as well. If I read the documentation correctly this is because it stores the result in %xmm1 but needs to move it to %xmm0 to pass it to the external function. I’m not enough of an expert to tell whether gcc is doing extra work here.

So why is this instruction causing problems? Well, Måns knew and pointed out that the result is now more precise, thus I should not work around it. Wikipedia, as linked before, also points out why this happens:

A fused multiply–add is a floating-point multiply–add operation performed in one step, with a single rounding. That is, where an unfused multiply–add would compute the product b×c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire sum a+b×c to its full precision before rounding the final result down to N significant bits.
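The effect of that single rounding can be reproduced without any FMA hardware. The sketch below is an illustration, not code from unpaper: the function names are invented, and the fused variant is only emulated by doing the arithmetic in double precision, where the product of two floats is exact. That emulation agrees with a true fused operation except in rare double-rounding cases; C99's fmaf() from math.h is the standard function for the real thing. The shape a * b - c matches the reduced testcase, where c stands for the already rounded second product.

```c
#include <assert.h>

/* Unfused: the product is rounded to float before the subtraction.
 * The volatile store keeps the compiler from contracting the two
 * operations into a single fused instruction. */
static float multiply_subtract_unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product - c;
}

/* Emulated fused: the product of two floats always fits exactly in a
 * double, so the product itself is not rounded before subtracting. */
static float multiply_subtract_fused(float a, float b, float c) {
    return (float)((double)a * (double)b - (double)c);
}

/* Inputs chosen so that the exact product, 1 + 2^-11 + 2^-24, needs
 * 25 significant bits, one more than a float can hold. */
static void check_single_rounding(void) {
    const float a = 1.0f + 0x1p-12f;
    const float c = 1.0f + 0x1p-11f;

    /* The rounded product is exactly c, so the difference vanishes. */
    assert(multiply_subtract_unfused(a, a, c) == 0.0f);
    /* Without the intermediate rounding the low bit survives. */
    assert(multiply_subtract_fused(a, a, c) == 0x1p-24f);
}
```

Calling check_single_rounding() from a main() exercises both paths; the two results differ only in the last bit of the discarded product, which is the same kind of one-bit discrepancy seen in the test images.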

Unfortunately this does mean that we can’t have bitexactness of images for CPUs that implement fused operations. Which means my current test harness is not good, as it compares the MD5 of the output with the golden output from the original test. My probable next move is to use cmp to count how many bytes differ from the “golden” output (the version without optimisations in use), and if the number is low, like less than 1‰, accept it as valid. It’s probably not ideal and could lead to further variation in output, but it might be a good start.

Optimally, as I said a long time ago, I would like to use a tool like pdiff to tell whether there are actual changes in the pixels, and identify things like a 1-pixel translation in any direction, which would be harmless… but until I can figure something out, it’ll be an imperfect testsuite anyway.

A huge thanks to Måns for the immense help, without him I wouldn’t have figured it out so quickly.


This past weekend I had the honor of hosting the VideoLAN Dev Days 2014 in Dublin, in the headquarters of my employer. This is the first time I have organized a conference (or rather helped organize it, Audrey and our staff did most of the heavy lifting), and I made a number of mistakes, but I think I can learn from them and do better the next time I try something like this.

Photo credit: me

Organizing an event in Dublin has some interesting and not-obvious drawbacks, one of which is the need for a proper visa for people who reside in Europe but are not EEA citizens, thanks to the fact that Ireland is not part of Schengen. I was expecting at least UK residents not to need any scrutiny, but Derek proved me wrong as he had to get an (easy) visa at entrance.

Getting just shy of a hundred people together in a city like Dublin, which is by far not a metropolis like Paris or London, is an interesting exercise: yes, we had the space for the conference itself, but finding hotels and restaurants for that number of people became tricky. A very positive shout out is due to Yamamori Sushi that hosted the whole of us without a fixed menu and without a hitch.

As usual, meeting in person with the people you work with in open source is a perfect way to improve collaboration: knowing how people behave face to face makes it easier to understand their behaviour online, which is especially useful if the attitudes can be a bit grating online. And given that many people, including me, are known as proponents of Troll-Driven Development – or Rant-Driven Development given that people like Anon, redditors and 4channers have given an even worse connotation to Troll – it’s really a requirement, if you are really interested in being part of the community.

This time around, I was even able to stop myself from gathering too much swag! I decided not to pick up a hoodie, and leave it to people who would actually use it, although I did pick up a Gandi VLC shirt. I hope I’ll be able to do that at LISA as I’m bound there too, and last year I came back with way too many shirts and other swag.