Today I have been having some fun: while looking at the backlog on IRCCloud, I found out that it auto-linked Makefile.am, which I promptly decided to register with Gandi — unfortunately I couldn't get Makefile.in or configure.ac, as they are both already registered. After that I decided to set up Google Analytics to report how many referrers arrive at my websites through some of the many vanity domains I have registered over time.
After doing that, I spent some time staring at the web server logs to make sure everything was okay, and I found some more interesting things: it looks like a lot of people have been fetching my blog's Atom feed through very bad feed readers. This is exactly what I forecast last year when Google Reader was shut down.
Some of the fetchers are open source, so I ended up opening issues for them, but that is not the case for all of them. And even when they are open source, sometimes they don't accept pull requests implementing the feature, for whatever reason.
So this post is a bit of a name-and-shame, which can be positive for the open-source projects, since they can fix things, and negative for the closed-source services that are trying to replace Google Reader while failing to implement HTTP properly. It will also serve as a warning for readers who rely on those services: they'll stop being able to fetch my feed pretty soon, as I'll be updating my ModSecurity rules to block the worst offenders.
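For the curious, the rules themselves are nothing fancy; here's a minimal sketch of the kind of rule I mean (the matched User-Agent string and the rule ID are just placeholders for illustration, not taken from my actual configuration):

    # Deny a misbehaving feed fetcher, matched by its User-Agent string.
    SecRule REQUEST_HEADERS:User-Agent "@contains BadFeedReader" \
        "id:1000001,phase:1,deny,status:403,log,msg:'Misbehaving feed fetcher blocked'"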
As I noted above, both Stringer and Feedbin fail to request compressed responses (gzip compression), which means they fetch over 90KiB on every request instead of just 25KiB. The Stringer developers have already reacted and seem to be looking into fixing this very soon. From Feedbin I have no answer yet (though it has not been long), and it worries me for another reason too: it does not do any caching at all. And somebody set up a Feedbin instance at the Prague University that fetches my feed, without compression, without caching, every two minutes. I'm going to blacklist it soon.
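For comparison, this is roughly what a well-behaved fetcher looks like; a minimal sketch in Python using the requests library (the reader name and the calling convention are made up for illustration, not taken from any of the projects above):

    import requests

    def fetch_feed(url, etag=None, last_modified=None):
        # Identify ourselves and accept compressed responses; requests
        # transparently decompresses gzip bodies for us.
        headers = {
            "User-Agent": "example-reader/1.0 (+https://example.org)",
            "Accept-Encoding": "gzip",
        }
        # Conditional request: only download the body if the feed changed.
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 304:
            return None, etag, last_modified  # unchanged, nothing to re-parse
        resp.raise_for_status()
        return resp.text, resp.headers.get("ETag"), resp.headers.get("Last-Modified")

Cache the returned ETag and Last-Modified values and pass them back on the next fetch, and most requests collapse into a tiny 304 response instead of a full uncompressed download.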
Gwene still has not replied to the pull request I sent in October 2012, but on the bright side, it has not fetched my blog in a long time. Feedzirra (now Feedjira), which IFTTT uses, still does not enable compressed responses by default, even though it seems to support the option (Stringer appears to be based on it as well).
It’s not just plain feed readers that fail at implementing HTTP. The distributed social network Friendica – which aims at doing a better job than Diaspora – also seems to forget about implementing either compressed responses or caching. At least it seems to only fetch my feed every twelve hours. On the other hand, it also seems to pull someone's timeline from Twitter, so when it encounters a link to my blog it first sends a HEAD request, and then fetches the page. Three times. Also uncompressed.
On the side of non-open-source services, FeedWrangler has probably one of the worst implementations of HTTP I’ve ever seen: it does not support compressed responses (90KiB per fetch), it does no caching (a full fetch every time!), and while it does fetch at one-hour intervals, it does not understand that a 301 is a permanent redirection, so it keeps around two separate feed IDs for /articles.rss and /articles.atom (each with one subscriber) even though there's no point in doing so. That’s 4MiB a day, which is around 2% of the bandwidth my website serves over a day. While this is not an important amount, and I have no limit on the server’s egress, it seems silly that 2% of my bandwidth is consumed by two subscribers, when the site has over a thousand visitors a day.
But what takes the biscuit is definitely FeedMyInbox: while it only fetches every six hours, it implements neither caching nor compression. I only found it when looking into the requests coming from bots without a User-Agent header. The requests come from 216.198.247.46, which is svr.feedmyinbox.com. I'm soon going to blacklist this as well, until they stop being douches and provide a valid user agent string.
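Blocking requests that carry no User-Agent at all is, again, a short ModSecurity rule; a sketch (the rule ID is arbitrary):

    # Deny any request that does not send a User-Agent header.
    SecRule &REQUEST_HEADERS:User-Agent "@eq 0" \
        "id:1000002,phase:1,deny,status:403,log,msg:'Missing User-Agent'"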
They are far from the only ones, though; there is another bot that fetches my feed every three hours and will soon meet the same fate. But it has no obvious service attached to it, so if whatever you're using to read my blog tells you it can't fetch it anymore, try to figure out whether you're using a douchereader.
Please remember that software on the net should be implemented for collaboration between client and server, not for exploitation. Everybody's bandwidth suffers when you make heavy use of a service that does not do its job of keeping its usage in check.