Four years ago, I wrote a musing about the negative effects of the Google Reader shutdown on content publishers. Today I can definitely confirm that some of the problems I foretold materialized. Indeed, thanks to the fortuitous fact that people have started posting my blog articles to reddit and hacker news (neither of which I’m fond of, but let’s leave that aside), I can declare that the vast majority of the bandwidth used by my blog is consumed by bots and in particular by feed readers. But let’s start from the start.
This blog is syndicated over a feed, the URL and format of which changed a number of times before, mostly with the software, or with the update of the software. The most recent change was due to switching from Typo to Hugo, and the feed name changing. I could have kept the original feed name, but it made little sense at the time, so instead I set up permanent redirects from the old URLs to the new URLs, as I always do. I say I always do because I keep working even the original URLs from when I ran the blog off my home DSL.
Some services and feed reading software know how to deal with permanent redirects correctly, and will (eventually) replace the old feed URL with the new one. For instance NewsBlur will replace URLs after ten fetches replied with a permanent redirect (which is sensible, to avoid accepting a redirection that was set up by mistake and soon rolled back, and to avoid data poisoning attacks). Unfortunately, it seems like this behaviour is extremely rare, and so on August 14th I received over three thousands requests for the old Typo feed URL (admittedly, that was the most persistent URL I used). In addition to that, I also received over 300 requests for the very old Typo
/xml/ feeds, of which 122 still pointing at my old dynamic domain, which is now pointing at the normal domain for my blog. This has been the case now for almost ten years, and yet some people still have subscription to those URLs! At least one Liferea and one Akregator pointing at those URLs.
But while NewsBlur implements sane semantics for handling permanent redirects, it is far from a perfect implementation. In particular even though I have brought this up many times, Newsblur is not actually sending
If-None-Match headers, which means it will take a copy of the feed at every request. Even though it does support compressed responses (non fetch of the feed is allowed without compressed responses), NewsBlur is requesting the same URL more than twice an hour, because it seems to have two “sites” described by the same URL. At 50KiB per request, that makes up about 1% of the total bandwidth usage of the blog. To be fair, this is not bad at all, but one has to wonder why they can’t be saving the last modified or etag values — I guess I could install my own instance of NewsBlur and figure out how to do that myself, but who knows when I would find the time for that.
Update (2017-08-16): Turns out that, as Samuel pointed out in the comments and on Twitter, I wrote something untrue. NewsBlur does send the headers, and supports this correctly. The problem is an Apache bug that causes 304 never to be issued when using
To be fair, even rawdog, which I use for Planet Multimedia, does not appear to support these properly. Oh and speaking of Planet Multimedia, would someone be interested in providing a more modern template so that Monty’s pictures don’t take over the page, that would be awesome!
There actually are a few other readers that do support these values correctly, and indeed receive 304 (Not Modified) status code most of the time. These include Lighting (somebody appears to be still using it!) and at least yet-another-reader-service Willreadit.com — this latter appears to be in beta and being invite only; it’s probably the best HTTP implementation I’ve seen for a service with such a rough website. Indeed the bot landing page points out how it supports
If-Modified-Since and gzip-compressed responses. Alas it does not appear to learn from persistent redirects though, so it’s currently fetching my blog’s feed twice, probably because there are at least two subscribers for it.
Also note that supporting
If-Modified-Since is a prerequisite for supporting delta feeds which is an interesting way to save even more bandwidth (although I don’t think this is feasible to do with a static website at all).
At the very least it looks like we won the battle for supporting compressed responses. The only 406 (Not Acceptable) responses for the feed URL are for Fever, which is no longer developed or supported. Even Gwene, which I pointed out was hammering my blog last time I wrote about this, is now content to get the compressed version. Unfortunately it does not appear like my pull request was ever merged, which means it’s likely the repository itself is completely out of sync with what is being run.
So in 2017, what is the current state of the art feed reader support? NewsBlur has recently added support for JSON Feed which is not particularly exciting – when I read the post I was reminded, by the screenshot of choice there, where I heard of Brent Simmons before: Vesper, which is an interesting connection, but I should not go into that now – but at least shows that Samuel Clay is actually paying attention to the development of the format — even though that development right now appears to just avoiding XML. Which to be honest is not that bad of an idea: since HTML (even HTML5) does not have to be well-formatted XML, you need to provide it as cdata in an XML feed. And the way you do that makes it very easy to implement it incorrectly.
Also, as I wrote this post I realized what else I would like from NewsBlur: the ability to subscribe to an OPML feed as a folder. I still subscribe to lots of Planets, even though they seem to have lost their charm, but a few people are aggregated in multiple planets and it would make sense to be able to avoid duplicate posts. If I could tell NewsBlur «I want to subscribe to this Planet, aggregate it into a folder», it would be able to tell the duplicated feeds, and mark the posts as read on all of them at the same time. Note that what I’d like is something different from just importing an OPML description of the planet! I would like for the folder to be kept in sync with the OPML feed, so that if new feeds are added, they also get added to the folder, and same for removed feeds. I should probably file that on GetSatisfaction at some point.