I know I’m pretty picky about these matters, but one thing drives me crazy about software using HTTP: the non-use of the HTTP features that were created to save bandwidth. I hate that kind of software because, on one side, I don’t always connect via a flat rate (my phone provider has a semi-flat rate, metered by traffic), and on the other, I know that servers don’t always have free bandwidth.
So when I find a free software project designed to warn you about changes in web pages that does not seem to know about If-None-Match, I tend to discard it; I am especially worried if the software does not warn you against setting the polling interval to every minute.
But it doesn’t stop at free software; in the last week I’ve noticed a large increase in the traffic generated by my blog: rather than the usual 200 to 300 MB a day, it started generating 500 to 600 MB a day, constantly, without any new referrer that might explain the increase. After looking at the statistics for a little longer I noticed that someone from Germany was making over 1000 requests a day. A quick check of the logs showed that the user was requesting my main article feed once per minute, via his feed reader.
Now, checking once per minute is already a bit too much; most planets and other feed readers check at most once per hour, and I usually try hard not to post more than twice per day. But the real problem is that the feed reader software (FeedReader by NewsBrain) is actually braindamaged. It does not use the HTTP headers to request the feed only if it has changed, so it kept requesting the same content over and over and over. Given that it seems to be commercial proprietary software, and it doesn’t seem to have a clue about the protocol it’s designed to use, that feed reader is now blacklisted on my server and will not work with the websites hosted there.
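To make the complaint concrete, here is a minimal sketch of what a well-behaved client could do, in Python with only the standard library. The names `conditional_headers` and `fetch_if_changed`, and the plain `cache` dict, are my own assumptions for illustration and not the code of any particular feed reader. The idea is simply to remember the `ETag` and `Last-Modified` values from the previous response and send them back as `If-None-Match` and `If-Modified-Since`, so that the server can answer with a tiny 304 Not Modified instead of the whole feed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET from cached validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, cache):
    """Fetch url only if it changed since the validators stored in cache.

    cache is a plain dict (hypothetical, for illustration) holding the
    'etag' and 'last_modified' values seen in the previous response.
    Returns the new body as bytes, or None when the server answers 304.
    """
    headers = conditional_headers(cache.get("etag"), cache.get("last_modified"))
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            # Remember the validators for the next poll.
            cache["etag"] = response.headers.get("ETag")
            cache["last_modified"] = response.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError.
        if error.code == 304:
            return None
        raise
```

A reader built this way would call `fetch_if_changed(feed_url, cache)` on each poll and skip parsing whenever it gets `None` back. It still should not poll every minute, but at least each unchanged poll would cost a few hundred bytes rather than the full feed.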
So please, if you develop software that makes use of the HTTP protocol, learn to use its features!
You are completely right to block user agents that can’t be bothered to think about scaling or read the RFCs. Or just use httplib2 like sane people. The docs for the universal feed parser say so right there: http://www.feedparser.org/d…