And I’m not referring to the bile and acid reflux; of course they have that in common, but I don’t care much about it. What I do care about is the other thing they have in common: spam.
In the past two months, my ModSecurity-based antispam method for the blog has started to fail. Sure enough, filtering by user agent still catches a lot of spam, but a lot of it has been passing through anyway. New waves of spam declare themselves as coming from Firefox 3.0, or MSIE 7 on Vista: all plausible User-Agent strings that I cannot simply filter.
So I went a step further and started looking for newer patterns; the answer was obvious after a few calls to the MultiRBL enquirer: most, if not all, of the spam is coming from open proxies. I cannot be sure about all of it, because I only searched the block lists that various IRC networks use to filter their connections; a lot of the IPs are in most of them, a few are in just one or two, and a few are in none. But given the way those proxies work, it’s quite natural that some might be found spamming blogs but not IRC.
Where does this leave us? Not really any further than we were before. IRC networks use double protection: they use the DNSBL to shut out the hosts they already know about, and they portscan all the others to make sure that they are not open proxies. Now this can backfire, since not all networks probe their clients properly: I had my own vserver blacklisted on a network before because it redirected to my main website when called with an unknown vhost, and the IRC network only tested whether it accepted the request rather than testing that the request caused a further connection. The probe also takes a bit of time to perform, especially for slower clients. On a distributed, non-low-latency network like IRC, this is acceptable. For blog comments, not so much.
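To make the difference between "accepted the request" and "caused a further connection" concrete, here is a minimal sketch in Python of what a stricter probe could look like. It is only an illustration under my own assumptions, not what any IRC network actually runs: the canary URL and the token are hypothetical, standing for a page on a server you control that returns a known string. A host counts as an open proxy only if that string comes back, so a redirect like the one my vserver sent would not qualify.

```python
import socket

MAX_REPLY = 64 * 1024  # stop reading after 64 KiB; the token is tiny


def build_probe_request(canary_url: str) -> bytes:
    """Build an absolute-URI GET, the request form an HTTP proxy expects."""
    host = canary_url.split("/")[2]  # "http://host/path" -> "host"
    return f"GET {canary_url} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def response_proves_relay(raw: bytes, token: str) -> bool:
    """True only for a 200 reply whose body carries our canary token."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].split()
    if len(status_line) < 2 or status_line[1] != b"200":
        return False  # redirects and errors prove nothing about relaying
    return token.encode("ascii") in body


def probe(ip: str, port: int, canary_url: str, token: str,
          timeout: float = 5.0) -> bool:
    """Ask ip:port to fetch the (hypothetical) canary page for us."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as conn:
            conn.sendall(build_probe_request(canary_url))
            reply = b""
            while len(reply) < MAX_REPLY:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                reply += chunk
    except OSError:  # refused, unreachable or timed out: not proven open
        return False
    return response_proves_relay(reply, token)
```

Even with a short timeout, one such probe per candidate port adds up quickly, which is exactly the cost problem for blog comments.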
Right now I can only rely on the multiple DNSBL checks, and even those tend to be cumbersome and slow down the connection; a much tougher test is not going to be a happy situation either for my server or for the people who would like to comment.
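For reference, a DNSBL check is just a DNS lookup of the client address with its octets reversed, under the zone of the block list. The Python sketch below shows the idea; the zone name in the usage note is a placeholder rather than a real list, and treating any answer at all as "listed" is a simplification, since real lists usually encode the reason in the returned 127.0.0.x address.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def listed_in(ip: str, zones: list[str]) -> list[str]:
    """Return the zones that answer for this address (simplified check)."""
    hits = []
    for zone in zones:
        try:
            # Blocking lookup: one DNS round trip per zone, in sequence.
            socket.gethostbyname(dnsbl_query_name(ip, zone))
        except socket.gaierror:
            continue  # no record: not listed in this zone
        hits.append(zone)
    return hits
```

Calling `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` gives `99.2.0.192.dnsbl.example.org`. Since the lookups run one after the other, every extra list adds another round trip before the comment is accepted, which is where the slowdown comes from.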
On the other hand, I’m wondering whether I can prepare the tougher test directly in ModSecurity, and run the complete, hungry test only on the second comment in a day. That would be a nice situation. But I’ll need more time to do that, as it would need to either shell out to some other program or have an open proxy checker written in Lua.
If somebody has other ideas and solutions, they are definitely welcome.