You probably remember the series of posts I wrote about my antispam setup, which uses the User-Agent field to reject at the source a number of comments that are likely to be spam. The idea is definitely working: just yesterday it filtered out 134 spam comments (no false positives, after a quick check), and I have no need to use obnoxious captchas or to block comments on old posts (just yesterday I got an interesting one on an almost year-old post).
Unfortunately this was still not perfect. Luckily there is a second antispam pass, applied directly by Typo, that uses some heuristics (like the number of links) and Akismet; this second pass is both good and bad. For instance, it always marks as spam the comments where people provide references, which is a bit tiresome. Sure, it does not delete them, it only queues them up for moderation, but still. Unfortunately the second pass couldn’t be disabled or loosened, because I would usually get around three spam comments a day slipping past the first filter (which is still far fewer than the hundreds the filter sometimes kills at the source).
But last night, thanks to Mark, I was able to refine the antispam even more (and the comment policy is now updated to reflect that): I added a few more DNSBLs (DNS-based blacklists), namely ProxyBL, DroneBL and CBL. I left them running overnight on the unfiltered input, and the results are quite interesting. There were just one or two hits on ProxyBL, but about two comments an hour hit DroneBL right away, and a few of those wouldn’t have been caught by my usual User-Agent-only antispam.
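For those who have never looked at how a DNSBL is queried, here is a minimal Python sketch of the lookup itself. It is not the configuration running on this blog: the function names and the `dnsbl.example.org` zone are placeholders I made up for illustration, and the real zone name for each list should be taken from that list’s own documentation. The convention is to reverse the IPv4 octets, append the list’s zone, and resolve the resulting name; if it resolves (normally to an address in 127.0.0.0/8), the IP is listed.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # 192.0.2.1 becomes 1.2.0.192.<zone>
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the query name resolves, i.e. the IP is on the list.

    `resolve` is injectable so the logic can be exercised without network
    access. Any resolution failure is treated as "not listed", which also
    hides timeouts; a real deployment would want to tell those apart.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    # Hypothetical zone; substitute the zone documented by the list you use.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

Each call to `is_listed` with the default resolver costs a DNS round trip to somebody else’s service, which is worth keeping in mind when deciding how many comments to run through it.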
But since I don’t want to hit other services when I can filter the spam myself, I’ve now reconfigured the DNSBL checks to apply only if the comment didn’t hit any other check first (this way all the comments with bogus user agents are dropped, and only the remaining “valid” ones are checked). In particular, CBL is set as the very last check, for a very important reason: CBL does not sanction its use for filtering anything other than mail. Unfortunately, CBL is also the only list that had a couple of the IP addresses from which false negatives (spam missed by the other checks) arrived yesterday, so I really didn’t want to leave it out entirely. But I am responsible for any problem related to CBL with this kind of use; please don’t ever bother CBL upstream about this.
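The ordering described above can be sketched as a simple short-circuiting chain. Again, this is an illustration in Python under my own assumptions, not the actual rules: the empty-User-Agent test stands in for the real user agent checks, the DNSBL checks are fakes that record when they are consulted, and the relative order of ProxyBL and DroneBL is arbitrary (only CBL being last comes from the text).

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Check = Tuple[str, Callable[[dict], bool]]


def first_hit(comment: dict, checks: Iterable[Check]) -> Optional[str]:
    """Run checks in order and stop at the first one that matches.

    Later checks (the remote DNSBL lookups) are never consulted for a
    comment that an earlier, local check has already rejected.
    """
    for name, check in checks:
        if check(comment):
            return name
    return None


# Records which fake remote lists were actually consulted.
remote_calls: List[str] = []


def bogus_user_agent(comment: dict) -> bool:
    # Hypothetical stand-in for the local User-Agent rules.
    return comment.get("user_agent", "") == ""


def fake_dnsbl(name: str, listed_ips: Set[str]) -> Callable[[dict], bool]:
    # Stand-in for a real DNSBL lookup; no network access involved.
    def check(comment: dict) -> bool:
        remote_calls.append(name)
        return comment["ip"] in listed_ips
    return check


CHECKS: List[Check] = [
    ("user-agent", bogus_user_agent),                      # local, free
    ("ProxyBL", fake_dnsbl("ProxyBL", set())),
    ("DroneBL", fake_dnsbl("DroneBL", {"192.0.2.10"})),
    ("CBL", fake_dnsbl("CBL", {"192.0.2.20"})),            # always last
]

if __name__ == "__main__":
    comment = {"ip": "192.0.2.20", "user_agent": "Mozilla/5.0"}
    print(first_hit(comment, CHECKS), remote_calls)
```

With this arrangement, CBL is only ever asked about comments that passed every other check, which keeps the number of queries sent to it as small as possible.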
Another change related to blog spam might be of interest. I’ve tried re-enabling trackback support but, as was easy to guess, nothing but spam seems to pass through it nowadays. Very few valid installations actually use trackbacks, and they definitely don’t justify the amount of spam I’d be getting. On the other hand, Typo should be able to trackback itself to link posts together when I note something about them, and that’s one thing I’d really like to keep. So for now I’ve enabled the trackback feature within Typo but blocked it in the Apache configuration, allowing only the server’s own IP address to access the location.
Hopefully I’ll publish the ModSecurity configuration in the near future.