After the recent Typo update I had some trouble with Akismet not working properly to mark comments as spam, at least the very few spam comments that could get past my ModSecurity Ruleset — so I set off to deal with it a couple of days ago to find out why.
Well, to be honest, I didn’t really want to focus on why at first. The first thing I found out while looking at the way Typo uses Akismet is that it still used a bundled, hacked, ancient Akismet library. Given that the API key I got was valid, I jumped to the conclusion, right or wrong, that the code was simply using an ancient API that had been retired, and decided to look around for a newer Akismet library; lo and behold, a 1.0.0 gem had been released not many months earlier.
After fiddling with it a bit, the new Akismet library worked like a charm, and spam comments passing through ModSecurity were again marked as such. A pull request and its comments later, I got a perfectly working Typo which marks comments as spam as well as before, with one less library bundled within it (and I also got the gem into Portage, so there is no problem there).
But this left me with the problem that some spam comments were still passing through my filters! Why did that happen? Well, if you remember, the idea behind my ruleset was validating the User-Agent header content… and it turns out that the latest Firefox versions have such a short User-Agent header that almost every spammer seems to have been able to copy it just fine, so they weren’t killed off as intended. So, more digging in the requests.
Some work later, I was able to find two rules with which to validate Firefox and a bunch of other browsers: the first relies on checking the Connection: keep-alive header that is always sent by Firefox (tried in almost every possible combination), and the other relies on checking the Content-Type on the POST request for a charset being defined: browsers will have it, but whatever the spammers are using nowadays doesn’t.
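To make the two checks concrete, here is a minimal sketch in Python. It is not the actual ModSecurity rule from the ruleset; the function name and the plain-dictionary representation of the headers are assumptions made purely for illustration, and the logic only mirrors the description above.

```python
def looks_like_real_browser(method, headers):
    """Return True if a request passes both header checks described above.

    `headers` is a plain mapping of header names to values (hypothetical
    representation; a real filter would read them from the web server).
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "").lower()

    # Check 1: a client claiming to be Firefox must send Connection: keep-alive.
    if "firefox" in user_agent:
        tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
        if "keep-alive" not in tokens:
            return False

    # Check 2: a POST request must declare a charset in its Content-Type.
    if method.upper() == "POST":
        params = h.get("content-type", "").split(";")[1:]
        if not any(p.strip().lower().startswith("charset=") for p in params):
            return False

    return True
```

A request that claims to be Firefox but omits Connection: keep-alive, or a POST whose Content-Type carries no charset parameter, is rejected; everything else passes through to the remaining filters.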
Of course, the problem is that once I actually describe and upload the rules, spammers will just improve their tools to avoid these mistakes, but in the meantime I’ll have a calm, spam-free blog. I still won’t give in to captchas!
At any rate, besides adding these validations, thanks to another round of testing I was able to fix Opera Turbo users (now they can comment just fine), and that led me to the choice of tagging the ruleset and… releasing it! Now you can download it from GitHub or, if you use Gentoo, just install it as www-apache/modsec-flameeyes; there’s also a live ebuild for the bravest.
I’ve gotten rid of a lot of garbage traffic by rejecting requests that have no Host header, or that have an IP address rather than a hostname in it. Lots of the script kiddie crap just blindly goes through IP blocks. You can whitelist the sources of search engine crawlers that you don’t want to block. I’ve also found it useful to grab the source address of unwanted requests and pass it to a script that adds it to a blacklist ipset.
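As a rough sketch of the Host check described in this comment, assuming the headers are available as a plain dictionary (the function name is hypothetical, and a real setup would express this in the web server or ModSecurity configuration instead):

```python
import ipaddress


def host_header_is_acceptable(headers):
    """Return False when the Host header is missing or is a bare IP address."""
    h = {name.lower(): value for name, value in headers.items()}
    host = h.get("host", "").strip()
    if not host:
        return False  # no Host header at all

    # Drop an optional port; IPv6 literals arrive bracketed, e.g. [::1]:8080.
    if host.startswith("["):
        name = host[1:].split("]", 1)[0]
    else:
        name = host.rsplit(":", 1)[0] if ":" in host else host

    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True  # not an IP address, so treat it as a hostname
    return False  # scanners walking IP blocks usually end up here
```

Whitelisting crawlers and feeding rejected source addresses to an ipset would sit around this check and are left out of the sketch.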