I started looking at ModSecurity when I wanted to implement a User-Agent based antispam method, which has proven to work so well, time and time again, that I started publishing the ruleset. It now serves not only as an antispam method, but also as a way to keep tons of bad crawlers from finding my email addresses, and so on.
When I first proposed this kind of filtering I received quite a few complaints that the HTTP protocol does not define the User-Agent header for this purpose. But thanks first to EFF’s Panopticlick – which demonstrated clearly that “anonymised” requests are not as anonymous as their perpetrators would expect them to be – and most recently to SpiderLabs’s work, I am now fully certain that I took the right road.
I’ve spent a bit more work on the rules this week, to make them more resilient to faked requests, such as those coming from script kiddies’ tools like the HOIC tool described in the SpiderLabs blog post linked above. One of the most interesting detections I came up with is for real Chrome requests: while it seems to me that Google itself does not leverage it, Chrome as of version 18 still implements its own proposed Shared Dictionary Compression for HTTP (SDCH), even though I don’t think it’ll ever be used in the real world. Since it is the only browser actually requesting that encoding, I can easily assume a connection between the two. The only browser that contradicts this is Epiphany, which in its most recent versions declares itself to be Chrome… which means you then have a browser claiming to be another (Chrome), which in turn claims to be a third (Safari), which uses an engine (KHTML) claiming to be the same as another (Gecko), all the while declaring it’s all compatible with Mozilla/5.0.
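A minimal sketch of what such a detection can look like, assuming ModSecurity’s chained-rule syntax – the rule id, status and message here are placeholders, not taken from the actual ruleset, and Epiphany (which also claims to be Chrome) would need an explicit exemption before this rule:

```apache
# Sketch: deny requests whose User-Agent claims to be Chrome but whose
# Accept-Encoding does not advertise sdch, which real Chrome always sends.
SecRule REQUEST_HEADERS:User-Agent "@contains Chrome/" \
    "id:900100,phase:1,deny,status:403,log,chain,msg:'Chrome UA without sdch encoding'"
    SecRule REQUEST_HEADERS:Accept-Encoding "!@contains sdch"
```

The chain means both conditions must match for the disruptive action on the first rule to fire.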
One issue I found while doing this work had to do with Android. For both versions 2 and 3 (is somebody really hoping to use Android 1?), the default AOSP browser sends a full-fledged HTTP request which, among other things, includes an
Accept header. This is what every browser I have ever tried does, to the point that ModSecurity’s own Core Rule Set assigns negative points to requests coming without one; my ruleset tightens this further by checking whether the request purports to come from a known browser and, if so, rejecting it when it doesn’t include that header. This worked up to now; note that requests coming through a proxy that makes itself explicit through a
Via header are not validated against these checks, simply because many proxies are known to muck with the headers.
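The known-browser Accept check, with the Via exemption, can be sketched along these lines – again a hypothetical fragment with illustrative id and patterns, not the ruleset itself:

```apache
# Sketch: if the request does not come through a declared proxy (no Via
# header), claims to be a browser, and carries no Accept header, reject it.
SecRule &REQUEST_HEADERS:Via "@eq 0" \
    "id:900110,phase:1,deny,status:403,log,chain,msg:'Browser request without Accept header'"
    SecRule REQUEST_HEADERS:User-Agent "@beginsWith Mozilla/" "chain"
    SecRule &REQUEST_HEADERS:Accept "@eq 0"
```

The `&` prefix counts how many instances of a header are present, so `@eq 0` matches exactly when the header is missing.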
Anyway, as I was saying, this is badly contradicted by Android 4 (up to 4.0.3, and CyanogenMod as well); it might have started as a way to minimise bandwidth usage, but for whatever reason, in this version the AOSP browser does not send an
Accept header at all. In fact, it seems to have dropped most of the headers it was sending before that are not strictly necessary for the server to process the request. I could have sworn that
Accept was mandatory in the HTTP protocol, but it seems I was mistaken: RFC 2616 makes it optional, and a request without it is simply assumed to accept all media types. The ruleset now exonerates Android 4 from that particular test, but I’m not really too happy about it.
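Folding the exemption into the check itself means denying a missing Accept only when the User-Agent does not claim to be Android 4; once more a hypothetical sketch with an illustrative id and pattern:

```apache
# Sketch: deny browser requests lacking an Accept header, unless the
# User-Agent claims Android 4 (whose stock browser omits the header).
SecRule &REQUEST_HEADERS:Accept "@eq 0" \
    "id:900120,phase:1,deny,status:403,log,chain,msg:'Missing Accept header'"
    SecRule REQUEST_HEADERS:User-Agent "@beginsWith Mozilla/" "chain"
    SecRule REQUEST_HEADERS:User-Agent "!@rx Android 4\."
```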
But that’s definitely not the only thing out of place with Android. Indeed, if you take an HTC Android device, the browser you open is not the AOSP one, but HTC’s own implementation. This version … does not fully declare itself as an Android device running a browser compatible with Mobile Safari. Instead, it reports itself as a complete desktop Safari, and not in the way that Chrome does it, but by pretending to be Mac OS X 10.6.3 running on an Intel Mac. Honestly, that’s a crazy thing to do.
There are a few more things that I hope to be able to handle in my ruleset to make it even tighter, without adding substantial false positives. This means not only fewer spam comments, but also fewer crawlers finding our email addresses, and fewer risks associated with Denial of Service attacks, distributed or not.
If you would like to support my work on the ruleset, you can find it on Flattr, where it’s depressingly stuck at only two clicks. If you would like to use the ruleset, you can find it on GitHub, and you can use it for free, obviously.
The level of spam I’ve had to deal with on Rosetta Code lately is incredible. Four to eight spam users a day getting past captcha twice: once for account creation, once for spam page creation. This is on my shortlist of priorities; I need to find or build a MediaWiki extension to block on fingerprint. (Or possibly on something provided by mod_security.)
Michael, you could give my ruleset a try: it works not only with blogs, but with forums and wikis as well. Give it a try!