I have on my TODO list (always expected to happen, though I have no idea when) to update the mod_security rules that I posted some time ago. While the ones I posted mostly work, I had to add one more exception for HTTP/1.0 POSTs (Opera, in some configurations), and I've added a few more blacklist entries for known spamming User-Agents (Project Honeypot seems quite useful for double-checking those, and it's why you actually find Project Honeypot-induced hidden links in my blog; another item on my TODO list is adding this to the xine Bugzilla too).
With the filtering on, only one person (moesaji) reported false positives, plus the occasional spam post passing through mod_sec and hitting the Typo anti-spam measure (which is not perfect, but can deal with the lower rate of spam that I receive now). Today, though, I found a strangely large hit of spam. Note that by my new standards, “strangely large hit” means nine spam comments on three posts. So I executed the usual script to fetch the new data from the access log on the server, and things started getting interesting.
The one post that stood out from the rest, despite carrying the absolutely usual spam comment text, reports Opera for Wii as its user agent. It's a first for me, in both spam and non-spam, with that user agent. I do use the PSP browser from time to time and I have tried blogging from the PlayStation 3, but at this point I don't doubt the User-Agent header is being forged, because I can't see someone easily hijacking a Wii to post spam comments around.
The remaining posts are much more interesting. First of all, they come with no User-Agent header at all, which means I forgot to ban that particular case with mod_sec (just checking for ^$ does not work, probably because that expects an empty User-Agent: header rather than no header at all), and I'll have to fix that in a moment. But there is another interesting issue, one that wouldn't have been that interesting if I didn't read Planet Debian almost daily.
The other day I read (and shared on Google Reader) a post by Steve Kemp about how spammers don't know the syntax of your site and will try to link their website with different methods all at once. In particular, he reports that his anti-spam comment service now takes care of identifying that too (which reminds me that I have to find or write a Typo plugin to check comments for that; again, it's on my TODO list).
How does that make the spam I received today interesting? Well, instead of one spam comment with three different link methods, different IPs in the same C-class posted four comments on the same article, with the usual “Very nice site” text: one without a link, and three with the three different link methods. A quite nice way to work around the detection Kemp reported. Which brings me to the final question of the post: are spammers monitoring us? Or is it just strange luck that as soon as Kemp found a mostly “no false positive” rule to identify spam, they start to work around it?
At any rate, please remember to disable browser anonymisers when you want to post comments on my blog. I don't like those, and you have no reason to use them here, since I'm not an evildoer who registers users' browser preferences; I just use the User-Agent header to avoid filling the net with spam.
By the way, since it's far from trivial, I think it's worth showing already what the mod_sec rule to kill no-user-agent requests looks like. The problem is that some of the documentation you can Google for refers to the old mod_sec 1 version, and the new documentation is … lacking.<typo:code># Since we cannot check for a _missing_ User-Agent, we first have to check whether it is present...
SecRule REQUEST_HEADERS_NAMES "^user-agent" "setvar:tx.flameeyes_has_ua=1"
# ... and then check whether the variable was not set.
SecRule TX:FLAMEEYES_HAS_UA "!^1" "log,msg:'Missing User-Agent header when posting comments, spam.',deny,status:403"</typo:code>That particular piece of rule (thanks to "this post":http://www.modsecurity.org/… in the mod_security blog itself for showing how to work with transaction variables) should take care of it; I don't like the two-step method, but it works.
Actually, the rule does not seem to work…
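For what it's worth, mod_security 2 can count how many instances of a header a request carries with the & prefix on a collection item, which would avoid the two-step transaction-variable dance entirely. Here's a minimal sketch of that approach; I haven't verified it on my own setup yet, so treat it as an assumption rather than a confirmed fix:

```apache
# Deny POSTs that carry no User-Agent header at all.
# &REQUEST_HEADERS:User-Agent evaluates to the number of User-Agent
# headers present in the request; a missing header gives a count of 0.
SecRule REQUEST_METHOD "^POST$" "chain,log,msg:'Missing User-Agent header when posting comments, spam.',deny,status:403"
SecRule &REQUEST_HEADERS:User-Agent "@eq 0"
```

The chain keeps the rule limited to POST requests, so feed readers and other header-light GET clients wouldn't be caught in the crossfire.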