This Time Self-Hosted

The monthly spam analysis

New month, clean slate of Awstats-generated statistics for my blog and website to analyse.

When looking at the statistics generated by Awstats, it’s much easier to find referrer spammers at the start of the month: links that provide more than a few referrals in a matter of hours are usually not real links but spam links. Unfortunately I misjudged at least one of them (sorry, Bruno!), but on the whole the heuristic proves quite useful.
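As a rough illustration of that heuristic, here is a minimal Python sketch that counts referrals per referring host inside a short window at the start of the month. The function name, the six-hour window, the threshold of five and the `(timestamp, referrer)` input format are all assumptions made for the example; Awstats does not expose anything like this, and the entries would have to be extracted from the access log first.

```python
from collections import Counter
from datetime import datetime, timedelta
from urllib.parse import urlparse


def suspicious_referrers(hits, window_start, window=timedelta(hours=6), threshold=5):
    """Return referring hosts seen more than `threshold` times in the window.

    `hits` is an iterable of (timestamp, referrer_url) pairs; this input
    format is hypothetical and chosen only for the example.
    """
    window_end = window_start + window
    counts = Counter(
        urlparse(referrer).hostname
        for timestamp, referrer in hits
        if window_start <= timestamp < window_end
    )
    # Entries without a parsable host (for example "-") come out as None.
    return sorted(host for host, n in counts.items() if host and n > threshold)
```

Anything it reports is only a candidate: as the case above shows, a legitimate link can also send a burst of visitors, so the list still needs a human look.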

But looking out for these spammers does more than populate the list of bad referrers; it also gives me a chance to look at the fake browser user-agent strings they use. I’m not sure why they do it, but rather than simply gathering realistic user-agent lists (which would probably be much harder to counter), they seem to Google them up; and since most statistics generators mangle the strings they display, the copies the spammers pick up are mangled too. This is the only reason I can find for some of them being so blatantly broken that it’s a piece of cake to filter them out with ModSecurity, so that they can’t post comments or spam my referrer lists.

Today I found another of those common cases: some spammers try to pass themselves off as the Opera browser, but in doing so they forget the space between the agent’s short name (Opera/9.62) and the open parenthesis that starts the agent’s details. Real browsers never do this, so it can safely be treated as one of the telltale signs of a spammer. My ruleset linked above also contains checks for fake Opera strings reporting as “Mozilla” (since it never does), for fake strings that convert spaces to the + symbol, and for strings that never close the details’ parenthesis.
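To make the four tells concrete, here is a small Python sketch of the same checks. It is not the ModSecurity ruleset itself: the function name, the regular expression and the exact conditions are assumptions written for this illustration, and the “Mozilla” check simply takes the claim above at face value.

```python
import re

# Hypothetical pattern: "Opera/<version>" immediately followed by "(".
OPERA_NO_SPACE = re.compile(r"^Opera/\d+(?:\.\d+)*\(")


def fake_opera_reasons(user_agent):
    """Return the list of spammer tells found in a user-agent string."""
    reasons = []
    if OPERA_NO_SPACE.match(user_agent):
        reasons.append("missing space before details")
    if user_agent.startswith("Mozilla") and "Opera" in user_agent:
        reasons.append("Opera reporting as Mozilla")
    # Assumption: a string with "+" signs and no spaces at all was mangled.
    if "+" in user_agent and " " not in user_agent:
        reasons.append("spaces converted to +")
    if user_agent.count("(") > user_agent.count(")"):
        reasons.append("unclosed parenthesis")
    return reasons
```

A well-formed string such as `Opera/9.62 (Windows NT 5.1; U; en) Presto/2.1.1` yields an empty list, while `Opera/9.62(Windows NT 5.1; U; en)` is flagged for the missing space.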

While finding referrer spammers at the beginning of the month is quite easy, it also seems to be the time when spammers try hardest, probably for the same reason: early in the month a single day of hits is enough for a spammer to reach the top ten of the statistics page, and that’s where the real pagerank comes from. Ah, the hard life of spammers and antispam developers.

Speaking of statistics software, I previously noted a shortcoming in Awstats: the lack of rel=nofollow on the referrer links. I sent the patch that adds the attribute (already applied in Gentoo) upstream, and although Laurent accepted and merged it, he pointed out that the page has a global noindex, nofollow directive in the <head> tag, which should already cover the issue of giving pagerank to spammers. I agree that this should be the case in theory, but a quick search turned up a number of Awstats-generated pages all over the network. Don’t ask me why they show up when they shouldn’t be indexed to begin with. At any rate, spammers seem to rely on their presence.
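For anyone who wants to check their own statistics pages, the following Python sketch reports which robots directives a page declares and which links lack rel=nofollow. It uses only the standard library’s `html.parser`; the class and function names are made up for the example, and it says nothing about how any particular search engine actually treats the page.

```python
from html.parser import HTMLParser


class RobotsAudit(HTMLParser):
    """Collect robots meta directives and links without rel=nofollow."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.followed_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and "href" in attrs:
            rel = (attrs.get("rel") or "").lower().split()
            if "nofollow" not in rel:
                self.followed_links.append(attrs["href"])


def audit(html):
    """Return (robots directives, links that are not marked nofollow)."""
    parser = RobotsAudit()
    parser.feed(html)
    return parser.robots, parser.followed_links
```

Feeding it the HTML of a generated page shows at a glance whether both layers of protection, the page-wide directive and the per-link attribute, are in place.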

Finally, I’d like to remind you that my ModSecurity Ruleset is available on GitHub for free; if you use it and are a Flattr user, though, I’d invite you to flattr it, since that also gives me a rough way to track how many people are affected by my changes.
