When I started working on my antispam filtering based on the user agent strings provided by browsers, I got quite a bit of feedback from people complaining that user agent strings weren’t meant to be used for that, that they should only be used for statistical purposes, and other objections along those lines. Indeed, reading around free software planets, you find lots of people maintaining this position, that no code logic should be conditional on the user agent string; this usually involves people working on Debian-based systems, where Firefox is banned and Iceweasel is the way.
Now, I understand that what I’m doing is borderline valid for the protocol, and that discriminating against users based on their user agent string is not ethically perfect; but let me say that the thing works pretty nicely. From time to time I look over the comments that get denied (mod_security can keep them in its log), and I haven’t found a single false positive. There are a few false negatives, that is, spam that passes through the mod_security filter and reaches the blog; but luckily the antispam features in Typo itself are good enough at that point. Lately this has happened because a few spambots started declaring themselves as some almost credible IE 7 on Windows XP or Vista; IE8 has been released, but I’d rather give it a few more months before starting to reject those, too.
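Just to give an idea of the kind of check involved, here is a purely hypothetical sketch in Python; my actual mod_security rules are not reproduced here, and the patterns below (an empty user agent, an implausibly ancient IE version, a misplaced space in the “compatible” token) are invented for illustration only:

```python
import re

# Hypothetical reject patterns, invented for illustration; the real
# rules discussed in this post live in mod_security, not here.
SPAMBOT_UA_PATTERNS = [
    re.compile(r"^$"),             # empty user agent string
    re.compile(r"MSIE [1-5]\."),   # IE versions too old to be a plausible visitor
    re.compile(r"compatible ;"),   # space before the semicolon, a malformed token
]

def looks_like_spambot(user_agent: str) -> bool:
    """Return True when the user agent matches one of the bad patterns."""
    return any(p.search(user_agent) for p in SPAMBOT_UA_PATTERNS)
```

A well-formed IE 7 string such as `Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)` would pass a filter like this, which is exactly why the almost-credible spambot strings mentioned above get through.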
These results start making me wonder how much of what I’m doing is abuse and how much is use; there are some questionable reasons behind logic switches between Firefox and Iceweasel, but those don’t involve me. At the same time, one would expect stuff like this to be doomed to happen; both Apple and Google seem to have accepted that, and you can see that Safari still declares itself KHTML, and Google Chrome declares itself as Safari, too. Sure, most of the code that tries to identify one of the three of them should just hit on WebKit (well, that is, if KDE were to finally decide to go with the one engine that is actually getting support out there), but at the same time they try to be pragmatic and accept that there is code logic based on user agents.
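To illustrate, here are user agent strings of the general shape those two browsers were sending around that time (the exact version numbers are approximate and vary by release); a filter that keys on the engine token matches both at once:

```python
# Approximate user agent strings of the era; version numbers are
# illustrative, not tied to any specific release.
safari_ua = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) "
             "AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Safari/530.17")
chrome_ua = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
             "AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.8 Safari/530.5")

# Both browsers carry the engine tokens, so one check covers them.
for ua in (safari_ua, chrome_ua):
    assert "AppleWebKit" in ua and "KHTML" in ua

# And Chrome also carries the Safari token, as noted above.
assert "Safari" in chrome_ua
```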
Back to my usage: since publishing the rules on this blog gets messy, because of mod_security itself (funny!), I’m probably going to post them on a git repository or something in the next few days. I’ll also be adding the public service rules that I’ve been using for a while now, at least on my friend’s site (and actually found a couple of friends of his who had dialers on their systems and never noticed).
So maybe I’m using it for something it wasn’t designed for; on the other hand, it works, and it really doesn’t differ much from running statistical analysis on the headers of email messages. And you know that your mail server, or client, or proxy, or whatever, is doing something like that with SpamAssassin!