This Time Self-Hosted

IRC and Blogs have something in common

And I’m not referring to the bile and acid reflux; of course they have that in common too, but I don’t care much about it. What I care about is the other thing they share: spam.

In the past two months, my ModSecurity-based antispam method for the blog has started to fail. Sure enough, filtering by User-Agent still catches a lot of spam, but plenty has been getting through anyway. New waves of spam declare themselves as coming from Firefox 3.0, or MSIE 7 on Vista, and so on: all plausible User-Agent strings that I cannot simply filter out.
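
To make the limitation concrete, here is a rough Python model of a blacklist-style User-Agent filter. The patterns are hypothetical (the actual ModSecurity rules are not shown here), but any filter of this shape rejects obviously scripted clients while letting a forged browser string through untouched.

```python
import re

# Hypothetical blacklist; these are NOT the blog's actual ModSecurity rules.
SUSPICIOUS_UA_PATTERNS = [
    re.compile(r"^\s*$"),  # empty or whitespace-only User-Agent
    re.compile(r"libwww-perl|python-urllib|\bcurl\b|\bwget\b", re.IGNORECASE),  # scripted HTTP clients
]


def ua_is_suspicious(user_agent: str) -> bool:
    """Return True if the User-Agent matches any blacklisted pattern."""
    return any(pattern.search(user_agent) for pattern in SUSPICIOUS_UA_PATTERNS)


# A spambot claiming to be Firefox 3.0 or MSIE 7 on Vista looks exactly like
# a real visitor at this level, so no pattern above can reject it.
FORGED_FIREFOX = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
FORGED_MSIE = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
```

Anything beyond this needs a signal the client cannot fake as easily, such as where the request is actually coming from.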

So I went a step further and started looking for newer patterns. The answer was obvious after a few lookups on MultiRBL: most, if not all, of the spam is coming from open proxies. I cannot be sure about all of it, because I only checked the block lists that various IRC networks use to filter their connections; many of the IPs are in most of those lists, a few are in just one or two, and a few are in none. But given how these proxies work, it’s quite natural that some of them would be caught spamming blogs but not IRC.
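
For reference, a DNSBL lookup of the kind MultiRBL runs against many lists works by reversing the octets of the IPv4 address, prepending them to the list’s DNS zone, and resolving the result: an answer inside 127.0.0.0/8 means the address is listed, NXDOMAIN means it is not. A minimal Python sketch follows; the zone name `dnsbl.example.org` used in the usage line is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to resolve when checking an IPv4 address against a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also rejects malformed addresses
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record in 127.0.0.0/8 for this address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except (socket.gaierror, socket.herror):
        return False  # NXDOMAIN or resolver failure: treated as "not listed" (fail open)
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, `dnsbl_query_name("192.0.2.10", "dnsbl.example.org")` yields `10.2.0.192.dnsbl.example.org`. Treating resolver failures as "not listed" keeps comments flowing when a list is down, at the cost of letting some spam through.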

Where does this leave us? Not much further than we were before. IRC networks use a double protection: they check DNSBLs to shut out hosts that are already known, and they portscan all the others to make sure they are not open proxies. This can backfire, since not all networks probe their clients properly (I once had my own vserver blacklisted on a network because it redirected to my main website when called with an unknown vhost, and the IRC network only tested whether it accepted the request, rather than whether the request caused a further connection). Probing also takes a while, especially for slower clients. On a distributed network like IRC, where a few seconds of delay at connection time are tolerable, this is acceptable. For blog comments, not so much.
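
The false positive described above comes down to what the probe accepts as proof. A minimal Python sketch of the difference, assuming an HTTP-proxy probe whose target is a server the checker controls, and which answers with a secret marker that never appears in the request itself (all names here are hypothetical): the naive check trusts any successful-looking status line, so a web server that merely redirects unknown vhosts passes as an open proxy; the stricter check only accepts a reply carrying the marker, which can only get there if the request was really relayed.

```python
import secrets


def new_marker() -> str:
    """Random marker served only by our own probe target; never sent in the request."""
    return secrets.token_hex(16)


def make_probe_request(target_host: str, target_port: int) -> bytes:
    """Build a proxy-style HTTP GET for an absolute URL on a host we control."""
    url = f"http://{target_host}:{target_port}/probe"
    return f"GET {url} HTTP/1.0\r\nHost: {target_host}\r\n\r\n".encode("ascii")


def naive_is_open_proxy(response: bytes) -> bool:
    """Flawed: any 2xx or 3xx reply counts, so a redirecting web server looks like a proxy."""
    parts = response.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1][:1] in (b"2", b"3")


def strict_is_open_proxy(response: bytes, marker: str) -> bool:
    """Only flag the host if the reply contains the marker our own target serves."""
    return marker.encode("ascii") in response
```

The socket handling (connecting to each candidate port, sending the request, reading the reply with a timeout) is omitted here, and that is also where the probing time goes.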

Right now I can only rely on the multiple DNSBL checks, and even those are cumbersome and slow down every request; a much tougher test would make neither my server nor the people who want to comment happy.
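
When several lists have to be consulted, one way to bound the delay (an assumption about how it could be done, not what the blog currently does) is to issue the lookups concurrently under an overall deadline, so the wait is roughly that of the slowest list rather than the sum of all of them. In this Python sketch, `lookup` is any callable with the shape `lookup(ip, zone) -> bool`, such as a DNSBL check; the function name `listed_in` is hypothetical, and a list that has not answered by the deadline is simply ignored.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Sequence


def listed_in(ip: str, zones: Sequence[str],
              lookup: Callable[[str, str], bool], timeout: float = 2.0) -> list[str]:
    """Query all zones concurrently; return those that list `ip` within `timeout` seconds."""
    if not zones:
        return []
    pool = ThreadPoolExecutor(max_workers=len(zones))
    try:
        futures = {pool.submit(lookup, ip, zone): zone for zone in zones}
        done, _ = wait(futures, timeout=timeout)
    finally:
        # Don't block on slow lists: cancel queued work, let stragglers finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
    # Lookups that raised are treated like "not listed".
    return sorted(futures[f] for f in done if f.exception() is None and f.result())
```

This needs Python 3.9 or later for `cancel_futures`. Ignoring late or failing lists is a deliberate fail-open choice, trading a little accuracy for a bounded wait on every comment.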

On the other hand, I’m wondering whether I could implement the tougher test directly in ModSecurity, running the complete, resource-hungry test only on the second comment in a day. That would be a nice solution. But I’ll need more time for it, as it would require either shelling out to some other program or writing an open-proxy checker in Lua.
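
The gating logic itself is simple. A minimal Python sketch, assuming "the second comment in a day" is counted per client IP over a sliding 24-hour window (the class and method names are hypothetical); in ModSecurity itself, a persistent per-IP collection would be the natural place to keep such a counter.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List

DAY_SECONDS = 24 * 60 * 60


class CommentGate:
    """Decide, per client IP, when to run the expensive open-proxy test."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the window can be tested without waiting a day
        self._seen: DefaultDict[str, List[float]] = defaultdict(list)  # ip -> recent comment times

    def needs_full_check(self, ip: str) -> bool:
        """Record a comment from `ip`; return True from its second comment within a day onwards."""
        now = self._clock()
        recent = [t for t in self._seen[ip] if now - t < DAY_SECONDS]  # drop entries older than a day
        recent.append(now)
        self._seen[ip] = recent
        return len(recent) >= 2
```

Under this assumption, the cheap DNSBL checks would still run on every comment, and only clients that come back within the day would pay for the full probe.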

If somebody has other ideas and solutions, they are definitely welcome.

Comments 9
  1. Akismet works to hide spam from the published feed and the articles’ pages… not so much to filter it beforehand. Given that I have to waste time deleting the comment spam anyway (and the newest Typo updates actually make spam comments even more obnoxious, because they are shown expanded in the feedback view as well), I’d rather filter it out entirely before it gets in. This way, I can also leave _all_ the posts open for comments even years after writing them, which is unfortunately quite rare on other blogs.

  2. @user99 this blog, my website, and the two links to the right are _all_ on the same vserver; _all_ with the same configuration, and filtering is done actively _only_ on commenting.

  3. @flameeyes: I understand… still, from two different IPs the links on the right are inaccessible. CNAME not resolving? I’ll ‘dig’ them: IP - home; other IP - work.

  4. (on lunch :-0 ) they don’t resolve the same.
     david@random ~ $ dig
     ; <<>> DiG 9.7.1 <<>>
     ;; global options: +cmd
     ;; Got answer:
     ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7751
     ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
     ;; QUESTION SECTION:
     ;
     ;; ANSWER
     ;; Query time: 149 msec
     ;; SERVER:
     ;; WHEN: Mon Aug 30 12:07:41 2010
     ;; MSG SIZE rcvd: 45
     david@random ~ $ dig
     ; <<>> DiG 9.7.1 <<>>
     ;; global options: +cmd
     ;; Got answer:
     ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28139
     ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
     ;; QUESTION SECTION:
     ;
     ;; ANSWER
     ;; Query time: 38 msec
     ;; SERVER:
     ;; WHEN: Mon Aug 30 12:08:21 2010
     ;; MSG SIZE rcvd: 92

  5. david@random ~ $ dig
     ; <<>> DiG 9.7.1 <<>>
     ;; global options: +cmd
     ;; Got answer:
     ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55979
     ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
     ;; QUESTION SECTION:
     ;
     ;; ANSWER SECTION:
     ;; AUTHORITY 2010082301 86400 3600 3600000 86400
     ;; Query time: 28 msec
     ;; SERVER:
     ;; WHEN: Mon Aug 30 23:20:32 2010
     ;; MSG SIZE rcvd: 121
     david@random ~ $ ping hosting.flameeyes.eu
     ping: unknown host hosting.flameeyes.eu
     david@random ~ $ ping http://www.altercut.it
     ping: unknown host http://www.altercut.it
     david@random ~ $ ping altercut.it
     PING ( 56(84) bytes of data.
     64 bytes from ( icmp_req=1 ttl=47 time=154 ms

  6. Perhaps you’ve already considered this, but have you thought of using a Hashcash-like system? Hashcash in its original incarnation requires the client to find a partial hash collision of N bits (configurable). There are some JavaScript implementations, and one plug-in for WordPress, iirc. You could either require a small N for each comment, or a somewhat larger N for the first comment and then set a cookie.
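
The Hashcash-style idea from the last comment can be sketched in a few lines of Python. This is not the actual Hashcash stamp format, only the underlying technique, with hypothetical function names: the client searches for a nonce such that the SHA-1 of `challenge:nonce` starts with at least N zero bits, which takes about 2^N hash attempts on average, while the server verifies the result with a single hash. In a real deployment the challenge would be issued by the server and the search would run in the commenter’s browser.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()  # zeros above the highest set bit
        bits += 8
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha1(f"{challenge}:{nonce}".encode("utf-8")).digest()


def solve(challenge: str, n_bits: int) -> int:
    """Client side: brute-force the smallest nonce; expected cost is about 2**n_bits hashes."""
    nonce = 0
    while True:
        if leading_zero_bits(_digest(challenge, nonce)) >= n_bits:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, n_bits: int) -> bool:
    """Server side: a single hash checks the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= n_bits
```

The two policies suggested above (a small N on every comment, or a larger N on the first comment followed by a cookie) only differ in the `n_bits` the server asks for and how often it asks.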
