Like many other people out there, I like looking at the statistics synthesized from my web server’s access logs, to know who writes about me and which of my posts people are interested in reading. And like many of them, I produce those statistics with awstats, with its multitude of problems and limitations. What does this have to do with the post’s title, you might ask? Well, awstats instances (as well as those of other log analysers), in particular the public ones, are the major reason why you shouldn’t allow people to become users of general sites such as blogs and other sites built on various kinds of content management systems. Let me try to qualify this statement.
So why shouldn’t you let users register anyway? After all, what harm can it do to have their details stored server-side? Well, there is one very simple reason: spam and bots, again. Leaving aside the sheer problem of XSS injection in the displayed content, when your users get a “user page” that displays, for instance, their homepage as a link, those pages are a feast for spammers. When I say that, most people answer that comments on blogs are already quite nice for spammers, and that they cover a much broader range of pages. That is true, but comments tend not to go unscrutinised. You check for new comments, and if you’re the blog author you most likely read the comments’ feed to see what people say about your posts, so you can find spam comments with relative ease.
On the other hand, I don’t know of many web applications that let you scrutinise users, and especially users who change their details. Once a spammer has registered, they might just wait a week before changing the homepage link to a URL pointing to spam, a scam or whatever else. Would you notice? Most likely you wouldn’t, especially if the comments were auto-generated well enough. And that would mean more links to the spamming site. The solution for many lies in putting rel="nofollow" on the links to the commenters’ sites, so that search engines won’t follow them. This only works up to a point, given that some less sophisticated crawlers ignore that attribute and then reproduce the links without it.
It gets even worse: some websites don’t put rel="nofollow" at all on the users’ account pages, which obviously list the address of the site each user advertises. Obviously, though, such pages are difficult to reach; sometimes they are not even linked from the pages of the site itself, as webmasters often think of that problem. To work around this, the spammer has one very easy way: make use of the publicly available awstats (and other analysers) instances. You send requests to some website with the referrer set to the user page, enough times that it shows up in the top-ten referrer summary. The web crawlers will do the rest for you.
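If you do end up serving user pages, here is a minimal sketch of how the homepage link could be rendered defensively. The function name render_homepage_link is made up for illustration and does not come from any particular CMS. I am also assuming that only http and https links should be allowed.

```python
from html import escape
from urllib.parse import urlsplit


def render_homepage_link(url: str) -> str:
    """Return an HTML anchor for a user-supplied homepage, or '' if rejected.

    Hypothetical helper: the name and the http/https-only policy are
    assumptions made for this example.
    """
    url = url.strip()
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on some malformed URLs (e.g. bad IPv6).
        return ""
    # Only plain web links are accepted; this drops javascript:, data:, etc.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ""
    # Escape the value for both the attribute and the link text, so that
    # quotes or angle brackets in the URL cannot break out of the markup.
    safe = escape(url, quote=True)
    return f'<a href="{safe}" rel="nofollow">{safe}</a>'
```

This covers the two problems mentioned so far: the escaping deals with the injection side, and the rel="nofollow" attribute is always present. It does nothing, of course, against crawlers that ignore the attribute.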
But why use a user page rather than pointing directly at the website being spammed? It’s definitely a compromise: on one side, the indirection only works with the most vulnerable websites and the least sophisticated crawlers; on the other, even the least sophisticated anti-spam system can detect a handful of known-bad referrers (I have my own blacklist for this blog, and it helps reduce the bandwidth spent serving content to spam-bots), and you cannot add to that list general news, portal or blog sites that allow users to register, at least not easily. You could use a regular expression to reject only the requests coming from users’ pages, rather than from the whole site, which might legitimately link to your server, but that adds CPU cost to the mix.
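To make the regular expression idea concrete, here is a rough sketch of such a referrer check. It is not the blacklist I actually run: the host names, the profile path pattern and the function name is_spam_referrer are all placeholders invented for the example, and a real site would need patterns matching each target site's actual URL layout.

```python
import re
from urllib.parse import urlsplit

# Hypothetical data: hosts that are rejected outright.
BAD_HOSTS = {"spam.example"}

# Hypothetical data: legitimate sites where only user pages are rejected.
# The path layout is an assumption; every site structures profiles differently.
USER_PAGE_PATTERNS = {
    "portal.example": re.compile(r"^/(?:users?|profile|members?)/[^/]+/?$"),
}


def is_spam_referrer(referrer: str) -> bool:
    """Return True if the Referer value should be rejected."""
    if not referrer:
        return False
    try:
        parts = urlsplit(referrer)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed referrers are left to other checks in this sketch.
        return False
    if host in BAD_HOSTS:
        return True
    # The regular expression only runs for hosts known to have user pages,
    # which keeps the extra CPU cost limited to those requests.
    pattern = USER_PAGE_PATTERNS.get(host)
    return bool(pattern and pattern.match(parts.path))
```

With these placeholder values, a referrer of http://portal.example/user/bob is rejected, while http://portal.example/news/2010/article is let through, so the rest of the site can still link to you.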
That said, my suggestion is once again the same: don’t add user registration where it isn’t needed. Find alternative ways around it: allow comments without registration, use services like Disqus, support third-party authentication through Google or Facebook; just don’t make people create yet another user. And if you do, make sure you have a way to review possibly fraudulent users quickly and on a schedule, so that you don’t end up hosting trampolines for various kinds of spammers!
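As a very rough idea of what such a review could look like, the sketch below picks out the users whose homepage link changed after they were last reviewed. Everything here is hypothetical: the UserRecord fields and the users_needing_review function are invented for the example, and an actual application would take this data from its own user table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class UserRecord:
    # Hypothetical fields; a real application would map its own schema here.
    name: str
    homepage: str
    homepage_changed_at: datetime
    reviewed_at: Optional[datetime] = None


def users_needing_review(users: Iterable[UserRecord]) -> List[UserRecord]:
    """Users with a homepage link that was set or changed since their last review."""
    pending = [
        u for u in users
        if u.homepage
        and (u.reviewed_at is None or u.homepage_changed_at > u.reviewed_at)
    ]
    # Most recent changes first, so a quick look catches fresh spam.
    return sorted(pending, key=lambda u: u.homepage_changed_at, reverse=True)
```

Run from a scheduled job, something like this would catch the spammer who registers, waits a week, and only then changes the link.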