In my previous post on the matter, I called for a boycott of Semalt by blocking access to your servers from their crawler, after a very bad-looking exchange on Twitter with a supposed representative of theirs.
After I posted that, the same representative threatened to sue me for libel, even though that post was about documenting their current practices rather than shaming them. This drew enough attention from other people who have been following the Semalt situation that I could gather some more information on the matter.
In particular, there are two interesting blog posts by Joram van den Boezen about the company and its tactics. Turns out that what I thought was a very strange private cloud setup (coming as it was from Malaysia) was actually a botnet. Indeed, what appears from Joram’s investigations is that the people behind Semalt use sidecar malware both to gather URLs to crawl and to crawl them. And this, according to their hosting provider, is allowed because they make it clear in their software’s license.
This is consistent with what I have seen of Semalt on my server: rather than my blog, which fares pretty well on the web as a source of information, I found them requesting my website, which is almost dead. Looking at all the websites on all my servers, the only other one affected is my friend’s, which is not really an important one by any measure. But if we start by accepting Joram’s findings (and I have no reason not to), then I can see how that can happen.
My friend’s website is visited mostly by the people in the area we grew up in, and general friends of his. I know how bad their computers can be, as I have been doing tech support on them for years, and paid my bills that way. Computers that were bought either without a Windows license or with Windows Vista, that got XP installed on them so badly that they couldn’t get updates even when they were available. Windows 7 updates that were done without actually possessing a license, and so on and so forth. I have, at some point, added a ModRewrite-based warning for a few known viruses that would alter the Internet Explorer User-Agent field.
Add to this that even those who shouldn’t be strapped for cash want to avoid paying for anything if they can, and you can see why software such as SoundFrost, and other similar “tools” to download YouTube videos into music files, would be quite likely to be found on computers that end up browsing my friend’s site.
What is still not clear from all this information is why they are doing it. As I said in my previous post, there is no reason to abuse the referrer field other than to spam the statistics of the websites. Since the company is selling SEO services, one assumes that they do so to attract more customers. After all, if you spend time checking your Analytics output, you probably are the target audience of SEO services.
But after that, there are still questions that have no answer. How can that company do any analytics when they don’t really seem to have any infrastructure but rather use botnets for finding and accessing websites? Do they only make money with their subscriptions? And here is where things can get tricky, because I can only hypothesize and speculate, words that are dangerous to begin with.
What I can tell you is that out there, many people have no scruples, and I’m not referring to Semalt here. When I tried to raise awareness about them on Reddit (a site that I don’t generally like, but that can be put to good use sometimes), I stopped by the subreddit to get an idea of what kind of people would be around there. It was not what I was expecting, not at all. Indeed, what I found is that there are people out there seriously considering using black hat SEO services. Again, this is speculation, but my assumption is that these are consultants who basically want to show their clients that their services are worth it by inflating the access statistics of the websites.
So either these consultants just buy the services from companies like Semalt, or even the final site owners don’t understand that a company promising “more accesses” does not really mean “more people actually looking at your website and considering your services”. It’s hard for people who don’t understand the technology to discern between “accesses” and “eyeballs”. It’s not much different from the fake Twitter followers studied by Barracuda Labs a couple of years ago. I know I read a more thorough study of one of the websites selling this kind of service, but I can’t find it. That’s why I usually keep that stuff on Readability.
So once again, give some antibiotics to the network, and help cure the web of people like Semalt and the people who would buy their services.
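If you want to check whether your own servers are being hit, the referrer field is the place to look. What follows is only a minimal sketch, assuming your web server writes the common "combined" log format; the function name `spam_referrer_ips`, the `semalt` needle and the sample log lines (with documentation-range IPs and a made-up `semalt.example` referrer) are all my own placeholders, not anything taken from real logs.

```python
import re
from collections import Counter

# Combined log format (assumed):
# ip ident user [time] "request" status size "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def spam_referrer_ips(lines, needle="semalt"):
    """Count requests per client IP whose referrer contains `needle`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not look like combined-format entries are skipped.
        if match and needle in match.group("referrer").lower():
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample entries, for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2015:00:00:01 +0000] "GET / HTTP/1.1" '
    '200 512 "http://semalt.example/" "Mozilla/5.0"',
    '198.51.100.2 - - [01/Jan/2015:00:00:02 +0000] "GET /blog HTTP/1.1" '
    '200 1024 "https://www.google.com/" "Mozilla/5.0"',
]

hits = spam_referrer_ips(sample)
for ip, count in hits.most_common():
    print(ip, count)
```

On a real server you would pass an open log file instead of `sample`. Keep in mind that, if this really is a botnet, the addresses you collect belong to infected home computers rather than to the company itself, so treat the resulting list with care.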
After reading your article and checking the web server logs of the machines I administer, I decided to add all matching IPs to DroneBL as type 6 (unknown spambot) for “abusive behaviour”.