Where did the discussion move to?

The oldest post you’ll find on this blog is from nearly sixteen years ago, although it’s technically a “recovered” post that came from a very old Blogspot account I used when I was in high school. The actual blog that people started following is probably fourteen years old, dating to when Planet Gentoo started and I began writing about my development work there. While this is nowhere near as impressive as Scalzi’s, it’s still quite an achievement in 2020, when a lot of people appear to have moved to Medium posts or Twitter threads.

Sixteen years is an eternity in Internet terms, and that means the blog has gone through a number of different trends, from the silly quizzes to the first copy-and-paste list memes, from trackbacks to the anti-spam fights. But the one trend that has been steady over the past six years (or so) is the mistreatment of comments. I guess this went together with the whole trend of increasingly toxic comments, and the (not wrong) adage of “don’t read the comments”, but it’s something that saddened me before, and that saddens me today.

First of all, the lack of comments feels, to me, like a lack of engagement. While I don’t quite write with the intention of pleasing others, I used to have meaningful conversations with readers of the blog in the past — whether it was about correcting my misunderstanding of things I have no experience with, or asking follow-up questions that could become more blog posts for others to find.

Right now, while I know there are a few readers of the blog out there, it feels very impersonal. A few people might reply to the Tweet that linked to the new post, and maybe one or two might leave a comment on LinkedIn, but that’s usually where the engagement ends for me. Exceptions happen, including my more recent post on zero-waste, but even those are few and far between nowadays. And, not completely unexpectedly, I don’t think anyone is paying attention to the blog’s Facebook page.

It’s not just the big social media aggregators, such as Reddit and Hacker News, that cause me these annoyances. Websites like Boing Boing, which Wikipedia still calls a “group blog”, or Bored Panda, and all of their ilk, appear nowadays to mostly gather posts from other people and “reshare” them. On the brighter end of the spectrum, some of these sites at least appear to add their own commentary on the original content, but in many other cases I have seen them reposting the “eye-catching” part of the original content (photo, diagram, infographic, video) without the detailed explanations, and sometimes making it hard to even find the original credit.

You can imagine that it is not a complete coincidence that I’m complaining about this after having had to write a full-on commentary because Boing Boing used extremely alarmist tones around a piece of news that, in my view, should barely have been notable. Somehow it seems news around diabetes and glucometers has this effect on people — you may remember I was already annoyed when Hackaday was tipped about my project, and decided to bundle it with an (unsafe!) do-it-yourself glucometer project that got most of the comments on their own post.

I guess this ends up sounding a lot like an old man shouting at clouds — but I also still think that discussing ideas, posts, and opinions with their creators is worth doing, particularly if the creators have the open mind to listen to critiques of their mistakes — and, most importantly, the “capacitance” to send abuse away quickly. Because yeah, comments became toxic a long time ago, and I can’t blame those who prefer not to even bother with comments in the first place, much as I dislike it myself.

To conclude, if you have anything to discuss with me or suggest to me, please do get in touch. It’s actually a good feeling to know that people care.

Some of my thoughts on comments in general

One of the points that is the hardest for me to make when I talk to people about my blog is how important comments are for me. I don’t mean comments in source code as documentation, but comments on the posts themselves.

You may remember that one of the less appealing compromises I made when I moved to Hugo was agreeing to host the comments on Disqus. A few people complained when I did that, because Disqus is a vendor lock-in. That’s true in more ways than one may imagine.

It’s not just that you are tied into a platform that is difficult to move out of — it’s that there is no way to move out of it at all. Disqus does provide the ability to download a copy of all the comments from your site, but they don’t guarantee that it’s going to be available: if you have too many, they may just refuse to let you download them.

And even if you manage to download the comments, you’ll have a fun time trying to do anything useful with them: Disqus does not let you re-import them, say, into a different account, as they explicitly don’t allow that format to be imported. Nor does WordPress: when I moved my blog I had to hack up a script that takes the Disqus export format and a WRX dump of the blog (which is just a beefed-up RSS feed), and produces a third file, attaching the Disqus comments to the WRX as WordPress would have exported them. This was tricky, but it resolved the problem, and now all the comments are on the WordPress platform, allowing me to move them as needed.
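The details of my script are specific to my blog, but the general approach can be sketched like this: parse the comment export, match each comment thread to the corresponding `<item>` of the WRX dump by post URL, and graft the comments on as `wp:comment` elements. The sketch below is a minimal illustration with simplified, assumed element names, not the actual script; the real WXR format carries many more fields per comment (dates, author emails, approval status, and so on).

```python
import xml.etree.ElementTree as ET

# The wp: namespace is real WXR; the comment fields here are a
# deliberately reduced subset, for illustration only.
WP_NS = "http://wordpress.org/export/1.2/"
ET.register_namespace("wp", WP_NS)

def merge_comments(wrx_xml, comments_by_link):
    """Attach comments (keyed by the post's URL) to each <item> of a
    WRX/WXR dump, as <wp:comment> children, and return the new XML."""
    root = ET.fromstring(wrx_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        for i, c in enumerate(comments_by_link.get(link, []), start=1):
            el = ET.SubElement(item, f"{{{WP_NS}}}comment")
            ET.SubElement(el, f"{{{WP_NS}}}comment_id").text = str(i)
            ET.SubElement(el, f"{{{WP_NS}}}comment_author").text = c["author"]
            ET.SubElement(el, f"{{{WP_NS}}}comment_content").text = c["text"]
    return ET.tostring(root, encoding="unicode")
```

The nice property of going through the WRX file is that the output is exactly what WordPress itself would have exported, so the importer doesn’t need to know the comments ever lived elsewhere.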

Many people pointed out that there are at least a couple of open-source replacements for Disqus — but when I looked into them I was seriously afraid they wouldn’t really scale that well for my blog. Even WordPress itself appears sometimes not to know how to deal with a blog of more than 2,400 entries. The WRX file is, by itself, bigger than the maximum accepted by the native WordPress import tool — luckily, the Automattic service has higher limits.

One of the other advantages of having moved away from Disqus is that the comments render without needing any JavaScript or a third-party service, are searchable by search engines, and, most importantly, are preserved in the Internet Archive!

But Disqus is not the only thing that disappoints me. I have a personal dislike for the design, and business model, of Hacker News and Reddit. It may be a bit of a situation of “old man yells at cloud”, but I find that these two websites, much more than Facebook, LinkedIn and other social media, are designed to take the conversation away from the authors.

Let me explain with an example. When I posted about Telegram and IPv6 last year, the post was sent to Reddit, which I found out because I have a self-stalking recipe for IFTTT that informs me if any link to my sites gets posted there. And people commented on that — some missing the point and some providing useful information.
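The self-stalking part doesn’t need anything fancy, for what it’s worth: Reddit exposes a public JSON listing of submissions per domain, and a small script can pull the permalinks out of it. Here is a sketch (the listing shape is my reading of Reddit’s public JSON format; IFTTT does something equivalent behind the scenes):

```python
def links_to_my_site(listing, domain):
    """Given a Reddit JSON listing (e.g. fetched from
    https://www.reddit.com/domain/<domain>/new.json), return the
    permalinks of submissions pointing at that domain."""
    hits = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        if domain in post.get("url", ""):
            hits.append("https://www.reddit.com" + post.get("permalink", ""))
    return hits
```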

But if you read my blog post you won’t know about that at all, because the comments are locked into Reddit, and if Reddit were to disappear the day after tomorrow there would be no history of those comments at all. And this is without going into the issue of the “karma” going to the reposter (whom I know, in this case), rather than to the author — who is actually discouraged in most communities from submitting their own writings!

This applies in the same or a similar fashion to other websites, such as Hacker News, Slashdot, and… is Digg still around? I lost track.

I also find that moving the comments off-post makes people nastier: instead of asking questions and being ready to talk things through with the author, they assume the post exists in isolation, and that the author knows nothing of what they are talking about. And I’m sure that at least a good chunk of that is because they don’t expect the author to be reading them — they know full well they are “talking behind their back”.

I have had the pleasure to meet a lot of people on the Internet over time, mostly through comments on my or other blogs. I have learnt new things and been given suggestions, solutions, or simply new ideas of what to poke at. I treasure the comments and the conversation they foster. I hope that we’ll have more rather than fewer of them in the future.

Why HTTPS anyway?

You probably noticed that in the past six months I have had at least two bothersome incidents related to my use of HTTPS on the blog. An obvious question at this point would be why on earth I would care about making my blog (and my website) HTTPS only.

Well, first of all, the work I do for the blog usually matches fairly closely the work I do for xine’s Bugzilla, so it’s not really a doubling of the effort, and it actually allows me to test things out more safely than with a website that holds information of real value. In the case of the Bugzilla, there are email addresses and password hashes (hopefully properly salted; I trust Bugzilla for that, although I would have preferred OAuth 2 to avoid storing those credentials at all), and possibly security bugs reported with exploit information that should not be sent out in the clear.

My blog has much less than that; the only user is me, and while I do want to keep my password private, there is nothing that stops me from using a self-signed certificate just for the admin interface. And indeed I had that setup for a long while. But then I got a proper certificate and made HTTPS optionally available on my blog. Unfortunately, that made it terrible to deal with internal and external links to the blog, and with the loading of resources; sure, there were ways around it, but it was still quite a pain.

The other reason is simply to cover for people who leave comments. Most people connecting through open networks, such as at a Starbucks, will have their traffic easily sniffable, as no WPA is in use (and I’ve actually seen “secure” networks using WEP, alas), and I could see how people would prefer not to post their email in comments. And back last year I was pushing hard for Flattr (I no longer do), and I was trying to remove reasons not to use your email when commenting, so HTTPS protection was an interesting point to make.

Nowadays I have stopped pushing for Flattr, but I still include Gravatar integration, and I like having a way to contact the people who comment on my blog, especially as they make points that I want to explore more properly, so I feel it’s my duty to protect their comments in transit by using HTTPS at the very least.

Flattr for comments

You probably know already that my blog is using Flattr for micro-donations, both to the blog as a whole and to the individual articles posted here. For those who don’t know, Flattr is a micro-donation platform that splits a monthly budget into equal parts to share among your content creators of choice.

I’ve been using, and musing about, Flattr for a while, and sometimes I’ve ranted a little about how things have been moving in their camp. One of the biggest problems with the service is its relatively scarce adoption. I’ve got a ton of “pending flattrs” as described on their blog, mostly for Twitter and Flickr users.

Driving up adoption of the service is key for it to be useful for both content creators and consumers: the former can only get something out of the system if their content is liked by enough people, and the latter will only care about adding money to the system if they find great content to donate to. Or if they use Socialvest to get the money back while they spend it somewhere else.

So last night I did my part in trying to increase the usefulness of Flattr: I added it to the comments of my blog. If you leave a comment and fill in the email field, that email will be used, hashed, to create a new “thing” on Flattr, whether you’re already registered or not — if you’re not registered, the thing will be kept pending until you register and associate the email address. This is not much different from what I’ve been doing already with Gravatar, which uses the same method (the hashed email address).
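For reference, the hashing method is the one Gravatar documents: MD5 of the trimmed, lowercased email address. The comment “things” here are keyed the same way, so a minimal sketch of the identifier looks like this:

```python
import hashlib

def email_hash(email):
    """Gravatar-style identifier: MD5 hex digest of the trimmed,
    lowercased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
```

Because the hash is deterministic, the same address always maps to the same thing, whether or not the commenter has a Flattr account yet.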

Even though the parameters needed to integrate Flattr for comments are described in the partnership interface, there doesn’t seem to be a need to be registered as a partner – indeed, you can see in the pages’ sources that there is no revenue key present – and, assuming you are already loading the Flattr script for your articles’ buttons, all you have to add to the comment template is the following code (this is for Typo; other languages and engines will differ slightly, of course!):

<% if comment.email != "" -%>

<% end -%>

Update (2017-07-20): No I’m not sure where the code ended up for this one, sorry :(

So if I’m not making money with the partner site idea, why am I bothering with these extra buttons? Well, I have often had people help me out a lot in comments, pointing out obvious mistakes I made or things I missed… and I’d like to be able to easily thank the commenters when they do… and now I can. Also, since this requires a valid email field, I hope more people will fill it in, so that I can contact them if I want to ask or tell them something in private (sometimes I have wished to contact people who didn’t leave any easy way to reach them).

At any rate, I encourage you all to read the comments on the posts, and Flattr those you find important, interesting or useful. Think of it like a +1 or a “Like”. And of course, if you’re not subscribed to Flattr, do so! You never know who might like what you have posted!

Why moderated comments can be a problem

You might know already that I don’t like moderating comments; I did it for a long time because of spam, but nowadays I prefer filtering comments out with mod_security, based on User-Agent and other diagnostics. One of the reasons why I don’t like moderated comments is that, oftentimes, comments can correct a wrong blog post and keep it from being extremely bad.

I don’t pretend to be extremely good at what I do, or to never make mistakes; I’m sure I make them, but if I say something very stupid, usually somebody corrects me in the comments, and the post still keeps some value. When I see posts about people reinventing the wheel, and making it hexagonal for some reason (like reinventing grep --include -r with a script using find and a for loop), and find out that the comments are moderated, I’m usually appalled. First, because lots of users who don’t know better will read the post and apply what it says without thinking twice. Second, because among the comments, which later appeared in a batch all at once, beside a number of duplicate suggestions, there were even more wheels in a variety of polygons, but just a couple of really round ones. You probably know what I’m referring to if you follow some of the planets out there; I don’t really want to name names here.

Today brought another example of moderated comments hindering the clean-up of a blog post that isn’t quite right. When you rant about a piece of software or a feature, especially when you explicitly say you don’t understand why it does what it does, leaving comments open allows people to actually solve the mystery for you; if you moderate them, you’re probably wasting the time of more than one person who has the answer, since they’ll each probably try to explain it when they see no comments present already.


Yes, again spam filtering

You might remember that I reported success with my filters using the User-Agent value as reported by clients; unfortunately, it seems I was really speaking way too soon. While the amount of spam I had to manually remove from the blog decreased tremendously, which allowed me to disable both the 45-day limit on commenting and comment moderation, it still didn’t cut it, and it caused a few false positives.

The main problem is that the filter on HTTP/1.0 behaviour was hitting almost anybody who tried to comment through a proxied connection: the default squid configuration doesn’t use HTTP/1.1 and so downgrades everything to 1.0; thanks to binki and moesasji I was able to track down the issue, and now my ruleset (which I’m going to attach at the end of the post) checks for the Via header to identify proxies. Unfortunately, the result is that now I get much more spam; indeed, lots and lots of comment spam comes through open proxies, which is far from uncommon.
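Reduced to plain logic, the rule I ended up with amounts to the following. This is a Python mirror of the mod_security behaviour, for illustration only; the actual ruleset is attached at the end of the post:

```python
def looks_like_spambot(protocol, headers):
    """An HTTP/1.0 comment POST is suspicious only when it did not
    come through a proxy (no Via header) and the browser is not Lynx,
    which legitimately still speaks HTTP/1.0."""
    ua = headers.get("User-Agent", "").lower()
    via_proxy = "Via" in headers
    return protocol.lower() == "http/1.0" and not via_proxy and "lynx" not in ua
```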

I guess one option would be to use the SORBS DNSBL blacklists to filter out known open proxies; unfortunately, either I misconfigured the dnsbl lookup module for Apache (which I had hoped was already working) or the proxies I’m receiving spam from are not listed there at all. I was also told that mod_security can handle the lookup itself, which is probably good, since I can then limit the open-proxy lookups to the cases where a proxy is actually used.

I was also told to look at the rules from Got Root, which also include some User-Agent-based filtering; I haven’t done so yet, though, because I’m starting to get worried: my rules already execute a number of regular-expression matches on the User-Agent header, and I’m trying my best to make sure the expressions are generic enough without being too broad. On the other hand, Got Root’s rules seem to provide a straight match of a series of user agents, which means lots and lots of added checks; the rules also seem to be either absolute (for any requested URL) or specific to WordPress-based blogs, which means I’d have to adapt or tinker with them, since I’m currently limiting the antispam measures through the use of Apache’s Location block (previously LocationMatch, but the new Typo version uses a single URL for all comment posting).

What I’d like to see is some kind of Apache module that can match a User-Agent against a known list of bad User-Agents, as well as a list of regular expressions, compiled into some kind of bytecode, to be much, much faster than the “manual” parsing that is done now. Unfortunately, I have neither the time nor the expertise with Apache to take care of that myself, which means either someone else does it, or I’m going to stick with mod_security for a while longer.
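Absent such a module, the cheap approximation is to collapse the straight-match list and the regular expressions into a single compiled alternation, so matching costs one scan of the header instead of one check per rule; most regex engines compile that into something close to the bytecode matcher I’m wishing for. A sketch, with example agent strings:

```python
import re

def compile_blacklist(exact_agents, patterns):
    """Collapse a list of known-bad User-Agents (matched literally)
    plus a handful of regular expressions into one compiled pattern."""
    alternatives = [re.escape(a) for a in exact_agents] + list(patterns)
    return re.compile("|".join(alternatives), re.IGNORECASE)

blacklist = compile_blacklist(
    ["libwen-us"],                       # straight matches, Got Root style
    [r"opera[/ ][0-8]", r"msie [1-5]"],  # looser version checks
)
```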

Anyway here’s the beef!

SecDefaultAction "pass,phase:2,t:lowercase"

# Ignore get requests since they cannot post comments.
SecRule REQUEST_METHOD "^get$" "pass,nolog"

# 2009-02-27: Kill comments where there is no User-Agent at all; I
# don't care if people like to be "anonymous" in the net, but the
# whole thing about anonymous browsers is pointless.
SecRule REQUEST_HEADERS:User-Agent "^$" \
    "log,msg:'Empty User-Agent when posting comments.',deny,status:403"

# We cannot match a _missing_ header directly with
# REQUEST_HEADERS:User-Agent, so count the User-Agent headers instead
# and reject the request when there are none at all.
SecRule &REQUEST_HEADERS:User-Agent "@eq 0" \
    "log,msg:'Missing User-Agent header when posting comments.',deny,status:403"

# Check if the comment arrived through a proxy; if that's the case we
# cannot rely on the HTTP version that is provided, because it's not
# the one of the actual browser. We can, though, check it against an
# open proxy blacklist.
SecRule REQUEST_HEADERS:Via "." \
    "setvar:tx.flameeyes_via_proxy=1,log,msg:'Commenting via proxy'"

# If we're not going through a proxy, and it's not lynx, and yet we
# have an HTTP/1.0 comment request, then it's likely a spambot with a
# fake user agent.
# Note the order of the rules is explicitly set this way so that the
# majority of requests from HTTP/1.1 browsers (legit) are ignored
# right away; then all the requests from proxies, then lynx.
SecRule REQUEST_PROTOCOL "!^http/1.1$" \
    "log,msg:'HTTP/1.0 request from a supposedly modern browser, posting spam comments.',deny,status:403,chain"
SecRule TX:FLAMEEYES_VIA_PROXY "!1" "chain"
SecRule REQUEST_HEADERS:User-Agent "!lynx"

# Ignore very old Mozilla versions (not modern browsers, often never
# exiting) and pre-2 versions of Firefox.
# Also ignore comments coming from IE 5 or earlier since we don't care
# about such old browsers. Note that Yahoo feed fetcher reports itself
# as MSIE 5.5 for no good reason, but we don't care since it cannot
# _post_ comments anyway.
# 2009-02-27: Very old Gecko versions should not be tolerated; the
# 2007-2009 period gets a grace period for now.
# 2009-03-01: Ancient Opera versions usually posting spam comments.
# 2009-04-22: Some spammers seem to send requests with "Opera "
# instead of "Opera/", so list that as an option.
SecRule REQUEST_HEADERS:User-Agent "(mozilla/[0123]|firefox/[01]|gecko/200[0123456]|msie ([12345]|7\.0[ab])|opera[/ ][012345678])" \
    "log,msg:'User-Agent too old to be true, posting spam comments.',deny,status:403"

# The Mozilla/4.x and /5.x agents have 0 as minor version, nothing
# else.
SecRule REQUEST_HEADERS:User-Agent "(mozilla/[45]\.[1-9])" \
    "log,msg:'User-Agent sounds fake, posting spam comments.',deny,status:403"

# Malware and spyware that advertises itself on the User-Agent string,
# since a lot of spam comments seem to come out of browsers like that,
# make sure we don't accept their comments.
SecRule REQUEST_HEADERS:User-Agent "(funwebproducts|myie2|maxthon)" \
    "log,msg:'User-Agent contains spyware/adware references, posting spam comments.',deny,status:403"

# Bots usually provide an http:// address to look up their
# description, but those don't usually post comments. Consider any
# comment coming from a similar User-Agent as spam.
SecRule REQUEST_HEADERS:User-Agent "http://" \
    "log,msg:'User-Agent spamming URLs, posting spam comments.',deny,status:403"

# Some spambots URL-encode the User-Agent, so the spaces turn into
# "+" symbols; no real browser does that.
SecRule REQUEST_HEADERS:User-Agent "^mozilla/4\.0\+" \
    "log,msg:'Spaces converted to + symbols, posting spam comments.',deny,status:403"

# We expect Windows XP users to upgrade at least to IE7. Or use
# Firefox (even better) or Safari, or Opera, ...
# All the comments coming from the old default OS browser have a high
# chance of being spam, so reject them.
# 2009-04-22: Note that we shouldn't check for 5.0 and 6.0 NT versions
# specifically, since Server and x64 editions can have different minor
# versions.
SecRule REQUEST_HEADERS:User-Agent "msie 6\.0;( .+;)? windows nt [56]\." \
    "log,msg:'IE6 on Windows XP or Vista, posting spam comments.',deny,status:403"

# List of user agents only ever used by spammers
# 2009-04-22: the "Windows XP" declaration is never used by official
# MSIE agent strings, it uses "Windows NT 5.0" instead, so if you find
# it, just kill it.
SecRule REQUEST_HEADERS:User-Agent "(libwen-us|msie .+; .*windows xp)" \
    "log,msg:'Confirmed spam User-Agent posting spam comments.',deny,status:403"

Spam attacks

I have in my TODO list (always expected to happen, though I have no idea when) to update the mod_security rules that I posted some time ago. While the ones I posted mostly work, I had to add one more exception on the HTTP/1.0 posting (Opera, in some configurations), and I’ve added a few more blacklists for known spamming User-Agents (Project Honeypot seems quite useful for double-checking those, and it’s why you’ll actually find Project Honeypot-induced hidden links in my blog; another item in my TODO list is adding this to the xine Bugzilla too).

With the filtering on, I had only one person reporting false positives (moesasji), and, from time to time, some posts with spam passing through mod_sec and hitting the Typo anti-spam measure (which is not perfect, but can deal with the lower rate of spam that I receive now). Today, though, I found a strangely large hit of spam. Note that by my new standards, “strangely large hit” means nine spam comments on three posts. So I just executed the usual script to get the new data from the access log on the server, and it started to get interesting.

The one post that stood out from the rest, despite carrying the absolutely usual spam comment, reported the Opera for Wii browser as its user agent. It’s a first for me, in both spam and non-spam, with that user agent. I do use the PSP browser from time to time, and I tried blogging from the PlayStation 3, but at this point I don’t doubt the User-Agent header is being forged, because I can’t see someone easily hijacking a Wii to post spam comments around.

The remaining posts are much more interesting. First of all, they come with no User-Agent header at all, which means that I forgot to ban that particular case with mod_sec (just checking that it matches ^$ does not work, probably because that expects an empty User-Agent: header rather than no header at all), and I’ll have to fix that in a moment. But there is one other interesting issue, which wouldn’t have been that interesting if I didn’t read Planet Debian almost daily.

The other day I read (and shared with Google Reader) a post by Steve Kemp about how spammers don’t know the syntax of your site, and will try to link their website with different methods all at once. In particular, he notes that his anti-spam comment service now takes care of identifying that too (which reminds me that I have to search for, or write, a Typo plugin to run the same check on comments — again in my TODO list).

How does that make the spam I received today interesting? Well, instead of one spam comment with three different link methods, different IPs in the same C-class posted four comments on the same article, with the usual “Very nice site” text: one without a link, and three with the three different link methods. A quite nice way to avoid the detection Kemp reported. Which brings me to the final question of the post: are spammers monitoring us? Or is it just strange luck that, as soon as Kemp found a mostly false-positive-free rule to identify spam, they started working around it?
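Spotting this pattern in the access log is straightforward, for what it’s worth: bucket the comments by the /24 (the old C-class) of the posting IP, and look for suspiciously full buckets on a single post. A quick sketch, with made-up addresses:

```python
from collections import defaultdict

def group_by_c_class(comments):
    """Bucket (ip, text) comment pairs by the /24 network of their
    source IPv4 address; several 'different' commenters landing in one
    bucket on the same post is the pattern described above."""
    buckets = defaultdict(list)
    for ip, text in comments:
        prefix = ".".join(ip.split(".")[:3])  # e.g. "203.0.113"
        buckets[prefix].append(text)
    return buckets
```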

At any rate, please remember to disable browser anonymisers when you want to post comments on my blog. I don’t like them, and you have no reason to use them here, since I’m not an evildoer who registers users’ browser preferences — I just use that information to avoid filling the net with spam.

Service announcement about disabled comments

Just a service announcement post for the few users following my blog. As the GMN often reports blog posts of mine that are more than a month old, I decided to make some changes to how comments get disabled.

Up to today, posts older than 30 days had their comments section disabled; this is because after a while most of the comments arriving on them are just simple spam, and while I’m forced to premoderate comments anyway (too much spam otherwise), I’d rather not have more of it to remove.

Now I have moved the limit to 90 days, so there should be enough time to comment even when the GMN posts a link a month later.
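The check itself is nothing more than a date comparison; as a sketch (an illustration of the rule, not Typo’s actual implementation):

```python
from datetime import datetime, timedelta

def comments_open(published, now, window_days=90):
    """A post accepts comments while it is at most window_days old;
    applying this predicate to every post makes the change retroactive."""
    return now - published <= timedelta(days=window_days)
```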

If you wanted to comment on a not-so-recent post of mine and found the comments disabled, you might want to take another look now, as the change is retroactive.

If I don’t see an exponential increase in spam, I might just as well leave comments enabled on all posts even after 90 days.. there’s time to decide that ;)

Just a note about comment moderation

This is not really Gentoo-related, just a service notice for those who wonder, as I was asked about this in private. I do moderate comments, and I don’t accept all of them. In particular, for a few months now I have automatically deleted some people’s posts from the queue without even looking at the comment.

This blacklist (which is actually just a PostgreSQL command; I do it the hard way because Typo does not support a proper blacklist) does not include Ciaran, as you can guess if you read my comments. He sometimes has valid technical points, so I usually moderate him by hand – he also seems to have a high consideration of either me or the people who actually follow my blog, given how much he comments. I tend to limit the possible recursiveness of his arguments, though, so you’ll usually find just one post from him unless he makes a new point I find valid.

Yes, this is not a democratic forum; I never said it was. But unless you have contributed nothing but pointless talk for long enough to drive me off, I usually tend to allow people to disagree with me.

As for what Ferdy said, my post wasn’t so much a rebuttal to him as it was to others; I think those I wanted to understand did so already. But as I said, I admit my mistake: I didn’t check the logs. I know now it was a bad idea, but I didn’t. And while this is indeed a mistake, I don’t consider myself to have ever voted for the meeting on the 15th of May; I voted to reschedule, that’s right, but I didn’t even read any date.

So my point still stands: the meeting wasn’t agreed upon. If the majority of the council agrees that the rule still applies, that’s fine by me, but again I’ll propose that the next council, as the first task on their list, clarify the rules on meetings (as Paul said): when they can be considered agreed upon, and when the rule should apply.