Some of my thoughts on comments in general

One of the hardest points for me to get across when I talk to people about my blog is how important comments are to me. I don’t mean comments in source code as documentation, but comments on the posts themselves.

You may remember that one of the less appealing compromises I made when I moved to Hugo was accepting to host the comments on Disqus. A few people complained when I did that because Disqus is a vendor lock-in. That’s true in more ways than one may imagine.

It’s not just that you are tied into a platform that is difficult to move out of; as things stand, there is no way to move out of it at all. Disqus does provide the ability to download a copy of all the comments from your site, but they don’t guarantee that it will be available: if you have too many, they may just refuse to let you download them.

And even if you manage to download the comments, you’ll have a fun time trying to do anything useful with them: Disqus does not let you re-import them, say into a different account, as they explicitly don’t allow that format to be imported. Nor does WordPress: when I moved my blog I had to hack up a script that took the Disqus export format and a WRX dump of the blog (which is just a beefed up RSS feed), and produced a third file, attaching the Disqus comments to the WRX as WordPress would have exported them. This was tricky, but it resolved the problem, and now all the comments are on the WordPress platform, allowing me to move them as needed.
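Roughly, the script did something along these lines. This is a simplified sketch rather than the actual script I used, and the element and namespace names are from memory, so check them against your own export files before relying on it:

```python
#!/usr/bin/env python3
# Minimal sketch: attach comments from a Disqus XML export to the matching
# items of a WordPress export file, keyed by post URL. Element and namespace
# names are assumptions; verify them against your own exports.
import xml.etree.ElementTree as ET

DSQ = "{http://disqus.com}"
DSQI = "{http://disqus.com/disqus-internals}"
WP = "{http://wordpress.org/export/1.2/}"

def load_disqus(path):
    """Map post URL -> list of (author, email, date, message)."""
    root = ET.parse(path).getroot()
    threads = {t.get(DSQI + "id"): t.findtext(DSQ + "link")
               for t in root.findall(DSQ + "thread")}
    comments = {}
    for post in root.findall(DSQ + "post"):
        url = threads.get(post.find(DSQ + "thread").get(DSQI + "id"))
        comments.setdefault(url, []).append((
            post.findtext(DSQ + "author/" + DSQ + "name"),
            post.findtext(DSQ + "author/" + DSQ + "email"),
            post.findtext(DSQ + "createdAt"),
            post.findtext(DSQ + "message"),
        ))
    return comments

def merge(wxr_path, disqus_path, out_path):
    ET.register_namespace("wp", "http://wordpress.org/export/1.2/")
    tree = ET.parse(wxr_path)
    comments = load_disqus(disqus_path)
    for item in tree.getroot().iter("item"):
        for i, (author, email, date, message) in enumerate(
                comments.get(item.findtext("link"), []), start=1):
            c = ET.SubElement(item, WP + "comment")
            ET.SubElement(c, WP + "comment_id").text = str(i)
            ET.SubElement(c, WP + "comment_author").text = author
            ET.SubElement(c, WP + "comment_author_email").text = email
            ET.SubElement(c, WP + "comment_date").text = date
            ET.SubElement(c, WP + "comment_content").text = message
            ET.SubElement(c, WP + "comment_approved").text = "1"
    tree.write(out_path, xml_declaration=True, encoding="UTF-8")

if __name__ == "__main__":
    merge("blog.wrx", "disqus-export.xml", "blog-with-comments.wrx")
```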

Many people pointed out that there are at least a couple of open-source replacements for Disqus, but when I looked into them I was seriously afraid they wouldn’t really scale that well for my blog. Even WordPress itself appears sometimes not to know how to deal with a blog of more than 2400 entries. The WRX file is, by itself, bigger than the maximum accepted by the native WordPress import tool; luckily the Automattic service has higher limits.

One of the other advantages of having moved away from Disqus is that the comments now render without needing any JavaScript or a third-party service, which makes them searchable by search engines and, most importantly, preserves them in the Internet Archive!

But Disqus is not the only thing that disappoints me. I have a personal dislike for the design, and business model, of Hacker News and Reddit. It may be a bit of a situation of “old man yells at cloud”, but I find that these two websites, much more than Facebook, LinkedIn and other social media, are designed to take the conversation away from the authors.

Let me explain with an example. When I posted about Telegram and IPv6 last year, the post was sent to Reddit, which I found out because I have a self-stalking recipe for IFTTT that informs me if any link to my sites gets posted there. And people commented on that, some missing the point and some providing useful information.

But if you read my blog post you won’t know about that at all, because the comments are locked into Reddit, and if Reddit were to disappear the day after tomorrow there would be no history of those comments at all. And this is without going into the issue of the “karma” going to the reposter (whom I know, in this case), rather than the author, who is actually discouraged in most communities from submitting their own writings!

This applies in the same or similar fashion to other websites, such as Hacker News, Slashdot, and… is Digg still around? I lost track.

I also find that moving the comments off-post makes people nastier: instead of asking questions and being ready to understand and talk things through with the author, they assume the post exists in isolation, and that the author knows nothing of what they are talking about. And I’m sure that at least a good chunk of that is because they don’t expect the author to be reading them: they know full well they are “talking behind their back”.

I have had the pleasure to meet a lot of people on the Internet over time, mostly through comments on my or other blogs. I have learnt new things and been given suggestions, solutions, or simply new ideas of what to poke at. I treasure the comments and the conversation they foster. I hope that we’ll have more rather than fewer of them in the future.

How blogging changed in the past ten years

One of the problems that keeps poking back at me every time I look for alternative software for this blog is that it has somehow become not your average blog, particularly not in 2017.

The first issue is that there is a lot of history. While the current “incarnation” of the blog, with the Hugo install, is fairly recent, I have been porting over a long history of my pseudo-writing, merging back into this one big collection the blog posts coming from my original Gentoo Developer blog, as well as the few posts I wrote on the KDE Developers blog and a very minimal amount of content from my (mostly Italian) blog when I was in high school.

Why did I do it that way? Well, the main thing is that I don’t want to lose the memories. As some of you might know already, I have faced my mortality before, and I came to realize that this blog is probably the only thing of substance that I had a hand in that will outlive me. And so I don’t want to let migrations, service turndowns, and other similar changes take away what I did. This is also why I republished on this blog the articles I wrote for other websites, namely NewsForge and Linux.com (back when they were part of Geeknet).

Some of the recovery work actually required effort. As I said above, there is a minimal amount of content that comes from my high school days’ blog. And it’s in Italian, which does not make it particularly interesting or useful. I had deleted that blog altogether years and years ago, so I had to use the Wayback Machine to recover at least some of the posts. I will be going through all my old backups in the hope of finding that one last backup that I remember making before tearing the thing down.
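For the curious, the Wayback Machine has an “availability” API that makes this kind of recovery scriptable; a rough sketch of what I mean, with an obviously made-up post URL:

```python
# Rough sketch: query the Wayback Machine "availability" API for the closest
# archived snapshot of each old post URL. The URL in the list is made up.
import json
import urllib.parse
import urllib.request

def closest_snapshot(url, timestamp="2004"):
    """Return the URL of the closest archived snapshot, or None."""
    api = ("https://archive.org/wayback/available?url="
           + urllib.parse.quote(url, safe="") + "&timestamp=" + timestamp)
    with urllib.request.urlopen(api) as response:
        data = json.load(response)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot else None

for post in ["http://example.com/old-blog/2004/some-post.html"]:
    print(post, "->", closest_snapshot(post))
```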

Why did I tear it down in the first place? It was clearly a teenager’s blog and I am seriously embarrassed by the way I thought and wrote. It was 13 or 14 years ago, and I admitted last year just how many times I’ve been wrong. But this is not the change I want to talk about.

The change I want to talk about is the second issue with finding good software to run my blog: blogging is not what it used to be ten years ago. Or fifteen years ago. It’s not just that a lot of money got involved in the meantime, so that now there is a significant amount of “corporate blogs”, which end up being either product announcements in a different form or another outlet for not-quite-magazine content. I know of at least a couple of Italian newspapers that provide “blogs” for their writers, which look almost exactly like the paper’s website but do not have to be reviewed by the editorial board.

In addition to this, a lot of people’s blogs stopped providing as many details of their personal lives as they used to. Likely, this is related to the fact that we now know just how nasty people on the Internet can be (read: just as nasty as people off the Internet), and a lot of the people who used to write lightheartedly don’t feel as safe any more, and rightly so. But there is probably another reason: “Social Media”.

The advent of Twitter and Facebook meant there is less need to post short personal entries, too. And Facebook in particular appears to have swallowed most of the “cutesy memes” such as quizzes and lists of things people have or have not done. I know there are still a few people who insist on not using these big-name social networks, and still post for their friends and family on blogs, but I have a feeling they are quite the minority. And I can tell you for sure that since I signed up for Facebook, a lot of my smaller “so here’s that” posts went away.

Distribution chart of blog post sizes over time

This is a bit of a rough plot of blog post sizes. In particular I have used the raw file size of the markdown sources used by Hugo, in bytes, which makes it not perfect for Unicode symbols, and it includes the “front matter”, which means that, in particular, all the non-Hugo-native posts have their title effectively doubled by the slug. But it shows trends particularly well.
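If you want to reproduce something similar, a rough sketch along these lines works against a Hugo content tree; the content path and the TOML front matter with a date key are assumptions about my layout, not a general rule:

```python
# Rough sketch: take each markdown source's size in bytes and its front matter
# date, then scatter-plot them over time. Content path and front matter format
# (TOML with a "date" key) are assumptions about this particular blog.
import os
import re
from datetime import datetime
import matplotlib.pyplot as plt

dates, sizes = [], []
for root, _dirs, files in os.walk("content/post"):
    for name in files:
        if not name.endswith(".md"):
            continue
        path = os.path.join(root, name)
        with open(path, encoding="utf-8") as source:
            text = source.read()
        # Very naive front matter parsing: look for a date = "..." line.
        match = re.search(r'^date\s*=\s*"?(\d{4}-\d{2}-\d{2})', text, re.M)
        if not match:
            continue
        dates.append(datetime.strptime(match.group(1), "%Y-%m-%d"))
        sizes.append(os.path.getsize(path))

plt.scatter(dates, sizes, s=4)
plt.xlabel("Publication date")
plt.ylabel("Raw markdown size (bytes)")
plt.title("Distribution of blog post sizes over time")
plt.savefig("post-sizes.png")
```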

You can see from that graph that some time around 2009 I almost entirely stopped writing short blog posts. That is around the time Facebook took off in Italy, and a lot of my interaction with friends started happening there. If you’re curious about the visible lack of posts around the middle of 2007, that was the pancreatitis that had me disappear for nearly two months.

With this reduction in the scope of what people actually write on blogs, I also have a feeling that lots of people were left without anything to say. A number of blogs I still follow (via NewsBlur, since Google Reader was shut down) post once or twice a year. Planets are still a thing, and I still subscribe to a number of them, but I realize I don’t even recognize half the names nowadays. Lots of the “old guard” stopped blogging almost entirely, possibly because of a lack of engagement, or simply because, like me, many found a full-time job (or a full-time family) that takes most of their time.

You can definitely see from the plot that even my own blogging has significantly slowed down over the past few years. Part of it was the tooling giving up on me a few times, but it also involves the lack of energy to write all the time as I used to. Plus there is another problem: I now feel I need to be more accurate in what I’m saying and in the words I’m using. This is in part because I grew up, and know how much words can hurt people even when meant the right way, but also because it turns out when you put yourself in certain positions it’s too easy to attack you (been there, done that).

A number of people argue that it was the demise of Google Reader1 that caused blogs to die, but as I said above, I think it’s just the evolution of the concept veering towards other systems that turned out to be more easily reachable by users.

So are blogs dead? I don’t think so. But they are getting harder to discover, because people use other platforms and it gets difficult to follow all of them. Hacker News and Reddit are becoming many geeks’ default way to discover content, and that has the unfortunate side effect that less of the conversation happens in a shared medium. I am indeed bothered by those people who prefer discussing the merits of my posts on those external websites rather than actually engaging in the comments, if nothing else because I do not track those platforms, and so the feeling I get is of people talking behind my back. I would prefer it if people actually told me when they share my posts on those platforms; for Reddit I can at least use IFTTT to self-stalk the blog, but that’s a different problem.

Will we still have blogs in 10 years? Probably yes, but most likely they will not look like the ones we’re used to. The same way as nowadays there still are personal homepages, but they clearly don’t look like Geocities, and there are social media pages that do not look like MySpace.


  1. Usual disclaimer: I do work for Google at the time of writing this, but these are all personal opinions that have no involvement from the company. For reference, I signed the contract before the Google Reader shutdown announcement, but started after it. I was also sad, but I found NewsBlur a better replacement anyway.

IPv6: WordPress has a long way to go, too

I recently complained about Hugo and the fact that it seems like its development was taken over by SEO types who changed its defaults to something I’m not comfortable with. In the comments to that post I let it be understood that I’ve been looking into WordPress as an alternative once again.

The reason why I’m looking into WordPress is that I expect it to be a much easier setup, and (assuming I don’t go crazy with the plugins) an easily upgradeable install. Jürgen told me that they now support Markdown, and of course moving to WordPress means I don’t need to keep using Disqus for comments, and I can own my comments again.

The main problem with WordPress, like most PHP apps, is that it requires particular care to be set up securely and safely. Luckily I do have some experience with this kind of work, and I thought I might as well share my experience and my setup once I got it running. But here is where things got complicated, to the point that I’m not sure I have any chance of getting this working, so I may have to stick with Hugo for much longer than I was hoping. And almost all of the problems fall back to the issue that my battery of test servers is IPv6-only. But let me not get ahead of myself.

After installing, configuring, and getting MySQL, Apache, and PHP-FPM to all work together nicely (which believe me was not obvious), I tried to set up the Akismet plugin, which failed. I ignored that, removed it, and then figured out that there is no setting to enable Markdown at all. Turns out it requires a plugin, which, according again to Jürgen, is the Jetpack plugin from WordPress.com itself.

Unfortunately, I couldn’t get the Plugins page to work at all: it would just return an error connecting to WordPress.org. A quick tcpdump told me that WordPress was trying to connect to api.wordpress.org, which, despite having eight separate IP addresses to respond from, has no IPv6. Well, that’s okay: I have a TinyProxy running on the host system that I use to fetch distfiles from the “outside world” that is not v6-compatible, so I just need to set this up, right? After all, I was already planning on disallowing direct network access to WordPress, so that’s not a big deal.

Well, the first problem is that the way to set up proxies with WordPress is not documented in the default wp-config.php file. Luckily I found that someone else wrote it down, and that started me in the right direction. Except it was not enough: the list of plugins and the search page would come up, but the downloads wouldn’t, failing with the same error about not being able to establish a (secure) connection to WordPress.org. At first the error only showed up in the Apache error log; the page itself will show a debug trace if you ask WordPress to enable debug reporting.

Quite a bit of debugging later, with tcpdump and editing the source code, I found the problem: some of the requests sent by WordPress target HTTP endpoints, and others (including the downloads, correctly) target HTTPS endpoints. The HTTP endpoints worked fine, but the HTTPS ones failed. And the reason why they failed is that they tried to connect to TinyProxy with TLS. TinyProxy does not support TLS, because it really just performs the minimal amount of work needed of a proxy. And for what it’s worth, in my setup it only allows local connections, so there is no real value in adding TLS to it.

Turns out this bug is only present if PHP does not have curl support and WordPress falls back to fsockopen. Enabling the curl USE flag for the ebuild was enough to fix the problem, and I reported the bug. I honestly wonder if the Gentoo ebuild should actually force curl on for WordPress, but I don’t want to go there yet.

By the way, I originally didn’t want to say this in this blog post, but since it effectively went viral: I also found out at that point that the reason I could get a list of plugins at all is that, when the HTTPS connection to api.wordpress.org fails, the code explicitly retries over HTTP. It’s effectively a silent connection downgrade (you’d still find the warning in the log, but nothing appears to break at first). This appears to include the “new version check” of WordPress, which makes it an interesting security issue. I reported it via WordPress’s HackerOne page before my tweet went viral; sorry, I didn’t at first realize just how bad that downgrade was.

So now I have an installation of WordPress, mostly secure, able to look for, fetch and install plugins. Let me install Jetpack to get that famous Markdown support that is, to me, a requirement and a dealbreaker. For some reason (read: because WordPress is more Open Core than Open Source), it requires activating with a WordPress.com account. That should be easy, yes?

Error Details: The Jetpack server was unable to communicate with your site https:// [OMITTED] [IXR -32300: transport error: http_request_failed cURL error 7: ]

I hid the URL of my test server, simply to avoid spammers. The website is public, it has a valid certificate (thank you Let’s Encrypt), and it is not firewalled nor does it require any particular IP to connect to. But it is IPv6 only. Which makes it easy for me, as it reduces the number of scanners and spammers while I try it out, and since I have an IPv6-enabled connection both at home and at the office, it makes it easy to test with.

Unfortunately it seems like the WordPress infrastructure is not only unreachable from the IPv6 Internet, it does not even egress onto it. Which, once again, makes IPv6-only networks infeasible in practice. Contacting WordPress.com on Twitter ended up with them opening a support ticket for me, and a few exchanges and logs later, they confirmed their infrastructure does not support IPv6 and, as they said, «[they] don’t have an estimate on when [they] may».

Where does this leave me? Well, right now I can’t activate the normal Jetpack plugin, but they have a “development version”, which they assert is fully functional for Markdown, and that should let me keep evaluating WordPress without this particular hurdle. Of course this requires more time, and I may end up hitting other roadblocks at that point, I’m not sure. So we’ll see.

Whether I get it to work or not, I will share my configuration files in the near future, because it took me a while to get them set up properly, and some of them are not really explained anywhere. You may end up seeing a new restyle of the blog in the next few months. It’s going to bother me a bit, because I usually prefer to keep the blog the same way for years, but I guess that needs to happen this time around. Also, changing the posts’ paths again means I’ll have to set up another chain of redirects. If I do that, I have a bit of a plan to change the hostname of the blog too.

If you make me register on your blog, you should feel responsible for it.

A few days ago I was discussing with some friends the need for users to register on blogs to leave comments. Myself, if I don’t see at least an option to use OpenID (which in my case goes to StartSSL) I tend not to comment at all, and I’m pretty happy to choose Disqus with Facebook or Twitter login over custom registration forms (but as you may notice, this blog has no registration at all).

So why do I take this stance? Well, it should be obvious, but it seems like not everybody gets it. The reason is that the moment you make users register on your blog, you should feel responsible for their safety and security. The moment you make them choose a password, it’s more than likely that the majority of them are going to choose a “usual” password. And that can be a very nice prize for a group of cybercriminals looking into getting access to Amazon or Google accounts.

If you don’t believe that online account credentials are actually worth something, you might want to read this article by Brian Krebs, which explains the uses bad people have for your email account.

Now, it’s true that a single big fish such as LinkedIn, or more recently Ubuntu Forums or Apple’s dev site, can get attacked, but those are hard to crack for the most part, which means that they are out of the league of most crooks out there. On the other hand, especially thanks to people abusing plugins badly, WordPress installs can be cracked in just a few minutes each, and while each is unlikely to bring more than a handful of passwords, actively scanning for vulnerable WordPress instances is very common. I’m happy that the one WordPress install I manage is behind ModSecurity, GrSec, and PHP running in FPM as its own unprivileged user.

So please, please, please: try your best not to make people register on your website with a password. It’s not safe, not for you and not for your users. For them because if you get cracked, their safety is at risk. For you because you become a very yummy target for crooks.

Why users shouldn’t register to general sites

Like many other people out there, I like looking at the statistics synthesized from the web server access logs, to know who writes about me and what people are interested in reading of what I wrote. And like many of those people, I produce the statistics through awstats, with its multitude of problems and limitations. What has this to do with the post’s title, you might ask? Well, awstats instances (as well as other log analyser software), and in particular the public ones, are the major reason why you shouldn’t allow people to become users of general sites, such as blogs and other sites created with various kinds of content management systems. Let me try to qualify this statement.

The first problem is: why register in the first place? Quite a few sites let you register to store things like a signature for comments, an avatar, and little niceties like that. Most of them make sure that you have to confirm your email address, as well as pass a captcha, to make it (slightly) harder for bots and spammers to register and leave comments. These might be nice things to have, but I don’t think they are really worth it: you can use cookies to store the basic details like name, URL and email to use for comments (a lot of sites nowadays do that, and I think WordPress supports it out of the box), and services like Gravatar make it much cleaner to handle avatars.

So why shouldn’t you let people register anyway? After all, what bad can it do to have those things stored server-side? Well, there is one very simple reason: spam and bots again. Leaving aside the sheer problem of XSS injection in the displayed content, when your users get a “user page” that displays, for instance, their homepage as a link, they become a feast for spammers. When I say that, most people tend to answer that comments on blogs are already quite nice for spammers, and they cover a much broader range of pages at that point, which is true; but at the same time, comments tend not to go unscrutinised. You check for new comments, and if you’re the blog author you most likely read the comments’ feed to see what people say about your posts, so you can find spam comments with relative ease.

On the other hand, I don’t know of many web applications that let you scrutinise users, and especially users that change their details. Once spammers have registered, they might just wait a week before changing the homepage link to a URL pointing to spam, scams or whatever else. Would you notice? Most likely you wouldn’t, especially if the comments were auto-generated well enough. And that would mean more links to the spamming site. For many, the solution lies in adding rel="nofollow" to the links to the commenters’ pages so that search engines won’t index them. This only works up to a point, given that some less sophisticated crawlers ignore that attribute and then reproduce the links without the nofollow.

It gets even worse: some websites don’t put rel="nofollow" at all on the users’ account pages, which obviously list the address of their declared site. Such pages are, though, difficult to reach; sometimes they are not even linked at all from the pages of the site itself, as webmasters often think of that problem. To work around these obstacles, there is one very easy way: make use of the publicly available awstats (and other analyser) instances. You send requests to some website with the referrer set to the user page, enough times that it shows up in the top-ten referrer summary. The web crawlers will do the rest for you.
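In case the mechanism isn’t clear, it really is as trivial as this sketch, with all the URLs obviously made up:

```python
# Purely illustrative sketch of the referrer-spam trick described above: hit a
# site with a public awstats instance over and over, forging the Referer
# header so it points at the registered user page. All URLs are made up.
import requests

TARGET = "http://site-with-public-awstats.example/"
USER_PAGE = "http://some-blog.example/users/spammer"

for _ in range(50):  # enough hits to show up in the top-ten referrer summary
    requests.get(TARGET, headers={"Referer": USER_PAGE}, timeout=10)
```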

But why use a user page rather than the spammed-for website directly? It’s definitely a compromise: on one side, the indirection only works with the most vulnerable websites and with the least sophisticated crawlers; on the other, even the least sophisticated anti-spam system can detect a handful of known-bad referrers (I have my own blacklist for this blog; it helps reduce the amount of bandwidth used for serving content to spam bots), and you cannot add general news, portal or blog sites that allow users to register to that list, at least not easily (you could possibly use a regular expression to solve the problem, so as to only reject requests coming from users’ pages rather than from the whole site, which might legitimately point to your server, but that adds CPU processing to the mix).
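The regular-expression idea is roughly this; a hypothetical sketch, with a made-up site layout:

```python
# Hypothetical sketch of the regular-expression approach: reject requests only
# when the referrer points at a user page of an otherwise legitimate site,
# instead of blacklisting the whole site. The site layout is made up.
import re

USER_PAGE_REFERRER = re.compile(
    r"^https?://(www\.)?some-blog\.example/(users?|members?)/", re.I)

def is_referrer_spam(referrer):
    return bool(USER_PAGE_REFERRER.match(referrer or ""))

print(is_referrer_spam("http://some-blog.example/users/spammer"))    # True
print(is_referrer_spam("http://some-blog.example/2009/03/a-post/"))  # False
```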

This said, my suggestion is, once again, the same: don’t add user registration where it’s not due. Find alternative ways around it: allow comments without registration, use services like Disqus, support a third-party authentication system like Google or Facebook, just don’t make people create a new user. And if you do, make sure you have a way to review possibly fraudulent users quickly and on a schedule, so that you don’t end up hosting springboards for various kinds of spammers!

Blog posts are no replacement for documentation

Since I was asked about it in a previous post, I’d like to make some notes about why I “document by blog post” on so many occasions.

I know perfectly well that my blog posts are no replacement for proper documentation; code, procedures and policies need to be properly documented, and tied to the project they are supposed to document. Documentation by blog post is difficult to write, manage and search, and can indeed be useless for the most part.

So why do I write it? Well, most of the time I start a blog post with some ideas in mind, write it down, and then depending on the feedback I either continue the topic or drop it entirely. I guess the most prominent counter-example is the For A Parallel World series (which I know I haven’t updated in a while).

Writing proper documentation is important, and I know that pretty well; I have written and ranted about that before as well. And it’s knowing that that I started the Autotools Mythbuster project, which, to be honest, has given me mixed feedback and satisfaction. The problem is: writing a blog takes just a modicum of effort, because I don’t have any obligation about form, or grammar, or language; I might soft-swear from time to time in a post, I might rant, I might leave some smaller mistakes around, both in grammar and content, and so on. I don’t go updating blog posts to fix grammar and style. Writing complex and organized documentation requires a lot more work, and when I say a lot I mean quite a lot more. Of course the result is also of much higher quality because of that.

I have tried finding alternative routes to get good results out without having to put that much effort into my (unpaid) free time; the first option was LWN, which actually helped me pay for a good part of Yamato’s hardware. Unfortunately LWN is not a perfect solution for me: partly because my topics tend to be quite low-level, too low-level for the LWN readers I’m afraid, and too distant from the kernel as well (which is probably the only low-level area that LWN really writes a lot about); the other problem is that LWN is still something similar to a magazine, a journal, and thus does not lend itself to organised documentation, like autotools-mythbuster is. It would still be a puzzle of entries: of higher quality than a blog, but still a puzzle.

The classical form for organised documentation is that of a book; in today’s age, ebooks are also quite often used, to avoid the whole mass-production and distribution trouble for topics that might not be of enough interest (interestingly enough, that’s still not true for a lot of books, so lately I actually had to buy more paper books because I couldn’t find PDFs of them to use with the Reader). Now, this also has its troubles; as you might remember, I already tried looking for a publisher for Autotools Mythbuster before going with the open project it is now.

The idea behind that would have been putting as much effort as possible into that single piece of documentation, completing it as much as possible and getting it out in some finished form. There you go: high-quality results, paid effort, and organisation. Unfortunately, finding a publisher is never an easy task, and for that topic in particular I ended up hitting a stone wall: O’Reilly already had somebody working on the topic, and the book is out now, I think (I haven’t read it). This also ignores a problem with classical books: they cannot easily be updated, and documentation often has to be, to correct mistakes, grammar and style, and especially to keep up to date with what it documents. For instance, Autotools Mythbuster has a specific section on forward porting (which I’ll probably keep updating for future versions as well).

So the final option was making it an open book; again, the effort is not negligible, so my first solution was to work on it on a donation basis: donations would have covered the effort I needed to put into it, and the result would still have been there for everybody. I didn’t count on the fact that the topic is too developer-oriented to actually be of any use to the people who would be donating. Indeed, I wish to thank the last two donors (in terms of time), Thomas Egger (who sent me a good mouse to replace the stupid Mighty Mouse; you’ll soon see results from that, by the way), and Joseph Booker (who sent me some books; I started with The Brief Wondrous Life of Oscar Wao because I had been meaning to read it for almost two years, but the useful one will soon prove useful, I’m sure). But they, like most others, never explicitly named the guide. And so I’m trying to find more time for general posts than for that in particular.

Just a note before you start wondering about the guide: yes, I haven’t updated it in a while. Why? Because I sincerely feel like it’s not useful any more. As I said, it requires a fair amount of effort to be extended; there is, true, some interest in it, but not enough to have ever moved anyone to try funding its extension. With O’Reilly now publishing a complete book on the matter, I don’t think it’s worth my time keeping it up. I might still extend it if I have to correct some build system, or if I discover something new, but I’m not going to keep extending it of my own will without such a need.

Bottom line: I could probably write more extensive, organised, and precise documentation about lots of stuff, especially the stuff I write about on the blog from time to time, but the problem is always the same: it requires time and effort, and both are precious commodities; most of my time is already committed to paid work nowadays, and Gentoo is getting pushed more and more into third place (first is work, second is health). Documenting what I can with the blog is, in my opinion, still better than nothing, so I’ll keep doing that.

Why moderated comments can be a problem

You might know already that I don’t like moderating comments; I did it for a long time because of spam, but nowadays I prefer filtering comments out with mod_security based on User-Agent and other diagnostics. One of the reasons why I don’t like moderated comments is that, oftentimes, comments can correct a wrong blog post and keep it from being extremely bad.

I don’t pretend that I’m extremely good at what I do and that I never make mistakes; I’m sure I do, but if I say something very stupid, usually somebody corrects me in the comments, and the post still keeps some value. When I see posts about people reinventing the wheel, and making it hexagonal for some reason (like reinventing grep --include -r with a script using find and a for loop), and find out that the comments are moderated, I’m usually appalled. First, because lots of users that don’t know better will read the post and apply what it says without thinking twice about it. Second, because when the comments finally appeared, all in a batch at once, beside a number of duplicate suggestions there were even more suggestions of wheels in assorted polygonal shapes, but just a couple of really round ones. You probably know what I’m referring to if you follow some of the planets out there; I don’t really want to name names here.

Today, another example of moderated comments hindering the clean-up of a blog post that isn’t quite right. When you rant about a piece of software or a feature, especially when you explicitly say you don’t understand why it does what it does, leaving comments open allows people to actually solve the mystery for you; if you moderate them, you’re probably wasting the time of more than one person who has the answer, since each of them will probably try to explain it when they see no comments present yet.

Sigh.

Getting the news out

After my previous post about manifestos I’ve had a short conversation with Neddy about the need to know more about the candidates when you come to vote.

I agree that at the moment there isn’t much material to judge candidates by. Last year we had a GWN edition in which the candidates were interviewed on a few questions. Unfortunately I’m afraid there is not enough time for a similar GMN edition this time, for two main reasons: 1) the election is not in the scheduled timeframe; and 2) the time was shortened to two plus two weeks instead of the old one plus one month.

What I’m going to write about now is some ideas I had in the last two days about improving the situation, looking at next year’s election rather than this year’s, which is done as it is already. It’s not something I started thinking about just two days ago, though; I discussed similar issues with Araujo before, when I mistakenly thought his project was abandoned, and it was discussed before too, as users thought Gentoo was dying because we failed to put out news on what’s going on behind the scenes. It’s a recurring problem.

The first problem is that there are too many “silent” developers in the project. I tend to write a lot on the blog, not only about my personal stuff, but also about Gentoo work, ideas and development (which is what appears on Planet Gentoo). When I’m doing something, it’s likely that both other developers and users following the Planet will know about it and its details. Almost always better than those who only follow the mailing lists, as I admittedly fail when it comes to writing status reports on the mailing lists (I’ll return to that later).

While I’m criticised for writing about details, and accused of doing that just to make small things appear like a big deal (which is never my intention; I just think people should know about the details too, I don’t usually minimise others’ results, and if others don’t write about what they do it cannot be my fault), I think other developers should try to follow this path by writing more about what they are doing in their roles.

I suppose I could try to make a bot follow Planet Gentoo and create some statistics of who blogs there and how much.
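A rough sketch of the kind of bot I have in mind, assuming the Planet exposes an Atom feed at the URL below and using the feedparser library, would just count recent entries per author:

```python
# Rough sketch of a "who blogs on the Planet, and how much" bot: fetch the
# Planet's feed and count how many recent entries each author has.
# The feed URL is an assumption.
from collections import Counter
import feedparser

feed = feedparser.parse("https://planet.gentoo.org/atom.xml")  # assumed URL
posts_per_author = Counter(
    entry.get("author", "unknown") for entry in feed.entries)

for author, count in posts_per_author.most_common():
    print(f"{count:3d}  {author}")
```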

But this is far from the only solution we have. Although the idea of regular Status Reports by the various teams never seems to get going for long (we tried that before, but after a couple of iterations people tend to forget about it entirely), there is a different approach that I don’t think we have tried yet. It requires more manpower, although non-developers are perfect for dealing with this. Instead of waiting for teams to provide their Status Reports, query them at regular intervals so as to know what they have been doing.

Instead of looking at the FreeBSD Foundation, like William seems to want, I’d say we look at the FreeBSD Project, and in particular at their Quarterly Status Report, which summarises the most important things that are going on in their project. If you want the minutiae instead, you can refer to Planet FreeBSD, where developers seem to write a lot about their internals.

Another possible approach is the one taken by the KDE project, with their KDE Commit Digest (I would have linked to it but they are in the middle of a move). We could have users looking at the commits happening in a given week, and put out a summary of the notable changes. By notable I mean version bumps to new major versions that the user might be interested in, new USE flags that allow doing stuff that wasn’t possible before, long-awaited fixes, and obviously changes in frameworks like Portage, Layman, Catalyst, and so on.

Looking again at all the BSD-based projects at once, you can also note that bsdtalk has quite a few episodes out there, with interviews with a wide range of developers. Although LinuxCrazy covered Gentoo in the last few episodes, interviewing Donnie, Neddy and Mike, we can’t really compare the two at the moment, can we? Kudos to David for the effort up to now, though.

And returning to something I wrote about recently, Ohloh’s journals can be useful for keeping in touch with users, too. As far as I can see, right now I’m the only one using them for the Gentoo project, but it would be nice if more people said what they are working on sometimes, just to let it be known, even if they don’t want to write a full-blown blog post about it.

This entry is getting longer than I expected, so I’ll probably cut it short now and wait for people to comment; I really would like to know the opinions of developers and users on the issue and on my proposals.

Later, or tomorrow, I’ll write more specifically on how to improve the knowledge available to make a decision on council elections.

How’s this year’s Summer of Code coming?

I hope Joshua won’t get mad at me, but I have to write about this; maybe it will act as a good way to get the mistakes noticed.

I’m afraid this year’s SoC is going to follow the path that the previous two instances already took. What makes me afraid of this is that there is little to no coordination between the parties involved.

First off, the announcement for SoC was pretty late, and the GMN didn’t talk about it at all, which is already a negative point. Considering the short timeframe that applicants have to submit their ideas, that isn’t very nice at all. For what it’s worth, it wasn’t even listed in the LWN announcements.

The official SoC ideas page got some new additions, but they came pretty late, not soon enough to give the students time to start thinking about what to do, and maybe discuss it with their contacts.

There is also a shortage of mentors. I’m afraid this was to be foreseen: there is little to no incentive for mentors to actually do their work, and there is little project spirit around lately, which I do understand. Finding a way to actually get more mentors next year is not going to be easy, so I think we should start looking into that already.

And even with the very few mentors that are around, I can’t see much coordination. I’m not on IRC at the moment as I’m on the laptop, but I have Jabber and my mail client open, and neither gave me any information about being accepted as a mentor, or about the URL of the mentors’ dashboard to see the applications!

I don’t see any soc@gentoo.org alias or anything like that, and that is also a bad thing: I had a few users contacting me with some ideas, because I actually blog (and care) about Summer of Code. I had to refer them to other developers because I couldn’t handle them: not my area, or just not something I’d feel comfortable mentoring. Having a single alias that users could write to would allow all the developers interested in SoC to answer as they see fit. Yeah, sure, there is the mailing list, but you can guess that most people wouldn’t like to make their application details public; after all, they are not public even after SoC closes.

The deadlines, short as they are, were not posted on the recently created Gentoo Calendar (at Google, of course); while only recently born, it would be a nice place for this kind of stuff.

Up to now I listed the problems that should have been avoided by the SoC team itself (note to self: try to carve out more time next year so you can be part of the team and make the changes), but I wanted to leave the biggest problem of all for last.

I think both Donnie and I tried to make this point before, but Gentoo developers should really try to blog more. In today’s Free Software landscape, blogs are often used to share and bounce around ideas, and to give projects and subprojects more visibility. Try comparing Planet Gnome with Planet Gentoo, and let me know.

In particular, there is just no material on Summer of Code on Planet Gentoo! Only Luca, Joshua and I blogged about it, as far as I can see. I’ve been trying Google Reader in the past weeks (which turned out to be quite good now that I don’t have my Akregator at hand), and I’ve started tagging all the posts I’ve seen (not even fully read!) that talk about Summer of Code. The result is, right now, 45 items, and be aware that I only started on March 19th, with the exception of one post I was interested in and decided to look up afterwards. The vast majority of the posts come from Planet Gnome, which I named before, but there are many posts from Planet KDE too.

I’m sure there are way more posts about Summer of Code around; I just probably don’t follow many blogs from the other projects involved. But the fact that Gentoo barely shows up on that list is not something I like.

This entry will add to the list, though I’m not happy about it. I really, really hope next year we can avoid these mistakes… at least I can say I tried.

Tips for localizers, even from Microsoft

Many might not remember this, because I haven’t blogged on that topic in quite a long time, but one of my interests is also the localization of software. This mainly springs from the fact that I saw, and continue to see, people who don’t even want to use Linux because they don’t know English, nor are they willing to learn it just for that.

For instance, earlier this year I wrote a proposal for internationalizing ebuilds. I also tried coming up with a feasible way to internationalize init scripts without having to deal with a different script for every possible language.

Given my interest in internationalization and localization, I also started following a Microsoft employee’s blog, Sorting It All Out by Michael S. Kaplan. It is a very interesting blog that deals with international issues like language support, Unicode and keyboards. Even if it’s of course mostly centred on Microsoft products, it’s still enlightening reading for Free Software developers, as it tends to explain the reasons for some choices made in Windows to properly support internationalization.

Also, the blog is quite interesting because it really takes a critical eye to the problems, even showing what Microsoft did wrong, and those are errors that other developers should really learn from. He is also a Mac user, besides the obvious Windows user, so he sometimes compares Apple and Microsoft products, giving an objective look at the implementations. So it’s really suggested reading for any of my readers also interested in internationalization and localization problems.

Anyway, today I read his entry about redundant messages, and it confirmed it for me: Free Software developers should really learn to check out technology blogs even when they come from “the other side”.

Using way too many strings to convey the same message is a common mistake in free software too. This makes translation quite a bit more difficult, sometimes very difficult, and it can also confuse users pretty badly.

The same applies to non-identical messages, and I’m actually seeing this in xine right now. The descriptions of the plugins are internationalized, but the problem is that even similar plugins use very different descriptions. This means that fuzzy translations can’t really help with translating new plugins.

So for 1.2, one of the entries on my personal TODO list (which I should remember to write into xine’s actual TODO) is to design and document a proper description scheme to be followed by the plugins’ descriptions; this way the descriptions would all follow the same scheme, wouldn’t throw off the user with very different messages, and would make translation quite a bit easier.

Kaplan announced that today’s post will be the first of a long series; I will certainly follow it so that I can learn from it and make it easier to localize xine… and then I’ll be hoping that more people will join the xine project to update the translations. On this note, I’ll also write a new entry to call for translators.

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.