The UTF-8 security challenge

I make no secret of the fact that I like my surname to be spelt correctly, even if that’s difficult internationally. Honestly, I don’t think that’s too much to ask. If you want to refer to me and don’t know how to spell my surname, you have a few other options, starting with my nickname (“Flameeyes”). I keep using it everywhere, including the domain of this blog, because, well, it’s as good a name as my “real” one. I know other developers, starting with Donnie, prefer to be recognized mainly by their real names. Since I know my name is difficult to type for most English speakers, though, I don’t usually insist much. Flameeyes was, after all, more unique to me than “Diego Pettenò”, since there are three other people with that name in my city alone.

But even without resorting to nicknames, which might not sound “professional”, you have options. I’m fine with being called Diego: in Gentoo I’m the only one, and in the multimedia areas I’m Diego #2, since “the other Diego”, Biurrun, takes due priority. Since a few months ago, Diego Elio works too. I don’t claim to be unique in the world, but when I chose my new name, besides picking my grandfather’s name, I also checked that I wouldn’t step into another developer’s shoes. If you really, really need to type my name in full, you can write “Diego Petteno`”. Yes, there is an ASCII character to represent my accent, and it’s not the usual single quotation mark; even the quotation mark works as an approximation, though, as it does for banks and credit cards. If you’re in a particularly good mood and want to tease me, you could also use 炎目 (which is probably too literal a translation of “Flameeyes” into kanji). I think the only person who has ever called me that is Chris (White), and it doesn’t solve the UTF-8 issue either.

Turns out it’s not that easy at all. I probably went a little overboard the other day about a GLSA mistyping my name (it still does), because our security team is innocent in the matter. glsa-check breaks with UTF-8 in the GLSA XML files, and that makes it hard to type my name. This is a bug in glsa-check, since you should not assume anything about the encoding of XML files: each file declares its own encoding! I was surprised (and somewhat annoyed) because I was expecting my name to be typed right for once. py handled that GLSA, and I’m sure he has the ò character on his keyboard.
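An XML parser should read the encoding from the document’s own `<?xml … encoding="…"?>` declaration, which is why assuming an encoding up front is wrong. Below is a minimal sketch of the difference using Python’s standard `xml.etree.ElementTree`. The `<glsa>`/`<credit>` elements are hypothetical and not taken from the real GLSA DTD. The “assume ASCII” function is also hypothetical: it illustrates the general mistake and is not glsa-check’s actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical GLSA-like fragment; element names are illustrative only.
DOC_TEXT = (
    '<?xml version="1.0" encoding="{enc}"?>\n'
    "<glsa><credit>Diego Pettenò</credit></glsa>\n"
)

def make_doc(enc: str) -> bytes:
    """Serialise the same document in `enc`, declaring that encoding."""
    return DOC_TEXT.format(enc=enc).encode(enc)

def credit_from_bytes(raw: bytes) -> str:
    """Hand raw bytes to the parser: it honours the XML declaration."""
    return ET.fromstring(raw).findtext("credit")

def credit_assuming_ascii(raw: bytes) -> str:
    """Hypothetical broken approach: decode as ASCII before parsing.
    Raises UnicodeDecodeError as soon as it meets a non-ASCII byte."""
    return ET.fromstring(raw.decode("ascii")).findtext("credit")
```

With this approach, `credit_from_bytes` returns `"Diego Pettenò"` for both `make_doc("utf-8")` and `make_doc("iso-8859-1")`, even though the bytes of the two files differ. `credit_assuming_ascii` fails on either one.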

Curious about this, I also wanted to check how the other distributions would handle my name. CVE-2008-4316 (which I already discussed briefly) provided a good chance to do that. The results are funny, disappointing and interesting at the same time.

The oCERT advisory has a broken encoding and shows the “unknown character” symbol (�). Will’s mail at SecurityFocus, on the other hand, shows my name properly. Debian truncates my surname, and Ubuntu simply mistypes it. Red Hat shows it properly: score one for Red Hat.
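The “�” symbol is U+FFFD REPLACEMENT CHARACTER. A decoder emits it when the bytes are not valid in the encoding it was told to assume. The sketch below shows two textbook ways a name like mine gets mangled when bytes are decoded with the wrong encoding. It illustrates the general failure modes only. It is an assumption, not a diagnosis of what the oCERT or any distribution’s tooling actually did.

```python
NAME = "Pettenò"  # 'ò' is U+00F2

def utf8_read_as_latin1(name: str) -> str:
    """UTF-8 bytes (C3 B2 for 'ò') wrongly decoded as ISO-8859-1: mojibake."""
    return name.encode("utf-8").decode("iso-8859-1")

def latin1_read_as_utf8(name: str) -> str:
    """ISO-8859-1 bytes (F2 for 'ò') wrongly decoded as UTF-8; the invalid
    byte is replaced with U+FFFD, the symbol seen in the oCERT advisory."""
    return name.encode("iso-8859-1").decode("utf-8", errors="replace")

print(utf8_read_as_latin1(NAME))  # → PettenÃ²
print(latin1_read_as_utf8(NAME))  # → Petten�
```

In both cases, the bytes were fine. The damage happens only when the reader guesses the encoding instead of using the one that was declared.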

Only one out of four distributions handles my name correctly, and that’s not really good. Gentoo has no GLSA on the matter, but I know what would have happened. The CVE doesn’t link to other distributions either, only to a few more security-focused sites I’m not interested in at the moment. In particular, I’m surprised that the one distribution getting it right is Red Hat. The other two are the ones I usually see mentioned when people talk about localising Free Software packages. Gentoo at least does not pretend to be ready for internationalisation in the first place (although we do have a GLEP about it).

Okay, I’m certainly a nit-picker, but it’s 2009, and there are good ways to handle UTF-8. The only obstacles I see nowadays are very old legacy software and English speakers who maintain that seven bits are enough to encode the world, which is not true by a long shot.