Internationalisation is important

I have written about internationalisation before. Although I read English well (I don’t speak or write it nearly as well), I do think about the people who are not so lucky, and I can assure you that I know a lot of them: in Italy, unlike in many other countries, knowing English is not a common thing.

One of the nice things I can say about Fedora is that they care about this kind of problem: they deprecated GTK+ 1.2 because its Unicode/UTF-8 and i18n support is pretty much missing, while GTK+ 2 has very good support for both. This was a good thing to do, in my opinion, because it draws people’s attention to the need for modern toolkits and reasonable translation support.

Ubuntu is also concerned with the availability of free software for people who don’t speak English, as shown by their attempts to improve the availability of translated software through the Rosetta application in Canonical’s Launchpad framework. To be honest I don’t trust it, as I don’t understand why they don’t free Launchpad if they have the success of Free Software at heart, but that’s another story entirely.

I have also read quite a few entries on Planet Debian about their efforts to improve localisation support in their packages, and that is quite good.

What I have never seen in Gentoo, instead, is people concerned with improving internationalisation support, whether in packages, tools or the ebuilds themselves. I used to fix ebuilds so that they responded correctly to LINGUAS instead of abusing the cjk flag, I used to work on CJK packages so that they actually worked, and I started adding proper support for the LINGUAS variable to a few KDE packages. But these are really small things; they don’t change the user experience much, as the base system still requires you to know English to a higher degree than other distributions do.
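To make the LINGUAS idea concrete, here is a minimal Python sketch of the filtering a package could apply. It is only an illustration: the function name `select_linguas` and the language list are made up for this example, and the rule that an unset variable means "install everything" while an empty one means "install nothing" is an assumption about the desired behaviour, not a description of any particular ebuild.

```python
def select_linguas(available, linguas=None):
    """Return the translations to install, honouring a LINGUAS-style value.

    `available` is the list of language codes a package ships.
    `linguas` is a space-separated string such as "it ja"; None stands for
    an unset variable (assumed here to mean "install everything").
    """
    if linguas is None:
        return sorted(available)
    wanted = set(linguas.split())
    # Only languages the package actually ships can be installed.
    return sorted(lang for lang in available if lang in wanted)


# Hypothetical list of translations shipped by some package.
shipped = ["de", "it", "ja", "zh_CN"]
print(select_linguas(shipped, "it ja"))
print(select_linguas(shipped, None))
```

The point of keying on the language code rather than on a single flag such as cjk is that each user gets exactly the catalogs they asked for, whatever the language family.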

Maybe asking for every bit and string of Gentoo to be translated is overkill, but it would be nice to see at least some experiments in getting Portage, baselayout, or whatever else actually translated.

Why am I writing all of this? Well, if there is one thing that I suppose everybody would agree on about last year’s Summer of Code, it is that most of the ideas proposed weren’t thought through for long, and the results suffered badly for it. So, as we now start talking about Summer of Code 2007, I want to write my proposal here.

It would be nice to have a project for Google Summer of Code 2007 that actually has an impact on users. Someone could try to add good i18n support to Gentoolkit, or to Portage-Utils, maybe even to Portage itself. Although these seem to be easy tasks, there is at least one that is non-trivial and might be worth aiming for: finding a good way to allow internationalised messages in init.d files and ebuilds. For init.d files it might be easier than you think, as gettext can be used to translate strings directly: you just need a tarball with the init.d file and the .po files, compile the .mo files, and get baselayout to provide an “echonls” function so that the strings you echo are passed through gettext (if you installed baselayout with the nls useflag, of course). Ebuilds can be harder, but I have a couple of ideas myself; the problem is finding a way that is good enough not to hinder general ebuild maintenance and not to bloat the tree too much.
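The “echonls” idea can be sketched in a few lines. The real thing would be a shell function in baselayout; this Python version only illustrates the mechanism, and the domain name `hypothetical-initd-domain` and the helper `make_echonls` are invented for the example.

```python
import gettext


def make_echonls(domain, localedir=None):
    """Build an echo function whose messages go through gettext."""
    # fallback=True returns a NullTranslations object when no catalog is
    # found, so systems without translations just print the original string.
    catalog = gettext.translation(domain, localedir=localedir, fallback=True)

    def echonls(message):
        text = catalog.gettext(message)
        print(text)
        return text

    return echonls


# No catalog exists for this made-up domain, so the message is echoed as-is.
echonls = make_echonls("hypothetical-initd-domain")
echonls("Starting service")
```

With a compiled catalog installed for the domain, the same call would print the translated string, and nothing else in the script would need to change.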

I know my suggestions most likely won’t be considered at all, or might just be taken as the ramblings of a foolish former dev, but, well, I still have hope that someone will work on this sooner or later.

Another appeal for CJK improvements

I’m making a new appeal for proxy maintainers. Thanks to Patrick, we’re finding more and more packages currently rotting away in ebuilds that don’t get updated, and sometimes not even bumped.

Unfortunately, I’m not good enough with Japanese (let alone other languages, of which I know nothing), so all I can do is simple testing to see whether things seem to work. Zhang Le is already helping to take care of zhcon, but there are plenty of ebuilds that need to be cleaned up and fixed. I fixed ochusha a few minutes ago to install the desktop file in the right place, so that it appears in menus, but I can only find these things when I stumble across them.

So, once again, if you’re interested in helping Gentoo support CJK packages, please contact me or the CJK team. If you’re maintaining a particular ebuild locally or in an overlay because the one in portage is outdated or broken, open a bug for it and state clearly that you’d like to be the proxied maintainer, and I’ll look into it.

I’ll try to get an announcement into GWN about this too; let’s hope I can find enough people to help with this ;)

English .po files

I think I talked about this before, but I don’t remember when, and a quick search through my blog didn’t find anything (it might be because there are way too many posts for me to remember).

Anyway, when I talked about i18n, I’m sure I at least mentioned the possibility of creating a .po file for the English language, be it neutral, American or British English. In any case, this is easy to do, and it’s a reason to leave the nls useflag enabled even for people who care nothing about languages other than English.

This is a pretty neat way to get proper English grammar, considering that a lot of authors are native speakers of other languages and thus are not always able to write proper English, with the result that you often find broken C locales that don’t really sound like English (I know, my blog doesn’t sound like English either :/).

So, why am I talking about this again now? Well, there’s an interesting post on Michael Kaplan’s blog (basically the only MSDN blog I follow) that might be worth reading for people who would like more proper English messages.

I think this is one of the things that are easily ignored when thinking about internationalisation, but the differences between the language spoken in one part of a country and the one spoken in another are often big enough to make them almost different languages. So, thinking on a larger scale (different countries), it does make sense to consider them different languages… although I admit I have no idea whether Swiss Italian is any different from ours, and how.

I’m not a linguist at all; even my advanced Italian grammar is quite bad (although good enough to please my Italian teacher in high school, and not make him angrier at me, as I wasn’t even studying history :P). But I admit curiosity in the matter, and michkap’s blog is good reading for that. I would suggest it to anybody who’s interested in i18n, even if it usually talks about Windows and other Microsoft products (of course), as it provides hints and insights that should be considered when designing Free Software too.

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.

Don’t you hate when…

… you’re working on cleaning up a series of ebuilds, and while you’re looking at how to configure one, your whole IME setup goes down because skim crashed, and not even restarting X helps? I do.

And I hate it even more when this happens during the night, when I’m already nervous because I was sick the whole day, and it’s too hot for my mind to stay clear.

So, as I was pretty pissed off at that, I decided to simply shut everything down, and tomorrow I’ll see what to do. If the 9250 arrives before noon, I’ll probably install it before actually doing any work, so it’s a good idea to leave the box turned off tonight.

In particular, I would end up burning myself again if I tried to remove the GeForce from that box after leaving it on for days without ever shutting it down.

Unfortunately I was also trying to update Defiant so that I could keyword a few things, but that will have to wait.

So, what was I working on before getting hindered by my own X? After receiving a bug about scim-tables, I spent the night looking at the ebuilds for scim input methods that needed to be cleaned up, scim-pinyin and scim-tables first of all, as they had optional automagic KDE support that is now controlled by a kde useflag.

There is a lot of work to do on the CJK ebuilds, I’m afraid, mostly because I see the same errors repeated over and over, probably because ebuilds were copied from one package to another, and cleaning them up to a decent style is going to take a long time. At least I can try to make myself useful in the CJK herd in ways other than patching Qt (considering that daisuke hasn’t released an official patch for Qt 3.3.6 yet).

Oh, of course, yesterday Amarok got added to portage, and there is trouble with that, as usual. It’s quite a widespread application, so it’s somewhat normal that people find trouble with it the same day it’s released, sigh. I’ll try to blog tomorrow after I have fixed a few errors.

Now I really need something to calm myself down… I’ll read a book, as yesterday I finished Wolves of the Calla by Stephen King, and I usually put a book of a different genre between every book in the Dark Tower series to find it even more stimulating.

Another nice trick… improving English of non-native programs

So, on the way to internationalisation (gee, I spelt it out in full this time) there’s another thing to take into account that I hadn’t before. A lot of software is not written by native English speakers.

Why is that important? Well, it’s quite normal for people who don’t speak English, even if they can read it, to make errors when they write messages for a program, and of course that’s something one might not want.

So, what’s my solution? The first, common solution is to fix the strings themselves, improving them in the “C” locale, that is, the literals as they are hardcoded into the program. But this has quite a big downside: all the po translations will be invalidated for that string, as the identifier changed.

Yes, of course this has to be corrected when the error is such that you can’t understand what the program is saying, but it’s not a good idea to break the translations just to improve the general spelling and grammar of the messages if they are understandable anyway.

So, how can this be fixed without breaking stuff, you’re asking yourself? (Or at least I hope you’re sensitive enough to i18n problems to ask yourself that :) )

There’s one simple way to improve the English spelling and grammar without breaking other translations, and that is to provide a translation for the “en” language: an en.po that fixes the spelling and grammar so that they make more sense to native English speakers, leaving the C locale to be just what can be understood. Under this idea, having nls enabled makes quite a lot of sense for English users too.
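A small Python sketch may help show why this works: the msgid (the hardcoded literal) never changes, so every catalog keyed on it stays valid, and the “en” catalog is just one more translation. The class `DictTranslations` and the message strings are invented for illustration; a real setup would use compiled .mo files rather than in-memory dictionaries.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Tiny in-memory catalog mapping msgid -> translated text."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        # Unknown msgids fall through to the hardcoded C-locale string.
        return self._messages.get(message, message)


# The msgid is the literal from the source code, awkward grammar included.
MSGID = "Unable to open device, is missing"

c_locale = DictTranslations({})
en = DictTranslations({MSGID: "Unable to open the device: it is missing."})
it = DictTranslations({MSGID: "Impossibile aprire il dispositivo: non è presente."})

for catalog in (c_locale, en, it):
    print(catalog.gettext(MSGID))
```

Because the fix lives in the “en” catalog, the Italian entry keeps matching the same msgid and does not need to be touched.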

I’d like to see this trick used so that i18n can improve without being rejected every now and then :) So if you’re a native English speaker, submit translations anyway! That will help all the other native English speakers. Of course one has to pay attention: a British translation would be en_GB, an American English translation would be en_US, while International English would be just en.
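The naming matters because of how a catalog is usually chosen: the most specific name is tried first, then the bare language. The following Python sketch shows that lookup order in a simplified form; the helper names are hypothetical and real gettext implementations consider more variants (codesets, modifiers) than this does.

```python
def catalog_candidates(locale_name):
    """List catalog names to try, most specific first."""
    # Drop any codeset (".UTF-8") or modifier ("@euro") suffix.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    candidates = [base]
    if "_" in base:
        # "en_GB" also accepts the territory-neutral "en" catalog.
        candidates.append(base.split("_", 1)[0])
    return candidates


def pick_catalog(locale_name, installed):
    """Return the first installed catalog that matches, or None."""
    for name in catalog_candidates(locale_name):
        if name in installed:
            return name
    return None  # nothing matched: the untranslated "C" strings are used


print(pick_catalog("en_GB.UTF-8", {"en", "en_US", "it"}))
print(pick_catalog("en_US.UTF-8", {"en", "en_US", "it"}))
```

Under this scheme a plain en catalog serves every English user who has no more specific one, which is why an International English translation is the most broadly useful of the three.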

If anyone wants to submit International, British or American English translations for unieject, the pot file is here.

My personal i18n: fixing xine-lib

Okay, so today I got CVS access to xine-lib; tonight I checked out a copy and started investigating a little glitch I had seen the other day but never actually looked into.

The first thing I looked into was an error about xine being unable to identify the current charset… this turned out to be xine-lib relying on an internal gettext macro that, like any other gettext macro, is unreliable and often changes without notice. I replaced that with a proper check for the nl_langinfo() function, which finally solves the problem.
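For reference, this is roughly what asking nl_langinfo() for the charset looks like, shown in Python rather than in xine-lib’s C. It is a sketch, not the actual patch: the function name `current_charset` is made up, and the fallback used on platforms without nl_langinfo() is an assumption for this example.

```python
import locale


def current_charset():
    """Return the charset of the current locale as a string."""
    # Adopt the user's locale from the environment; carry on with the
    # default locale if the environment names one that is not installed.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # Platforms without nl_langinfo(): assumed fallback for this sketch.
    return locale.getpreferredencoding(False)


print(current_charset())
```

The value printed depends on the environment the script runs in, which is exactly the information the library needs to decide how to convert its strings.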

Then it was time to understand why the translation, although present and installed, wasn’t used. It turned out that the gettext domain used was different from the one installed. That is fixed now in upstream CVS, and I’m going to put a patch for it in Gentoo.
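A domain mismatch is easy to reproduce, because the domain is simply the file name of the catalog. The Python sketch below builds a throwaway locale tree and asks gettext where the catalog for a domain would be; the names `installed-domain` and `other-domain` are placeholders, not the domains xine-lib actually used.

```python
import gettext
import os
import tempfile


def installed_catalog(domain, localedir, language):
    """Return the path of the catalog gettext would load, or None."""
    return gettext.find(domain, localedir=localedir, languages=[language])


# Build a throwaway locale tree with one (empty) Italian catalog file.
localedir = tempfile.mkdtemp()
msgdir = os.path.join(localedir, "it", "LC_MESSAGES")
os.makedirs(msgdir)
open(os.path.join(msgdir, "installed-domain.mo"), "wb").close()

# The catalog is only found when the requested domain matches the file name.
print(installed_catalog("installed-domain", localedir, "it"))
print(installed_catalog("other-domain", localedir, "it"))
```

When the lookup finds nothing, gettext silently falls back to the untranslated strings, which is why a mismatch shows up as “the translation is installed but not used” rather than as an error.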

Still, I do have problems with some strings not coming up translated, and I’m afraid that xine is somehow picking up the pre-built .gmo files, which are not updated with the new Italian translation, for instance.

I’m going to fix that too, and I’m also going to replace the old command used to find the translatable files with something that does not take three pointless commands (find, xargs, and an extra grep).
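As an illustration of doing the same job in a single pass, here is a Python sketch that walks a source tree and lists the files containing gettext markers. It is not the command used in xine-lib: the function name, the file extensions and the assumption that strings are marked with the conventional _() and N_() macros are all choices made for this example.

```python
import os
import re
import tempfile

# Assumed markers: the conventional _() and N_() gettext macros.
MARKER = re.compile(r"\bN?_\(")


def translatable_files(root, extensions=(".c", ".h")):
    """Return source files under `root` that contain gettext markers."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if MARKER.search(handle.read()):
                    found.append(os.path.relpath(path, root))
    return sorted(found)


# Tiny demo tree: only one of the two files has a marked string.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "marked.c"), "w") as handle:
    handle.write('puts(_("Hello"));\n')
with open(os.path.join(demo_root, "plain.c"), "w") as handle:
    handle.write('puts("Hello");\n')

print(translatable_files(demo_root))
```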

Anyway, there is really a lot of work to do; I’ll do my best so that i18n support is complete before the 1.1.2 release. I’m really looking forward to seeing whether someone provides new translations for xine-lib and xine-ui :)

Update: (time: 3:58 am) I finally fixed that problem; it was indeed caused by the pre-built .gmo files. Unfortunately I also found that the file was regenerated by a command that considered two files that are not in the tarball (because they were moved somewhere else), so I had to fix that, too. Unfortunately this broke my almost perfect Italian translation, so I had to translate the ~20 missing strings :)
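A stale pre-built catalog is the kind of problem a quick timestamp check can catch. The Python sketch below flags every language whose .gmo is missing or older than its .po; the function name and the demo files are hypothetical, and comparing modification times is an assumption of this sketch (a checkout or an unpacked tarball may not preserve them faithfully).

```python
import os
import tempfile


def stale_gmo_files(po_dir):
    """Return languages whose .gmo is missing or older than its .po."""
    stale = []
    for name in sorted(os.listdir(po_dir)):
        if not name.endswith(".po"):
            continue
        lang = name[:-3]
        po_path = os.path.join(po_dir, name)
        gmo_path = os.path.join(po_dir, lang + ".gmo")
        if (not os.path.exists(gmo_path)
                or os.path.getmtime(gmo_path) < os.path.getmtime(po_path)):
            stale.append(lang)
    return stale


# Demo: it.gmo predates it.po, while de.gmo is newer than de.po.
demo_po_dir = tempfile.mkdtemp()
for fname, mtime in (("it.gmo", 1000), ("it.po", 2000),
                     ("de.po", 1000), ("de.gmo", 2000)):
    path = os.path.join(demo_po_dir, fname)
    open(path, "w").close()
    os.utime(path, (mtime, mtime))

print(stale_gmo_files(demo_po_dir))  # ['it']
```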

Do we need an i18n project?

So, people who have followed me for a long time know I’m one of those people who like to throw out ideas, get them started, and, when they are mature enough to work by themselves, find something else to start.

Although some people, me included, find this behaviour a bit strange and unsatisfying, up to now I think I was able to get some stuff working fine, like the --as-needed things that most of the devs now fix by themselves (although of course I always try to give a hand fixing the most difficult things).

Now, in the past days I wrote about translation and i18n in general. I updated xine-lib and xine-ui in the tree with updated po files for the Italian language, and as I said, I’ll wait patiently for other people to send me po files for other languages. As of tonight, I also created an Italian po file for gxine HEAD, which I sent to Darren for the next version; it’s an original translation this time, as there wasn’t one before.

I was thinking of translating VLC, but there are too many strings for me alone, especially as I still need to focus on my job for a few days and leave myself enough breath to take care of other stuff. (As my fellow developers already know, in the past days I came very near to burning out because of people repeatedly asking me to find a solution to a problem that has had no solution for… years, the infamous dependency range, without even reading the f..ine manual that would have already told them what they were asking me. I’m not referring to users but to devs; if they were users, it would have been fine by me.)

Anyway, I was now reflecting that it would be interesting to have an i18n project. Such a project could, for instance, take care of starting the increased UTF-8 support in Gentoo that has been discussed more than once and that nobody actually worked on. Yes, GDP and GWN already have a lot of translators, which allows them to provide documentation and news in many different languages, but the general support for other languages is poor, starting from the metadata, which lacks lots of languages and is not always updated.

One of the problems is that most of the GDP translators have no access to the gentoo-x86 module, being doc developers. I think an i18n project should take care of this too.

Trying to summarise my thoughts now: this project would be in symbiosis with GDP and GWN, so that translators can be a resource (if they want) for both; it would require gentoo-x86 committers to update metadata there; it would take care of making portage i18n-capable (always assuming the portage devs like the idea, of course), and of updating the translations for the software in portage that isn’t already up to date. There is also a lot of Gentoo-related stuff that could use some i18n support, although some of it could be very tricky (think of pax-utils and portage-utils, which I don’t think vapier and solar are much interested in making i18n-capable :) ).

I should really try to find some other developers interested in this, and then start thinking of proposing this officially…

My translation scandal

Maybe I will draw more and more flames onto myself this way, but I’d like this to actually become something more widespread.

I fully translated xine-lib today, all the messages but a huge one that is a help screen for a plugin (too long for me to translate myself).

Now if somebody wants to provide a new po file to update xine-lib’s, just mail me or the xine-devel mailing list; I’ll put a copy of the file in the patchset for xine-lib and push a new one, so that emerging xine-lib will pick up the updated po file.

This is one of the ways that people can help Free Software even if they can’t program. I hope that we will have enough people out there wanting to update the translations so that we get a better experience for non-English speakers. We already have well-translated documentation; translating software is another interesting thing to do :)

Oh, I didn’t forget about the i18n-able portage; I actually spoke briefly with Antarus, and I hope I’ll be able to convince the portage devs to do that for 2.2 when the new branch opens :)