Inclusive Language, Inclusive Documentation

I have been considering writing something about this for a while, but at the same time, I still feel it’s not my place to write about this. Still, in a world of hot takes and shitposting, I guess I find it useful to express my opinion with more than a few hundred threaded characters. Before I dig into a topic I’m not entirely apt for, let me try to give a quick summary of the issue at hand, for those who might not have worked with North Americans long enough to be exposed to this. Feel free to respectfully comment on the post if you think I am simplifying too much.

At the core of the issue is that some of the words that made it into the tech jargon (software and hardware) come from the past, and carry with them a number of racist, sexist, and in general divisive connotations. The reason why I called out North America for this is that the most prominent example of this jargon is the “master” and “slave” pair of words: these are quite obviously inflammatory in cultures that have most of their history tainted by the slave trade. This does not automatically mean these words are okay for the rest of the world, but making the distinction does, in my opinion, make it clearer why some people feel more strongly about this topic than others.

Myself, I’m playing at the lowest difficulty level as a straight, cis, white, married man. Even more, my first language is not English, so for most of my early presence in the tech scene these terms were just terms, without any specific connotation attached. That is why it took a while for me to start caring about inclusive language, which in turn is why I’m writing this post: I have a personal view that we should care about inclusive language, even if we can’t empathise with those affected by the connotations.

See, here’s the thing: these words tend to be jargon carried over, sometimes across more than a couple of different fields, which is how they lost their original, divisive connotations. I can see philologists being able to write entire books on how tech borrowed words and language from so many different fields — unfortunately the one philologist I know is not really doing anything in that field. But together with losing connection with their often racist origin, they tend to be describing things by analogies that, over the years, don’t even apply anymore!

So for example, in addition to the baggage (to say the least), the terms “master” and “slave” don’t correctly describe the relationship between instances of a database server such as MySQL. Replacing them everywhere with more appropriate terms such as “primary” and “replica” (or “leader” and “follower”, depending on how the configuration is set) is going to make the documentation not just more inclusive, but more readable and clear.

Let me qualify something as well: when I talk about documentation, I don’t refer just to the text files or the website of a project, or the comments in the source code. The UI and the messages that are shown to users are just as much documentation as the out-of-band prose, which is why I keep insisting that developers who intend to get their systems, services, or programs used by others should pay attention to documentation. Updating terms in documentation does not mean just doing a find-and-replace throughout the text files accompanying the code; sometimes it requires new APIs (for libraries and frameworks) or updates to the UI to make sure everything is consistent.
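
To make the “new APIs” part a bit more concrete, here is a minimal sketch of one way a library could introduce a better-named function while keeping the old one working for a while. The function names and the returned string are hypothetical, made up for this example, and do not come from any real database client.

```python
import warnings


def connect_to_replica(host: str) -> str:
    """Hypothetical new entry point, named after what it connects to."""
    return f"connected to replica at {host}"


def connect_to_slave(host: str) -> str:
    """Deprecated alias, kept so that existing callers keep working."""
    warnings.warn(
        "connect_to_slave() is deprecated; use connect_to_replica() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this alias
    )
    return connect_to_replica(host)
```

The old name still works, but anyone calling it is nudged towards the new one, and the prose documentation only ever needs to mention the new name.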

Being clear in documentation, and making sure that the UI and the APIs are consistent with the prose, is a hobbyhorse of mine, but I do believe that it can easily make or break a product or a tool. And documentation can become less clear, and thus less usable, when it becomes subject to all manner of opinions and disagreement, not just around inclusive language, but also (maybe even more so) around jokes and humour.

For example, back when I was in the old bubble, I worked for a couple of years on a framework named Sisyphus — I can name it because the SRE Book did name it already. Being a framework, it provided both generic implementations, as well as the ability for teams to build their own automation solution based on it. And since these solutions tended to run in the internal Cloud, you would end up with multiple instances (depending on teams), built into separate packages, with multiple jobs (sometimes even within the same team.) Which means you had situations where people wanted to pluralize the word “Sisyphus.”

Sisyphus is a proper name, not a common noun, so it does not have a plural form, but that didn’t stop people from coming up with suggestions and sometimes making a long statement about it, including “Sisyphuses”, “Sisyphi” (pronounced with the wrong-Latin-US-style “ay” ending), and “Sisyph”. This usually came up in humorous contexts, so it would be all fun and fine at the end of the day, except for the rare case where someone would want to replace one plural form with another in a document, in which case a long comment thread in the margin would ensue and waste everybody’s time.

But one time, just once, I actually raised my tone at somebody for starting a discussion about this. There was a change request being sent by a tech writer, who wanted to introduce an opening paragraph to the official guide for the framework, stating that the plural of Sisyphus should be Sisyphi. And, no. Because that would not just bring a topic ready for bikeshedding into a piece of documentation that should bring clarity, but it would also mean that finding the word “Sisyphus” in a bunch of documents is no longer trivial: you need to look for all the various alternatively spelled plurals. At my flat-out refusal to accept the change, I was questioned on the alternative, so here’s my answer to that: don’t even pluralize names, and instead be descriptive. There would be multiple Sisyphus implementations, built in multiple Sisyphus packages, running multiple Sisyphus jobs, which are often referred to as Sisyphus instances, voilà!

This is relevant to the inclusive language discussion: when you’re looking to replace words that carry negative connotations, don’t just try for a one-to-one replacement, because it might just get people confused or might get them to want to fight back out of spite. Be more descriptive instead, so that the end result is more inclusive tout court. For instance, instead of trying to replace “blacklist” with “blocklist” in places where there’s nothing really to “block”, try to describe what the list is of and for. Package versions disallowed in production could become disallowed_package_versions; tests that are flaky and should not block a release could become known_flaky_tests or non_release_blocking_tests. Yes, it makes the name longer, but you may want to reconsider your lack of CASE if you’re optimizing for typing efficiency. On the other hand, if you’re not supported by an editor with auto-completion, you may find yourself frustrated the Nth time you type blacklist out of habit, only to be reminded that it is one character different now.
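
As a rough sketch of what the descriptive names buy you, here is how the two lists above might read in code. The package name, versions, test names, and helper functions are all hypothetical, invented just for the example.

```python
# Hypothetical data: none of these versions or test names are from a real project.
disallowed_package_versions = {
    "libexample": {"1.2.0", "1.2.1"},
}
known_flaky_tests = {"test_network_timeout"}


def can_deploy(package: str, version: str) -> bool:
    """Return False when this exact version is disallowed in production."""
    return version not in disallowed_package_versions.get(package, set())


def blocks_release(failed_tests: set[str]) -> bool:
    """A release is blocked only by failures outside the known-flaky set."""
    return bool(failed_tests - known_flaky_tests)
```

Neither function needs a comment explaining what the list is for, because the name already says it, which is exactly the context a generic “blocklist” would not give you.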

The other side of the problem, as Paolo Bonzini pointed out on Twitter, is about ensuring that when the text is translated, it would still make sense, and would carry on the inclusivity. It would be a huge wasted opportunity if, by looking for the shortest-distance change to apply for inclusivity’s sake, you were to forgo the ability to be more inclusive in translations.

Translation is, actually, one of the sorest points. One of my earliest jobs was as a translator – I translated Ian Sommerville’s Software Engineering (at the time, 7th edition) into Italian – and some of the terms were very difficult to translate in a way that would both be meaningful and consistent with the industry. Safety, security, reliability, dependability, and trustworthiness are terms that are relatively easy to distinguish from each other in English, but that is not the case when translating them into Italian, as some of their usual translations overlap. Trying to give them distinct translations and be consistent with their usage in other technical literature was hard enough as it was.

When you find yourself replacing terms that have existed for many years, and for which unofficial but commonly accepted translations are already widespread, you are going to get some pushback from non-English speakers not just out of lack of empathy, but also because it literally makes their job harder. Which does not mean that the inclusivity of language is not a real problem, it just means that it takes a lot more effort than just a search-and-replace!

Again, at least for Italian, I would much prefer having to translate a longer phrase that consistently provides meaning rather than having to come up with a translation for “blocklist” without the context necessary to know what it blocks (and how). Trying to provide translations without context leads to traps such as the Twitter-mentioned “about” in Mac OS X 10.3 and 10.4, which was given a single translation into Italian for both “about 30 minutes remaining” and “about this Mac”, leading to the hilarious “Informazioni su 30 minuti rimanenti”, since the translation only fit the latter form.
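
To show why context matters to a translator, here is a small sketch of a lookup keyed on both the context and the source word. The catalog, the context labels, and the translate helper are hypothetical, and “Circa” is my own rendering of the “approximately” sense rather than anything taken from Apple’s strings. Real tooling has a similar idea: GNU gettext has message contexts, which Python exposes as gettext.pgettext.

```python
# Hypothetical catalog keyed by (context, source string). The context labels
# are invented for this example.
ITALIAN = {
    ("menu item", "About"): "Informazioni su",
    ("time estimate", "About"): "Circa",
}


def translate(context: str, text: str, catalog: dict) -> str:
    """Look up a translation by context and text; fall back to the source text."""
    return catalog.get((context, text), text)
```

With a single context-free entry for “About”, only one of the two uses could ever come out right.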

Speaking of translation (and internationalization in general), there’s a ton of jargon (and humour) that is obvious to people who work in the tech industry, or even to people who live in North America or have visited it for long enough, that makes no sense to anyone else around them. When that gets adopted and adapted, it might even grow a life outside of its original reference, as shown by the whole xenophobic shenanigans around the way “yak shaving” no longer refers to a Canadian cartoon. Even when the reference might be obvious across generations in the USA, it might not carry over to one generation later in the rest of the world. Avoiding the culturally obvious jargon in favour of descriptive phrasing makes it more inclusive as well, because clear documentation is inclusive documentation.

P.S.: I have intentionally thrown curve balls at the readers in this post, to point out that ambiguous or non-inclusive terms don’t strictly have to be heavy with negative connotations. I have added links for the most obviously confusing ones; feel free to comment to ask for clarification if anything else I wrote does not quite fit in your cultural frame. Also, this is one of the things I absolutely adore ebooks for, as both with Kindle and with the Boox I can tap on a word to get a definition when it leaves me confused.

P.P.S.: It took me reading a vaguely US Army inspired book series to finally realize that the title scheme used by the vast majority of North American tech firms, and thus by extension reaching into the European tech industry as well, is inspired by military ranks. This becomes clearer when you think that most companies refer to their “6” level (whichever designation you use, such as IC6 in the current bubble) as “Staff Engineer”, which matches the E-6 pay grade for “Staff Sergeant” in the US Army and Marine Corps (OR-6 in NATO). Take this as you will, but in and by itself, it can be a source of conflict.