This Time Self-Hosted

Inclusive Language, Inclusive Documentation

I have been considering writing something about this for a while, but at the same time, I still feel it’s not my place to write about this. But in a world of hot takes and shitposting, I guess I still find it useful to express my opinion with more than a few hundred threaded characters. But before I dig into a topic I’m not entirely apt for, let me try to give a quick summary of the issue at hand, for those who might not have worked with North Americans long enough to be exposed to this. Feel free to respectfully comment on the post if you think I am simplifying too much.

At the core of the issue is that some of the words that made it into the tech jargon (software and hardware) come from the past, and carry with them a number of racist, sexist, and in general divisive connotations. The reason why I called out North America for this is that the most prominent example of this jargon is the “master” and “slave” pair of words: these are quite obviously inflammatory in cultures that have most of their history tainted by the slave trade. This does not automatically mean these words are okay for the rest of the world, but making the distinction does, in my opinion, make it clearer why some people feel more strongly about this topic than others.

Myself, I’m playing at the lowest difficulty level as a straight, cis white married man. Even more, my first language is not English, so these terms have been, for most of my early tech scene presence, just terms, without any specific connotation attached. That made it take a while for me to start caring about inclusive language, which in turn is why I’m writing this post: I have a personal view that we should care about inclusive language, even if we can’t empathise with those affected by the connotations.

See, here’s the thing: these words tend to be jargon carried over, sometimes across more than a couple of different fields, which is how they lost their original, divisive connotations. I can see philologists being able to write entire books on how tech borrowed words and language from so many different fields — unfortunately the one philologist I know is not really doing anything in that field. But together with losing connection with their often racist origin, they tend to be describing things by analogies that, over the years, don’t even apply anymore!

So for example, in addition to the baggage (to say the least), the terms “master” and “slave” don’t describe correctly the relationship between instances of a database server such as MySQL — replacing them everywhere with more appropriate terms such as “primary” and “replica” (or “leader” and “follower” depending on how the configuration is set) is going to make the documentation not just more inclusive, but more readable and clear.

Let me qualify something as well: when I talk about documentation, I don’t refer just to the text files or the website of a project, or the comments in the source code. The UI and the messages that are shown to users are just as much documentation as the out-of-band prose, which is why I keep insisting that developers who intend to get their systems, services, or programs used by others should pay attention to documentation. Updating terms in documentation does not mean just doing a find-and-replace throughout the text files accompanying the code; sometimes it requires new APIs (for libraries and frameworks) or updates to the UI to make sure everything is consistent.
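To make the “new APIs” part concrete, here is a minimal sketch in Python of how a library could introduce a clearer name while keeping the old one working for existing callers. The class and attribute names (ReplicationConfig, primary_host, master_host) are hypothetical and not taken from any real project.

```python
import warnings


class ReplicationConfig:
    """Hypothetical configuration object; all names are illustrative only."""

    def __init__(self, primary_host, replica_hosts):
        self.primary_host = primary_host
        self.replica_hosts = list(replica_hosts)

    @property
    def master_host(self):
        """Old name, kept as a thin alias so existing callers keep working."""
        warnings.warn(
            "master_host is deprecated; use primary_host instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.primary_host
```

The point of the alias is that the documentation, the UI, and the API can all move to the new term at once, while old callers get a warning rather than a breakage.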

Being clear in documentation, and making sure that the UI and the APIs are consistent with the prose, is a hobbyhorse of mine, but I do believe that it can easily make or break a product or a tool. And documentation can become less clear, and thus less usable, when it becomes subject to all manner of opinions and disagreement, not just around inclusive language, but also (maybe even more so) around jokes and humour.

For example, back when I was in the old bubble, I worked for a couple of years on a framework named Sisyphus — I can name it because the SRE Book did name it already. Being a framework, it provided both generic implementations, as well as the ability for teams to build their own automation solution based on it. And since these solutions tended to run in the internal Cloud, you would end up with multiple instances (depending on teams), built into separate packages, with multiple jobs (sometimes even within the same team.) Which means you had situations where people wanted to pluralize the word “Sisyphus.”

Sisyphus is a proper name, not a noun, so it does not have a plural form, but that didn’t stop people from coming up with suggestions and sometimes making a long statement about it, including “Sisyphuses”, “Sisyphi” (pronounced with the wrong-Latin-US-style “ay” ending), and “Sisyph”. This usually came up in humorous contexts, so it would be all fun and fine at the end of the day, except for the rare case where someone would want to replace one plural form with another in a document, in which case a long comment thread in the margin would ensue and waste everybody’s time.

But one time, just once, I actually raised my voice at somebody for starting a discussion about this. There was a change request being sent by a tech writer, who wanted to introduce an opening paragraph to the official guide for the framework, stating that the plural of Sisyphus should be Sisyphi. And, no. Because that would not just bring a topic ready for bikeshedding into a piece of documentation that should bring clarity, it would also mean that finding the word “Sisyphus” in a bunch of documents is no longer trivial: you need to look for all the various alternatively spelled plurals. At my outright refusal to accept the change, I was questioned on the alternative, so here’s my answer to that: don’t even pluralize names, and instead be descriptive. There would be multiple Sisyphus implementations, built in multiple Sisyphus packages, running multiple Sisyphus jobs, which are often referred to as Sisyphus instances, voilà!

This is relevant to the inclusive language discussion: when you’re looking to replace words that carry negative connotations, don’t just try for a one-to-one replacement, because it might just get people confused or might get them to want to fight back out of spite. Be more descriptive instead, so that the end result is more inclusive tout-court. For instance, instead of trying to replace “blacklist” with “blocklist”, in places where there’s nothing really to “block”, try to describe what the list is of and for. Package versions disallowed in production could become disallowed_package_versions; tests that are flaky and should not block a release could become known_flaky_tests or non_release_blocking_tests. Yes it makes the name longer, but you may want to reconsider your lack of CASE if you’re optimizing for typing efficiency. On the other hand, if you’re not supported by an editor with auto-completion, you may find yourself frustrated at the Nth time that you typed blacklist out of habit, just to be reminded that it is one character different now.
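As a rough illustration of descriptive names, here is a small Python sketch. The package names, versions, test names, and the two helper functions are all made up for the example; only the identifiers disallowed_package_versions and known_flaky_tests come from the paragraph above.

```python
# Hypothetical release-gating data; every entry here is invented for illustration.
disallowed_package_versions = {
    ("libfoo", "1.2.3"),
    ("libbar", "0.9.0"),
}
known_flaky_tests = {"test_network_timeout"}


def can_deploy(package, version):
    """Return True unless this exact package version is disallowed in production."""
    return (package, version) not in disallowed_package_versions


def release_blocking_failures(failed_tests):
    """Return the failed tests that should block a release, sorted by name."""
    return sorted(set(failed_tests) - known_flaky_tests)
```

Each name says what the list contains and what it is for, so the call sites read without needing to look up what is being “blocked”.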

The other side of the problem, as Paolo Bonzini pointed out on Twitter, is about ensuring that when the text is translated, it would still make sense, and would carry on the inclusivity. It would be a huge wasted opportunity if, by looking for the shortest distance change to apply for inclusivity’s sake, you would forgo the ability to be more inclusive in translations.

Translation is, actually, one of the sorest points. One of my earliest jobs was as a translator – I translated Ian Sommerville’s Software Engineering (at the time, 7th edition) to Italian – and some of the terms were very difficult to translate to Italian in a way that would both be meaningful and consistent with the industry. Safety, security, reliability, dependability, and trustworthiness are terms that are relatively easy to distinguish from each other in English, but that is not the case when translating them to Italian, as some of their usual translations overlap, so trying to give them distinct translations and be consistent with their usage in other technical literature was already hard as it was.

When you find yourself replacing terms that have existed for many years and for which unofficial, but commonly accepted, translations are already widespread, you are going to have some pushback from the non-English speakers not just out of lack of empathy, but also because it literally makes their job harder. Which does not mean that the inclusivity of language is not a real problem, it just means that there’s a lot more effort than just a search-and-replace!

Again, at least for Italian, I would much prefer having to translate a longer phrase that consistently provides meaning rather than having to come up with a translation for “blocklist” without the context necessary to know what it blocks (and how). Trying to provide translations without context leads to traps such as the Twitter-mentioned “about” in Mac OS X 10.3 and 10.4, which was given a single translation to Italian, both in the form of “about 30 minutes remaining” and “about this Mac”, leading to the hilarious “Informazioni su 30 minuti rimanenti” since the translation only fit the latter form.
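The “about” trap can be sketched in a few lines of Python. This is a toy in-memory catalog, not how Mac OS X or any real localization toolchain works; the context labels and the translate helper are assumptions made up for the example. Real gettext catalogs offer a comparable mechanism through message contexts (msgctxt).

```python
# Toy catalog keyed by (context, full phrase); contexts and helper are hypothetical.
ITALIAN = {
    ("app menu", "About {name}"): "Informazioni su {name}",
    ("time estimate", "About {minutes} minutes remaining"): "Circa {minutes} minuti rimanenti",
}


def translate(context, message, catalog):
    """Look up a whole phrase in its context; fall back to the source text."""
    return catalog.get((context, message), message)


remaining = translate("time estimate", "About {minutes} minutes remaining", ITALIAN)
print(remaining.format(minutes=30))  # Circa 30 minuti rimanenti
```

Because the translator sees the whole phrase and its context, the two meanings of “about” never share a single entry.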

Speaking of translation (and internationalization in general): there’s a ton of jargon (and humour) that is obvious to people who work in the tech industry, or even to people who live in, or have spent enough time in, North America, that makes no sense to anyone else around them. When that gets adopted and adapted, it might even grow a life outside of its original reference, as shown by the whole xenophobic shenanigans around the way “yak shaving” no longer refers to a Canadian cartoon. Even when the reference might be obvious across generations in the USA, it might not carry over to one generation later in the rest of the world. Avoiding the culturally obvious jargon in favour of descriptive phrasing makes it more inclusive as well, because clear documentation is inclusive documentation.

P.S.: I have intentionally thrown curve balls at the readers in this post, to point out that ambiguous or non-inclusive terms don’t strictly have to be heavy with negative connotations. I have added links for the most obviously confusing ones; feel free to comment to ask for clarification if anything else I wrote does not quite fit in your cultural frame. Also, this is one of the things I absolutely adore ebooks for, as both with Kindle and with the Boox I can tap on a word to get a definition when it leaves me confused.

P.P.S.: It took me reading a vaguely US Army inspired book series to finally realize that the title scheme used by the vast majority of North American tech firms, and thus by extension reaching into the European tech industry as well, is inspired by military ranks — this becomes clearer when you think that most companies refer to their “6” level (whichever designation you use, such as IC6 in the current bubble) as “Staff Engineer” which matches the E-6 for “Staff Sergeant” in many US armed forces (OR-6 in NATO.) Take this as you might, but in and by itself, it can be a source of conflict.

Comments 4
  1. More specific words do have disadvantages, quite similar in fact to your example of the various alternative spellings of “plural Sisyphus”.

    Going back to another example you give, “blacklist”/“blocklist”. Today, an “ip_blacklist” might be an “ip_blocklist”, a “spammer_ip_list”, a “dos_ips_regex”, or many other things. Or maybe it’s still “ip_blacklist”.
    If I’m grep’ing, I will probably think to use the regexp “bl.cklist”, but I am unlikely to come up with all the other alternative spellings.
    Search engines also benefit from the simple substitution, as they handle it. For example if I am trying to get myself removed from one such list, Google Search knows how to translate “blocklist removal”. If, on the other hand, I type “spam list removal” I get entirely different results.

    For people – such as myself – who are quite doubtful about the usefulness of this new way of speaking, search and replace also has the advantage that it gets the paperwork done quickly. I understand this argument may not convince you, but I had to mention it.

    1. I don’t buy your argument. As long as the naming used in a configuration file or UI is consistent with the documentation for the software, you should never find yourself struggling to find it. And if you’re looking how to block an IP address… well the docs would say that, no?

      But I think the main point I want you to think about is that the “new way” of speaking is in no way new, and that at this point the changes will happen both in official docs and common parlance. It’s going to be up to you if you want to be perceived as holding onto a past that has moved on or follow the world into the present and future.

  2. First and foremost, thanks for responding, it is an interesting topic to discuss.

    On the first point, I find it interesting that you are talking about consistency within “*the* documentation of *the* software”. If you look for example at TiDB’s docs, they will sometimes talk of “master”, and sometimes of “leader”, depending on which *other* software TiDB interconnects with (and thus, when and where this other software was written).

    This gets even more interesting in the second part, where you slightly misquoted me as talking about “*the* new way”, while I talked about “*this* new way”.
    As per your suggestion, I will focus on that. I agree that this is one important difference between both of our perspectives. That is: I’m not confident to see this process converging.

    Quite the contrary.
    Geographically, a quick Google Trends search hints that “allowlist” only has significant use in the Seattle, SF, LA and NY metros within the US, and London in the UK. Different words from this “affirmative corpspeak” had different distributions, but all were strongly geographically correlated, with no trend of them gaining any traction in the country as a whole.
    Over time, I have seen the list of banned words grow, further diverging. In my company, different divisions are now growing different lists (some going as far as to include “native speaker” and “mother tongue”), balkanizing the language.
    Socially, it seems to me that, once a word gets a political connotation, convergence is very unlikely. First, because western democracies do have some disagreements, and I could imagine Palantir banning “blocklist”. And second, there is the issue of non-western countries (as in TiDB above).
    I am genuinely curious to know which words the Chinese and Indians will take offense at. Quite possibly, it won’t be “blacklist”.

    In any case, I wish I shared your enthusiasm… I have to deal with this at work, and if I could swallow the Kool-Aid (another forbidden word in my division, by the way) it would help me cope with the cognitive dissonance.

  3. I share your opinion, with some degree of annoyance when the issue is pushed to its limits.

    What is perceived as offensive is a grey area, whose shade depends on culture and origins, as you hinted. It isn’t clear who should draw the line and where it should be drawn to prevent having to rewrite all documentation and code every time a term is newly discovered to be potentially offensive.

    But it has to be drawn because there is a cost to every such change. It’s obviously time spent, as well as a risk of breaking the system, and because multiple words now describe the exact same concept, it creates confusion in people’s minds.

    Taking the lowest entry bar, where everything that could potentially be offensive should be renamed, is a never-ending story. Change master to leader, then maybe replace leader with guide, because leader in German is Führer, and so on.
