This Time Self-Hosted

Talk Like A Systems Engineer: Yaks All The Way Down

Professional niches have, since time immemorial, built their own special dialects and languages, sometimes using them as an identification sign for members of the group as opposed to outsiders. This is most obviously identified with the concept of jargon, often with a tinge of negative connotation, but there’s more to it than that. Some of it is composed of metaphors, becoming closer to a form of poetry than technical speech. These are memes built on a shared professional background that can convey a significant amount of information, as long as they are used intentionally, and with an informed audience.

I regularly find myself reaching for these figures of speech, sometimes making them up myself. But I fear that they are often opaque to newcomers. I do not want that to be the case, as I think we already do enough of a disservice to our future colleagues, so I decided to try to define more of this “language” and make it approachable to those who don’t yet have the abovementioned shared professional background.

To make the points stick, and also to inject some smiles into what may otherwise be dry technical content, I commissioned some art, at the suggestion of my friend Alex, to go with the various concepts I’m about to dive into. The vignettes are the awesome work of Furryviza, whom I will totally vouch for when it comes to illustrating complex concepts based on a rough description!

Yak Shaving — I mean, someone will have to do it, no?

This term is particularly well known within bubbles, to the point that Google Dublin has enshrined it in their faux-pub microkitchen The Shaven Yak, but it’s a general industry term; even American Express uses it, sourcing it back to MIT in the ’90s. The semantics of the term tend to drift particularly between groups and teams, but I personally use it to refer to tasks that hold a lot of dependencies around them, most if not all of which are not in scope of the original task but are tightly related to it.

To give an example, say you’re working on a new command line tool to automate a business process, and you end up using a common argument parsing library. The library may already have the ability to parse dates, but as you use it, you realize that it fails to accept ISO8601-style day references, so you may want to go ahead and fix the library to accept them, and write some tests for it. You may also find that the interface you’re meant to call into has some rough edges that mean you either need to apply pre-validation of the input on your tool, or you may want to extend the validation to prevent inputs that would lead to failure cases. And when you’re about to deploy this to the users’ workstations, you figure out that the deployment tool has been misconfigured and is attempting to deliver full debug information on laptops with limited disk space, so you spend half a day trying to fix this up.
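As a rough sketch of the first obstacle, and of handling it on the tool’s side rather than in the library: Python’s standard argparse lets a tool supply its own conversion function for an argument. The tool name report-tool and the --since flag below are made up for illustration, and argparse merely stands in for the unnamed library in the example above.

```python
import argparse
from datetime import date


def iso_date(text: str) -> date:
    """Convert a YYYY-MM-DD string into a date, with an argparse-friendly error."""
    try:
        return date.fromisoformat(text)
    except ValueError as err:
        # ArgumentTypeError makes argparse print a usage error and exit.
        raise argparse.ArgumentTypeError(f"not an ISO 8601 date: {text!r}") from err


def build_parser() -> argparse.ArgumentParser:
    # "report-tool" and "--since" are hypothetical names for this sketch.
    parser = argparse.ArgumentParser(prog="report-tool")
    parser.add_argument("--since", type=iso_date, required=True,
                        help="start day, in YYYY-MM-DD form")
    return parser


args = build_parser().parse_args(["--since", "2024-03-09"])
print(args.since.isoformat())
```

Validating in the tool like this is quick, but it only helps this one tool; fixing the parsing library itself is the part that turns into a yak.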

This is just a made up example, mixing a number of different situations I pretty much did find myself in, but replacing some of the actual obstacles. It’s not dissimilar from the problems I dealt with while fixing packages in Gentoo, but it definitely goes well beyond these situations. If you want a more blow-by-blow telling of how yak shaving becomes part of one’s professional life, you may want to check Danila Kutenin’s post on std::sort — it starts with the premise of the task at hand (improving std::sort performance) and goes into details of a number of problems identified along the way, that needed to be addressed before the task could be completed despite, by themselves, not affecting the performance of std::sort at all!

A pushback that I have heard before about categorizing such work as “yak shaving” is that it is solving real issues. That usually comes from people who have been told that yak shaving is a pointless exercise, done only to keep oneself entertained or look busy. I disagree with this take, as I refer to yak shaving when addressing actual issues, just not issues that got in the way of someone else doing their job before. They are often to be found in someone else’s service, project, or backlog, just never important enough for them to spend their time on.

Yak Styling — Because Sometimes You Just Don’t Have The Time!

Yak shaving is often a controversial activity. On one side, “good engineering practice” (as a number of current and former colleagues would call it) demands you solve issues at the source and for good; on the other side, particularly in a corporate environment, it’s very difficult to justify an “infinite” amount of side-tasks, especially on others’ turf. Different managers have different thresholds for how much time can be spent on yak shaving, but I have never found a manager who’s explicitly happy to let an engineer shave away to their heart’s content (or should I say shear content? Ba-da-bum-tssh!)

And that’s the reason why, back when I was at Google, I changed my internal job title to Yak Stylist (“of the Typing Pool”, though that makes sense basically only to the Dublin SREs of the time). The idea is that sometimes you can’t actually shave the yak you have in front of you, but you can at least make it look presentable, give it a few plaits, so that it’s just not that unruly.

In the contrived example above, I already sneaked in a possible “styling”: if you’re building a client for a service, and the latter does not provide enough input validation, it might be faster to apply the validation locally, rather than going out of your way to fix the service, which might be maintained by a different team. A typical approach of styling over shaving is to make sure that any identified problem is not simply worked around, but also documented — that’s the difference between ignoring the issues you encounter and doing something about them.

Sometimes you can’t just style away problems though, and shaving the yak will make your work easier and cleaner (or at least less messy), or it might just be a matter of making yourself happier. I personally find myself shaving more than styling if I’m still pondering a good solution to my problem, or if I’m spending most of my time in meetings organizing work for others, and I want to feel more satisfied with myself and my work.

Striking a balance between which work to “plait” (file an issue with a project, document a known pitfall, …) and which work to “shave” is something I still struggle with at times, and can only recommend people try to define between themselves and their managers (because, at the end of the day, they are the ones writing the performance reports!)

A Rabbit Hole With A Yak At The End — Yes, It’s A Trap

(Or, if your workplace is less strict about easily misunderstood terminology, you can refer to it as a “yak hole”.)

There’s a relatively famous clip out there from the TV show Malcolm In The Middle, where the father of the protagonist comes home to find a blown light bulb, then goes to pick up a spare only to find the shelf on which the box sits broken, then the drawer with the tools in need of oiling, the can of WD-40 empty, and finally the car not starting.

Some people think that’s yak shaving, but it is rather rabbit holing (or ratholing, depending on how much of a negative connotation you want to apply to it). The problem there is that one thing leads to another, and you end up down the rabbit hole, in a very Carrollian way, fixing different problems that do not have much to do with your original task. The difference is fundamental in my opinion: yak shaving is about many tasks, individually unrelated, all connected to a core task, while rabbit holing is about many tasks, connected to one another, starting from a core task.

The reason why we tend to use two different terms for what is effectively the same activity is related to the spin you put on your work. In my experience, it’s easier to spin it as a rabbit hole when the original task really cannot be completed without going down it, while the rat hole is the connotation you (or your manager) may use for a long series of tasks that may need doing, but did not block you from completing your core task.

If you take the clip above again, we never get to see whether there is a working lightbulb in the box of spares, and that is what would distinguish the two. If there is, the task (changing the lightbulb) could have been completed before setting off on a new task (fixing the shelf); if there isn’t, you could justify attempting to fix the shelf or oil the drawer, since the car would still be needed to buy a new box of bulbs.

There is a third type of “hole”, which I usually refer to as “a rabbit hole with a yak at the end”. These usually start as a reasonable chain of dependencies between tasks, often within one’s team or organization scope, only to eventually end with a much bigger task than the one you started with, one that involves multiple (or external) projects or teams.

To give a concrete example of this from some time ago, I was working on a team that maintained an automation framework, which a number of separate teams depended on. I was making some changes to an API that was an easy way to cause yourself pain, but after landing my change, I got a report that I had broken automations for one of the critical teams we supported. I fixed the issue, but then wondered how much test coverage the implementers of our framework had, and to know that I had to enable coverage reporting. Once I enabled that, a few tests, including some for another critical team, started failing, leading me to discover that we had been assuming certain code was not just tested but in use, while it had never worked to begin with: the test only passed because, in Bazel, a Python test missing an if __name__ == "__main__" block always passed.
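The failure mode is easy to reproduce outside of Bazel with plain Python. This is a minimal sketch, assuming the test file is simply executed as a script (the exact way Bazel ran it is not shown here); the file name and the test class are hypothetical. Without the block, running the file only defines the test class and exits with status 0, so a deliberately failing test still looks green.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# A test that can never pass, written without any entry point.
FAILING_TEST = textwrap.dedent(
    """
    import unittest

    class BrokenTest(unittest.TestCase):
        def test_always_fails(self):
            self.assertEqual(1, 2)
    """
)

# The block whose absence hides the failure.
MAIN_BLOCK = textwrap.dedent(
    """
    if __name__ == "__main__":
        unittest.main()
    """
)


def exit_code_when_run_as_script(source: str) -> int:
    """Write the source to a temporary file, run it, and return the exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "broken_test.py"
        path.write_text(source)
        result = subprocess.run([sys.executable, str(path)], capture_output=True)
        return result.returncode


without_main = exit_code_when_run_as_script(FAILING_TEST)
with_main = exit_code_when_run_as_script(FAILING_TEST + MAIN_BLOCK)
print(f"without main block: {without_main}, with main block: {with_main}")
```

A runner that only looks at the exit status has no way to tell the first case from a test suite that genuinely passed.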

I could follow most of the rabbit hole to that point: adding coverage metrics for the users of our framework was clearly in the remit of my team. Removing the never-actually-working code was a worthy cleanup to undertake. But fixing Bazel to be more sensible in how it handled Python unit tests? Heh, yeah, that was definitely not a yak for me to shave. I could give it a couple of plaits to make it easier for someone with more interest: I filed an internal report about the problem, and provided a workaround (enable coverage reporting, since that execution mode failed the incorrectly-written test), as well as made sure that all of my team’s supported tests were written correctly (spoilers: they weren’t, but that’s a different rabbit hole now!)

Rake Collection — Only You Can Prevent The Next Face Smash!

I have written about rake collection before, which means I won’t be going into intense details over this, but I think it’s an important topic to add words to.

This metaphor is (mostly) my own, although inspired by a tweet by Matthew Garrett. If you consider the obstacles to getting your job done as rakes (which, as Sideshow Bob knows, are extremely painful to step on!), a good engineer should be collecting the rakes they find lying around, rather than throwing more of them around.

The context was in particular related to the concept of seniority (experience bringing the ability to notice rakes without necessarily stepping on them), but it doesn’t have to be limited to that. Any contributor who has just stepped on a rake and smashed their face should consider taking that rake off the floor, and should be rewarded for doing so.

The “rakes” can take many forms. They may be Tanya’s traps, which are more often found in tools and interfaces that can be confusing or difficult to use, or maybe related to processes, or organizational patterns. It might be a confusing log message that points you in the wrong direction, an alert that misdirects you, or the need to go and request a director sign-off on a resource request to complete a routine operation.

During my career I collected dozens, if not hundreds, of rakes. They ranged from fixing hyperlinks so that you could reach the right service in one click (rather than having to backtrack to find the right name to use), to taking ownership of a core library to make sure that the thousands of callers don’t have to manually check if starting up their RPC server succeeded.
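To illustrate that last rake with a sketch (the function names here are hypothetical, and this is plain socket code rather than the actual RPC library): a startup helper that reports failure through a return value leaves every caller responsible for checking it, while one that raises cannot be silently ignored.

```python
import socket
from typing import Optional


def open_listener(host: str, port: int) -> socket.socket:
    """Bind and listen on host:port, raising OSError if that is not possible."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def start_quietly(host: str, port: int) -> Optional[socket.socket]:
    """The rake: returns None on failure and trusts every caller to check."""
    try:
        return open_listener(host, port)
    except OSError:
        return None


def start_or_fail(host: str, port: int) -> socket.socket:
    """The rake collected: a failed startup is raised and hard to miss."""
    try:
        return open_listener(host, port)
    except OSError as err:
        raise RuntimeError(f"could not listen on {host}:{port}") from err


# Occupy a port first, so that a second listener on it cannot start.
occupied = open_listener("127.0.0.1", 0)
busy_port = occupied.getsockname()[1]

quiet_result = start_quietly("127.0.0.1", busy_port)
try:
    start_or_fail("127.0.0.1", busy_port)
    loud_failed = False
except RuntimeError:
    loud_failed = True
finally:
    occupied.close()
```

Making the shared helper behave like the second function fixes the problem once, instead of asking each caller to remember the check.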

As I already noted, seniority affects your ability to notice these rakes. To extend the metaphor, through experience you also gain the ability to grab the rake before it hits you in the face and breaks your glasses. By this I mean that the more you work within a certain team, organization, company, or industry, the easier it becomes to get rid of the problems that less experienced contributors would be facing.

Let me talk about this in the context of more operational teams. In my experience, most teams will end up with a number of “legacy alerts” that members of the rotation who have received them before know how to interpret just by their titles. The title itself may not be an accurate description of the problem, but through experience you end up remembering that when that particular alert fires, it’s likely that a completely different system got stuck. If the alert comes with any documentation, it’s very unlikely that the documentation even points at the right place, and if there are links to dashboards, it’s probable that they don’t even exist anymore.

When a new contributor joins such a rotation, it’s very likely that they will take the alert at face value, and spend time trying to figure out what it means, why it’s firing, where the dashboard moved, and so on. Experience may make the difference between spending hours debugging a problem in the wrong system, which then escalates to a full-blown incident, and realizing that the alert is misleading. At that point, fixing the alert title to be more meaningful, updating the documentation to point at the right system, and possibly removing the broken dashboard link (or replacing it with a more relevant one) would be collecting the rake.

Diagonal Contributions — I Help You All, But You All Help Me!

This is another topic I wrote about in the past. And while the rest of the idioms apply to the industry in general, no matter the size of your reality, this is one of those that only really make sense at bigger companies, since in a small five-person startup, every contribution is effectively a diagonal contribution.

In my experience at least, most big companies end up with teams that can be broadly categorized as “product” or “infrastructure”. This is not a perfect categorization, obviously, because you could make a product out of your infrastructure, or you could build infrastructure to support multiple services in the same product. Indeed, this type of tension is often (again, in my experience) the source of many re-orgs.

When working on a product (or product infrastructure) team, most of your work is oriented towards improving the product (“vertical”), while on an infrastructure team, your work is likely about ensuring that the internal users are satisfied (“horizontal”). This should make it a bit more obvious what I mean by “diagonal”, then: from the point of view of a vertically-focused team, it’s about making sure that the work applies across a range of (internal, usually) users, while still advancing the needs of the product itself.

I needed this metaphor to describe some of the work I accomplished a few years back, and explain to those evaluating it why I didn’t just solve a problem in our tooling, but went out of my way to fix the underlying framework to do the right thing for everyone in the company. From an engineer’s point of view, this may sound trivial, but the truth is that your career depends heavily on how your manager, and their peers, perceive your impact and your work.

Indeed, unlike the other idioms I’m talking about here, this is almost exclusively spin, and can be applied to any of the others. You can “collect a diagonal rake”, like I did when I decided to fix all of the RPC servers so that they would not silently ignore a failure to open their listening ports, rather than just address it in my own team’s code. You can “shave a diagonal yak” when you spend spare time addressing a number of blockers to improving a core library that your team uses but doesn’t maintain.

I have to admit my track record of convincing management about the importance of my work based on this particular idiom has been… spotty at best. I still believe in the concept, though. In a healthy organization, I see that the contributions outside of one’s own specific team are generally celebrated and rewarded — particularly if discussed and scheduled with the stakeholders at the beginning of the work.

So, What? Are The Yaks Taking Over?

Groups, organizations, and industry will always come up with new idioms, and new jargon. It’s shorthand to point people in the general direction of something they have seen and dealt with in the past already. Codifying more of these terms, describing them and opening their meaning to non-members of the group is in my opinion a necessary step to allow more access to an industry that has, for good or bad, taken over the world.

The terms I’m using here are not universal, particularly those that I ended up coining myself (such as yak styling and rake collection), but I have had good luck with having them adopted by my teams. They don’t replace “corporate speak”, particularly as used by upper management, but I find them an important stepping stone to build the shared context that makes work easier.

And if you’re wondering why I am particularly fixated on yaks to explain concepts related to my work, the answer is that I don’t think that professionalism needs to be dry and serious all the time. Injecting a level of humour into what is otherwise a fairly boring description of design, discussion, coding, and evaluation work is part of what makes me happy to keep doing my job.

Try it, and let me know if it made the conversations with your peers less awkward and more fun!

Comments 3
  1. Why would you go to all the trouble of producing all those illustrations for an article about “yac” jargon?

    1. Because it was a good idea a friend of mine suggested me. And once I found the right artist, it was a pleasure to have her draw them.

      Why would I not have more art I like made, given the opportunity? They’re memorable and they had lots of people talking about the post in the first place!
