This Time Self-Hosted

Life Of A Core Maintainer: Chase, Blaze, Slumb, Restart

When I originally drafted my post about paychecks and FLOSS, log4shell hadn’t happened yet, and the amount of talk around funding maintainers of FLOSS only scratched the surface at rare times. After the whole discourse around how log4j is used very widely, and has had unpaid maintainers for many years, I posted a bit of a thread on Twitter, but since I’m more of a long-form mild-take person, I thought I would expand on some of the things I said there on this blog.

The first topic I want to tackle is the cyclic nature of maintaining core libraries and tools. And this requires me to define first of all what I mean by “core libraries and tools”. With this expression I refer to libraries and tools that are for the most part self-contained, that do not need to interface much with an “outside”, and that are depended upon by pretty much everything you’re working with, whatever the context is.

For programming languages with standard libraries, the standard library itself is usually a core library, but a very special one at that. For companies that extend their reach to significantly different technology stacks, there’s usually a medium-sized list of libraries and tools that all the business logic is written around — some of what I’m about to write will apply to core systems too, but that might not apply as much. For operating systems, the list can be infinitely long. I’m sure that Windows has core libraries nobody has touched in over twenty years by now, and on Linux we do have tools that release even less frequently than coreutils — I mean, just take which!

So what’s with chase, blaze, slumb, restart? Well, these are four modes that I would use to (over)simplify the life of a maintainer of such a project, in my experience. Please note that while I do have quite a bit of experience on the topic, I am by no means the right person to try to make a business study of it, so don’t take my words for the absolute truth. Just take them for what they are: the experience of a senior engineer who is sometimes a little bitter about the way things turned out.

Originally, I was going to try and refer to this as a boom and bust cycle, which is a term from the economics field relating to expansion and contraction of an economy. But projects like these are not economies, and despite the presence of the same patterns in proprietary, commercial realities, trying to apply an economics metaphor to FLOSS is probably a bad idea. Also, it turns out not to be bimodal at all!

I’ll start with two phases, which is where things may sound complicated but truth be told are probably the easiest phases ever: chase and blaze. These are the formative parts of a project: every project goes through these, because you’re either chasing a competitor (in the largest of senses) or blazing a new trail. If your project tries to solve something that was never solved before, you’re blazing; for anything else, you’re chasing. This might sound unfair to so many projects that have innovated significantly, but this is where things get a little complicated: you may start a project by chasing, and eventually move on to blazing!

These phases are probably the easiest both from the point of view of motivation, and possibly funding, more so if it’s a chase project. The reason for that is that you have a clear goal in mind: catch up with something else, usually meaning reaching feature parity (or close enough to it.) This is usually quantifiable («We’re missing features X, Y, and Z. And we’re XX% faster right now.») and provides enough work to justify hiring someone to work on the project full time for some amount of time.

A blaze project is much easier on motivation («We’re solving the unsolved») but makes funding a bit more of a problem, unless the problem is big enough that you can get investments, or an existing company is willing to fund it (usually because they intend to rely on it.) This is the type of project that often gets spawned by Big Tech, either because they are trying to solve the issue, or because there are enough bored engineers who have nothing better to do than to see what sticks (why are there bored engineers? I’ll get to that in a moment!)

At some point, a chase project reaches its goal — it may have reached feature parity or gone close enough that the difference doesn’t matter. Now you have two options: either you start innovating, and thus turn into a blaze project, or you’ll be entering the slumb phase.

The slumb is when your project is “done”: it works, there’s nothing left to chase, and there’s no current interesting trail to blaze. You “just” need to keep it running, which usually means nudging it around to keep up with the changes in your environment, such as new architectures, compiler changes, and so on. This is where it becomes harder to justify more than maybe one person full time on the project, particularly for companies that would be paying those engineers. Even more so because a lot of the work needed in this phase is to support new environments, something that rarely comes up in medium companies, while in bigger companies it tends to be funded by a different organization than the core project’s (e.g. if a new architecture is being experimented with, it’s usually the hardware organization funding the changes needed to the core tools.)

You would think that you can get out of a slumb and go back to chase and/or blaze — but in my experience that’s very, very rare: slumbering projects tend to be difficult to engage with because “they work”, and nobody really feels like poking at them unless there’s a necessity for it. It is instead significantly more likely that a slumbering core project will be the target of a new chase project, aiming at maybe replacing it with a more modern design, programming language, or freedom from no longer relevant constraints.

Now one thing I want to clarify is that a chase project may be new while taking the original code as it is: you’re free to fork a project and take it into a different direction, but if you need to convince users (which may include teams inside companies, or distributions, or final users) to use your version rather than the original one, then it’s a new project. I’m not going to spend much time going in depth about this topic, but if you’re interested in how to “break the mold” of a project to innovate, you may want to check out The Other Side Of Innovation instead.

You may think that not getting into slumb would then avoid most of the issues: it would allow you to keep a project running with more people on board for longer. But the problem with that is that, once you have run out of projects to chase, and if you haven’t managed to blaze a new trail, your only other alternative to slumb is to keep busy with unnecessary work, and this often leads to further problems, as features are added that are not well understood, tested, or expected. Which (unsurprisingly) is often what ends up creating security vulnerabilities where the issue is not a programming mistake.

It should be noted that this also happens to products, as I touched on when talking about breadwinner products. I think the vast majority of people wonder why products that are working perfectly fine end up changing every few months just for the sake of changing: the reason often is that you either have a fully staffed team that is looking at the service, and wants to have opportunities to demonstrate the impact needed to get promoted, or you get no staffing at all, and now you have a slumbering product that risks being completely unable to react to important changes.

Earlier in this post, I did say that this applies to libraries that don’t have much contact with external environments. The reason why I say that is that anything that has to keep working with the external environment is, de facto, always chasing something. Take FFmpeg for example: even though it basically works for most of its core functionality, and many, many users wouldn’t care or need to update it once it does (that is why many distributions used to take yearly snapshots, before the project started having an actual release process), there’s an external pressure to keep people involved in the project for the long term: there’s always a new (or old) format to decode or encode, and there are always performance savings to find for running on hardware that is older, cheaper, or lower power.

The same basically works for any project that has to do with formats, because at the very least there is the risk that one of the other projects dealing with the same format would cause bugs to be introduced, where a particular file is misgenerated. This puts them in a more reactive mode of operations, but that also allows them to maintain a more stable engagement in terms of development, which can justify the investment from the point of view of a company, as well as provide space for sponsorships from final users. Core projects are unlikely to have a similar pattern.

When talking about internal company core projects, though, there are two more things that come into play, that are much harder to apply to FLOSS projects: the first is outward pressure from migrations, and the second is visibility into the way these projects are used. These are not always positives, but understanding how these fit together should give you a better idea of why big tech companies end up spending a lot more time than you would consider reasonable to build alternative projects.

Migrations are basically a continuous stream in big companies (and even in some medium ones), because while specific projects and products can slumber, a company is always reacting to external pressures: market, regulations, public opinion, and general society changes, and all these pressures build on each other. At the fulcrum of these is often the matter of money, because we’re a capitalistic society (statement of fact, not value judgement): if the market is shrinking, it’s a good idea to cut long-term costs, which might require short-term investment, and even if a breadwinner product is still bringing in money, it’s better if the cost of running the service scales sub-linearly with the number of users, and the time you keep the product running.

So decisions that make perfect sense when working on a product, particularly in the infra space, might not actually be fit for purpose a year or two later: the users might have decided to use the product in a different way, an external factor you depended upon is now quite different, and you’re better off ditching your index-less document storage for a well defined relational database. I can make up literally dozens of scenarios that I have experienced, consulted for, or read about, both in big tech companies, and in companies that are on the border between medium and large, but whatever the cause, the result is the same: even core projects and libraries will be affected, which makes it a bit more sensible to keep a couple of people staffing a project that is otherwise slumbering.

This relates to the next point, which is the visibility. When working on FLOSS projects, you have a very limited view of who’s using them: you can find other FLOSS projects if you spend a lot of time digging (decentralization doesn’t help), but you would never know about software using it behind closed doors. When working on internal projects, it’s much easier to know who’s using them. Particularly when working with a monorepo (or similar mostly-centralized SCM solutions), if you’re assigned to work full time on a core project, you can look at how other engineers are making use of it, and figure out how to improve your project that way.

To be clear, this last part is often controversial: it feels (and I nearly described it, myself) as if you need to justify your continuing employment — that is definitely not the point I want to make here. The point is that you can move out of slumb into blaze, if you can gather ideas on what your project can improve on. And even from the company, capitalistic point of view, it means you can justify not just your continuing employment, but you may even find enough data to request management to assign more people to your project!

This part is probably the hardest problem I see with FLOSS maintenance, as discussed on the back of log4shell: the maintainer of a core project that is used by many companies is unlikely to know that all of them need the same feature, which prevents them from advertising the development of the feature as something that can be sponsored. This is something that VideoLAN had better luck with: during the various conferences and meetups we discussed “bounties” that often were co-sponsored between the association and a few companies needing those features. But even those sometimes were suggested off-the-record, which is not a very sustainable model for projects that don’t already have a strong connection to corporate players.

Unfortunately, as is often the case, I don’t have answers. Well, not easy answers that can be implemented in the framework of the society we live in, and in FLOSS itself. I have already suggested that I feel we need co-operatives for Free Software (at the time I was talking about hosting, but I would say we need more scope than that), and that I don’t agree with most of the wide non-profit organizations we have right now. I just think it is important to have these discussions, even when solutions are not clear (or clearly not feasible.)
