Distributions are becoming irrelevant: difference was our strength and our liability

For someone who has spent the past thirteen years defining himself as a developer of a Linux distribution (whether I really am still a Gentoo Linux developer or not is up for debate, I’m sure), having to write a title like this is obviously hard. But from the day I started working on open source software to now I have grown a lot, and I have realized I have been wrong about many things in the past.

One thing that I realized recently is that, nowadays, distributions have lost the war. As the title of this post says, difference is our strength, but at the same time it is also the seed of our ruin. Take distributions: Gentoo, Fedora, Debian, SuSE, Archlinux, Ubuntu. They all look and act differently, focusing on different target users, and because of this they differ significantly in which software they make available, which versions are made available, and how much effort is spent on testing, both of the package itself and of the system integration.

Described this way, there is nothing that screams «Conflict!», except at this point we all know that they do conflict, and the solution from many different communities has been to just ignore distributions: developers of libraries for high-level languages built their own packaging (Ruby Gems, PyPI, let’s not even talk about Go), business application developers started by using containers and ended up with Docker, and user application developers have now started converging on Flatpak.

Why the conflicts? A lot of the time the answer is to be found in bickering among developers of different distributions and the «We are better than them!» attitude, which often turned into «We don’t need your help!». Sometimes this went all the way to the negative side, to the point of «Oh, it’s a Gentoo [or other] developer complaining, it’s all their fault and their problem, ignore them.» And let’s not forget the enmity between forks (like Gentoo, Funtoo and Exherbo), with each side trying to prove it is better than the other. A lot of conflict all over the place.

There were of course at least two main attempts to standardise parts of how a distribution works: the Linux Standard Base and FreeDesktop.org. The former is effectively a disaster, the latter is more or less accepted, but the problem lies there: in the more-or-less. Let’s look at these two separately.

The LSB was effectively a commercial effort, aimed at pleasing (effectively) only the distributors of binary packages. It really didn’t provide much assurance about the environment you could build things in, and it never invited non-commercial entities to discuss the reasoning behind the standard. In an environment like open source, the fact that the LSB became an ISO standard is not a badge of honour, but rather a worry that it’s over-specified and over-complicated. Which I think most people agree it is. It also overreaches by specifying the presence of binary libraries, rather than being a set of guidelines for distributions to follow.

And yes, although technically the LSB is still out there working, the last release I could find described on Wikipedia is from 2015, and a first search didn’t even tell me whether they certified any distribution version. Also, because of the nature of certifications, it’s impossible to certify rolling-release distributions, which, as it happens, are becoming much more widespread than they used to be.

I think that one of the problems with the LSB, from both the adoption and the usefulness points of view, is that it focused entirely too much on providing a base platform for binary, commercial applications. Back when it was developed, it seemed like the future of Linux (particularly on the desktop) relied entirely on the ability to develop proprietary software applications that could run on it, the way they do on Windows and OS X. Since many of the distributions didn’t really aim to support this particular environment, convincing them to support the LSB was clearly pointless.

FreeDesktop.org is in a much better state in this regard. They point out that what they write are not standards, but de-facto specifications. Because of this de-facto character, they started by effectively writing down whatever GNOME and Red Hat were doing, but then grew to be significantly more cross-desktop, thanks to KDE and other communities. Because of the nature of the open source community, FD.o specifications are much more widely adopted than the “standards”.

Again, if you compare with what I said above, FD.o provides specifications that make it easier to write, rather than run, applications. It gives you guarantees about where you should be looking for your files, which icons you should be rendering, and which interfaces are exposed. Instead of trying to provide an environment where an in-house application will keep running for the next twenty years (which, admittedly, Windows has provided for a very long time), it provides you with building-block interfaces so that you can create whatever the heck you want and integrate with the rest of the desktop environments.

As it happens, Lennart and his systemd ended up standardizing distributions a lot more than the LSB or FD.o ever did, if nothing else by taking over one of the biggest customization points of them all: the init system. Now, I have complained before that this could probably have been a good topic for a standard even before systemd, and independently of it, one that developers should have been following, but that’s another problem. At the end of the day, there is at least some cross-distribution way to provide init system support, and developers know that if they build their daemon in a certain way, they can provide the init system integration themselves, rather than relying on the packagers.
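To make that last point concrete, this is roughly what upstream-shipped integration looks like under systemd: a minimal, hypothetical unit file (the daemon name and path are made up for illustration) that works on any distribution running systemd, with no per-distribution init script needed:

```ini
# mydaemon.service — hypothetical example shipped by the upstream project
[Unit]
Description=My example daemon
After=network.target

[Service]
# The daemon runs in the foreground; systemd handles
# daemonization, supervision and logging.
ExecStart=/usr/bin/mydaemon --foreground
Restart=on-failure

[Install]
WantedBy=multi-user.target
```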

I feel that we should have had much more of that. When I worked on ruby-ng.eclass and fakegem.eclass, I tried getting the Debian Ruby team, who had voiced similar complaints before, to join me on a mailing list so that we could discuss a common interface between Gems developers and Linux distributions, but once again, that did not actually happen. In hindsight, we should have had a similar discussion for CPAN, CRAN, PyPI, Cargo and so on… and that would probably have spared us the mess that is Go packaging.

The problem is not only getting the distributions to overcome their differences, both in technical direction and in marketing; it also requires sitting at a table with the people who built and use those systems, and actually figuring out what they are trying to achieve. Because, particularly in the case of Gems and the other packaging systems, the people you should talk with are not only your distribution’s users, but most importantly the library authors (whose main interest is shipping stuff so that people can use it) and the developers who use them (whose main interest is being able to fetch and use a library without waiting for months). The distribution users are, for most of the biggest projects, sysadmins.

This means you have a multi-faceted problem to solve, with different roles of people, and different needs for each. Finding a solution that does not compromise, covers 100% of the needs of all the roles involved, and requires no workflow change on anyone’s part is effectively impossible. What you should be doing is focusing on choosing the very important features for the roles critical to the environment (in the example above, the developers of the libraries, and the developers of the apps using those libraries), requiring the minimum amount of changes to their workflow (but convincing them to change the workflow where it really is needed, as long as it’s not more cumbersome than before for no advantage), and figuring out what can be done to satisfy or change the requirements of the “less important” roles (distribution maintainers usually being that role).

Again going back to the example of Gems: it is clear by now that most of the developers never cared about getting their libraries carried by distributions. They cared about the ability to push new releases of their code fast and seamlessly, without having to learn about distributions at all. The consumers of these libraries don’t and shouldn’t care about how to package them for their distributions, or how they even interact with them; they just want to be able to deploy their application with the library versions they tested. And setting aside their trust in distributions, sysadmins only care about having sane handling of dependencies and being able to tell which version of which library is running in production, so as to upgrade it in case of a security issue. Now, the distribution maintainers can become the nexus for all these problems, and solve them once and for all… but they will have to be the ones making the biggest changes in their workflow – which is what we did with ruby-ng – otherwise they will just become irrelevant.

Indeed, Ruby Gems and Bundler, PyPI and VirtualEnv, and now Docker itself, are expressions of that: distributions themselves became a major risk and cost point, by being too different from each other and not providing an easy way to just publish one working library, and use one working library. Those roles are critical to the environment: if nobody publishes libraries, consumers have no library to use; if nobody consumes libraries, there is no point in publishing them. If nobody packages libraries, but there are ways to publish and consume them, the environment still stands.

What would I do if I could go back in time, be significantly more charismatic, and change the state of things? (And I’m saying this for future reference, because if it ever becomes relevant to my life again, I’ll do exactly that.)

  • I would try to convince people that even with divergence in technical direction, discussing and collaborating is a good thing to do. No idea should be dismissed as stupid, idiotic, or any other random negative label. The whole point is that even if you don’t agree on a given direction, you can agree on others; it’s not a zero-sum game!
  • Speaking of which, “overly complicated” is a valid reason to not accept one direction and take another; “we always did it this way” is not a good reason. You can keep doing it your way, but then you’ll end up like Solaris: a very stagnant project.
  • Talk with the stakeholders of the projects that are bypassing distributions, and figure out why they are doing that. Provide “standard” tooling, or at least a proposal on how to do things in such a way that the distributions are still happy, without causing undue burden.
  • Most importantly, talk. Whether it is by organizing mailing lists, IRC channels, birds-of-a-feather sessions at conferences, or whatever else. People need to talk and discuss the issues at hand in the open, in front of the people building the tooling and making the decisions.

I have not done any of that in the past. If I ever get in front of something like this again, I’ll do my best to. Unfortunately, this is a position that, in the universe we’re talking about, would have required more privilege than I had before. Not only the personal training and experience to understand what should have been done, but the means to actually meet with people and organize real-life summits. And while nowadays I have become a globetrotter, I could never have afforded that before.

LibreSSL, OpenSSL, collisions and problems

Some time ago, on the gentoo-dev mailing list, there was an interesting thread on the state of LibreSSL in Gentoo. In particular, I repeated some of my previous concerns about ABI and API compatibility, especially when trying to keep both libraries on the same system.

While I hope that the problems I pointed out are well clear to the LibreSSL developers, I thought reiterating them clearly in a blog post would give them a wider reach, and thus the hope that they can be addressed. Please feel free to reshare this in response to people hand-waving the idea that LibreSSL can be either a drop-in or stand-aside replacement for OpenSSL.

Last year, when I first blogged about LibreSSL, I had to write a further clarification as my post was used to imply that you could just replace the OpenSSL binaries with LibreSSL and be done with it. This is not the case and I won’t even go back there. What I’m concerned about this time is whether you can install the two in the same system, and somehow decide which one you want to use on a per-package basis.

Let’s start with the first question: why would you want to do that? Everybody at this point knows that LibreSSL was forked from the OpenSSL code and started removing code that was deemed unnecessary or even dangerous – a very positive thing, given the amount of compatibility kludges around OpenSSL! – and as such it was a subset of the same interface as its parent, so there would seem to be no reason to want the two libraries on the same system.

But then again, LibreSSL was never meant to be a drop-in replacement, so its developers haven’t cared as much about the evolution of OpenSSL, and just proceeded in their own direction; said direction included building a new library, libtls, that implements higher-level abstractions of the TLS protocol. This vaguely matches the way NSS (the Netscape-now-Mozilla TLS library) is designed, and I think it makes sense: it reduces the amount of repeated code needed in multiple parts of the software stack to implement HTTPS, for instance, reducing the chance of one of them making a stupid mistake.

Unfortunately, this library was originally tied firmly to LibreSSL and there was no way for it to be usable with OpenSSL — I think this has changed recently, as a “portable” build of libtls should be available. Ironically, this wouldn’t have been a problem at all if LibreSSL were a superset of OpenSSL, and this is where the core of the issue lies.

This is far from the first time a problem like this has happened in open source software communities: different people will want to implement the same concept in different ways. I like to describe this as software biodiversity, and I find it generally a good thing. Having more people looking at the same concept from different angles can improve things substantially, especially in regard to finding safe implementations of network protocols.

But there is a problem when you apply parallel evolution to software: if you fork a project and then evolve it on your own agenda, but keep the same library names and a mostly compatible (thus conflicting) API/ABI, you’re going to make people suffer, whether they are developers, consumers, packagers or users.

LibreSSL, libav, Ghostscript, … there are plenty of examples. Since the features of the projects, their API and most definitely their ABIs are not the same, when you’re building a project on top of any of these (or their originators), you’ll end up at some point making a conscious decision on which one you want to rely on. Sometimes you can do that based only on your technical needs, but in most cases you end up with a compromise based on technical needs, licensing concerns and availability in the ecosystem.

These projects didn’t change the names of their libraries; that way they can be used as drop-rebuild replacements (rebuild against the other, no code changes) for consumers that keep to the greatest common divisor of the interface, but it also means you can’t easily install two of them on the same system. And since most distributions, with the exception of Gentoo, don’t really provide users with a choice of multiple implementations, you end up with either a fractured ecosystem, or one that is very much non-diverse.

So if all distributions decide to standardize on one implementation, that’s what developers will write for. And this is why OpenSSL will likely stay the standard for a long while still. Of course, in this case it’s not as bad as the situation with libav/ffmpeg, as the base featureset is going to be more or less the same, and the APIs that have been dropped up to now, such as the entropy-gathering daemon interface, have been considered A Bad Idea™ for a while, so there are not going to be OpenSSL-only source projects in the future.

What becomes an issue here is that software is built against OpenSSL right now, and you can’t really change this easily. I’ve been told before that this is not true, because OpenBSD switched, but there is a huge difference between all of the BSDs and your usual Linux distributions: the former have much more control on what they have to support.

In particular, the whole base system is released in one single sweep, and it generally includes all the binary packages you can possibly install. Very few third-party software providers release binary packages for OpenBSD, and not many more do for NetBSD or FreeBSD. So as long as you use either the binaries provided by those projects or those built by you on the same system, switching the provider is fairly easy.

When you have to support third-party binaries, then you have a big problem, because a given binary may be built against one provider, but depend on a library that depends on the other. So unless you have full control of your system, with no binary packages at all, you’re going to have to provide the most likely provider — which right now is OpenSSL, for good or bad.

Gentoo Linux is, once again, in a more favourable position than many others. As long as you have a full source stack, you can easily choose your provider without considering its popularity. I have built similar stacks before, and my servers deploy stacks similarly, although I have not tried using LibreSSL for any of them yet. But on the desktop it might be trickier, especially if you want to do things like playing Steam games.

But here’s the harsh reality: even if you were to install the libraries in different directories, and provide a USE flag to choose between the two, it is not going to be easy to apply the right constraints between final executables and libraries all the way through the tree.

I’m not sure I have an answer that balances the ability to just make old software use the new library against side-by-side installation. I’m scared that the “solution” that will be found for this problem is bundling, and you can probably figure out that doing so for software like OpenSSL or LibreSSL is a terrible idea, given how fast you need to update in response to a security vulnerability.

Multiple SSL implementations

If you run ~arch you probably noticed a few days ago that all Telepathy-based applications failed to connect to Google Talk. The reason for this was a change in GnuTLS 3.1.7 that made it stricter when checking security parameters’ compliance, and in particular required 816-bit primes on default connections, whereas the GTalk servers provided only 768-bit ones. The change has since been reverted, and version 3.1.8 connects to GTalk by default once again.

Since I hit this the first time I tried to set up KTP on KDE 4.10, I was quite appalled by the incompatibility, and was tempted to stay with Pidgin… but in the end, I found a way around it. The telepathy-gabble package was not doing anything to choose the TLS/SSL backend, but deep inside it (both telepathy-gabble and telepathy-salut bundle the wocky XMPP library — I haven’t checked whether it’s the same code, and thus whether it has to be unbundled), it’s possible to select between GnuTLS (the default) and OpenSSL. I changed it to OpenSSL and everything went fine. Now this is exposed as a USE flag.

But this made me wonder: does it matter at runtime which backend one is using? To be honest, one of the reasons why people use GnuTLS over OpenSSL is licensing concerns, as OpenSSL’s license is, by itself, incompatible with the GPL. But the moment you can ignore the licensing issues, does it make any difference to choose one or the other? It’s a hard question to answer, especially the moment you consider that we’re talking about crypto code, which tends to be written to be optimized for execution speed as well as memory. Without going into details of which one is faster in execution, I would assume that OpenSSL’s code is faster simply due to its age and spread (GnuTLS is not too bad; I have some more reservations about libgcrypt, but never mind that now, since it’s no longer used). Furthermore, Nikos himself noted that sometimes better algorithms have been discarded before because of the FSF copyright assignment shenanigans, which I commented on a couple of months ago.

But more importantly than this, my current line of thought is wondering whether it’s better to have everything GnuTLS or everything OpenSSL — and if it has any difference to have mixed libraries. The size of the libraries themselves is not too different:

        exec         data       rodata        relro          bss     overhead    allocated   filename
      964847        47608       554299       105360        15256       208805      1896175   /usr/lib/libcrypto.so
      241354        25304        97696        12456          240        48248       425298   /usr/lib/libssl.so
       93066         1368        57934         2680           24        14640       169712   /usr/lib/libnettle.so
      753626         8396       232037        30880         2088        64893      1091920   /usr/lib/libgnutls.so

OpenSSL’s two libraries add up to around 2.21MB of allocated memory, whereas GnuTLS is at 1.21MB (more like 1.63MB when adding GMP, which OpenSSL can optionally use). So in general, GnuTLS uses less memory, and it also has much higher shared-to-private ratios than OpenSSL, which is a good thing, as it means it creates smaller private memory areas. But what about loading both of them together?
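As a quick back-of-the-envelope check, the totals can be recomputed from the “allocated” column of the table above (a sketch, taking 1MB = 2^20 bytes; the rounding differs by a hair from the figures I quote in the text):

```python
# Sum the "allocated" column from the size table above, per TLS provider.
MIB = 1024 * 1024  # 1 MB here means 2^20 bytes

openssl = 1896175 + 425298   # libcrypto.so + libssl.so
gnutls = 1091920 + 169712    # libgnutls.so + libnettle.so

print(f"OpenSSL: {openssl / MIB:.2f} MB")  # → OpenSSL: 2.21 MB
print(f"GnuTLS:  {gnutls / MIB:.2f} MB")   # → GnuTLS:  1.20 MB
```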

On my laptop, after changing the two telepathy backends to use OpenSSL, there is no running process using GnuTLS at all. There are still libraries and programs making use of GnuTLS (and most of them without an OpenSSL backend, at least as far as the ebuilds go) — even a dependency of the two backends (libsoup), which still does not load GnuTLS at all here.
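The check behind that observation can be sketched like this: a small script, assuming a Linux-style /proc layout, that lists which running processes actually have a given library mapped (this is my reconstruction of the idea, not the exact commands I ran):

```python
#!/usr/bin/env python3
"""Which running processes have a given shared library mapped?
Assumes a Linux-style /proc layout (/proc/<pid>/maps)."""
import glob
import re


def processes_using(library, maps_glob="/proc/[0-9]*/maps"):
    """Return the sorted PIDs whose memory maps mention `library`."""
    pids = []
    for path in glob.glob(maps_glob):
        match = re.search(r"/(\d+)/maps$", path)
        if not match:
            continue
        try:
            with open(path) as maps:
                if library in maps.read():
                    pids.append(int(match.group(1)))
        except OSError:
            continue  # process exited, or we lack permission
    return sorted(pids)


if __name__ == "__main__":
    print("processes mapping libgnutls:", processes_using("libgnutls"))
```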

While this does not mean that I can get rid of GnuTLS on my system (NetworkManager, GStreamer, VLC, and even the gabble backend use it), it means that I don’t have to keep that library loaded into memory at all times. While 1.2MB is not that big a saving, it’s still a drop in the ocean of memory usage.

So, while sometimes what I call “software biodiversity” is a good thing, other times it only means we end up with a bunch of libraries all doing the same thing, all loaded at the same time for different software… oh well, it’s not like it’s the first time this happens…

My take on the separate /usr issue

This is a blog post I would have definitely preferred not to write — it’s a topic that honestly does not touch me that much, for a few reasons I’ll explore in a moment, and at the same time it’s one that is quite controversial, as it has quite a few meanings layered one on top of the other. Since I’m writing this anyway, I would first like to make sure that readers know who I am and why I’m probably going to just delete comments telling me that I don’t care about compatibility with older systems and other operating systems.

My first project within Gentoo was Gentoo/FreeBSD — I have a (sometimes insane) interest in portability to operating systems that are by far not mainstream. I’m a supporter of what I call “software biodiversity”, and I think that even crazy experiments have the right to exist, if only to learn the tricks and issues to avoid. So please don’t give me the kind of crap I noted above.

So, let’s see — I generally have little interest in keeping things around just for the sake of it, and as I wrote a long time ago, I don’t use a separate /boot in most cases. I also generally dislike legacies for the sake of legacies. I guess it’s thus a good idea to start looking at which legacies bring us to the point of discussing whether /usr should be split. If it’s not to be split, there’s no point debating supporting a split /usr, no?

The first legacy, which is specific to Gentoo, is tied to the fact that our default portage tree is set to /usr/portage, and that the ebuild tree itself, the source files (distfiles), and the built binary packages are all stored there. This particular tree is hungry both in disk space and, even more so, in inodes. Since both the tree and, in general, the open source projects we package keep growing, the amount of these two resources we need increases as well, and since they are by default under /usr, it’s entirely possible that, if this tree’s resources are allocated statically when partitioning, it’ll reach a point where there won’t be enough space, or inodes, to allocate anything in it. If /usr/portage resides in the root filesystem, it’s also very possible, if not very likely, that the system will stop working entirely because there is not enough space available on it.
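For illustration, both resources can be checked at once with `statvfs` — a minimal sketch; on a real Gentoo system you would point it at /usr/portage (the default location), but any mount point works:

```python
#!/usr/bin/env python3
"""Report the two resources the portage tree is hungry for:
free disk space and free inodes on the filesystem hosting a path."""
import os


def fs_headroom(path):
    """Return (free_bytes, free_inodes) for the filesystem containing path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize, st.f_favail


if __name__ == "__main__":
    # Use "/" here so the sketch runs anywhere; pass "/usr/portage" on Gentoo.
    free_bytes, free_inodes = fs_headroom("/")
    print(f"free space:  {free_bytes / (1024 ** 3):.1f} GiB")
    print(f"free inodes: {free_inodes}")
```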

One solution to this problem is to allocate /usr/portage on its own partition — I still don’t like it much as an option, because /usr is supposed to be, according to the FHS/LSB, for read-only data. Most other distributions use subdirectories of /var, as that’s what it is designed for. So why are we using /usr? Well, it turns out that this is something that was inspired by FreeBSD, where /usr is used for just about everything, including temporary directories and other similar uses. Indeed, /usr/portage finds its peer in /usr/ports, which is where Daniel seems to have found the inspiration to write Portage in the first place. It should be an easy legacy to overcome, but migrating it is probably tricky enough that nobody has done so yet. Too bad.

Before somebody asks: yes, until a while ago splitting off the whole /var – which is generally considered much more sensible – was a pain in the neck, among other things because things were using /var/run and other similar paths before the partition could be mounted. The situation is now much better thanks to the fact that /run is available much earlier in the boot process — this is not yet properly handled by all the init scripts out there, but we’re reaching that point, slowly.

Okay, so on to the next issue: when do you want to split /usr at all? Well, this all depends on a number of factors, but I guess the first question is whether you’re installing a new system or maintaining an old one. If you’re installing a new one, I really can’t think of any good reason to split /usr out — the only one that comes to mind is if you want to have it in LVM and keep the rootfs as a standalone partition — and I don’t see why. I’d rather, at that point, put the rootfs in LVM as well, and just use an initrd to accomplish that — if that’s too difficult, well, that’s a reason to fix the way initrds or LVM are handled, not to keep insisting on splitting /usr! Interestingly enough, such a situation calls for the same /boot split I resented five years ago. I still use LVM without having the rootfs in it, and without needing to split /usr at all.

Speaking of which, most ready-to-install distributions only offer the option of using LVM — it makes sense, as you need to cater for as many systems as possible at once. This is why Gentoo Linux is generally disconnected from the rest: the power of doing things for what you want to use them for makes it generally possible to skip the overgeneralization, and that’s why we’re virtually the only distribution out there able to work without an initrd.

Another point that comes up often is a system where the space in the rootfs was badly allocated, and /usr is being split out because there is not enough space. I’m sorry that this is a common issue, and I do know that it’s a pain to re-partition such a system, as it involves at least a minimal downtime. But this is why we have workarounds, including the whole initrd thing. I mean, it’s not that difficult to manage with the initrd, and yes, I can understand that it’s more work than just having the whole system boot without /usr — but it’s a sensible way to handle it, in my opinion. It’s work either way: work for everybody under the sun to get a split /usr working properly, or work for those who got the estimate wrong and now need the split /usr, and you can guess who I prefer doing the work anyway (hint: like everybody in this line of business, I’m lazy).

Some people have said that /usr is often provided over NFS, with a very simple, lightweight rootfs used in these circumstances — I understand this need, but the current solution to support a split /usr is causing the rootfs to not be as simple and lightweight as before — the initrd route is probably the best option in that sense: you just get an initrd able to mount the root over NFS, and you’re done. The only problem with this solution is handling the case where /etc needs to differ from one system to the next, but I’m pretty sure that can be fixed fairly easily as well.

I have to be honest: there is one part of /usr that I do end up splitting away very often: /usr/lib/debug. The reason is simple: it keeps increasing with the size of the sources, rather than with the size of the compiled code, and with new versions of the compilers, which add more debug information. I got to a point where the debug files occupied four or five times the size of the rest of the rootfs. But this is quite the exception.
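If you want to reproduce that comparison, here is a rough sketch (mine, not a standard tool) that sums the file sizes under a tree, counting hardlinked files only once:

```python
#!/usr/bin/env python3
"""Sum the sizes of the regular files under a directory tree,
counting each hardlinked file only once (like `du`, roughly)."""
import os


def tree_size(path):
    """Total size in bytes of the files under path."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished or unreadable
            if (st.st_dev, st.st_ino) in seen:
                continue  # hardlink already counted
            seen.add((st.st_dev, st.st_ino))
            total += st.st_size
    return total
```

Comparing `tree_size("/usr/lib/debug")` against the rest of the rootfs gives the kind of ratio I mention above.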

But why would it be that much of a problem to keep a split /usr? Well, it’s mostly a matter of what you’re supposed to be able to use without /usr being mounted. In many cases, udev was and is the only problem, as people really don’t want much in the way of an early-boot environment besides being able to start LVM and mount /usr; but the big problem happens if you want to be able to have even a single login with /usr not mounted — because the PAM chain has quite a few dependencies that wouldn’t be available until it’s mounted. Moving PAM itself is not much of an option, and it gets worse, because start-stop-daemon can technically also use chains that, in part, need /usr to be available, and if that happens, no init script using s-s-d would be able to run. And that’s bad.

So, do I like the collapsing of everything into /usr? Maybe not that much, because it’s a lot of work to support multiple locations and to migrate configurations. But at the same time I’m not going to bother; I’ll just keep the rootfs and /usr in the same partition for the time being, and if I have to split something out, it’ll be /var.