LibreSSL, OpenSSL, collisions and problems

Some time ago, on the gentoo-dev mailing list, there was an interesting thread on the state of LibreSSL in Gentoo. In particular I repeated some of my previous concerns about ABI and API compatibility, especially when trying to keep both libraries on the same system.

While I hope that the problems I pointed out are well clear to the LibreSSL developers, I thought reiterating them again clearly in a blog post would give them a wider reach and thus hope that they can be addressed. Please feel free to reshare in response to people hand waving the idea that LibreSSL can be either a drop-in, or stand-aside replacement for OpenSSL.

Last year, when I first blogged about LibreSSL, I had to write a further clarification as my post was used to imply that you could just replace the OpenSSL binaries with LibreSSL and be done with it. This is not the case and I won’t even go back there. What I’m concerned about this time is whether you can install the two in the same system, and somehow decide which one you want to use on a per-package basis.

Let’s start with the first question: why would you want to do that? Everybody at this point knows that LibreSSL was forked from the OpenSSL code and started removing code that was deemed unnecessary or even dangerous – a very positive thing, given the amount of compatibility kludges around OpenSSL! – and as such it was a subset of the same interface as its parent, so there would seem to be no reason to want the two libraries on the same system.

But then again, LibreSSL was never meant to be considered a drop-in replacement, so the developers haven’t cared as much for the evolution of OpenSSL, and just proceeded in their own direction; said direction included building a new library, libtls, that implements higher-level abstractions of the TLS protocol. This vaguely matches the way NSS (the Netscape-now-Mozilla TLS library) is designed, and I think it makes sense: it reduces the amount of repetition that needs to be coded in multiple parts of the software stack to implement HTTPS, for instance, reducing the chance of one of them making a stupid mistake.
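
To give an idea of the level at which libtls sits, here is a minimal client sketch using the current libtls API (the signatures have evolved a bit since the library first appeared, so treat this as a rough sketch rather than a reference):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <tls.h>

    int main(void)
    {
        struct tls_config *config;
        struct tls *ctx;
        const char req[] = "GET / HTTP/1.0\r\nHost: www.example.org\r\n\r\n";
        char buf[1024];
        ssize_t n;

        /* One-time library initialisation, then a client context with the
         * default (certificate-verifying) configuration. */
        if (tls_init() != 0 ||
            (config = tls_config_new()) == NULL ||
            (ctx = tls_client()) == NULL)
            return 1;

        if (tls_configure(ctx, config) != 0 ||
            tls_connect(ctx, "www.example.org", "443") != 0 ||
            tls_write(ctx, req, strlen(req)) < 0) {
            fprintf(stderr, "TLS error: %s\n", tls_error(ctx));
            return 1;
        }

        while ((n = tls_read(ctx, buf, sizeof(buf))) > 0)
            fwrite(buf, 1, (size_t)n, stdout);

        tls_close(ctx);
        tls_free(ctx);
        tls_config_free(config);
        return 0;
    }

All the certificate validation and protocol negotiation live behind those few calls, which is exactly the kind of repetition the library is meant to remove from each consumer.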

Unfortunately, this library was originally tied firmly to LibreSSL and there was no way for it to be usable with OpenSSL — I think this has changed recently as a “portable” build of libtls should be available. Ironically, this wouldn’t have been a problem at all if it wasn’t that LibreSSL is not a superset of OpenSSL, as this is where the core of the issue lies.

This is far from the first time a problem like this has come up in Open Source software communities: different people will want to implement the same concept in different ways. I like to describe this as software biodiversity and I find it generally a good thing. Having more people looking at the same concept from different angles can improve things substantially, especially in regard to finding safe implementations of network protocols.

But there is a problem when you apply parallel evolution to software: if you fork a project and then evolve it on your own agenda, but keep the same library names and a mostly compatible (thus conflicting) API/ABI, you’re going to make people suffer, whether they are developers, consumers, packagers or users.

LibreSSL, libav, Ghostscript, … there are plenty of examples. Since the features of the projects, their API and most definitely their ABIs are not the same, when you’re building a project on top of any of these (or their originators), you’ll end up at some point making a conscious decision on which one you want to rely on. Sometimes you can do that based only on your technical needs, but in most cases you end up with a compromise based on technical needs, licensing concerns and availability in the ecosystem.

These projects didn’t change the names of their libraries, so that they can be used as drop-rebuild replacements for consumers that stick to the greatest common divisor of the interface, but that also means you can’t easily install two of them on the same system. And since most distributions, with the exception of Gentoo, do not really provide users with a choice of multiple implementations, you end up with either a fractured ecosystem, or one that is very much non-diverse.

So if all distributions decide to standardize on one implementation, that’s what the developers will write for. And this is why OpenSSL will likely stay the standard for a long while still. Of course in this case it’s not as bad as the situation with libav/ffmpeg, as the base featureset is going to be more or less the same, and the APIs that have been dropped up to now, such as the entropy-gathering daemon interface, have been considered A Bad Idea™ for a while, so there are not going to be many OpenSSL-only source projects in the future.

What becomes an issue here is that software is built against OpenSSL right now, and you can’t really change this easily. I’ve been told before that this is not true, because OpenBSD switched, but there is a huge difference between all of the BSDs and your usual Linux distributions: the former have much more control on what they have to support.

In particular, the whole base system is released in one go, and it generally includes all the binary packages you can possibly install. Very few third party software providers release binary packages for OpenBSD, and not many more do for NetBSD or FreeBSD. So as long as you either use the binaries provided by those projects or those built by you on the same system, switching the provider is fairly easy.

When you have to support third-party binaries, then you have a big problem, because a given binary may be built against one provider, but depend on a library that depends on the other. So unless you have full control of your system, with no binary packages at all, you’re going to have to provide the most likely provider — which right now is OpenSSL, for good or bad.

Gentoo Linux is, once again, in a more favourable position than many others. As long as you have a full source stack, you can easily choose your provider without considering its popularity. I have built similar stacks before, and my servers are deployed with similar stacks, although I have not tried using LibreSSL on any of them yet. But on the desktop it might be trickier, especially if you want to do things like playing Steam games.

But here’s the harsh reality: even if you were to install the libraries in different directories and provide a USE flag to choose between the two, it would not be easy to apply the right constraints between final executables and libraries all the way down the tree.

I’m not sure I have an answer that balances the ability to just make old software use the new library against side-by-side installation. I’m scared that the “solution” people will find to this problem is bundling, and you can probably figure out that doing so for software like OpenSSL or LibreSSL is a terrible idea, given how quickly you need to update in response to a security vulnerability.

Project health, and why it’s important — part of the #shellshock afterwords

Tech media has been in a frenzy this year, trying to hype everything out there as either the end of the Internet of Things or the nail in the coffin of open source. A bunch of opinion pieces I found also tried to imply that open source software is to blame, forgetting that the only reason why the security issues found have been considered so nasty is that we know the affected software is widely used.

First there was Heartbleed, with its discoverers deciding to spend time setting up a cool name, logo and website for it, rather than ensuring it would be patched before it became widely known. Months later, LastPass still tells me that some of the websites I have passwords on have not changed their certificates. This spawned some interest around OpenSSL at least, including the OpenBSD fork, which I’m still not sure is going to stick around or not.

Just a few weeks ago a dump of passwords caused a major stir, as some online news sources kept insisting that Google had been hacked. Similarly, people have been insisting for the longest time that it was only Apple’s fault that the photos of a bunch of celebrities were stolen and published on a bunch of sites — and will probably never be expunged from the Internet’s collective conscience.

And then there is the whole hysteria about shellshock which I already dug into. What I promised on that post is looking at the problem from the angle of the project health.

With the term project health I’m referring to a whole set of issues around an open source software project. It’s something that becomes second nature for a distribution packager/developer, but is not obvious to many, especially because it is not easy to quantify. It’s not a function of the number of commits or committers, the number of mailing lists or the traffic in them. It’s an aura.

That OpenSSL’s project health was terrible was no mystery to anybody. The code base in particular was terribly complicated and catered for corner cases that stopped being relevant years ago, and the LibreSSL developers have found plenty of reasons to be worried. But the fact that the codebase was in such a state, and that the developers didn’t care to follow what the distributors do, or to review patches properly, was not a surprise. You just need to be reminded of the Debian SSL debacle, which dates back to 2008.

In the case of bash, the situation is a bit more complex. The shell is a base component of all GNU systems, and is the FSF’s choice of UNIX shell. The fact that the man page clearly states “It’s too big and too slow.” should tip people off, but it doesn’t. And it’s not just a matter of extending the POSIX shell syntax with enough sugar that people take it for a programming language and start using it as one — but that’s also a big part of what caused this particular issue.

The health of bash was not considered good by anybody involved with it at a distribution level. It certainly was not considered good by me, as I moved to zsh years and years ago, and I have been working for over five years on getting rid of bashisms in scripts. Indeed, I have been pushing, with Roy and others, for the init scripts in Gentoo to be made completely POSIX shell compatible so that they can run with dash or with busybox — even before I was paid to do so for one of the devices I worked on.

Nowadays, the point is probably moot for many people. I think this is the most obvious piece of positive PR for systemd I can think of: no need to think about shells any more, for the most part. Of course that’s not strictly true, but it does solve most of the problems with bashisms in init scripts. And it should solve the problem of using bash as a programming language, except it doesn’t always — but that’s a topic for a different post.

But why were distributors, and Gentoo devs, so wary about bash, way before this happened? The answer is complicated. While bash is a GNU project and the GNU project is the poster child for Free Software, its management has always been sketchy. There is a single developer – The Maintainer, as the GNU website calls him, Chet Ramey – and the sole point of contact with him is the mailing lists. The code is released in dumps: a release tarball for each minor version, and then, every time a new micro version is to be released, a new patch is posted and distributed. If you’re a Gentoo user, you can notice this when emerging bash: you’ll see all the patches being applied one on top of the other.

There is no public SCM — yes, there is a GIT “repository”, but it’s essentially just an import of a given release tarball, with each released patch applied on top of it as a commit. Since these patches represent a whole point release, and they may be fixing different bugs, related or not, it’s definitely not as useful as having a repository where the intent of each change is clearly visible, so that you can figure out what is being done. Reviewing a proper commit-per-change repository is orders of magnitude easier than reviewing diffs between code dumps.

This is not completely unknown in the GNU sphere: glibc has had a terrible track record as well, and only recently, thanks to a lot of combined effort, is sanity being restored. This also includes fixing a bunch of security vulnerabilities found or driven into the ground by my friend Tavis.

But this behaviour is essentially why people like me and other distribution developers have been unhappy with bash for years and years: not the particular vulnerability, but the health of the project itself. I have been using zsh for years, even though I had not installed it on all my servers until now (it’s done now), and I have been pushing for a while for Gentoo to move to /bin/sh being provided by dash; Debian has already done so, and the result is that the vulnerability is way less scary for them.

So yeah, I don’t think it’s happenstance that these issues are being found in projects that are not healthy. And it’s not because they are open source, but rather because they are “open source” in a way that does not help. Yes, bash is open source, but it is not developed in the open like many other projects; it’s developed behind closed doors, with one single leader.

So remember this: be open in your open source project, it makes for better health. And try to get more people than you involved, and review publicly the patches that you’re sent!

LibreSSL: drop-in and ABI leakage

There has been some confusion, after my previous post, with Bob Beck of LibreSSL on whether I would advocate for using a LibreSSL shared object as a drop-in replacement for an OpenSSL shared object. Let me state this here, boldly: you should never, ever, for any reason, use shared objects from different major/minor OpenSSL versions or implementations (such as LibreSSL) as drop-in replacements for one another.

The reason is, obviously, that the ABI of these libraries differs, sometimes subtly enough that they may actually load and run, but then perform abysmally insecure operations, as their data structures will have changed, and now instead of reading your randomly-generated key, you may be reading the master private key. And in general, for other libraries you may even be calling the wrong set of functions, especially for those written in C++, where the vtable contents may be rearranged across versions.

What I was discussing in the previous post was the fact that lots of proprietary software packages, by bundling a version of Curl that depends on the RAND_egd() function, will require either unbundling it, or keeping around a copy of OpenSSL to use for runtime linking. And I think that is a problem that people need to consider now rather than later, for a very simple reason.

Even if LibreSSL (or any other reimplementation, for that matter) takes hold as the default implementation for all Linux (and non-Linux) distributions, you’ll never be able to fully forget about OpenSSL: not only if you have proprietary software that you maintain, but also because a huge amount of software (and especially hardware) out there will not be able to update easily. And the fact that LibreSSL is throwing away so much of the OpenSSL clutter also means that it’ll be more difficult to backport fixes — while at the same time I think that a good chunk of the black hattery will focus on OpenSSL, especially if it feels “abandoned”, while most of the users will still be using it somehow.

But putting aside the problem of the direct drop-in incompatibilities, there is one more problem that people need to understand, especially Gentoo users and users of most other systems that do not completely rebuild their package set when replacing a library like this. The problem is what I would call “ABI leakage”.

Let’s say you have a general libfoo that uses libssl; it uses a subset of the API that works with both OpenSSL and LibreSSL. Now you have a bar program that uses libfoo. If the library is written properly, then it’ll treat all the data structures coming from libssl as opaque, providing no way for bar to call into libssl without depending on the SSL API du jour (and thus putting a direct dependency on libssl for the executable). But it’s very well possible that libfoo is not well written and actually treats the libssl API as transparent. For instance, a common mistake is to use one of the SSL data structures inline (rather than as a pointer) in one of its own public structures.
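
As a purely hypothetical illustration (libfoo does not exist, and the struct names are made up), with the OpenSSL 1.0.x headers of the time, where these data structures were still fully declared, both of the following compiled — but only the second keeps libfoo’s ABI independent of libssl’s:

    #include <openssl/ssl.h>

    /* Bad: the SSL context is embedded by value, so sizeof(SSL_CTX) – and
     * thus the layout of libssl's internals – leaks into libfoo's own
     * public ABI. */
    struct foo_session_bad {
        SSL_CTX context;
        int flags;
    };

    /* Better: an opaque pointer; only libfoo's own code needs to know the
     * real layout, and rebuilding libfoo against the new libssl fixes it. */
    struct foo_session_good {
        SSL_CTX *context;
        int flags;
    };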

This situation would be barely fine, as long as the data types for libfoo are also completely opaque, as then it’s only the code for libfoo that relies on the structures, and since you’re rebuilding it anyway (as libssl is not ABI-compatible), you solve your problem. But if we keep assuming a worst-case scenario, then you have bar actually dealing with the data structures, for instance by allocating a sized buffer itself, rather than calling into a proper allocation function from libfoo. And there you have a problem.

Because now the ABI of libfoo is not defined only by its own code, but also by whichever ABI libssl has! It’s a problem similar to using the symbol table as an ABI proxy: while your software will load and run (for a while), you’re really using a different ABI, as libfoo almost certainly does not change its soname when it’s rebuilt against a newer version of libssl. And that can easily cause crashes and worse (see the note above about dropping in LibreSSL as a replacement for OpenSSL).

Now, honestly, none of this is specific to LibreSSL. The same is true if you were to try using OpenSSL 1.0 shared objects for software built against OpenSSL 0.9 — which is why I cringed any time I heard people suggesting the use of symlinks at the time, and it seems like people are giving the same suicidal suggestion now with OpenSSL, according to Bob.

So once again, don’t expect binary compatibility across different versions of OpenSSL, LibreSSL, or any other implementation of the same API, unless they explicitly aim for that (and LibreSSL definitely doesn’t!).

LibreSSL and the bundled libs hurdle

It was over five years ago that I ranted about the bundling of libraries and what that means for vulnerabilities found in those libraries. The world has, since, not really listened. RubyGems still keeps insisting that “vendoring” gems is good, Go explicitly didn’t implement a concept of shared libraries, and let’s not even talk about Docker or OSv and their absolutism in static linking and bundling of, essentially, the whole operating system.

It should have been obvious how this can be a problem when Heartbleed came out: bundled copies of OpenSSL would have needed separate updates from the system libraries. I guess lots of enterprise users of such software were saved only by the fact that most of the bundlers ended up using older versions of OpenSSL where heartbeat was not implemented at all.

Now that we’re talking about replacing the OpenSSL libraries with those coming from a different project, we’re going to be hit by both edges of the proprietary software sword: bundling and ABI compatibility, which will make things really interesting for everybody.

You may have seen the (short, incomplete) list of RAND_egd() users which I posted yesterday. While the tinderbox from which I took it is out of date and needs cleaning, it is a good starting point to figure out the trends, and as somebody already pointed out, the bundling is actually strong.

Software that bundles Curl, or even Python, but then relies on the system copy of OpenSSL, will now be looking for RAND_egd() and thus fail. You could unbundle these libraries and then use a proper, patched copy of Curl from the system, where the usage of RAND_egd() has been removed — but then again, this is what I’ve been advocating for, well, forever. With caveats, in the case of Curl.

But now, if the use of RAND_egd() is actually coming from the proprietary bits themselves, you’re stuck and you can’t use the new library: you either need to keep around an old copy of OpenSSL (which may be buggy and expose even more vulnerabilities) or you need a shim library that only provides ABI compatibility on top of the new LibreSSL-provided library — I’m still not sure why this particular trick is not employed more often, when the changes to a library are only at the interface level and it still implements the same functionality.
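
Such a shim can be tiny. As a minimal sketch (assuming the only missing symbols are the EGD ones, which real binaries may well not be limited to), it would just provide the old entry points and report failure, since nothing talks to an entropy-gathering daemon any more:

    /* egd-shim.c: build with  gcc -shared -fPIC -o libegd-shim.so egd-shim.c
     * and load it alongside (or link it into) the affected binary. */

    int RAND_egd(const char *path)
    {
        (void)path;
        return -1; /* "no bytes could be obtained from the EGD socket" */
    }

    int RAND_egd_bytes(const char *path, int bytes)
    {
        (void)path;
        (void)bytes;
        return -1;
    }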

Now the good news is that, from the list that I produced, the egd functions at least never seemed to be popular among proprietary developers. This is expected, as egd was largely a way to implement /dev/random semantics on non-Linux systems, while the proprietary software that we deal with, at least in the Linux world, can just accept the existence of the devices themselves. So the only problems have to do with unbundling (or replacing) Curl and possibly the Python SSL module. Doing so is not obvious though, as I see from the list that there are at least two copies of libcurl.so.3, which is the older ABI for Curl — although admittedly one comes from the scratchbox SDKs, which could just as easily be replaced with something less hacky.

Anyway, my current task is to clean up the tinderbox so that it’s in a working state, after which I plan to do a full build of all the reverse dependencies of OpenSSL. It’s very possible that there are more entries that should be in the list, since it was built with USE=gnutls set globally, to test GnuTLS 3.0 when it came out.

LibreSSL is taking a beating, and that’s good

When I read about LibreSSL coming from the OpenBSD developers, my first impression was that it was a stunt. My impression of it still hasn’t changed drastically. While I know at least one quite good OpenBSD developer, my impression of the group as a whole is still the same: we have different concepts of security, and their idea of “cruft” is completely out there for me. But this is a topic for some other time.

So seeing the amount of scrutiny from others who are, like me, skeptical of the OpenBSD people being left to their own devices is good news. It keeps them honest, as they say. But it also means that things that wouldn’t otherwise be understood by people not used to Linux don’t get shoved under the rug.

These are not idle musings: I still remember (but can’t find now) an article in which Theo boasted about never having used Linux. And yet he kept insisting that his operating system was clearly superior. I was honestly afraid that the fork-not-a-fork project was going to be handled the same way; I’m positively happy to have been proven wrong so far.

I actually have been thrilled to see that finally there is movement to replace the straight access to /dev/random and /dev/urandom: Ted’s patch to implement a getrandom() system call that can be made compatible with OpenBSD’s own getentropy() in user space. And I’m even happier to see at least one of the OpenBSD/LibreSSL developers pitching in to help shape the interface.
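
The user-space compatibility is indeed straightforward. A rough sketch of a getentropy()-style wrapper on top of the new system call might look like this (calling the raw syscall, since no glibc wrapper existed at the time; my_getentropy is a made-up name):

    #include <errno.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Mimics OpenBSD's getentropy(): fill buf with len random bytes,
     * where len must be at most 256. */
    static int my_getentropy(void *buf, size_t len)
    {
        if (len > 256) {
            errno = EIO;
            return -1;
        }

        while (len > 0) {
            long n = syscall(SYS_getrandom, buf, len, 0);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1; /* e.g. ENOSYS on kernels without the syscall */
            }
            buf = (char *)buf + n;
            len -= (size_t)n;
        }
        return 0;
    }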

Dropping the egd support puzzled me for a moment, but then I realized that there is no point in using egd to feed randomness to the process: you just need to feed entropy to the kernel, and let the process get it the normal way. I have had, unfortunately, quite a bit of experience with entropy-generating daemons, and I wonder if this might be the right time to suggest getting a new multi-source daemon out.
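
For reference, “feeding entropy to the kernel” on Linux boils down to the RNDADDENTROPY ioctl on /dev/random, which is what the various entropy daemons use under the hood; a bare-bones sketch (with no actual entropy source attached, and requiring root) looks like this:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/random.h>

    /* Credit `bits` bits of entropy to the kernel pool, mixing in `len`
     * bytes of data gathered from some external source. */
    int feed_kernel_pool(const unsigned char *data, size_t len, int bits)
    {
        struct rand_pool_info *info;
        int fd, rc;

        fd = open("/dev/random", O_WRONLY);
        if (fd < 0)
            return -1;

        info = malloc(sizeof(*info) + len);
        if (info == NULL) {
            close(fd);
            return -1;
        }

        info->entropy_count = bits;  /* how much to credit, in bits */
        info->buf_size = (int)len;   /* payload size, in bytes */
        memcpy(info->buf, data, len);

        rc = ioctl(fd, RNDADDENTROPY, info);

        free(info);
        close(fd);
        return rc;
    }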

So am I going to just blindly trust the OpenBSD people because “they have a good track record”? No. And to anybody who suggests that you can take over lines and lines of code from someone else’s crypto-related project, remove a bunch of code that you think is useless, and have an immediately good result, my request is to please stop working with software altogether.

[xkcd: Security Holes — Copyright © Randall Munroe]

I’m not saying that they would do it on purpose, or that they aren’t trying to do their darndest to make LibreSSL a good replacement for OpenSSL. What I’m saying is that I don’t like the way, and the motives with which, the project was started. And I think that a reality check, like the one they already got, was due — and is good news.

On my side, once the library gets a bit more mileage I’ll be happy to run the tinderbox against it. For now, I’m re-gaining access to Excelsior after a bad kernel update, and then I’ll just go and search with elfgrep for which binaries use the egd functionality and need to be patched; I’ll post the list on Twitter/G+ once I have it. I know it’s not much, but this is what I can do.

Another personal postmortem: Heartbleed

So I promised I would explain why it took me that long to get the Heartbleed problem sorted out. So here’s what was going on.

Early last month, the company I worked for in Los Angeles was vacating its cabinet at the Los Angeles hosting facility where Excelsior was also hosted. While they offered to continue hosting the server, the fact that I could not log into it remotely over the KVM was making it quite difficult for me to update the kernel and to keep updating the LXC ebuild.

I decided to bite the bullet, and enquired about hosting at the same (new) facility they use, Hurricane Electric. While they have very cheap half-rack hosting, I needed to get a full cabinet to host Excelsior in. At $600/mo it is not cheap, but I can (barely) afford it right now — the positive side of being absolutely socially inept — and possibly I’ll be able to share it with someone else pretty soon.

The bright side of using Hurricane is that I can rely on nigh-infinite public addresses (IPv6), which is handy for a server like Excelsior that basically runs a farm of virtual machines. The problem is that you don’t only need a server: you also need a switch, and a VPN endpoint if you want to put a KVM in there. That’s what I did: I bought a Netgear switch and a Netgear VPN router, and installed the whole thing while I was in the Bay Area for a conference (the Percona Live Conference, if you’re curious).

Unfortunately, Heartbleed was announced in between the server being shut down at the previous DC and it being fully operational in the new one — in particular, it was still in Los Angeles, but powered down. Why does that matter? Well, the vservers where this blog, the xine Bugzilla and other websites run are not powerful enough to build packages for Gentoo, so what I’ve been doing instead is building packages on Excelsior and uploading them to the vservers. This meant that to update OpenSSL, I needed Excelsior running.

Now to get Excelsior running, I spent a full weekend, having to go to the datacenter twice: I couldn’t get the router to work, and after some time my colleague who was kindly driving me there figured out that somehow the switch did not like having port 0 (or 1, depending on how you count) set to a VLAN other than the default, so connecting the router to any other port made it work as expected. I’m still not sure why that is the case.

After that, I was able to update OpenSSL — but the problem was getting a new set of SSL certificates for all the servers. You probably don’t remember my other postmortem, but when my blog’s certificate expired, I was also in the USA, and I had no access to my StartSSL credentials, as they were only on the other laptop. The good news was that I had the same laptop I used that time with me, and I was able to log in and generate new certificates. While at it, I replaced the per-host SNI ones with a wildcard one.

The problem was with the xine certificate: the Class 2 certificate had been issued under the previous account, which I still had no access to (because I never thought I would need it), which meant I could not request a revocation of the certificate. Not only were StartSSL able to revoke it for me anyway, but they also did so free of charge (again, kudos to them!).

What is the takeaway from all of this? Well, for sure I need a backup build host for urgent package rebuilds; I think I may rent a more powerful vserver for that somewhere. Also I need a better way to handle my StartSSL credentials: I had my Zenbook with me only because I planned to do the datacenter work. I think I’ll order one of their USB smartcard tokens and use that instead.

I also ordered another USB SIM-sized card reader, to use with a new OpenPGP card, so expect me to advertise a second GPG key (and if I remember this time, I’ll print some business cards with the fingerprints). This should make it easier for me to access my servers even if I don’t have a trusted computer with me.

Finally, I need to set up PFS, but to do that I need to update Apache to version 2.4, and last time that was a problem because of mod_perl. With a bit of luck I can make sure it works now and I can update. There is also the Bugzilla on xine-project that needs to be updated to version 4; hopefully I can do that tonight or tomorrow.

Multiple SSL implementations

If you run ~arch you probably noticed that, a few days ago, all Telepathy-based applications failed to connect to Google Talk. The reason for this was a change in GnuTLS 3.1.7 that made it stricter when checking security parameters’ compliance, and in particular required 816-bit primes on default connections, whereas the GTalk servers provided only 768-bit ones. The change has since been reverted, and version 3.1.8 connects to GTalk by default once again.
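
For the record, the knob involved is per-session: an application (or, in this case, the wocky layer inside telepathy) can explicitly lower the smallest Diffie-Hellman prime it will accept. A hedged sketch of what such a workaround boils down to, assuming a plain GnuTLS client session:

    #include <gnutls/gnutls.h>

    /* Relax the minimum acceptable DH prime on an already-initialised
     * client session, so that 768-bit server parameters are tolerated. */
    static int relax_dh_minimum(gnutls_session_t session)
    {
        int rc = gnutls_priority_set_direct(session, "NORMAL", NULL);
        if (rc != GNUTLS_E_SUCCESS)
            return rc;

        gnutls_dh_set_prime_bits(session, 768);
        return GNUTLS_E_SUCCESS;
    }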

Since I hit this the first time I tried to set up KTP on KDE 4.10, I was quite appalled by the incompatibility, and was tempted to stay with Pidgin… but in the end, I found a way around this. The telepathy-gabble package was not doing anything to choose the TLS/SSL backend, but deep inside it (both telepathy-gabble and telepathy-salut bundle the wocky XMPP library — I haven’t checked whether it’s the same code, and thus whether it has to be unbundled), it’s possible to select between GnuTLS (the default) and OpenSSL. I changed it to OpenSSL and everything went fine. Now this is exposed as a USE flag.

But this made me wonder: does it matter at runtime which backend one is using? To be honest, one of the reasons why people use GnuTLS over OpenSSL is licensing concerns, as OpenSSL’s license is, by itself, incompatible with the GPL. But once you can ignore the licensing issues, does it make any difference to choose one or the other? It’s a hard question to answer, especially the moment you consider that we’re talking about crypto code, which tends to be written so as to be optimized for execution as well as memory. Without going into details of which one is faster in execution, I would assume that OpenSSL’s code is faster simply due to its age and spread (GnuTLS is not too bad; I have more reservations about libgcrypt, but never mind that now, since it’s no longer used). Furthermore, Nikos himself noted that better algorithms have sometimes been discarded in the past because of the FSF copyright assignment shenanigans, which I commented on a couple of months ago.

But more importantly than this, my current line of thought is wondering whether it’s better to have everything GnuTLS or everything OpenSSL — and if it has any difference to have mixed libraries. The size of the libraries themselves is not too different:

        exec         data       rodata        relro          bss     overhead    allocated   filename
      964847        47608       554299       105360        15256       208805      1896175   /usr/lib/libcrypto.so
      241354        25304        97696        12456          240        48248       425298   /usr/lib/libssl.so
       93066         1368        57934         2680           24        14640       169712   /usr/lib/libnettle.so
      753626         8396       232037        30880         2088        64893      1091920   /usr/lib/libgnutls.so

OpenSSL’s two libraries are around 2.21MB of allocated memory, whereas GnuTLS is 1.21 (more like 1.63 when adding GMP, which OpenSSL can optionally use). So in general, GnuTLS uses less memory, and it also has much higher shared-to-private ratios than OpenSSL, which is a good thing, as it means it creates smaller private memory areas. But what about loading both of them together?

On my laptop, after changing the two telepathy backends to use OpenSSL, there is no running process using GnuTLS at all. There are still libraries and programs making use of GnuTLS (and most of them without an OpenSSL backend, at least as far as the ebuilds go) — even a dependency of the two backends (libsoup), which still does not load GnuTLS at all here.

While this does not mean that I can get rid of GnuTLS on my system (NetworkManager, GStreamer, VLC, and even the gabble backend use it), it means that I don’t have to keep that library loaded into memory at all times. While 1.2MB is not that big a saving, it’s still a drop in the ocean of memory usage.

So, while sometimes what I call “software biodiversity” is a good thing, other times it only means we end up with a bunch of libraries all doing the same thing, all loaded at the same time for different software… oh well, it’s not like it’s the first time this happens…

Choosing an MD5 implementation

Yes, this post might give you a sense of déjà vu, but I’m trying to be more informative than ranty, if I can, two and a half years after the original post…

Yesterday I ranted about gnulib and in particular about the hundred-and-some copies of the MD5 algorithm that it brings with it. Admittedly, the numbers seem more sensational than they would be if you were to count the packages involved, as most of the copies come from GNU Octave, another bunch come from GCC, and so on and so forth.

What definitely surprises me, though, is the way that people rely on said MD5 implementation as if there were no other available. I mean, I can understand GCC not wanting to add any more dependencies that could make it ABI-dependent – just look at the recent, and infamous, gmp bump – but did GNU forget that it has its own hash library?

There are already enough MD5 implementations out there to fill a long list, so how do you choose which one to use, rather than add a new one onto your set? Well, that’s a tricky question. As far as I can tell, the most prominent factors in choosing an implementation for hash algorithms are non-technical:

Licensing is likely to be the most common issue: relying on OpenSSL’s libcrypto means relying on software whose license has been deemed incompatible enough with GPL-2 that there is a documented SSL exception, used by projects that combine the GNU General Public License with those libraries. This is the reason why libgcrypt exists, for the most part; but, continuing GNU’s series of “let’s reinvent the wheel, and then reinvent it again”, GnuTLS (which is supposed to be a replacement for OpenSSL itself) also provides its own implementation. Great.

Availability can be a factor as well: software designed for BSD systems – such as libarchive – will make use of the libcrypto-style interface just fine; the reason is that at least FreeBSD (and I think NetBSD as well) provides those functions in its standard set of libraries, making it the obvious available implementation (I wish we had something like that). Adding a dependency on a piece of software is usually a problem, and that’s why gnulib is used so often (being imported into the project’s sources, it adds no further dependencies). So if your average system configuration already contains an acceptable implementation, then that’s what you should go for.

Given that all the interfaces are almost identical to one another, with the exception of names and structures, and that their actual implementations have to follow the standard to make sense, the lack of technical reasons among the prominent factors for choosing one library over another is generally understandable. So how should one proceed to choose which one to use? Well, I have some general rules of thumb that could be useful.

The first is to use something that you already have available, or are using already, in your project: OpenSSL’s libcrypto, GLib and libav/ffmpeg all provide an implementation of most hash functions. If you already rely on them for some of your needs, simply use their interfaces rather than looking for new ones. If there are bugs in those, or the interface is not good enough, or the performance is not as expected, try to get those fixed instead of cooking your own implementation.
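
For instance, if libcrypto is already among your dependencies, the digest is a few lines away; a quick sketch using the long-standing one-shot helper (which newer OpenSSL releases mark as deprecated in favour of the EVP interface):

    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>

    int main(void)
    {
        const char *msg = "hello world";
        unsigned char digest[MD5_DIGEST_LENGTH];

        /* One-shot digest of the whole buffer. */
        MD5((const unsigned char *)msg, strlen(msg), digest);

        for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        putchar('\n');
        return 0;
    }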

If you are not using any other implementation already, then look at libgcrypt first. The reason I suggest it is that it’s a common library (it is required by GnuPG, among others), it implements all the main hashing algorithms, it’s LGPL so you have very few license concerns, and it doesn’t implement any other common interface, so you don’t risk symbol collision issues, as you would if you used a libcrypto-compatible library and at the same time brought in anything else that used OpenSSL — I have seen that happen with Drizzle a longish time ago.
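
The libgcrypt route is not any harder; a minimal sketch using its one-shot helper (the version-check call doubles as initialisation):

    #include <stdio.h>
    #include <string.h>
    #include <gcrypt.h>

    int main(void)
    {
        const char *msg = "hello world";
        unsigned char digest[16]; /* MD5 digests are 16 bytes */

        /* gcry_check_version() also performs the library initialisation. */
        if (!gcry_check_version(GCRYPT_VERSION))
            return 1;
        gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);

        gcry_md_hash_buffer(GCRY_MD_MD5, digest, msg, strlen(msg));

        for (size_t i = 0; i < sizeof(digest); i++)
            printf("%02x", digest[i]);
        putchar('\n');
        return 0;
    }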

If you’re a contributor to one of the projects that use gnulib’s md5 code… please consider porting it to libgcrypt, all distributors will likely thank you for that!

Additional notes about the smartcard components’ diagram

Yesterday I wrote about smartcard software for Linux, providing a complex UML diagram of the various components involved in accessing the cards themselves. While I was quite satisfied with the results, and I feel flattered by Alon’s appreciation of it, I’d like to write a few notes about it, since it turns out it was far from complete.

First of all, I didn’t make it clear that not all cards and tokens allow for the same stack to be used: for instance, even though the original scdaemon component of GnuPG allows using both PC/SC and CT-API to interface with the readers, it only works with cards that expose the OpenPGP application (on the card itself); this was a by-design omission, mostly because otherwise the diagram would have been pretty much unreadable.

One particularly nasty mistake I made related to the presence of OpenSSL in that diagram; I knew that older OpenSSL versions (0.9.x) didn’t have support for PKCS#11 at all, but somehow I assumed that support had been added in OpenSSL 1.0; it turns out I was wrong, and even with the latest version you’re required to use an external engine – a plugin – that makes the PKCS#11 API accessible to OpenSSL itself, and thus to the software relying on it. This plugin is, as usual, developed by the OpenSC project.
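
To make this concrete, “using the external engine” looks roughly like the following from the application side; a sketch assuming the OpenSC-provided engine is installed and registered under the id "pkcs11", with error reporting trimmed:

    #include <stddef.h>
    #include <openssl/engine.h>

    /* Load and initialise the PKCS#11 engine, pointing it at a provider
     * module (for instance OpenSC's opensc-pkcs11.so). */
    ENGINE *load_pkcs11_engine(const char *module_path)
    {
        ENGINE *e;

        ENGINE_load_builtin_engines();
        e = ENGINE_by_id("pkcs11");
        if (e == NULL)
            return NULL;

        if (!ENGINE_ctrl_cmd_string(e, "MODULE_PATH", module_path, 0) ||
            !ENGINE_init(e)) {
            ENGINE_free(e);
            return NULL;
        }

        return e;
    }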

Interestingly enough, NSS (used by Firefox, Chromium and Evolution among others) supports PKCS#11 natively, and actually seems to use that very interface to access its internal storage. Somehow, even though it has its own share of complexity problems, it makes me feel much more confident. Unfortunately I haven’t been able to make it work with OpenCryptoki, nor have I found how to properly make use of the OpenSC-based devices to store private and public keys accessible to the browser. Ah, NSS also seems to be the only component that allows access to multiple PKCS#11 providers at once.

Also, I found the diagram itself confusing, as software providing similar functionality was placed at different heights in the image, with consumed and provided interfaces connected to any side of a component’s representation. I have thus decided to clean it up, giving more meaning to the vertical placement of the various components, from top to bottom: hardware devices, emulators, hardware access, access mediators, API providers, abstractions, libraries and – finally – user applications.

I also decided to change the colour of three interface connections: the one between applications and NSS, the new one between OpenSSL and its engines, and the one between OpenCryptoki and the ICA library (which now has an interface with an “S390” hardware device: I don’t know the architecture well enough to understand what it accesses, and IBM’s documentation leaves a lot to be desired, but it seems to be something limited to their systems). These three are in-process interfaces (like PKCS#11 and, theoretically, PC/SC and CT-API — more on that in a moment), while SCD, the agent interfaces, Trousers and TPM are out-of-process interfaces (requiring inter-process communication, or communication between the userland process and the kernel services) — I don’t know ICA well enough to tell whether it uses in- or out-of-process communication.

Unfortunately, to give the proper details of in-process versus out-of-process communication, I would need to split up a number of those components into multiple ones, as components such as pcsc-lite and OpenCT use multiple interfaces: an in-process interface to provide the public interface to the application, and an out-of-process interface to access the underlying abstraction; the main difference between the two is where this happens (OpenCT implements all the drivers as standalone processes, and the in-process library accesses those; pcsc-lite instead has a single process that loads multiple driver plugins).

In the new diagram, I have detailed the interaction between GnuPG, OpenSSH and their agents more, simply because that situation is quite complex and even I strained to get it right: GnuPG never speaks directly with either of the SCD implementations; there is always another process, gpg-agent, that interfaces them with GnuPG proper, and it is this agent that provides support for OpenSSH as well. The distinction is fundamental to show that it is quite possible to have the PKCS#11-compatible SCD implementation provide keys for OpenSSH, even though the original ssh-agent (as well as ssh itself) has support for the PKCS#11 interface. Also, the original scdaemon implementation can, in very few cases, speak directly with the USB device via CCID.

I have decided to extend the diagram to incorporate the TPM engine for OpenSSL while at it, making it much more explicit that there is a plugin interface between OpenSSL and PKCS#11. This engine accesses Trousers directly, without going through OpenCryptoki.

Speaking of OpenCryptoki, which I’ll try to write about tomorrow, I decided to expose its internals a bit more than I have done for the rest of the packages for now, mostly because it’ll make it much easier tomorrow to explain how it works if I split it out this way. In particular, though, I’d like to note that it is the only component that does not expose a bridge between a daemon process and the applications using it; instead, it uses an external daemon to regulate access to the devices, and then uses in-process plugin-based communication to abstract access to the different sources of secure key storage.

OpenCT could use a similar split up, but since I’m not going to write about it for now, I don’t think I’ll work more on this. At any rate, I’m very satisfied with the new diagram. It has more details, and is thus more complex, but it also makes it possible to follow more tightly the interaction between various software, and why it is always so difficult to set up smartcard-based authentication (and hey, I still haven’t added PAM to the mix!)

What’s up with entropy? (lower-case)

You might remember me writing about the EntropyKey — a relatively cheap hardware device sold by Simtec, designed to help the kernel generate random numbers, and for which I maintain the main software package (app-crypt/ekeyd).

While my personal crypto knowledge is pretty limited, I have noticed before how badly a depleted entropy pool can hit ssh sessions and other secure protocols that rely on good random numbers, which is why I got interested in the EntropyKey when I read about it, and ordered one as soon as it was possible — especially as I had decided to set up a Gentoo-based x86 router for my home/office network, which would lack most of the usual entropy-gathering sources (keyboard, mouse, disk seeks).

But that’s not all I’ve cared about with entropy, anyway: thanks to Pavel, Gentoo gained support for timer_entropyd as well, which uses the system timers to provide further entropy for the pool. This doesn’t require any special hardware to work (contrary to the audio_entropyd and video_entropyd software), even though it does use slightly more CPU. This makes it a very good choice for those situations where you don’t want to care about further devices, including the EntropyKey.

Although I wonder whether it wouldn’t have made more sense to have a single entropyd program with modules for ALSA, V4L and timer sources. Speaking of which, video_entropyd needs a bit of love, since it fails to build with recent kernel headers because of the V4L1 removal.

Over a year after having started using the EntropyKey, I’m pretty satisfied, although I have a couple of doubts about it, which are mostly related to the software side of the equation.

The first problem seems to be a mix of hardware and software: both on my router and on Yamato, when I reboot the system the key is not properly re-initialised, leaving it in an “Unknown” state until I unplug and re-plug it; I haven’t found a way to do this entirely in software, which, I’ll be honest, sucks. Not only does it mean that I have to have physical access to the system after it reboots, but the router is supposed to work out of the box without me having to do anything, and that doesn’t seem to happen with the EntropyKey.

I can take care of that as long as I live in the same house where the keys are, but that’s not feasible for the systems that are at my customers’. And since neither my home’s gateway (for now) nor (even more) my customers’ are running VPNs or other secure connections requiring huge entropy pools, timer_entropyd seems a more friendly solution indeed — even on an Atom box the amount of CPU that it consumes is negligible. It might be more of an issue once I find a half-decent solution to keep all the backups’ content encrypted.

Another problem relates to interfacing the key with Linux itself; while the kernel’s entropy interface works just fine for sending more entropy to the pool, the key needs to be driven by a userland daemon (ekeyd); that daemon can drive the key in one of two ways: either by using the CDC-ACM interface (the USB serial interface, used by old-school modems and cellphones), or by relying on a separate userland component using libusb. The reason two options exist is not that they provide different advantages, but that neither always works correctly.

For instance, the CDC-ACM option depends vastly on the version of the Linux kernel one is running, as can be seen from the compatibility table on the Key’s website. On the other hand, the userland method seems to be failing on my system, as the userland process seems to be dying from time to time. To be fair, the userland USB method is considered a nasty workaround and is not recommended; even a quick test here graphing entropy data shows that moving from the libusb method to CDC-ACM provides a faster replenishing of the pool.

Thanks to Jeremy I’ve now started using Munin and keeping an eye on the entropy available on the boxes I admin, which include Yamato itself, the router, the two vservers, the customers’ boxes and, most importantly here, the frontend system I’ve been using daily. The reason why I consider it important to graph the entropy on Raven is that it doesn’t have a Key, and it’s using timer_entropyd instead. On the other hand, it is also a desktop system, and thanks to the “KB SSL” extension for Chrome, most of my network connections go through HTTPS, which should be depleting the entropy quite often. Indeed, what I want to do, once I have a full-day graph of Raven’s entropy without interruptions, is to compare it with the sawtooth pattern that you can find on Simtec’s website.

For those wondering why I said “without interruptions”: I have no intention of making the Munin node data available to any connection, and I’ve not yet looked into wrapping it with TLS and certificate verification, so for now the two vservers are accessed only through SSH port forwarding (and scponly). Also, it looks like Munin doesn’t support IPv6, and my long-time customer is behind a double NAT, where the only connection point I have is creating an SSH connection and forwarding ports over it. Unfortunately, this method relies on my previously noted tricks, which leave processes running in the background when the connection to a box needs to be re-established (for instance because routing to my vservers went away, or because the IPv6 address of my customer changed). When that happens, fcron considers the script still running and refuses to start a new job, since I didn’t set the exesev option to allow starting multiple parallel jobs.

And to finish it off, while looking around I found a link to Entropy Broker, a software package designed to do more or less what I said above (handle multiple sources of entropy) and distribute the entropy to multiple clients over a secure transport, whereas the Simtec-provided ekey-egd application only allows sending data over an already-secured channel. I’ll probably look into it, for my local network but more importantly for the vservers, which are very low on entropy, and for which I cannot use timer_entropyd (I don’t have the privileges to add data to the pool). As it happens, OpenSSL, starting with version 0.9.7, tries to access data through the EGD protocol before asking the kernel, and that library is the main consumer of entropy on my servers.

Unfortunately, packaging it doesn’t look too easy, and it will thus probably be delayed for the next few days, especially since my day job is eating away a number of hours. Plus, the package doesn’t use a standard build system, it lies about the language it is written in (“EntropyBroker is written in C”, when it’s written in C++), and, most importantly, there are a number of daemon services for which init scripts need to be written…

At any rate, that’s a topic for a different post once I can get it to work.