The entropy factor

I have been dedicating my last few posts to the EntropyKey limitations and to shooting down Entropy Broker so I guess it is time for me to bring good news to my users, or at least some more usable news.

Let’s start with some positive news about EntropyKey; I have already suggested in the latter post that the problem I was getting with the key not working after a reboot could have been linked to the use of the userland USB access method. I can now confirm this: my router reboots fine with the CDC-based access. Unfortunately this brings up a different point: Linux’s CDC-ACM driver is far from perfect; as I pointed out, Simtec has a compatibility table for the kernel versions that are known to be defective. And even the latest release (2.6.38) is only partially fine: there are fixes in 2.6.38.2 that relate to possible memory corruption and null pointer dereferences in the drivers. But it definitely makes the EntropyKey much more useful than my earlier post might have suggested.

I also wanted to explain a bit better what I’m trying to look at with entropy, process creation, and load. One of the two EntropyKey devices I own is connected to Yamato, which is the same box that runs the tinderbox. The entropy on Yamato tends to be quite low in general, and when I was using it as my main workstation, it was feeling sluggish, which is why I got the Key. I started monitoring it lately because I was noticing the Key not behaving properly, and besides the already noted issues with the userland access method, I have noticed a different pattern: when the load (and the number of running processes) spikes, the entropy falls for a while, without being replenished in time. I’m still not sure if it’s simply the process creation eating away the entropy (for the ASLR mostly, since Yamato, unlike my router, doesn’t use SSP and thus doesn’t require canaries), or if the build process for ChromiumOS (which is what triggers those spikes) eats up the entropy by itself.
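The kind of monitoring described above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used for these graphs: it assumes a Linux box where the kernel exposes its entropy estimate (in bits) at /proc/sys/kernel/random/entropy_avail, and the function names, sampling interval and threshold are all made up for the example.

```python
import os
import time

# Standard Linux procfs location of the kernel's entropy estimate (in bits).
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())


def sample(count, interval, path=ENTROPY_PATH):
    """Collect `count` (timestamp, load1, entropy) tuples, `interval` seconds apart."""
    rows = []
    for index in range(count):
        load1, _, _ = os.getloadavg()  # 1-minute load average
        rows.append((time.time(), load1, read_entropy(path)))
        if index + 1 < count:
            time.sleep(interval)
    return rows


def dips(rows, threshold):
    """Return the samples whose entropy estimate fell below `threshold` (hypothetical cut-off)."""
    return [row for row in rows if row[2] < threshold]
```

Calling something like dips(sample(60, 1.0), 256) during a build would list the seconds in which the pool dropped below 256 bits, next to the load at that moment, which is enough to eyeball whether the two move together.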

Thanks to upstream, as I said before, I know the key gathers somewhere around 4KiB of entropy per second, which means that if something drains more than 4KiB/second, /dev/urandom will start re-using entropy. I have discarded the notion that it might have been a priority issue, since I made sure to decrease the niceness of the ekeyd daemon and it didn’t make any significant difference; on the other hand, I’m now experimenting with the same interface designed for vservers that Jaervosz helped get into Gentoo. This method splits the process reading data from the key from the one sending it to the kernel. A quick check shows me that it does give better results, which would suggest that there are indeed performance-related problems, more than a high level of crypto consumption, at least on basic usage. I have yet to run a full rebuild of ChromiumOS in this configuration, since it’s past 6am and I’ll probably not sleep today.

Actually, moving to the split EGD server/client interface should hopefully also help me by providing more entropy for Raven (my frontend system) without using timer_entropyd, which has a nasty, completely load-linked dip in the entropy generated when the load spikes. As Alex points out in the comments of the latter post, the quality of entropy also matters.

In parallel with the entropy issues in Yamato, I have entropy issues with my two vservers, one of which runs this blog, the other runs xine’s bugzilla — and both of which seem to have huge trouble with missing entropy, although in slightly different ways. The former, Vanguard, never reached 1KiB of entropy during the last four days when I monitored it, and its daily range is usually below 300 bytes. The latter, Midas, has a spike in entropy availability during the night, when maintenance tasks run (mostly Bugzilla’s); I’m not sure if it’s because Apache then throttles HTTPS access or something else, but otherwise it also has the same range of entropy.

Since both Vanguard and Midas have an average amount of accesses over HTTPS, it made sense to me to consider this lack of entropy as a problem. Unfortunately running timer_entropyd there is not an option: the virtualisation software used does not let me inject further entropy into the system, which is quite a pain in the lower back side. When I first inspected Entropy Broker, I was thrilled to read that OpenSSL supports accessing EGD sockets directly rather than through the kernel-exposed /dev/urandom device, since the EGD sockets are handled entirely in userland, and thus wouldn’t hit the limitation of the virtualised environment. With Entropy Broker being… broken, I was still able to track down an alternative to that: PRNGD, a pseudo-random number generation daemon that works entirely in user-space by using a number of software sources to provide fast and decent random numbers.

OpenSSL, OpenSSH and other software explicitly provide support for accessing EGD sockets, and explicitly refer to PRNGD as an alternative to the OS-provided interfaces. Unfortunately, neither would work out of the box for Gentoo. The former will look into EGD sockets only if the /dev/urandom and /dev/random devices don’t have enough data to provide as many random bytes as requested (which is never going to happen with /dev/urandom), and has to have its sources edited to actually have a chance to use EGD; even so it would (rightly) privilege the /dev/random device. OpenSSH instead requires an extra parameter to be passed to the ./configure call. For those interested, the ekey-egd-linux package can inject this data back into the kernel, like Entropy Broker would have done if it worked.
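To give an idea of what talking to an EGD socket involves, here is a minimal Python sketch of a client. It is based on my reading of the EGD protocol (command byte 0x00 asks for the entropy level and gets a 4-byte big-endian answer; command byte 0x01 plus a count byte asks for up to 255 bytes without blocking and gets back a length byte followed by the data); treat that framing, the helper names, and any socket path you pass in as assumptions to verify against your EGD or PRNGD documentation, not as a description of what OpenSSL itself does.

```python
import socket


def recv_exact(sock, length):
    """Read exactly `length` bytes from `sock`, or raise if the peer hangs up."""
    chunks = []
    remaining = length
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("EGD peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def egd_entropy_bits(sock):
    """Command 0x00: ask how many bits of entropy the daemon holds (assumed framing)."""
    sock.sendall(b"\x00")
    return int.from_bytes(recv_exact(sock, 4), "big")


def egd_read_nonblocking(sock, wanted):
    """Command 0x01: request up to `wanted` bytes; the daemon may grant fewer."""
    if not 0 < wanted <= 255:
        raise ValueError("EGD requests are limited to 1..255 bytes")
    sock.sendall(bytes([0x01, wanted]))
    granted = recv_exact(sock, 1)[0]  # first reply byte: how many bytes follow
    return recv_exact(sock, granted)


def connect_egd(path):
    """Connect to a Unix-domain EGD socket at `path` (path chosen by the admin)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

With a daemon listening on some socket, egd_read_nonblocking(connect_egd(path), 32) would return at most 32 bytes, entirely in userland and without touching the kernel pool, which is the whole appeal on a vserver.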

A tentative ebuild for prngd is in my overlay but I’m not sure if and when I’ll add it to the main tree. First of all, I haven’t compared the entropy’s quality yet, and it might well be that the PRNGD method is going to give very bad results, but I’m also quite concerned about the sources’ license: it seems to be a mess of as-is, MIT and other licenses altogether. While Gentoo is not as strict as Debian in the license handling, it doesn’t sound like something I’d love to keep around with my name tagged on it.

Sigh, 7am, I’ll probably be awakened in two hours again. I really need to get myself some kind of vacation, somewhere I can spend time reading and/or listening to music all day long for a week. Sigh. I wish!

Entropy Broken

In my previous post I noted the presence of Entropy Broker, software designed to gather entropy from a number of sources and then concentrate it to be sent to a number of consumers. My main interest in this was to make use of the EGD interface to feed new entropy to OpenSSL so that it wouldn’t deplete what little is available on my two vservers, where I cannot simply push it via timer_entropyd.

Unfortunately it turned out not to be as simple as building it and writing init scripts for it. The software seems abandoned, with the last release back in 2009, but most importantly it doesn’t work.

I have already hinted in the other post that the website lies about the language the software is written in. In particular, the package declares itself as being written in C, when instead it is composed of a number of C++ source files, and uses the C++ compiler to build. Looking at the code, all of it is written in C style; a quick glance at the compiled results shows that nothing is used from the STL; the only two C++ language symbols that the compiled binaries rely on are the generic new and delete operators (_Znwm and _ZdlPv in mangled form). This alone doesn’t bode well.

After building and setting up the basic services needed by Entropy Broker (the eb hub, server_timers, which takes the place of timer_entropyd, and, in a virtual machine, client_linux_kernel), the results aren’t promising. The entropy pool is not replenished on the virtual machine, ever; network traffic is very limited. More or less the same goes for the EGD client (which is actually an EGD server acting as an eb-client). It is even worse with server_audio, which seems to exit with an error after reading a few data points. server_video doesn’t even build, since it relies on V4L1, which has been dropped from Linux 2.6.38 and later.

Returning for a moment to my EntropyKey problems with the entropy not staying full, I’ve spoken with the EntropyKey developers briefly today. If those downward spikes happen, it usually is because something is consuming entropy faster than the EntropyKey can replenish it, and since the EntropyKey can produce around 4KiB/s of entropy, that means a fast consumption of random data.

As an example, I was told that spawning a process eats 8 bytes of entropy, so something around 500 processes spawned in a second would be enough to beat the Key’s ability to replenish the entropy. This might sound like a lot but it really isn’t, especially when doing parallel builds. Just think that a straight gcc invocation in Gentoo spawns about five processes (the gcc-config wrapper, gcc as the real frontend, cpp to preprocess, cc1 as the real compiler, and as, which is the assembler), and that libtool definitely calls many more for handling inputs and outputs. And remember that the tinderbox builds with make -j12 whenever it can.
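To put numbers on that, here is the back-of-the-envelope calculation as a small Python sketch. The three constants are just the figures quoted above (about 4KiB/s from the Key, 8 bytes per spawned process, about five processes per gcc invocation); none of them is measured here, and the function names are made up for the example.

```python
KEY_RATE_BYTES_PER_SECOND = 4 * 1024  # "around 4KiB/s", as quoted by upstream
BYTES_PER_SPAWN = 8                   # reported cost of spawning one process
PROCESSES_PER_GCC_CALL = 5            # gcc-config wrapper, gcc, cpp, cc1, as


def break_even_spawns(rate=KEY_RATE_BYTES_PER_SECOND, per_spawn=BYTES_PER_SPAWN):
    """Process spawns per second at which consumption equals the Key's output."""
    return rate // per_spawn


def break_even_compiles(per_compile=PROCESSES_PER_GCC_CALL):
    """Whole gcc invocations per second that fit within that spawn budget."""
    return break_even_spawns() // per_compile


print(break_even_spawns())    # 512
print(break_even_compiles())  # 102
```

So under these assumptions roughly a hundred compiler invocations per second, spread across twelve parallel jobs and before counting anything libtool spawns, would already be enough to outrun the Key.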

This seems to match the results I see from the Munin graphs, where entropy is depleted when load spikes, for instance when kicking off a build for ChromiumOS. But now I’m also wondering if the problem is that the ekeyd daemon gets too low a priority when trying to replenish it, which leaves it uncovered. I guess my next step is to add Munin monitoring for the ekeyd data as well as the entropy, to see if I can link the two of them. Do note that the load on Yamato can easily reach 60 and over…
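For the entropy side of that monitoring, a Munin plugin is little more than a script that prints a graph description when called with "config" and a value otherwise. The following Python sketch shows the shape of such a plugin; it is an illustration rather than the plugin behind the graphs mentioned here, the field name and graph labels are arbitrary choices, and it assumes the usual Linux entropy_avail file (which reports bits). Nothing in it covers the ekeyd data.

```python
import sys

# Standard Linux procfs location of the kernel's entropy estimate (in bits).
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def config_lines():
    """Graph description printed when Munin calls the plugin with 'config'."""
    return [
        "graph_title Available entropy",
        "graph_vlabel bits",
        "graph_category system",
        "entropy.label entropy",
    ]


def fetch_lines(path=ENTROPY_PATH):
    """Current value; 'U' tells Munin the value is unknown."""
    try:
        with open(path) as handle:
            return ["entropy.value %d" % int(handle.read().strip())]
    except (OSError, ValueError):
        return ["entropy.value U"]


def main(argv, path=ENTROPY_PATH):
    """Return the plugin output for the given command line."""
    if len(argv) > 1 and argv[1] == "config":
        return "\n".join(config_lines())
    return "\n".join(fetch_lines(path))


if __name__ == "__main__":
    print(main(sys.argv))
```

Graphing this next to the load average is what makes a pattern like the one described above visible at a glance.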

And a final word about timer_entropyd… a quick check seems to suggest that it only works correctly on systems that are mostly idle… my frontend system seems to be just fine in such a context, and indeed it does seem to do a good job there (load during the day never reached 1). It doesn’t seem to be a good idea for Yamato with its high load.

What’s up with entropy? (lower-case)

You might remember me writing about the EntropyKey, a relatively cheap hardware device sold by Simtec that is designed to help the kernel generate random numbers, and for which I maintain the main software package (app-crypt/ekeyd).

While my personal crypto knowledge is pretty limited, I have noticed before how badly a depleted entropy pool can hit ssh sessions and other secure protocols that rely on good random numbers, which is why I got interested in the EntropyKey when I read about it, and ordered it as soon as it was possible, especially as I decided to set up a Gentoo-based x86 router for my home/office network, which would lack most of the usual entropy-gathering sources (keyboard, mouse, disk seeks).

But that’s not been all I cared about with entropy, anyway: thanks to Pavel, Gentoo gained support for timer_entropyd as well, which uses the system timers to provide further entropy for the pool. This doesn’t require any special hardware to work (unlike the audio_entropyd and video_entropyd software), even though it does make use of slightly more CPU. This makes it a very good choice for those situations where you don’t want to care about further devices, including the EntropyKey.

Although I wonder if it wouldn’t have made more sense to have a single entropyd software and modules for ALSA, V4L and timer sources. Speaking about which, video_entropyd needs a bit of love, since it fails to build with recent kernel headers because of the V4L1 removal.

Over a year after having started using the EntropyKey, I’m pretty satisfied, although I have a couple of doubts about it, which are mostly related to the software side of the equation.

The first problem seems to be a mix of hardware and software: both on my router and on Yamato, when I reboot the system, the key is not properly re-initialised, leaving it in “Unknown” state until I unplug and re-plug it; I haven’t found a way to do this entirely in software, which, I’ll be honest, sucks. Not only does it mean that I have to have physical access to the system after it reboots, but the router is supposed to work out of the box without me having to do anything, and that doesn’t seem to happen with the EntropyKey.

I can take care of that as long as I live in the same house where the keys are, but that’s not feasible for the systems that are at my customers’. And since neither my home’s gateway (for now) nor (even more) my customers’ are running VPNs or other secure connections requiring huge entropy pools, timer_entropyd seems a more friendly solution indeed — even on an Atom box the amount of CPU that it consumes is negligible. It might be more of an issue once I find a half-decent solution to keep all the backups’ content encrypted.

Another problem relates to interfacing the key with Linux itself; while the kernel’s entropy access works just fine for sending more entropy to the pool, the key needs to be driven by a userland daemon (ekeyd); that daemon can drive the key in either of two ways: one is by using the CDC-ACM interface (USB serial interface, used by old-school modems and cellphones), and the other is by relying on a separate userland component using libusb. The reason two options exist is not that they provide different advantages, so much as that neither always works correctly.

For instance the CDC-ACM option depends heavily on the version of the Linux kernel that one is running, as can be seen from the compatibility table on the Key’s website. On the other hand, the userland method seems to be failing on my system, as the userland process seems to be dying from time to time. To be fair, the userland USB method is considered a nasty workaround and is not recommended; even a quick test here graphing entropy data shows that moving from the libusb method to CDC-ACM provides a faster replenishing of the pool.

Thanks to Jeremy I’ve now started using Munin and keeping an eye on the entropy available on the boxes I admin, which includes Yamato itself, the router, the two vservers, the customer’s boxes but, most importantly here, the frontend system I’ve been using daily. The reason why I consider it important to graph the entropy on Raven is that it doesn’t have a Key, and it’s using timer_entropyd instead. On the other hand, it is also a desktop system, and thanks to the “KB SSL” extension for Chrome, most of my network connections go through HTTPS, which should be depleting the entropy quite often. Indeed, what I want to do, once I have a full day graph of Raven’s entropy without interruptions, is to compare that with the sawtooth pattern that you can find on Simtec’s website.

For those wondering why I said “without interruptions”: I have no intention to let Munin node data be available to any connection, and I’ve not yet looked into binding it up with TLS and verification of certificates, so for now the two vservers are accessed only through SSH port forwarding (and scponly). Also, it looks like Munin doesn’t support IPv6, and my long-time customer is behind a double-NAT, so the only connection point I have is creating an SSH connection and forwarding ports with that. Unfortunately, this method relies on my previously noted tricks, which leave processes running in the background when the connection to a box needs to be re-established (for instance because routing to my vservers went away or because the IPv6 address of my customer changed). When that happens, fcron considers the script still running and refuses to start a new job, since I didn’t set the exesev option to allow starting multiple parallel jobs.

And to finish it off, by looking around I found a link about Entropy Broker, a software package designed to do more or less what I said above (handle multiple sources of entropy) and distribute the entropy to multiple clients with a secure transport, whereas the Simtec-provided ekey-egd application allows sending data over an already secured channel. I’ll probably look into it, for my local network but more importantly for the vservers, which are very low on entropy, and for which I cannot use timer_entropyd (I don’t have privileges to add data to the pool). As it happens, OpenSSL, starting with version 0.9.7, tries accessing data through the EGD protocol before asking the kernel for it, and that library is the main consumer of entropy for my servers.

Unfortunately packaging it doesn’t look too easy, and thus will probably be delayed for the next few days, especially since my daily job is eating away a number of hours. Plus, the package doesn’t use a standard build system and lies about the language it is written in (“EntropyBroker is written in C”, when it’s written in C++); most importantly, there are a number of daemon services for which init scripts need to be written…

At any rate, that’s a topic for a different post once I can get it to work.