What’s up with entropy? (lower-case)

You might remember me writing about the EntropyKey — a relatively cheap hardware device that is designed to help the kernel generating random numbers sold by SimTec for which I maintain the main software package (app-crypt/ekeyd).

While my personal crypto knowledge is pretty limited, I have noticed before how bad a depleted entropy can hit ssh sessions and other secure protocols that rely on good random numbers, which is why I got interested in the EntropyKey when I read about it, and ordered it as soon as it was possible, especially as I decided to set up a Gentoo-based x86 router for my home/office network, which would lack most of the usual entropy-gathering sources (keyboard, mouse, disk seeks).

But that’s not been all I cared about, with entropy, anyway: thanks to Pavel, Gentoo gathered support for timer_entropyd as well, which uses the system timers to provide further entropy for the pool. This doesn’t require any special software to work (contrarily to the audio_entropyd and video_entropyd software), even though it does make use of slightly more CPU. This makes it a very good choice for those situations where you don’t want to care about further devices, including the EntropyKey.

Although I wonder if it wouldn’t have made more sense to have a single entropyd software and modules for ALSA, V4L and timer sources. Speaking about which, video_entropyd needs a bit of love, since it fails to build with recent kernel headers because of the V4L1 removal.

Over an year after having started using the EntropyKey, I’m pretty satisfied, although I have a couple of doubts about it, which are mostly related to the software side of the equation.

The first problem seems to be mixed between hardware and software: both on my router and on Yamato when I reboot the system, the key is not properly re-initialised, leaving it in “Unknown” state until I unplug and re-plug it; I haven’t found a way to do this entirely in software, which, I’ll be honest, sucks. Not only it means that I have to have physical access to the system after it reboots, but the router is supposed to work out of the box without me having to do anything, and that doesn’t seem to happen with the EntropyKey.

I can take care of that as long as I live in the same house where the keys are, but that’s not feasible for the systems that are at my customers’. And since neither my home’s gateway (for now) nor (even more) my customers’ are running VPNs or other secure connections requiring huge entropy pools, timer_entropyd seems a more friendly solution indeed — even on an Atom box the amount of CPU that it consumes is negligible. It might be more of an issue once I find a half-decent solution to keep all the backups’ content encrypted.

Another problem relates to interfacing the key with Linux itself; while the kernel’s entropy access works just fine for sending more entropy to the pool, the key needs to be driven by an userland daemon (ekeyd); that daemon can drive the key in either of two options: one is by using the CDC-ACM interface (USB serial interface, used by old-school modems and cellphones), and the other is by relying on a separate userland component using libusb. The reason for two options to exist is not that it provides different advantages, as much as neither does always work correctly.

For instance the CDC-ACM option depends vastly on the version of Linux kernel that one is running, as can be seen by the compatibility table that is on the Key’s website. On the other hand, the userland method seems to be failing on my system, as the userland process seems to be dying from time to time. To be fair, the userland USB method is considered a nasty workaround and is not recommended; even a quick test here graphing entropy data shows that moving from the libusb method to CDC-ACM provides a faster replenishing of the poll.

Thanks to Jeremy I’ve now started using Munin and keeping an eye on the entropy available on the boxes I admin, which includes Yamato itself, the router, the two vservers, the customer’s boxes but, most importantly here, the frontend system I’ve been using daily. The reason why I consider it important to graph the entropy on Raven is that it doesn’t have a Key, and it’s using timer_entropyd instead. On the other hand, it is also a desktop system, and thanks to the “KB SSL” extension for Chrome, most of my network connection go through HTTPS, which should be depleting the entropy quite often. Indeed, what I want to do, once I have a full day graph of Raven’s entropy without interruptions, is comparing that with the sawtooth pattern that you can find on Simtec’s website.

For those wondering why I said “without interruptions”. — I have no intention to let Munin node data be available to any connection, and I’ve not yet looked into binding it up with TLS and verification of certificates, so for now the two vservers are accessed only through SSH port forwarding (and scponly). Also, it looks like Munin doesn’t support IPv6, and my long time customer is behind a double-NAT, with the only connection point I have is creating an SSH connection and forward ports with that. Unfortunately, this method relies on my previously noted tricks which leaves processes running in the background when connection to a box needs to be re-established (for instance because routing to my vservers went away or because the IPv6 address of my customer changed). When that happens, fcron considered the script still running and refuses to start a new job, since I didn’t set the exesev option to allow starting multiple parallel jobs.

And to finish it off, I found a link by looking around about Entropy Broker a software package designed to do more or less what I said above (handle multiple sources of entropy) and distribute them to multiple clients, with a secure transport whereas the Simtec-provided ekey-egd application allows sending data over an already secured channel. I’ll probably look into it, for my local network but more importantly for the vservers, which are very low on entropy, and for which I cannot use timer_entropyd (I don’t have privileges to add data to the pool). As it happens, it OpenSSL, starting version 0.9.7, tries accessing data through the EGD protocol before asking it to the kernel, and that library is the main consumer of entropy for my servers.

Unfortunately packaging it doesn’t look too easy, and thus will probably be delayed for the next few days, especially since my daily job is eating away a number of hours; Plus the package neither use standard build systems, lies about the language it is written in (“EntropyBroker is written in C”, when it’s written in C++), but most importantly, there are a number of daemon services for which init scripts need to be written…

At any rate, that’s a topic for a different post once I can get it to work.

4 thoughts on “What’s up with entropy? (lower-case)

  1. i’ve connected all our servers in different datacenters at work in a VPN using tinc.as every server has a munin-node, they all can be queried securely. even those boxes in the office behind NAT.

    Like

  2. Just a datapoint, but I have not noticed any problems with my 2 EntropyKeys and rebooting?I’m running 2.6.36 and 2.6.32 on the two machines (plus some patches: vserver+grsec+aufs) and using the ACM method of integration?Note that I think you are a bit harsh with your implied message, pointing to their ACM / Kernel compatibility table. As near as I can tell the problem kernels are antique now and if you checkout the issues they were relatively minor and easily fixed? (From memory the most “recent” one was an accidental regression and fixed even in a point release?).Overall I’m extremely happy with my entropy key and rather unhappy with timer_entropyd…I have another machine (AMD) running timer_entropyd and it sits there permanently running at 7% cpu… However, it’s my only option (for entropy) as it’s a rented rack mount machine that I can’t touch.Thanks for your work on packaging the EntropyKey ekeyd !!

    Like

  3. The Linux ACM driver is far from issues-free still… just check the 2.6.38.2 changelog, there are three fixes for CDC-ACM: two possible null-pointer dereference and a memory corruption and panic. So reliability of it is a bit… unreliable.But I wouldn’t be surprised if the reboot issue is with the userland USB driver; the reason why I was using it was that ACM actually panic’ed on me before. I’ve moved Yamato to ACM again, I’ll see the router.

    Like

  4. Sure, but going through the changelog for 2.6.38.2 I see plenty of other big name subsystems with far more than three bug fixes. If it took until 2.6.x.2 to stop the problem then it probably wasn’t happening all that easily, and more to the point those problems are now sorted by definition…I suspect that many bugs in the CDC driver are largely corner cases of what is a fairly big spec. It seems likely that a given device will work well or poorly on any given kernel. Further whenever I have seen someone bring up such a problem on the lkml it seems to get a bunch of attention and quick resolution?My conclusion: some problems, but interest in fixing them and overall “solid enough”

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s