I promised I would explain why it took me so long to get the Heartbleed problem sorted out, so here’s what was going on.
Early last month, the company I worked for in Los Angeles was vacating its cabinet at the Los Angeles hosting facility where Excelsior was also hosted. While they offered to keep hosting the server, the fact that I could not log into it remotely through the KVM made it quite difficult for me to update the kernel and to keep updating the LXC ebuild.
I decided to bite the bullet and enquired about hosting at the same (new) facility they moved to, Hurricane Electric. While they offer very cheap half-rack hosting, I needed a full cabinet for Excelsior. At $600/month it is not cheap, but I can (barely) afford it right now, the upside of being absolutely socially inept, and I’ll possibly be able to share it with someone else pretty soon.
The bright side of using Hurricane is that I can rely on nigh-infinite public addresses (IPv6), which is handy for a server like Excelsior that basically runs a farm of virtual machines. The downside is that you don’t need just a server: if you want to put a KVM in there, you also need a switch and a VPN endpoint. That’s what I did: I bought a Netgear switch and a Netgear VPN router, and installed the whole thing while I was in the Bay Area for a conference (the Percona Live Conference, if you’re curious).
Unfortunately, Heartbleed was announced between the server being shut down at the previous DC and it being fully operational in the new one; in particular, it was still in Los Angeles, but powered off. Why does that matter? Well, the vservers where this blog, the xine Bugzilla and other websites run are not powerful enough to build Gentoo packages, so what I’ve been doing instead is building packages on Excelsior and uploading them to the vservers. This meant that to update OpenSSL, I needed Excelsior running.
Getting Excelsior running took me a full weekend and two trips to the datacenter: I couldn’t get the router to work, until my colleague, who was kindly driving me there, figured out that the switch somehow did not like having its port 0 (or 1, depending on how you count) assigned to a VLAN other than the default one; connecting the router to any other port made it work as expected. I’m still not sure why that is the case.
After that, I was able to update OpenSSL, but then I had to get a new set of SSL certificates for all the servers. You probably don’t remember my other postmortem, but when my blog’s certificate expired, I was also in the USA, and I had no access to my StartSSL credentials, as they were only on the other laptop. The good news was that this time I had with me the same laptop I used back then, so I was able to log in and generate new certificates. While at it, I replaced the per-host SNI ones with a wildcard one.
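Since a single wildcard certificate replaced the per-host SNI ones, it helps to spell out which names it actually covers. What follows is a minimal Python sketch of simplified RFC 6125-style name matching, assuming the certificate carries one *.example.com name; the domain and the cert_name_matches helper are hypothetical illustrations, and real TLS clients perform this check themselves (with more edge cases than shown here).

```python
def cert_name_matches(pattern: str, host: str) -> bool:
    """Simplified RFC 6125-style check of one certificate name against a host.

    A '*' is accepted only as the entire left-most label, and it then
    matches exactly one non-empty label of the host name.
    """
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        # Split off the first label; the remainder must equal the pattern's suffix.
        first, sep, rest = host.partition(".")
        return bool(first) and bool(sep) and rest == pattern[2:]
    # Non-wildcard names must match exactly (case-insensitively).
    return pattern == host


# Hypothetical domain: direct subdomains match, the bare domain and deeper names do not.
for name in ("blog.example.com", "bugs.example.com", "example.com", "a.b.example.com"):
    print(name, cert_name_matches("*.example.com", name))
```

The wildcard stands for exactly one label, so a bare domain or a second-level subdomain would need its own name on the certificate.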
The problem was the xine certificate: the Class 2 certificate had been issued under my previous user, which I still had no access to (because I never thought I would need it), so I could not request its revocation myself. Not only were StartSSL able to revoke it for me anyway, they also did so free of charge (again, kudos to them!).
What’s the takeaway from all of this? Well, I definitely need a backup build host for urgent package rebuilds; I think I may rent a more powerful vserver somewhere for that. I also need a better way to handle my StartSSL credentials: I had my Zenbook with me only because I planned to do the datacenter work. I think I’ll order one of their USB smartcard tokens and use that instead.
I also ordered a USB SIM-sized card reader to use with a new OpenPGP card, so expect me to advertise a second GPG key (and if I remember this time, I’ll print some business cards with the fingerprints). This should make it easier for me to access my servers even if I don’t have a trusted computer with me.
Finally, I need to set up PFS (perfect forward secrecy), but to do that I need to update Apache to version 2.4, and last time mod_perl was a problem with that. With a bit of luck I can make sure it works now and update. There is also the Bugzilla on xine-project that needs to be updated to version 4; hopefully I can do that tonight or tomorrow.
I bet you’re on a contract with HE, but if you ever need to move again I’ll gladly offer some gratis space.
Did you notice that the patch only limited the damage to 16k rather than 64k? They didn’t do anything about the funky use-after-free model they use for getting the heartbeat data, nor did they check the actual length of the heartbeat request against the length of the data received. The heartbleed is still there, clotting but not healed. (Of course, if I were truly motivated here, I’d make my own exploit setup to test my theory.)
I’ve decided that for dev-libs/openssl-1.0.1g, the only safe thing for me is USE="-tls-heartbeat".
That’s too bad. Keeping TLS connections alive to avoid repeated TLS negotiations (and repeated reliance on the PKI) would be a win for long connections. It’s a nice goal, but I think it’s still not implemented right in OpenSSL.
Kevin, thanks for the offer! There are other uses for that cabinet that will appear in the coming months, which is why I ended up choosing that option; keep an eye on the blog!
Mike, I have honestly not tried to make heads or tails of the code myself, for various reasons, but I have reason to trust the people who worked on the fix, and those who verified it, when they say this is an actual fix rather than mere damage limitation.
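For anyone following the question about the patch, here is a minimal Python sketch of the bounds check that RFC 6520 (section 4) requires: a heartbeat request whose declared payload_length does not fit inside the record actually received must be silently discarded. As far as I understand, this is the kind of check the 1.0.1g patch adds to OpenSSL’s C code (in tls1_process_heartbeat and dtls1_process_heartbeat); the Python function and its name are hypothetical illustrations, not OpenSSL’s implementation.

```python
import os
import struct
from typing import Optional

HEARTBEAT_REQUEST = 1
HEARTBEAT_RESPONSE = 2
MIN_PADDING = 16  # RFC 6520: padding is at least 16 bytes


def heartbeat_response(record: bytes) -> Optional[bytes]:
    """Build the reply to a heartbeat request, or return None to discard it."""
    # Smallest valid message: type (1) + payload_length (2) + padding (16).
    if len(record) < 1 + 2 + MIN_PADDING:
        return None
    msg_type, payload_len = struct.unpack_from("!BH", record, 0)
    if msg_type != HEARTBEAT_REQUEST:
        return None
    # The Heartbleed check: the claimed payload plus minimum padding must
    # fit in the bytes actually received, otherwise discard silently.
    if 1 + 2 + payload_len + MIN_PADDING > len(record):
        return None
    payload = record[3:3 + payload_len]
    # Echo the payload back, followed by fresh random padding.
    return struct.pack("!BH", HEARTBEAT_RESPONSE, payload_len) + payload + os.urandom(MIN_PADDING)


# A well-formed request: 5-byte payload plus 16 bytes of padding.
honest = struct.pack("!BH", HEARTBEAT_REQUEST, 5) + b"hello" + bytes(16)
print(heartbeat_response(honest)[3:8])  # b'hello'

# A Heartbleed-style request: claims 16 KiB of payload but carries 2 bytes.
evil = struct.pack("!BH", HEARTBEAT_REQUEST, 0x4000) + b"hi" + bytes(16)
print(heartbeat_response(evil))  # None
```

Without the second length check, the reply would copy payload_len bytes starting at the payload, reading past the end of the received data; with it, the oversized request is simply dropped.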