The tinderbox is dead, long live the tinderbox

I announced it last November and now it became reality: the Tinderbox is no more, in hardware as well as software. Excelsior was taken out of the Hurricane Electric facility in Fremont this past Monday, just before I left for SCALE13x.

Originally the box was hosted by my then-employer, but as of last year, to allow more people to have access to is working, I had it moved to my own rented cabinet, at a figure of $600/month. Not chump change, but it was okay for a while; unfortunately the cost sharing option that was supposed to happen did not happen, and about an year later those $7200 do not feel like a good choice, and this is without delving into the whole insulting behavior of a fellow developer.

Right now the server is lying on the floor of an office in the Mountain View campus of my (current) employer. The future of the hardware is uncertain right now, but it’s more likely than not going to be donated to Gentoo Foundation (minus the HDDs for obvious opsec). I’m likely going to rent a dedicated server of my own for development and testing, as even though they would be less powerful than Excelsior, they would be massively cheaper at €40/month.

The question becomes what we want to do with the idea of a tinderbox — it seems like after I announced the demise people would get together to fix it once and for all, but four months later there is nothing to show that. After speaking with other developers at SCaLE, and realizing I’m probably the only one with enough domain knowledge of the problems I tackled, at this point, I decided it’s time for me to stop running a tinderbox and instead design one.

I’m going to write a few more blog posts to get into the nitty-gritty details of what I plan on doing, but I would like to provide at least a high-level idea of what I’m going to change drastically in the next iteration.

The first difference will be the target execution environment. When I wrote the tinderbox analysis scripts I designed them to run in a mostly sealed system. Because the tinderbox was running at someone else’s cabinet, within its management network, I decided I would not provide any direct access to either the tinderbox container nor the app that would mangle that data. This is why the storage for both the metadata and the logs was Amazon: pushing the data out was easy and did not require me to give access to the system to anyone else.

In the new design this will not be important — not only because it’ll be designed to push the data directly into Bugzilla, but more importantly because I’m not going to run a tinderbox in such an environment. Well, admittedly I’m just not going to run a tinderbox ever again, and will just build the code to do so, but the whole point is that I won’t keep that restriction on to begin with.

And since the data store now is only temporary, I don’t think it’s worth over-optimizing for performance. While I originally considered and dropped the option of storing the logs in PostgreSQL for performance reasons, now this is unlikely to be a problem. Even if the queries would take seconds, it’s not like this is going to be a deal breaker for an app with a single user. Even more importantly, the time taken to create the bug on the Bugzilla side is likely going to overshadow any database inefficiency.

The part that I’ve still got some doubts about is how to push the data from the tinderbox instance to the collector (which may or may not be the webapp that opens the bugs too.) Right now the tinderbox does some analysis through bashrc, leaving warnings in the log — the log is then sent to the collector through -chewing gum and saliva- tar and netcat (yes, really) to maintain one single piece of metadata: the filename.

I would like to be able to collect some metadata on the tinderbox side (namely, emerge --info, which before was cached manually) and send it down to the collector. But adding this much logic is tricky, as the tinderbox should still operate with most of the operating system busted. My original napkin plan involved having the agent written in Go, using Apache Thrift to communicate to the main app, probably written in Django or similar.

The reason why I’m saying that Go would be a good fit is because of one piece of its design I do not like (in the general use case) at all: the static compilation. A Go binary will not break during a system upgrade of any runtime, because it has no runtime; which is in my opinion a bad idea for a piece of desktop or server software, but it’s a godsend in this particular environment.

But the reason for which I was considering Thrift was I didn’t want to look into XML-RPC or JSON-RPC. But then again, Bugzilla supports only those two, and my main concern (the size of the log files) would still be a problem when attaching them to Bugzilla just as much. Since Thrift would require me to package it for Gentoo (seems like nobody did yet), while JSON-RPC is already supported in Go, I think it might be a better idea to stick with the JSON. Unfortunately Go does not support UTF-7 which would make escaping binary data much easier.

Now what remains a problem is filing the bug and attaching the log to Bugzilla. If I were to write that part of the app in Python, it would be just a matter of using the pybugz libraries to handle it. But with JSON-RPC it should be fairly easy to implement support for it from scratch (unlike XML-RPC) so maybe it’s worth just doing the whole thing in Go, and reduce the proliferation of languages in use for such a project.

Python will remain in use for the tinderbox runner. Actually if anything I would like to remove the bash wrapper I’ve written and do the generation and selection of which packages to build in Python. It would also be nice if it could handle the USE mangling by itself, but that’s difficult due to the sad conflicting requirements of the tree.

But this is enough details for the moment; I’ll go back to thinking the implementation through and add more details about that as I get to them.

The end of an era, the end of the tinderbox

I’m partly sad, but for the most part this is a weight that goes away from my shoulders, so I can’t say I’m not at least in part joyful of it, even though the context in which this is happening is not exactly what I expected.

I turned off the Gentoo tinderbox, never to come back. The S3 storage of logs is still running, but I’ve asked Ian to see if he can attach everything at his pace, so I can turn off the account and be done with it.

Why did this happen? Well, it’s a long story. I already stopped running it for a few months because I got tired of Mike behaving like a child, like I already reported in 2012 by closing my bugs because the logs are linked (from S3) rather than attached. I already made my position clear that it’s a silly distinction as the logs will not disappear in the middle of nowhere (indeed I’ll keep the S3 bucket for them running until they are all attached to Bugzilla), but as he keeps insisting that it’s “trivial” to change the behaviour of the whole pipeline, I decided to give up.

Yes, it’s only one developer, and yes, lots of other developers took my side (thanks guys!), but it’s still aggravating to have somebody who can do whatever he likes without reporting to anybody, ignoring Council resolutions, QA (when I was the lead) and essentially using Gentoo as his personal playground. And the fact that only two people (Michał and Julian) have been pushing for a proper resolution is a bit disappointing.

I know it might feel like I’m taking my toys and going home — well, that’s what I’m doing. The tinderbox has been draining on my time (little) and my money (quite more), but those I was willing to part with — draining my motivation due to assholes in the project was not in the plans.

In the past six years that I’ve been working on this particular project, things evolved:

  • Originally, it was a simple chroot with a looping emerge, inspected with grep and Emacs, running on my desktop and intended to catch --as-needed failures. It went through lots of disks, and got me off XFS for good due to kernel panics.
  • It was moved to LXC, which is why the package entered the Gentoo tree, together with the OpenRC support and the first few crude hacks.
  • When I started spendig time in Los Angeles for a customer, Yamato under my desk got replaced with Excelsior which was crowdfounded and hosted, for two years straight, by my customer at the time.
  • This is where the rewrite happened, from attaching logs (which I could earlier do with more or less ease, thanks to NFS) to store them away and linking instead. This had to do mostly with the ability to remote-manage the tinderbox.
  • This year, since I no longer work for the company in Los Angeles, and instead I work in Dublin for a completely different company, I decided Excelsior was better off on a personal space, and rented a full 42 unit cabinet with Hurricane Electric in Fremont, where the server is still running as I type this.

You can see that it’s not that ’m trying to avoid spending time to engineer solutions. It’s just that I feel that what Mike is asking is unreasonable, and the way he’s asking it makes it unbearable. Especially when he feigns to care about my expenses — as I noted in the previously linked post, S3 is dirty cheap, and indeed it now comes down to $1/month given to Amazon for the logs storage and access, compared to $600/month to rent the cabinet at Hurricane.

Yes, it’s true that the server is not doing only tinderboxing – it also is running some fate instances, and I have been using it as a development server for my own projects, mostly open-source ones – but that’s the original use for it, and if it wasn’t for it I wouldn’t be paying so much to rent a cabinet, I’d be renting a single dedicated server off, say, Hetzner.

So here we go, the end of the era of my tinderbox. Patrick and Michael are still continuing their efforts so it’s not like Gentoo is left without integration test, but I’m afraid it’ll be harder for at least some of the maintainers who leveraged the tinderbox heavily in the past. My contract with Hurricane expires in April; at that point I’ll get the hardware out of the cabinet, and will decide what to do with it — it’s possible I’ll donate the server (minus harddrives) to Gentoo Foundation or someone else who can use it.

My involvement in Gentoo might also suffer from this; I hopefully will be dropping one of the servers I maintain off the net pretty soon, which will be one less system to build packages for, but I still have a few to take care of. For the moment I’m taking a break: I’ll soon send an email that it’s open season on my packages; I locked my bugzilla account already to avoid providing harsher responses in the bug linked at the top of this post.

On professionalism (my first and last post explicitly about systemd)

I have been trying my best not to comment on systemd one way or another for a while. For the most part because I don’t want to have a trollfest on my blog, because moderating it is something I hate and I’m sure would be needed. On the other hand it seems like people start to bring me in the conversation now from time to time.

What I would like to point out at this point is that both extreme sides of the vision are, in my opinion, behaving childishly and being totally unprofessional. Whether it is name-calling of the people or the software, death threats, insults, satirical websites, labelling of 300 people for a handful of them, etc.

I don’t think I have been as happy to have a job that allows me not to care about open source as much as I did before as in the past few weeks as things keep escalating and escalating. You guys are the worst. And again I refer to both supporters and detractors, devs of systemd, devs of eudev, Debian devs and Gentoo devs, and so on so forth.

And the reason why I say this is because you both want to bring this to extremes that I think are totally uncalled for. I don’t see the world in black and white and I think I said that before. Gray is nuanced and interesting, and needs skills to navigate, so I understand it’s easier to just take a stand and never revise your opinion, but the easy way is not what I care about.

Myself, I decided to migrate my non-server systems to systemd a few months ago. It works fine. I’ve considered migrating my servers, and I decided for the moment to wait. The reason is technical for the most part: I don’t think I trust the stability promises for the moment and I don’t reboot servers that often anyway.

There are good things to the systemd design. And I’m sure that very few people will really miss sysvinit as is. Most people, especially in Gentoo, have not been using sysvinit properly, but rather through OpenRC, which shares more spirit with systemd than sysv, either by coincidence or because they are just the right approach to things (declarativeness to begin with).

At the same time, I don’t like Lennart’s approach on this to begin with, and I don’t think it’s uncalled for to criticize the product based on the person in this case, as the two are tightly coupled. I don’t like moderating people away from a discussion, because it just ends up making the discussion even more confrontational on the next forum you stumble across them — this is why I never blacklisted Ciaran and friends from my blog even after a group of them started pasting my face on pictures of nazi soldiers from WW2. Yes I agree that Gentoo has a good chunk of toxic supporters, I wish we got rid of them a long while ago.

At the same time, if somebody were to try to categorize me the same way as the people who decided to fork udev without even thinking of what they were doing, I would want to point out that I was reproaching them from day one for their absolutely insane (and inane) starting announcement and first few commits. And I have not been using it ever, since for the moment they seem to have made good on the promise of not making it impossible to run udev without systemd.

I don’t agree with the complete direction right now, and especially with the one-size-fit-all approach (on either side!) that tries to reduce the “software biodiversity”. At the same time there are a few designs that would be difficult for me to attack given that they were ideas of mine as well, at some point. Such as the runtime binary approach to hardware IDs (that Greg disagreed with at the time and then was implemented by systemd/udev), or the usage of tmpfs ACLs to allow users at the console to access devices — which was essentially my original proposal to get rid of pam_console (that played with owners instead, making it messy when having more than one user at console), when consolekit and its groups-fiddling was introduced (groups can be used for setgid, not a good idea).

So why am I posting this? Mostly to tell everybody out there that if you plan on using me for either side point to be brought home, you can forget about it. I’ll probably get pissed off enough to try to prove the exact opposite, and then back again.

Neither of you is perfectly right. You both make mistake. And you both are unprofessional. Try to grow up.

Edit: I mistyped eudev in the original article and it read euscan. Sorry Corentin, was thinking one thing and typing another.

Packaging for the Yubikey NEO, a frustrating adventure

I have already posted a howto on how to set up the YubiKey NEO and YubiKey NEO-n for U2F, and I promised I would write a bit more on the adventure to get the software packaged in Gentoo.

You have to realize at first that my relationship with Yubico has not always being straightforward. I have at least once decided against working on the Yubico set of libraries in Gentoo because I could not get a hold of a device as I wanted to use it. But luckily now I was able to place an order with them (for some two thousands euro) and I have my devices.

But Yubico’s code is usually quite well written, and designed to be packaged much more easily than most other device-specific middleware, so I cannot complain too much. Indeed, they split and release separately different libraries with different goals, so that you don’t need to wait for enough magnitude to be pulled for them to make a new release. They also actively maintain their code in GitHub, and then push proper make dist releases on their website. They are in many ways a packager’s dream company.

But let’s get back to the devices themselves. The NEO and NEO-n come with three different interfaces: OTP (old-style YubiKey, just much longer keys), CCID (Smartcard interface) and U2F. By default the devices are configured as OTP only, which I find a bit strange to be honest. It is also the case that at the moment you cannot enable both U2F and OTP modes, I assume because there is a conflict on how the “touch” interaction behaves, indeed there is a touch-based interaction on the CCID mode that gets entirely disabled once enabling either of U2F or OTP, but the two can’t share.

What is not obvious from the website is that to enable U2F (or CCID) modes, you need to use yubikey-neo-manager, an open-source app that can reconfigure the basics of the Yubico device. So I had to package the app for Gentoo of course, together with its dependencies, which turned out to be two libraries (okay actually three, but the third one sys-auth/ykpers was already packaged in Gentoo — and actually originally committed by me with Brant proxy-maintaining it, the world is small, sometimes). It was not too bad but there were a few things that might be worth noting down.

First of all, I had to deal with dev-libs/hidapi that allows programmatic access to raw HID USB devices: the ebuild failed for me, both because it was not depending on udev, and because it was unable to find the libusb headers — turned out to be caused by bashisms in the configure.ac file, which became obvious as I moved to dash. I have now fixed the ebuild and sent a pull request upstream.

This was the only real hard part at first, since the rest of the ebuilds, for app-crypt/libykneomgr and app-crypt/yubikey-neo-manager were mostly straightforward — only I had to figure out how to install a Python package as I never did so before. It’s actually fun how distutils will error out with a violation of install paths if easy_install tries to bring in a non-installed package such as nose, way before the Portage sandbox triggers.

The problems started when trying to use the programs, doubly so because I don’t keep a copy of the Gentoo tree on the laptop, so I wrote the ebuilds on the headless server and then tried to run them on the actual hardware. First of all, you need to have access to the devices to be able to set them up; the libu2f-host package will install udev rules to allow the plugdev group access to the hidraw devices — but it also needed a pull request to fix them. I also added an alternative version of the rules for systemd users that does not rely on the group but rather uses the ACL support (I was surprised, I essentially suggested the same approach to replace pam_console years ago!)

Unfortunately that only works once the device is already set in U2F mode, which does not work when you’re setting up the NEO for the first time, so I originally set it up using kdesu. I have since decided that the better way is to use the udev rules I posted in my howto post.

After this, I switched off OTP, and enabled U2F and CCID interfaces on the device — and I couldn’t make it stick, the manager would keep telling me that the CCID interface was disabled, even though the USB descriptor properly called it “Yubikey NEO U2F+CCID”. It took me a while to figure out that the problem was in the app-crypt/ccid driver, and indeed the change log for the latest version points out support for specifically the U2F+CCID device.

I have updated the ebuilds afterwards, not only to depend on the right version of the CCID driver – the README for libykneomgr does tell you to install pcsc-lite but not about the CCID driver you need – but also to check for the HIDRAW kernel driver, as otherwise you won’t be able to either configure or use the U2F device for non-Google domains.

Now there is one more part of the story that needs to be told, but in a different post: getting GnuPG to work with the OpenPGP applet on the NEO-n. It was not as straightforward as it could have been and it did lead to disappointment. I’ll be a good post for next week.

Setting up Yubikey NEO and U2F on Gentoo (and Linux in general)

When the Google Online Security blog announced earlier this week the general availability of Security Key, everybody at the office was thrilled, as we’ve been waiting for the day for a while. I’ve been using this for a while already, and my hope is for it to be easy enough for my mother and my sister, as well as my friends, to start using it.

While the promise is for a hassle-free second factor authenticator, it turns out it might not be as simple as originally intended, at least on Linux, at least right now.

Let’s start with the hardware, as there are four different options of hardware that you can choose from:

  • Yubico FIDO U2F which is a simple option only supporting the U2F protocol, no configuration needed;
  • Plug-up FIDO U2F which is a cheaper alternative for the same features — I have not witnessed whether it is as sturdy as the Yubico one, so I can’t vouch for it;
  • Yubikey NEO which provides multiple interface, including OTP (not usable together with U2F), OpenPGP and NFC;
  • Yubikey NEO-n the same as above, without NFC, and in a very tiny form factor designed to be left semi-permanently in a computer or laptop.

I got the NEO, but mostly to be used with LastPass ­– the NFC support allows you to have 2FA on the phone without having to type it back from a computer – and a NEO-n to leave installed on one of my computers. I already had a NEO from work to use as well. The NEO requires configuration, so I’ll get back at it in a moment.

The U2F devices are accessible via hidraw, a driverless access protocol for USB devices, originally intended for devices such as keyboards and mice but also leveraged by UPSes. What happen though is that you need access to the device, that the Linux kernel will make by default accessible only by root, for good reasons.

To make the device accessible to you, the user actually at the keyboard of the computer, you have to use udev rules, and those are, as always, not straightforward. My personal hacky choice is to make all the Yubico devices accessible — the main reason being that I don’t know all of the compatible USB Product IDs, as some of them are not really available to buy but come from instance from developer mode devices that I may or may not end up using.

If you’re using systemd with device ACLs (in Gentoo, that would be sys-apps/systemd with acl USE flag enabled), you can do it with a file as follows:

# /etc/udev/rules.d/90-u2f-securitykey.rules
ATTRS{idVendor}=="1050", TAG+="uaccess"
ATTRS{idVendor}=="2581", ATTRS{idProduct}=="f1d0", TAG+="uaccess"

If you’re not using systemd or ACLs, you can use the plugdev group and instead do it this way:

# /etc/udev/rules.d/90-u2f-securitykey.rules
ATTRS{idVendor}=="1050", GROUP="plugdev", MODE="0660"
ATTRS{idVendor}=="2581", ATTRS{idProduct}=="f1d0", GROUP="plugdev", MODE="0660"

-These rules do not include support for the Plug-up because I have no idea what their VID/PID pairs are, I asked Janne who got one so I can amend this later.- Edit: added the rules for the Plug-up device. Cute their use of f1d0 as device id.

Also note that there are properly less hacky solutions to get the ownership of the devices right, but I’ll leave it to the systemd devs to figure out how to include in the default ruleset.

These rules will not only allow your user to access /dev/hidraw0 but also to the /dev/bus/usb/* devices. This is intentional: Chrome (and Chromium, the open-source version works as well) use the U2F devices in two different modes: one is through a built-in extension that works with Google assets, and it accesses the low-level device as /dev/bus/usb/*, the other is through a Chrome extension which uses /dev/hidraw* and is meant to be used by all websites. The latter is the actually standardized specification and how you’re supposed to use it right now. I don’t know if the former workflow is going to be deprecated at some point, but I wouldn’t be surprised.

For those like me who bought the NEO devices, you’ll have to enable the U2F mode — while Yubico provides the linked step-by-step guide, it was not really completely correct for me on Gentoo, but it should be less complicated now: I packaged the app-crypt/yubikey-neo-manager app, which already brings in all the necessary software, including the latest version of app-crypt/ccid required to use the CCID interface on U2F-enabled NEOs. And if you already created the udev rules file as I noted above, it’ll work without you using root privileges. Just remember that if you are interested in the OpenPGP support you’ll need the pcscd service (it should auto-start with both OpenRC and systemd anyway).

I’ll recount separately the issues with packaging the software. In the mean time make sure you keep your accounts safe, and let’s all hope that more sites will start protecting your accounts with U2F — I’ll also write a separate opinion piece on why U2F is important and why it is better than OTP, this is just meant as documentation, howto set up the U2F devices on your Linux systems.

What does #shellshock mean for Gentoo?

Gentoo Penguins with chicks at Jougla Point, Antarctica
Photo credit: Liam Quinn

This is going to be interesting as Planet Gentoo is currently unavailable as I write this. I’ll try to send this out further so that people know about it.

By now we have all been doing our best to update our laptops and servers to the new bash version so that we are safe from the big scare of the quarter, shellshock. I say laptop because the way the vulnerability can be exploited limits the impact considerably if you have a desktop or otherwise connect only to trusted networks.

What remains to be done is to figure out how to avoid this repeats. And that’s a difficult topic, because a 25 years old bug is not easy to avoid, especially because there are probably plenty of siblings of it around, that we have not found yet, just like this last week. But there are things that we can do as a whole environment to reduce the chances of problems like this to either happen or at least avoid that they escalate so quickly.

In this post I want to look into some things that Gentoo and its developers can do to make things better.

The first obvious thing is to figure out why /bin/sh for Gentoo is not dash or any other very limited shell such as BusyBox. The main answer lies in the init scripts that still use bashisms; this is not news, as I’ve pushed for that four years ago, while Roy insisted on it even before that. Interestingly enough, though, this excuse is getting less and less relevant thanks to systemd. It is indeed, among all the reasons, one I find very much good in Lennart’s design: we want declarative init systems, not imperative ones. Unfortunately, even systemd is not as declarative as it was originally supposed to be, so the init script problem is half unsolved — on the other hand, it does make things much easier, as you have to start afresh anyway.

If either all your init scripts are non-bash-requiring or you’re using systemd (like me on the laptops), then it’s mostly safe to switch to use dash as the provider for /bin/sh:

# emerge eselect-sh
# eselect sh set dash

That will change your /bin/sh and make it much less likely that you’d be vulnerable to this particular problem. Unfortunately as I said it’s mostly safe. I even found that some of the init scripts I wrote, that I checked with checkbashisms did not work as intended with dash, fixes are on their way. I also found that the lsb_release command, while not requiring bash itself, uses non-POSIX features, resulting in garbage on the output — this breaks facter-2 but not facter-1, I found out when it broke my Puppet setup.

Interestingly it would be simpler for me to use zsh, as then both the init script and lsb_release would have worked. Unfortunately when I tried doing that, Emacs tramp-mode froze when trying to open files, both with sshx and sudo modes. The same was true for using BusyBox, so I decided to just install dash everywhere and use that.

Unfortunately it does not mean you’ll be perfectly safe or that you can remove bash from your system. Especially in Gentoo, we have too many dependencies on it, the first being Portage of course, but eselect also qualifies. Of the two I’m actually more concerned about eselect: I have been saying this from the start, but designing such a major piece of software – that does not change that often – in bash sounds like insanity. I still think that is the case.

I think this is the main problem: in Gentoo especially, bash has always been considered a programming language. That’s bad. Not only because it only has one reference implementation, but it also seem to convince other people, new to coding, that it’s a good engineering practice. It is not. If you need to build something like eselect, you do it in Python, or Perl, or C, but not bash!

Gentoo is currently stagnating, and that’s hard to deny. I’ve stopped being active since I finally accepted stable employment – I’m almost thirty, it was time to stop playing around, I needed to make a living, even if I don’t really make a life – and QA has obviously taken a step back (I still have a non-working dev-python/imaging on my laptop). So trying to push for getting rid of bash in Gentoo altogether is not a good deal. On the other hand, even though it’s going to be probably too late to be relevant, I’ll push for having a Summer of Code next year to convert eselect to Python or something along those lines.

Myself, I decided that the current bashisms in the init scripts I rely upon on my servers are simple enough that dash will work, so I pushed that through puppet to all my servers. It should be enough, for the moment. I expect more scrutiny to be spent on dash, zsh, ksh and the other shells in the next few months as people migrate around, or decide that a 25 years old bug is enough to think twice about all of them, o I’ll keep my options open.

This is actually why I like software biodiversity: it allows to have options to select different options when one components fail, and that is what worries me the most with systemd right now. I also hope that showing how bad bash has been all this time with its closed development will make it possible to have a better syntax-compatible shell with a proper parser, even better with a proper librarised implementation. But that’s probably hoping too much.

LibreSSL: drop-in and ABI leakage

There has been some confusion on my previous post with Bob Beck of LibreSSL on whether I would advocate for using a LibreSSL shared object as a drop-in replacement for an OpenSSL shared object. Let me state this here, boldly: you should never, ever, for no reason, use shared objects from different major/minor OpenSSL versions or implementations (such as LibreSSL) as a drop-in replacement for one another.

The reason is, obviously, that the ABI of these libraries differs, sometimes subtly enought that they may actually load and run, but then perform abysmally insecure operations, as its data structures will have changed, and now instead of reading your random-generated key, you may be reading the master private key. nd in general, for other libraries you may even be calling the wrong set of functions, especially for those written in C++, where the vtable content may be rearranged across versions.

What I was discussing in the previous post was the fact that lots of proprietary software packages, by bundling a version of Curl that depends on the RAND_egd() function, will require either unbundling it, or keeping along a copy of OpenSSL to use for runtime linking. And I think that is a problem that people need to consider now rather than later for a very simple reason.

Even if LibreSSL (or any other reimplementation, for what matters) takes foot as the default implementation for all Linux (and not-Linux) distributions, you’ll never be able to fully forget of OpenSSL: not only if you have proprietary software that you maintain, but also because a huge amount of software (and especially hardware) out there will not be able to update easily. And the fact that LibreSSL is throwing away so much of the OpenSSL clutter also means that it’ll be more difficult to backport fixes — while at the same time I think that a good chunk of the black hattery will focus on OpenSSL, especially if it feels “abandoned”, while most of the users will still be using it somehow.

But putting aside the problem of the direct drop-in incompatibilities, there is one more problem that people need to understand, especially Gentoo users, and most other systems that do not completely rebuild their package set when replacing a library like this. The problem is what I would call “ABI leakage”.

Let’s say you have a general libfoo that uses libssl; it uses a subset of the API that works with both OpenSSL. Now you have a bar program that uses libfoo. If the library is written properly, then it’ll treat all the data structures coming from libssl as opaque, providing no way for bar to call into libssl without depending on the SSL API du jour (and thus putting a direct dependency on libssl for the executable). But it’s very well possible that libfoo is not well-written and actually treats the libssl API as transparent. For instance a common mistake is to use one of the SSL data structures inline (rather than as a pointer) in one of its own public structures.

This situation would be barely fine, as long as the data types for libfoo are also completely opaque, as then it’s only the code for libfoo that relies on the structures, and since you’re rebuilding it anyway (as libssl is not ABI-compatible), you solve your problem. But if we keep assuming a worst-case scenario, then you have bar actually dealing with the data structures, for instance by allocating a sized buffer itself, rather than calling into a proper allocation function from libfoo. And there you have a problem.

Because now the ABI of libfoo is not directly defined by its own code, but also by whichever ABI libssl has! It’s a similar problem as the symbol table used as an ABI proxy: while your software will load and run (for a while), you’re really using a different ABI, as libfoo almost certainly does not change its soname when it’s rebuilt against a newer version of libssl. And that can easily cause crashes and worse (see the note above about dropping in LibreSSL as a replacement for OpenSSL).

Now honestly none of this is specific to LibreSSL. The same is true if you were to try using OpenSSL 1.0 shared objects for software built against OpenSSL 0.9 — which is why I cringed any time I heard people suggesting to use symlink at the time, and it seems like people are giving the same suicidal suggestion now with OpenSSL, according to Bob.

So once again, don’t expect binary-compatibility across different versions of OpenSSL, LibreSSL, or any other implementation of the same API, unless they explicitly aim for that (and LibreSSL definitely doesn’t!)

LibreSSL and the bundled libs hurdle

It was over five years ago that I ranted about the bundling of libraries and what that means for vulnerabilities found in those libraries. The world has, since, not really listened. RubyGems still keep insisting that “vendoring” gems is good, Go explicitly didn’t implement a concept of shared libraries, and let’s not even talk about Docker or OSv and their absolutism in static linking and bundling of the whole operating system, essentially.

It should have been obvious how this can be a problem when Heartbleed came out, bundled copies of OpenSSL would have needed separate updates from the system libraries. I guess lots of enterprise users of such software were saved only by the fact that most of the bundlers ended up using older versions of OpenSSL where heartbeat was not implemented at all.

Now that we’re talking about replacing the OpenSSL libraries with those coming from a different project, we’re going to be hit by both edges of the proprietary software sword: bundling and ABI compatibility, which will make things really interesting for everybody.

If you’ve seen my (short, incomplete) list of RAND_egd() users which I posted yesterday. While the tinderbox from which I took this is out of date and needs cleaning, it is a good starting point to figure out the trends, and as somebody already picked up, the bundling is actually strong.

Software that bundled Curl, or even Python, but then relied on the system copy of OpenSSL, will now be looking for RAND_egd() and thus fail. You could be unbundling these libraries, and then use a proper, patched copy of Curl from the system, where the usage of RAND_egd() has been removed, but then again, this is what I’ve been advocating forever or so. With caveats, in the case of Curl.

But now if the use of RAND_egd() is actually coming from the proprietary bits themselves, you’re stuck and you can’t use the new library: you either need to keep around an old copy of OpenSSL (which may be buggy and expose even more vulnerability) or you need a shim library that only provides ABI compatibility against the new LibreSSL-provided library — I’m still not sure why this particular trick is not employed more often, when the changes to a library are only at the interface level but still implements the same functionality.

Now the good news is that from the list that I produced, at least the egd functions never seemed to be popular among proprietary developers. This is expected as egd was vastly a way to implement the /dev/random semantics for non-Linux systems, while the proprietary software that we deal with, at least in the Linux world, can just accept the existence of the devices themselves. So the only problems have to do with unbundling (or replacing) Curl and possibly the Python SSL module. Doing so is not obvious though, as I see from the list that there are at least two copies of libcurl.so.3 which is the older ABI for Curl — although admittedly one is from the scratchbox SDKs which could just as easily be replaced with something less hacky.

Anyway, my current task is to clean up the tinderbox so that it’s in a working state, after which I plan to do a full build of all the reverse dependencies on OpenSSL, it’s very possible that there are more entries that should be in the list, since it was built with USE=gnutls globally to test for GnuTLS 3.0 when it came out.

LibreSSL is taking a beating, and that’s good

When I read about LibreSSL coming from the OpenBSD developers, my first impression was that it was a stunt. I did not change my impression of it drastically still. While I know at least one quite good OpenBSD developer, my impression of the whole set is still the same: we have different concepts of security, and their idea of “cruft” is completely out there for me. But this is a topic for some other time.

So seeing the amount of scrutiny from other who are, like me, skeptical of the OpenBSD people left on their own is a good news. It keeps them honest, as they say. But it also means that things that wouldn’t be otherwise understood by people not used to Linux don’t get shoved under the rug.

This is not idle musings: I still remember (but can’t find now) an article in which Theo boasted not ever having used Linux. And yet kept insisting that his operating system was clearly superior. I was honestly afraid that the way the fork-not-a-fork project was going to be handled was the same, I’m positively happy to be proven wrong up to now.

I actually have been thrilled to see that finally there is movement to replace the straight access to /dev/random and /dev/urandom: Ted’s patch to implement a getrandom() system call that can be made compatible with OpenBSD’s own getentropy() in user space. And even more I’m happy to see that at least one of the OpenBSD/LibreSSL developers pitching in to help shape the interface.

Dropping out the egd support made me puzzled for a moment, but then I realized that there is no point in using egd to feed the randomness to the process, you just need to feed entropy to the kernel, and let the process get it normally. I have had, unfortunately, quite a bit of experience with entropy-generating daemons, and I wonder if this might be the right time to suggest getting a new multi-source daemon out.

So a I going to just blindly trust the OpenBSD people because “they have a good track record”? No. And to anybody that suggest that you can take over lines and lines of code from someone else’s crypto-related project, remove a bunch of code that you think is useless, and have an immediate result, my request is to please stop working with software altogether.

Security Holes
Copyright © Randall Munroe.

I’m not saying that they would do it on purpose, or that they wouldn’t be trying to do the darndest to make LibreSSL a good replacement for OpenSSL. What I’m saying is that I don’t like the way, and the motives, the project was started from. And I think that a reality check, like the one they already got, was due and a good news.

On my side, once the library gets a bit more mileage I’ll be happy to run the tinderbox against it. For now, I’m re-gaining access to Excelsior after a bad kernel update, and I’ll just go and search with elfgrep for which binaries do use the egd functionalities and need to be patched, I’ll post it on Twitter/G+ once I have it. I know it’s not much, but this is what I can do.

A new XBMC box

A couple of months ago I was at LinuxTag in Berlin with the friends from VIdeoLAN and we shared a booth with the XBMC project. It was interesting to see the newest version of XBMC running, and I decided that it was time for me to get a new XBMC box — last time I used XBMC was on my AppleTV and while it was not strictly disappointing it was not terrific either after a while.

At any rate, we spoke about what options are available nowadays to make a good XBMC set up, and while the RaspberryPi is all the rage nowadays, my previous experience with the platform made it a no-go. It also requires you to find a place where to store your data (the USB support on the Pi is not good for many things) and you most likely will have to re-encode animes to the Right Format™ so that the RPi VideoCore can properly decode them: anything that can’t be hardware-accelerated will not play on such a limited hardware.

The alternative has been the Intel NUC (Next Unit of Computing), which Intel sells in pre-configured “barebone” kits, some of which include wifi antennas, 2.5” disk bays, and a CIR (Consumer Infrared Receiver) that allows you to use a remote such as the one for the XBox 360 to control the unit. I decided to look into the options and I settled on the D54250WYKH which has a Core i5 CPU, space for both a wireless card (I got the Intel 7260 802.11ac which is dual-radio and supports the new 11ac protocol, even though my router is not 11ac yet), and a mSATA SSD (I got a Transcend 128GB one), as well the 2.5” bay that allows me to use a good old spinning-rust harddrive to store the bulk of the data.

Be careful and don’t repeat my mistake! I originally ordered a very cool Western Digital Caviar Green 2TB HDD but while it is a 2.5” HDD, it does not fit properly in the provided cradle; the same problem used to happen with the first series of 1TB HDDs on PlayStation 3s. I decided to keep the HDD and bring it with me to Ireland, as I don’t otherwise have a 2TB HDD, instead I opted for a HGST 1.5TB HDD (no link for this one as I bought it at Fry’s the same day I picked up the rest, if nothing else because I had no will to wait, and also because I forgot I needed a keyboard).

While I could have just put OpenELEC on the device, I decided instead to install my trusted Gentoo — a Core i5 with 16GB of RAM and a good SSD is well in its ability to run it. And since I was finally setting something up that needs (for myself) to turn on very quickly, I decided to give systemd a go (especially as Robbins is now considered a co-maintainer for OpenRC which drains all my will to keep using it). The effect has been stunning, but there are a few issues that needs to be ironed out; for instance, as far as I can tell, there is no unit for rngd which means that both my laptop (now converted to systemd) and the device have no entropy, even though they both have the rdrand instruction; I’ll try to fix this lack myself.

Another huge problem for me has been getting the audio to work; while I’ve been told by the XBMC people that the NUC are perfectly well supported, I couldn’t for the sake of me get the audio to work for days. At the end it was Alexander Patrakov who pointed out to intel_iommu=on,igfx_off as a kernel option to get it to work (kernel bug #67321 still unfixed). So if you have no HDMI output on your NUC, that’s what you have to do!

Speaking about XBMC and Gentoo, the latest version as of last week (which was not the latest upstream version, as a new one got released exactly while I was installing the box), seem to force you to install FFmpeg over libav – I honestly felt a bit sorry for the developers of XBMC at LinuxTag while they were trying to tell me how the multi-threaded h264 decoder from FFmpeg is great… Anton, who wrote it, is a libav developer! – but even after you do that, it seems like it does not link it in, preferring a bundled copy of it instead. Which also doesn’t seem to build support for multithread (uh?). This is something that I’ll have to look into once I’m back in Dublin.

Other than that, there isn’t much to say; the one remaining big issue is to figure out how to properly have XBMC start up at boot without nasty autologin hacks on systemd. And of course finding a better way than using a transmission user to start the Transmission daemon, or at least find a better way to share the downloads with XBMC itself. Probably separating the XBMC and Transmission users is a good idea.

Expect more posts on what’s going on with my XBMC box in the future, and take this one as a reference about the NUC audio issue.