What does #shellshock mean for Gentoo?

Gentoo Penguins with chicks at Jougla Point, Antarctica
Photo credit: Liam Quinn

This is going to be interesting as Planet Gentoo is currently unavailable as I write this. I’ll try to send this out further so that people know about it.

By now we have all been doing our best to update our laptops and servers to the new bash version so that we are safe from the big scare of the quarter, shellshock. I say laptop because the way the vulnerability can be exploited limits the impact considerably if you have a desktop or otherwise connect only to trusted networks.

What remains to be done is figuring out how to avoid a repeat of this. And that’s a difficult topic, because a 25-year-old bug is not easy to avoid, especially since there are probably plenty of siblings of it around that we have not found yet, just like this last week. But there are things that we, as a whole environment, can do to reduce the chances of problems like this happening, or at least to keep them from escalating so quickly.

In this post I want to look into some things that Gentoo and its developers can do to make things better.

The first obvious thing is to figure out why /bin/sh for Gentoo is not dash or another very limited shell such as BusyBox. The main answer lies in the init scripts that still use bashisms; this is not news, as I pushed for this four years ago, and Roy insisted on it even before that. Interestingly enough, though, this excuse is getting less and less relevant thanks to systemd. This is indeed, among all its traits, one aspect of Lennart’s design I find genuinely good: we want declarative init systems, not imperative ones. Unfortunately, even systemd is not as declarative as it was originally supposed to be, so the init script problem is only half solved — on the other hand, it does make things much easier, as you have to start afresh anyway.

If all your init scripts avoid requiring bash, or you’re using systemd (like me on the laptops), then it’s mostly safe to switch to dash as the provider for /bin/sh:

# emerge eselect-sh
# eselect sh set dash

That will change your /bin/sh and make it much less likely that you’d be vulnerable to this particular problem. Unfortunately, as I said, it’s only mostly safe. I even found that some of the init scripts I wrote, which I had checked with checkbashisms, did not work as intended with dash; fixes are on their way. I also found that the lsb_release command, while not requiring bash itself, uses non-POSIX features, resulting in garbage on the output — this breaks facter-2 but not facter-1, as I found out when it broke my Puppet setup.

Interestingly it would be simpler for me to use zsh, as then both the init script and lsb_release would have worked. Unfortunately when I tried doing that, Emacs tramp-mode froze when trying to open files, both with sshx and sudo modes. The same was true for using BusyBox, so I decided to just install dash everywhere and use that.

Unfortunately it does not mean you’ll be perfectly safe or that you can remove bash from your system. Especially in Gentoo, we have too many dependencies on it, the first being Portage of course, but eselect also qualifies. Of the two I’m actually more concerned about eselect: I have been saying this from the start, but designing such a major piece of software – that does not change that often – in bash sounds like insanity. I still think that is the case.

I think this is the main problem: in Gentoo especially, bash has always been considered a programming language. That’s bad. Not only because it has a single reference implementation, but because it also seems to convince other people, new to coding, that this is good engineering practice. It is not. If you need to build something like eselect, you do it in Python, or Perl, or C, but not bash!

Gentoo is currently stagnating, and that’s hard to deny. I’ve stopped being active since I finally accepted stable employment – I’m almost thirty, it was time to stop playing around; I needed to make a living, even if I don’t really make a life – and QA has obviously taken a step back (I still have a non-working dev-python/imaging on my laptop). So trying to push for getting rid of bash in Gentoo altogether is not a realistic goal. On the other hand, even though it’s probably going to be too late to be relevant, I’ll push for having a Summer of Code project next year to convert eselect to Python, or something along those lines.

Myself, I decided that the current bashisms in the init scripts I rely upon on my servers are simple enough that dash will work, so I pushed that through Puppet to all my servers. It should be enough, for the moment. I expect more scrutiny to be spent on dash, zsh, ksh and the other shells in the next few months, as people migrate around or decide that a 25-year-old bug is enough to think twice about all of them, so I’ll keep my options open.

This is actually why I like software biodiversity: it gives you different options to fall back on when one component fails, and that is what worries me the most about systemd right now. I also hope that showing how badly bash has fared all this time with its closed development will make it possible to have a better syntax-compatible shell with a proper parser, even better with a properly librarised implementation. But that’s probably hoping for too much.

Project health, and why it’s important — part of the #shellshock afterwords

Tech media has been all the rage this year, trying to hype everything out there as the end of the Internet of Things or as the nail in the coffin of open source. A bunch of opinion pieces I found also tried to imply that open source software is to blame, forgetting that the only reason the security issues found have been considered so nasty is that we know the affected software is widely used.

First there was Heartbleed, whose discoverers decided to spend time setting up a cool name, logo and website for it, rather than ensuring it would be patched before it became widely known. Months later, LastPass still tells me that some of the websites I have passwords on have not changed their certificates. This at least spawned some interest around OpenSSL, including the OpenBSD fork, which I’m still not sure is going to stick around or not.

Just a few weeks ago, a dump of passwords caused a major stir as some online news sources kept insisting that Google had been hacked. Similarly, people have been insisting for the longest time that it was only Apple’s fault that the photos of a bunch of celebrities were stolen and published on a bunch of sites — photos that will probably never be expunged from the Internet’s collective conscience.

And then there is the whole hysteria about shellshock which I already dug into. What I promised on that post is looking at the problem from the angle of the project health.

With the term project health I’m referring to a whole set of issues around an open source software project. It’s something that becomes second nature for a distribution packager/developer, but is not obvious to many, especially because it is not easy to quantify. It’s not a function of the number of commits or committers, the number of mailing lists or the traffic in them. It’s an aura.

That OpenSSL’s project health was terrible was no mystery to anybody. The code base in particular was terribly complicated and catered for corner cases that stopped being relevant years ago, and the LibreSSL developers have found plenty of reasons to be worried. But the fact that the codebase was in such a state, and that the developers didn’t care to follow what the distributors do, or to review patches properly, was no surprise. You just need to be reminded of the Debian SSL debacle, which dates back to 2008.

In the case of bash, the situation is a bit more complex. The shell is a base component of all GNU systems, and it is the FSF’s choice of UNIX shell. The fact that the man page clearly states “It’s too big and too slow.” should tip people off, but it doesn’t. And it’s not just a matter of extending the POSIX shell syntax with enough sugar that people take it for a programming language and start using it as one — though that is also a big problem, and one that contributed to this particular issue.

The health of bash was not considered good by anybody involved with it at a distribution level. It certainly was not considered good by me, as I moved to zsh years and years ago, and I have been working for over five years on getting rid of bashisms in scripts. Indeed, I have been pushing, with Roy and others, for the init scripts in Gentoo to be made completely POSIX shell compatible so that they can run with dash or with busybox — even before I was paid to do so for one of the devices I worked on.

Nowadays, the point is probably moot for many people. I think this is the most obvious positive PR for systemd I can think of: no thinking of shells any more, for the most part. Of course it’s not strictly true, but it does solve most of the problems with bashisms in init scripts. And it should solve the problem of using bash as a programming language, except it doesn’t always, but that’s a topic for a different post.

But why were distributors, and Gentoo devs, so wary about bash way before this happened? The answer is complicated. While bash is a GNU project, and the GNU project is the poster child for Free Software, its management has always been sketchy. There is a single developer – The Maintainer, as the GNU website calls him: Chet Ramey – and the sole point of contact for him is the mailing lists. The code is released in dumps: a release tarball for the minor version, then, every time a new micro version is to be released, a new patch is posted and distributed. If you’re a Gentoo user, you can see this when emerging bash: you’ll see all the patches being applied one on top of the other.

There is no public SCM — yes, there is a Git “repository”, but it’s essentially just an import of a given release tarball, with each released patch applied on top of it as a commit. Since these patches represent a whole point release, and may be fixing different bugs, related or not, it’s definitely not as useful as having a repository where the intent of each change shows clearly, so that you can figure out what is being done. Reviewing a proper commit-per-change repository is orders of magnitude easier than reviewing a diff between code dumps.

This is not completely unknown in the GNU sphere: glibc has had a terrible track record as well, and only recently, thanks to lots of combined effort, is sanity being restored. This also includes fixing a bunch of security vulnerabilities found, or driven into the ground, by my friend Tavis.

But this behaviour is essentially why people like me and other distribution developers have been unhappy with bash for years and years: not this particular vulnerability, but the health of the project itself. I have been using zsh for years, even though I had not installed it on all my servers until now (it’s done now), and I have been pushing for a while for Gentoo to move to /bin/sh being provided by dash. Debian did that already, and the result is that the vulnerability is way less scary for them.

So yeah, I don’t think it’s happenstance that these issues are being found in projects that are not healthy. And it’s not because they are open source, but rather because they are “open source” in a way that does not help. Yes, bash is open source, but it’s not developed in the open like many other projects; it’s developed behind closed doors, with one single leader.

So remember this: be open in your open source project, it makes for better health. And try to get more people than you involved, and review publicly the patches that you’re sent!

Limiting the #shellshock fear

Today’s news all over the place has to do with the nasty bash vulnerability that has been disclosed and now makes everybody go insane. But around this there’s more buzz than actual fire. The problem, I think, is that there are a number of claims around this vulnerability that are true all by themselves, but become hysteria when mashed together. A tip of the hat to SANS, which tried to calm down the situation as well.

Yes, the bug is nasty, and yes, the bug can lead to remote code execution; but not all the servers in the world are vulnerable. First of all, not all the UNIX systems out there use bash at all: the BSDs don’t have bash installed by default, for instance, and both Debian and Ubuntu have been defaulting to dash for their default shell for years now. This is important because the mere presence of bash does not make a system vulnerable. To be exploitable on the server side, you need at least one of two things: a bash-based CGI script, or /bin/sh being bash. In the former case it’s obvious: you pass down the CGI variables with the exploit and you have direct remote code execution. In the latter, things are a tad less obvious, and rely on the way system() is implemented in C and other languages: it invokes /bin/sh -c {thestring}.

Using system() is already a red flag for me in lots of server-side software: input sanitization is essential in that situation, as otherwise passing user-provided strings to a system() call makes remote code execution trivial. Think of software running system("convert %s %s-thumb.png") with a user-provided string, and let the user provide ; rm -rf / ; as their input… can you see the problem? But with this particular bash bug, you don’t need user-supplied strings to be passed to system(): the mere call will cause the environment to be copied over and thus the code executed. This relies on /bin/sh being bash, which is not the case for the BSDs, Debian, Ubuntu and a bunch of other setups — and it also requires the attacker to be able to control an environment variable.
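The mechanics above can be checked with the test that circulated widely at the time: a function definition exported through the environment, with trailing commands that a vulnerable bash executes while importing the variable. This is the well-known public check for the original CVE-2014-6271, not anything distribution-specific:

```shell
# Export a crafted function definition, then start a child bash.
# A vulnerable bash runs the trailing `echo vulnerable` while
# importing the variable; a patched one ignores it.
env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'
```

A patched bash prints only `this is a test`; on a vulnerable one, the line `vulnerable` appears first.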

This does not mean that there is absolutely no risk for Debian or Ubuntu users (or even FreeBSD, but that’s a different problem): if you control an environment variable, and somehow the web application invokes (even indirectly) a bash script (through system() or otherwise), then you’re also vulnerable. This can be the case if the invoked script has #!/bin/bash explicitly in it. Funnily enough, this is how most clients are vulnerable to this problem: the ISC DHCP client dhclient uses a helper script called dhclient-script to set some special options it receives from the server; at least in the Debian/Ubuntu packages of it, the script uses #!/bin/bash explicitly, making those systems vulnerable even if their default shell is not bash.

But who seriously uses CGI in production nowadays? Turns out that a bunch of people using WordPress do, to run PHP — and I’m sure there are scripts using system(). If this is a nail in the coffin of something, my opinion is that it should be in the coffin of the self-hosting mantra that people still insist on.

On the other hand, the focus of the tech field right now is on CGI running in small devices: routers, TVs, and so on and so forth. It is indeed the case that the majority of those devices implement their web interfaces through CGI, because it’s simple and proven, and does not require complex web servers such as Apache. This is what scared plenty of tech people, but it’s a scare that has not been researched properly either. While it’s true that most of my small devices use CGI, I don’t think any of them uses bash. In the embedded world, the majority of people wouldn’t go near bash with a ten-foot pole: it’s slow, it’s big, and it’s clunky. If you’re building an embedded image, you probably already have busybox around, and you may as well use it as your shell. It also allows you to use the in-process version of most commands without requiring a full fork.

It’s easy to see how you go from A to Z here: “bash makes CGI vulnerable” and “nearly all embedded devices use CGI” become “bash makes nearly all embedded devices vulnerable”. But that’s not true: as SANS points out, only a minimal share of devices is actually vulnerable to this attack. Which does not mean the attack is irrelevant. It’s important, and it should tell us many things.

I’ll be writing again regarding “project health”, talking a bit more about bash as a project. In the meantime, make sure you update, don’t believe the first news of “all fixed” (as Tavis pointed out, the first fix was not thought out properly), and make sure you don’t self-host the stuff you want to keep out of the cloud on a server you upgrade once a year.

Polishing init scripts

One of the nicest features of OpenRC/Baselayout 2 (which sooner or later will hit stable, I’m sure) is that you can replace bash (which is slow, as its own documentation admits) with a faster, slimmer, pure POSIX shell. Okay, so maybe Fedora is now moving away from shells altogether, and I guess I can see why from some points of view; but as Nirbheek said, it’s unlikely to be suitable for all use cases; in particular, our init system tends to work perfectly fine as-is for servers and embedded systems, so I don’t see a reason we should be switching there. Adding support for dash or any other POSIX sh-compatible fast shell is going to be a win-win situation — do note that you still need bash to run ebuilds, though!

Now, you can already use dash for the basic init scripts provided by OpenRC and Baselayout, but all the packages need to install proper POSIX sh-compatible init scripts if you want to use it with them. Thankfully a number of users seem to care about that, such as Davide last year and Kai right now.

But POSIX compatibility is not the only thing we should actually look out for in our init scripts:

  • Eray pointed out that some scripts don’t re-create their /var/run subdirectories, which is indeed a problem that should get fixed at some point; I had similarly bad issues running my own Gentoo-based router;
  • one too often misused parameter of start-stop-daemon is --quiet, which can be… way too quiet: if it’s passed, you’re not going to receive any output at all if the daemon you tried to start fails, and that is a problem;
  • there are problems with the way the system-services PAM chain is passed through, so that limits are not respected (and if that’s the case, caps coming from PAM wouldn’t be respected either);
  • the way LXC works, init scripts looking just at the process’s name could cause a guest’s daemon to be stopped when the host’s is; this is mostly exercised by killall, but start-stop-daemon, when given just an executable rather than a pidfile, has the same problem; and the same goes for pkill, it goes without saying.
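As a sketch of what addressing the first and last points can look like, here is a hypothetical OpenRC init script: the daemon name and paths are made up for the example, and checkpath is OpenRC’s helper for (re)creating runtime directories (on older baselayout, plain mkdir/chown in start_pre works too):

```shell
#!/sbin/runscript
# Hypothetical service. Two points from the list above:
# - re-create the /var/run subdirectory on every start, since it
#   may live on tmpfs and vanish across reboots;
# - always hand start-stop-daemon a pidfile, so it never has to
#   match by executable name alone (which misfires under LXC).

command=/usr/sbin/mydaemon
pidfile=/var/run/mydaemon/mydaemon.pid

depend() {
    need net
}

start_pre() {
    checkpath --directory --owner mydaemon:mydaemon /var/run/mydaemon
}
```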

These are a number of polishing tasks that are by all counts minor and not excessively important, but… if you have free time and care about Gentoo on LXC, embedded, or with fast startup, you might want to look into them. Just saying!

More tinderboxing, more analysis, more disk space

Even though I had a cold I’ve kept busy in the past few days, which was especially good because today was most certainly Monday. For the sake of mental sanity, I’ve decided a few months ago that the weekend is off work for me, and Monday is dedicated at summing up what I’m going to do during the rest of the week, sort of a planning day. Which usually turns out to mean a lot of reading and very little action and writing.

Since I cannot sleep right now (I’ll have to write a bit about that too), I decided to start with the writing, to make sure the plans I figured out will be enacted this week. Which is especially considerate to do, considering I also had to spend some time labelling, as usual this time of the year. Yes, I’m still doing that, at least until I can get a decent stable job. It works and helps pay the bills, at least a bit.

So anyway, you might have read Serkan’s post regarding the java-dep-check package and the issues it found once run on the tinderbox packages. This is probably one of the most interesting uses of the tinderbox: large-scale testing for problems that would otherwise keep such a low profile that they would never come out. To make more of a point, the tinderbox is now running with the JAVA_PKG_STRICT variable set, so that the Java packages get extra checks and are tested much more safely on the tree.

I also wanted to add further checks for bashisms in configure scripts. This sprouted from the fact that, on FreeBSD 7.0, the autoconf-generated configure script no longer discards the /bin/sh shell. Previously, the FreeBSD implementation was discarded because of a bug, and thus the script re-executed itself using bash instead. This was bad (because bash, as we well know, is slow) but also good (because then all the scripts were executed with the same shell on both Linux and FreeBSD). Since the bug is now fixed, the original shell is used, which is faster (and thus good); the problem is that some projects (unieject included!) use bashisms that will make the script fail. Javier spent some time trying to debug the issue.

To check for bashisms, I’ve used the script that Debian makes available. Unfortunately the script is far from perfect. First of all, it does not really have an easy way to just scan a subtree for actual sh scripts (using egrep is not totally fine, since autoconf m4 fragments often have the #!/bin/sh string in them). Which forced me to write a stupid, long and quite faulty script to scan the configure files.

But even worse, the script is full of false positives: instead of actually parsing the semantics, it only scans for substrings. For instance, it identified the strange help output in gnumeric as a bash-specific brace expansion, when it was inside a HEREDOC string. Instead of this method, I’d rather have a special parameter in bash that tells the interpreter to output warnings about bash-specific features being used; maybe I should write it myself.
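For what it’s worth, the first problem — telling actual sh scripts apart from m4 fragments that merely contain the #!/bin/sh string — can be approximated by looking only at the first line of each file. A rough sketch (the is_sh_script name is mine, not part of any tool):

```shell
# Succeed only if the file's FIRST line is a /bin/sh shebang;
# grepping whole files also matches autoconf m4 fragments that
# merely mention the string somewhere.
is_sh_script() {
    head -n 1 "$1" 2>/dev/null | grep -q '^#! */bin/sh'
}

# Hypothetical usage: list the scripts worth feeding to checkbashisms.
find . -type f | while read -r f; do
    is_sh_script "$f" && printf '%s\n' "$f"
done
```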

But I think there are some things that should be addressed in a much different way than through the tinderbox itself. As I have written before, there are many tests that should actually be executed on source code, like static analysis, and analysis of configure scripts to fix issues such as canonical targets when they are not needed, or misaligned ./configure --help output, and so on and so forth. These kinds of scans should not be applied only to released code but, more importantly, to the code still in the repositories, so that the issues can be killed before the code is released.

I had this idea when I went to look for different conditions in Lennart’s repositories (which are, as usual, available in my own repositories with changes, fixes and improvements to the build system – a huge thanks to Lennart for allowing me to be his autotools-meister). By build-checking his repositories before he makes a release, I can ensure the released code works for Gentoo just fine, instead of having to patch it afterwards and queue the patch for the following release. It’s the step beyond upstreaming the patches.

Unfortunately this kind of work is difficult, and not only because it’s hard to write static analysis software that gets good results. The US DHS-funded Coverity Scan, although lauded by people like Andrew Morton, had tremendously bad results in my opinion with the xine-lib analysis: lots of issues were never reported, and the ones reported were often enough either false positives or inside the FFmpeg code (which xine-lib used to import); and the analysed code was almost never updated. If it simply hadn’t picked up the change to the Mercurial repository, that would have been understandable – I don’t expect them to follow the repository moves of all the projects they analyse – but the problem was there since way before the move. And it also reported the same exact problems each and every day, repeated over and over; for a while I tried to keep track of them and marked the ones we had already dealt with, or which were false positives, or were part of FFmpeg (and may even have been fixed already).

So one thing to address is to have an easy way to keep track of various repositories and their branches, which is not so easy since all SCM programs have different ways to access the data. Ohloh (now Open Hub) probably has lots of experience with that, so I guess that might be a start; it has to be considered, though, that it only supports the three “major” SCM products – Git, Subversion and the good old CVS – which means that extending it to any repository at all is going to take a lot more work, and it has had quite a few problems accessing Gentoo repositories, which means it’s certainly not fault-proof. And even if I were able to hook up a similar interface on my system, it would probably require much more disk space than I’m able to spare right now.

For sure, the first step now is to actually write the analysis script that checks the build logs (since that would already allow having some results once hooked up with the tinderbox), and then find a way to identify, through static analysis of the source code, some of the problems we most care about in Gentoo. Not an easy task, nor something that can be done in spare time, so if you have something to contribute, please do; it would be really nice to get the pieces of the puzzle together.

bash scripting tiny details

Although I have now been an ebuild developer for almost two years, and contributed for at least another year through Bugzilla, I never considered myself a bash expert; the constructs I use are mostly generic, a bit more advanced than newbie usage, as often needed in ebuilds. So from time to time, when I learn some new trick that others have known for ages, or discuss alternatives with other developers, I end up posting here, trying to share it with others who might find it useful.

As autoepatch is being written entirely in bash, I end up coping with problems and corner cases that I need to dig into, and thus I’ve been learning some more tricks, and thinking about some things for the ebuilds themselves.

The first thing is about sed portability. We have already made sure that “sed” called in ebuild scope is always GNU sed 4, so that the supported command lines are the same everywhere; but there is a portable alternative, which also seems to be faster: perl. The command “perl -p -i -e” is a work-alike replacement for “sed -i -e”, and as far as I can see it is also faster than sed. I wonder if, considering we already have perl in the base system, it would be viable to use it as an alternative to sed throughout the Portage tree.
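As an illustration of the equivalence (assuming GNU sed 4 for the -i behaviour, as guaranteed in ebuild scope):

```shell
# The same in-place substitution done twice: once with GNU sed,
# once with the perl work-alike mentioned above.
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/sed.txt"
printf 'hello world\n' > "$tmp/perl.txt"

sed -i -e 's/world/gentoo/' "$tmp/sed.txt"
perl -p -i -e 's/world/gentoo/' "$tmp/perl.txt"

cat "$tmp/sed.txt" "$tmp/perl.txt"
```

Both files end up containing `hello gentoo`.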

For find(1) we instead rely on a portable subset of commands, so that we don’t ask Gentoo/*BSD users to install GNU findutils (which also often breaks on the BSDs); one of the most used features of find in ebuilds is -print0, piped through xargs to run some process on a list of files. Timothy (drizzt) already suggested some time ago to use -exec cmd {} + instead, as that merges the xargs behaviour into find itself, avoiding one process spawn and a pipe. Unfortunately, this feature, specified in SUSv3, is present in FreeBSD, DragonFlyBSD and NetBSD, but not in OpenBSD… for autoepatch (where I’m going to use this feature pretty often, as it all comes down to find to, well, find the targets) I decided that the find command used has to support it, so to run on OpenBSD it will have to depend on GNU findutils (until they implement it). I wonder if this could be required for the whole of Portage, so that the many xargs calls in ebuilds could then be replaced…
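A small sketch of the two forms side by side; the file names are made up for the example:

```shell
tmp=$(mktemp -d)
touch "$tmp/one.patch" "$tmp/two.patch"

# Traditional form: a NUL-separated list handed to xargs.
find "$tmp" -name '*.patch' -print0 | xargs -0 ls

# SUSv3 form: find batches the arguments onto the command line
# itself, saving the pipe and the extra xargs process.
find "$tmp" -name '*.patch' -exec ls {} +
```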

I should ask reb about this latter thing, but he has, uh, disappeared :/ It seems like Gentoo/OpenBSD is one of those projects where the people involved end up disappearing or screaming like crazy (it kinda reminds me of PAM).

Talking about the MacBook Pro: today I prepared an ebuild for mactel-sources in my overlay, which I’m going to commit now; it takes gentoo-sources and applies the mactel patches over it, which is easier for me to handle in the long run. This way, the Synaptics driver for the touchpad actually worked fine. Unfortunately, KSynaptics made the touchpad go crazy, so I suggest everybody NOT try it, as it is now.

Today’s little tricks

So, although I should be on break, today I again ended up doing some kind of work: apart from updating the ALSA guide, as I said earlier today, I’ve also taken a deeper look into my backlog, and I found that I still had a few packages that needed to be fixed because they were stripping binaries.

As I was doing that, I wanted to provide a few useful pointers that might not be that obvious, and that I found interesting to know.

The first is a bash trick, or better a particular way to use bash that might come handy. Many people often use something like

for file in $(find . -type f); do stuff; done

I’ve seen that used often in ebuilds too. Although it works, there are often better ways to do it (like using sed -i -e on multiple files at once, instead of using a for loop to go file-by-file with sed -e > tmpfile and so on). For those cases where the cycle is actually needed, though, there’s something handy to know.

I found this myself when running the cycle that gets the list of files installed with stripped debug information (a thing that makes my efforts to get the system fully debuggable useless :P). It was a for loop that used scanelf’s output, and it sat waiting for scanelf to complete before starting to look at the files, because the $() command substitution collects all the output first.
If instead of doing that I do this:

scanelf -k '!.symtab' -F '#k%F' -qRB /usr/lib/debug |
   while read -r file; do qfile -C "${file}"; done

and thus use a pipe, I start getting output even before scanelf has completed.
I know this might be obvious to seasoned bash coders, but it’s probably a neat trick to know for the newbies.
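The difference is easy to demonstrate with a toy producer (the function name is mine, purely for illustration):

```shell
# Emits one line, pauses, then emits another.
slow_producer() {
    echo first
    sleep 1
    echo second
}

# With command substitution, the loop body cannot start until
# slow_producer has exited: all its output is collected first.
for line in $(slow_producer); do echo "got: $line"; done

# With a pipe, each line is handed over as soon as it is written,
# so processing starts while the producer is still running.
slow_producer | while read -r line; do echo "got: $line"; done
```

One caveat worth knowing: in the pipe form, the while loop runs in a subshell, so any variables it sets do not survive past the loop.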

The other is not exactly a trick, but a piece of information I should probably add to the backtraces guide. Some time ago I asked solar about some files that didn’t get their debug info copied into /usr/lib/debug before stripping. The problem was difficult to reproduce, and he wasn’t actually sure what the cause was.
Well, I can now say that the problem was either in GCC or in Binutils, as some packages like procps that used to have no debug information now get it correctly copied. The same happened with some of gcc’s own libraries (although the C++ compiler’s still have their debug information stripped). So if you ever see an “Invalid Operation” error from objcopy during stripping, know that we probably can’t do much about it.