Usable security: the sudo security model

Edit: here’s the talk I was watching while writing this post, for context.

I’m starting to write this while I’m at Enigma 2016 listening to the usable security track. It seems like a perfectly good time to start talking publicly about my experience trying to bring better security to a Unix-based company I worked for before.

This is not a story about a large company: the one I worked for was fairly small, with five people working in it at the time I was there. I’ll use “we”, but I will point out that I’m no longer at that company and this is all in the past. I hope and expect the company to have improved their practices. When I joined, the company was working on a new product, which meant we had a number of test servers running within the office and only one real “production” server for it running in a datacenter. In addition to the new product, a number of servers for a previous product were in production, along with a couple of local testing servers for those.

While there was no gaping security hole in the company (otherwise I wouldn’t even be talking about it!), the security hygiene there was abysmal. We had an effective sysadmin at the office for the production server, and an external consultant to manage the network, but the root password (yes, singular) of all devices was also known to the owner of the company, who also complained when I told them I wouldn’t share my password.

One of the things I wanted to set up there was stronger authentication, and to stop people from accessing everything with root privileges. As a stepping stone I ended up using sudo, at least for the test servers (I never managed to get this into proper production).

We have all laughed at “sudo make me a sandwich”, but the truth is that it’s still a better security model than running as root, if used correctly. In particular, I did ask the boss what they wanted to run as root and, after getting rid of the need for root for a few actions that could be done unprivileged, I set up a whitelist of commands that their user could run without a password. They were mostly happy not to have to log in as root, but it was still not enough for me.
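
To give an idea of the shape of that whitelist (the user name and the commands here are made up, they were obviously different), it was little more than a sudoers drop-in along these lines:

# /etc/sudoers.d/owner (sketch) -- routine tasks allowed without a password;
# anything else still requires proper authentication
Cmnd_Alias ROUTINE = /sbin/service myapp restart, /usr/bin/tail /var/log/myapp.log
owner ALL=(root) NOPASSWD: ROUTINE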

My follow-up ties back to the start of this article, in particular the fact that I started writing this while listening to Jon Oberheide. What I wanted to achieve was an effective request for privilege escalation to root — that is, if someone were to actually find the post-it with the owner’s password, they wouldn’t get root on any production system, even though they might be able to execute some (safe) routine tasks. At the time, my plan involved using Duo Security and a customized duo_unix so that a sudo request for any non-whitelisted command (including sudo -i) would require confirmation on the owner’s phone. Unfortunately this hit two obstacles: the pull request with the code to handle PAM authentication for sudo was originally rejected (I’m not sure what the current state of that is, maybe it can be salvaged if it’s still not supported) and the owners didn’t want to pay for the Duo license – even just for the five of us, let alone providing it as a service to customers – even though my demo did have them quite happy about the idea of only ever needing their own password (or ssh key, but let’s not go there for now.)
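
For the curious, what I was prototyping was roughly this PAM stack for sudo, assuming duo_unix’s pam_duo module; this is a sketch from memory rather than a tested configuration, and it’s exactly the part that depended on the duo_unix changes I mentioned:

# /etc/pam.d/sudo -- rough sketch only, assuming pam_duo from duo_unix;
# module names and control flags vary between distributions
auth     requisite  pam_unix.so
auth     required   pam_duo.so
account  required   pam_unix.so
session  required   pam_unix.so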

This is just one of many things that were wrong in that company, of course, but I think it shows that even in system administration work, security and usability can go hand in hand, and a usable solution can make even a small company more secure.

And for those wondering: no, I’m in no way affiliated with Duo, I just find it a good technology, and I’m glad Dug pointed me at it a while back.

The security snakeoil

I said I’m basically doing the job of a network and system administrator here in LA — although you could say I have been doing that for the past four years as well, the main difference being that in the past I would have only a few Unix systems to administer, and mostly had to deal with way too many Windows boxes.

Anyway, the interesting part of doing this job is having to deal with the strange concepts of security that customers (and previous administrators) have. I won’t go into much detail, but I’ll talk about a few common bits of snakeoil that get sold as security.

First, let’s start with the mighty DROP — it seems to be a common choice for network administrators to set up their systems to DROP all the packets that are not to be delivered to their end destination. Why this is done is up for debate; some insist it’s to not let a scanner know that there is even a host there — which only makes sense if there is no service at all on the system that needs to be accessible, in which case we’re talking about different issues. The alternative story is that it makes it more difficult to discern between services that are available and those that aren’t — Peter Benie analyzed this better than I can do here.

So the only effective reason to use DROP is to reduce the amount of packets to process and send back during a DDoS — which might be something. On the other hand, if there is even one service that is open, they’ll just exploit that one for the DDoS, not multiple ports. At the same time, using the DROP rule makes it harder to diagnose network problems for other administrators who are just trying to do their job. What is my solution? Rate-limit: you only start dropping packets after the host actually starts trying to flood you, so the usual diagnostics keep working correctly.
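
In iptables terms, the kind of setup I mean looks more or less like this; the port and the rate numbers are just placeholders:

# accept the actual services...
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# ...answer everything else politely, so diagnostics keep working...
iptables -A INPUT -m limit --limit 10/second --limit-burst 50 -j REJECT
# ...and only once somebody is actually flooding us, stop replying at all
iptables -A INPUT -j DROP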

Then there is the idea that you can solve everything by putting it behind a firewall and a VPN. Somehow there is this silly notion floating around that the moment you add a firewall in front of your cabinet, and then use a VPN to tunnel in from your office, everything is, in one move, neatly split into trusted and untrusted. Bonus points when the firewall’s only task is to avoid “netscans” and otherwise just do port whitelisting for a bunch of Linux servers.

The problem starts appearing the moment you’re expecting to push over 100Mbps … and instead you end up bottlenecked by a 40Mbps firewall that isn’t even doing packet inspection! And the snakeoil here is expecting there to be a huge difference between a server with only one port whitelisted on the firewall and the same server with the same port whitelisted by iptables — and no, there isn’t a policy on outgoing connections, so don’t ask. Of course anything behind the VPN is then treated as a completely trusted network, with shallow passwords, if any at all, and no second-factor authentication.

Finally there is what I call hyper-security — the same kind of thing that enthuses OpenBSD users: lock everything down as tight as you can. “Hey, it’s not difficult, you just spend the time once!” I’ve heard it so many times by now… Nobody is saying it’s a good idea to ignore security issues and known vulnerabilities, but most of the time you have to strike a good balance between security and usability. If your security is so strong that you can’t do anything useful, then it’s bad security.

It’s not just about what you can do with the system — especially for servers it’s pretty trivial to lock it down to exactly what you need and nothing more, so I can understand someone wanting to do so. The problem is that unless you can afford to re-validate all the checks every time you have to add something new, the chances are that during maintenance you can easily slip up and forget about something, either breaking the services you run or breaching security altogether.

Here’s the main catch: if it’s a system that only has one user, and will die with you, tightening it down is easy and can work, if you’re not an idiot — but if you feel smug because others accept compromises for the sake of not being a human single point of failure, stop it, because you’re just the kind of scum that causes so many people to say “pfft” to security. There are actually two very easy examples of this: Windows Vista and Fedora.

The former started asking you to confirm every single operation, to the point that users ended up clicking “Ok” without reading what was asked:

  • Do you want to install the drivers for the printer you just bought? Ok.
  • Do you want to install the operating system updates? Ok.
  • Do you want to install Firefox? Ok.
  • Do you want to upgrade Firefox? Ok.
  • Do you want to install Microsoft Office? Ok.
  • Do you want to update Microsoft Office? Ok.
  • Do you want to install Adobe Reader? Ok.
  • Do you want to install this little game you just downloaded? Ok.
  • Do you want to install this driver that sniffs your network traffic? Ok.

I’ve seen malware that actually required a confirmation — easy to notice if you looked at the window, but usually dismissed by clicking “Ok” without looking.

As for Fedora instead — you are probably familiar with Red Hat’s decision to enable SELinux by default on their installs, going back a very long time. And I tend to agree that it’s a good thing; the problem is that most people don’t understand how to work with SELinux, and lacking simple ways to do what they need to do, they just decide to turn SELinux off altogether — the same problem as with Nagios plugins and sudo, where the documentation just tells you to give them sudo access to everything.
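
What I wish the documentation pushed people toward, instead of setenforce 0, is the standard workflow for dealing with a denial; the module name below is made up, and the boolean is just one example of many:

# see what SELinux actually denied
ausearch -m AVC -ts recent
# if a boolean already covers the use case, flip it...
setsebool -P httpd_can_network_connect on
# ...otherwise build a small local policy module out of the denials
ausearch -m AVC -ts recent | audit2allow -M mylocalmodule
semodule -i mylocalmodule.pp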

I could probably write a lot more about these kinds of situations, but I don’t think it’s worth my time. I just think that people insisting on hyper-security are detrimental to the overall security of the population, as they scare people away.

Nagios, Icinga, and a security nightmare

If you’ve followed my blog in the past few weeks, I’ve been doing quite a bit of work on both Munin and the Nagios packaging (I leave Icinga to prometheanfire!), as well as working closely with Munin upstream by feeding them patches — yesterday I actually got access to the Munin contrib repository, so now I can help make sure that the plugins reach a state where they can actually be redistributed and packaged.

I also spent some time cleaning up what was once called nagios-nsca and nagios-nrpe (they are now just nsca and nrpe, since they work just fine with Icinga as well, and the nagios- part was never part of the upstream names anyway; kudos to Markos for noticing I didn’t check correctly for revdeps, by the way) — now you get minimal USE flags that you can turn on to avoid building their respective daemons, although you have to remember to enable minimal for nsca on the nodes, and for nrpe on the master. They also both come with new init scripts that are simplified and leverage the new functionality in OpenRC.
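
In /etc/portage/package.use terms, the split looks something like this, assuming the net-analyzer category the two packages live in:

# on the monitored nodes: only the send_nsca client is needed
net-analyzer/nsca minimal
# on the monitoring master: only the check_nrpe plugin is needed
net-analyzer/nrpe minimal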

There is, though, something that is not letting me sleep well — and that’s besides the usual problems I have with sleeping. Let me try to explain.

Some of the Nagios/Icinga tests obviously can’t be executed remotely as they are — things like S.M.A.R.T. monitoring need to run on the box they monitor. So how do you fix this? You use NRPE — the Nagios Remote Plugin Executor. This is basically a daemon used to execute commands on the node (more or less the way Munin’s node works). Unfortunately, unlike Munin, neither Icinga proper nor NRPE allows you to choose which user to use on a per-plugin basis (to allow that, Munin runs its node as root).

Instead, everything is executed by the nagios user, and if you need to access something that user can’t access, you can work around it by using a setuid-root plugin (these are tied to the suid USE flag for nagios-plugins in Gentoo). But this, of course, only works for binaries, not scripts. And here’s the first problem: to check the S.M.A.R.T. status of an IDE drive, you can use the check_ide_smart tool that reimplements the whole protocol… to check the status of a SATA drive you should use check_smart.pl, which uses SmartMonTools to take care of it.

But how can the script access the disk? Well, it does it in the simplest way: it uses sudo. Of course this means that the nagios user has to have sudo access… Afraid that this would lead people to give unconditional sudo access to the nagios user, I decided to work around it by installing my own sudo configuration file from the ebuild, making use of the new /etc/sudoers.d directory, which means that on a default install only the expected commands are allowed for the nagios user. And this is good.
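
Something along these lines is what I mean; the smartctl path and the argument pattern here are just an example, not a verbatim copy of the file the ebuild installs:

# /etc/sudoers.d/nagios (sketch) -- grant only the specific commands the
# plugins need, never a blanket "nagios ALL=(ALL) NOPASSWD: ALL"
nagios ALL=(root) NOPASSWD: /usr/sbin/smartctl -a -d ata /dev/sd[a-z]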

But sometimes the plugins themselves don’t bother using sudo directly; instead they rely on being executed by a user that has enough privileges. For this reason, the nrpe configuration allows you to prefix all commands with any command of your choice, with the default being… sudo! And their documentation suggests making sure that the user running nrpe does not have write access to the configuration directory, to avoid security issues… you can understand that’s not the only bad idea to be had there.
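
For reference, this is the kind of setup the NRPE sample configuration points you toward; the plugin path and options below are just an example:

# nrpe.cfg (sketch)
# command_prefix wraps *every* defined command in sudo -- which only works
# if the nagios user has a broad sudo grant, exactly what I want to avoid
command_prefix=/usr/bin/sudo
command[check_smart_sda]=/usr/lib/nagios/plugins/check_smart.pl -d /dev/sda -i ata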

Sigh, this stuff is a complete security nightmare, truly.

There’s a new sudo in town

It was last month that I noticed the release of the new series of sudo (1.8), which brought a number of changes to the whole architecture of the package, starting with a plugin-based architecture for the authentication systems. Unfortunately, when I went to bump it, I decided to simply leave users updating to 1.7.5 and keep 1.8.0 in package.mask, because I didn’t have time to solve the (many) quality issues the new version showed, starting with LDFLAGS handling.

This morning, during my usual work tour-de-force, I noticed the first release candidate of the 1.8.1 series, which is due to come out at the end of this week, so I decided to finally get to work on the ebuild for the 1.8 series so that it would work correctly. I didn’t expect it to take me until past 6 in the morning to get it to work to an extent (and even then it wasn’t completely right!).

While looking into 1.8 series I also ended up getting the 1.7.6 ebuild updated to a more… acceptable state:

  • the infamous S/Key patch that I’ve been carrying along since I took over the package from Tavis didn’t apply on 1.8, mostly just because of source reorganisation; I ended up converting it into a proper autoconf-based check, so that it could be submitted upstream;
  • the recursive make process wasn’t right in the 1.8 releases, so even if the package failed to build, there was no indication of it to the caller, and thus the ebuild didn’t fail – this is probably something I could write about in the future, as I’ve seen it happen before as well – I was able to fix that and send it upstream, but it also showed me that I was ignoring one of the regression tests failing altogether, which is worrisome to say the least;
  • since I had to re-run autotools, I ended up hitting a nasty bug with packages using autoconf but not automake: we didn’t inject the search path for extra m4 files, even though the eclass was designed with AT_M4DIR in mind for that; it is now fixed, and the various autoconf/autoheader steps are given the proper path, so that even sudo can have its autotools build system updated;
  • while looking at the ebuilds, I decided to change both of them from EAPI=0 to EAPI=4; it didn’t make much of a difference by itself, but I was able to use REQUIRED_USE to express the requirement of not having both PAM and S/Key enabled at the same time (see the snippet right after this list) — even though I didn’t get it right the first time around; thanks Zeev!
  • another piece of almost “cargo code” had been left in the ebuild since I took it over, in charge of adding a few variables to the list of environment variables to blacklist; since this was done through sed, and it only stopped working with the 1.8 series, I never noticed that it was by now meaningless: all those variables were already in the upstream sudo sources, so I was simply able to drop the whole logic — note to self: never rely so much on silently-ignored sed statements!
  • today, while at it (I added rc1 this morning and moved to rc2 today, after Todd pointed out to me that the LDFLAGS issue was fixed there), I reviewed the secure path calculation, which now treats the gnat compiler paths just like those for gcc itself;
  • the sudo plugins are no longer installed in /usr/libexec but rather in the so-called pkglibdir (/usr/lib/sudo); this also required patching the build system.
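
For the curious, the REQUIRED_USE constraint mentioned in the list above boils down to a single line; here is a sketch of the relevant ebuild bits, with the other USE flags omitted:

# app-admin/sudo ebuild (sketch): PAM and S/Key are mutually exclusive
EAPI=4
IUSE="pam skey"
REQUIRED_USE="pam? ( !skey )"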

I hope to be able to unmask sudo 1.8.1 to ~arch when it comes out. With a bit of luck, by that time there won’t be any patch applied at all, since I sent all of them to Todd. That would make it the first version of sudo in a very long time built from vanilla, unpatched, unmodified sources. And if you know me, you know I like situations like that!

Unfortunately, I have never used nor tested the LDAP support for sudo, so I’m pretty sure it doesn’t work with the sudo 1.8 series in Gentoo. Pretty please, if somebody has used it, let me know what needs to be fixed! Thanks.

Explaining the sudo crapfest

Some of you might have noticed that I’ve been attacked in the past few days over a feature change bug (now unrestricted, so you can all judge it for what it was) with sudo.

Let’s try to explain what the problem seems to have been with that bug: the sudo ebuild, a long time ago, back when it was maintained by Tavis Ormandy (taviso, of, among other things, oCERT fame), changed the default editor to nano. The reasoning is that you have to have a valid default editor in sudo, and that, after all, is what we ship in the stage3 as the default editor, and what OpenRC sets as the default editor if the user sets nothing else. I kept the same setting because, well, I could see the point of it; and I’m not a nano user — I don’t have a single nano executable on my main running systems: I use GNU Emacs for all my “heavy” editing, and zile (of which I’m a minuscule contributor from time to time) for the smaller stuff.

This is just the default editor used by sudo, by no means the only one. Indeed, it’s mostly used in two places: visudo and sudoedit. The former respects the EDITOR variable, with the usual rules, so if your root user has the EDITOR variable set to anything, it’ll be used (this is also not a default: we explicitly pass the --with-env-editor option to ./configure); the only place where it’s not used, by default that is, is if you invoke sudo visudo — not because we set the default to nano, but because sudo by default resets the environment. As for sudoedit, it honours the EDITOR variable by default without us doing anything about it.
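
And if you do want sudo visudo to pick up your own editor despite the environment reset, the user-side fix is a single line in sudoers rather than a different build-time default:

# keep EDITOR (and VISUAL) across sudo's environment reset
Defaults env_keep += "EDITOR VISUAL"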

Why did we ask sudo to explicitly honour the EDITOR variable? Isn’t that unsafe? Well, not really. First of all, sudoedit is a special command that allows a user to edit a file without running anything as root, so EDITOR will always be fired up with the same permissions as the user running it (pretty cool stuff); and as for visudo… if you allow a user to run visudo, they can easily set their user up to do anything at all, so the permissions under which EDITOR runs are pretty much moot.
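
This, incidentally, is why sudoedit is the right tool for delegating edits of a single file; a made-up example of such a grant:

# let one user edit one file: their editor runs with their own privileges,
# only the copy-in/copy-out of the file happens with elevated rights
alice ALL = sudoedit /etc/nginx/nginx.conf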

So why did I refuse to use the EDITOR variable in the ebuild? Well, the reason is that I hate that particular dirty hack (because a dirty hack it is). Other ebuilds, like fcron, have been doing it, and I don’t like its use there either. The reason is that it makes package building rely on a variable that is not by default saved anywhere, and that might not be the same between the build and host machines (just as a further example, I have different EDITOR settings between root and my user… if I were to build sudo right now, with solar’s changes, with my usual method – sudo emerge with env_keep – I would get a build error: EDITOR="emacsclient -c"). If this is really a big concern, the proper solution is to have a proper virtual implementation of the editor: a /usr/libexec/gentoo-editor script that accepts just one parameter (a filename) and relays the call to the currently-selected editor via the EDITOR environment variable. Something like this:

#!/bin/sh
# Relay the call to whatever editor the EDITOR variable points to.

if [ $# -ne 1 ]; then
    echo "$0 accepts exactly one parameter." >&2
    exit 1
fi

if [ -z "$EDITOR" ]; then
    echo "The EDITOR variable must be set." >&2
    exit 1
fi

# no quotes around $EDITOR on purpose: it may carry arguments ("emacsclient -c")
exec $EDITOR "$1"

Note: I wrote this snippet right here in the blog; I haven’t tested, checked, or re-read it at all.

This would allow ebuilds to properly set a default (or even a forced default!) in their configuration stages, without getting in the way of users’ choices.

Now, this is going to require a bit of work; the reason is that you have to make sure it works standalone, that it works with the packages that would like to use it, and most likely you’d want to make a “safe-editor” script available as well, to enable optional safe modes where shell escapes (as well as opening arbitrary files) are disabled, for those applications that do open the editor as a different user. Since I have never seen much point in pushing this further (I have a long TODO list of stuff much more important for QA, Gentoo and users as a whole), I haven’t done anything about it before, and I didn’t want to start now. But hey, if somebody wanted to contribute it, I would be the first one to make use of it.

Now, how much have I been in the way of users by passing --with-editor=/bin/nano in the ebuild? I would sincerely say very little. Do you want to change the default at build time? Take your “How Gentoo Works” handbook out and look up EXTRA_ECONF in the index. It’s a variable. It allows you to pass further options to econf. Options to ./configure (which is what econf wraps) are positional: the last one passed is the one that wins.

Too much work to use EXTRA_ECONF each time you merge sudo?

# mkdir -p /etc/portage/env/app-admin
# echo "EXTRA_ECONF=\"\${EXTRA_ECONF} --with-editor=${EDITOR}\"" >> /etc/portage/env/app-admin/sudo

It really isn’t that difficult, is it?

And what really ticked me off is users who, as precious as they are, are still just users (in this case not even a well-known power user), telling me what the “Gentoo way” is… I think I know the Gentoo way pretty well, given that I’ve been a major developer for most of the past five years, I’m one of those trying their best to keep everything up to date and maintained, and I’m the one running his own workstation to make sure that Gentoo packages build out of the box.

Some more Gentoo activity

Even though I’m spending most of my time on paid jobs, I’ve become active in Gentoo again, although I’m mostly doing test-building for --as-needed lately. Yamato, with its 8-core horsepower, is still building the tree; when I left it, before going to my bedroom to relax a bit, it was finishing the games-arcade category. I’ve been committing the most trivial stuff (missing deps, broken patches, and the like), and opening bugs for the rest. The result is that the “My Bugs” search on Gentoo’s Bugzilla reports over one thousand bugs.

Also, tonight I spent the night fixing the patches currently in the tree so that absolute paths are replaced by relative ones, since epatch now fails if you try to apply a patch that has absolute paths (because when they don’t add up, you’d be introducing subtle bugs that might affect users but not you). The result has been an almost tree-wide commit spree that sanitised the patches so that they won’t fail to apply for users. It was a long, boring, manual job, but it’s done, and of that I’m happy.

But it’s not all well: as Mike (vapier) pointed out, trying to just report all failures related to -Wl,--no-undefined is going to produce a huge amount of bugspam for false positives. In cases like zsh plugins, you really can’t do much more than passing -Wl,--undefined to disable the previous option, which makes -Wl,--no-undefined too much of a hassle to be usable in the tree. On the other hand, it’s still a useful flag to ask upstreams to adopt for their own builds, so that there are no undefined symbols. I think I’ll double-check all the software I contribute to and add this flag (as a special exception, this flag needs to be used only on Linux, since on other systems it might well be a problem: for instance, on BeOS dynamic patching is more than likely to cause problems, and on FreeBSD the pthread functions are not usually linked into libraries).
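
For upstreams that want to adopt it, the Linux-only conditional is a couple of lines in configure.ac (a sketch; it assumes AC_CANONICAL_HOST has been called so that $host_os is set):

# add --no-undefined only on Linux; other platforms legitimately rely on
# undefined symbols (BeOS dynamic patching, FreeBSD's pthread handling, ...)
case "$host_os" in
    linux*) LDFLAGS="$LDFLAGS -Wl,--no-undefined" ;;
esac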

This, together with the largefile problem I wrote about, brings me to wonder what we can do to improve even further the symbiosis between Gentoo and the various upstreams. I’m sure there are tons of patches in the tree that haven’t been sent upstream, and I’m afraid that --as-needed patching will introduce even more. I wonder if there could be volunteers who spend time checking, package by package, that the patches are sent upstream, and checking the Unmaintained Free Software wiki so that, if a package is no longer maintained upstream, there are references to our patches in case somebody wants to pick it up.

I could be doing this myself, but it takes time, and lately I haven’t had much; I could try to push myself further, but I currently don’t see much point, since I’ve sincerely had very little feedback from users lately, besides the small stable group of users whom I esteem very much and who are always around when I need help. Even just a kudo on ohloh would be nice, to know my work is appreciated, you know. Anyway, if you’re interested in helping with submitting stuff upstream, please try to contact me, so I can see to writing down upstream references in the patches we have in tree.

Also, since I started working “full time” on --as-needed issues, I had to leave behind some things, like closing some sudo bugs and some PAM issues, like the one with OpenSSH and double lastlogin handling. I hope to resume those as soon as I free some of my time from paid jobs (hopefully having some money to spare to pay for Yamato, which still isn’t completely paid for, and which at this pace is going to need new disks soon enough, considering the amount of space that the distfiles archive, as well as the built chroot, take). I actually hope that Iomega will send the replacement for my UltraMax soon, since I wanted to move music and video onto that external drive to free up a few hundred gigabytes from the internal drives.

Once the build is completed, I’ll also have to spend time optimising my ruby-elf tools to identify symbol collisions. I haven’t run the tools in almost a year, and I’m sure that with all the stuff I merged into that chroot, I’m going to have more interesting results than the ones I had before. I already started looking for internal libraries, although just on account of exported symbols, which is, as you probably know, a very bland way to identify them. A much more useful way is to look at the DWARF debug data in the files, with utilities like pfunct from the dwarves package. I haven’t built the chroot with debug information though, since otherwise it would have required much more on-disk space, and the system is already a few tens of gigabytes, without counting portage, distfiles, packages, logs or build directories.
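
To give an idea, the exported-symbols check is nothing more sophisticated than grepping dynamic symbol tables, along these lines (using one of zlib’s entry points as the needle, just as an example):

# flag shared objects that export zlib's inflateInit_ -- a telltale sign of
# a bundled copy (the real libz.so will show up too, of course)
for lib in /usr/lib*/*.so*; do
    nm -D --defined-only "$lib" 2>/dev/null | grep -qw inflateInit_ && echo "$lib"
done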

In the meantime, even with the very bland search, I already found a few packages that bundle zlib, one of which bundles an old version of zlib (1.2.1) which, as far as I remember, is vulnerable to some security issues. Once again, following policy would have avoided the problem altogether, just like with SDL bundling its own versions of Xorg libraries, which can now make xorg-server crash when an SDL application is executed. Isn’t it pure fun?

At any rate, for tonight I’m off, I did quite a lot already, it’s 4.30 am and I’m still not sleepy, something is not feeling right.

Back on Enterprise

So I’m finally back on Enterprise: the new office is almost done, and I just have to finish cleaning my stuff to bring it back into the office.

Since I’m back on Enterprise, I started Specto and looked at which pages were updated, and I got quite a few new releases for the packages I maintain:

  • Linux-PAM released the first of the 1.0 series, 1.0.0, which I’ve already added to the tree;
  • libarchive and sudo updated both their stable and beta branches;
  • nxhtml got a new release.

The release of Linux-PAM 1.0.0 makes me wonder whether I should try to complete the move to sys-auth that started in fall 2005. I left PAM out before, hoping that epkgmove could improve, but I don’t see that happening anytime soon, so I’m pondering doing the move manually myself. Linux-PAM should really become sys-auth/linux-pam rather than the sys-libs/pam it is now.

The biggest problems I can see are overlays that refer to sys-libs/pam, which would be broken just about immediately by the move, and overlays that for some (often stupid) reason carry their own sys-libs/pam package, which would need to be moved around as well.

Oh well, just another entry in my TODO list I suppose.

Checking for page changes

Dear lazyweb, I’m asking for help. I maintain the ebuilds for a series of software whose upstreams don’t seem to use common release-announcement channels like Freshmeat. This is the case, for instance, for libarchive, the Nimbus theme, and sudo.

I usually check the pages every other day, but it’s starting to get boring, and it’s something I’d rather automate. Does anybody know of a nice piece of software that checks whether pages have changed and sends me an email, or a log, or an RSS feed, or anything really, so that I can just tell it to check some given pages and leave it alone?

Optimally, it should just check the headers for the Last-Modified date, or for a different ETag, without doing a full GET of the page. I could probably write something up in bash to do it, it’s just a matter of netcat and grep, or maybe curl, but I’d rather avoid writing something myself if it already exists.

The reason why I would like it to use HEAD rather than GET is that there is no point in a script requesting the same page every day to check whether it has changed; it’s not like a browser requesting it. This way it would save both my bandwidth and the site’s, which, if done properly by all such services, would be quite a saving. Even better if it could use If-Modified-Since, so that after the first request every subsequent request would just get a 304 response and no extra data like content type and length, which require a stat on the server.
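
If I end up hacking it together myself, it would look more or less like this; the file locations are made up, and in the spirit of full disclosure I haven’t tested it:

#!/bin/sh
# check a list of URLs with HEAD requests and If-Modified-Since

urls=$HOME/.watched-pages      # one URL per line
state=$HOME/.watched-pages.d   # one timestamp file per URL
mkdir -p "$state"

while read -r url; do
    stamp="$state/$(echo "$url" | tr -c 'A-Za-z0-9' '_')"
    if [ -e "$stamp" ]; then
        # -I sends a HEAD request; -z adds If-Modified-Since based on the
        # stamp file's mtime, so an unchanged page answers with a bare 304
        code=$(curl -sI -o /dev/null -w '%{http_code}' -z "$stamp" "$url")
    else
        code=$(curl -sI -o /dev/null -w '%{http_code}' "$url")
    fi
    if [ "$code" = 200 ]; then
        echo "changed: $url"
        touch "$stamp"
    fi
done < "$urls"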

If anybody has a suggestion, it’s very welcome. Even if it’s not in portage, I can create ebuilds and maintain them afterward, I just need something to make my job easier! Of course it’s obvious that it has to run on Linux (and possibly FreeBSD ;) ).