GnuPG Agent Forwarding with OpenPGP cards

Finally, after many months (a year?) absence, I’m officially back as a Gentoo Linux developer with proper tree access. I have not used my powers much yet, but I wanted to at least point out why it took me so long to make it possible for me to come back.

There are two main obstacles that I was facing, the first was that the manifest signing key needed to be replaced for a number of reasons, and I had no easy access to the smartcard with my main key which I’ve been using since 2010. Instead I set myself up with a separate key on a “token”: a SIM-sized OpenPGP card installed into a Gemalto fixed-card reader (IDBridge K30.) Unfortunately this key was not cross-signed (and still isn’t, but we’re fixing that.)

The other problem is that for many (yet not all) of the packages I worked on, I would work on a remote system, one of the containers in my “testing server”, which also host(ed) the tinderbox. This means that the signing needs to happen on the remote host, even though the key cannot leave the smartcard on the local laptop. GPG agent forwarding is not very simple, but it has sort-of-recently become possible without too much intrusion.

The first thing to know is that you really want GnuPG 2.1; this is because it makes your life significantly easier: key management is handed over to the agent in all cases, which means there is no need for the “stubs” of the private key to be generated in the remote home. The other improvement in GnuPG 2.1 is the better socket handling: on systemd it uses the /run/user path, and in general it uses a standard named socket with no way to opt out. It also allows you to define an extra socket that is allowed to issue signature requests, but not to modify the card or the secret keys, which is part of the defence in depth when allowing remote access to the key.

There are instructions out there which should make it easier to set up, but they didn’t quite work the way I read them, in particular because they require a separate wrapper to set up the connection. Instead, together with Robin, we managed to figure out how to make this work correctly with GnuPG 2.0. Of course, since that Sunday, GnuPG 2.1 was marked stable, and so that setup stopped working, too.

So, without further ado, let’s see what is needed to get this to work correctly. In the following example we assume we have two hosts, “local” and “remote”; we’ll have to change ~/.gnupg/gpg-agent.conf and ~/.ssh/config on “local”, and /etc/ssh/sshd_config on “remote”.

The first step is to ask GPG Agent to listen on an “extra socket”, which is the restricted socket that we want to forward. We also want it to keep the display information in memory; I’ll explain why towards the end.

# local:~/.gnupg/gpg-agent.conf

keep-display
extra-socket ~/.gnupg/S.gpg-agent.remote

This is particularly important for systemd users because the normal sockets would be in /run and so it’s a bit more complicated to forward them correctly.
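
If you are not sure where your GnuPG ends up putting its sockets (for instance under /run/user on systemd), newer versions of gpgconf can tell you directly; note that the agent-extra-socket query is an assumption on my part, as it only exists in more recent 2.1 releases:

gpgconf --list-dirs agent-socket
gpgconf --list-dirs agent-extra-socket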

Secondly, we need to ask OpenSSH to forward this Unix socket to the remote host; for this to work you need at least OpenSSH 6.7, but since that’s now quite old, it is mostly safe to assume that’s what you are using. Unlike GnuPG, SSH does not expand the tilde for the home directory, so you’ll have to spell out the actual, fully qualified paths to get the socket written in the right place.

# local:~/.ssh/config

Host remote
RemoteForward /home/remote-user/.gnupg/S.gpg-agent /home/local-user/.gnupg/S.gpg-agent.remote
ExitOnForwardFailure yes

Note that the paths need to be fully qualified and are in the order remote, local. The ExitOnForwardFailure option ensures that you don’t get a silent failure to listen to the socket and fight for an hour trying to figure out what’s going on. Yes, I had that problem. By the way, you can combine this just fine with the now not so unknown SSH tricks I spoke about nearly six years ago.

Now is the slightly trickier part. Unlike the original gpg-agent, OpenSSH will not clean up the socket when it’s closed, which means you need to make sure it gets overwritten. This is indeed the main logic behind the remote-gpg script that I linked earlier, and the reason for that is that the StreamLocalBindUnlink option, which seems like the most obvious parameter to set, does not behave like most people would expect it to.

The explanation for that is actually simple: as the name of the option says, this only works for local sockets. So if you’re using the LocalForward it works exactly as intended, but if you’re using RemoteForward (as we need in this case), the one on the client side is just going to be thoroughly ignored. Which means you need to do this instead:

# remote:/etc/ssh/sshd_config

StreamLocalBindUnlink yes

Note that this applies to all socket forwarding requests on the host. You could reduce the possibility of trouble by using the Match directive to restrict the unlinking to the single user you care about, as sketched below.
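
A minimal sketch of that, assuming the remote user is called remote-user (StreamLocalBindUnlink should be among the keywords accepted inside a Match block, but do check your sshd_config manpage):

# remote:/etc/ssh/sshd_config

Match User remote-user
StreamLocalBindUnlink yes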

At this point, things should just work: GnuPG 2.1 will notice there is a socket already, so it will not start up a new gpg-agent process, and it will still start up every other component that is needed. And since, as I said, the stubs are not needed, there is no need to use --card-edit or --card-status (which, by the way, would not work anyway, as they are forbidden on the extra socket.)

However, if you try at this point to sign anything, it’ll just fail because it does not know anything about the key; so before you use it, you need to fetch a copy of the public key for the key id you want to use:

gpg --recv-keys ${yourkeyid}
gpg -u ${yourkeyid} --clearsign

(It will also work without -u if that’s the only key it knows about.)

So what about keep-display in local:~/.gnupg/gpg-agent.conf? One of the issues Robin and I faced was gpg failing with something about “file not found”, even though the file I was signing was obviously there. A bit of fiddling later turned up these problems:

  • before GnuPG 2.1 I would start up gpg-agent with the wrapper script I wrote, and so it would usually be started by one of my Konsole sessions;
  • most of the time the Konsole session with the agent would be dead by the time I went to SSH;
  • the PIN for the card has to be typed on the local machine, not remote, so the pinentry binary should always be started locally; but it would get (some of) the environment variables from the session in which gpg is running, which means the shell on “remote”;
  • using DISPLAY=:0 gpg would make it work fine as pinentry would be told to open the local display.

A bit of sniffing around the source code brought up that keep-display option, which essentially tells the agent to ignore the session in which gpg is running and only consider the DISPLAY variable that was set when gpg-agent started. This works for me, but it has a few drawbacks. It would not work correctly if you tried to use GnuPG outside the X11 session, and it would not work correctly if you have multiple X11 sessions (say through X11 forwarding.) I think this is fine.

There is another general drawback to this solution: if two clients connect to the same SSH server with the same user, the last one connecting is the one that actually gets to provide its gpg-agent. The other one will be silently overruled. I’m afraid there is no obvious way to fix this. The way OpenSSH itself handles this for SSH agent forwarding is to provide a randomly-named socket in /tmp and set an environment variable to point at it. This would not work for GnuPG anymore, because it has standardised the socket name and removed support for passing it in environment variables.

Asynchronous Munin

If you’re a Munin user in Gentoo and you look at ChangeLogs, you probably noticed that yesterday I committed quite a few changes to the latest ~arch ebuild of it. The main topic for these changes was async support, which unfortunately I think is still not ready yet, but let’s take a step back. Munin 2.0 brought one feature that was clamored for, and one that was simply extremely interesting: the former is the native SSH transport, the latter is what is called “Asynchronous Nodes”.

With a classic node, whenever the update runs, the master actually has to connect to each monitored node (real or virtual), get the list of plugins, get the config of each plugin (which is not cached by the node), and then get the data for said plugin. For things that are easy to get, because they only require reading data out of a file, this is okay, but when you have to actually contact services that take time to respond, it’s a huge pain in the neck. This gets even worse when SNMP is involved, because then you have to make multiple requests (for multiple values) both to get the configuration and to get the values.

To the mix you have to add that the default timeout on the node, for various reasons, is 10 seconds, which, as I wrote before, makes it impossible to use the original IPMI plugin for most of the servers available out there (my plugin instead seems to work just fine, thanks to FreeIPMI). You can increase the timeout, even though this is not really documented to begin with (unfortunately, like most things about Munin), but that does not help in many cases.
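
For the record, this is roughly what I mean by increasing it; the directive name and its default may vary with the munin-node version you’re running, so treat this as a sketch rather than gospel:

# /etc/munin/munin-node.conf on the monitored node

timeout 60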

So here’s how the Asynchronous node should solve this issue. On a standard node, as I said, the requests to a single node are serialized, so you wait for each one to complete before the next is fetched; this can make the connection to the node take, all in all, a few minutes, and if the connection is severed in the meantime, you lose your data. The Asynchronous node, instead, has a different service polling the actual node on the same host and saving the data in its spool file. The master in this case connects via SSH (it could theoretically work using xinetd, but neither Steve nor I care about that), launches the asynchronous client, and then requests all the data that has been fetched since the last request.
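
To give an idea of how the master ends up talking to such a node, the configuration looks more or less like the sketch below; the user name and the path to the munin-async binary are assumptions on my part and will likely differ on your distribution:

# master:/etc/munin/munin.conf

[remote.example.com]
address ssh://munin-async@remote.example.com/usr/libexec/munin/munin-async --spoolfetch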

This has two side effects. The first is that your foreign network connection is much faster (there is no waiting for the plugins to be configured and fetched), which in turn means that the overall munin-update transaction is faster; but also, if for whatever reason the connection fails at one point (a VPN connection crashes, a network cable is unplugged, …), the spooled data will cover the time that the network was unreachable as well, removing the “holes” in the monitoring that I’ve been seeing way too often lately. The second side effect is that you can actually spool data every five minutes, but only request it every, let’s say, 15, for hosts that do not require constant monitoring, even though you want to keep the granularity.

Unfortunately, the async support is not as tested as it should be and there are quite a few things that are not ironed out yet, which is why the support for it in the ebuild has been so much in flux up to this point. Some things have been changed upstream as well: before, you had only one user, and that was used both for the SSH connections and for the plugins to fetch data; unfortunately one of the side effects of this is that you might have given your munin user more access (usually read-only, but oftentimes there’s no way to ensure that’s the case!) to devices, configurations or things like that… and you definitely don’t want to allow direct access to said user. Now we have two users, munin and munin-async, and the latter needs to have an actual shell.

I tried toying with the idea of using the munin-async client as a shell, but the problem is that there is no way to pass options to it that way, so you can’t use --spoolfetch, which makes it mostly useless. On the other hand, I was able to get the SSH support a bit more reliable without having to handle configuration files on the Gentoo side (so that it works for other distributions as well; I need that because I have a few CentOS servers at this point), including the ability to use this without requiring netcat on the other side of the SSH connection (using one old trick with OpenSSH). But this is not yet ready, it’ll have to wait for a little longer.

Anyway as usual you can expect updates to the Munin page on the Gentoo Wiki when the new code is fully deployed. The big problem I’m having right now is making sure I don’t screw up with the work’s monitors while I’m playing with improving and fixing Munin itself.

Mostly unknown OpenSSH tricks

I’m not really keen on writing “tricks” kinds of posts about software, especially software as widely known as OpenSSH, but this stuff is quite intriguing – to me at least – and I think most people wouldn’t know about it.

If you’ve used cvs, svn or other SSH-connected services in the recent past, you probably know that one of the most time-wasting tasks for these systems is establishing the ssh session itself. Thankfully, quite a bit of time ago, OpenSSH introduced the “master” connection feature: the first ssh connection you open to a host (with a given username) creates a Unix socket, which is then reused by the following sessions. This was really a godsend when committing a huge number of packages more or less at the same time, back when I was still in the KDE team.

Unfortunately, this still required the first connection to be a persistent one; to work around that, I used to start a screen session that opened a few ssh connections after asking me for the key passphrase. This wasn’t particularly nice on the system, or on security, among other things. But fear not, OpenSSH 5.6 brought magic into the mix!

The new ControlPersist option allows the master connection to be created as a detached SSH process when first connecting to a box, so there is no need to keep long-running processes around in advance. Basically all you have to do to make use of master connections now is something like this in your ~/.ssh/config (note that a ControlPath has to be set for the multiplexing to work at all):

Host *.gentoo.org
ControlMaster auto
ControlPersist yes
ControlPath ~/.ssh/master-%r@%h:%p

and you’re set: ssh takes care of creating the first master connection if one is not present, and of deleting and recreating it if it’s dropped for whatever reason; you can otherwise force the master connection to be closed by using ssh $host -O exit. Lovely!
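
In the same spirit, you can ask whether a master connection is currently alive for a given host (the host name here is just an example):

ssh -O check dev.gentoo.org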

There is one more trick that I wish to share, although this time around I’m not sure which OpenSSH version introduced it. From time to time you may need to connect to a box that sits behind a not-too-hostile firewall, which you also have access to. This is the case at a customer of mine, where only the main router has a (dynamic) IPv6 address and I have to go through that to connect to the other boxes. The usual trick in such a situation is to use the ProxyCommand option, using ssh and nc (or in my case nc6) to get a raw connection to the other side. I was always a bit bothered by having to do it this way, to be honest. Once again, recent versions of OpenSSH solve the problem with the -W option:

Host router.mycustomer.example.com
ProxyCommand none

Host *.mycustomer.example.com
ProxyCommand ssh router.mycustomer.example.com -W %h:%p

With this method, it is the SSH daemon on my customer’s router that takes care of connecting to the box further down the net and relaying that connection to the input/output streams, without the need for an nc process to be spawned. While this doesn’t look like a biggie, it’s still one less package I have to keep installed on the system, among other things.

What’s the plan with PAM?

Okay after quite a few days working on pambase, and a few discussions, I think I have a decent plan going forward to implement the new pambase. You can find the current status of the code in the repository but of course it is not really enough to get to test it; I’m still undecided whether I should add an overlay to the same repository, or get a PAM-only overlay where I can dump all the ebuilds for review.

So here is the rundown. First of all, as I said many times by now, I’ve replaced the C preprocessor with the M4 tool instead; this provides me with a much more complete macro language, which means I can easily add further support, and mess around to disable/enable options as needed. Thanks to this, we now have a number of other authentication schemes, such as LDAP and PKCS#11 (via pam_pkcs11; I’m also going to add pam_p11, which is definitely simpler to set up even though it has fewer features).

But the main difference is in the services that it provides out of the box; there are now a number of new service files that will be used directly or indirectly by all the packages using PAM in the tree. While I didn’t like increasing the number of service files, there are slight differences in the behaviour of each that make it necessary to split them up. For instance, a remote login by default will not force root to log in on a secure tty (although you might want to do that!), and as I said before you cannot easily use the same auth chain for both a login and a password-changer.

Another issue that I’m having trouble wrapping my head around is that you really cannot use the same authentication schemes for both interactive and automatic services; so for instance, if you’re logging into a remote mail server, you cannot provide an access token to do the login (since the token would have to be connected locally to the server). From one side, the same goes for ssh actually… I guess the only reason why I don’t feel as compelled to tackle it there is that I don’t use PAM for authenticating on SSH, and neither should you in most cases.

What is now available in the repository is mostly working fine, although there remains the problem of interfacing it properly: I’m still unsure of the way the various modules stack up. For what it’s worth, one of the most obnoxious things I can think of is properly supporting pam_loginuid: it should be used by both interactive and non-interactive services to set the audit login uid, so that the user who’s acting gets logged (even if there is privilege escalation you can track down who exploited it), but it should thus not be used by either sudo or su, nor by things like PostgreSQL or Drizzle.
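
To make it concrete, pam_loginuid is a session-only module, so in a hypothetical system-services style file it boils down to a single line in the session stack; the surrounding includes here are placeholders, not the actual pambase content:

# hypothetical sketch, not the actual system-services file

account  include   system-auth
session  required  pam_loginuid.so
session  include   system-auth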

Right now what we’re going to have for certain are these services:

  • system-local-login (name kept for compatibility) will be used basically only by login — I’m actually tempted to provide /etc/pam.d/login directly as part of pambase, given that it is, yes, provided by shadow right now, but it’s also used by Busybox;
  • system-graphical-login will be used by all the graphical login managers: GDM, XDM, KDM, Slim, Qingy… the idea behind this one is that it is, by default, quiet; unlike what I originally planned, and what GDM now does, this one will not avoid running pam_mail, pam_lastlog and so on so forth; they will be, instead, put into their silent modes: the data will be updated, the variables will be set, but no output will come from them; unfortunately things like Qingy will cause double-setting of the same variables; that’s a problem for another time;
  • system-remote-login is the one service used by sshd (and theoretically by rsh/rlogin, if they made sense to be used);
  • system-services is the one used by the various cron daemons, atd, start-stop-daemon and so on: they have no authentication done, but they do have account validation and, most importantly, session support; as I said above, these should set the login uid to ensure that the audit framework knows who to blame for problems in here;
  • system-password is the service used by passwd and the other password changing tools; it only deals with the Unix stack and with (eventually) Gnome-Keyring;
  • system-auth-backends and system-login-backends are the two that give me the most trouble; they have the actual calls to the backends used for authenticating. They are separate so that they can be treated as optional, so that only one of them needs to succeed for the user to authenticate to the system, by pulling them in with the substack option from the previous services (see the sketch after this list); besides substack, the only other solution would have been the skip method I’ve been using up to now, which I haven’t entirely stopped considering, to be honest.
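
For reference, the substack mechanics look like this from the calling service’s side; this is purely an illustration, not the actual content of any of the files above:

# hypothetical excerpt of a login service pulling in the backends

auth  substack  system-auth-backends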

Also, among other things, I’ve removed the nullok option from the authentication and session support for pam_unix; this basically means that you can no longer have valid accounts without a password set. The idea is that this will not cause trouble with autologin features, but I’ll cross that bridge in due time. For now it should be enough to have an idea that the code is out there and how it should more or less be used.

Comments are, as usual, quite welcome.

Hardware signatures

If you read Planet Debian as well as this blog, you have probably noticed the number of Debian developers that changed their keys recently, after the shadows cast over the SHA-1 hash algorithm. It is debatable whether this is an issue right now or not, but that’s not what I want to discuss.

There are quite a few reasons why Debian developers are more interested in this than Gentoo developers; while we also sign manifests, there are quite a few things that don’t work that well in our security infrastructure, which we should probably pay more attention to (but I don’t want to digress now), so I don’t blame them for considering tighter security.

I’m also considering the switch; while I have had my key for quite a while, there are a few issues with it: it’s not signed by any Gentoo developer (I actually don’t think I have met anybody in person to be able to exchange documents and such), the Manifest signing key is not a subkey of my actual primary key (which, by the way, contains lots of data from my previous “personas” that don’t matter any longer), and so on. Revoking it all and starting anew might be a good solution.

But, before proceeding, I finally want to get this over with and move to hardware cryptography if possible; I already expressed interest before, but I never dug deep enough to find the important information, and now I’m looking for exactly that. I want a solution that works in the broadest range of cases:

  • I want it to work without SHA-1; I guess this is already starting to be difficult; while it’s not clear whether SHA-1 is weak enough to be a vulnerability or not, being able to ignore the issue by using a different algorithm is certainly a desirable feature;
  • I want it to work with GnuPG and OpenSSH at least; if there is a way to get it to work with S/MIME it might also be a good idea;
  • I want it to work on both Linux and Mac OS X: I have two computers in my office: Yamato running Gentoo and Merrimac running OSX; I have to use both, and can’t do without either; I don’t care if I don’t have GnuPG working on OSX, I still need it to work with OpenSSH, since I would like to use it for remote access to my boxes;
  • as an extension to the previous point, I guess it has to be USB; not only can I switch it between the two systems (hopefully!), but I’m also going to get a USB switch to use a single console between the two;

I guess the obvious solution would be a tabletop smartcard reader with one or more cards (and I could get my ID card to be a smartcard), but there is one extra point: one day I’m going to have a laptop again, what then? I was thinking about all-in-one tokens, but I have even less knowledge about those than I have about smartcards.

Can anybody suggest a solution? I understand that the FSFE card only supports 1024-bit keys, which seem to be considered weak lately; no idea how much of that is true, though, to be honest.

So, suggestions, very welcome!

The mess I found with PAM

So I started working, as I anticipated before, toward centralising a bit of the PAM configuration for Gentoo in a pambase package. It’s still a bit in flux, but the base should already be there.

The 20080219.1 ebuild now installs system-local-login and system-remote-login. ConsoleKit and similar should go into system-local-login, while system-remote-login is reserved for ssh and similar software.
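
As an example of how a package would hook into these, the pam.d file for a local, console-style login service would simply include the new service; this is an illustrative sketch, not the file any particular ebuild installs:

# illustrative /etc/pam.d/<service> for a local login

auth      include  system-local-login
account   include  system-local-login
password  include  system-local-login
session   include  system-local-login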

I opened a tracker bug for the new pambase system, I’ve started the cleanup process for pam_passwdqc (which will soon be available as an alternative to cracklib in the pambase ebuild, so that it also allows a FreeBSD-like default – FreeBSD uses pam_passwdqc by default), and I started converting ebuilds.
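
For those unfamiliar with it, pam_passwdqc slots into the password stack in place of pam_cracklib; the options below are just the module’s documented defaults, not necessarily what the pambase ebuild will end up using:

# sketch of a password stack using pam_passwdqc instead of pam_cracklib

password  required  pam_passwdqc.so min=disabled,24,11,8,7
password  required  pam_unix.so use_authtok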

The new setup is quite flexible, I think, but it’s not really complete yet. Unfortunately I’m dedicating just part of my daily time on this, as I’m also working on C# (and swearing against Microsoft for their broken documentation about clipboard usage!). I’m still uncertain on a couple of points, but it should come out nicely in the end.

I’m also wondering if I should move the ftpd support from ftpbase into pambase; it makes sense to me, as that way it would be handled properly by all current and future PAM implementations supported by Gentoo. ftpbase is currently maintainer-needed, so it makes even more sense to bring at least part of it under a team, even one as understaffed as the PAM team is.

I’ll also have to talk with the net-mail team about mailbase for similar reasons.

Unfortunately, doing this is also showing off big problems with how PAM was handled up to now. There are packages with a pam USE flag that do not install any pam.d configuration file; there are packages that install upstream-provided files that are not valid in Gentoo; and there are packages still providing configuration files referring to pam_stack (in comments). I’m starting to wonder if I should ask the Portage (and QA) team to add an extra QA check for PAM configuration files after build, so that I can more easily find packages breaking the rules.

I admit it’s not easy on the maintainers’ part as I haven’t finished writing the PAM documentation. I hope to be free from work in the next weeks, but I start to wonder if I’ll ever be. And I also hate working on PAM as always, but somebody has to do that, no?

Anyway, two packages are plainly broken and not looking like they’ll be fixed: net-misc/ssh (which should be punted from the tree ASAP, thanks Gustavo!) and net-misc/lsh (another SSH implementation; it has completely broken PAM – it checks against other, which is set to deny any user – and no maintainer; if you want to save it, step up, and I might actually fix the PAM part).