Signatures, security, attack vectors

In the past weeks we have witnessed a lot of concern in the free software world about the problems tied to the simplification of attacks on the SHA-1 algorithm. Unfortunately this type of news, while pretty important, is only meaningful to people who actually are experts in cryptography, and can confuse the hell out of people, such as me, who don’t know that much about the topic.

So while there are posts about practical attacks on git through the SHA-1 weakness, which may seem far-fetched to some users, I’d like to try to understand, and to explain, what the real-world implications of using weak hash algorithms are in many situations.

The first problem that comes to my mind is very likely social: we call them “unique identifiers”, but there is nothing in the math, as far as I can see, that makes them unique; and we call them “one-way hashes”, while obviously you can reverse them with a properly sized table. What good hashes are designed for is making sure that the chances of a collision are low enough that it’s infeasible to hit one, and that the tables needed for reversing the hash are huge enough that normal computers can’t handle them.
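To make the “nothing makes them unique” point concrete, here is a minimal sketch that keeps only the first 24 bits of SHA-1 and hashes distinct messages until two of them share the same shortened digest. The 24-bit cut, the message format and the function names are my own assumptions for illustration; the full 160-bit SHA-1 is attacked in far cleverer ways than this brute-force search.

```python
import hashlib
from itertools import count


def truncated_sha1(data: bytes, bits: int = 24) -> int:
    """Return only the top `bits` bits of the SHA-1 digest of `data`."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)


def find_collision(bits: int = 24) -> tuple[bytes, bytes, int]:
    """Hash distinct messages until two of them share a truncated digest."""
    seen: dict[int, bytes] = {}
    for i in count():
        message = f"message-{i}".encode()
        tag = truncated_sha1(message, bits)
        if tag in seen:
            # Two different inputs, one identical (shortened) identifier.
            return seen[tag], message, i + 1
        seen[tag] = message


first, second, tries = find_collision()
print(f"{first!r} and {second!r} collide after {tries} hashes")
```

With n output bits a collision is expected after roughly 2^(n/2) attempts, so a few thousand hashes usually suffice here; what makes a real hash “infeasible” to collide is only the size of that number.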

Of course, the concepts of “infeasible” and “huge” in computers are quite vague: while something may very well be infeasible for my cellphone, it might not be for the iMac, or it might be huge for Merrimac but not for Yamato. What was absolutely impossible for personal computers five years ago might very well be a cakewalk for a PlayStation 3 (think about the December 2008 SSL certificate attack). And this is without considering clusters and supercomputers, which means that we have to take a lateral approach to all this rather than just following the God Hash Function.

Hash functions are not just used for cryptographic work, of course; sometimes a hash is just a redundancy check to ensure that the data arrived properly. For instance, TCP still protects its segments with a plain 16-bit checksum, and Ethernet protects its frames with a CRC32, even though the collision-free space of such checksums is much smaller than that of MD5, which itself we know is no longer a valid security solution. There are cases where all you need is to tell whether some data arrived or was decoded properly; there, using a cryptographic hash is not very important at all, and speed is more of a concern. In those cases, CRC32 still performs pretty neatly.

On a similar note I still can’t understand why FLAC needs to store the MD5 hash of the uncompressed wave data; sure, it’s very desirable to have a checksum to tell whether the decode was done properly, but I’d expect a CRC32 checksum to be quite enough for that. I don’t see why they went with the big guns…

Anyway, this moves on to the next point: having a checksum, a hash, a digest for a file that has to be downloaded is useful to know whether the download completed successfully or whether there was a problem during the transmission. Is MD5 enough there? Well, it really depends. If it’s just to check some data that is not tremendously important, because it’s not going to execute code on the machine, like a photo or a video, then it might well be enough. Sometimes CRC32 is also quite enough; for instance, if you’re an anime fan you have probably noticed quite a few times that episodes carry a CRC32 checksum in the downloaded file name (of course downloading pirated anime is illegal, just remember that next time you do it…).
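The file-name convention can be checked with a handful of lines. This is only a sketch: I am assuming the checksum appears as eight hex digits between square brackets, and the file name and payload below are made up for the example.

```python
import re
import zlib

# Eight hex digits between square brackets, e.g. "[1A2B3C4D]" (assumed convention).
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc32_hex(data: bytes) -> str:
    """CRC32 of `data` as eight upper-case hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def matches_name(filename: str, data: bytes) -> bool:
    """True if the CRC32 embedded in `filename` matches the CRC32 of `data`."""
    found = CRC_TAG.search(filename)
    if found is None:
        raise ValueError("no CRC32 tag in file name")
    return found.group(1).upper() == crc32_hex(data)


payload = b"pretend this is a video stream"
name = f"episode-01 [{crc32_hex(payload)}].mkv"  # hypothetical file name

print(matches_name(name, payload))               # data arrived intact
print(matches_name(name, payload[:-1] + b"!"))   # last byte changed in transit
```

This catches accidental corruption cheaply, which is all it is meant to do: whoever can change the file can just as easily change the name, so it says nothing about tampering.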

But is it the same thing for source code? Why doesn’t Gentoo use MD5 any longer? Why are we using both SHA-256 and RMD-160? Obviously it’s not the same for source code, and while using more resilient hash algorithms (I was going to say “stronger” but that’s not the point of course) is necessary, it is by far not sufficient. With source and executable code, we don’t only want to ensure that the data was received correctly, but also that the data is what we want it to be. This means that we need to certify that the downloaded data corresponds to what was tested and found safe.
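The reason for recording two digests is that a forged file would have to collide under both algorithms at once. A minimal sketch of such a check follows; note that I use SHA-256 and SHA-512 as stand-ins, because RIPEMD-160 is not available in every Python build, and that the function names are hypothetical rather than anything Portage actually uses.

```python
import hashlib
import hmac


def digests(data: bytes, algorithms=("sha256", "sha512")) -> dict[str, str]:
    """Compute one hex digest per algorithm name."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def verify(data: bytes, expected: dict[str, str]) -> bool:
    """Accept `data` only if every recorded digest matches."""
    if not expected:
        return False  # nothing recorded means nothing was verified
    actual = digests(data, tuple(expected))
    return all(
        hmac.compare_digest(actual[name], expected[name]) for name in expected
    )


tarball = b"contents of a source tarball"
recorded = digests(tarball)  # what a manifest would carry

print(verify(tarball, recorded))
print(verify(tarball + b" plus a backdoor", recorded))
```

Even so, this only proves that the file matches the recorded digests; it says nothing about who recorded them, which is where signatures come in.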

For this step we have to introduce a different concept from the mere hash: the signature. We need to sign the data to make sure that it’s not changed, and that if it’s tampered with, the signature no longer matches. GnuPG signatures are meant to do just that, but they also rely on a hash algorithm, which nowadays tends to be SHA-1, unless, like the Debian developers, you start to change it to SHA-256 or whatever else. Does it make such a difference? It depends on what you use the key for, one should say.

There are two main critiques against the use of different hashing algorithms by default in GnuPG key generation. The first comes from the GnuPG maintainer, who said that the (economic) resources needed to counterfeit a signature nowadays are high enough that it would still be cheaper for somebody to just pay a very bad guy to come after you. With a gun. Or worse. The second is that to forge a signature on an email message, you’re going to need to add lots and lots of spurious data, which would be quite a giveaway of the way the message was produced.

Of course these points are both true, but there is one catch for each. The former is true for now but is not going to remain true forever: not only can more weaknesses be found in the algorithm, but the average computing power available to a single individual is still increasing year after year. While 200 PS3 systems don’t come cheap nowadays, they certainly are more feasible, and less risky, to procure than a serial killer. And they are much lower profile.

The latter point is more interesting, because it shows some limits to the ability to forge a duplicate key or counterfeit a signed message. Indeed, whatever the algorithm used, a simple signed text message, once counterfeited, is going to be easily spotted through the presence of data that is bogus or not relevant to the message. While the technical chance that a way is found to make a counterfeited message that only contains words in the correct language, and that thus blends easily with the rest of the message, is not null, it’s also quite far-fetched nowadays, even for CRC I’d say. That should be enough for email messages.

But is it enough for every application of GnuPG keys? I don’t think so. As you might have read in the post I linked early in this entry, about the chances of using the SHA-1 attacks to fool the git content tracker, it is possible to replace source code even if that requires inserting bogus data, because almost nobody will go through all the source files to see if there is something strange in them. Similarly, spoofing signatures for binary files is not as hard to achieve as spoofing signatures for email messages, even more so when you count that bzip2, gzip, and lzma all ignore trailing unknown data in their archives (a feature used even by Gentoo for the binary packages Portage builds). This means that keys used for signing source and binary packages, as in the cases of Debian and Gentoo, are more at risk from the SHA-1 attack than keys used just to sign email messages.
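The trailing-data point is easy to see in practice. The sketch below uses Python’s own zlib and bz2 decoders rather than the command line tools or Portage, so take it only as an illustration that the compressed stream marks its own end; the payload and the appended bytes are invented for the example.

```python
import bz2
import gzip
import hashlib
import zlib

payload = b"int main(void) { return 0; }\n"
trailer = b"arbitrary bytes appended after the compressed stream"

gz_clean = gzip.compress(payload)
gz_padded = gz_clean + trailer

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
gz_reader = zlib.decompressobj(16 + zlib.MAX_WBITS)
gz_out = gz_reader.decompress(gz_padded)

bz_padded = bz2.compress(payload) + trailer
bz_reader = bz2.BZ2Decompressor()
bz_out = bz_reader.decompress(bz_padded)

# The two files on disk differ, and so do their digests...
print(hashlib.sha1(gz_clean).hexdigest())
print(hashlib.sha1(gz_padded).hexdigest())
# ...yet both decoders stop at the end of the stream and return the same payload,
# leaving the extra bytes aside in `unused_data`.
print(gz_out == payload, gz_reader.unused_data == trailer)
print(bz_out == payload, bz_reader.unused_data == trailer)
```

So an attacker gets a region of the file where arbitrary bytes can be varied without changing what is extracted, which is exactly the kind of freedom a collision search needs.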

There is more to this, but since I’m no expert I don’t want to go on at greater length. There is also much more to be said about signatures as a supposed panacea because, as it appeared in my previous post about github, quite a few users are confused about what git tag signatures should mean to Gentoo developers and users. This is the kind of stuff I always wanted to write about and almost never had time for; I guess I’ll try my best to find the time.

Hardware signatures

If you read Planet Debian as well as this blog, you have probably noticed the number of Debian developers who changed their keys recently, after the shadows cast over the SHA-1 hash algorithm. It is debatable whether this is an issue right now or not, but that’s not what I want to discuss.

There are quite a few reasons why Debian developers are more interested in this than Gentoo developers; while we also sign manifests, there are quite a few things that don’t work that well in our security infrastructure, which we should probably pay more attention to (but I don’t want to digress now), so I don’t blame them for considering tighter security.

I’m also considering the switch. While I have had my key for quite a while, there are a few issues with it: it’s not signed by any Gentoo developer (I actually don’t think I have met anybody in person to be able to exchange documents and stuff), the Manifest signing key is not a subkey of my actual primary key (which by the way contains lots of data from my previous “personas” that doesn’t matter any longer), and so on and so forth. Revoking it all and starting anew might be a good solution.

But before proceeding, I want to finally get it over with and move to hardware cryptography if possible. I already expressed interest before, but I never dug deep enough to find the important information; now I’m looking for it. And I want a solution that works in the broadest range of cases:

  • I want it to work without SHA-1; I guess this is where it already starts to get difficult. While it’s not clear whether SHA-1 is weak enough to be a vulnerability or not, being able to ignore the issue by using a different algorithm is certainly a desirable feature;
  • I want it to work with GnuPG and OpenSSH at least; if there is a way to get it to work with S/MIME, that might also be a good idea;
  • I want it to work on both Linux and Mac OS X: I have two computers in my office, Yamato running Gentoo and Merrimac running OSX; I have to use both, and can’t do without either. I don’t care if I don’t have GnuPG working on OSX, but I still need it to work with OpenSSH, since I would like to use it for remote access to my boxes;
  • as an extension to the previous point, I guess it has to be USB; not only can I then switch it between the two systems (hopefully!), I’m also going to get a USB switch to use a single console between the two.

I guess the obvious solution would be a tabletop smartcard reader with one or more cards (and I could get my ID card to be a smartcard), but there is one extra point: one day I’m going to have a laptop again, what then? I was thinking about all-in-one tokens, but I have even less knowledge about those than I have about smartcards.

Can anybody suggest a solution? I understand that the FSFE card only supports 1024-bit keys, a size that lately seems to be considered weak; I have no idea how much of that is true though, to be honest.

So, suggestions are very welcome!

Logins and GnuPG

This post will not touch Gentoo at all in its topic, but these considerations do stem from a Gentoo-related problem. The problem relates to our infrastructure; not that there are problems with Infra, rather it is a problem of us developers, but let me explain.

We usually log in to the various infrastructure boxes, like dev.gentoo.org (webspace and other), the CVS and the GIT servers, through SSH keypairs, without a password. This is basically a generic method to keep boxes secure while still allowing external access to them. But we also have a password set in LDAP that is used for the mail and to set LDAP data. One of the things I find most useful as a non-recruiter developer is being able to look up the IM addresses of the devs who made them available to other devs. I use Jabber a lot, especially since it allows me to avoid IRC.

As we rarely use that password, you can expect that a good deal of us forget it quite easily. I have already asked twice in three years for mine to be reset (in my defense, it wasn’t even set the first time). Now, to get it reset we have to ask someone, like Robin, who has to do the stuff by hand. I wondered how this could be safely automated. We have SSH and PGP keypairs; they could just as well work.

This in turn made me wonder how safe some services’ logins are. I often forget some passwords, so it happens that I ask for a replacement, it arrives in my mailbox, and then I trash the mail for safety. But what if the mail was encrypted with GnuPG? Then I’d need my keypair to decrypt it, and I could trust leaving it on the server. You could also use GnuPG to avoid phishing: make the service’s outgoing mail GPG-signed.

I tried something like that before in PHP, but it wasn’t really simple because you either had to leave the secret key without a passphrase, or you had to hardcode the passphrase inside the source (or configuration) files, which is not a good idea.

Honestly I wonder if there is any software out there that uses GnuPG in a non-interactive way, besides simple scripts. Of the latter I have an example handy. The whole database of this blog, as well as that of the xine Bugzilla, is dumped every night, then compressed and encrypted with my GPG public key; the result is then sent directly to my email address, where I store it. (I actually still have to write another script to fetch one backup every week and write it to a CF card, using tar directly on the device, without any filesystem; that should limit the erasures on the card. After all tar was designed for magnetic tapes, and the limitations are almost the same between the two.)
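The backup case works unattended because encrypting only needs the recipient’s public key, so no passphrase ever has to live on the server; signing is what would require the secret key. A minimal sketch of the encryption step follows. The recipient address and the file names are placeholders of mine, and `--trust-model always` assumes the key was verified by hand when it was imported.

```python
import subprocess


def encrypt_command(recipient: str, source: str, target: str) -> list[str]:
    """Build a gpg invocation that never prompts on the terminal."""
    return [
        "gpg",
        "--batch",                   # refuse to ask interactive questions
        "--yes",                     # overwrite `target` if it already exists
        "--trust-model", "always",   # assumption: key was checked at import time
        "--recipient", recipient,
        "--output", target,
        "--encrypt", source,
    ]


def encrypt_file(recipient: str, source: str, target: str) -> None:
    """Encrypt `source` to `recipient`; raises CalledProcessError on failure."""
    subprocess.run(encrypt_command(recipient, source, target), check=True)


# Placeholder values; a nightly job would call encrypt_file() with real ones.
command = encrypt_command("me@example.org", "backup.sql.bz2", "backup.sql.bz2.gpg")
print(" ".join(command))
```

The same approach would fit the password reset idea: the service encrypts the new password to the key it already has on file, and only the holder of the matching secret key can read it.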

It would be quite nice if we could easily keep all the sensitive information encrypted on the mail server. Unfortunately using GMail through its web interface ruins the whole idea. Luckily, they do offer IMAP and POP3, which make using GnuPG quite a bit friendlier.