One of the thing I was the most interested to hear about, at Enigma 2016, was news about EFF’s Panopticlick. For context, here is the talk from Bill Budington:
I wrote before about the tool, but they have recently reworked and rebranded it to use it as a platform for promoting their Privacy Badger, which I don’t particularly care for. For my intents, they luckily still provide the detailed information, and this time around they make it more prominent that they rely on the fingerprintjs2 library for this information. Which means I could actually try and extend it.
I tried to bring up one of my concerns at the post-talk Q&A at the conference (the Q&A were not recorded), so I thought it wold be nice to publish my few comments about the tool as it is right now.
The first comment is this: both Panopticlick and Privacy Badger do not consider the idea of server-side tracking. I have said that before, and I will repeat it now: there are plenty of ways to identify a particular user, even across sites, just by tracking behaviour that are seen passively on the server side. Bill Budington’s answer to this at the conference was that Privacy Badger’s answer is allowing cookies only if if there is a policy in place from the site, and count on this policy being binding for the site.
But this does not mean much — Privacy Badger may stop the server from setting a cookie, but there are plenty of behaviours that can be observed without the help of the browser, or even more interestingly, with the help of Privacy Badger, uBlock, and similar other “privacy conscious” extensions.
Indeed, not allowing cookies is, already, a piece of trackable information. And that’s where the problem with self-selection, which I already hinted at before, comes to: when I ran Panopticlick on my laptop earlier it told me that one out of 1.42 browsers have cookies enabled. While I don’t have any access to facts and statistics about that, I do not think it’s a realistic number to say that about 30% of browsers have cookies disabled.
If you connect this to the commentaries on NSA’s Rob Joyce said at the closing talk, which unfortunately I was not present for, you could say that the fact that Privacy Badger is installed, and fetches a given path from a server trying to set a cookie, is a good way to figure out information on a person, too.
The other problem is more interesting. In the talk, Budington introduces briefly the concept of Shannon Entropy, although not by that name, and gives an example on different amount of entropy provided by knowing someone’s zodiac sign versus knowing their birthday. He also points out that these two information are not independent so you cannot sum their entropy together, which is indeed correct. But there are two problems with that.
The first, is that the Panopticlick interface does seem to think that all the information it gathers is at least partially independent and indeed shows a number of entropy bits higher than the single highest entry they have. But it is definitely not the case that all entries are independent. Even leaving aside browser specific things such as the type of images requested and so on, for many languages (though not English) there is a timezone correlation: the vast majority of Italian users would be reporting the same timezone, either +1 or +2 depending on the time of the year; sure there are expats and geeks, but they are definitely not as common.
The second problem is that there is a more interesting approach to take, when you are submitted key/value pair of information that should not be independent, in independent ways. Going back to the example of date of birth and zodiac sign, the calculation of entropy in this example is done starting from facts, particularly those in which people cannot lie — I’m sure that for any one database of registered users, January 1st is skewed as having many more than than 1/365th of the users.
But what happens if the information is gathered separately? If you ask an user both their zodiac sign and their date of birth separately, they may lie. And when (not if) they do, you may have a more interesting piece of information. Because if you have a network of separate social sites/databases, in which only one user ever selects being born on February 18th but being a Scorpio, you have a very strong signal that it might be the same user across them.
This is the same situation I described some time ago of people changing their
User-Agent string to try to hide, but then creating unique (or nearly unique) signatures of their passage.
For a more proactive approach to improve users’ privacy, we should ask for more browser vendors to do what Mozilla did six years ago and sanitize what their User-Agent content should be. Currently, Android mobile browsers would report both the device type and build number, which makes them much easier to track, even though the suggestion has been, up to now, to use mobile browsers because they look more like each other.
And we should start wondering how much a given browser extension adds or subtract from the uniqueness of a session. Because I think most of them are currently adding to the entropy, even those that are designed to “improve privacy.”