EFF’s Panopticlick at Enigma 2016

One of the things I was most interested to hear about at Enigma 2016 was news about EFF’s Panopticlick. For context, here is the talk from Bill Budington:

I wrote about the tool before, but it has recently been reworked and rebranded into a platform for promoting their Privacy Badger extension, which I don’t particularly care for. Luckily for my purposes, they still provide the detailed information, and this time around they make it more prominent that they rely on the fingerprintjs2 library to gather it, which means I could actually try and extend it.

I tried to bring up one of my concerns at the post-talk Q&A at the conference (the Q&A sessions were not recorded), so I thought it would be nice to publish my few comments about the tool as it is right now.

The first comment is this: neither Panopticlick nor Privacy Badger considers the idea of server-side tracking. I have said it before, and I will repeat it now: there are plenty of ways to identify a particular user, even across sites, just by tracking behaviours that can be observed passively on the server side. Bill Budington’s answer to this at the conference was that Privacy Badger allows cookies only if the site has a policy in place, counting on that policy being binding for the site.

But this does not mean much — Privacy Badger may stop the server from setting a cookie, but there are plenty of behaviours that can be observed without the browser’s help, or, even more interestingly, with the help of Privacy Badger, uBlock, and other similar “privacy-conscious” extensions.

Indeed, not allowing cookies is, already, a piece of trackable information. And that’s where the problem with self-selection, which I already hinted at before, comes in: when I ran Panopticlick on my laptop earlier, it told me that one out of 1.42 browsers has cookies enabled. While I don’t have access to hard statistics about this, it does not sound realistic to me that about 30% of browsers have cookies disabled.

If you connect this to the comments NSA’s Rob Joyce made at the closing talk, which unfortunately I was not present for, you could say that the fact that Privacy Badger is installed, and fetches a given path from a server trying to set a cookie, is a good way to figure out information about a person, too.

The other problem is more interesting. In the talk, Budington briefly introduces the concept of Shannon entropy, although not by that name, and gives an example of the different amounts of entropy provided by knowing someone’s zodiac sign versus knowing their birthday. He also points out that these two pieces of information are not independent, so you cannot simply sum their entropy, which is indeed correct. But there are two problems with that.

The first is that the Panopticlick interface does seem to treat all the information it gathers as at least partially independent, and indeed shows a number of entropy bits higher than its single highest entry. But it is definitely not the case that all entries are independent. Even leaving aside browser-specific things such as the type of images requested and so on, for many languages (though not English) there is a timezone correlation: the vast majority of Italian users will report the same timezone, either +1 or +2 depending on the time of year; sure, there are expats and geeks, but they are definitely not as common.
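To make the numbers concrete, here is a minimal sketch in Python (with uniform distributions assumed purely for illustration) of the surprisal calculation, showing why the zodiac sign adds nothing once the birthday is known, and how little the cookie figure above is actually worth:

from math import log2

def surprisal(probability):
    # Bits of identifying information carried by observing a value
    # that occurs with the given probability.
    return -log2(probability)

# Assuming uniform distributions for the sake of the example.
zodiac_bits   = surprisal(1 / 12)    # ~3.58 bits
birthday_bits = surprisal(1 / 365)   # ~8.51 bits

# The zodiac sign is fully determined by the birthday, so the joint
# observation still identifies one person in 365, not one in 12 * 365.
joint_bits = surprisal(1 / 365)      # ~8.51 bits, not 3.58 + 8.51

# For comparison, "one in 1.42 browsers has cookies enabled" is about half a bit.
cookie_bits = surprisal(1 / 1.42)    # ~0.51 bits

print(zodiac_bits, birthday_bits, joint_bits, cookie_bits)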

The second problem is that there is a more interesting approach to take when you are handed key/value pairs of information that should not be independent, but are submitted independently. Going back to the example of date of birth and zodiac sign, the entropy calculation above starts from facts, in particular ones people are not supposed to lie about — though I’m sure that in any database of registered users, January 1st is skewed towards having many more than 1/365th of the users.

But what happens if the information is gathered separately? If you ask a user for both their zodiac sign and their date of birth, separately, they may lie. And when (not if) they do, you may end up with a more interesting piece of information. Because if you have a network of separate social sites/databases in which only one user ever claims to be born on February 18th yet a Scorpio, you have a very strong signal that it is the same user across them, as sketched below.
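Here is a minimal sketch of that kind of matching, using made-up records: any (birthday, zodiac) pair that appears in only one account per database becomes a linking key, and an internally inconsistent pair is an especially strong one.

from collections import Counter

# Hypothetical exports from two unrelated sites: (username, birthday, claimed zodiac sign).
site_a = [("alice77", "02-18", "Aquarius"), ("bob", "02-18", "Scorpio"), ("carol", "07-01", "Cancer")]
site_b = [("fuzzy_cat", "02-18", "Scorpio"), ("dave", "03-10", "Pisces")]

def linking_keys(records):
    # Count how many accounts share each (birthday, zodiac) pair;
    # pairs used by exactly one account are usable as linking keys.
    counts = Counter((bday, sign) for _, bday, sign in records)
    return {(bday, sign): user for user, bday, sign in records if counts[(bday, sign)] == 1}

keys_a, keys_b = linking_keys(site_a), linking_keys(site_b)
for pair in keys_a.keys() & keys_b.keys():
    # A pair like ("02-18", "Scorpio") is inconsistent (February 18th is not Scorpio),
    # so two accounts sharing it are very likely the same person.
    print(pair, "->", keys_a[pair], "and", keys_b[pair], "look like the same user")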

This is the same situation I described some time ago of people changing their User-Agent string to try to hide, but then creating unique (or nearly unique) signatures of their passage.

Also, while Panopticlick will tell you whether the browser is doing anything to avoid fingerprinting (how?), it still does not seem to tell you whether any of your extensions are making you more unique. And since it’s hard to tell whether a given bit of JavaScript is trying to load a higher-definition picture, or to hide pieces of the UI for your small screen, rather than reporting your browser setup to the server, it is not as if disabling cookies stops any of that…

For a more proactive approach to improving users’ privacy, we should ask more browser vendors to do what Mozilla did six years ago and sanitize their User-Agent content. Currently, Android mobile browsers report both the device type and the build number, which makes them much easier to track, even though the suggestion so far has been to use mobile browsers because they look more like each other.

And we should start wondering how much a given browser extension adds to or subtracts from the uniqueness of a session. Because I think most of them currently add to the entropy, even those that are designed to “improve privacy.”

Siphoning data on public and private WiFi

So you may remember I have reviewed some cyber-thrillers in the past, and some of them have been pretty bad. After that I actually thought I could write one myself; after all, it couldn’t be as bad as Counting from Zero. Unfortunately, the harsh reality is that I don’t know enough diverse people out there to build up new, interesting, but most importantly realistic, characters. So I shelved the project completely.

But at the same time, I spent a lot of time thinking of interesting things that might happen in a cyber-thriller that fits more into my world view — while Doctorow takes on surveillance and Russinovich battles terrorists armed with Windows viruses, I would have had my characters deal with the more mundane variety of cyber criminals.

One of the things I thought about is a variant on an old technique called wardriving. While the technique itself is not new, I think there are a few interesting twists to it, and it would make a little too interesting a tool for low-lifes with a little (not a lot) of computer knowledge.

First of all, when wardriving started as what became a fad, the wireless networks out there were mostly unencrypted and for the most part underutilized. Things have changed: thanks to WPA, a simple pass-by scan of a network no longer gives you as much data, and changes in the way wireless protocols are implemented had, for a while, made such efforts hard enough not to be worth it.

But things have kept changing, so what is the current situation? I had been thinking about how many things you could do with persistent wardriving, but it wasn’t until I got bored out of my mind in an airport lounge that I was able to prove my point. On my own laptop, in a totally passive mode, invisible to any client on the network, a simple tcpdump or Wireshark capture showed a good chunk of information.

For the most part it was not anything highly confidential — namely, I was not able to see anything being sent by the other clients on the network, but I was able to see most of the replies coming from the servers; just monitor DNS and clear-text HTTP and you can find out a lot about who’s around you.
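To give an idea of how little it takes, here is a minimal sketch of the kind of passive DNS monitoring I mean, using Python and scapy (it assumes you can capture on the right interface with enough privileges; it is an illustration, not a tool I am publishing):

from scapy.all import sniff, DNS, IP

def show_dns_reply(pkt):
    # Only look at DNS responses (qr == 1) that carry at least one answer.
    if pkt.haslayer(DNS) and pkt.haslayer(IP) and pkt[DNS].qr == 1 and pkt[DNS].ancount > 0:
        name = pkt[DNS].an.rrname.decode(errors="replace").rstrip(".")
        print(f"{pkt[IP].dst} got a reply about {name}")

# Purely passive: the BPF filter keeps only DNS traffic, nothing is transmitted.
sniff(filter="udp port 53", prn=show_dns_reply, store=False)

Even the hostnames alone are enough to support guesses like the ones that follow.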

For instance, I could tell that there was another person in the lounge waiting for the same flight as me — they were checking the RTÉ website, and I doubt anyone who isn’t Irish or connected with Ireland would spend time there. Oh, and the guy sitting in front of me was definitely Japanese, because once he sat down I could see the replies coming back from yahoo.co.jp and a few more websites based in Japan.

Let me be clear: I was not doing this with the intention of doxxing somebody. I originally started tcpdump because one of my own servers was refusing me access — the lounge’s IP range is in multiple DNSBLs, so I was expecting the traffic on the network to be mostly viruses trying to replicate. What I found instead was that the access point was broadcasting to all connected clients the replies coming in for everyone else. This is not entirely common: usually you would need to put your wireless card in promiscuous mode, and many cards nowadays don’t even let you do that.

But if this is the small fry I can figure out by looking at a tcpdump trace for a few minutes, you can imagine what you could find by sniffing a network for a few hours. Spending a few hours tracing a network in the coffee shop at the corner could look suspicious, though. How can you make it less obvious? Well, here’s an interesting game, although I have only ever played it in the drafts of my own stories.

There are plenty of mobile WiFi devices out there — they take a SIM card and provide a WiFi network for you to connect your devices to. I have one from Vodafone (although I use it with a bunch of different operators depending on where I’m traveling), and it is very handy; while it runs Linux, I have not even looked into rooting it. They are pretty common to find second hand on eBay, because they sometimes come essentially free with a contract, and people replace them fairly often as new features come out. Quite a few can run OpenWRT.

These devices come with a decent battery (mine easily lasts a whole day of use), and if you buy them second hand they are fairly untraceable (does anybody ever record the IMEI or serial number of the devices they sell?), and they are ready to connect to mobile networks (although that part is trickier: the SIM is easier to trace). Mine actually comes with a microSDHC slot, which means you can easily fit a very expensive 128GB microSD card if you want.

Of course this relies a lot on luck, and on the kind of very broad fishing net that makes it unfeasible for your average asshole to use, but not much is needed: a single service that shows a plaintext password on a web page, matched to a username, is enough, as most people, with very few exceptions, do not use different passwords across services.

But let’s make it creepier – yes, I’ll insist on writing about what I perceive to be a more important threat model than the NSA – and say that instead of playing this in a random coffee shop at the corner, you are looking into a specific person’s private life, and you’re close enough that you know, or can guess, their WiFi access point’s name and password. Dropping one of these devices within reach of that WiFi is not difficult at all.

The obvious question becomes: what can you find with such a trace? Well, in no particular order, you can tell a person’s routine quite easily by figuring out at which times of the day they are at home (my devices don’t talk to each other that much when I’m not at home), what time they get up for work, and what time they are out of the door. You can tell how often they do their finances (I don’t go to my bank’s site every day, and to the revenue service’s even less often). For some people you can tell when they have a private moment and what their interests are (yes, I admit I went and checked: even assuming you can only see the server responses, you can still tell the title of the content being streamed or downloaded). You can tell if they are planning a vacation, and in many cases where. You can tell if they are going to see a movie soon.
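As a rough illustration of the “routine” part, here is a minimal sketch (assuming you already have a capture file, here hypothetically named trace.pcap, gathered over a few days) that simply counts packets per hour of the day; the quiet hours alone say a lot about when somebody is at home and awake:

from collections import Counter
from datetime import datetime
from scapy.all import rdpcap

packets = rdpcap("trace.pcap")  # hypothetical capture file

activity = Counter()
for pkt in packets:
    # pkt.time is the capture timestamp in seconds since the epoch.
    hour = datetime.fromtimestamp(float(pkt.time)).hour
    activity[hour] += 1

for hour in range(24):
    # A crude activity profile: long stretches of silence usually mean
    # nobody is home, or everybody is asleep.
    print(f"{hour:02d}:00  {'#' * min(activity[hour] // 100, 60)}")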

Creepy enough? Do I need to paint you a picture of that creepy acquaintance you called in last week to help you set up your home theater, and to whom you gave the WiFi password so he could Google your provider’s setup guide?

How do you defend against this? Well, funnily enough, a lot of the things people had been talking about before the “Snowden revelations” help a lot here: HTTPS Everywhere and even Tor. While the latter brings a different set of problems (it may be untraceable, but that does not mean it’s secure!), it does obfuscate the data flowing out of your network. It does not hide the traffic patterns (so you can still tell when people are in or not, when they wake up, and so on), but it does hide where you’re going, so that your private moments stay private. Unfortunately, it is out of reach for most people.

HTTPS is a compromise: you can’t tell exactly what’s going on, but if your target is going to YouPorn, you can still tell from the DNS reply. It does reduce the attack surface considerably, though, and it does not require much technical knowledge on the client side. It’s for reasons like this that service providers should use HTTPS — it does not matter whether the NSA can break the encryption; your creepy guy is not the NSA, and small parts of the creepy guy’s plan are thwarted by it: the logs can show that the target visited the website of a movie theatre chain, but not the replies from the server with the name of the branch or the movie the target was interested in.

What is not helping us right now, with creepy guys who are so easy to come by, is the absolute paranoia of the security and cryptography community. Dark email? Secure text messaging? They are definitely technologies that need to be explored and developed, but they should not be the focus of the public’s threat model. On this, I totally agree with Mickens.

I was (and still am, a bit) scared of writing about this; it makes me feel creepy. It gives a very good impression of how easy it is to abuse a bit of technical knowledge and become a horrible person. And given the track record of technical circles over the past few years, it does scare the hell out of me, pardon the language.

While the rest of the security and technical community keeps focusing on the ghost of the NSA, my fears lie in the ease of everyday scams and information leaks. I was not surprised by what the various secret agencies out there wanted to do; after all, we’ve seen the movies and the TV series. I was surprised by a few of the tools and by their reach, but not by the intentions. But the abuse of power? There’s just as much of it outside the surveillance community; it’s just that the people who know don’t care – they focus on theoretical problems, on the Chief World Systems, because that’s where the fun and satisfaction are – and the people who are at risk believe either that everything is alright or that nothing is; they listen to what the media has to say, and the media never paints useful pictures.

Again on threat models

I’ve read many people over the past few months referencing James Mickens’s article on threat models. Given that I wrote last year about something similar in regard to privacy policies, one would expect me to fall fully in line with said article. They would be disappointed.

While I agree with the general gist of the article, I think it gets a little too simplistic. In particular, it significantly downplays the importance of protecting yourself against two separate classes of attackers: people close to you, and people who may be targeting you even if you don’t know them. At first sight these do seem to fit within Mickens’s categories, but they go a little further than he describes. And by painting the categories as “funny” the way he did, I think he’s undermining the importance of security.

Let’s start with the first threat model that the article points out in the “tl;dr” table:

Ex-girlfriend/boyfriend breaking into your email account and publicly releasing your correspondence with the My Little Pony fan club

Is this a credible threat? Not really, but if you think about it a little more you can easily see how it can morph into a disgruntled ex breaking into your computer/email/cloud account and publicly releasing nude selfies as revenge porn. Now it sounds a little more ominous than being outed as a fan of My Little Pony, doesn’t it? And maybe you’ll call me sexist for pointing this out, but I think it would be hypocritical not to say that women are much more vulnerable to this particular problem.

But it does not strictly have to be an ex; it may be any creepy guy (or gal, if you really want to go there) who somehow gets access to your computer or guesses your “strong” password. It’s easy to blame the victim in these situations, but that’s not the point; there are plenty of people out there ready to betray the trust of their acquaintances — and believe me, people trust other people way too easily, especially when they are looking for a tech-savvy friend-of-a-friend to help them fix their computer. I have been that tech-savvy friend-of-a-friend, and it did not take many of the usual recovery jobs to realize how much trust is involved.

The second “threat model”, which is easily discounted, is described as

Organized criminals breaking into your email account and sending spam using your identity

The problem with a description of the threat like this is that it’s too easy for people to discard it with “so what?” People receive spam all the time; why would it matter whose identity it’s sent under? Once again, there are multiple ways to rephrase this to make it more ominous.

A very simple option is to focus on the monetary problem: organized criminals breaking into your email account looking for your credit card details. There are still plenty of services that will request your credit card numbers by email, and even my credit card company sends me the full 16-digit number of my card on its statements. When you point out to people that the criminals are not just going to bother a random stranger, but are actually going after their money, they may care quite a bit more.

Again, this is not all there is, though. For a security or privacy specialist to ignore targeted attacks such as doxxing, which fuels the harassment campaigns that are all the rage these days, is at the very least irresponsible. And that does not involve only the direct targets of harassment: even the most careful person’s protection is always weak against the people they have around them, because we trust them with information, access, and so on.

Take for instance Facebook’s “living will” for users — if someone wanted to harass a person whose security was too strong, they could go after their immediate family, hoping that one of them would have the right access to close the account down. Luckily, I think Facebook is smarter than this, so it should not be that straightforward, but many people also use family members’ addresses as recovery addresses in case they lose access to their own account.

So with all this in mind, I would like to point out that I both agree and disagree with Mickens’s article. There are way too many cryptographers out there looking into improbable threat models, but at the same time there are privacy experts ignoring what the actual threats are for many more users.

This is why I don’t buy into the cult of personality around Assange, Snowden or Appelbaum. I’m not going to argue that surveillance is a good thing, nor am I going to argue that there are never abuses – I’m sure there are – but the focus over the past two years has been so much more on state actions than on malicious actors like those I described earlier.

I already pointed out how privacy advocates are in love with Tor while ignoring the bad behaviours it enables, and once again I wonder why they are more concerned about the possibility of obscure political abuses of power than about the real, daily abuse of people, most likely a majority of them women.

Anyway, I’m not a thought leader, and my opinions are strictly personal — but I do think that the current focus on protecting the public from possibly systemic abuse by impersonal organisations such as the NSA is overshadowing the importance of protecting people from those they are most vulnerable to: the people around them.

And let’s be clear: there are plenty of things that the crypto community can and should do to protect people in these situations. HTTPS, for instance, is extremely important, as it does not take a huge effort for a disgruntled ex to figure out how to snoop cleartext traffic for the odd password or piece of information that could lead to a break-in.

Just think twice next time you decide to rally people against the generic phantom of a surveillance society, or even to support the EFF — I used to, and I don’t currently; while I agree they have done good things for people, I find they are focusing on the wrong threats.

My Personal Privacy Policy

Be warned, this post may well offend you — it’s actually the same topic, and mostly the same post, that I was trying to write months ago: the last of a series of drafts that Typo made me lose, and over which I was quite pissed off at it.

A premise: considering my current employer, you could expect me to be biased. People who have known me for a while should know that this has always been my point of view, and that a payslip is not enough to buy my ideals. A second premise: what I’m writing here is my personal opinion and has nothing to do with my employer.

Before getting into the details of my personal view on privacy, I’ll have to at least categorize who I am. I’m most definitely not a public figure, but I’m also not a complete nobody. I’m not sure if I’m notable; I’m not an activist the way Jürgen is, but being a Gentoo developer puts me in a more visible spot than your average person. Even so, I’m not an A-list or even a B-list blogger… maybe D-list, for Diego, would be okay. This is obvious when you consider that my blog has unmoderated, unlimited, captcha-free comments, and yet I receive only a handful of them per post.

It is not something I care to think about too much, but I noticed, when I started working here in Dublin, that there were people who already knew me even though I did not know them, except perhaps as a name passing through my blog’s comments. It does not mean much, of course, as my contribution to the world is still negligible. But it does mean that what I write on my blog and on my (public) Twitter, Facebook and Google+ profiles is seriously public. My blog, my mailing list posts, even my IRC history are things that not only employers can look into, but also things that an enemy, if there are still some out there who haven’t grown bored of making my life miserable, would be able to leverage.

So with this premise, what is my idea of privacy? Well, as you probably remember, I have no problem with relatively big corporations knowing what I buy, and given how I use both FourSquare and Ingress, I have no problem with them knowing where I am in most cases. I also have no problem with most of my friends knowing where I am. Sure, it takes away the option of lying to people when I don’t want to go out with them — but I count that as a positive, as my friends can count on the fact that I’m not doing that. If I were tempted to, I would probably just not count them as friends, and thus would have no problem telling them that I don’t want to see them.

Is there anything I don’t want to broadcast? Sure, plenty, and I don’t broadcast it by default. My opinion of people, for instance, is not something I tend to talk about; well, it depends on the people, of course. There are habits of my own that I’d rather not talk about, and embarrassing personal problems too, but these do not include, for instance, my diabetes or my pancreatic problems, even though, as medical records, they are among the most protected data about me to be found out there.

Let me try to give a practical example of what my privacy concerns actually are. It’s no mystery that I’m no good with relationships – surprise, surprise, for a geek – and I’m pretty sure I have admitted before to being a virgin at 28 years of age (and counting). If I were to meet a gal with whom there could be a reciprocal attraction (unlikelier by the day), that would be something I wouldn’t want everyone on Earth to know about right away, if nothing else because I would probably not believe the situation myself.

But more importantly, both the details and the general gist would have different circles of people getting to know them at different times. My mother would most definitely be the last to know — I originally wrote “my family” (which is basically me, my mother, my sister and her husband), then I realized that something I similarly wanted to keep from them happened recently, when I almost got mugged. My sister only heard about that episode the week after it happened, when I had to go to the dentist and get a tooth extracted — the punch caused an abscess that was quite painful and dangerous. I was broadcasting the event to the public while keeping it from my family because I did not want to worry them until the whole thing was over. My mother still does not know it happened; it helps that neither of them speaks or reads English.

So, going back to the example above, my colleagues would almost certainly find out first, as I’m a person of routine and anything that breaks said routine is going to be pretty visible. I could make up an excuse, but why? So it’s just going to be noticed. But unless I broadcast it, my sister and mother will not get to know until I tell them. Sure, FourSquare could possibly deduce a change in behaviour, or notice that I’m checking in with a different set of friends; a government agency tracking my phone and hers could possibly find that I’m taking long walks with a new person (and that could easily be mixed up with my phone often taking long walks with other people as I play Ingress), but why would they care? It’s not illegal here.

And here’s the first tenet of my personal privacy policy: the fact that I can afford not to hide from governments is a privilege, and so is my ability to broadcast my position and my habits. I live, and have lived, in countries that are relatively civil; I’m not, say, a gay person in Russia, and, sorry to say this so bluntly, I’m not female, which makes showing people that I’m somewhere alone not that much of a concern. This is the same concept of a threat model that applies to computer security and other security areas; in my threat model, what I’m concerned about is not state actors or corporations, but rather criminals and personal enemies.

Back to the example: while actually going out with somebody would break my routine enough to be noticeable, becoming sexually active I’d expect would not – just a guess, given that I’m not able to tell at this point – and that changes a few more things. Given that it would be something private between me and this hypothetical significant other, I wouldn’t be talking about it in the open, which means even my colleagues would not know about it. Somebody would probably know basically right away, though: my doctor for sure, and possibly my pharmacist (yes, I do have a local pharmacy, the one where I go to buy my insulin and the other prescription drugs I have to take). The former would know when I asked him for a new set of blood tests to be safe, the latter when I asked for condoms for the first time. Alternatively, Tesco would know when I ordered them from the website, and the delivery guy would know as well when he came to deliver. I’m pretty sure that between the two options I’d go with the pharmacy, as I’ve already given up on being embarrassed when talking with them.

To close, I would like to note that even though I live in what is mostly a glass house, I don’t expect everybody else to do the same. I’m just writing this to say that I don’t think there are many threat models that apply to me for which I should start wearing a tinfoil hat in light of the “NSA revelations” that last year brought us. Maybe for some of you there are, but I doubt that everyone who has been fretting about Tor attacks and the like has good reason to do so.

I’m sure that there are people out there, under oppressive governments, who entrust their lives to Tor and similar tools, so identifying and fixing its vulnerabilities is something I can’t disagree with. On the other hand, as I said before, most of the self-described privacy advocates out there tend not to consider that this also helps people like the SilkRoad users. And while I’m definitely okay with the legalization of marijuana, I’m of that opinion because it would avoid the existence of things like SilkRoad.

On the other hand, the NSA revelations do concern me, not because I’m scared of the NSA, but because if they can do it now, others will be able to do it in the future, and if those others are criminals, then I’ll be scared of them. So please, let’s all try to make things better: encrypt everything, research and find ways around browser fingerprinting, and help the EFF (I’m a donor too). Just keep in mind what your threat models are, rather than blindly following the blogosphere’s hysteria.

Browser fingerprinting

I posted some notes about browser fingerprinting back in March, noting how easy it is to identify a given user across requests just from the few passive scans that are possible without even having Flash enabled. Indeed, EFF’s Panopticlick considers my browser unique even with Flash disabled.

But even though Panopticlick only counts uniqueness among the people who actually ran it, which means it covers just a fraction of all the possible users out there, it is also not exercising the full force of fingerprinting. In particular, it does not try to detect installed Chrome extensions, which is actually trivial to do in JavaScript for some of them. In my case, for instance, the presence of the Readability extension is easy to identify because it injects an “indicator” as an iframe with a fixed ID. Similarly, it’s relatively easy to identify adblock users, as you have probably noticed on the many sites that beg you to disable the adblocker so that they can make some money from the ads.
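As an illustration of how simple the probe is, here is a minimal sketch using Python and Selenium to run, inside a browser session, the same one-line check a web page could embed directly; it assumes chromedriver is available and that the profile in use actually has the extension loaded, and the element ID is hypothetical, as I don’t recall the exact ID Readability injects.

from selenium import webdriver

# Assumes chromedriver is installed and the Chrome profile being driven
# has the extension loaded; any page will do as a host for the probe.
driver = webdriver.Chrome()
driver.get("https://example.com/")

# The actual fingerprinting probe is this one line of JavaScript: a page can
# simply check whether the extension's injected element exists in the DOM.
# 'readability-indicator' is a made-up ID, for illustration only.
detected = driver.execute_script(
    "return document.getElementById('readability-indicator') !== null"
)
print("indicator iframe present:", detected)
driver.quit()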

Given how paranoid some of my readers are, I’m looking forward to somebody adding Chrome and Firefox extension identification to Panopticlick; it will definitely be interesting going forward.

User-Agent strings and entropy

It was 2008 when I first got the idea to filter User-Agents as an antispam measure. It worked on its own for quite a while, but recently my ruleset has had to rely on more sophisticated fingerprinting to catch spammers. It still works better than a captcha, but it has gotten a bit worse.

One of the reasons why the User-Agent by itself is no longer enough is that my filtering has been hindered by a more important project. EFF’s Panopticlick has shown that the uniqueness of User-Agent strings is actually an easy way to track a specific user across requests. This became important enough that Mozilla standardized their User-Agents starting with Firefox 4, to reduce their size and thus their entropy. Among other things, the “trail” component has been fixed to 20100101 on the desktop, and to the same version as Firefox itself for the mobile version.

_Unfortunately, Mozilla’s page lies. Not only is the trail not fixed for Firefox Aurora (i.e. the alpha version), which meant that my first set of rules was refusing access to all users of that version, but their own Lightning extension for SeaMonkey also appends to the User-Agent, even though they said that wasn’t supported anymore._

A number of spambots seem to get this wrong, by the way. My guess is that they have some code that generates the User-Agent by combining a bunch of fragments and randomizing the result, so you can’t just kick out a particular agent. Damn smart, if you ask me, and unfortunate, as ModSecurity keys its IP collection on the remote address and the user-agent, so if they cycle through different user agents, it’s harder for ModSecurity to understand that it’s actually the same IP address.
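Here is a minimal sketch of why that keying matters (plain Python, not ModSecurity configuration, and the addresses and threshold are made up): if the per-client counter is keyed on the (address, user-agent) pair, an address that rotates its agent string spreads its requests across many counters and never trips the threshold that a per-address counter would catch.

from collections import Counter

# Hypothetical request log: (remote address, User-Agent string), with the bot
# cycling through five agent variants.
requests = [("203.0.113.7", f"Mozilla/5.0 (variant {i % 5})") for i in range(50)]

by_ip_and_ua = Counter((ip, ua) for ip, ua in requests)
by_ip = Counter(ip for ip, _ in requests)

THRESHOLD = 20  # e.g. "more than 20 posts from the same client"
print("flagged by (ip, ua):", [k for k, n in by_ip_and_ua.items() if n > THRESHOLD])  # nothing flagged
print("flagged by ip only:", [k for k, n in by_ip.items() if n > THRESHOLD])          # the bot shows up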

I do have some reservations about Mozilla’s handling of extension identification. First they say that extensions and plugins should not edit the agent string anymore – but Lightning does! – and then they suggest that extensions can instead send an extra header to identify themselves. But that just means that fingerprinting systems only need to start counting those headers as well as the generic ones that Panopticlick already considers.

On the other hand, other browsers don’t seem to have gotten the memo yet — indeed, both Safari’s and Chrome’s strings are long and include a bunch of almost-independent version numbers (AppleWebKit, Chrome, Safari, and Mobile on the iOS versions). It gets worse on Android, as both the standard browser and Chrome provide a full build identifier, which differs not only from one device to the next, but also from one firmware version to the next. Given that each mobile provider has its own builds, I would be very surprised if I could find two of my friends with the same identifier in their browsers. Firefox is a bit better on that front, but it sucks in other ways, so I’m no longer using it as my main browser there.
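To put a number on that diversity, here is a minimal sketch that computes the Shannon entropy of the User-Agent strings seen in a hypothetical access-log dump (one header per line in user_agents.txt, a file name I made up); the same calculation over Firefox-only traffic would give a much lower figure than over Android stock-browser traffic, precisely because of those build identifiers.

from collections import Counter
from math import log2

def shannon_entropy(values):
    # H = -sum(p * log2(p)) over the observed frequency of each distinct value.
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# user_agents.txt is a hypothetical dump with one User-Agent header per line.
with open("user_agents.txt") as log:
    agents = [line.strip() for line in log if line.strip()]

print(f"{shannon_entropy(agents):.2f} bits of entropy across {len(agents)} requests")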

Why I check your user agents

I’m one of the few Free Software activists who actually endorses the use of the User-Agent header, I’m afraid. The reason so few do is that, while in general the header is used to implement various types of policies, it is often used as part of lock-in schemes (sometimes paper-thin lock-ins, by the way), and we all agree that lock-ins are never nice. Whether those lock-ins are something to simply attack, or something to understand and accept, is a different discussion — I sincerely think that Apple has every right to limit access to their trailers to QuickTime, or at least to try, as they are providing the service and it’s a platform for them to show off their software; on the other hand, the BBC and RAI using it to lock in their public service TV is nasty!

So basically we have two reasons to use the User-Agent header: policies and statistics. In the former category I also count the implementation of workarounds of various kinds. Statistics are mostly useful for deciding what to focus on; policies can be used for good or evil: lock-ins are generally evil, but you can also use policies to improve the quality of the service for users.

One of the most common workarounds applied based on the user agent declaration relates to features missing in MSIE; for instance, there is one to handle properly serving XHTML files through the application/xhtml+xml MIME type, which MSIE doesn’t support:

RewriteCond %{REQUEST_URI} ^/[a-z_/]*$
RewriteCond %{HTTP_USER_AGENT} MSIE [OR]
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)
RewriteRule ^/[a-z_/]*$ - [T=text/html]

Yes, this has one further check compared to most copies of the same snippet found on the Internet; the reason is that I have experimentally noticed that Facebook does not handle XHTML properly: if you attach a link to a webpage that has images and is served as XHTML, it won’t fetch the title, nor let you choose an image to use for the link. This was true at least up to last December, and I assume it is still true now, which is why I have that extra line.

In a different situation, feng uses the User-Agent field to identify buggy software and implement specific workarounds (such as ignoring the RTSP/1.0 standard and seeking on subsequent PLAY requests without a PAUSE).

Stepping away from workarounds, policies that can be implemented this way include warning about insecure, unsupported browsers or trojan-infected systems, providing an informational message telling the user what to do to get something better/cleaner (I do that on a few websites to tell users that they are running something very broken — such as Internet Explorer 6). This is policy, and generally a good policy in my opinion. *On a different note, if somebody can suggest a way to use cookies to add a static way to bypass the check, I’d be happy.*

There are many more things you can do with agent-specific policies, including serving lower-quality images to smartphones without implementing mobile-specific website vhosts, but I won’t go into further detail right now.

As for statistics, they usually provide a way for developers and designers to focus on what’s really being used by the targets of their software. Again, some activists dislike this because it shows that it’s not worth considering non-Firefox, non-IE browsers for most websites, and sometimes not even Firefox, but, extreme cases aside, statistics are very important in the real working world.

Some people feel smarter than the average programmer and want to throw off the statistics by claiming that they are using “Commodore 64” or “MS-DOS” as their operating system. They claim to be defending their privacy, camouflaging themselves in the bad, bad Internet. What they are actually doing is trying to hide on a plane by wearing a balaclava, which, as you might guess, is pretty conspicuous. In fact, if you try EFF’s Panopticlick, you can see that a unique, “novelty” User-Agent actually makes you stand out among Internet users. Which means that if you’re trying to hide in a crowd with a balaclava, you’re not smarter than anybody; you’re actually dumber than average.

Oh, and by the way, there is no way that faking being Googlebot will work out well for you; on my webserver, for instance, you’ll get 403 responses for all your requests… unless your reverse DNS properly forward-confirms that you’re coming from the Googlebot server farm…
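For reference, this is the forward-confirmed reverse DNS check I’m referring to, as a minimal Python sketch (the hostname suffixes are the ones Google documents for verifying Googlebot; the sample addresses are just for illustration):

import socket

def is_real_googlebot(ip):
    # Forward-confirmed reverse DNS: the PTR record must point into
    # Google's crawler domains, and resolving that hostname must give
    # back the original IP address.
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

print(is_real_googlebot("66.249.66.1"))   # an address commonly seen for the real Googlebot
print(is_real_googlebot("203.0.113.50"))  # a random address merely claiming to be Googlebot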