More on browser fingerprinting, and Analytics

A few days ago I pointed out how it’s possible to use some Chrome extensions (and likely just as many Firefox ones) to gather extra entropy on top of what Panopticlick already measures. But this is not the only source of identification that Panopticlick overlooks, and that can be used to track users.

I originally intended to write a full proof of concept for it, but since I’m currently in Mountain View, my time is pretty limited, so I’ll limit myself to a description. Panopticlick factors in the Accept header that the browser sends with the page’s request, but there is one thing it does not check, as it’s a bit more complex to do: the Accept header for images. Different browsers support different image formats, as I’ve found before, and even browsers that do support, for instance, WebP, such as Opera and Chrome, will send widely different Accept headers.
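To give a concrete idea, these are the image-request Accept headers of the two browser families as I remember them from testing; take them as illustrative rather than authoritative, since the exact values change between versions:

Chrome 30:  Accept: image/webp,*/*;q=0.8
Firefox:    Accept: image/png,image/*;q=0.8,*/*;q=0.5

Even with a spoofed user agent, these two requests are trivially told apart.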

What does this mean? Well, if you were trying to replace, let’s say, your Chrome user agent with a Firefox one, you’d now have a nearly unique combination: a Firefox user agent accepting WebP images. Your hope of hiding by muddying the waters just made you stand out much more easily. The same goes if you were trying to disable WebP requests to make your images’ Accept header look more like Firefox’s: now you’d have a given version of Chrome that does not support WebP, and the likelihood of being unique is even higher.

So why am I talking this much about browser fingerprinting again? Well, you may or may not have noticed, but both my blog and Autotools Mythbuster are now using Google Analytics. The reason is that, after my doubts on whether to keep running the blog or not, I want to know exactly how useful my blog is to people, and how many people end up reading it at any given time. I was originally a bit unsure whether this was going to be a problem for my readers, but seeing how easy it is to track people stealthily, tracking people explicitly shouldn’t be considered a problem. This is why I’m going to laugh at your expense if you start complaining about this being a “web bug”.

Serving WebP images

A few months ago I tried experimenting with WebP to reduce the traffic on my blog without losing quality. In the end the results were negative, and I decided instead to drop the backgrounds on my blog and replace them with some CSS gradients, which was not a lossless change, but was definitely lighter to load.

After VideoLAN Dev Days 2013 (of which I still have to write a report), I went to speak with Pascal again, and he told me that the new version of Chrome finally fixed the HTTP Accept header, so that it now prefers WebP to other image formats when present. I confirmed this, as Chrome 30 reports Accept: image/webp,*/*;q=0.8. The q= parameter is not actually needed for Apache, but it’s a good idea to have it there anyway.

Thanks to this change, and to mod_negotiation’s MultiViews, it’s possible to auto-select the format, between JPEG, PNG and WebP, for Chrome users. Indeed, if you’re visiting my blog with Chrome 30 (not sure about 29), you’re going to be served mostly WebP images (the CC license logos are still provided as PNG, because the lossless compression was worse, and the lossy one was not saving enough bytes to be worth it).
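For reference, a minimal sketch of the kind of configuration involved; the paths and file names here are made up for illustration, not copied from my actual vhost:

# Store cover.png and cover.webp side by side, and reference the
# image in the HTML simply as "cover"; MultiViews picks the variant.
<Directory /var/www/blog/images>
    Options +MultiViews
</Directory>
AddType image/webp .webp

Since negotiation only kicks in for the extension-less URL, any existing link pointing directly at a .png file keeps being served the PNG unchanged.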

I started working on enabling this while waiting at Terminal 1 at CDG airport (this is the last time I’m flying Aer Lingus to Paris), and I was able to finish the thing before my flight boarded. What I did realize just before that, though, is that Apache would still prefer serving WebP to every browser; I’d venture a guess that it’s because it’s smaller in size. This is okay for Opera, Firefox and (obviously) Chrome, but not for Safari or IE.

Of course, if the other browsers actually reported the formats they support, it would all be fine, but that’s not the case. In particular, Firefox actually prefers image/png to anything else (Apache will add a low q= value to any glob request, just to be on the safe side, which is why I said earlier that q= is not needed for it), so even if I don’t make any more changes, Firefox will still prefer PNG to WebP. It won’t do anything for JPEG, though, so if the web server prefers WebP to JPEG, that’s what Firefox is going to be served.

So how do you provide WebP without breaking other browsers? One solution would be to use PageSpeed to compress the images to WebP on the fly when requested by a compatible browser, but that is overkill, is hard to package right, and, most importantly, requires browser-detection logic on the server, which is not very safe.

In the end I decided to go with a safer option: provide WebP only to Chrome users, and not to users of other browsers, at least until they decide to fix their Accept headers. But how to do that? Well, I needed to check Apache’s source code, because the documentation does not explain it clearly and explicitly: to decide which format to serve, Apache multiplies the q= parameter coming from the browser, or its implicit value (globs such as image/* and */* get a default value of less than 0.1), by the qs= parameter passed when declaring the type:

AddType image/jpeg .jpeg .jpg .jpe
AddType image/png .png
AddType image/webp;qs=0.9 .webp

By assigning the value 0.9 to WebP, and leaving the default of 1 to the other formats, I’m basically telling Apache that, all things being equal (as when the browser sends Accept: */*, Internet Explorer style), I prefer to serve PNG or JPEG to the users, rather than WebP. It will also prefer to serve JPEG to Firefox (which uses image/*). Chrome 30, on the other hand, explicitly prefers WebP over any other image format, so Apache will calculate the preference as 1.0*0.9 for WebP and 0.8*1.0 for PNG and JPEG. I have not checked what Opera does, but it looks like all the browsers on my cellphone support WebP without preferring it, so they won’t be served it either.
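To make the arithmetic explicit, here is my own back-of-the-envelope rendering of what mod_negotiation computes with the types declared above (not actual Apache output):

# Chrome 30 sends: Accept: image/webp,*/*;q=0.8
#   image/webp: 1.0 (explicit) * qs 0.9 = 0.90   <- WebP wins
#   image/png:  0.8 (via */*)  * qs 1.0 = 0.80
#
# MSIE-style browser sends: Accept: */*
#   both formats get the same client-side value via */*,
#   so the server-side qs decides: PNG (1.0) beats WebP (0.9)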

So right now WebP images on my blog are exclusive to Chrome users; the win is relatively good, halving the size of the Autotools Mythbuster cover on the right, and shaving off a few bytes from the top image for the links. There are definitely more interesting ways to save bandwidth by re-compressing the images I used around the blog (many of which, re-compressed, end up taking half the space), but that will have to wait until I fix this bloody Typo, as right now editing posts is failing.

Another thing I will have to work on is a tool to handle the re-compression. Right now I’m doing it by hand, which is both a waste of time and prone to errors. I’ll have to come up with a good way to quickly describe images, so that a tool can re-compress them and evaluate whether to keep them in WebP or not; at the same time I need to find a way to store the originals at the highest quality. But that’s a topic for a different time.

The WebP experiment

You might have noticed over the last few days that my blog underwent some surgery, and in particular that, even now, on some browsers the home page does not really look all that good. In particular, I’ve removed all but one of the background images and replaced them with CSS3 linear gradients. Users browsing the site with the latest version of Chrome, or with Firefox, will have no problem and will see a “shinier” and faster website; others will see something “flatter”. I’m debating whether I want to provide them with a better-looking fallback or not; for now, not.

But this was plan B; the original plan was to leverage HTTP content negotiation to provide WebP variants of the images on the website. This was a win-win situation because, ludicrous as it seemed when WebP was announced, it turns out that with its dual mode, lossy and lossless, it can in one case or the other outperform both PNG and JPEG without a substantial loss of quality. In particular, lossless behaves like a charm with “art” images, such as the CC logos, or my diagrams, while lossy works great for logos, like the Autotools Mythbuster one you see on the sidebar, or the (previous) gradient images you’d see in the backgrounds.

So my obvious instinct was to set up content negotiation; I’ve used it before for multiple-language websites, and I expected it to work for multiple types just as well, as that’s what it’s designed for… but after setting it all up, it turned out that most modern web browsers still do not support WebP *at all*… and they don’t handle content negotiation as intended. For this to work, we’d need either of two things.

The first, and best, option would be for browsers to only Accept the image formats they support, or at least to prefer them. This is what Opera for Android does: Accept: text/html, application/xml;q=0.9, application/xhtml+xml, multipart/mixed, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1. That seems to be the only browser doing it properly, though. In particular, in this listing you’ll see that it supports PNG, WebP, JPEG, GIF and X bitmap, and then accepts whatever else with a lower preference. If WebP were not in the list, even if the server had a higher preference for it, it would not be sent to the client. Unfortunately, this is not going to work, as most browsers send Accept: */* without explicitly providing the list of supported image formats. This includes Safari, Chrome, and MSIE.

Point of interest: Firefox does explicitly prefer one image format over the others: PNG.

The other alternative is for the server to default to the “classic” image formats (PNG, JPEG, GIF) and expect the browsers that support WebP to prioritize it over the other image formats. Again, this is not the case: as shown above, Opera lists it but does not prioritize it, and Firefox prioritizes PNG over anything else, making no special exception for WebP.

Issues are open at both Chrome and Mozilla to improve the support, but the fixes haven’t reached mainstream yet. Google’s own suggested solution is to use mod_pagespeed instead, but this module (which I already named in passing in my post about unfriendly projects) is doing something else: it changes the served content on the fly, based on the reported User-Agent.

Given that I’ve spent some time on user agents, I would say I have the experiences to say that this is a huge pandora’s vase. If I have trouble with some low-development browsers reporting themselves as Chrome to fake their way in with sites that check the user agent field in JavaScript, you can guess how many of those are going to actually support the features that PageSpeed thinks they support.

I’m going to go back to PageSpeed in another post, for now I’ll stop to say that WebP has the numbers to become the next generation format out there, but unless browser developers, as well as web app developers start to get their act straight, we’re going to have hacks over hacks over hacks for the years to come… Currently, my blog is using a CSS3 feature with the standardized syntax — not all browsers understand it, and they’ll see a flat website without gradients; I don’t care and I won’t start adding workarounds for that just because (although I might use SCSS which will fix it for Safari)… new browsers will fix the problem, so just upgrade, or use a sane browser.