Serving WebP images

A few months ago I experimented with WebP to reduce the traffic on my blog without losing quality. In the end the results were negative, and I decided instead to drop the blog's background images and replace them with CSS gradients, which was not a lossless change, but was definitely lighter to load.

After the VideoLAN Dev Days 2013 (of which I still have to write a report soon), I went to speak with Pascal again, and he told me that the new version of Chrome finally fixed its HTTP Accept header, so that it now prefers WebP to other image formats when available. I confirmed this: Chrome 30 reports Accept: image/webp,*/*;q=0.8. The q= parameter is not actually needed for Apache, but it's a good idea to have it there anyway.

Thanks to this change, and to mod_negotiation's MultiViews, it's possible to auto-select the format, among JPEG, PNG and WebP, for Chrome users. Indeed, if you're visiting my blog with Chrome 30 (I'm not sure about 29), you're going to be served mostly WebP images (the CC license logos are still provided as PNG, because their lossless WebP compression was worse, and the lossy one was not saving enough bytes to be worth it).
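For reference, the negotiation side requires very little configuration. This is a minimal sketch, assuming a context where MultiViews can be enabled, with the path in the comments being purely illustrative:

Options +MultiViews

# with MultiViews on, a request for the extension-less /images/header
# makes Apache consider header.jpeg, header.png and header.webp, and
# serve the variant whose browser q= times server qs= score is highest

The HTML then references the extension-less URL, and each browser receives whichever variant scores best against its Accept header.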

I started working on enabling this while waiting at Terminal 1 of CDG airport (this is the last time I'm flying Aer Lingus to Paris), and I was able to finish the thing before my flight boarded. What I realized just before that, though, is that Apache would still prefer serving WebP to everybody; I'd venture a guess that it's because it's smaller in size. This is okay for Opera, Firefox and (obviously) Chrome, but not for Safari or IE.

Of course, if the other browsers actually reported the formats they support, everything would be fine, but that's not the case. In particular, Firefox prefers image/png to anything else (Apache adds a low q= value to any glob request, just to be on the safe side, which is why I said earlier that q= is not needed for it), so even if I make no further changes, Firefox will still prefer PNG to WebP (but it expresses no preference regarding JPEG, so if the web server prefers WebP to JPEG, it's going to be fine).

So how do you provide WebP without breaking the other browsers? One solution would be to use PageSpeed to re-compress the images to WebP on the fly when requested by a compatible browser, but that is a bit of overkill, hard to package right, and, most importantly, it requires browser-detection logic on the server, which is not very safe.

In the end I decided to go with a safer option: provide WebP only to Chrome users and not to users of other browsers, at least until they decide to fix their Accept headers. But how? Well, I had to check Apache's source code, because the documentation does not explain this clearly and explicitly: to decide which format to serve, Apache multiplies the q= parameter coming from the browser, or its implicit values (which give image/* and */* defaults of less than 0.1), by the qs= parameter passed when declaring the type:

AddType image/jpeg .jpeg .jpg .jpe
AddType image/png .png
AddType image/webp;qs=0.9 .webp

By assigning the value 0.9 to WebP, and leaving the default 1 to the other formats, I'm basically telling Apache that, all things being equal (such as when the browser sends Accept: */*, Internet Explorer style), I prefer to serve PNG or JPEG to the users rather than WebP. It will also prefer to serve JPEG to Firefox (which uses image/*). Chrome 30, on the other hand, explicitly prefers WebP over any other image format, so Apache will calculate the preference as 1.0*0.9 for WebP and 0.8*1.0 for PNG and JPEG. I have not checked what Opera does, but it looks like all the browsers on my cellphone support WebP without preferring it, so they won't be served it either.

So right now WebP images on my blog are exclusive to Chrome users; the win is relatively good, halving the size of the Autotools Mythbuster cover on the right and shaving a few bytes off the top image for the links. There are definitely more interesting ways to save bandwidth by re-compressing the images I used around the blog (many of which, re-compressed, end up taking half the space), but that will have to wait until I fix this bloody Typo, as right now editing posts is failing.

Another thing I will have to work on is a tool to handle the re-compression. Right now I'm doing it by hand, which is both a waste of time and prone to errors. I'll have to come up with a good way to quickly describe images, so that a tool can re-compress them and evaluate whether keeping them in WebP is worth it; at the same time I need to find a way to store the originals at the highest quality. But that's a topic for a different time.

Should websites do public service?

Today I finally put my new website, based on the fsws framework, online. While fsws is still not ready for release, it can already generate, in a single call (but with a dual pass!), the whole site, the page sitemap (compliant with the specification) and even the robots.txt file (my reason for generating it with the rest of the site is that it keeps a pointer to the sitemap, and at the same time you can exclude a whole subtree much more easily, by just setting parameters on the various pages).
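To give an idea, the generated robots.txt ends up looking something like this sketch (the hostname and excluded path here are, of course, illustrative):

Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /drafts/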

The nice thing about fsws is its very lightweight output: the whole site I wrote for my friend is less than 300K, and requires almost no server-side handling at all. The only thing I'm forced to do is some playing with Apache's mod_rewrite to change the content type of the pages, because Internet Explorer (who else?) fails to handle properly-served XHTML content (and asks to save the pages instead of opening them).
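The workaround itself takes just a couple of mod_rewrite lines. This is a sketch of the idea rather than my exact configuration, and it assumes the pages are stored with an .xhtml extension:

RewriteEngine On

# browsers that do not declare XHTML support in their Accept header
# (most notably Internet Explorer) get the pages re-typed as plain
# HTML; the T= flag only overrides the MIME type, nothing is redirected
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
RewriteRule \.xhtml$ - [T=text/html]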

But together with this particular quirk, I also keep another piece of code that works much like a web application, even though it's self-contained inside the web server configuration: a sanity check on the browser, based on the user agent, just like the antispam filter on this blog. It checks both for older browser versions and for particular user agent signatures that indicate the presence of adware, spyware or viruses on the requesting user's system.

When one of these signatures is identified, all requests for actual pages are redirected to an error-like page that warns users about the problem and asks them to update or change browser, or to install and use an antivirus. Now, since the site is entirely static and there is no user interaction with server-side components besides the HTTP server itself, there is no real need for me to discard requests coming from unsafe clients; my only reason to implement this type of code is public service.
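In practice the whole check is little more than a list of mod_rewrite conditions. The following is only a sketch: the warning page name is made up, and the signatures shown (old Internet Explorer versions, plus the FunWebProducts and Zango adware markers) stand in for much longer lists:

RewriteEngine On

# older browser versions (where to put the cut-off is a policy choice)...
RewriteCond %{HTTP_USER_AGENT} "MSIE [1-5]\." [OR]
# ...or user agent fragments appended by known adware and spyware
RewriteCond %{HTTP_USER_AGENT} (FunWebProducts|Zango) [NC]
# never rewrite the warning page itself, or the redirect would loop
RewriteCond %{REQUEST_URI} !^/browser-warning\.html$
RewriteRule ^ /browser-warning.html [R=302,L]

Using an external redirect rather than an internal rewrite keeps the warning URL visible in the address bar, which makes it that little bit clearer to the user that something is wrong on their end.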

I haven't implemented the same trick on my own website yet; I'm still a bit conflicted about it. On one hand, applying it means that part of the internet's users will be unable to even view my site, which, it being my professional site as well, might not be a sound business move; on the other hand, if most of the sites out there (with the obvious exclusion of those providing tools like browsers and antivirus software) were to refuse requests from IE6 and other old browsers, maybe their widespread use could be put to a stop.

And to what extent should I (we) refuse requests? Requiring a minimum version for any browser is a good start, but there is more to it. As I noticed, quite a few pieces of Windows spyware, adware and trojans (especially dialers) register themselves as part of the Internet Explorer user agent string. I have no idea why they do that (maybe it's to pay some kind of commission to the trojans' authors), but we could use this kind of information to notify users about the presence of malware on their systems.

Unfortunately, there doesn't seem to be a comprehensive database of user agent identifiers, although with a bit of searching over a sample you can easily find a lot of useful data. Also, since the whole check is currently handled through a simple redirection, I have no way to give the user any feedback about which kind of malware is on their system; I guess some quick JavaScript inside the error page itself could solve that.