Why I check your user agents

I’m afraid I’m one of the few Free Software activists who actually endorse the use of the User-agent header. Most dislike it because, while the header is generally used to implement various kinds of policies, it is often used as part of lock-in schemes (sometimes paper-thin ones, by the way), and we all agree that lock-ins are never nice. Whether those lock-ins are something to simply attack, or something to understand and accept, is a different discussion. I sincerely think that Apple has every right to limit access to their trailers to QuickTime, or at least to try, since they are providing the service and for them it is a platform to show off their software; on the other hand, BBC and RAI using it to lock in their public service TV is something nasty!

So basically we have two reasons to use User-agent: policies and statistics. In the former category I also count the implementation of workarounds of various kinds. Statistics are mostly useful to decide what to focus on; policies can be used for good or evil. Lock-ins are generally evil, but you can also use policies to improve the quality of the service for users.

One of the most common workarounds keyed on the user agent string deals with features missing from MSIE; for instance, there is one for serving XHTML files properly, since MSIE doesn’t support the application/xhtml+xml MIME type:

RewriteCond %{REQUEST_URI} ^/[a-z_/]*$
RewriteCond %{HTTP_USER_AGENT} MSIE [OR]
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)
RewriteRule ^/[a-z_/]*$ - [T=text/html]

Yes, this has one more check than most copies of this rule you can find on the Internet. The reason is that I noticed experimentally that Facebook does not handle XHTML properly: if you attach a link to a web page that has images and is served as XHTML, it won’t pick up the title, nor let you choose an image to use for the link. This was true at least up to last December, and I assume the same is true now, hence the extra line.
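To make the logic of the rule easier to follow, here is a minimal Python sketch of the same decision. It is an illustration, not what the server runs: the function name `content_type_for` is made up, the REQUEST_URI condition is left out, and it assumes the page would otherwise be served as application/xhtml+xml. The regular expression is the same pattern as the HTTP_ACCEPT condition.

```python
import re

# Same pattern as the HTTP_ACCEPT condition: the client lists
# application/xhtml+xml explicitly with a quality of zero (q=0, q=0.0, ...).
XHTML_REFUSED = re.compile(r"application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)")

# User agents that get text/html no matter what they claim to accept.
HTML_ONLY_AGENTS = ("MSIE", "facebookexternalhit")


def content_type_for(user_agent: str, accept: str) -> str:
    """Pick the MIME type for an XHTML page (hypothetical helper)."""
    # Plain substring checks, like an unanchored RewriteCond pattern.
    if any(token in user_agent for token in HTML_ONLY_AGENTS):
        return "text/html"
    if XHTML_REFUSED.search(accept):
        return "text/html"
    # Assumption: without the rule the page goes out as XHTML.
    return "application/xhtml+xml"
```

Note that a client sending `application/xhtml+xml;q=0.5` still gets XHTML: only an explicit zero quality counts as a refusal.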

In a different situation, feng uses the User-agent field to identify buggy software and implement specific workarounds (such as ignoring the RTSP/1.0 standard and seeking on subsequent PLAY requests without a PAUSE).

Stepping away from workarounds, policies that can be implemented this way include warning users of insecure or unsupported browsers, or of trojan-infected systems, and providing them with an informational message explaining what to do to get something better and cleaner (I do that on a few websites to tell users that they are running something very broken, such as Internet Explorer 6). This is policy, and generally a good one in my opinion. *On a different note, if somebody can suggest a way to use cookies to add a static way to bypass the check, I’d be happy.*
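One possible shape for such a cookie bypass, sketched in Python rather than in server configuration: show the warning only when the user agent matches and the visitor has not already acknowledged it. Everything here is an assumption for illustration, including the cookie name `browser_warning_ack`, the value `1`, and the `MSIE 6.` match; the cookie itself would have to be set by whatever "continue anyway" link the warning page offers.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; set by the warning page once the user
# chooses to continue anyway.
BYPASS_COOKIE = "browser_warning_ack"


def should_warn(user_agent: str, cookie_header: str) -> bool:
    """Return True if the broken-browser warning should be shown."""
    # Assumption: only Internet Explorer 6 triggers the warning.
    if "MSIE 6." not in user_agent:
        return False
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(BYPASS_COOKIE)
    # Warn unless the visitor already acknowledged the message.
    return morsel is None or morsel.value != "1"
```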

There are many more things you can do with agent-specific policies, including providing lower-quality images for smartphones without implementing mobile-specific website vhosts, but I won’t go into further detail right now.

As for statistics, they usually give developers and designers a way to focus on what’s really being used by the targets of their software. Again, some activists dislike this because it shows that for most websites it’s not worth considering non-Firefox, non-IE browsers, and sometimes not even Firefox; but leaving these extreme cases aside, statistics are, in the real working world, very important.

Some people feel they are smarter than the average programmer and want to throw the statistics off by claiming to use “Commodore 64” or “MS-DOS” as their operating system. They pretend to be defending their privacy, camouflaging themselves on the bad, bad Internet. What they are actually doing is trying to hide on a plane by wearing a balaclava, which, as you might guess, looks pretty peculiar. In fact, if you try EFF’s Panopticlick you can see that a unique, “novelty” User-agent actually makes you stand out among Internet users. Which means that if you’re trying to hide in a crowd by wearing a balaclava, you’re not smarter than anybody; you’re actually stupider than the average.
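Panopticlick expresses this as bits of identifying information: the rarer a value is among visitors, the more bits it gives away. A small sketch of that arithmetic follows; the function name is made up and the shares in the note below are invented for illustration, not measured figures.

```python
import math


def identifying_bits(share: float) -> float:
    """Surprisal, in bits, of a value seen in `share` of all visitors."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)
```

A User-agent shared by a quarter of all visitors gives away 2 bits, while one seen once in every 1024 visitors gives away 10; a string nobody else sends narrows you down as far as the sample allows.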

Oh, and by the way, there is no way that faking being Googlebot will work out well for you; on my webserver, for instance, you’ll get 403 responses to all your requests… unless your reverse resolution properly forward-confirms as coming from the Googlebot server farm…
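For the curious, a forward-confirmed reverse DNS check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not my server’s actual configuration: the accepted hostname suffixes are an assumption, the helper names are made up, and `socket.gethostbyname_ex` only covers IPv4. The resolvers are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket

# Assumption: genuine crawler hostnames end in one of these suffixes.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> list:
    """A lookup: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True only if the PTR name is in an accepted domain AND that
    name resolves back to the same IP address."""
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(GOOGLEBOT_SUFFIXES):
            return False
        # Forward confirmation: anyone can publish a fake PTR record,
        # but not a matching A record inside the crawler's own domain.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

A request claiming to be Googlebot for which this returns False is the one that would earn the 403.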
