Web pages and DNS requests

Since Google DNS was announced, I've noticed people repeating Google's message:

Complex pages often require multiple DNS lookups before they start loading, so your computer may be performing hundreds of lookups a day.

Is this really true? Well, let's try to break it up into more verifiable key phrases, starting from the end of the message.

“Your computer may be performing hundreds of lookups a day” is definitely true in general. On the other hand, I'd like to note a few things here. Most decent operating systems (and that includes Linux, FreeBSD, OS X and, I think, Windows as well) tend to cache DNS results. On Linux and FreeBSD, the piece of software that does this is called nscd (Name Service Cache Daemon). If it's not running, multiple requests do tend to be quite taxing (I found that out myself when I forgot to start it in the tinderbox after moving to containers), so set it to auto-start if you haven't already.

Now, it is true that the type of caching nscd does is quite limited, so it's definitely not perfect, but there is probably another problem here, and it's well summarised by Paul Vixie's rant about DNS abuse, where he explains how DNS has been abused, among other things, to provide policy results rather than absolute truths. Expressing policies also means that the responses cannot really be cached for long periods of time, and that's definitely a problem for any caching system: a low time-to-live (TTL) means that you have to make a full recursive resolution almost every time.
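To make the TTL point concrete, here's a minimal sketch of a TTL-bound cache (purely illustrative, not how nscd actually works internally): once the TTL expires, the entry is useless and a full resolution is needed again, so a low TTL means the cache barely ever helps.

```python
import time


class TTLCache:
    """Minimal sketch of a TTL-bound cache: entries expire after their TTL."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry timestamp)

    def put(self, name, address, ttl):
        self._entries[name] = (address, time.monotonic() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never resolved: a full recursive lookup is needed
        address, expiry = entry
        if time.monotonic() >= expiry:
            del self._entries[name]
            return None  # TTL expired: a full recursive lookup is needed again
        return address


cache = TTLCache()
cache.put("www.example.com", "93.184.216.34", ttl=0.1)  # deliberately short TTL
print(cache.get("www.example.com"))  # fresh: answered from the cache
time.sleep(0.2)
print(cache.get("www.example.com"))  # expired: None, must resolve again
```

With a TTL of minutes instead of days, the second case is the common one, which is exactly what policy-driven DNS responses force on every cache along the way.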

And this is probably the main reason why DNS caches, whoever provides them, tend to get more interesting: while they have to make a full recursive resolution every time the TTL expires, it's most likely that somebody else has requested the same name already, so the common entries (like Google's hosts) are likely to always be fresh. It still doesn't solve one problem though: you need multi-level caches, just like a CPU has, since making a (cached) request to an external server is always going to take more time than getting the answer out of a local cache. To solve that problem myself, I usually have nscd running locally as well as pdnsd running on my router to cache requests (yes, that's why most home routers don't hand your ISP's DNS servers to the individual boxes but try to act as the main DNS themselves, sometimes failing badly because of bad caches).
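The nscd-plus-pdnsd setup can be sketched as a two-level lookup chain (a simplified model, not the actual nscd or pdnsd code): try the local cache, then the router's cache, and only fall through to a full recursive resolution when both miss, warming the caches on the way back.

```python
class TwoLevelResolver:
    """Sketch of a two-level DNS cache: local cache first, then the
    router's cache, then a full recursive resolution as a last resort."""

    def __init__(self, local, router, resolve):
        self.local = local      # fastest, per-machine (think nscd)
        self.router = router    # shared by the whole LAN (think pdnsd)
        self.resolve = resolve  # full recursive resolution, the slow path

    def lookup(self, name):
        address = self.local.get(name)
        if address is not None:
            return address, "local cache"
        address = self.router.get(name)
        if address is not None:
            self.local[name] = address  # warm the local cache
            return address, "router cache"
        address = self.resolve(name)    # slowest path: full resolution
        self.router[name] = address     # warm both levels for next time
        self.local[name] = address
        return address, "upstream"


# Stub resolver that records how often the slow path is actually taken.
upstream_calls = []
def fake_resolve(name):
    upstream_calls.append(name)
    return "192.0.2.10"

resolver = TwoLevelResolver({}, {}, fake_resolve)
print(resolver.lookup("www.example.com"))  # ('192.0.2.10', 'upstream')
print(resolver.lookup("www.example.com"))  # ('192.0.2.10', 'local cache')
```

Only the first lookup pays the full cost; every box behind the router then benefits from the shared second level, which is the whole point of stacking the caches.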

But what I really have a problem with is the other half of the message: “Complex pages often require multiple DNS lookups before they start loading”. This is also true, of course, but has anybody thought about why that is? Well, I sincerely think this is also Google's own fault!

Indeed, even my own blog pages pull resources from a number of different hosts: Google (for the custom search), PayPal (for the donation button, which I restored a couple of weeks ago), Amazon (for the rare associate links) and Creative Commons (for the logo at the end of the page); single post pages also have Gravatar images. While it's probably impossible to avoid external resources in a web page altogether, it doesn't help that a lot of services, including Amazon's associates website, Google Analytics and AdSense, use separate hostnames: a web page with Google custom search, AdSense and Analytics will have to make three resolutions rather than one.
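You can estimate the lookup cost of a page by counting the distinct hostnames among its resources. A quick sketch, using a hypothetical resource list similar to the one a page like this ends up with (the URLs are illustrative, not an exact inventory):

```python
from urllib.parse import urlparse

# Hypothetical external resources pulled in by a single blog page.
resources = [
    "https://www.google.com/cse/brand?form=cse-search-box",
    "https://www.google-analytics.com/ga.js",
    "https://pagead2.googlesyndication.com/pagead/show_ads.js",
    "https://www.paypal.com/en_US/i/btn/btn_donate_SM.gif",
    "https://images-na.ssl-images-amazon.com/images/G/01/badge.png",
    "https://licensebuttons.net/l/by-nc-nd/3.0/88x31.png",
]

# Each distinct hostname is a potential extra DNS resolution before rendering.
hosts = {urlparse(url).hostname for url in resources}
print(len(hosts), "distinct hostnames to resolve")
for host in sorted(hosts):
    print(" -", host)
```

Note how the three Google services alone account for three separate hostnames, which is exactly the three-resolutions-instead-of-one problem.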

And this made me think about some self-cleansing as well. My personal reason for having my website and my blog on separate hostnames is mostly that, at the time I first set it up, Typo was hard to set up properly in a sub-directory. I wonder if I should now consolidate the two onto the flameeyes.eu domain alone, with a single /blog/ subdirectory for Typo.

But I also noticed other people creating different subdomains for “static” content, which ends up requiring further resolutions for no good reason at all. I think most people don't realise this (I hadn't thought much about it before either), and I wonder whether Google shouldn't rather start teaching people the right way to do things, instead of coming up with some strangely boring, badly hyped, privacy-concerning project like its DNS.