pmuellr is Patrick Mueller

Monday, May 21, 2007

ETags are not a panacea

panacea: A remedy for all diseases, evils, or difficulties; a cure-all.
From Answers.com

In my post "That Darned Cat! - 1", I complained about Twitter performance, and peeked at some of their HTTP headers, noticing that they didn't seem to honor cache validation tests based on the ETag or Last-Modified headers.

Since posting, Twitter performance is back on track. I haven't checked, but I'm guessing they didn't add ETag support. :-)

A number of people seemed to read into my post that missing ETag support is a cause of Twitter's performance problems. I'd be the first to admit that such a proposition is a bit of a stretch. ETags are no panacea, and in fact you'll obviously have to write more code to handle them correctly. It's harder still if you're using some kind of high-level framework for your app. This isn't easy stuff.

And in general, my 20+ years of programming have taught me that your first guess about where the performance problems in your code are is dead wrong. You really need to break out some diagnostic tools, or write some, to figure out where your problems are. Since I don't have the Twitter code, I'm of course at a complete loss to guess where their problems are, when they have them.

ETag and Last-Modified processing is something you ought to do, if you can afford it, because it does allow for some optimization in your client / server transactions. To be clear, the optimization is that the server doesn't have to send the content it would have sent to the client, because the client has indicated it already has that 'version' of it cached. There is still a round-trip to the server involved. If you're looking for an absolute killer optimization though, you should be looking at the Expires and Cache-Control headers. See Mark Nottingham's recent post "Expires vs. max-age" for some additional information, along with the link to his caching tutorial.
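To make the validator idea concrete, here's a rough sketch in Python of how a server might answer a conditional GET. Everything in it is an assumption for illustration: the names `make_etag` and `respond` are made up, the ETag is just a hash of the body, and splitting If-None-Match on commas is a simplification. It has nothing to do with how Twitter actually works.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a quoted SHA-1 hex digest of the response body.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # Simplification: assumes the ETags themselves contain no commas.
        for candidate in if_none_match.split(","):
            candidate = candidate.strip()
            if candidate.startswith("W/"):
                candidate = candidate[2:]  # ignore the weak-validator prefix
            if candidate == "*" or candidate == etag:
                # The client already has this version: send no content.
                return 304, headers, b""
    return 200, headers, body


# First request carries no validator; the second echoes the ETag back.
first = respond(b"hello, twitter", None)
second = respond(b"hello, twitter", first[1]["ETag"])
print(first[0], second[0])
```

Note that in this sketch the server still does the work of producing the body in order to compute the ETag. The only thing saved is sending it, and the round-trip still happens.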

Expires and friends are killer, because they allow the ultimate in client / server transaction optimization; the transaction is optimized away completely. The client can check the expiration date, determine that the data it has cached has not 'expired', and thus skip asking the server for it at all. Unfortunately, many applications won't be able to use these headers, if their data is designed to change rapidly; e.g., Twitter.
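The client-side check can be sketched roughly like this, again in Python. The function name `is_fresh` and its parameters are hypothetical, and the logic is deliberately simplified: it only looks at max-age and Expires (preferring max-age when both are present) and ignores things a real cache must handle, such as the Age header and no-cache directives.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, fetched_at: datetime, now: datetime) -> bool:
    """True if a cached response can be reused without asking the server."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age is a lifetime in seconds, counted from when we fetched.
        return now - fetched_at < timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        # Expires is an absolute HTTP date.
        return now < parsedate_to_datetime(expires)
    # No freshness information: fall back to asking the server.
    return False


fetched = datetime(2007, 5, 21, 4, 13, 54, tzinfo=timezone.utc)
short_lived = {"Cache-Control": "max-age=60"}
print(is_fresh(short_lived, fetched, fetched + timedelta(seconds=30)))
print(is_fresh(short_lived, fetched, fetched + timedelta(seconds=90)))
```

When `is_fresh` returns True, no request is made at all, which is the whole point.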

Sam Ruby also blogged about another great example of Expires headers. How often does the Google Maps image data really change?

Here's another great example, applicable to our new web 2.0-ey future. Yahoo! is hosting their YUI library for any application to use directly, without copying the toolkit to their own web site. Let's peek at the headers from one of their files:

$ curl -I http://yui.yahooapis.com/2.2.2/build/reset/reset-min.css
HTTP/1.1 200 OK
Last-Modified: Wed, 18 Apr 2007 17:35:45 GMT
X-Yahoo-Compressed: true
Content-Type: text/css
Cache-Control: max-age=313977290
Expires: Tue, 02 May 2017 04:08:44 GMT
Date: Mon, 21 May 2007 04:13:54 GMT
Connection: keep-alive

Good stuff! The Expires and Cache-Control headers render this file pretty much immutable, as it should be. When Yahoo! releases the next version of the toolkit, it'll be hosted at a different URL base, so the new files will have different URLs and won't be affected by the headers cached for this particular file. This sort of behaviour is highly optimal for web 2.0-ey apps, which are wont to download a lot of static HTML, CSS and JavaScript files that, for a particular version of the app, will never change. By having those files cached on the client in such a way that it never asks the server for them again, the app will come up all the quicker.
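As a quick sanity check on the headers above, the Date, Expires and max-age values can be compared with a few lines of Python. The header values are copied straight from the curl output; the variable names are just for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the curl -I output above.
date = parsedate_to_datetime("Mon, 21 May 2007 04:13:54 GMT")
expires = parsedate_to_datetime("Tue, 02 May 2017 04:08:44 GMT")
max_age = timedelta(seconds=313977290)

lifetime = expires - date
print(lifetime == max_age)               # True: the two headers agree
print(round(lifetime.days / 365.25, 1))  # 9.9 (years)
```

So the two headers say the same thing in different ways: the file stays fresh for a little under ten years from the time of the request.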

Good stuff to know, and take advantage of if you can.

For more specific information about our essential protocol, HTTP/1.1, see RFC 2616. It's available in multiple formats here: http://www.faqs.org/rfcs/rfc2616.html.
