Links

pmuellr is Patrick Mueller

other pmuellr thangs: home page, twitter, flickr, github

Thursday, May 17, 2007

That Darned Cat! - 1

The performance of Twitter as of late has been abysmal. I'm getting tired of seeing tweets like "Wondering what happened to my last 5 tweets" and "2/3 of the updates from Twitterrific never post for me. Is this normal?". I'm especially tired of seeing that darned cat!

Pssst! I don't think the cat is actually helping! Maybe you should get him away from your servers.

Here's a fun question to ask: do you support ETags?

In order to test whether Twitter is doing any of the typical sorts of caching that it could, via ETag or Last-Modified processing, I wrote a small program to issue HTTP requests with the relevant headers, which will indicate whether the server is taking advantage of this information. The program, http-validator-test.py, is below.

First, here are the results of targetting http://python.org/ :

$ http-validator-test.py http://python.org/
Passing no extra headers
200 OK; Content-Length: 15175; Last-Modified: Fri, 18 May 2007 01:41:57 GMT; ETag: "60193-3b47-b04e2340"

Passing header: If-None-Match: "60193-3b47-b04e2340"
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"

Passing header: If-Modified-Since: Fri, 18 May 2007 01:41:57 GMT
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"

The first two lines indicate no special headers were passed in the response, and that a 200 OK response was returned with the specified Last-Modified and ETag headers.

The next two lines show an If-None-Match header was sent with the request,indicating to only send the content if it's ETag doesn't match the value passed. It does match, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

The last two lines show an If-Modified-Since header was sent with the request,indicating to only send the content if it's last modified date is later than the value specified. It's not later, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

For content that doesn't change between requests, this is exactly the sort of behaviour you want to see from the server.

Now, let's look at the results we get back from going to my Twitter page at http://twitter.com/pmuellr :

$ http-validator-test.py http://twitter.com/pmuellr
Passing no extra headers
200 OK; Content-Length: 26491; Last-Modified: None; ETag: "a246e2e41e13726b7b8f911995841181"

Passing header: If-None-Match: "a246e2e41e13726b7b8f911995841181"
200 OK; Content-Length: 26504; Last-Modified: None; ETag: "1ef9e784fa85059db37831c505baea87"

Passing header: If-Modified-Since: None
200 OK; Content-Length: 26503; Last-Modified: None; ETag: "2ba91b02f418ed74e316c94c438e3788"

Rut-roh. Full content sent down with every request. Probably worse, generated with every request. In Ruby. Also note that no Last-Modified header is returned at all, and different ETag headers were returned for each request.

So there's some low fruit to be picked, perhaps. Semantically, the data shown on the page did not change between the three calls, so really, the ETag header should not have changed, just as it didn't change in the test of the python site above. Did anything really change on the page? Let's take a look. Browse to my Twitter page, http://twitter.com/pmuellr, and View Source. The only thing that really looks mutable on this page, given no new tweets have arrived, is the 'time since this tweet arrived' listed for every tweet. That's icky.

But poke around some more, peruse the gorgeous markup. Make sure you scroll right, to take in some of the long, duplicated, inline scripts. Breathtaking!

There's a lot of cleanup that could happen here. But let me get right to the point. There's absolutely no reason that Twitter shouldn't be using their own API in an AJAXy style application. Eating their own dog food. As the default. Make the old 1990's era, web 1.0 page available for those people who turn JavaScript off in their browser. Oh yeah, a quick test of the APIs via curl indicates HTTP requests for API calls do respect If-None-Match processing for the ETag.

The page could go from the gobs of duplicated, mostly static html, to just some code to render the data, obtained via an XHR request to their very own APIs, into the page. As always, less is more.

We did a little chatting on this stuff this afternoon; I have more thoughts on how Twitter should fix itself. To be posted later. If you want part of the surprise ruined, Josh twittered after reading my mind.

Here's the program I used to test the HTTP cache validator headers: http-validator-test.py

	#!/usr/bin/env python

	#--------------------------------------------------------------------
	# do some ETag and Last-Modified tests on a url
	#--------------------------------------------------------------------

	import sys
	import httplib
	import urlparse

	#--------------------------------------------------------------------
	def sendRequest(host, path, header=None, value=None):
	    headers = {}

	    if (header):
	        print "Passing header: %s: %s" % (header, value)
	        headers[header] = value
	    else:
	        print "Passing no extra headers"

	    conn = httplib.HTTPConnection(host)
	    conn.request("GET", path, None, headers)
	    resp = conn.getresponse()

	    stat = resp.status
	    etag = resp.getheader("ETag")
	    lmod = resp.getheader("Last-Modified")
	    clen = resp.getheader("Content-Length")

	    print "%s %s; Content-Length: %s; Last-Modified: %s; ETag: %s" % (
	        resp.status, resp.reason, clen, lmod, etag
	        )
	    print

	    return resp

	#--------------------------------------------------------------------
	if (len(sys.argv) <= 1):
	    print "url expected as parameter"
	    sys.exit()

	x, host, path, x, x, x = urlparse.urlparse(sys.argv[1], "http")

	resp = sendRequest(host, path)
	etag = resp.getheader("ETag")
	date = resp.getheader("Last-Modified")

	resp = sendRequest(host, path, "If-None-Match", etag)
	resp = sendRequest(host, path, "If-Modified-Since", date)

Update - 2007/05/17

Duncan Cragg pointed out that I had been testing the Date header, instead of the Last-Modified header. Whoops, that was dumb. Thanks Duncan. Luckily, it didn't change the results of the tests (the status codes anyway). The program above, and the output of the program have been updated.

Duncan, btw, has a great series of articles on REST on his blog, titled "The REST Dialog".

In addition, I didn't reference the HTTP 1.1 spec, RFC 2616, for folks wanting to learn more about the mysteries of our essential protocol. It's available in multiple formats, here: http://www.faqs.org/rfcs/rfc2616.html.

No comments: