The performance of Twitter as of late has been abysmal. I'm getting tired of seeing tweets like "Wondering what happened to my last 5 tweets" and "2/3 of the updates from Twitterrific never post for me. Is this normal?". I'm especially tired of seeing that darned cat!

Pssst! I don't think the cat is actually helping! Maybe you should get him away from your servers.
Here's a fun question to ask: do you support ETags?

In order to test whether Twitter is doing any of the typical sorts of caching that it could, via ETag or Last-Modified processing, I wrote a small program to issue HTTP requests with the relevant headers, which will indicate whether the server is taking advantage of this information. The program, http-validator-test.py, is below.
First, here are the results of targeting http://python.org/:
$ http-validator-test.py http://python.org/
Passing no extra headers
200 OK; Content-Length: 15175; Last-Modified: Fri, 18 May 2007 01:41:57 GMT; ETag: "60193-3b47-b04e2340"
Passing header: If-None-Match: "60193-3b47-b04e2340"
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"
Passing header: If-Modified-Since: Fri, 18 May 2007 01:41:57 GMT
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"
The first two lines indicate no special headers were passed with the request, and that a 200 OK response was returned with the specified Last-Modified and ETag headers.

The next two lines show an If-None-Match header was sent with the request, indicating to only send the content if its ETag doesn't match the value passed. It does match, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

The last two lines show an If-Modified-Since header was sent with the request, indicating to only send the content if its last-modified date is later than the value specified. It's not later, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

For content that doesn't change between requests, this is exactly the sort of behaviour you want to see from the server.
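To make the client side of that dance concrete, here's a minimal sketch of a conditional GET with a tiny in-memory cache, using the same httplib calls as the script below. It's illustrative only; a real client would also honor Cache-Control and friends, and the cache here is just a dict.

import httplib

cache = {}   # path -> (etag, body)

def conditionalGet(host, path):
    # fetch path from host, reusing a cached copy when the server
    # says it hasn't changed (304 Not Modified)
    headers = {}
    if path in cache:
        headers["If-None-Match"] = cache[path][0]
    conn = httplib.HTTPConnection(host)
    conn.request("GET", path, None, headers)
    resp = conn.getresponse()
    if resp.status == 304:
        return cache[path][1]          # nothing transferred; use what we have
    body = resp.read()
    etag = resp.getheader("ETag")
    if etag:
        cache[path] = (etag, body)
    return body

# the second call should come back as a 304 and cost almost nothing
page = conditionalGet("python.org", "/")
page = conditionalGet("python.org", "/")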
Now, let's look at the results we get back from going to my Twitter page at http://twitter.com/pmuellr:
$ http-validator-test.py http://twitter.com/pmuellr
Passing no extra headers
200 OK; Content-Length: 26491; Last-Modified: None; ETag: "a246e2e41e13726b7b8f911995841181"
Passing header: If-None-Match: "a246e2e41e13726b7b8f911995841181"
200 OK; Content-Length: 26504; Last-Modified: None; ETag: "1ef9e784fa85059db37831c505baea87"
Passing header: If-Modified-Since: None
200 OK; Content-Length: 26503; Last-Modified: None; ETag: "2ba91b02f418ed74e316c94c438e3788"
Rut-roh. Full content sent down with every request. Probably worse, generated with every request. In Ruby. Also note that no Last-Modified header is returned at all, and a different ETag header was returned for each request.

So there's some low-hanging fruit to be picked, perhaps. Semantically, the data shown on the page did not change between the three calls, so really, the ETag header should not have changed, just as it didn't change in the test of the python.org site above. Did anything really change on the page? Let's take a look. Browse to my Twitter page, http://twitter.com/pmuellr, and View Source. The only thing that really looks mutable on this page, given no new tweets have arrived, is the 'time since this tweet arrived' listed for every tweet. That's icky.
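For what it's worth, producing a stable validator isn't rocket science. Here's a rough sketch of the idea, in Python rather than the Ruby they're actually running, and with made-up field names and a stand-in renderPage function: derive the ETag from the data behind the page, and answer 304 when the client already has that version.

import md5

def renderPage(tweets):
    # stand-in for the real (expensive) template rendering
    return "<html>%d tweets</html>" % len(tweets)

def makeEtag(tweets):
    # hash the data that drives the page (hypothetical id/text fields),
    # not the rendered HTML, so it only changes when the data changes
    digest = md5.new()
    for tweet in tweets:
        digest.update("%s:%s;" % (tweet["id"], tweet["text"]))
    return '"%s"' % digest.hexdigest()

def handleRequest(requestHeaders, tweets):
    etag = makeEtag(tweets)
    if requestHeaders.get("If-None-Match") == etag:
        # client already has this version; skip rendering entirely
        return 304, {"ETag": etag}, ""
    return 200, {"ETag": etag}, renderPage(tweets)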
But poke around some more, peruse the gorgeous markup. Make sure you scroll right, to
take in some of the long, duplicated, inline scripts. Breathtaking!
There's a lot of cleanup that could happen here. But let me get right to the point. There's absolutely no reason that Twitter shouldn't be using their own API in an AJAXy-style application. Eating their own dog food. As the default. Make the old 1990s-era, web 1.0 page available for those people who turn JavaScript off in their browser. Oh yeah, a quick test of the APIs via curl indicates HTTP requests for API calls do respect If-None-Match processing for the ETag.

The page could go from the gobs of duplicated, mostly static HTML to just some code to render the data, obtained via an XHR request to their very own APIs, into the page. As always, less is more.
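As a sketch of what that buys you, here's the sort of polling loop a client could run against the timeline API; the page itself would do the same thing in JavaScript via XHR, this Python version just shows the protocol-level win. The user_timeline URL in the comment is my best guess at the API shape, so treat it as an assumption: the point is that a matching If-None-Match turns a poll into a cheap 304.

import time
import httplib

def render(body):
    # stand-in for the client-side rendering the page itself would do
    print "timeline changed: %d bytes of new data" % len(body)

def pollTimeline(host, path, interval=60):
    # poll the timeline, only pulling (and re-rendering) the data when
    # the ETag says something actually changed
    etag = None
    while True:
        headers = {}
        if etag:
            headers["If-None-Match"] = etag
        conn = httplib.HTTPConnection(host)
        conn.request("GET", path, None, headers)
        resp = conn.getresponse()
        if resp.status == 200:
            etag = resp.getheader("ETag")
            render(resp.read())
        # on a 304 there's nothing to do; what we have is current
        time.sleep(interval)

# assumed shape of the user_timeline API call; adjust to taste
# pollTimeline("twitter.com", "/statuses/user_timeline/pmuellr.json")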
We did a little chatting on this stuff this afternoon; I have more thoughts on how Twitter
should fix itself. To be posted later. If you want part of the surprise ruined,
Josh twittered after reading my mind.
Here's the program I used to test the HTTP cache validator headers: http-validator-test.py
#!/usr/bin/env python
#--------------------------------------------------------------------
# do some ETag and Last-Modified tests on a url
#--------------------------------------------------------------------
import sys
import httplib
import urlparse
#--------------------------------------------------------------------
def sendRequest(host, path, header=None, value=None):
    # send a GET for path, optionally passing one conditional header,
    # and print the interesting bits of the response
    headers = {}
    if (header):
        print "Passing header: %s: %s" % (header, value)
        headers[header] = value
    else:
        print "Passing no extra headers"

    conn = httplib.HTTPConnection(host)
    conn.request("GET", path, None, headers)
    resp = conn.getresponse()

    etag = resp.getheader("ETag")
    lmod = resp.getheader("Last-Modified")
    clen = resp.getheader("Content-Length")

    print "%s %s; Content-Length: %s; Last-Modified: %s; ETag: %s" % (
        resp.status, resp.reason, clen, lmod, etag
    )
    print

    return resp

#--------------------------------------------------------------------
if (len(sys.argv) <= 1):
    print "url expected as parameter"
    sys.exit()

x, host, path, x, x, x = urlparse.urlparse(sys.argv[1], "http")
if not path: path = "/"    # handle URLs with no path

# plain request first, then a conditional request with each validator
resp = sendRequest(host, path)
etag = resp.getheader("ETag")
date = resp.getheader("Last-Modified")

resp = sendRequest(host, path, "If-None-Match", etag)
resp = sendRequest(host, path, "If-Modified-Since", date)
Update - 2007/05/17
Duncan Cragg pointed out that I had been
testing the Date header, instead of the Last-Modified header. Whoops, that was dumb.
Thanks Duncan. Luckily, it didn't change the results of the tests (the status codes, anyway). The program above, and its output, have been updated.
Duncan, btw, has a great series of articles on REST on his blog, titled
"The REST Dialog".
In addition, I didn't reference the HTTP 1.1 spec, RFC 2616, for folks wanting to learn more about the mysteries of our essential protocol. It's available in multiple formats here: http://www.faqs.org/rfcs/rfc2616.html.