pmuellr is Patrick Mueller

Friday, June 01, 2007

browser scripting

Finally we have a programmable persistence engine for our browsers. Thank you, Googleplex.

Here's what I want next: more scripting languages. Obvious choices being python and ruby. Whatever happened to this? Slingshot would still have a slight advantage over such a contraption, as Slingshot has some additional desktop integration features that browsers don't currently have. It also has the advantage that it's not running in an application shell designed for browsing the entire web; there's no time machine (back button).

There's also Adobe Flex/Apollo to consider, since they will also have an embedded database available. On the language front with Flex, Adobe recently made an ActionScript Virtual Machine 2 (AVM2) Overview available. How long before someone ports some languages to that VM? Especially since dynamic languages like python and ruby are a fairly natural fit to the AVM2 engine (compared to the JVM anyway), and the AVM2 engine will likely be the most widely deployed VM in the near future (it's included in Flash 9).

The one thing I've been most excited about, given the rash of new client products available, is that we've finally got a new "browser war" on our hands. Competition is fantastic; it's going to be a wild next couple of years.

Wednesday, May 30, 2007

not doing REST

From Mark Baker:

So if you're writing (or generating) contract/interface-level code which can't late-bind to all resources, everywhere, you're not doing REST ...

Is this "We don't need no stinkin contracts!" meme a reaction to the non-web-friendly WS-* world, what with it's overly complex and verbose schemas? Because I think there's plenty of room for some people to apply contracts to parts of the web. I certainly don't believe the entire web can be fully described using some all-encompassing schema language; but small pieces? Sure.

I guess what I don't understand is how you are supposed to describe your services to someone without some kind of meta-data describing the service. Every 'web api' I've ever seen has human-readable text describing the URIs, the data expected as input, and the data expected as output. (Admittedly, most of these 'web api's violate best-practice HTTP principles somehow, but I think that's not an issue here; they could all be refactored to be better HTTP citizens.) That human-readable text is a contract; an interface. In English. Which is terrible. I'd rather have a machine-readable version of that contract, so I can generate something human readable from it. And perhaps a validator. And some client stubs. Maybe some test cases. Diagnostic tools. Etc.

What is the alternative to describing your services? How is anyone going to write code to use these services if they don't know where to send requests, what verbs to use, what data to send, and what kind of data to expect? Instead of Flickr producing a description of their web services like this, are they simply supposed to say "Flickr is now fully REST-enabled. Start here, and have fun!"?

As with data modelling, I don't feel like there is a single answer to which schema or contract language should be used. I'm not initially sold on WADL (it seems too verbose), and I certainly wouldn't use it if there were something else, better, for whatever project I was working on. The shape of the schema language isn't important, as long as it works for you.
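To make that concrete, here's a minimal sketch of what such a lightweight, machine-readable description could look like, in Python. It isn't any particular schema language, and the resource, parameters, and shapes are all made up for illustration:

    # A hypothetical, minimal machine-readable contract for one resource.
    # Not WADL, not WSDL -- just enough structure to generate human-readable
    # docs, client stubs, or a validator from. Every name here is made up.
    PHOTO_SEARCH = {
        "uri":      "/photos/search",
        "method":   "GET",
        "params":   {"tags": "string", "page": "int"},
        "response": {"type": "application/json",
                     "shape": {"photos": [{"id": "string", "title": "string"}]}},
    }

    def describe(contract):
        # generate the human-readable version of the contract
        lines = ["%s %s" % (contract["method"], contract["uri"])]
        for name in sorted(contract["params"]):
            lines.append("  param %s: %s" % (name, contract["params"][name]))
        lines.append("  returns: %s" % contract["response"]["type"])
        return "\n".join(lines)

    print(describe(PHOTO_SEARCH))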

So I guess contract-driven HTTP interfaces aren't REST. But this is an area I'm interested in; what name should I use, so I can avoid being labelled as "not doing REST" while I'm optimizing my use of the web by being a good HTTP citizen?

Tuesday, May 29, 2007

typing rest

Count me as someone who wants some typing in the REST world, based on the arguments made in the post by Aristotle Pagaltzis last week.

We're talking about contracts here. Contracts need to be formalized, somehow. English is not the best language to use, especially since we have much more precise languages available to us.

My thoughts here are really just an extension to my thoughts on data serialization. Services are just the next level of thing that need to be meta-described.

Several folks have pointed out WADL (Web Application Description Language) as a potential answer, but it has at least one hole: it doesn't have a way of describing non-XML data used as input or output. For example, JSON. It certainly is simpler and more direct than WSDL, so it does have that going for it.

All in all, good thoughts all around, but we have more work to do, more proof to provide. And by more work, I don't mean getting a handful of experts in a smoky back room mandating what the formats are going to be. In fact, I'm not so sure we need a single 'format'. If you're creating some kind of machine-readable schema to describe your data and your services, you're way ahead of the game.

In any case, don't wait for WADL to be finished before starting to build out schema for your services. Use WADL if you can; use something else (hopefully simpler) if it's more appropriate for you.

Additional thoughts on Aristotle's post from Tim Bray, Stefan Tilkov and Mike Herrick.

Wednesday, May 23, 2007

audio vs video

The Redmonkers have started publishing video interviews on the web, along with their usual audio interviews. Coté seems to be doing most (all?) of the work, and you can catch these as he releases them on his blog.

I like to see people experimenting with new technology, and 'upgrading' from audio to video sounds like a fun experiment (pardon the pun). But it doesn't work for me.

My issues:

  • There really isn't that much 'extra' in a video interview, over just the audio. You get to see faces. You get to see some body language. Maybe a picture or two.

  • The idea of watching an interview means I have to have two senses trained on the media: eyes and ears. You're killing my continuous partial attention!

  • I can't listen to it on my video-challenged iPod.

  • The reason I don't have a video-capable iPod is that the situations in which I listen to my iPod don't lend themselves to allowing me to watch something on the device as well: driving, mowing the lawn, washing the dishes, etc.

I fully admit to being an old fuddy-duddy; even Rush Limbaugh does video of his show. Good luck guys, and, if you can, also make the audio portion of your videos available as an MP3. I'm not alone in this wish.

But let me change the direction here. Let's look at an environment that is high on video interaction, and absolutely bereft of audio interaction. Your programming environment. Your IDE, be it Eclipse, NetBeans, IntelliJ, Visual Studio, XCode, Emacs, or a text editor and a command-line. How many of these programs use audio to help you develop code? None. Well, I might be wrong; I'm not familiar with all of these environments, but I don't recall any of them making use of audio the way they use visuals.

When we're programming on all cylinders, we're in 'the zone'. Continuous full attention. Eyes reading and scanning, fingers typing and clicking and moving mice. Where's the audio? It ain't there.

Nathan Harrington has a number of articles up at developerWorks, such as "Monitor your Linux computer with machine-generated music" which discuss ways developers can use audio in their computing environment.

This is good stuff, and we need more of it.

I would be remiss in not pointing out here that audio feedback like this is only useful to those of us lucky enough to have decent audio hardware and software in our heads. Those of us without such luck wouldn't be able to take advantage of audio feedback. On the other hand, folks who lack decent video hardware and software in their heads would most likely appreciate more emphasis on a sense they are more dependent on.

The most obvious use case for audio in a development environment is with debugging. There are a lot of cases while debugging when you just want to know if you hit a certain point in your program. The typical way you'd do this is to set a breakpoint, and when the location is hit, the debugger stops, you see where you are, and hit continue. Breaking your attention and demanding your input. What if, instead, you could set an audio breakpoint, that would play a sound when the location was hit? So your attention wasn't broken. And you didn't have to press the Continue button to proceed.
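As a rough illustration of the idea (and not something any existing debugger provides), here's a sketch in Python using sys.settrace; the function name is hypothetical, and the terminal bell stands in for a real sound clip:

    import sys

    def audio_breakpoint(function_name):
        # build a trace function that 'beeps' whenever the named function
        # is entered, without ever stopping execution or demanding input
        def tracer(frame, event, arg):
            if event == "call" and frame.f_code.co_name == function_name:
                sys.stderr.write("\a")   # terminal bell; a real tool would play a sound clip
                sys.stderr.flush()
            return None                  # no need to trace line-by-line inside the frame
        return tracer

    # hypothetical usage: beep every time parse_request() is entered
    sys.settrace(audio_breakpoint("parse_request"))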

With regard to audio debugging, I know this has been experimented with many times in the past. I've done it as well, a decade ago, when I was using a programming environment that I was able to easily reprogram: Smalltalk.

But audio usage in development environments is not yet mainstream. There's lots of research to be done here:

  • What are the best sound palettes to use: audio clips, midi tones, short midi sequences, percussion vs tones?

  • How should we take advantage of other audio aspects like volume and pitch and three dimensional positioning, especially regarding continuously changing quantities like memory or i/o usage by an application?

  • How do we deal with 'focus' if I'm also listening to Radio Paradise while I'm working?

  • Does text-to-speech make sense in some cases?

  • How do we arrange for multiple streams of audio feedback to be presented to us in a way that's not unpleasant to listen to? Not just a cacophony of random sounds.

  • Beyond debugging, where else can we make use of audio feedback?

  • How might audio be integrated into diagnostic tools like DTrace?

Monday, May 21, 2007

ETags are not a panacea

panacea: A remedy for all diseases, evils, or difficulties; a cure-all.
From Answers.com

In my post "That Darned Cat! - 1", I complained about Twitter performance, and peeked at some of their HTTP headers noticing they didn't seem to respect ETags or Last-Modified header cache validator tests.

Since posting, Twitter performance is back on track. I haven't checked, but I'm guessing they didn't add ETag support. :-)

A number of people seemed to read into my post that ETags are a cause of Twitter's performance problems. I'd be the first to admit that such a proposition is a bit of a stretch. ETags are no panacea, and in fact you'll obviously have to write more code to handle them correctly. Harder even, if you're using some kind of high level framework for your app. This isn't easy stuff.
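For the sake of illustration, here's roughly what the server-side half of that extra code might look like, as a Python sketch with a made-up (status, headers, body) interface. In a real application you'd want to derive the ETag from some cheap version marker (a row timestamp, a revision id) rather than hashing the fully rendered page, or you give back most of the win:

    import hashlib

    def conditional_get(request_headers, body):
        # sketch of server-side ETag handling for a GET: if the client's
        # If-None-Match matches the current representation's ETag, return a
        # 304 and skip the body; otherwise send the full representation
        # (body is assumed to be the rendered content, as bytes)
        etag = '"%s"' % hashlib.md5(body).hexdigest()
        if request_headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, ""
        return 200, {"ETag": etag}, body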

And in general, my 20+ years of programming have taught me that your first guess at where the performance problems in your code are is dead wrong. You really need to break out some diagnostic tools, or write some, to figure out where your problems are. Since I don't have the Twitter code, I'm of course at a complete loss to guess where their problems are, when they have them.

ETags and Last-Modified processing is something you ought to do, if you can afford it, because it does allow for some optimization in your client / server transactions. To be clear, the optimization is that the server doesn't have to send the content it would have sent to the client, as the client has indicated it already has that 'version' of it cached. There is still a round-trip to the server involved. If you're looking for an absolute killer optimization though, you should be looking at Expires and Cache-Control headers. See Mark Nottingham's recent post "Expires vs. max-age" for some additional information, along with the link to his caching tutorial.

Expires and friends are killer, because they allow the ultimate in client / server transaction optimization; the transaction is optimized away completely. The client can check the expiration date, and determine that the data they have cached has not 'expired', and thus they don't need to ask the server for it at all. Unfortunately, many applications won't be able to use these headers, if their data is designed to change rapidly; eg, Twitter.

Sam Ruby also blogged about another great example of Expires headers. How often does the Google maps image data really change?

Here's another great example, applicable to our new web 2.0-ey future. Yahoo! is hosting their YUI library for any application to use directly, without copying the toolkit to their own web site. Let's peek at the headers from one of their files:

$ curl -I http://yui.yahooapis.com/2.2.2/build/reset/reset-min.css
HTTP/1.1 200 OK
Last-Modified: Wed, 18 Apr 2007 17:35:45 GMT
X-Yahoo-Compressed: true
Content-Type: text/css
Cache-Control: max-age=313977290
Expires: Tue, 02 May 2017 04:08:44 GMT
Date: Mon, 21 May 2007 04:13:54 GMT
Connection: keep-alive

Good stuff! The Expires and Cache-Control headers render this file pretty much immutable, as it should be. When Yahoo! releases the next version of the toolkit, it'll be hosted at a different url base, and so will be unaffected by the headers of this particular file; they will be different urls. This sort of behaviour is highly optimal for web 2.0-ey apps, which are wont to download a lot of static html, css and javascript files, which, for some particular version of the app, will never change. And thus, by having the files cached on the client in such a way that it never asks the server for them again, the app will come up all the quicker.
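Here's a sketch of what producing those headers might look like for a versioned, never-changing static file. The ten-year lifetime and the helper name are just for illustration, and it assumes the asset's URL changes whenever its content does:

    import time
    from wsgiref.handlers import format_date_time

    TEN_YEARS = 10 * 365 * 24 * 60 * 60

    def immutable_headers():
        # headers for a versioned static file that will never change at this URL;
        # a far-future Expires plus a long max-age means clients stop asking at all
        return {
            "Cache-Control": "max-age=%d" % TEN_YEARS,
            "Expires":       format_date_time(time.time() + TEN_YEARS),
        }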

Good stuff to know, and take advantage of if you can.

For more specific information about our essential protocol, HTTP 1.1, see RFC 2616. It's available in multiple formats, here: http://www.faqs.org/rfcs/rfc2616.html.

Sunday, May 20, 2007

blogging process

In "Lesson learned", my colleague Robert Berry recounts 'losing' a blog post he was editing. Not the first time I've heard this recently. I thought I'd document my process of creating blog posts, in case it's of any use to anyone. Because I don't lose blog posts.

My secret: I use files.

Although many blogging systems let you edit your blog posts 'online', and even let you save them as drafts, I don't actually go into my blogging system to enter a blog post, until it's complete. The process is:

  • Create a new blog entry by going into the Documents/blog-posts folder in my home directory of my primary computer, and creating a new file, the name of which will be the title of the blog post. The 'extension' of the file is .html.

  • Edit the blog post in html, in a plain old text editor.

  • While editing, at some point, double click on the file in my file system browser (Explorer, PathFinder, Nautilus, etc) to preview it in a web browser.

  • Churn on the edit / proof-read-in-a-web-browser cycle, for hours or days.

  • Ready to post? First, check all links.

  • Surf over to blog editing site, enter the body of the post into the text editor via the clipboard, set the title, categories / links, etc.

  • Preview the post on the blog editing site. Press the "Publish" button.

  • Move the file with the blog post from Documents/blog-posts to Documents/blog-posts/posted .

HTML TextAreas are an extremely poor replacement for a decent text editor. Using HTML is handy, since some (most?) blogging systems will accept it as input, and you can preview it yourself with your favorite web browser. Saving the files, even after finished posting, is a convenient backup mechanism, should you ever lose your entire blog.

Besides these obvious advantages, I noticed some behaviours of other blogging systems that I really didn't like, when saving drafts of posts 'online':

  • On one system I used, the title saved with the first draft was used as the slug of the blog URL. Even if I later changed the title, the slug remained some abbreviated version of the first saved title. Ick.

  • On one system I used, tags I saved with a post ended up showing up in the global list of tags on the blog. Even if there weren't any published posts that had used that tag. Ick.

I should note that I also have a directory Documents/blog-posts/unused for posts which I've started, and decided not to post. The "island of misfit blog posts", as it were, but "unused" was shorter.

There you have it! Since you religiously back up the files on your primary computer, you'll never have to worry about losing a blog post again!

Friday, May 18, 2007

That Darned Cat! - 2

Some more thoughts on Twitter performance, as a followup to "That Darned Cat! - 1".

Twitter supports three different communication mediums:

  • SMS text messaging
  • A handful of IM (Instant Messaging) services
  • HTTP - which can be further subdivided into the web page access, the Twitter API, and RSS and Atom feeds

I'm not going to talk about the first two, since I'm not familiar with the technical details of how they work. Other than to notice that I don't see how Twitter can be generating any direct revenue off of HTTP (no ads on the web pages, even), whereas they could certainly be generating revenue off of the SMS traffic they drive to whoever hosts their SMS service. IM? Dunno.

It would appear, or at least I guess, that most of the folks I follow on Twitter are using HTTP, rather than the other communication mediums. Maybe I'm in a microcosm here, but I'm guessing there are a lot of people who only use the HTTP medium. And there's no money to be made there.

So, we've got a web site getting absolutely pounded that's generating no direct revenue for the traffic it's handling. And it's become a bottleneck. What might we do?

Distribute the load.

Here's a thought on how this might work. Instead of people posting messages to Twitter, have them post to their own site, just like a blog. HTTP-based Twitter clients could then feed off of the personal sites, instead of going through the Twitter.com bottleneck.

This sounds suspiciously like blogging, no? Well, it is a lot like blogging. Twitter itself is a lot like blogging to begin with. Only the posts have to be at most 140 bytes. So let's start thinking about it in that light, and see what tools and techniques we can bring from that world.

For instance, my Twitter 'friends' page is nothing but a feed aggregator, like Planet Planet or Venus. Only the software to do this would be a lot easier.

Josh notes: Hmm, but doesn't RSS/Atom already give us everything we need for a twitter protocol minus the SMS (and text limit)? Indeed. Twitter already supports both RSS and Atom (I didn't see explicit Atom links, but find an RSS link @ Twitter, and replace the .rss URL suffix with .atom). They aren't things of beauty, but it's something to start with. While you can already use blogging tools to follow Twitter, I'm not sure that makes sense for most people. However, reusing the data formats probably makes a lot of sense.
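As a sketch of the 'friends page as feed aggregator' idea, here's what pulling and merging a few personal feeds might look like in Python, using the feedparser library; the feed URLs are made up, and it assumes each entry carries an updated date:

    import feedparser   # Universal Feed Parser; handles both RSS and Atom

    FRIEND_FEEDS = [
        "http://example.com/pat/tweets.atom",
        "http://example.com/josh/tweets.atom",
    ]

    def aggregate(feed_urls, limit=20):
        # fetch each person's own feed and merge the entries, newest first
        entries = []
        for url in feed_urls:
            entries.extend(feedparser.parse(url).entries)
        entries.sort(key=lambda e: e.updated_parsed, reverse=True)
        return entries[:limit]

    for entry in aggregate(FRIEND_FEEDS):
        print("%s: %s" % (entry.get("author", "?"), entry.title))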

So, why would Twitter ever want to do something like this? I already mentioned they don't seem to be making any direct revenue off the HTTP traffic, so off-loading some of that is simply going to lower their network bill. They could concentrate instead on providing some kind of value, such as contact management and discovery. Index the TwitterSphere, instead of owning and bottlenecking it. And of course continue to handle SMS and IM traffic, if that happens to bring in some cash.

In the end, I'm not sure any one company can completely 'own' a protocol like this forever. Either they simply won't be able to afford to (expense at running it, combined with a lack of revenue), or something better will come along to replace it.

If you love something, set it free.

There are other ideas. In "Twitter Premium?", Dave Winer suggests building Twitter "peers". This sounds like distributing Twitter from one central site, to a small number of sites. I don't think that's good enough. Things will scale better with millions of sites.

Thursday, May 17, 2007

That Darned Cat! - 1

The performance of Twitter as of late has been abysmal. I'm getting tired of seeing tweets like "Wondering what happened to my last 5 tweets" and "2/3 of the updates from Twitterrific never post for me. Is this normal?". I'm especially tired of seeing that darned cat!

Pssst! I don't think the cat is actually helping! Maybe you should get him away from your servers.

Here's a fun question to ask: do you support ETags?

In order to test whether Twitter is doing any of the typical sorts of caching that it could, via ETag or Last-Modified processing, I wrote a small program to issue HTTP requests with the relevant headers, which will indicate whether the server is taking advantage of this information. The program, http-validator-test.py, is below.

First, here are the results of targeting http://python.org/ :

$ http-validator-test.py http://python.org/
Passing no extra headers
200 OK; Content-Length: 15175; Last-Modified: Fri, 18 May 2007 01:41:57 GMT; ETag: "60193-3b47-b04e2340"

Passing header: If-None-Match: "60193-3b47-b04e2340"
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"

Passing header: If-Modified-Since: Fri, 18 May 2007 01:41:57 GMT
304 Not Modified; Content-Length: None; Last-Modified: None; ETag: "60193-3b47-b04e2340"

The first two lines show that no special headers were passed with the request, and that a 200 OK response was returned with the specified Last-Modified and ETag headers.

The next two lines show an If-None-Match header was sent with the request, indicating to only send the content if its ETag doesn't match the value passed. It does match, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

The last two lines show an If-Modified-Since header was sent with the request, indicating to only send the content if its last-modified date is later than the value specified. It's not later, so a 304 Not Modified is returned instead, indicating no content will be sent down (it hasn't changed since you last asked for it).

For content that doesn't change between requests, this is exactly the sort of behaviour you want to see from the server.

Now, let's look at the results we get back from going to my Twitter page at http://twitter.com/pmuellr :

$ http-validator-test.py http://twitter.com/pmuellr
Passing no extra headers
200 OK; Content-Length: 26491; Last-Modified: None; ETag: "a246e2e41e13726b7b8f911995841181"

Passing header: If-None-Match: "a246e2e41e13726b7b8f911995841181"
200 OK; Content-Length: 26504; Last-Modified: None; ETag: "1ef9e784fa85059db37831c505baea87"

Passing header: If-Modified-Since: None
200 OK; Content-Length: 26503; Last-Modified: None; ETag: "2ba91b02f418ed74e316c94c438e3788"

Rut-roh. Full content sent down with every request. Probably worse, generated with every request. In Ruby. Also note that no Last-Modified header is returned at all, and different ETag headers were returned for each request.

So there's some low fruit to be picked, perhaps. Semantically, the data shown on the page did not change between the three calls, so really, the ETag header should not have changed, just as it didn't change in the test of the python site above. Did anything really change on the page? Let's take a look. Browse to my Twitter page, http://twitter.com/pmuellr, and View Source. The only thing that really looks mutable on this page, given no new tweets have arrived, is the 'time since this tweet arrived' listed for every tweet. That's icky.

But poke around some more, peruse the gorgeous markup. Make sure you scroll right, to take in some of the long, duplicated, inline scripts. Breathtaking!

There's a lot of cleanup that could happen here. But let me get right to the point. There's absolutely no reason that Twitter shouldn't be using their own API in an AJAXy style application. Eating their own dog food. As the default. Make the old 1990's era, web 1.0 page available for those people who turn JavaScript off in their browser. Oh yeah, a quick test of the APIs via curl indicates HTTP requests for API calls do respect If-None-Match processing for the ETag.

The page could go from the gobs of duplicated, mostly static html, to just some code to render the data, obtained via an XHR request to their very own APIs, into the page. As always, less is more.

We did a little chatting on this stuff this afternoon; I have more thoughts on how Twitter should fix itself. To be posted later. If you want part of the surprise ruined, Josh twittered after reading my mind.

Here's the program I used to test the HTTP cache validator headers: http-validator-test.py

	#!/usr/bin/env python

	#--------------------------------------------------------------------
	# do some ETag and Last-Modified tests on a url
	#--------------------------------------------------------------------

	import sys
	import httplib
	import urlparse

	#--------------------------------------------------------------------
	def sendRequest(host, path, header=None, value=None):
	    headers = {}

	    if (header):
	        print "Passing header: %s: %s" % (header, value)
	        headers[header] = value
	    else:
	        print "Passing no extra headers"

	    conn = httplib.HTTPConnection(host)
	    conn.request("GET", path, None, headers)
	    resp = conn.getresponse()

	    stat = resp.status
	    etag = resp.getheader("ETag")
	    lmod = resp.getheader("Last-Modified")
	    clen = resp.getheader("Content-Length")

	    print "%s %s; Content-Length: %s; Last-Modified: %s; ETag: %s" % (
	        resp.status, resp.reason, clen, lmod, etag
	        )
	    print

	    return resp

	#--------------------------------------------------------------------
	if (len(sys.argv) <= 1):
	    print "url expected as parameter"
	    sys.exit()

	x, host, path, x, x, x = urlparse.urlparse(sys.argv[1], "http")

	resp = sendRequest(host, path)
	etag = resp.getheader("ETag")
	date = resp.getheader("Last-Modified")

	resp = sendRequest(host, path, "If-None-Match", etag)
	resp = sendRequest(host, path, "If-Modified-Since", date)

Update - 2007/05/17

Duncan Cragg pointed out that I had been testing the Date header, instead of the Last-Modified header. Whoops, that was dumb. Thanks Duncan. Luckily, it didn't change the results of the tests (the status codes anyway). The program above, and the output of the program have been updated.

Duncan, btw, has a great series of articles on REST on his blog, titled "The REST Dialog".

In addition, I didn't reference the HTTP 1.1 spec, RFC 2616, for folks wanting to learn more about the mysteries of our essential protocol. It's available in multiple formats, here: http://www.faqs.org/rfcs/rfc2616.html.

Tuesday, May 15, 2007

modelled serialization

Too many times I've seen programmers writing their web services, where they are generating the web service output 'by hand'. Worse, incoming structured input to the services (XML or JSON) is parsed by hand into objects. Maybe not parsed, but DOMs and JSON structures walked. Manually. Egads! Folks, we're using computers! Let the computer do some work fer ya!

In my previous project, we used Eclipse's EMF to model the data we were sending and receiving via RESTy, POXy web services. For an example of what I'm referring to as 'modelling', see this EMF overview and scroll down to "Annotated Java". For us, modelling our data meant adding EMF comments to our code, and then running some code to generate the EMF goop. What the goop ended up giving you was a runtime version of the model you could introspect on. Basically just like Java introspection and reflection calls, to examine the shape of classes and the state of objects dynamically. Only with richer semantics. And frankly, just easier, if I remember correctly.

Anyhoo, for the web services we were writing, we constrained the data being passed over the wire to modelled classes. Want to send some data from the server to the client? Describe it with a modelled class. Because the structure of these modelled classes was available at runtime, it was (relatively) easy to write code that could walk over the classes and generate a serialized version of the object (XML or JSON). Likewise, we could take an XML or JSON stream and turn it into an instance of a modelled class fairly easily. Generically. For all our modelled classes. With one piece of code.

Automagic serialization.

One simplification that helped was that we greatly constrained the types of 'features' (what EMF non-users would call attributes or properties) of a class; it turned out to be basically what's allowed in JSON objects: strings, numbers, booleans, arrays, and (recursively) objects. We had a few other primitive types, like dates and uuids, but amazingly, we were able to build a large complex system from a pretty barebones set of primitives and composites. Less is more.
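To make the idea concrete outside of EMF and Java, here's a toy sketch of model-driven serialization in Python. It isn't the ToyModSer sample mentioned below, just an illustration of one generic walker handling every modelled class; the Person class and its features are made up:

    import json

    class Model(object):
        # each modelled class declares its 'features' (names and types);
        # that declaration is the runtime model the serializer walks
        features = {}

        def to_dict(self):
            # one generic piece of code for every modelled class:
            # walk the declared features, recursing into nested models
            out = {}
            for name in self.features:
                value = getattr(self, name, None)
                if isinstance(value, Model):
                    value = value.to_dict()
                elif isinstance(value, list):
                    value = [v.to_dict() if isinstance(v, Model) else v
                             for v in value]
                out[name] = value
            return out

    # a hypothetical modelled class
    class Person(Model):
        features = {"name": str, "age": int, "tags": list}

    p = Person(); p.name = "pat"; p.age = 42; p.tags = ["blogger"]
    print(json.dumps(p.to_dict()))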

For folks familiar with WS-*, none of this should come as a huge surprise. There are basically two approaches to defining your web service data: define it in XML schema and have tooling generate code for you, or define it in code and have tooling generate schema for you. In both cases, serialization code will be generated for you. Neither of these resulted in a pleasing story to me. Defining documents in schema is not simple; it's certainly harder than defining Java classes. And the code generated from tooling to handle schema tends to be ... not pretty. On the other hand, when starting with code, your documents will be ugly. Some folks don't care about that, but I do. The document is your contract. Why do you want your contract to be ugly?

Model driven serialization can be a nice alternative to these two approaches, assuming you're talking about building RESTy or POXy web services. Because it's relatively simple to create a serializer that feels right for you. And you know your data better than anyone; make your data models as simple or complex as you actually need. If you're using Java, and have complex needs, consider EMF, because it can probably do what you need, or at least provide a solid base for what you need.

Besides serialization, data modelling has other uses:

  • Generating human-readable documentation of your web service data. You were planning on documenting it, right? And what, you were going to do it by hand?

  • Generating machine-readable documentation of your web service data; ie, XML schema. I know you weren't going to write that by hand. Tell me you weren't going to write that by hand. Actually, admit it, you probably weren't going to generate XML schema at all.

  • Generating editors, like EMF does. Only these editors would be generated in that 'junky' HTML/CSS/JavaScript trinity, for your web browser client needs. Who wants to write that goop?

  • Writing wrappers for your web services for other languages. At least this helps with the data marshalling. Again, JavaScript is an obvious target language here.

  • Generating database schema and access code, if you want to go hog wild.

 

If it's not obvious, I'm sold on this modelling stuff. At least lightweight versions thereof.

So I happened to be having a discussion with a colleague the other day about using software modelling to make life easier for people who need to serialize objects over the web. I don't think I was able to get my message across as well as I wanted, and we didn't have much time to chat anyway, so I thought I'd whip up a little sample. Code talks.

This sample is called ToyModSer - Toy Modelled Serializer. It's a toy because it only does a small amount of what you'd really want it to be able to do; but there's enough there to be able to see the value in the concept, and how light-weight you can go. I hope.

Monday, May 07, 2007

structured javascript

I just listened to the Jon Udell podcast interview with John Lam, which was quite interesting. Highly recommended. I do, of course, have a bone to pick.

At 9:52, the conversation turns to talk about JavaScript. John Lam says JavaScript is "a very difficult language for programming in the medium or the large and by medium and large I'm going to say applications which exceed 5,000 to 10,000 lines of code. Once my JavaScript gets up to that, I have a really hard time maintaining that stuff because modularity is definitely one of the things that isn't really all that well thought out in the JavaScript language. Which is much better in languages like Ruby and Python."

Now, John is certainly correct that JavaScript lacks language-level modularity features like namespaces, packages, and class definitions. Typically, a 'class' is defined by creating a constructor function and adding methods to it by adding functions to the constructor's prototype. Packages and namespaces are defined as a top-level object named after the first package segment, with a field for the next package segment, and so on, recursing through the remainder of the package segments.

Class definition by running plain old code.

Some JavaScript libraries like YUI and dojo actually do have some code and conventions around defining such things.

So, note that. No language level modularity features like a 'package' and 'class' keyword. It's all dynamic. And perhaps some conventions provided by libraries you happen to be using.

Which is quite similar to Smalltalk. There are no 'language level' features for defining classes. There is no 'class' keyword. Instead, to define a new class, I'd go into a class browser, and fill in a template like:

    Number subclass: #Fraction
        instanceVariableNames: 'numerator denominator'
        classVariableNames: ''
        poolDictionaries: ''	

This is a class definition. However, literally, it's a message send. A message sent to a class (Number) to create a subclass (Fraction) with two instance variables (numerator and denominator).

I don't recall anyone who ever bothered to learn Smalltalk having made claims that it wasn't modular. So I don't think having language-level modularity features is a necessity for making the language usage modular.

My reference to Smalltalk isn't entirely spurious given the recent news of Dan Ingalls' Project Flair. As Tom Waits would 'sing' ... "What's he building in there?".

This leads me to a number of questions:

  • Could we build a set of conventions around package/namespace/class/method definition that would be usable in a number of different contexts? IDEs, live class browsers, etc.

  • What would it mean to build 'images' of code and data in JavaScript? By image I don't mean a .gif file, I mean a collected set of code and data serialized into one or more files. An 'executable unit' for a language interpreter.

  • Could we get these things to work with the rather crude code injection apparatus we currently have for web applications (<script src=...>)?

  • In the unspecified future, we'll have declarative language features in JavaScript. If you'd like to get a taste for what this might look like, right now, look no further than ActionScript 3 from Adobe. Do we want this? Do we need this?

  • WWSD - What Would Self Do? Self is probably the best known prototype-based language. It would be interesting to go back and look at some of the Self stuff; it's been years for me.