pmuellr is Patrick Mueller, Senior Node Engineer at NodeSource.

Friday, October 12, 2007

on links

When people think about links, with regard to tying together information on the web, the usual thoughts are of URLs. Either absolute URLs, or a URL relative to some base (either implicitly the URL of the resource that contains the link, or explicitly via some kind of xml:base-like annotation).

But I wrestle with this.

Here's one issue. Let's say I have multiple representations of my resources available; today you see this typically as services exposing data as either JSON or XML. If that representation includes a link to other data that can be exposed as either JSON or XML, do you express that link as some kind of "platonic URL"? Or if you are doing content-negotiation via 'file name extension' sort of processing, does your JSON link point to a .json URL, but your XML link point to a .xml URL?

See a discussion in the JSR 311 mailing list for some thoughts on this; I stole the term "platonic URL" from this discussion.

The godfather had something interesting to say in a recent presentation. In "The Rest of REST", on slide 22, Roy Fielding writes:

Hypertext does not need to be HTML on a browser
- machines can follow links when they understand the data format and relationship types

Where's the URL? Perhaps tying links to URLs is a constraint we can relax. Consider, as a complementary alternative, that just a piece of data could be considered a link.

Here's an example: let's say I have a banking system with a resource representing a person, that has a set of accounts associated with it. I might typically represent the location of the account resources as a URL. But if I happen to know, a priori, the layout of the URLs, I could just provide an account number (assuming that's the key). With the account number, and knowledge of how to construct a URL to an account given that information (and perhaps MORE information), the URL to the account can easily be constructed.
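A minimal sketch of that idea from the client's side, assuming a made-up URL layout: the host `bank.example.com`, the `/accounts/` path, and the extension-per-format convention are all hypothetical, not from any real system. The person representation carries only account numbers, and the client builds the URL itself.

```python
from urllib.parse import quote

# Hypothetical URL layout, assumed to be known to the client a priori.
ACCOUNT_URL_TEMPLATE = "https://bank.example.com/accounts/{account_number}.{ext}"

# Hypothetical person representation: the "links" are bare account numbers.
person = {"name": "Pat", "accounts": ["1001", "1002"]}


def account_url(account_number, ext="json"):
    """Build an account URL from its number and the representation wanted."""
    return ACCOUNT_URL_TEMPLATE.format(
        account_number=quote(account_number, safe=""),  # escape anything odd in the key
        ext=ext,
    )


# The client, not the server, decides which representation to ask for.
account_urls = [account_url(number) for number in person["accounts"]]
```

The same account number yields the JSON or the XML URL depending only on what the client passes for `ext`.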

The up-side is that the server doesn't have to calculate the URL if all it has is the account number; it just provides the account number. The notion of content-type-specific URLs goes away; there is only the account number. The resources on the server can be a bit more independent of each other; one resource doesn't have to know where another actually resides just to generate a URL to it.

Code-wise, on the server, this is nice. There's always some kind of translation step on the server that's pulling your URLs apart, figuring out what kind of resource you're going after, and then invoking some code to process the request. "Routing". For that code to also know how to generate URLs pointing back to other resources, it needs the reverse information as well.
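To make the "reverse information" concrete, here is a rough sketch of a route table that is used in both directions. The route names and paths are invented for illustration; real frameworks do this in their own ways.

```python
import re

# Hypothetical route table: resource name -> URL pattern with {placeholders}.
ROUTES = {
    "person": "/people/{person_id}",
    "account": "/accounts/{account_number}",
}


def compile_pattern(pattern):
    """Turn "/accounts/{account_number}" into a regex with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literals at even indexes, names at odd
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


COMPILED = {name: compile_pattern(pattern) for name, pattern in ROUTES.items()}


def route(path):
    """Forward direction: pull a URL path apart into (resource name, parameters)."""
    for name, regex in COMPILED.items():
        match = regex.fullmatch(path)
        if match:
            return name, match.groupdict()
    return None


def url_for(name, **params):
    """Reverse direction: build a URL path from a resource name and its parameters."""
    return ROUTES[name].format(**params)
```

If links are sent as plain data, the server only needs `route`; `url_for` and the knowledge behind it can live in the client instead.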

The down-side, of course, is that you can't use a dumb client anymore; your client now needs to know things like how to get to an account given just an account number.

And just generally, why put more work on the client, when you can do it on the server? Well, server performance is something we're always trying to optimize - why NOT foist the work back to the client?

But let's also keep in mind that the Web 2.0 apps we know and love today aren't dumb clients. There's user-land code running there in your browser, typically provided by the same server that's providing your resources in the first place. That is, JavaScript.

I realize that's a bad example for me to use; me being the guy who thinks browsers are a relatively terrible application client, but what the heck; that's the way things are today.

For folks who just want the data, and not the client code, because they have their own client code, well, they'll need some kind of description of how everything's laid out; the data within a resource representation, and the locations of the resources themselves. But the server already knows all that information, and could easily provide it in one of several formats (human- and machine-readable).
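One guess at what a machine-readable description could look like: a small JSON document listing each kind of resource, its path pattern, and the formats it comes in. The field names (`base`, `resources`, `path`, `formats`) and the URLs are assumptions made up for this sketch, not an existing standard.

```python
import json

# Hypothetical layout description a server might publish for third-party clients.
LAYOUT_JSON = """
{
  "base": "https://bank.example.com",
  "resources": {
    "person":  {"path": "/people/{person_id}",        "formats": ["json", "xml"]},
    "account": {"path": "/accounts/{account_number}", "formats": ["json", "xml"]}
  }
}
"""


def resolve(layout, kind, fmt, **keys):
    """Build a resource URL from the published layout plus the data-only 'link'."""
    resource = layout["resources"][kind]
    if fmt not in resource["formats"]:
        raise ValueError(f"{kind} is not available as {fmt}")
    return layout["base"] + resource["path"].format(**keys) + "." + fmt


layout = json.loads(LAYOUT_JSON)
```

A client with its own code would fetch the description once, then call something like `resolve(layout, "account", "xml", account_number="1001")` whenever it meets an account number in a representation.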

As a proof point of all of this, consider Google Maps. Think about how the map tiles are being downloaded, and how they might be being referenced as "links". Do you think that when Google Maps first displays a page, all the map tiles for that first map view are sent down as URLs? Think about what happens when you scroll the map area, and new tiles need to be downloaded. Does the client send a request to the server asking for the URLs for the new tiles? Or maybe those URLs were even sent down as part of the original request.

All rhetorical questions, for me anyway. I took a quick look at the JavaScript for Google Maps in Firebug, and realized I've already debugged enough obfuscated code for a few lifetimes. Probably a TOS violation to do that anyway. Sigh. I'll leave that exercise to younger pups. But ... what would you do?

For Google Maps, it's easy to imagine programmatically generating the list of tiles based on location, map zoom level, and map window size, assuming the tiles are all accessible via URLs that include location and zoom level somewhere in the URL. In that case, the client code for calculating the URLs of the tiles needed is just a math problem. Why make it more complex than that?
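Here's roughly what that math problem could look like. This is not Google's actual scheme, which I haven't seen; it assumes the common Web Mercator tile convention (256-pixel tiles, 2^zoom tiles per side, as used by OpenStreetMap-style tile servers) and a made-up URL template on `tiles.example.com`.

```python
import math

TILE_SIZE = 256  # pixels per tile side, per the assumed convention
# Hypothetical URL layout with zoom and tile position in the path.
TILE_URL_TEMPLATE = "https://tiles.example.com/{zoom}/{x}/{y}.png"


def tile_urls(lat, lon, zoom, width_px, height_px):
    """List tile URLs covering a window of the given pixel size centered on (lat, lon)."""
    n = 2 ** zoom  # tiles per side at this zoom level
    lat_rad = math.radians(lat)
    # Center of the window in global pixel coordinates (Web Mercator projection).
    center_x = (lon + 180.0) / 360.0 * n * TILE_SIZE
    center_y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n * TILE_SIZE

    def tile_range(center, extent):
        first = math.floor((center - extent / 2) / TILE_SIZE)
        last = math.floor((center + extent / 2) / TILE_SIZE)
        # Clamp to the map; wrapping across the antimeridian is left out for simplicity.
        return range(max(first, 0), min(last, n - 1) + 1)

    return [
        TILE_URL_TEMPLATE.format(zoom=zoom, x=x, y=y)
        for y in tile_range(center_y, height_px)
        for x in tile_range(center_x, width_px)
    ]
```

Scrolling the map just means calling `tile_urls` again with a new center; no round trip to the server is needed to learn where the new tiles live.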

I think there are problem domains where dealing with 'links' as just data, instead of explicit URLs, makes sense, as outlined with Google Maps. Remember what Roy wrote in his presentation: "machines can follow links when they understand the data format and relationship types". Of course, there's plenty of good reason to continue to use URLs for links as well, especially with dumb-ish clients.
