Patrick Mueller elsewhere: muellerware.org, twitter.com/pmuellr

Friday, September 19, 2008

fun with WorkerPools

Arrr

I suspect I've looked at Google Gears a half dozen or so times since it was originally released. Always looked kinda intarstin', but hadn't mightily any use for it. I took another look when Chrome was announced, since, o' all the stuff in Chrome, the fact that Gears was baked in was the most intarstin' bit t' me.

(Bein' a bit curmudgeonly har, as in, the flashy chrome bits don't excite me. "When I was a kid" my web browser (OS/2's WebExplorer) could only remember 10 bookmarks. The Nintendo DS Browser works for me, in a pinch. Now GET OFF MY YARD!)

Aye

Pirate-speak translation above provided by the Pirate Speak Translator.

Gears is the most interesting bit in Chrome because it's the programmable bit.

What I decided to look through the other day was the WorkerPool bits, since I hadn't really looked into them much before. What I realized pretty quickly is that they should have actually called these ActorPools. You may be familiar with the Actor paradigm via the recent Erlang hotness.

In a nutshell, the WorkerPool facility provides the following capabilities:

  • the ability to run a hunk of JavaScript code in a new context
  • that context does not share code or state with any other JavaScript context, including the context which 'launched' the code
  • the only way to communicate with the code is via asynchronous one-way message sends of arbitrary JavaScript objects (basically, the same sorts of objects describable in JSON; no functions or non-trivial objects allowed, for instance)
  • the ability to run that code independently of other contexts. Think threads, though that's just an implementation detail.

If that's all it was, it wouldn't be terribly interesting. You can get aspects of this type of stuff with traditional JavaScript code, though it's often messy. Gears, at least, provides a fairly clean way of getting these capabilities.
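To make the message-passing contract in the list above concrete, here is a minimal in-memory sketch. It is not the Gears implementation and does not use the Gears API: FakePool, createContext, sendMessage and run are hypothetical names invented for illustration, and it assumes a JSON object is available. All it shows is that messages are one-way, delivered later, and copied rather than shared.

```javascript
// In-memory stand-in for a worker pool. NOT Gears; every name here is
// hypothetical. It only mimics the contract: one-way, asynchronous,
// copy-only messages between contexts that share nothing.
function FakePool() {
    this.handlers = [];   // handlers[id] = message handler of context id
    this.queue = [];      // messages sent but not yet delivered
}

// Register a context and return its id. handler(body, senderId) is
// called once for each message delivered to that context.
FakePool.prototype.createContext = function (handler) {
    this.handlers.push(handler);
    return this.handlers.length - 1;
};

// One-way send. The body is copied through JSON, so functions are
// dropped and the receiver never sees the sender's own objects.
FakePool.prototype.sendMessage = function (body, fromId, toId) {
    this.queue.push({ text: JSON.stringify(body), from: fromId, to: toId });
};

// Deliver everything that is queued, including messages sent while
// delivering. This stands in for the browser's event loop.
FakePool.prototype.run = function () {
    while (this.queue.length > 0) {
        var message = this.queue.shift();
        this.handlers[message.to](JSON.parse(message.text), message.from);
    }
};
```

A handler that changes the object it receives only changes its own copy; the sender's original is untouched, which is the "no shared state" property in practice.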

But here's where it gets interesting. That hunk of code that you've created a worker for can be loaded from an arbitrary server, referenced via a URL. And then that code follows the "same origin policy" rules for other Gears APIs available to workers, for instance, the Database APIs and HttpRequest APIs. In other words, your worker code can access HTTP-resources on the server it came from, and have a 'protected' Database that it manages that is only visible to workers that also were loaded from that server.

Very cool, because this means that you can build workers that act as self-contained service modules to allow access to HTTP resources for other applications to use. None of the usual cross-site chicanery we've had to deal with. In addition, these modules can manage their own protected cache of data. Also, such modules can be reused across multiple web applications, with each one reusing the same code and database store.

This seems like powerful mojo.

Building an RPC mechanism on top of the message send APIs

One of the downers, for most people, with the current WorkerPool APIs, is going to be the message sending paradigm. It's pretty low-level and raw. The great thing is that asynchronous message sends are a type of atomic building block upon which other forms of IPC can easily be built. The QNX operating system is famously built up on this core concept, slightly expanded.

I've taken a run at building a simple RPC mechanism on top of the message send API. The proof of concept is available here: http://muellerware.org/ggw-services-poc/. Here's how it works:

To build your RPC-styled worker, create a JavaScript file that includes the services you want to expose, implemented as plain old functions, along with a list of the functions you want to expose. Here's an example of some math services:

    //---------------------------------------------------------
    // service function to add a list of numbers
    //---------------------------------------------------------
    function add() {
        var result = 0
        for (var i=0; i<arguments.length; i++) {
            result += arguments[i]
        }
        return result
    }

    ...

    //---------------------------------------------------------
    // list of exported services
    //---------------------------------------------------------
    services = [
        add,
        ...
    ]
    
    //---------------------------------------------------------
    // boiler-plate from here to end
    //---------------------------------------------------------
    
    ...

At the bottom of this file is some boiler-plate code that deals with the messaging interface. Basically, messages are received that are serialized versions of the function invocation: function name, arguments, and an identifier to indicate which invocation this was (needed to match up return values later). The boiler-plate code cracks open the message, reflectively calls the function, then sends back the function result as a message to the original message sender.
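As a rough illustration of what such boiler-plate might do, here is a sketch of the dispatch step on its own. It is not the code from the proof of concept: the message shapes, functionName, buildTable and dispatch are assumptions made for this example, and the error field in the reply is an addition that the description above does not mention.

```javascript
// Assumed message shapes (hypothetical, not from the proof of concept):
//   request: { id: 7, name: "add", args: [1, 2, 3] }
//   reply:   { id: 7, result: 6 }  or  { id: 7, error: "message text" }

// Example service and export list, as in the math services file.
function add() {
    var result = 0;
    for (var i = 0; i < arguments.length; i++) {
        result += arguments[i];
    }
    return result;
}
var services = [add];

// Recover a function's name, falling back to parsing its source text
// for engines that do not provide a name property on functions.
function functionName(fn) {
    if (fn.name) return fn.name;
    var match = /^\s*function\s+([^\s(]+)/.exec(String(fn));
    return match ? match[1] : null;
}

// Index the exported functions by name.
function buildTable(list) {
    var table = {};
    for (var i = 0; i < list.length; i++) {
        table[functionName(list[i])] = list[i];
    }
    return table;
}

// Crack open a request, call the named function reflectively, and build
// the reply that would be sent back to the original sender.
function dispatch(table, request) {
    try {
        if (!Object.prototype.hasOwnProperty.call(table, request.name)) {
            throw new Error("no such service: " + request.name);
        }
        var result = table[request.name].apply(null, request.args);
        return { id: request.id, result: result };
    } catch (e) {
        // Only plain data crosses the message boundary, so send text.
        return { id: request.id, error: String(e && e.message ? e.message : e) };
    }
}
```

In a real worker, the message handler would call dispatch with each incoming message body and send the returned object back to the sender. The id is copied through untouched, which is what lets the caller match a reply to its invocation.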

On the client end, in your main HTML / JavaScript code, you'll be using the service like this:

    math_service = new ggw_services_Service("math_services.js")

    ...

    // callback function to display the result of our service call
    function print_sum(sum) {
        ...
    }

    // handler that invokes our service call
    function do_sum() {
        math_service.services.add(print_sum, 1,2,3,4,5,6,7,8,9,10)
    }

    ...

In this code, we instantiate the service with the ggw_services_Service constructor, passing it the URL of the JavaScript code we want to run as a worker; this would be the service implementation file described above. The resulting object then exposes proxies for the service's exported functions in its services field. You call these just like normal functions. One trick: because the message sends are one-way, and you'll probably want to get a result back, the first parameter can be a callback function, which is invoked when the service function returns its value. In this case, that would be the print_sum function.

Neat. But crude. Doing this right would require a bit more infrastructure, as well as making sure you catch all the sorts of error conditions that can happen. The end result won't be (shouldn't be) as transparent as the example above, but you can probably get pretty close.
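For a feel of the client half, here is a sketch of how the proxies and the callback matching could be wired up, with one small step toward error handling. It is not how ggw_services_Service is implemented. makeClient, onReply, the explicit list of names, the injected send function, the { id, name, args } and { id, result } or { id, error } message shapes, and the optional second callback argument for errors are all assumptions made for this example.

```javascript
// names: exported function names; send(request): delivers a request
// object to the worker. Both are assumptions of this sketch.
function makeClient(names, send) {
    var pending = {};   // invocation id -> callback awaiting a reply
    var nextId = 1;
    var services = {};

    function makeProxy(name) {
        // First argument is the result callback; the rest are passed
        // along to the service function as its arguments.
        return function (callback) {
            var id = nextId++;
            pending[id] = callback;
            send({
                id: id,
                name: name,
                args: Array.prototype.slice.call(arguments, 1)
            });
        };
    }

    for (var i = 0; i < names.length; i++) {
        services[names[i]] = makeProxy(names[i]);
    }

    // Feed every reply message from the worker through this function.
    // The id in the reply picks out the callback that is waiting for it.
    function onReply(reply) {
        var callback = pending[reply.id];
        delete pending[reply.id];
        if (typeof callback !== "function") return;
        if (Object.prototype.hasOwnProperty.call(reply, "error")) {
            callback(undefined, reply.error);   // failure: error text second
        } else {
            callback(reply.result);             // success: result only
        }
    }

    return { services: services, onReply: onReply };
}
```

A callback like print_sum(sum) keeps working unchanged, since the error text only shows up as an extra second argument. Timeouts, workers that never answer, and replies for unknown ids are the kind of conditions a real version would still need to handle.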

Notes

  • Because a worker shares no code or data with anything else, things like debugging get hard: you can't access the DOM, you can't access the document, you can't access alert(). In Firebug, you can see the message text from Errors, which is useful, especially when you throw them yourself. But the code source isn't identified, just that the error occurred on "worker 0" or the like.

  • Speaking of Firebug, I was able to pretty consistently lock up Firefox while debugging my example. Due to my browser coding naiveté, I don't know if this was me, Firebug, Gears, or some combination of them. But it got old fast.

  • It's not clear what the best way is to handle security credentials for HTTP requests from within workers. My gut tells me that some Gears APIs to manage sensitive data like passwords and keys would be useful. Storing credentials in a server-specific database doesn't sound great, but doesn't sound terrible either. But it's not even clear how a worker would go about prompting a user for credentials, and do it safely.

  • The current story for worker code is that all the worker code has to be in a single file. Not great. It would be nice to have an API like loadScript() or some such that would allow you to add additional JavaScript code to your context. In lieu of that, you can always XHR GET your additional code, and eval() it into the context. Icky, but should work.

  • One of the nice things about the stark bleakness of the context in which the workers run is that it makes these workers applicable to other environments. For instance, it's not a huge stretch to consider porting the Gears APIs to Rhino, allowing reuse of the workers from within Java. Very cool for Java, where loading "live code" is typically not something that is done, for various reasons; this makes it relatively straightforward.

  • In terms of higher-level frameworks for this stuff, I think I'd start looking at OSGi and its Bundle and Service concepts. I suspect there's a pretty good fit there. Combined with the previous thought of reusing workers in Java itself, why not design it around OSGi, so that I could design a worker that could easily be spun into an OSGi bundle itself, and so be directly consumable by OSGi-friendly code without having them deal with "yucky" JavaScript? Having access to JavaScript in Java isn't always considered a "plus" by Java programmers, but they can be easily fooled.
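On the single-file limitation noted above, the "XHR GET and eval()" workaround might look something like the sketch below. loadScript is a hypothetical helper, not a Gears API, and fetchText(url, callback) is an assumed function that calls callback(error, text); inside a worker it would presumably wrap whatever HTTP request facility is available there. The sketch also assumes an engine where an indirect call to eval runs code in the global scope.

```javascript
// Hypothetical helper: fetch JavaScript source text and evaluate it in
// the global scope of the current context.
//   fetchText(url, callback) - assumed; calls callback(error, text)
//   done(error)              - called with null on success
function loadScript(url, fetchText, done) {
    fetchText(url, function (error, source) {
        if (error) {
            done(error);
            return;
        }
        try {
            // Indirect eval: the code runs in the global scope instead
            // of inside this function, so its top-level functions and
            // vars stay around after this call returns.
            (0, eval)(source);
        } catch (e) {
            done(e);
            return;
        }
        done(null);
    });
}
```

Still icky, and it only makes sense for code from a source you already trust, since the loaded text runs with everything the worker can do.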