Links

pmuellr is Patrick Mueller

other pmuellr thangs: home page, twitter, flickr, github

Friday, January 09, 2009

JavaScript modules

When I see code like this, an example I pulled from YUI, I simply want to cry. Things need not be so ugly. I don't mean to callout Yahoo here, it's just an example I found. Most of the non-trivial 'package-ized' JavaScript I see is like this.

Issues:

  • Everyone of these files is a single anonymous top-level function that is invoked to execute it at the bottom of the file. Icky. Why do we need this?

    This is done because the method of compositing function into your application is done by including the source of that function into the big, single namespace known as your JavaScript environment. To keep from having source you are compositing into your app not "infect" it with additional global variables, you use the trick of putting all your code in a function body and executing it. The function can be (and should be) anonymous. No infection, or controlled infection, as long as you use var on all your variables (argh), as the variables at the top level of the function are local to the function, and not the "environment" (basically, globals).

  • We have functions defined in functions defined in functions here, two levels of which are asynchronous callbacks.

    I don't have a big beef with nested functions, except when it gets silly. Like, in this case. One of the big offenders is definition of the loader function, whose purpose is to load the code, pre-reqs, etc, defined as callbacks presumably because the loading of such files isn't necessarily provided as a synchronous capability from the browser.

  • I bet the folks that wrote this had editor macros set for the text YAHOO.example.app

    Frankly, there's no defense for this; the code should probably be using "shortcut" variables for the "package names", and even just some of the functions, like YAHOO.log.

    I assume there is some kind of taboo over using shortcut variables here; or are people depending on fully qualified function names for code completion or other source analysis tools? Yikes.

How can we make this nicer looking?

I think we need a better way of packaging up the pieces of our composite applications.

Packages. Modules. Bundles. Whatever.

Google Gears WorkerPools, again

I've previously blogged about using Google Gears Worker Pools to build service frameworks. For certain types of functionality, this makes a lot of sense. It certainly has the characteristic of compositing function into your application in a clean, infection-free manner. But it also has the following characteristics:

  • All message sense are asynchronous. While this sort of programming style might be comfortable to some people, and in fact might be the best way to program in the end, it's not terribly friendly for most programmers who have been using mostly/only synchronous function calls their entire programming life.

  • Out of the box, you're always going to be 'sending messages' instead of 'calling functions'.There's technically not much of a distinction between sending a message and a function invocation, you might say, besides the invocation style of the two. But again, for most programmers, function invocation is the norm. And probably requires less syntax per invocation. Shorter programs == good.

So while I don't have any problem with WorkerPools per se, and in fact, I think they are a great pattern for handling asynchronous, parallel work, they also aren't really going to provide the best pattern for modularity.

But I really love the cleanliness aspect.

So here is what I want

Python modules.

And here's the thing. I think we can add support for this fairly unobtrusively.

The basic idea is to define a new function, say loadModule() which is used to 'reference' another module by passing it's name as a parameter (URI to the name, prolly). A module is just a JavaScript file. Only instead of working the way <script src=""> does, it actually defines a new separate, empty namespace and loads the JavaScript into that namespace (just like the way Google Gears WorkPools does). The process of running loadModule() on a module the first time is that the JavaScript source is executed. The object returned is a 'module' object, whose propertes include all the global variables in the module's private namespace. For loadModule() calls with the same module beyond the first, the code is not executed again, but the same 'module' object is returned.

Or other varieties thereof. It's fairly simple to play with this kind of stuff from with Rhino, though your brain will be hurting after getting all the prototype and context linkages set up right. I assume you can do this sort of multi-environment stuff in other JavaScript implementations.

I want it in the browser.

This isn't the kind of code you can write in userland JavaScript, because JavaScript doesn't give you low-level access to it's innards. Needs to be yet another function the browser injects into the JavaScript environment, probably not even implemented in JavaScript, but in C, C++, Java, etc.

What changes

This makes individual JavaScript files a little cleaner by:

  • Not requiring the anonymous function wrapper.
  • Letting you get away with making your namespace a mess without worrying about infecting someone else's namespace.
  • Letting you use shorter names, because imports beyond the first are crazy cheap, so every module would probably just import everything they needed as one-level modules.

Sure would be nice if we could make that loadModule() function synchronous, otherwise loadModule() would really have to be a function which took the URI to the module and a callback, and invoked the callback after the module load. Back into some ickys-ville. Is <script src=""> synchronous?

It's not a lot. But it's a start.

Additional advantages

  • It's easy to imagine that the process of reloading a module which has changed (you just edited it while you were debugging) would be a little more straight-forward; largely only the module itself is affected, though presumably there are some imported object references that would also need to be fixed up (using short-cut variables causes issues here - is that one of the reason the Yahoo example used fully-qualified names?). Some lower-level VM help could get even those references fixed up, I'm thinking.

  • Better than eval(). Yeah, you could code something up to do this using eval(), I suppose. Or get close. The problem with eval() is the code becomes disassociated from it's source location. This makes it difficult/impossible to debug. Or save, after I make my changes in the debugger (some day). With an import story, the original location of the source can be associated with the code, just like all the files <script src="">'d into your page get associated with their source location.

  • You could imagine the keeping byte- or machine-code versions of those modules, in their pre-executed state, cached in memory for future interpreter invocations that imported the module. And cached on disk.

  • As a simple function, you can imagine have embellished versions that handled things like version numbering, pre-reqs, etc.

Example

I coded up an implementation of loadModule() for Rhino tonight, along with a simple example that uses four modules:

main.js:


print("loading module: " + __FILE__)

abc = loadModule("abc.js")
def = loadModule("def.js")

abc.sayHello()
def.sayHello()

abc.js:


print("loading module: " + __FILE__)

sayer = loadModule("sayer.js")

function sayHello() {
    sayer.say("hello")
}

def.js:


print("loading module: " + __FILE__)

sayer = loadModule("sayer.js")

function sayHello() {
    sayer.say("world")
}

sayer.js:


print("loading module: " + __FILE__)

function say(message) {
    print(message)
}

Each module prints a line indicating it's being loaded; the environment I set up defines the __FILE__ variable containing the module source file name (my C roots are showing), and a print() function which prints a line to stdout.

main.js loads two modules, abc.js and def.js. It then calls the sayHello() function in the abc module, followed by the sayHello() function in the def module.

abc.js and def.js are identical, except for the message printed from the sayHello() function at the bottom of the file. Both modules load the sayer.js module. They also both define a function with the same name - sayHello() - but that's ok because they live in separate namespaces and can be accessed separately by code that imports them, like main.js does above.

sayer.js defines a say() function prints a string (my REXX roots are showing).

Here's the output of running the main.js module:

loading module: main.js
loading module: abc.js
loading module: sayer.js
loading module: def.js
hello
world

The output shows that the code in sayer.js is only executed once, like with Python modules; subsequent imports just return the module reference which was built during the first import.

The source for the Java code to run this is available here: http://muellerware.org/hg/org.muellerware.jsml/; it's an Eclipse project stored in an hg repository.

There are really only two Java files used for this, if you just want to peek at the code: Main.java and ModuleLoader.java.

Why don't we have something like this in the browser?

Frankly, after spending a very small amount of time implementing the basic functionality, I have to wonder why we don't have something like this in the web browsers today? We have XMLHttpRequest to programmatically fetch data, why don't we have a way of programmatically fetching and executing code? <script src=""> is a sorry excuse of a version of this. Let me code it, dammit!

6 comments:

s|mono said...

helma is working on doing this module-thing cleanly.

require, import, include: http://github.com/hns/helma-ng/tree/master/modules/global.js

Biao said...

I believe the reason for making each JavaScript file an anonymous function is so that they can be packaged into one large file to be fetched once when the user requests it. By making an AJAX call for every module you load, you'll significantly increase the page's load time with the increased overhead in fetching multiple files as opposed to just one.

Chris Aniszczyk (zx) said...

I can feel your pain everytime you blog about JS and modularity. The next step for this would be to include versioning information.

Keep up the good fight ;)

Earth To Grandma said...

You should take a look at Dojo. It uses modules and a loader. It's very clean.
http://api.dojotoolkit.org/jsdoc/dojo/1.2/dojo.require
The async issue that Biao mentioned is handled by allowing you to create a "build" that contains all the modules you want to use in a single file.

Kris Kowal said...

We should work together to get implementations of the secure module proposal working in both Rhino and browsers. https://wiki.mozilla.org/index.php?title=ServerJS/Modules/SecurableModules

I noticed your implementation on github. I've got one in googlecode. We should share some tests and get both to support them, then coordinate the Rhino implementation.

Dewa said...

hi nice article, and i interest with phyton module...