pmuellr is Patrick Mueller, Senior Node Engineer at NodeSource.


Friday, April 20, 2007

turtles all the way down

There's been a bit of controversy over Mike Pence's article "Heresy and turtles (all the way down) with Avi Bryant"; it's an interesting read, if obviously, and seemingly intentionally, controversial.

There is one bit that particularly cries out to me, and that's the notion of Smalltalk being implemented as "turtles-all-the-way-down". Meaning, Smalltalk 'libraries', like collections, http client access, numbers (heh) are implemented in ... Smalltalk, with a small core of 'natives' that do the lowest-level stuff like opening files and writing to sockets. By contrast, if you look at a language like PHP, you realize that none of the 'base' library is implemented in PHP; it's implemented in C. These two languages sit at opposite ends of the spectrum. Languages like Ruby and Python fall in between: relative to the amount of library code written in the language itself, they have more C code than Smalltalk, but less than PHP.

So a lot of the languages we use have a lot of their base library code implemented in C. And not just any old C, but C code wrapped in layers of language- and interpreter-specific wrappers. Meaning, I can't share that code between languages. Ruby can't use 'extension libraries' built for Python (and vice versa). It gets worse. JRuby can't use 'extension libraries' built for Ruby (and vice versa).

Something's clearly a bit out of whack here. It shouldn't be this hard. There's so much other work involved in getting new languages, or new implementations of existing languages, up and going, not to mention making sure you have good editors, debuggers, etc. This is clearly one area that we can make easier for language implementors, isn't it?

In VisualAge Smalltalk, we had a 'native' interface, with icky wrappers you had to use around your C code, but we also had a great little framework called PlatformFunction: a way to call out directly to C code. We also had a framework called OSObject that let you deal with native pointers. With these facilities, in the 5 or so years I was shoulder deep in Smalltalk, I think I only ever wrote one real 'native'; I was able to use call outs to C code for everything else I needed. And I used this a lot. We also had a way to generate a pointer to a C function that hooked into an arbitrary method, so you could call in to Smalltalk from C, for callback purposes.

I've seen other libraries in other languages do this as well; it's usually referred to as an FFI (foreign function interface). The CLR provides for this as well, via P/Invoke. It's a nice capability that every language environment should provide.
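To make the call-out and call-in ideas concrete, here is a minimal sketch using Python's ctypes module, which is one such FFI. It is not how PlatformFunction worked; it only shows the same two moves: calling a C function directly, and handing C a function pointer that lands back in the host language. It assumes a POSIX system where the C library's `strlen` and `qsort` can be loaded; the name `compare_ints` is made up for the example.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Call out: declare the C signature, then call strlen directly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
length = libc.strlen(b"turtles")

# Call in: describe the C comparator type that qsort expects.
COMPARE = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)


def compare_ints(a, b):
    # a and b are pointers to C ints; return <0, 0 or >0 like a C comparator.
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, COMPARE]
libc.qsort.restype = None

# A native C array of ints, sorted in place by C code calling back into Python.
values = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
callback = COMPARE(compare_ints)  # keep a reference while C may call it
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), callback)
sorted_values = list(values)
```

No wrapper library had to be compiled here; the signatures are declared at runtime, which is what makes this style so handy for everyday call-outs.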

The downside of this is that you probably can't do anything with 'objects' using these interfaces. You're dealing with low-level C goop. Sometimes you really want to be dealing with objects.

So here's an interesting thought exercise. Would it be possible to create a portable 'native' interface, like Java's JNI, which can deal with 'objects', that could be used by multiple languages? That way I could compile an extension library as a binary, native shared library that multiple languages could make use of. I call out JNI specifically both because I'm intimately familiar with it and because it's actually a pretty nice API, for a number of reasons:

  • JNI defines all the 'functions' used to talk to the runtime that you can use from your natives, as function pointers in a struct that gets passed to your 'native' functions. Since they're function pointers, you don't need to link against a library; the functions are resolved at runtime, not link time. This means I can compile a Java native library once, and use it on any VM that supports the same level of JNI.

  • JNI provides pretty nice encapsulation from the VM implementation. Instead of getting pointers to internal VM structures and mucking with stuff, you have function calls to do your work. Instead of dealing with garbage collection bits and pieces (like reference counts), you use APIs to create/destroy references that the underlying VM will use to deal with the garbage collection bits and pieces.

  • JNI has been protected very well against version to version changes, so that you have a very good expectation that a native library you compiled for version 1.x of Java will work with version > 1.x of Java.
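The shape described in the first two points can be modeled in a few lines. The following is a toy sketch, not JNI itself: `Env`, `HandleTable` and `describe` are hypothetical names invented for illustration. The 'extension' receives a table of functions plus opaque handles, never touches the runtime's internals, and so runs unchanged against two runtimes that represent objects differently.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """Hypothetical function table a runtime hands to an extension."""
    new_ref: Callable[[Any], int]
    delete_ref: Callable[[int], None]
    get_attr: Callable[[int, str], Any]


def describe(env: Env, handle: int) -> str:
    """The 'extension': it sees only an opaque handle and the env table."""
    name = env.get_attr(handle, "name")
    legs = env.get_attr(handle, "legs")
    return f"{name} has {legs} legs"


class HandleTable:
    """Toy runtime: owns the objects and gives out integer handles."""

    def __init__(self, lookup: Callable[[Any, str], Any]):
        self._lookup = lookup      # how this runtime reads an attribute
        self._refs = {}            # handle -> object, owned by the runtime
        self._next_handle = 1

    def new_ref(self, obj: Any) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._refs[handle] = obj
        return handle

    def delete_ref(self, handle: int) -> None:
        del self._refs[handle]

    def get_attr(self, handle: int, name: str) -> Any:
        return self._lookup(self._refs[handle], name)

    def live_refs(self) -> int:
        return len(self._refs)

    def env(self) -> Env:
        return Env(self.new_ref, self.delete_ref, self.get_attr)


# Two runtimes with different object representations.
dict_runtime = HandleTable(lambda obj, name: obj[name])
attr_runtime = HandleTable(getattr)

h1 = dict_runtime.new_ref({"name": "turtle", "legs": 4})
h2 = attr_runtime.new_ref(SimpleNamespace(name="turtle", legs=4))

# The same extension code works against both.
result_a = describe(dict_runtime.env(), h1)
result_b = describe(attr_runtime.env(), h2)

# The extension releases a reference through the API; the runtime does the bookkeeping.
dict_runtime.delete_ref(h1)
```

Because `describe` depends only on the table, a runtime is free to change how it stores objects or tracks references without breaking the extension, which is the property that makes the compile-once story plausible.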

What would it mean to do something like this to support multiple languages? There are obviously too many semantic differences between all the different languages we use to hope to be able to have a complete, universal 'extension' interface, but there's also a lot of commonality with these languages. It's not a simple problem to solve. But it's time to start thinking about it. In fact, it seems a bit embarrassing that we've not yet started thinking about it.
