pmuellr is Patrick Mueller


Wednesday, April 25, 2007

Java, please evolve

As the Java language continues to add dysfunctional function to its libraries, and as it doesn't add stuff we need to its libraries, and we bicker about licensing details of test cases, Microsoft is doing something interesting in the CLR that Java should have done years ago.


We need to seriously stir some stuff up in the Java space. A lot.

I see signs of hope. Today in the #jruby IRC channel, there was some heretical chatter about using Grizzly to tie directly into JRuby on Rails, avoiding Servlet (~shiver~) altogether.

This is excellent thinking. We need more heresy like this.

BTW, Charles Nutter (of JRuby fame) is running an informal "dynamic languages on the JVM" session at JavaOne. Contact him for more details.

Tuesday, April 24, 2007

Java process management arghhs

Pretend you'd like to write a Java program that launches and lightly manages some operating system processes. Know how to do this? You'll want to use one of the flavors of the Runtime exec() methods. If you happen to be running on a J2SE 1.5 or greater JVM, you can use the new ProcessBuilder class, but there seems to be not much benefit to doing so, and of course, your code won't run on a J2SE 1.4 or earlier JVM if you do. Both Runtime.exec() and ProcessBuilder.start() end up returning an instance of the Process class, which models the launched operating system process (the child).
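Here is a minimal sketch of the two launch routes, side by side. The class and method names (LaunchSketch, launchWithRuntime, launchWithBuilder) are made up for illustration, and the command assumes a *nix box with sh on the PATH.

```java
import java.io.IOException;

public class LaunchSketch {
    // Launch a child with Runtime.exec(); this route also works on pre-1.5 JVMs.
    public static Process launchWithRuntime(String[] command) throws IOException {
        return Runtime.getRuntime().exec(command);
    }

    // Launch the same child with ProcessBuilder (J2SE 1.5 and later).
    public static Process launchWithBuilder(String[] command) throws IOException {
        return new ProcessBuilder(command).start();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a *nix shell is available; the child writes nothing and just exits.
        String[] command = {"sh", "-c", "exit 3"};
        Process viaRuntime = launchWithRuntime(command);
        Process viaBuilder = launchWithBuilder(command);
        // Either way you get a Process back and manage it the same way from here on.
        System.out.println("Runtime.exec exit code:   " + viaRuntime.waitFor());
        System.out.println("ProcessBuilder exit code: " + viaBuilder.waitFor());
    }
}
```

One thing ProcessBuilder does offer over Runtime.exec() is redirectErrorStream(true), which merges the child's stderr into its stdout. Whether that counts as a benefit depends on whether you need the two streams kept apart.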

One of the things I learned the hard way with the Process class, many years ago, was that you really need to process the stdout and stderr output streams of the child process, because if you don't, and your child process writes more than a certain amount (operating system dependent) of data on those streams, it'll block on the write, and thus 'freeze'. So, you need to get the stdout and stderr streams via the (confusingly named) getInputStream() and (logically named) getErrorStream() methods of the Process class.

The simplest thing to do to handle this is to launch two threads, each reading from these input streams, to keep the pipes from getting clogged. Do whatever you need to do with the data being output by your child process; I'm sure you want to do something interesting with it.
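The two-thread approach might look something like the sketch below. Drainer and DrainSketch are hypothetical names, the "something interesting" here is just collecting the bytes into a buffer, and the command in main() assumes a *nix shell.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainSketch {
    // Hypothetical helper thread: copies everything from one stream into a buffer.
    static class Drainer extends Thread {
        private final InputStream in;
        private final ByteArrayOutputStream collected = new ByteArrayOutputStream();

        Drainer(InputStream in) {
            this.in = in;
        }

        public void run() {
            byte[] buffer = new byte[4096];
            try {
                int n;
                // read() blocks until data arrives and returns -1 once the child closes the pipe.
                while ((n = in.read(buffer)) != -1) {
                    collected.write(buffer, 0, n);
                }
            } catch (IOException e) {
                // The pipe broke; keep whatever was read so far.
            }
        }

        String text() {
            return collected.toString();
        }
    }

    // Runs the command, draining stdout and stderr on separate threads.
    // Returns { stdout text, stderr text, exit code }.
    public static String[] run(String[] command) throws Exception {
        Process process = Runtime.getRuntime().exec(command);
        Drainer out = new Drainer(process.getInputStream()); // the child's stdout
        Drainer err = new Drainer(process.getErrorStream()); // the child's stderr
        out.start();
        err.start();
        int exitCode = process.waitFor();
        // Wait for both drainers so no trailing output is lost.
        out.join();
        err.join();
        return new String[] { out.text(), err.text(), String.valueOf(exitCode) };
    }

    public static void main(String[] args) throws Exception {
        String[] result = run(new String[] {"sh", "-c", "echo to-stdout; echo to-stderr 1>&2"});
        System.out.print("stdout: " + result[0]);
        System.out.print("stderr: " + result[1]);
        System.out.println("exit:   " + result[2]);
    }
}
```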

I've not really had a chance to work with the java.nio package before, so I happened to think that this would be a good chance to play; let's see if we can get from two threads handling the child process's output, down to one, by using the class Selector, which would seem to be the moral equivalent of the *nix function select(), which can be used to determine the readiness of i/o operations of multiple handles at the same time.

Looking at Selector, you can tell right from the top of the doc, that this class deals with SelectableChannel objects. Now, you need to figure out how to get from an InputStream (returned by Process.get[Input|Error]Stream()) to a SelectableChannel. Except, you can't. From any InputStream, you can call Channels.newChannel() to get a ReadableByteChannel, but a ReadableByteChannel is not a SelectableChannel. (ReadableByteChannel is an interface, and SelectableChannel is a class.) This is not the end of the world; it may just be that the object returned by Channels.newChannel() is actually an instance of SelectableChannel (or a subclass thereof). Never know. So, here's a little experiment for an Eclipse scrapbook page:

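    // Launch a child that stays alive for a while (assumes a *nix "sleep" command).
    Process process = Runtime.getRuntime().exec(new String[] {"sleep", "10"});
    java.io.InputStream iStream = process.getInputStream();
    java.nio.channels.ReadableByteChannel channel = java.nio.channels.Channels.newChannel(iStream);
    // Is the wrapping channel something a Selector could use?
    System.out.println(channel instanceof java.nio.channels.SelectableChannel);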

and the result is ... false.


Bad news; it's not a SelectableChannel, and would appear to be an instance of an inner class of the Channels class itself. Poking into that inner class, it turns out to be a subclass of java.nio.channels.spi.AbstractInterruptibleChannel. Not selectable at all. All this inner class business is for the default 1.5 JRE I use on my Mac. Other implementations might well be different (and better), but that doesn't help me on the Mac.

So, unfortunately, you can't reduce the two threads that process a child process's stdout and stderr down to one, using this select technique. This might not sound so bad, but what if you wanted to be able to handle a lot of processes? To handle N simultaneous processes, you'll need N * 2 threads to process all the stdout and stderr streams. If you could have used the select logic, you'd only need 1 thread.
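For contrast, here is roughly what the select logic looks like when the channels really are selectable. This sketch swaps in java.nio.channels.Pipe as a stand-in, since a Pipe's source end is a SelectableChannel; it does not touch Process at all. SelectSketch and readBoth are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class SelectSketch {
    // Writes one message into each of two pipes, then reads both back
    // from a single thread using one Selector.
    public static String readBoth(String first, String second) throws Exception {
        Pipe a = Pipe.open();
        Pipe b = Pipe.open();
        Selector selector = Selector.open();

        // A channel must be non-blocking before it can be registered.
        a.source().configureBlocking(false);
        b.source().configureBlocking(false);
        a.source().register(selector, SelectionKey.OP_READ, "a");
        b.source().register(selector, SelectionKey.OP_READ, "b");

        // Stand-ins for two children writing output and then going away.
        a.sink().write(ByteBuffer.wrap(first.getBytes("UTF-8")));
        b.sink().write(ByteBuffer.wrap(second.getBytes("UTF-8")));
        a.sink().close();
        b.sink().close();

        StringBuilder fromA = new StringBuilder();
        StringBuilder fromB = new StringBuilder();
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        int open = 2;
        while (open > 0) {
            selector.select(); // blocks until at least one source is readable (or at EOF)
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                buffer.clear();
                int n = source.read(buffer);
                if (n == -1) {
                    // The writer closed its end; stop watching this channel.
                    key.cancel();
                    source.close();
                    open--;
                } else {
                    String chunk = new String(buffer.array(), 0, n, "UTF-8");
                    ("a".equals(key.attachment()) ? fromA : fromB).append(chunk);
                }
            }
        }
        selector.close();
        return fromA + "|" + fromB;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readBoth("hello", "world"));
    }
}
```

One thread, any number of channels. That is the shape I was hoping to get for child process output.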

Note that you could also try polling these streams, by calling available() on them, to determine if they have anything to read. Polling isn't very elegant, and is obviously going to be somewhat cpu intensive. But for me, I got burned on available() a long time ago, don't trust it, and never use it.
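For the record, a polling loop along these lines might look like the sketch below. It is here to show the shape, not to recommend it. PollSketch, drainReady and hasExited are hypothetical names, the 50 millisecond sleep is an arbitrary choice, and the command in main() assumes a *nix shell. Note that available() never reports end of stream, so the loop needs some other way to find out the child is gone; this sketch uses Process.exitValue() for that.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class PollSketch {
    // Reads only as many bytes as available() reports, so read() should not block.
    static void drainReady(InputStream in, ByteArrayOutputStream sink) throws IOException {
        byte[] buffer = new byte[4096];
        int ready;
        while ((ready = in.available()) > 0) {
            int n = in.read(buffer, 0, Math.min(ready, buffer.length));
            if (n < 0) {
                return;
            }
            sink.write(buffer, 0, n);
        }
    }

    // exitValue() throws IllegalThreadStateException while the child is still running.
    static boolean hasExited(Process process) {
        try {
            process.exitValue();
            return true;
        } catch (IllegalThreadStateException stillRunning) {
            return false;
        }
    }

    // Polls stdout and stderr from a single thread until the child exits.
    // Returns { stdout text, stderr text }.
    public static String[] run(String[] command) throws Exception {
        Process process = Runtime.getRuntime().exec(command);
        InputStream out = process.getInputStream();
        InputStream err = process.getErrorStream();
        ByteArrayOutputStream outSink = new ByteArrayOutputStream();
        ByteArrayOutputStream errSink = new ByteArrayOutputStream();
        boolean exited = false;
        while (!exited) {
            // Check for exit first, then drain, so the final pass picks up stragglers.
            exited = hasExited(process);
            drainReady(out, outSink);
            drainReady(err, errSink);
            if (!exited) {
                Thread.sleep(50); // arbitrary interval: shorter burns cpu, longer adds latency
            }
        }
        return new String[] { outSink.toString(), errSink.toString() };
    }

    public static void main(String[] args) throws Exception {
        String[] result = run(new String[] {"sh", "-c", "echo to-stdout; echo to-stderr 1>&2"});
        System.out.print("stdout: " + result[0]);
        System.out.print("stderr: " + result[1]);
    }
}
```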


But wait, it gets better!

Another thing you're going to want to do with these child processes is to determine when they're done. There are two ways of doing this. You can call Process.exitValue(), which returns the exit value of the process, unless the process hasn't actually exited, in which case it throws an IllegalThreadStateException. The other way to determine when the process is done is to call Process.waitFor(); this method will block until the process has exited.

Neither of these is very nice. If you use waitFor(), you'll burn a thread while it blocks (now up to 3 threads per process, or N * 3 for N processes!). If you use exitValue(), you can poll, but every check of the process, while it's not complete, is guaranteed to throw an exception, which is going to burn even more cpu.
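A sketch of the exitValue() polling route, watching several children from one thread. ExitPollSketch, exitCodeOrNull and waitForAll are hypothetical names, and the children in main() assume a *nix shell. The children here are deliberately silent; anything chattier would also need its stdout and stderr drained, as described earlier.

```java
import java.util.ArrayList;
import java.util.List;

public class ExitPollSketch {
    // Returns the exit code, or null if the child is still running.
    // Every "still running" answer costs one thrown exception.
    static Integer exitCodeOrNull(Process process) {
        try {
            return Integer.valueOf(process.exitValue());
        } catch (IllegalThreadStateException stillRunning) {
            return null;
        }
    }

    // Polls all children from a single thread until each has exited.
    // Returns the exit codes in the same order as the list.
    public static int[] waitForAll(List<Process> children, long pollMillis)
            throws InterruptedException {
        int[] codes = new int[children.size()];
        boolean[] done = new boolean[children.size()];
        int remaining = children.size();
        while (remaining > 0) {
            for (int i = 0; i < children.size(); i++) {
                if (done[i]) {
                    continue;
                }
                Integer code = exitCodeOrNull(children.get(i));
                if (code != null) {
                    codes[i] = code.intValue();
                    done[i] = true;
                    remaining--;
                }
            }
            if (remaining > 0) {
                Thread.sleep(pollMillis);
            }
        }
        return codes;
    }

    public static void main(String[] args) throws Exception {
        List<Process> children = new ArrayList<Process>();
        // Assumption: silent *nix children that exit with known codes.
        children.add(Runtime.getRuntime().exec(new String[] {"sh", "-c", "exit 1"}));
        children.add(Runtime.getRuntime().exec(new String[] {"sh", "-c", "sleep 1; exit 2"}));
        int[] codes = waitForAll(children, 100);
        for (int i = 0; i < codes.length; i++) {
            System.out.println("child " + i + " exited with " + codes[i]);
        }
    }
}
```

The waitFor() alternative is a one-liner per child, but each call parks a whole thread until that child exits.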

Double bummer.