pmuellr is Patrick Mueller, Senior Node Engineer at NodeSource.

Tuesday, April 24, 2007

Java process management arghhs

Pretend like you'd like to write a Java program that wants to launch and lightly manage some operating system processes. Know how to do this? You'll want to use one of the flavors of the Runtime.exec() methods. If you happen to be running on a J2SE 1.5 or greater JVM, you can use the new ProcessBuilder class, but there seems to be not much benefit to doing so, and of course, your code won't run on a J2SE 1.4 or earlier JVM if you do. Both Runtime.exec() and ProcessBuilder end up returning an instance of the Process class, which models the launched operating system process (the child).
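For the record, here's a minimal sketch of the two launch styles side by side. The class and method names are made up for illustration, and the "sleep" command assumes a *nix box:

```java
import java.io.IOException;

class LaunchSketch {
    // Launch via Runtime.exec(); also available on J2SE 1.4 and earlier.
    static Process launchWithRuntime(String[] command) throws IOException {
        return Runtime.getRuntime().exec(command);
    }

    // Launch via ProcessBuilder; needs J2SE 1.5 or later.
    static Process launchWithBuilder(String[] command) throws IOException {
        return new ProcessBuilder(command).start();
    }

    public static void main(String[] args) throws Exception {
        String[] command = {"sleep", "1"}; // assumption: "sleep" is on the PATH
        Process first = launchWithRuntime(command);
        Process second = launchWithBuilder(command);
        // Either way you end up holding a Process for the child.
        System.out.println("Runtime.exec() child exited with " + first.waitFor());
        System.out.println("ProcessBuilder child exited with " + second.waitFor());
    }
}
```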

One of the things I learned the hard way with the Process class, many years ago, was that you really need to process the stdout and stderr output streams of the child process, because if you don't, and your child process writes more than a certain amount (operating system dependent) of data on those streams, it'll block on the write, and thus 'freeze'. So, you need to get the stdout and stderr streams via the (confusingly named) getInputStream() and (logically named) getErrorStream() methods of the Process class.

The simplest thing to do to handle this is to launch two threads, each reading from these input streams, to keep the pipes from getting clogged. Do whatever you need to do with the data being output by your child process; I'm sure you want to do something interesting with it.
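Roughly, the two-thread approach might look like the sketch below. The names DrainSketch and drain() are hypothetical, the data is just collected into a byte buffer, and the "ls" command assumes a *nix box:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class DrainSketch {
    // Start a thread that copies everything from 'in' into 'sink' until end of stream.
    static Thread drain(final InputStream in, final ByteArrayOutputStream sink) {
        Thread thread = new Thread(new Runnable() {
            public void run() {
                byte[] buf = new byte[4096];
                try {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        sink.write(buf, 0, n);
                    }
                } catch (IOException e) {
                    // The stream broke (say, the child was destroyed); stop reading.
                }
            }
        });
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "ls" is on the PATH; any chatty command would do.
        Process process = Runtime.getRuntime().exec(new String[] {"ls", "-l", "/"});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        Thread outThread = drain(process.getInputStream(), out); // the child's stdout
        Thread errThread = drain(process.getErrorStream(), err); // the child's stderr
        outThread.join();
        errThread.join();
        System.out.println("stdout bytes: " + out.size() + ", stderr bytes: " + err.size());
    }
}
```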

I've not really had a chance to work with the java.nio package before, so I happened to think that this would be a good chance to play; let's see if we can get from two threads handling the child process's output, down to one, by using the class Selector, which would seem to be the moral equivalent of the *nix function select(), which can be used to determine the readiness of i/o operations of multiple handles at the same time.

Looking at Selector, you can tell right from the top of the doc, that this class deals with SelectableChannel objects. Now, you need to figure out how to get from an InputStream (returned by Process.get[Input|Error]Stream()) to a SelectableChannel. Except, you can't. From any InputStream, you can call Channels.newChannel() to get a ReadableByteChannel, but a ReadableByteChannel is not a SelectableChannel. (ReadableByteChannel is an interface, and SelectableChannel is a class.) This is not the end of the world; it may just be that the object returned by Channels.newChannel() is actually an instance of SelectableChannel (or a subclass thereof). Never know. So, here's a little experiment for an Eclipse scrapbook page:

    Process process = Runtime.getRuntime().exec(new String[] {"sleep", "10"});
    java.io.InputStream iStream = process.getInputStream();
    java.nio.channels.ReadableByteChannel channel = java.nio.channels.Channels.newChannel(iStream);
    System.out.println(channel.getClass().getName());
    System.out.println(channel instanceof java.nio.channels.SelectableChannel);

and the result is ...

    java.nio.channels.Channels$ReadableByteChannelImpl
    false

Bad news; it's not a SelectableChannel, and would appear to be an instance of an inner class of the Channels class itself. Poking into the inner class, I see that it's a subclass of java.nio.channels.spi.AbstractInterruptibleChannel. Not selectable at all. All this inner class business is for the default 1.5 JRE I use on my Mac. Other implementations might well be different (and better), but that doesn't help me on the Mac.

So, unfortunately, you can't reduce the two threads that process a child process's stdout and stderr down to one, using this select technique. This might not sound so bad, but what if you wanted to be able to handle a lot of processes? To handle N simultaneous processes, you'll need N * 2 threads to process all the stdout and stderr streams. If you could have used the select logic, you'd only need 1 thread.

Note that you could also try polling these streams, by calling available() on them, to determine if they have anything to read. Polling isn't very elegant, and is obviously going to be somewhat cpu intensive. As for me, I got burned by available() a long time ago; I don't trust it, and never use it.

Bummer.

But wait, it gets better!

Another thing you're going to want to do with these child processes is to determine when they're done. There are two ways of doing this. You can call Process.exitValue(), which returns the exit value of the process, unless the process hasn't actually exited, in which case it throws an IllegalThreadStateException. The other way to determine when the process is done is to call Process.waitFor(); this method will block until the process has exited.

Neither of these is very nice. If you use waitFor(), you'll burn a thread while it blocks (now up to N * 3 threads for N processes!). If you use exitValue(), you can poll, but every check of the process, while it's not complete, is guaranteed to throw an exception, which is going to burn even more cpu.
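To make the two options concrete, here's a small sketch. The helper exitValueOrNull() and the 100 ms polling interval are my own inventions for illustration, and "sleep" again assumes a *nix box:

```java
class ExitCheckSketch {
    // Returns the exit value if the child has exited, or null if it is still running.
    static Integer exitValueOrNull(Process process) {
        try {
            return Integer.valueOf(process.exitValue());
        } catch (IllegalThreadStateException stillRunning) {
            return null; // thrown on every check until the child exits
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "sleep" is on the PATH.
        Process process = Runtime.getRuntime().exec(new String[] {"sleep", "1"});

        // Option 1: poll, paying for an exception on each unsuccessful check.
        Integer exit = exitValueOrNull(process);
        while (exit == null) {
            Thread.sleep(100); // hypothetical polling interval
            exit = exitValueOrNull(process);
        }
        System.out.println("polled exit value: " + exit);

        // Option 2: block the calling thread until the child exits.
        // (It has already exited here, so this returns right away.)
        System.out.println("waitFor() exit value: " + process.waitFor());
    }
}
```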

Double bummer.

8 comments:

Balint said...

This comment has been removed by the author.

Anonymous said...

Found your post via google after running into the ReadableByteChannel/SelectableChannel dilemma. I can't believe that the java.nio implementation in JDK1.5 doesn't handle the stdout/stderr incantation that would surely be its most common use case, possibly after the ServerSocket multiplexing that is the only other example one ever sees. I would love to know a solution if there was one, or a rationale for why there isn't one otherwise.

John said...

You can reduce this down to one thread - you call ProcessBuilder.redirectErrorStream(true) which merges stdout and stderr into one InputStream. After that, it's just a matter of reading the InputStream until the stream closes, i.e. when the process has terminated.
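A minimal sketch of that suggestion, for anyone who wants to try it. The class name MergedOutputSketch and the helper runMerged() are hypothetical, and the "sh" command assumes a *nix box:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

class MergedOutputSketch {
    // Run a command with stderr merged into stdout and return everything it printed.
    static String runMerged(String[] command) throws Exception {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // stderr now arrives on getInputStream()
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) { // null means end of stream
                output.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        process.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "sh" is on the PATH; the child writes one line to each stream.
        String[] command = {"sh", "-c", "echo to-stdout; echo to-stderr 1>&2"};
        System.out.print(runMerged(command));
    }
}
```

Only the calling thread reads here, but the two streams can no longer be told apart.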

Balint said...

John, that's true, that can be done, but most of the time I actually want them separate - external processes I'm dealing with don't guarantee that their output to stdout and stderr are distinguishable. If I mix them up then I can't always tell if there was any error logged for example.

Jacques said...

A real pity, the N*3 constraint (3 threads needed per monitored process). I just tried to "update" my ProcessShell class by using a Selector and ended up on this page.
Is there something new (and better) regarding this subject in Java 1.6?

Anonymous said...

Found your page by googling for a solution to the same problem you outline. I'm still finding it hard to believe there's no way to multiplex the output of multiple sub-processes (yes, Process.getInputStream is confusingly named) using the Selector framework -- seems like an obvious use of Selector. It does seem like you can get away with N*2 threads instead of N*3 by having your 2 threads reading stdout and stderr of the sub-process watch for read() returning -1 indicating end of stream -- at that point I would guess it's safe to assume the process has exited. Just a guess. Thanks for the original post.

Roland said...

You can reduce it from 3*N to 2*N threads by calling Process.waitFor on one of the two threads after the end of the stream is reached. The process can't have finished earlier.
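One way that idea might be sketched, untested against real workloads. The names DrainThenWaitSketch, drainToEnd() and drainAndWait() are hypothetical, the output is simply discarded, and "ls" assumes a *nix box:

```java
import java.io.IOException;
import java.io.InputStream;

class DrainThenWaitSketch {
    // Read 'in' to end of stream, discarding the data; returns the number of bytes seen.
    static long drainToEnd(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Drain stderr on a helper thread and stdout on the calling thread,
    // then reuse the calling thread to collect the exit value.
    static int drainAndWait(final Process process) throws Exception {
        Thread errThread = new Thread(new Runnable() {
            public void run() {
                try {
                    drainToEnd(process.getErrorStream());
                } catch (IOException ignored) {
                    // The stream broke; nothing more to read.
                }
            }
        });
        errThread.start();
        drainToEnd(process.getInputStream());
        errThread.join();
        // Both streams are at end of stream, so this normally returns promptly.
        // It still blocks if the child closed its streams before exiting.
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "ls" is on the PATH.
        Process process = Runtime.getRuntime().exec(new String[] {"ls", "-l", "/"});
        System.out.println("exit value: " + drainAndWait(process));
    }
}
```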

Patrick Mueller said...

Roland, true, but it also seems like this will complicate the logic a bit. Hard to say without really trying it. And if you really need 1*N threads back, then perhaps the complexity will be worth it.