pmuellr is Patrick Mueller

other pmuellr thangs: home page, twitter, flickr, github

Tuesday, May 15, 2007

modelled serialization

Too many times I've seen programmers writing their web services, where they are generating the web service output 'by hand'. Worse, incoming structured input to the services (XML or JavaScript), parsed by hand into objects. Maybe not parsed, but DOMs and JSON structures walked. Manually. Egads! Folks, we're using computers! Let the computer do some work fer ya!

In my previous project, we used Eclipse's EMF to model the data we were sending and receiving via RESTy, POXy web services. For an example of what I'm referring to as 'modelling', see this EMF overview and scroll down to "Annotated Java". For us, to model our data, meant adding EMF comments to our code. And then running some code to generate the EMF goop. What they goop ended up giving you was a runtime version of this model you could introspect on. Basically just like Java introspection and reflection calls, to examine the shape of classes, and the state of objects, dynamically. Only with richer semantics. And frankly, just easier, if I remember correctly.

Anyhoo, for the web services were we writing, we constrained the data being passed over the wire to being modelled classes. Want to send some data from the server to the client? Describe it with a modelled class. Because the structure of these modelled classes was available at runtime, it was (relatively) easy to write code that could walk over the classes and generate a serialized version of the object (XML or JSON). Likewise, we could take an XML or JSON stream and turn it into an instance of a modelled class fairly easily. Generically. For all our modelled classes. With one piece of code.

Automagic serialization.

One simplification that helped was that we greatly constrained the types of 'features' (what EMF non-users would call attributes or properties) of a class; it turned out to be basically what's allowed in JSON objects: strings, numbers, booleans, arrays, and (recursively) objects. We had a few other primitive types, like dates and uuid, but amazingly, were we able to build a large complex system from a pretty barebones set of primitives and composites. Less is more.

For folks familiar with WS-*, none of this should come as a huge suprise. There are basically two approaches to defining your web service data: define it in XML schema, and have tooling generate code for you. Or define it in code, and have tooling generate schema for you. In both cases, serialization code will be generated for you. Neither of these resulted in a pleasing story to me. Defining documents in schema is not simple, certainly harder than defining Java classes. And the code generated from tooling to handle schema tends to be ... not pretty. On the other hand, when starting with code, your documents will be ugly - some folks don't care about that, but I do. The document is your contract. Why do you want your contract to be ugly?

Model driven serialization can be a nice alternative to these two approaches, assuming you're talking about building RESTy or POXy web services. Because it's relatively simple to create a serializer that feels right for you. And you know your data better than anyone; make your data models as simple or complex as you actually need. If you're using Java, and have complex needs, consider EMF, because it can probably do what you need, or at least provide a solid base for what you need.

Besides serialization, data modelling has other uses:

  • Generating human-readable documentation of your web service data. You were planning on documenting it, right? And what, you were going to do it by hand?

  • Generating machine-readable documentation of your web service data; ie, XML schema. I know you weren't going to write that by hand. Tell me you weren't going to write that by hand. Actually, admit it, you probably weren't going to generate XML schema at all.

  • Generating editors, like EMF does. Only these editors would be generated in that 'junky' HTML/CSS/JavaScript trinity, for your web browser client needs. Who wants to write that goop?

  • Writing wrappers for your web services for other languages. At least this helps with the data marshalling. Again, JavaScript is an obvious target language here.

  • Generating database schema and access code, if you want to go hog wild.


If it's not obvious, I'm sold on this modelling stuff. At least lightweight versions thereof.

So I happened to be having a discussion with a colleague the other day about using software modelling to make life easier for people who need to serialize objects over the web. I don't think I was able to get my message across as well as I wanted, and we didn't have much time to chat anyway, so I thought I'd whip up a little sample. Code talks.

This sample is called ToyModSer - Toy Modelled Serializer. It's a toy because it only does a small amount of what you'd really want it to be able to do; but there's enough there to be able to see the value in the concept, and how light-weight you can go. I hope.

No comments: