
pmuellr is Patrick Mueller


Sunday, April 02, 2006

Attributes vs. Elements

This is a response to Bill Higgins' blog post titled "Simplicity for humans, simplicity for programs".

IIRC, when Balaji was designing the new marshaller, I specifically told him that I wanted everything marshalled as subelements and not attributes. Why would I do this? It had nothing to do with readability; I requested this because the marshalling scheme we were replacing was aggressively using attributes instead of subelements, including doing things like marshalling a list of things as a comma-separated string used as an attribute value. That was way wrong, since it forced folks to do further parsing even after parsing the XML.
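To make the "further parsing" complaint concrete, here is a minimal sketch in Python. The documents and every element and attribute name in them are invented for illustration; they are not the format our marshaller actually produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; names are made up for this example.
ATTR_STYLE = '<person name="Pat" languages="Java,Smalltalk,Python"/>'

ELEM_STYLE = """
<person>
  <name>Pat</name>
  <languages>
    <language>Java</language>
    <language>Smalltalk</language>
    <language>Python</language>
  </languages>
</person>
"""


def languages_from_attribute(xml_text):
    person = ET.fromstring(xml_text)
    # The XML parser hands back one opaque string, so the caller has to
    # know the delimiter and parse a second time.
    return person.get("languages").split(",")


def languages_from_elements(xml_text):
    person = ET.fromstring(xml_text)
    # The list structure is already in the parsed tree.
    return [lang.text for lang in person.findall("./languages/language")]
```

Both functions return the same list, but the attribute version quietly depends on a convention (the comma) that the XML itself knows nothing about, and it breaks as soon as a value contains a comma.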

The decision on whether to marshal something as an attribute or subelement is usually not too difficult. If the value can be represented as a short-ish string, an attribute works out well. If it's a string and long-ish, using a subelement might be better (and allow you to handle in-line markup on that, or microformats, or ...). If it's data with its own structure, you'll want to use subelements.
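That rule of thumb could be sketched as a tiny marshaller, something like the following. This is purely an illustration: the `marshal` function, the 40-character cutoff for "short-ish", and the choice to emit list items as repeated subelements are all assumptions of mine, not how the real marshaller worked.

```python
import xml.etree.ElementTree as ET

SHORT_STRING_LIMIT = 40  # arbitrary cutoff for "short-ish"; an assumption


def marshal(tag, props):
    """Build an element from a dict, picking attribute vs. subelement per value."""
    elem = ET.Element(tag)
    for name, value in props.items():
        if isinstance(value, dict):
            # Data with its own structure: recurse into a subelement.
            elem.append(marshal(name, value))
        elif isinstance(value, (list, tuple)):
            # Lists: one subelement per item (one of several possible layouts).
            for item in value:
                ET.SubElement(elem, name).text = str(item)
        elif len(str(value)) <= SHORT_STRING_LIMIT:
            # Short scalar: an attribute works out well.
            elem.set(name, str(value))
        else:
            # Long string: a subelement leaves room for in-line markup.
            ET.SubElement(elem, name).text = str(value)
    return elem
```

Notice that a reader of the output cannot tell from the property name alone whether to look for an attribute or a subelement; the answer depends on the value.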

If you are representing a list of things, you really want subelements, although then you have more decisions to make. Should you add a special containing element, like a <ul> element, and render the list items as subelements under that? Or should you not have a containing element, and store each item as a peer of other elements in the same structure? Maybe you want to use a containing element, but that element name is the 'property' name, and the list elements are stored as <li> elements.
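Here is one reading of those three layouts, sketched in Python, along with what a consumer has to do for each. The `contact`, `phone`, `phones`, and `email` names are hypothetical, and the phone numbers are placeholders.

```python
import xml.etree.ElementTree as ET

# 1. Generic containing element (like <ul>), items named for what they are.
GENERIC_CONTAINER = (
    "<contact><ul><phone>555-0100</phone><phone>555-0101</phone></ul>"
    "<email>pat@example.com</email></contact>"
)

# 2. No containing element; items sit as peers of the other properties.
PEERS = (
    "<contact><phone>555-0100</phone><phone>555-0101</phone>"
    "<email>pat@example.com</email></contact>"
)

# 3. Containing element named after the property; items are generic <li>.
PROPERTY_CONTAINER = (
    "<contact><phones><li>555-0100</li><li>555-0101</li></phones>"
    "<email>pat@example.com</email></contact>"
)


def read_generic_container(text):
    return [e.text for e in ET.fromstring(text).findall("./ul/phone")]


def read_peers(text):
    return [e.text for e in ET.fromstring(text).findall("./phone")]


def read_property_container(text):
    # Iterating an element yields its direct children, here the <li> items.
    return [e.text for e in ET.fromstring(text).find("phones")]
```

All three readers return the same two phone numbers, but each needs a different path, so the consumer has to know which layout the producer picked.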

In the end, there's this weird tension: you want to use attributes as much as possible, since they are easier for programs to get to, take up less text, etc. But the remainder of your things have to be rendered as subelements, for whatever reason. The decision ends up being made strictly on whether XML can actually handle the value as an attribute, and that seems wrong.

Thus, rather than make somewhat arbitrary decisions about whether to use attributes or subelements, and then make the people whose programs ate this data guess at which we'd use, I decided to go with the lowest common denominator: nothing but subelements. Of course, even that's not quite right; we actually do have a small fixed set of things we render as attributes. Very small.

The fact that you have to make decisions like this, even about whether to render something as an element or attribute, speaks to the problems here. XML is great for documents, but what you are really talking about is data. Do we need to describe all our data as documents? If no one is ever going to need it as a document, why go to that trouble? Is XML a solution in search of a problem in this case?

If you don't have a real requirement to provide a document for your data, other formats like JSON might be more appropriate.
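For comparison, a minimal sketch of a record in JSON, read with Python's standard `json` module. The record and its field names are made up for the example.

```python
import json

# Hypothetical record; field names are invented for illustration.
PERSON_JSON = '{"name": "Pat", "languages": ["Java", "Smalltalk", "Python"]}'

person = json.loads(PERSON_JSON)

# Strings come back as strings and lists as lists; there is no
# attribute-vs-subelement choice for the producer to make.
name = person["name"]
languages = person["languages"]
```

The format has exactly one way to say "string" and one way to say "list", so the decisions discussed above simply don't come up.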
