Tuesday, May 22, 2007

XML, the right format for object graph serialization

I was reading this blog entry, and my annoyance meter started climbing:
One of the advantages of having a canonical representation of an object tree, is that it serves as a great interchange enabler. That canonical representation doesn't need to be Xml, but it seems like a good choice.

Sigh. Note the assumption of an "object tree": object graphs usually aren't trees, since they typically have lots of cross-references. Objects in memory form edge-labeled graphs (field names label the edges), while XML is a node-labeled tree (element names label the nodes). Converting objects to XML therefore requires some naming scheme for those objects whose relationships are more complex than parent-child, so that a reference elsewhere in the document can point back to them.
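To make the naming-scheme point concrete, here is a minimal Python sketch. It is not from the original post, and `Node`, `to_xml` and `from_xml` are hypothetical names. It assumes each object is just a name plus a list of references. The first time the serializer meets an object it emits a full element with a generated `id` (the parent-child case); every later encounter emits a `<ref to=...>` element instead (the cross-reference case), in the spirit of XML's ID/IDREF convention.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical object: a name plus references to other Nodes."""

    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing edges; targets may be shared or cyclic


def to_xml(root):
    """Serialize everything reachable from root. The first visit to an object
    emits a full <node id=...> element; every later visit emits a
    <ref to=...> element pointing at that generated id."""
    ids = {}  # Python object identity -> generated XML id

    def emit(obj, parent):
        if id(obj) in ids:
            # Already written somewhere in the tree: emit a reference, not a copy.
            ET.SubElement(parent, "ref", to=ids[id(obj)])
            return
        ids[id(obj)] = "n%d" % len(ids)
        elem = ET.SubElement(parent, "node", id=ids[id(obj)], name=obj.name)
        for target in obj.refs:
            emit(target, elem)

    doc = ET.Element("graph")
    emit(root, doc)
    return ET.tostring(doc, encoding="unicode")


def from_xml(text):
    """Rebuild the graph in two passes: create a Node for every <node>
    element first, then wire up the edges by id."""
    doc = ET.fromstring(text)
    by_id = {e.get("id"): Node(e.get("name")) for e in doc.iter("node")}
    for elem in doc.iter("node"):
        obj = by_id[elem.get("id")]
        for child in elem:  # direct children, in document order
            key = child.get("id") if child.tag == "node" else child.get("to")
            obj.refs.append(by_id[key])
    return by_id[doc.find("node").get("id")]


# A small graph that is not a tree: c is shared by a and b, and c points back to a.
a, b, c = Node("a"), Node("b"), Node("c")
a.refs = [b, c]
b.refs = [c]
c.refs = [a]

xml_text = to_xml(a)
copy = from_xml(xml_text)
print(xml_text)
```

The generated `n0`, `n1`, ... ids are exactly the "naming scheme" the paragraph talks about: they exist only to encode edges that the element nesting cannot express.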

If you design your objects along functional lines and make them immutable once the constructor has finished executing, this tree scheme can work, but I don't believe there are many useful structures with no cross-references whatsoever. XML serialization works best when you have a rooted namespace context, like TComponent and its Owner, or something similar where each object can be identified by a URL: ideally one and only one URL.
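The rooted-namespace idea can be sketched in Python as follows. `Component`, `path` and `resolve` are hypothetical names; the ownership tree is only inspired by the TComponent/Owner example and does not reproduce how Delphi actually streams components. The assumption is that every object except the root has exactly one owner and that names are unique within an owner, so each object has exactly one absolute path, and that path can stand in for a generated id when writing a cross-reference.

```python
class Component:
    """Hypothetical owned object: every component except the root has
    exactly one owner, and names are unique among an owner's components."""

    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner
        self.owned = {}  # name -> Component
        if owner is not None:
            if name in owner.owned:
                raise ValueError("duplicate name %r under %r" % (name, owner.name))
            owner.owned[name] = self

    def path(self):
        """The one and only path from the root to this component."""
        if self.owner is None:
            return "/" + self.name
        return self.owner.path() + "/" + self.name


def resolve(root, path):
    """Walk an absolute path such as '/Form1/Panel1/OKButton' down from root."""
    first, *rest = path.strip("/").split("/")
    if first != root.name:
        raise KeyError(path)
    node = root
    for name in rest:
        node = node.owned[name]  # KeyError if the path names no component
    return node


form = Component("Form1")
panel = Component("Panel1", form)
ok_button = Component("OKButton", panel)
label = Component("Label1", form)

# A cross-reference (say, the label's focus target) can be stored as a path
# rather than a generated id, because the path is unique within this root.
focus_target = ok_button.path()
```

The design choice here is that uniqueness comes from the ownership structure itself, so the serializer never has to invent names; the price is that every object must live somewhere in a single rooted hierarchy.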

I believe that some decades from now we'll look back at this objectification of the world, the current habit of seeing the power of objects in, e.g., GUI libraries and then applying the same hammer to all manner of delicate problems, as one of the bigger mistakes in the history of programming. Basically, both hierarchical data (e.g. XML) and relational data (collections of fact tuples) are bad fits for object orientation, and on the code side of the house (as opposed to data), lightweight processes (think Erlang) haven't had their due. And popular programming languages are starved of higher-order abstraction facilities.

I wonder how this is going to change. C# is gaining its extensions (and cruft: having both lambdas and anonymous delegates in the language is a bit of an eyesore), but it's still pretty poor in many other ways. It's very deeply wedded to imperative, linear thinking, with very few transformations of the program before execution. Static typing is fine, but the rise of metaprogramming and dynamic languages shows why it can also be constraining. Statically typed languages that lack a higher order, a way of programming over the type domain as well as the value domain, end up falling back on reflection and other weakly typed loopholes to overcome the inflexibility of traditional (C, Pascal, Java) static typing.

1 comment:

F Quednau said...

Up to now I have used XAML to represent the objects of a UI, which is a control hierarchy. So far, so good. XAML's data-binding features, though, show that one must leave XML to express more complex relationships between objects.
As to your last passage on C#: it would be nice to know what you mean by programming over the type domain. Isn't the concept of generics already a pretty good addition for working with more flexible type domains, or am I on the wrong track?