"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.


Friday, October 03, 2003

What *is* XML anyway?

My somewhat random thoughts, reading through "The Impedance Imperative Tuples + Objects + Infosets = Too Much Stuff!" from Dave Thomas (the ex-OTI/IBM Smalltalk guy, not the Pragmatic Programmer guy) being discussed on Lambda the Ultimate...

"SQL is quite good for simple CRUD applications on normalized tables."

This seems to speak to OLTP. For OLAP, denormalized tables (3NF fact tables and 2NF dimension tables in a star schema) would be preferred. Still, standard SQL does not support all the expressions you'd like in OLAP, such as time-series expressions.

For OLTP I am not convinced you want SQL at all. Something like Prevayler might be preferred. When we get large, battery-backed RAMs in a few years, we won't even care about writing transactions to disk.
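Prevayler's approach, usually called system prevalence, keeps the business objects in memory and makes them durable by journaling the commands that change them; recovery is a replay of the journal. A minimal Python sketch of that idea follows, assuming JSON-encoded commands in an append-only file. The `Accounts` and `PrevalenceLayer` names are hypothetical, this is not Prevayler's (Java) API, and a real prevalence layer also has to deal with snapshots, concurrency, and ordering far more carefully.

```python
import json
import os
import tempfile


class Accounts:
    """Hypothetical business objects that live entirely in RAM."""

    def __init__(self):
        self.balances = {}

    def apply(self, command):
        # Commands are deterministic, so replaying the same journal always
        # rebuilds the same in-memory state.
        kind, account, amount = command["kind"], command["account"], command["amount"]
        current = self.balances.get(account, 0)
        if kind == "deposit":
            self.balances[account] = current + amount
        elif kind == "withdraw":
            if current < amount:
                raise ValueError("insufficient funds")  # rejected before any mutation
            self.balances[account] = current - amount
        else:
            raise ValueError(f"unknown command: {kind!r}")


class PrevalenceLayer:
    """Hypothetical minimal journal-and-replay layer (no snapshots, no locking)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.system = Accounts()
        # Recovery is just replaying every journaled command in order.
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as journal:
                for line in journal:
                    self.system.apply(json.loads(line))

    def execute(self, command):
        # Simplification: apply in memory first (a rejected command raises and is
        # never journaled), then append it durably to the journal.
        self.system.apply(command)
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(command) + "\n")
            journal.flush()
            os.fsync(journal.fileno())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    layer = PrevalenceLayer(path)
    layer.execute({"kind": "deposit", "account": "alice", "amount": 100})
    layer.execute({"kind": "withdraw", "account": "alice", "amount": 30})
    # Simulate a restart: a fresh layer rebuilds state from the journal alone.
    print(PrevalenceLayer(path).system.balances)  # {'alice': 70}
```

The `fsync` on every command is the disk write that large, battery-backed RAM would make unnecessary.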

"SQL programming often requires an alternative interface using cursors"

This is becoming somewhat less necessary where set-based expressions are the ideal. Some databases, like Teradata and Sybase IQ, evaluate set-based expressions efficiently. Even SQL Server is better at this than it was in previous versions.
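The contrast between cursor-style and set-based code can be sketched with Python's built-in `sqlite3` module. SQLite is an assumption for illustration (it is not one of the databases named above), the `orders` table and the surcharge rule are hypothetical, and the Python loop stands in for a server-side cursor loop.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("west", 100), ("east", 250), ("west", 40)],
    )
    return conn


def surcharge_west_row_at_a_time(conn):
    # Cursor style: fetch rows, then issue one UPDATE per row from application code.
    rows = conn.execute("SELECT id, amount FROM orders WHERE region = 'west'").fetchall()
    for order_id, amount in rows:
        conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (amount + 5, order_id))


def surcharge_west_set_based(conn):
    # Set-based style: one declarative statement; the engine chooses how to execute it.
    conn.execute("UPDATE orders SET amount = amount + 5 WHERE region = 'west'")


if __name__ == "__main__":
    a, b = make_db(), make_db()
    surcharge_west_row_at_a_time(a)
    surcharge_west_set_based(b)
    query = "SELECT id, region, amount FROM orders ORDER BY id"
    print(a.execute(query).fetchall() == b.execute(query).fetchall())  # True
```

Both produce the same rows; the set-based version simply leaves the iteration to the database engine, which is where engines that handle set-based expressions well pay off.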

"after many years of engineering, the relational databases can finally claim the performance and flexibility of keyed files...; network databases..."

Henry Baker has some great thoughts about this. I am kind of in the middle. One thing seems true: funding for any kind of database other than relational is almost nothing. Object databases have had commercial funding, but it has been minuscule compared to commercial relational database R&D.

What, for example, could have been done at Gemstone, where indexing, query, and reporting for its OODB received well under one person-year of R&D during its 20 years of development?

This has some applicability to XML too. Is XML a "random access database"? Or a "serialization" (with "includes"? with "pointers")?

"Third Generation Database Manifesto... objects... were syntactic extensions on Blobs"

Another approach in PostgreSQL and other DBs is to make tables like a "class" (whatever that is!) and one class/table can inherit from another. This is actually fairly useful for O/R mapping.

"Object databases, it was claimed, solved the impedence mismatch..."

Another note on star schemas: they simplify data models relative to 3NF models, and they partition data into dimensions, facts, and many-to-many relationships. Dimensions map fairly well onto objects; facts map onto observations or measurements among networks of objects. If you design your objects and your data with this in mind, the O/R mapping problem can be reduced for many common business (and other) scenarios.
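A tiny star schema makes the split concrete. This sketch uses Python's `sqlite3` module with hypothetical table and column names (`dim_product`, `dim_date`, `fact_sales`): dimension rows load directly into plain objects, while the fact table records measurements keyed by the dimensions they relate.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A dimension row maps naturally onto an object."""
    product_id: int
    name: str
    category: str


def build_star(conn):
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds measurements keyed by the dimensions they relate.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER,
            revenue INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2003, 9), (2, 2003, 10)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 3, 30), (2, 1, 1, 25), (3, 2, 2, 40), (1, 2, 5, 50)])


def revenue_by_category_and_month(conn):
    # Typical star query: join facts to dimensions, group by dimension attributes.
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


def load_products(conn):
    # The O/R mapping for a dimension is a one-liner: each row becomes an object.
    rows = conn.execute("SELECT product_id, name, category FROM dim_product ORDER BY product_id")
    return [Product(*row) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(revenue_by_category_and_month(conn))
    # [('Books', 2003, 10, 40), ('Hardware', 2003, 9, 55), ('Hardware', 2003, 10, 50)]
    print(load_products(conn)[0])  # Product(product_id=1, name='Widget', category='Hardware')
```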

"while there are some solutions (AS/400 and Gemstone persistent stores) that have been very successful..."

Dave gave a keynote at a Gemstone company retreat. He tried to marry Gemstone with AS/400, suggesting we could ignore the Java industry and make more money. I tend to believe him since AS/400 was already a successful niche with persistent data as a feature, and Dave was at the time with IBM (via his OTI subsidiary) and so had to have had some inside understanding of the economics.

This was the point where Gemstone, in "the hopes of becoming the next Oracle", all but abandoned Smalltalk for Java/J2EE. For the next several years the Smalltalk market funded the Java development, with about three times as many developers on Java as had ever worked on Smalltalk. I doubt the Java investment ever broke even, while Smalltalk continued to bring in revenue (at least as of a year or so ago).

As mentioned above, Gemstone hardly invested anything in query, indexing, and reporting for either its Smalltalk or its Java OODB. Had the resources assigned to Java been put into this, and perhaps into the AS/400 port, not to mention the replication mechanism and servlet-like multiplexor that had just been developed on a shoestring, what might the result have been?

What if these had been developed and Gemstone had then been purchased by IBM, which had been discussed many times, even on Gerstner's floor at IBM?

"the brave new world of XML schemas and Infosets"

We'll see. Not many business systems have been built on these yet. As mentioned above, it is not clear whether XML is a random access database, a serialization, or something else altogether. Nor is it clear where "includes" and "pointers" fit in. And what is a "relationship" in XML, in the relational database sense? Not entirely clear.

"It can be argued that given the ability to directly query both relational and XML data one can handle lots of problems without needing objects."

Objects are for abstractions. So are functions. So how comprehensive the above statement is depends on what "query" means and on the query language.

"the lack of explicit XML values..."

This gets back to what is XML vs. some use of XML. Should there be one "data model" for XML? I doubt it.

"The impedence of incompatible type systems imposes..."

Everything is incompatible (e.g. "computation" and "data model" as well as "type"). An approach to some of the concerns in this paper may be better off *ignoring* XML(!), and going more into left field for potential solutions. Then those solutions may be able to be mapped back into XML for some purposes.

What *is* XML anyway? We have some relatively primitive yet widespread tools "for XML". But should this suggest our future data model, search, and computation problems are best solved "using XML", whatever myriad of mechanisms that means?

Wednesday, October 01, 2003

The Principle of Stability vs. the Principle of Release Early and Often

Over at Hamish's MishMash...

Patrick raises the flag for Smalltalk, and notes that it took ten years to get to the current version, Smalltalk-80, which is now over 20 years old and substantially unchanged. There's an interesting question in there about how much of that stability is from it being "just right", and how much from the fact that once it's out there, it's harder to change. The balance is well over in favour of the former in Smalltalk's case. So "release early, release often" isn't necessarily the right way to go with language development?

Robert Martin developed the idea of "stability" in OO designs many years ago. The subsequent years brought battles in comp.object over whether this idea of "stability" is good or bad. In fact it is neither; it is just an observation that if many things depend on X, then X is unlikely to change in ways that affect its dependents. X may be "good" or "bad" by other measures.
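Martin's package metrics express this as instability, I = Ce / (Ca + Ce), where Ca (afferent couplings) counts incoming dependencies and Ce (efferent couplings) counts outgoing ones; I = 0 is maximally stable and I = 1 maximally unstable. The Python sketch below computes it from a dependency map. The package names are hypothetical, counting whole packages rather than the classes inside them is a simplification, and scoring an isolated package as 0.0 is this sketch's own convention.

```python
from collections import defaultdict


def instability(dependencies):
    """Return I = Ce / (Ca + Ce) for every package in the dependency map.

    `dependencies` maps each package to the set of packages it depends on.
    """
    afferent = defaultdict(int)
    packages = set(dependencies)
    for targets in dependencies.values():
        packages.update(targets)
        for target in targets:
            afferent[target] += 1  # one more package depends on `target`

    result = {}
    for pkg in sorted(packages):
        ce = len(dependencies.get(pkg, ()))  # outgoing dependencies
        ca = afferent[pkg]                   # incoming dependencies
        # Convention of this sketch: a package with no couplings at all scores 0.0.
        result[pkg] = ce / (ca + ce) if ca + ce else 0.0
    return result


if __name__ == "__main__":
    deps = {
        "app": {"collections", "ui"},
        "ui": {"collections"},
        "reports": {"collections"},
        "collections": set(),
    }
    print(instability(deps))
    # {'app': 1.0, 'collections': 0.0, 'reports': 1.0, 'ui': 0.5}
```

Here `collections` comes out at 0.0, the most stable, simply because three packages depend on it. The number says nothing about whether its design is any good.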

The same ideas can be applied to other kinds of design, e.g. language design, as well as to entire frameworks such as the topic of the original messages, the dotnet f/w. In my argument I am simply assuming the Smalltalk system is "good", and then attempting to explain how low stability for 10 years allowed it to become "good" before it became "stable". This is rare.

The original Smalltalk team did release early and often. But they were in the advantageous position to radically alter their design between revisions.

As Hamish says, a commercial product, once it is out there, is hard to change. So it depends on what "release early" means.

One product released to a small number of customers can change more easily than a suite of products (like the dotnet f/w) released to the entire world. If you are designing the ultimate in reusable platforms, this is a concern.


I just started chewing on Jeremy Allaire's item on RSS-Data. This is from a very constructive comments thread, followed by some very preliminary thoughts...

I'd love to see an example showing how RSS-Data is a Good Thing compared to a similar RSS 2.0 w/namespace example. It just seems like we're losing some precious semantic information when we drop down to datatypes in the document.

  • I like the idea, because I like XML-RPC's data definitions, more or less, especially how uncomplicated they are for programmers.
  • I don't like the idea for the same reason, it does not result in domain-specific XML tags and document definitions.
  • The difference between these two points is that in XML-RPC this "tagged data" document is represented as a struct

As per Greg and Eric... whether you use this approach or an XML namespace approach, you still have the same need for an out-of-band agreement. In either case you will have nested values and name/value pairs that only "mean" something to the people who write the code that makes it useful.
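To make the comparison concrete, this Python sketch encodes the same record both ways: once as XML-RPC-style typed data, using the standard library's `xmlrpc.client` as a stand-in for RSS-Data's typed values (an assumption, not RSS-Data's exact markup), and once as domain-specific elements in a namespace. The namespace URI and the `symbol`/`price` fields are hypothetical.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Hypothetical namespace, purely for illustration.
NS = "http://example.com/ns/stock"


def encode_typed(quote):
    # Generic "tagged data": struct/member/name/value markup with explicit types.
    return xmlrpc.client.dumps((quote,))


def decode_typed(xml_text):
    # The types come back for free; the meaning of each name does not.
    params, _method_name = xmlrpc.client.loads(xml_text)
    return params[0]


def encode_namespaced(quote):
    # Domain-specific elements: the vocabulary lives in the tag names.
    root = ET.Element(f"{{{NS}}}quote")
    ET.SubElement(root, f"{{{NS}}}symbol").text = quote["symbol"]
    ET.SubElement(root, f"{{{NS}}}price").text = repr(quote["price"])
    return ET.tostring(root, encoding="unicode")


def decode_namespaced(xml_text):
    root = ET.fromstring(xml_text)
    # The reader must already know that "price" is a float: out-of-band agreement.
    return {
        "symbol": root.findtext(f"{{{NS}}}symbol"),
        "price": float(root.findtext(f"{{{NS}}}price")),
    }


if __name__ == "__main__":
    quote = {"symbol": "IBM", "price": 12.5}
    print(decode_typed(encode_typed(quote)) == quote)            # True
    print(decode_namespaced(encode_namespaced(quote)) == quote)  # True
```

Both round-trips work, but in each case the decoder only "understands" the record because whoever wrote it already knew what `symbol` and `price` mean. The typed version spares the reader from parsing `price` itself; the namespaced version carries the domain vocabulary in the tags.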

In short, I could get code working with either approach (and will probably have to). There will be thrash, but congratulations for getting a very important ball rolling.


About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the Pacific Northwest of the U.S. I write for myself only, and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.