"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Search This Blog

Friday, October 03, 2003

What *is* XML anyway?

My somewhat random thoughts, reading through "The Impedance Imperative Tuples + Objects + Infosets = Too Much Stuff!" from Dave Thomas (the ex-OTI/IBM Smalltalk guy, not the Pragmatic Programmer guy) being discussed on Lambda the Ultimate...

"SQL is quite good for simple CRUD applications on normalized tables."

This seems to speak to OLTP. For OLAP, denormalized tables (3NF fact tables and 2NF dimension tables in a star schema) would be preferred. Still standard SQL does not support all the expressions you'd like in OLAP such as time series expressions.

For OLTP I am not convinced you want SQL at all. Something like Prevayler might be preferred. When we get large, battery-backed RAMs in a few years, we won't even care about writing transactions to disk.

"SQL programming often requires an alternative interface using cursors"

This is becoming somewhat less necessary in situations where set-based expressions are the ideal. Some databases like Teradata and Sybase IQ support set-based expressions efficiently. Even SQL Server is better at this than in previous versions.

"after many years of engineering, the relational databases can finally claim the performance and flexibility of keyed files...; network databases..."

Henry Baker has some great thoughts about this. I am kind of in the middle. One thing seems to be true, that funding for any kind of database other than relational is almost nothing. Object databases have had commercial funding, but they've been miniscule compared to the commercial relational database R&D.

What, for example, could have been done at Gemstone where indexing, query, and reporting for its OODB had well under one person year R&D during it's 20 years of development?

This has some applicability to XML too. Is XML a "random access database"? Or a "serialization" (with "includes"? with "pointers")?

"Third Generation Database Manifesto... objects... were syntactic extensions on Blobs"

Another approach in PostgreSQL and other DBs is to make tables like a "class" (whatever that is!) and one class/table can inherit from another. This is actually fairly useful for O/R mapping.

"Object databases, it was claimed, solved the impedence mismatch..."

Another note on star schemas, they simplify data models relative to 3NF models, and they partition data into dimensions, facts, and many-many relationships. Dimensions map fairly well into objects, facts map into observations or measurements among networks of objects. If you design your objects and your data with this in mind, the O/R mapping problem can be reduced for many common business (and other) scenarios.

"while there are some solutions (AS/400 and Gemstone persistent stores) that have been very successful..."

Dave gave a keynote at a Gemstone company retreat. He tried to marry Gemstone with AS/400, suggesting we could ignore the Java industry and make more money. I tend to believe him since AS/400 was already a successful niche with persistent data as a feature, and Dave was at the time with IBM (via his OTI subsidiary) and so had to have had some inside understanding of the economics.

This was the point where Gemstone in "the hopes of becoming the next Oracle" all but abandoned Smalltalk for Java/J2EE. For the next several years the Smalltalk market funded the Java development with about 3x developers for Java than ever worked on Smalltalk. I doubt the Java investment ever broke even, while Smalltalk continued to bring in revenue (at least as of a year or so ago).

As mentioned above, Gemstone hardly invested anything in query, indexing, and reporting for either Smalltalk or Java OODBs. Had the numbers assigned to Java been put into this, and perhaps the AS/400 port, not to mention the replication mechanism and servlet-like multiplexor which had just been developed on a shoestring, what could have been the result?

What if these had been developed and Gemstone purchased by IBM, which had been discussed many times even on Gerstner's floor in IBM?

"the brave new world of XML schemas and Infosets"

We'll see. Not too many business systems have been built on these yet. As mentioned above, it is not clear that XML is a random access database or a serialization or something else altogether. Nor is it clear where "includes" and "pointers" fit in. And what is a "relationship" in XML as in the relational database sense? Not entirely clear.

"It can be argued that given the ability to directly query both relational and XML data one can handle lots of problems without needing objects."

Objects are for abstractions. So are functions. So the comprehensiveness of the above statement depends on what "query" means and it depends on the query language.

"the lack of explicit XML values..."

This gets back to what is XML vs. some use of XML. Should there be one "data model" for XML? I doubt it.

"The impedence of incompatible type systems imposes..."

Everything is incompatible (e.g. "computation" and "data model" as well as "type"). An approach to some of the concerns in this paper may be better off *ignoring* XML(!), and going more into left field for potential solutions. Then those solutions may be able to be mapped back into XML for some purposes.

What *is* XML anyway? We have some relatively primitive yet widespread tools "for XML". But should this suggest our future data model, search, and computation problems are best solved "using XML", whatever myriad of mechanisms that means?

No comments:

Blog Archive

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.