"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.


Friday, December 14, 2007

Column-Oriented RDF Storage

A couple of years ago it occurred to me that a database with column-oriented storage, such as Sybase IQ, might make a reasonable database for storing RDF data, where a star schema can be seen as a way to represent related tuples together.

Now it turns out some folks have been working on such a thing... C-Store is an open source, column-oriented database. The paper "Scalable Semantic Web Data Management Using Vertical Partitioning" (pdf) discusses using C-Store for RDF...

Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance for RDF databases and consider a recent suggestion, “property tables.” We then discuss practically and empirically why this solution has undesirable features. As an improvement, we propose an alternative solution: vertically partitioning the RDF data. We compare the performance of vertical partitioning with prior art on queries generated by a Web-based RDF browser over a large-scale (more than 50 million triples) catalog of library data. Our results show that a vertical partitioned schema achieves similar performance to the property table technique while being much simpler to design. Further, if a column-oriented DBMS (a database architected specially for the vertically partitioned case) is used instead of a row-oriented DBMS, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds.
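
To make the idea in that abstract concrete for myself, here is a rough sketch of what "vertically partitioning" a pile of triples could look like: one two-column (subject, object) table per property, instead of one big three-column triples table. This is only my toy illustration in Python, not the paper's implementation and nothing to do with how C-Store actually stores data. The sample triples and the function names (`vertically_partition`, `subjects_with`, `join_on_subject`) are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, object); not from the paper's data set.
triples = [
    ("book1", "title", "Release It!"),
    ("book1", "author", "Michael Nygard"),
    ("book2", "title", "Some Other Book"),
    ("book2", "author", "Michael Nygard"),
    ("book2", "language", "en"),
]


def vertically_partition(triples):
    """Group triples into one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    # Keep each table sorted by subject, so a lookup or join only has to
    # walk the tables for the properties a query actually mentions.
    return {prop: sorted(rows) for prop, rows in tables.items()}


def subjects_with(tables, prop, obj):
    """Return the set of subjects whose `prop` has the value `obj`."""
    return {s for s, o in tables.get(prop, []) if o == obj}


def join_on_subject(tables, prop_a, prop_b):
    """Join two property tables on subject: (subject, a_value, b_value)."""
    b_values = defaultdict(list)
    for s, o in tables.get(prop_b, []):
        b_values[s].append(o)
    return [
        (s, a, b)
        for s, a in tables.get(prop_a, [])
        for b in b_values.get(s, [])
    ]


tables = vertically_partition(triples)
print(sorted(tables))                            # property names, one table each
print(join_on_subject(tables, "title", "author"))
```

A query that mentions only "title" and "author" never touches the "language" table, which is roughly the appeal: properties a query doesn't ask about cost nothing to skip.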

Thursday, December 13, 2007

Release It Again

Pete Lacey -- what he said, about Release It and Michael Nygard's blog.

If you find yourself in a panic over some centralized resource, ask whether the full costs are accounted for. What alternatives might exist for decentralizing, and how do the costs and benefits really add up over time?

The cost of operations is dropping. The cost of change is still too high for many to take advantage of that though. By the time we can get our systems onto more budget-friendly architectures... well, I guess heading in that direction puts you on the path toward even better things.

Depending on the business, if one extrapolates out from one's current position, through the point where more open/available/scalable systems are in use... well, then is this evidence that for most of us, our ultimate position is out in "software as a service/utility" land? Exactly who should be in the data center business five to ten years from now?

Another data point shows up. (Oh, and speaking of cost of change, it's in Erlang and it provides a REST API available from any language.)

From the SimpleDB FAQ...

Q: Where is my data stored?

Amazon SimpleDB stores your data in our multiple data centers in the United States. We anticipate adding other geographies over time.

Q: Does Amazon store its own data in Amazon SimpleDB?

Yes. Developers within Amazon use Amazon SimpleDB for a wide variety of projects. Many of these projects use Amazon SimpleDB as their authoritative data and query store and rely on it for business-critical operations.

So, "business-critical" seems kind of reliable. Not just scaled-out databases, but scaled-out data centers. Those are even more expensive to operate on your own.

Yahoo Flex Skin, Other Flex News

Among other recent Flash/Flex/Air/Adobe news, Yahoo released an open source skin for Flex.

Elsewhere, new versions of Flash/Flex/Air from Adobe have been released and, in some cases, newly opened up.

I don't feel compelled to use the Flex Data Services, but for those who do, or when I do, today's news should be encouraging. You can use their open source implementation, or use their open specification for an alternative, e.g. in some other non-JVM language.


About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.