Tuesday, June 30, 2009

Yet more bad news for RDBMS enthusiasts

The temperature's rising, and it's surely only a matter of time before normalisation is wiped from the developer's best-practice lexicon. Another well-written article debunking the myth here - http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/

As previously stated here - the sheer scale of internet applications has exposed the shortcomings of traditional databases in all but the most demanding environments (banks?).

My favourite section of the article is the list of things that the author will not be missing with his new solution:

[Quote (c) Bradford Stephens, Road To Failure blog]
WHAT WE’RE SCRAPPING:

* Transactions. Our data is written in from a Hadoop cluster in large batches. If something fails, we’ll just grab the HDFS block and try again.
* Joins. Nothing is more evil than normalization when you need to shard data across multiple servers. If we need to search on 15 primary fields, we’re fine with copying our data set 15 times, with each field a primary key for its table.
* Backup and Complex Replication. All of our data is imported from HDFS. If high-availability is a must, we can simply use Zookeeper to keep track of what nodes die, and then bring up a new one and feed it the data needed in ~ 60 seconds. With scales of hundreds of millions of documents, no one will miss a few hundred thousand for that brief period of time.
* Consistency. If our users are analyzing millions of documents, they’re not going to care if there’s 15,000 unique Authors, or 15,001.

Agreed - if you're a financial institution, the difference between 15,000,000,000 and 15,000,000,001 is important, but for the rest of us, it just isn't.
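For anyone wondering what "copy the data set 15 times, with each field a primary key" looks like in practice, here's a minimal in-memory sketch of the idea in Java. To be clear, this is my own illustration, not Bradford's implementation, and the field names are invented - the point is simply that writes fan out to one copy per search field, while reads become a single key lookup instead of a join.

import java.util.*;

// A minimal in-memory sketch of "one copy of the data per search field".
// The Document fields (author, domain) are invented purely for illustration.
public class DenormalisedStore {

    static class Document {
        final String id, author, domain, body;
        Document(String id, String author, String domain, String body) {
            this.id = id; this.author = author; this.domain = domain; this.body = body;
        }
    }

    // One complete copy of the data set per primary search field.
    private final Map<String, List<Document>> byAuthor = new HashMap<String, List<Document>>();
    private final Map<String, List<Document>> byDomain = new HashMap<String, List<Document>>();

    // Writes fan out: the document is stored once per index, so storage
    // and write cost scale with the number of search fields.
    public void put(Document doc) {
        index(byAuthor, doc.author, doc);
        index(byDomain, doc.domain, doc);
    }

    private static void index(Map<String, List<Document>> copy, String key, Document doc) {
        List<Document> docs = copy.get(key);
        if (docs == null) {
            docs = new ArrayList<Document>();
            copy.put(key, docs);
        }
        docs.add(doc);
    }

    // Reads are a single key lookup against the right copy - no join,
    // no cross-shard query.
    public List<Document> findByAuthor(String author) {
        List<Document> docs = byAuthor.get(author);
        return docs == null ? Collections.<Document>emptyList() : docs;
    }

    public List<Document> findByDomain(String domain) {
        List<Document> docs = byDomain.get(domain);
        return docs == null ? Collections.<Document>emptyList() : docs;
    }

    public static void main(String[] args) {
        DenormalisedStore store = new DenormalisedStore();
        store.put(new Document("1", "alice", "example.com", "..."));
        store.put(new Document("2", "bob", "example.com", "..."));
        System.out.println(store.findByDomain("example.com").size()); // prints 2
    }
}

The trade-off is plain: you pay in storage and write amplification to get reads that shard cleanly.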
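The Zookeeper trick is similarly small in outline. Here's a rough sketch using the stock Zookeeper Java client - the /nodes path, the timeout and the "re-feed from HDFS" step are my assumptions, not anything the article specifies. Each data node registers an ephemeral znode, a supervisor watches the children, and when a node's session dies its znode disappears, the watch fires, and a replacement can be brought up and fed its data.

import org.apache.zookeeper.*;
import java.util.List;

// Rough sketch only: the /nodes path, the 5 second session timeout and the
// "re-feed from HDFS" step are assumptions, not anything from the article.
// Uses the standard org.apache.zookeeper Java client.
public class NodeWatcher implements Watcher {

    private final ZooKeeper zk;

    public NodeWatcher(String connectString) throws Exception {
        zk = new ZooKeeper(connectString, 5000, this);
    }

    // Called by each data node at startup. The ephemeral znode exists only as
    // long as this node's session does (the /nodes parent must already exist).
    public void registerSelf(String nodeName) throws Exception {
        zk.create("/nodes/" + nodeName, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // Called by the supervisor: fetch the live node list and re-arm the watch.
    public List<String> watchLiveNodes() throws Exception {
        return zk.getChildren("/nodes", true);
    }

    // Fires when a node joins or dies. A real supervisor would diff the new
    // child list against the last known one, bring up a replacement for any
    // missing node and feed it its slice of the data from HDFS.
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            System.out.println("Node membership changed under " + event.getPath());
        }
    }
}

Because the znodes are ephemeral, there's no heartbeat table to maintain - a dead session simply means a dead node.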

1 comment:

Bradford said...

Thanks a lot for the link to my article -- I definitely enjoyed your summary and thoughts. Stay tuned for another...