Tuesday, June 30, 2009
As previously stated here - the sheer scale of internet applications has exposed the shortcomings of traditional databases in all but the strictest environments (banks?).
My favourite section of the article is the list of things that the author will not be missing with his new solution:
[Quote (c) Bradford Stephens, Road To Failure blog]
WHAT WE’RE SCRAPPING:
* Transactions. Our data is written in from a Hadoop cluster in large batches. If something fails, we’ll just grab the HDFS block and try again.
* Joins. Nothing is more evil than normalization when you need to shard data across multiple servers. If we need to search on 15 primary fields, we’re fine with copying our data set 15 times, with each field a primary key for its table.
* Backup and Complex Replication. All of our data is imported from HDFS. If high-availability is a must, we can simply use Zookeeper to keep track of what nodes die, and then bring up a new one and feed it the data needed in ~ 60 seconds. With scales of hundreds of millions of documents, no one will miss a few hundred thousand for that brief period of time.
* Consistency. If our users are analyzing millions of documents, they’re not going to care if there’s 15,000 unique Authors, or 15,001.
Agreed - if you're a financial institution, the difference between 15,000,000,000 and 15,000,000,001 is important, but for the rest of us, it just isn't.
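To make the "copy the data set once per search field" idea from the quote concrete, here is a minimal Python sketch. It is my own illustration, not code from the article: the sample documents, the field names and the `build_field_tables` helper are all hypothetical, and plain in-memory dictionaries stand in for the separate tables. Because a field such as an author name is not unique, each key maps to a list of document copies rather than to a single row.

```python
from collections import defaultdict


def build_field_tables(documents, fields):
    """Build one lookup table per field, each holding its own copy of the documents."""
    tables = {field: defaultdict(list) for field in fields}
    for doc in documents:
        for field in fields:
            # Store a full copy under this field's value, so a lookup
            # never has to join against another table.
            tables[field][doc[field]].append(dict(doc))
    return tables


# Hypothetical sample data, purely for illustration.
documents = [
    {"id": 1, "author": "alice", "topic": "hadoop"},
    {"id": 2, "author": "bob", "topic": "hadoop"},
    {"id": 3, "author": "alice", "topic": "zookeeper"},
]

tables = build_field_tables(documents, ["author", "topic"])
alice_docs = tables["author"]["alice"]   # one lookup, no join
hadoop_docs = tables["topic"]["hadoop"]  # same data, keyed differently
```

The trade-off is the one the quote accepts: storage grows with the number of searchable fields, but every query becomes a single key lookup that can be sharded by that key.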
Tuesday, June 23, 2009
Something that has really resonated with me over recent weeks is the concept of the graph database. I’ve spent most of my professional career railing against RDBMS software and the frustration of database cost/scale/performance, and although graph databases (or key-value databases) won’t solve the database dilemma, it’s very encouraging to find such a vibrant community of experts trying to tackle these issues.
Here is a great presentation which introduces the concepts - http://markorodriguez.com/Lectures_files/risk-symposium2009.pdf
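For anyone who prefers code to slides, below is a rough Python sketch of the basic idea as I understand it: relationships are stored directly against each node, so following them is a walk along edges rather than a series of joins. This is not taken from the presentation; the sample graph, the names in it and the `within_hops` function are hypothetical, and a real graph database adds persistence, indexing and a query language on top.

```python
from collections import deque

# Hypothetical graph: each node maps directly to the nodes it points at.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def within_hops(graph, start, max_hops):
    """Return every node reachable from start in at most max_hops steps, with its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


reachable = within_hops(graph, "alice", 2)
```

In a relational schema the same "friends of friends" question would typically need one self-join per hop, which is part of what makes the graph model appealing for this kind of data.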
Monday, June 08, 2009
I know none of these is exactly new, but I think the ideas behind them are being shared within the broader community these days, and that can only be a good thing.
Tuesday, June 02, 2009
We’ve been using Basecamp to collaborate on our team for the past year, and one of the things that it highlights is how incredibly poor an experience email provides. A typical email thread (say 10 replies) can encompass a number of people who are cc’d in on (or dropped from) any single message, making retrospective auditing of a decision very, very complex. (In fact it’s impossible if you weren’t on the critical email in the chain.)
Having got used to Basecamp, we now use its Messages function in preference to email precisely because it provides a single conversational thread where anyone can see the decisions being made in chronological order, irrespective of when they joined.
One of the features of Basecamp we haven’t really got comfortable with is the Chat feature, which is functionally equivalent to the Messages, but in ‘real’ time – i.e. it’s a better IM, where Messages are a better email.
Google Wave seems like a better Chat and a better Messages function, combined in one. I have no idea if it’s a ‘killer app’, and some of the initial press has suggested it’s just too ambitious, too complicated for non-technical users, but I for one applaud Google’s ambition in at least addressing the problem. Email is well past its sell-by date, and I think that for tech-savvy power users Wave (or an equivalent) could become a de facto communication medium.