Thursday, October 29, 2009

Is Google Evil?

Hot on the heels of one giant crushing the ambitions of smaller companies (Amazon’s MySQL solution) comes another – Google’s destruction of the sat-nav market with their latest announcement (“Free Google sat-nav shakes market”).

Enthusiastic free-marketeers would probably say that the addition of Google into the market will spur Garmin and TomTom to greater innovation and ultimately a better deal for the consumer, but it can’t be pleasant having the rug pulled from under you like that.

At least in the good ol’ days they had the decency to buy your company first and then destroy, sorry, integrate it.

Tuesday, October 27, 2009

Amazon marches on

Why would anyone host or manage their own infrastructure these days? See http://aws.amazon.com/about-aws/whats-new/2009/10/27/introducing-amazon-relational-database-service/

So now Amazon offer storage (S3), compute power (EC2), relational databases (RDS), non-relational databases (SimpleDB), queueing (SQS), Hadoop (Elastic MapReduce), people (Mechanical Turk - for when computers just don’t cut it), and connectivity into your own infrastructure via VPN. Really, if I owned a hosting company I’d be worried, and if I ran a start-up I’d look no further.

Apparently they sell a bunch of stuff as well - http://phx.corporate-ir.net/phoenix.zhtml?c=176060&p=irol-newsArticle&ID=1345413&highlight=

Thursday, October 22, 2009

Raindrop

Mozilla’s labs group have released some information on a messaging solution codenamed Raindrop. It runs on CouchDB, and seems vaguely related to the whole APML (attention profiling markup language) movement – allowing users to sift through the daily dump of information by applying qualitative filters to it (e.g. emails from my family are more important to me than Tweets from some celebrity I happen to follow – that sort of thing).

I don’t really have an opinion on it as yet, but given how much I dislike email, and how disappointed I am with Wave, it’s good to see life in this area.

Monday, October 19, 2009

“Polyglot Persistence”

As I cycled home this evening I was thinking to myself that perhaps a mixed-database environment might be the best approach. I was trying to work out how I would re-engineer our platform given the opportunity, and it’s clear that whilst some areas are ripe for the NoSQL upgrade, others, notably transaction tables, audit logs etc., are much better suited to strongly-typed relational databases.

As with everything I seem to do at the moment, I’m definitely a follower, as no sooner had I fired up my browser than I came across the following: http://johnpwood.net/2009/09/29/using-multiple-database-models-in-a-single-application/

It’s a great article, articulate and intelligent, so go read it…
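
To make the mixed-database idea concrete, here is a minimal sketch, not a real design. It assumes an in-memory SQLite table stands in for the relational side (the transaction table) and a plain dict of JSON strings stands in for the document store. The table, column and function names are all made up for illustration.

```python
import json
import sqlite3

# Relational side: a strongly-typed transaction table (in-memory SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " id INTEGER PRIMARY KEY,"
    " account TEXT NOT NULL,"
    " amount_pence INTEGER NOT NULL)"
)

# Document side: a dict of JSON strings standing in for a schema-less store.
documents = {}


def record_transaction(account, amount_pence):
    """Insert a row into the relational store and return its id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO transactions (account, amount_pence) VALUES (?, ?)",
            (account, amount_pence),
        )
    return cursor.lastrowid


def account_total(account):
    """Sum all transaction amounts for one account (0 if there are none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_pence), 0) FROM transactions WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]


def save_document(key, doc):
    """Store any JSON-serialisable dict under a key; no schema is enforced."""
    documents[key] = json.dumps(doc)


def load_document(key):
    """Return the stored dict for a key."""
    return json.loads(documents[key])
```

The point of the split is that the money-related rows get types and constraints, while the loosely structured data can change shape without a migration.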

Thursday, October 15, 2009

Map/Reduce & the Mechanical Turk

So, I have a project that I have wanted to get off the ground for a long, long time, which involves people solving a problem that computers seem incapable of – making sense of the entertainment industry’s metadata mess. It’s a disaster, no one can match anything to anything with any degree of confidence, and even companies whose raison d'être is value-added metadata don’t seem capable of getting it right. I don’t entirely blame them, as having worked at the sharp end for a number of years I know how difficult it is.

Except that it isn’t really – at least not for humans. Computers can’t do it because IDs don’t match and there’s very little fixed structure. It’s a schema-less nightmare. The only significant effort I’ve seen at creating a universal schema was hopeless. (Unfortunately I was supposed to be managing it at the time!) And yet it’s quite easy to match assets to metadata across formats (digital, physical etc.) as a human. We can match images and sounds, we can do loose / fuzzy text matches, and above all we have common sense.

The problem for us people is the scale of the problem – tens of millions of assets need matching – which superficially appears best-suited for tackling programmatically. So how can we reconcile the requirement for human intervention with a problem of vast scale?

This is a map/reduce problem at its heart – we need to spread the work across as many people as we can, and then aggregate the results. Is the Amazon Mechanical Turk the solution?

Add in the spice of having no fixed schema (what happens to your precious database when the music industry decide to create a new product type that looks a bit like an album, but different?) and it’s a problem for the NoSQL generation.

So here’s my solution – stick all the available data into a non-relational document store, index with a search engine, and then present a simple user interface to allow people to validate the metadata and to perform the all-important matching process. Finally, motivate people to do the work by paying them, and use the Mechanical Turk to manage the human map/reduce function.
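
A rough sketch of the first half of that pipeline, under heavy assumptions: a dict stands in for the document store, a naive word index stands in for the search engine, and fixed-size batches stand in for the tasks handed out to people. The record contents and every name below are invented for illustration.

```python
from collections import defaultdict

# Two records with different shapes; nothing enforces a common schema.
documents = {
    "doc-1": {"type": "album", "title": "Example Album", "artist": "Example Artist"},
    "doc-2": {"type": "bundle", "title": "Example Album Deluxe", "formats": ["digital", "vinyl"]},
}


def build_index(docs):
    """Map each lower-cased word found in any string field to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for value in doc.values():
            if isinstance(value, str):
                for word in value.lower().split():
                    index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


def make_batches(doc_ids, size):
    """The 'map' step: split the work into batches, one batch per human task."""
    return [doc_ids[i:i + size] for i in range(0, len(doc_ids), size)]
```

A worker shown "doc-1" could search for "example album", see "doc-2" as a candidate, and confirm or reject the match.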

Some kind of validation is required to maintain the data quality (only accept matches provided by multiple people?) – who knows, perhaps if enough people join the labels / studios themselves might get involved to officially endorse the work (think Twitter verified accounts.)
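
The "multiple people" rule is the reduce half of the human map/reduce. One possible sketch follows; the threshold of two, the tie handling and all names are assumptions rather than a worked-out policy. Each worker gets one vote per asset, and a match is accepted only if enough distinct workers agree and no other answer is tied with it.

```python
from collections import Counter, defaultdict


def reduce_votes(submissions, min_agreement=2):
    """Aggregate (asset_id, worker_id, metadata_id) tuples into accepted matches.

    Returns {asset_id: metadata_id} for assets whose most popular answer was
    given by at least `min_agreement` distinct workers and is not tied.
    """
    # asset -> worker -> answer; a worker's later answer replaces an earlier one.
    votes = defaultdict(dict)
    for asset_id, worker_id, metadata_id in submissions:
        votes[asset_id][worker_id] = metadata_id

    accepted = {}
    for asset_id, by_worker in votes.items():
        ranked = Counter(by_worker.values()).most_common(2)
        top_answer, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if top_count >= min_agreement and not tied:
            accepted[asset_id] = top_answer
    return accepted
```

Anything that fails the check could simply be sent out again as a fresh task until enough people agree.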

All I need now is someone to pay to have it done…

[UPDATE]

I’ve just done my first couple of HITs (Human Intelligence Tasks) – looking up iTunes AudioBook prices for someone – hopefully I now have $0.04 winging its way across to me. Here’s a screencast of me in action! http://screencast.com/t/GazhIehoEW

Saturday, October 10, 2009

Riak – another No-SQL option

Just reading about Riak – another non-RDBMS solution, which pushes all of the right buttons:

  • Key-value store
  • Document-based
  • Extensible (in runtime) schema
  • Flexible inter-object links (i.e. relationships)
  • Includes Map/Reduce functions for data queries
  • Natively accessible over HTTP
  • Syntactically Get-Put-Delete, not CRUD
  • Deterministic & repeatable ID generation
  • Shares some concepts with Amazon’s Dynamo
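
A couple of those points (Get-Put-Delete, and deterministic, repeatable IDs) are easy to illustrate in a few lines. This is a generic in-memory sketch and not Riak's API; the class, the function names and the choice of SHA-1 over "bucket/natural id" are assumptions made purely for illustration.

```python
import hashlib


def make_key(bucket, natural_id):
    """Derive a key from the data's own identity, so the same input always yields the same key."""
    return hashlib.sha1(f"{bucket}/{natural_id}".encode("utf-8")).hexdigest()


class KeyValueStore:
    """A toy store exposing only get, put and delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Create or overwrite; there is no separate 'create' versus 'update'."""
        self._data[key] = value

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key):
        """Remove the key if present; deleting a missing key is not an error."""
        self._data.pop(key, None)
```

Because the key is repeatable, putting the same object twice overwrites it rather than creating a duplicate, which is what makes a plain put safe to retry.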

All-in-all it seems on the face of it to be a data persistence solution built for the internet. Details can be found here - http://riak.basho.com/nyc-nosql/ – though be warned, this presentation includes lots of diagrams like this:

[Image: slide08]

Wednesday, October 07, 2009

Measurable quality

Whilst I was boring a (City-based) friend with my thoughts, he pointed out that in the City the highest paid people are often not the bosses, but the star traders. So now we have another model to investigate – the bonus culture, where workers are openly rewarded according to their contribution. Either way, it’s possible to earn a (very) decent living by continuing to practise the very thing that made you successful in the first place, with the business “management” being taken care of by people trained in their own way to do just that. (And having no greater status than those doing the work.)

The most successful lawyers continue to practise the law, and the most successful traders continue to trade. One could argue that this is because what they do generates enough money to make this an attractive proposition, whereas software development does not. And yet… software is surely the biggest growth industry of the last 25 years – it has literally appeared out of thin air, and it has created some of the largest personal fortunes ever seen. An article I read a couple of years ago (wish I could find it) had some statistics about billionaires that included a summary of those who could “program a computer”; let’s just say that there were more computer programmers in the list than lawyers or accountants. So what gives?

Back in the real world, most software programmers are neither billionaires, nor working for billionaires, but the question of how to make a respectable living still exists. Of course, the great advantage that City traders have is that their contribution is measured in numbers* – which can be ranked and rated. Which raises the question: how do you evaluate a developer’s contribution to a company’s success?

If you started a company that became SAP (as did one of the programmers in the list), and therefore had both money to spend, and the ability to recognise technical talent, how would you measure it? Do you even bother – is one person’s line of code the same as another’s?

Not everyone is equal, we all know that, but how do you prove it? And if it’s not possible to prove it objectively, does that make software development art and not science?

* They also have the huge advantage of having “make more money” as the sole focus of their job.