Wednesday, December 08, 2010

Shoreditch – partying like it’s 1999

I’m a bit late on this one, but when I read about this on holiday I thought I was dreaming. Silicon Roundabout has apparently become mainstream, but this is very, very old news (even in real, i.e. not-internet, time).

I went to my first Old Street rooftop digital / web collective launch party in 1999 – Hoxton Square was the place during the original dotcom bubble. was headquartered down the road in Farringdon, I went to an interview at Gorilla Park Ventures in a converted warehouse near Spitalfields – the idea that the tech community has suddenly converged on the strip running from Farringdon through Old Street to Shoreditch is pure marketing fluff.

Perhaps we’re just waking up after the 10-year hangover.

Friday, November 05, 2010

Web Forms (1): Validation

This post is more of a note-to-self than anything. Part one of a two-parter, concentrating on web form processing – first from a server-side point of view, and then later, from a user-centric client-side point of view. The aim is to document a pattern for forms processing that is consistent and repeatable for any website. Think PRG (Post-Redirect-Get), with some server-side detail added in for good measure. PRG+.

In this post, I just want to focus on server-side data validation, and how to deal with the different types of invalid data entry. The first point to note is that you should never make any assumption about the origin of the data. You must ignore any notion of client-side validation – the data that your controller receives as the request could have come from anywhere, and may contain anything. Data is sent over the wire as text key-value pairs, and should therefore go through a number of validation steps:

  • Checking the data type – everything arrives as a string over the wire, so needs checking
  • Simple data type logic – is an email in a valid format, is a date of birth set in the past
  • Domain logic – is an email / username a duplicate, does a product exist, is the price ‘right’?

Only at this point can you attempt to process the form with any confidence.
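
As a rough illustration of the three steps above, here is a minimal Python sketch. The function names, the field names (email, dob), the date format and the set of existing emails are all assumptions made for the example; they are not taken from any particular framework.

```python
import re
from datetime import date, datetime

# Deliberately loose pattern: it only checks the general shape of an address.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_date(text):
    """Step 1: data type check. Return a date, or None if the text is not one."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def validate_registration(form, existing_emails, today=None):
    """Run the three validation steps in order and return the failed field names."""
    today = today or date.today()
    failures = []

    # Step 1: data type - everything arrives as text, so parse it first.
    dob = parse_date(form.get("dob"))
    if dob is None:
        failures.append("dob")

    # Step 2: simple data type logic - format and range checks.
    email = form.get("email", "")
    if not EMAIL_SHAPE.match(email):
        failures.append("email")
    if dob is not None and dob >= today:
        failures.append("dob")

    # Step 3: domain logic - only worth running once the cheap checks pass.
    if not failures and email.lower() in existing_emails:
        failures.append("email")

    return failures
```

The domain check is skipped whenever the cheaper checks fail, which mirrors the idea that only the first two steps have a client-side equivalent.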

When a rule fails, there are a couple of different ways in which the user can be notified:

  • Return to the form, and allow them to amend the invalid data
  • Proceed to a new page, with appropriate messaging (e.g. PRG)

I will go into more detail around this side of the process in the next post (when to redirect, how to notify users etc.), however one thing I would highlight here is the difference between recoverable and unrecoverable errors: if the data fails validation before any changes are made server-side, then I believe the user should be alerted via a warning message and not an error message. Errors should indicate that something went wrong, and if a form fails validation then the user should always have the option to amend their data and resubmit it. In project management terminology, warnings indicate a risk (something that could go wrong if mitigating action is not taken), whilst errors indicate an issue – something that has gone wrong already.

I also believe that there is a difference between the simple data validation and the contextual business domain validation, and that only the first type should ever be replicated client-side (ignoring AJAX for now – more of that next time). In an MVC world, I think the first type can be done by the controller, and that this validation should match client-side validation, whilst the second type should be done deeper into the model and / or application domain. (Assuming that the controller is just that – a controller – and that it delegates the ‘doing’ to other components.)

Below is some pseudo-code demonstrating what I believe** to be the ideal processing and validation for a sample form request handler.


    /* first do some 'dumb' data type validation of input values – remember
    that all request values are passed over the wire as text, so they need to
    be validated according to the basic destination data type. At this point
    client-side validation should be ignored – we don’t know that the information
    was submitted using the form – we simply know which controller was called. */
    if (! isEmail(request.email))
        view-data.errors.add(new InvalidPropertyException("email"))
    if (! isZipcode(request.zipcode))
        view-data.errors.add(new InvalidPropertyException("zipcode"))
    if (! isDate(request.dob))
        view-data.errors.add(new InvalidPropertyException("dob"))

    /* if we have any errors so far then don't bother continuing, return to the
    form and prompt the user for new values. This is a WARNING, and not an ERROR,
    as nothing has been changed server-side, and the user can always amend the
    values and resubmit. This is the server-side equivalent of client-side JS
    validation. */
    if (view-data.hasErrors)
        return form-view    // redisplay the form with the warnings

    /* if we get here then we know that the values are 'correct' but that doesn't
    mean they will work. This next validation step may require more context than
    the simple validation above – it is not something that can (or should) be
    replicated in client-side JS. e.g. is the price in the acceptable range, is the
    username available. */
    try
        model = new model(request.email, request.zipcode, request.dob, request.price)
        doSomethingWithModel(model)

    /* exception thrown by any property setter that doesn't like the value it's
    given; this is functionally equivalent to the case above - nothing has really
    happened so log this as a WARNING, and not an ERROR */
    catch (InvalidPropertyException)
        log WARNING, return form-view

    /* exception thrown by the doSomethingWithModel method that occurs BEFORE
    anything has been committed, and whilst there is the opportunity to resubmit
    the data. An example of this might be an attempt to register a duplicate
    username; this can be marked as a WARNING or an ERROR, depending on context. */
    catch (RecoverableException)
        log WARNING or ERROR, return form-view

    /* exception thrown by the doSomethingWithModel method after data has been
    irreversibly committed. In this case resubmitting the data is not desirable,
    and the user should be alerted! An example of this might be a database
    exception after the record has been partially committed. */
    catch (UnrecoverableException, AnyOtherUnexpectedException)
        log ERROR, return error-view

    // phew - if we got here it all went well, so we render the anticipated page
    return success-view

** It would be fair to say that my views on this aren’t universally accepted – comments welcome.

Wednesday, October 20, 2010

Guardian Open Platform – IT, or Innovation?

Earlier this week I attended the Enterprise Search Meetup that was held at the Guardian’s office, and entitled “Search at the Guardian”.

It was a great event, lots of interesting details and a good insight into the Guardian, who, through their Open Platform initiative, seem to be running away from the rest of Fleet Street in terms of innovation and open data. Who knows, perhaps they even “get” online?

Plenty of others have written reviews of the event, and most of the speakers themselves blog regularly about the Open Platform, so I won’t go into that here – suffice to say that, in the words of Martin Belam, “Open Data is basically one big advanced search query”.

What I found most interesting about the evening was the philosophy / approach behind the initiative, something that I believe is neatly illustrated by their choice of search software.

The Guardian held a search “bake-off” when they had to decide whether to upgrade their Endeca implementation to a new version, and they invited Endeca, FAST and Apache Solr. The eventual winner, and apparently this was a decision that ignored commercial factors, was the ‘free’ one – Solr. And why – well, because when it came down to it, Solr was more developer-friendly, and they knew that when they started to push the boundaries of the product, they could either reach out to the community, or, heaven-forbid, do it themselves.

This is not a rant about Open Source Software, however, as I’m not sure that’s the point. I think the Guardian is probably spending more on people tweaking their free software by hand than they would have spent on the licence fee for the expensive commercial software equivalents.

To me, the point is this: on the plus side, the commercial product comes with a supportable SLA, a sense of security around the product vendor, and a contract, which allows the user to hold the vendor to account in the event of failure. It also comes with a set of documented tools and user manuals – a product is designed to slot into an operational model.

On the downside, it is very hard to innovate around a commercial product – if it doesn’t do what you want it to, you simply have to sit and wait for the next release.

To an IT-style “CIO”, this is perfect – a fixed contract allows them to manage costs and budget for the year ahead – as well as safe-guarding their position (it would be hard to fire someone for choosing to buy Endeca over Solr in a large organisation). It reinforces the idea of technology as an operational cost, to be managed down. This is the essence of “IT”.

The alternative approach is to invest that same money in people – clever people, empowered to innovate. For the same amount of money they are now treating technology as a competitive advantage, something that can be used to put clear water between themselves and their peers. It’s a huge gamble, and the person responsible almost certainly will lose their job if it doesn’t pay off.

In answer to my question on the night - “What is the commercial model, and why would anyone pay to support this initiative”, it was clear that whilst there is no direct commercial model as such (it’s free, after all) they were already seeing benefits. As Stephen Dunn (whose decision it may have been) pointed out, by opening up their platform they are capable of supporting far more partners than they would have been by trying to integrate them into their core systems. (I’m not sure this was the original design goal, but I may be wrong.)

A brave initiative from the Guardian, and one worth supporting.

Tuesday, October 19, 2010

DNS Propagation

We all tell clients that DNS changes take 24-48 hours to propagate, but we very rarely get to see any evidence of this. Well, now I have a nice chart showing just this.

A website that I am, ahem, involved with, went offline for a period of time earlier this week on account of a ‘miscommunication’ with the company hosting the DNS name servers. As a result, they removed the DNS record from their servers, and the domain simply disappeared from the internet. The solution was pretty simple – update the name servers to the current hosting provider (the ever reliable Rackspace, who I have no hesitation in plugging – their service desk is fantastic). Then all I had to do was sit back and wait.

Fortunately, at the same time the site was being monitored by the equally reliable Pingdom, who monitor the site from 25 locations in 10 countries across Europe and North America (not representative of the world, I know, but enough to make this interesting).

This meant that I could watch the propagation of the DNS change as it was picked up by their servers. Again, this is not a scientific survey, but interesting nonetheless.


Thursday, September 30, 2010

User-centred design

In a follow-up to my previous post on admin interfaces, I thought I’d post on something that happened the other day on the current project.

As part of the ecommerce site we’re building we have a dedicated service for managing email notifications (registration, order confirmation, lost password etc. – about 50 in total), and during development we’ve been using Google’s SMTP service to do the sending itself. A couple of days ago I wanted to find out whether an email had been sent (as it hadn’t been received), and the easiest way to do that was to log in to the associated Gmail account and look into the Sent Items folder.

Turns out, the best logging / admin tool for an email service is, well, an email service. You can search through the emails, filter them, track them, configure various options and more. It just works.

At the same time, whilst thinking through the process for website administrators to take down certain products, it struck me that the solution is not to give them access to a special website from where they can use the Take-Down function (for which they will need the appropriate training and documentation); it’s to put a Take-Down button on the website product page that is only visible to administrators. No “process re-engineering” required.

This user-centred design concept may yet catch on…

(Of course, it can get a bit out of hand – qv the “customer journey re-engineering manager” comment in this article – rather embarrassingly I think I know the people involved!)

Sunday, September 26, 2010

HTTP self-service

HTTP is all around us – it has become the de facto transport protocol of choice; web servers may be the primary source, but routers / access points have used it for a long time, databases have begun to use it (e.g. CouchDB, RavenDB), and now even console apps have taken to its user-friendly interface. We’ve been using Mercurial on our latest project, which is a fantastic application in its own right, and despite being essentially a console application, it too contains its own HTTP server. Type in “hg serve” at the command line, and it will start up a single-use dedicated HTTP server, which provides a great online user interface.

I won’t go into the details (you can try it yourself), but it did set me thinking. Background applications running on a server have always suffered from the lack of a user interface, but embedding an HTTP server in the application would allow you to offer this**. Moreover, within a locked-down production environment you could provide network access to an application using a protocol that sysadmins understand.

Digging a little deeper, I came across Kayak – a lightweight .NET HTTP server, which would allow just this. It should be possible to use this (or an equivalent) to bundle an HTTP server in with your services to serve up things like: logs, service status (e.g. databases up, network locations running) & configuration information. This would also allow you to manage / update configuration settings at runtime, and to combine multiple services into a simple management console.
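
To make the idea concrete, here is a minimal sketch of a background service that exposes its own status page. It uses Python’s standard-library http.server rather than Kayak, purely because it needs no extra dependencies; the BackgroundService class, its fields and the /status path are hypothetical names invented for the example. It binds to 127.0.0.1 only, in keeping with the internal-only caveat.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class BackgroundService:
    """Stand-in for a long-running worker; the fields are invented for illustration."""

    def __init__(self):
        self.jobs_processed = 0
        self.database_up = True

    def status(self):
        return {"jobs_processed": self.jobs_processed, "database_up": self.database_up}


def make_handler(service):
    """Build a request handler class bound to one service instance."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status":
                self.send_error(404)
                return
            body = json.dumps(service.status()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; a real service would log requests

    return StatusHandler


def start_status_server(service, host="127.0.0.1", port=0):
    """Serve the status page on a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(service))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The service would call start_status_server(service) once at start-up and carry on with its normal work; the same handler could be extended to serve logs or configuration in the same way.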

Ultimately, it would be great to provide a mechanism for uploading software updates – in the same way that you update router firmware – navigate to a URL, upload an update package, and then restart the service. Add in scripting support, and it would make management of a multi-server environment much, much simpler, without the need for an expensive System Centre type application.

I’m so convinced that this is a good idea that I’m going to set up a project to demonstrate its application, so watch this space.

** NB – when I say “user” interface, I am obviously talking about the tech-team, not end-users. I’m not advocating opening this kind of interface to the internet – this is internal only!

Monday, September 13, 2010

Whatever happened to XSLT?

Many, many years ago I built a website that used XHTML/XSLT to render all of the pages (this was back in the old-school ASP days). It worked a treat, and I’ve had a certain regard for XSLT ever since.

Which is why I was curious when some colleagues were tasked with building an email template system for an ecommerce system (large ecommerce systems can have dozens of email templates covering things like order confirmation, registration, forgotten password etc.) and then decided to use their own custom templating language.

Surely this is what XSLT was invented for – taking one block of XML (e.g. order details) and transforming it into another (an XHTML representation of the order)? “Too hard” came the reply; “it’s another language to learn” was another.

I wondered if this was a wide-spread complaint, so did a quick Google, and it appears as if XSLT has dropped into a black hole. It’s either so ubiquitous that no one talks about it anymore (it just works), or people are genuinely turned off by it.

It’s a shame, because with all of the fuss about DSLs these days, XSLT deserves an honourable mention.

Sunday, August 22, 2010

Google’s “High Performance Image Serving”

[Update] I stand corrected - I do now have billing enabled on my Apps account, and can confirm that images are served with all the correct response headers set. The URLs do indeed support 304 conditional GETs on the production infrastructure, which makes this a very attractive image hosting solution. Well done Google, apologies for the mis-representation.

(NB – this article is based on the development SDK (1.3.6) – I haven’t been able to test on the production infrastructure as yet as for some reason Google won’t authorize billing on my AppEngine account, without which the Blobstore is unavailable!)

The concept of a hosted service that manages image serving / caching, resizing & cropping is something that has cropped up in projects I’ve been involved with for the past 5 years (and the rest).

I had a hand in a startup about 6 years ago now that processed MMS (picture messages) that people sent in. At its simplest this involved storing people’s uploaded images, and then cropping / rotating and resizing the images to fit a certain profile (we were printing them out, and needed to convert landscape to portrait, and crop to a fixed aspect ratio.)

Subsequent to that experience I worked for many years in online digital entertainment, developing a platform for the processing of music / video assets, and one of the elements we also needed was the ability to take in hi-res images, shrink to an acceptable size / format (e.g. not TIFF), and then host them for delivery across a CDN.

The project I’m currently working on has a combination of the two – processing user-generated content for serving back over the web. What we need is an image processing service, backed with a large data store, and a high-performance cache.

So, it was with great excitement that I noticed that the Google App Engine SDK comes with an in-built library to do exactly this. It’s built on top of the Picasa library (it even includes the “I feel lucky” transformation), and enables cropping, resizing, rotation etc. The App Engine platform has no file-backed storage, but the datastore does include the BlobProperty type, which can be used to store binary data (such as images). A simple image processor using this took about fifteen minutes to set up (which was mainly cut-and-paste from their sample app here).

(Some people may by now be thinking “Tinysrc” – the online service that resizes images for mobile screens – well, no surprises, tinysrc runs on AppEngine – this is precisely how they do it, except that they pull the images from a remote server – they are not stored.)

The downside of this approach is that the datastore has a 1MB limit per entity, which makes it borderline useful if you’re dealing with UGC (web-optimised images should never be 1MB, but the image someone just uploaded from their new digital camera could easily be.)

Fortunately, Google provides a secondary datastore specifically for large binary objects, called the Blobstore. It’s well documented (here), so I won’t repeat that, but what I can say is that it integrates directly with the Image api (see here). There are some complex limits about the amount of data you can process, so read the article carefully, but suffice to say it can be done. (See here for a nice example of the blobstore / image api interaction.)

A killer function, which has been publicised this last week (as “High Performance Image Serving”), is the “get_serving_url” function in the Images api – which takes in a Blobstore object key, and returns a fixed URL that can be used as the static image URL. This looks almost like a Google CDN – the ability to serve images as static content, with the ability to crop / resize on the fly (albeit using fixed sizes) thrown in for free.

[Updated – see intro] And yet… if you set up an image service using these amazing (and practically free) resources, you’ll find a fairly large hole in the implementation. It’s our old friend HTTP status codes. The fixed URL exposed by the images service does not support a 304 (content unmodified) status – meaning that every time you call for it, you get the whole thing, increasing server bandwidth and client download times. (See introductory note – this may just be a development server issue – TBC.)

I can only assume that this is deliberate – as Google gets its money from the bandwidth charge. It is however extremely annoying.
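
For anyone unfamiliar with conditional GETs, this is roughly what a 304-aware image endpoint does, sketched in Python. It is a generic illustration and not how Google’s service is implemented; the function names, the choice of a content hash as the ETag, the cache lifetime and the image/png content type are all assumptions.

```python
import hashlib


def make_etag(image_bytes):
    """Derive an ETag from the content, so it changes only when the image does."""
    return '"' + hashlib.sha256(image_bytes).hexdigest() + '"'


def serve_image(image_bytes, request_headers):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(image_bytes)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=86400"}

    # The client echoes back the ETag it already holds; a match means "unchanged".
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""

    # No match: send the full image.
    headers["Content-Type"] = "image/png"
    headers["Content-Length"] = str(len(image_bytes))
    return 200, headers, image_bytes
```

The first request pays for the full download; repeat requests that send the ETag back cost only a set of headers.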


Saturday, August 21, 2010

Cloud computing – where’s the PHP platform?

As the fog begins to lift from the world of cloud computing the classification of cloud services is becoming clearer. Whilst the likes of Salesforce and Google Docs set the running in the Application space (Software-as-a-Service), and Amazon was the clear leader in cloud infrastructure (IaaS), the most interesting area (at least for me) is the Platform-as-a-Service (PaaS).

PaaS has been around for a while, and offers some compelling advantages over IaaS for the software development community. Virtualisation (the basis for IaaS) might be cost effective, but from the point of view of the software developer it’s still an O/S. PaaS removes much of the boilerplate code that developers have to write, and can make architectural decisions around scaling and performance redundant (the platform providers do the hard work, you just have to follow the rules.)

The two leading offerings at the moment are Microsoft’s Azure (.NET based) and Google’s AppEngine (Python or Java). There are a couple of Ruby offerings (Heroku and Engine Yard), and a new arrival with (Python / Django). The obvious missing link is a PHP-based PaaS offer – the LAMP community is now behind the curve where it once led (most early EC2 adopters were touting their LAMP credentials).

Now, where could a PHP service come from? What we need is a company that has experience running a global infrastructure, supporting well-understood PHP web-frameworks (documented and preferably OSS) and looking to encourage the army of keen LAMP developers out there to stick with the stack, and not migrate to Python / Ruby (or heaven forbid .NET!)

Facebook have just posted an update on their developer support blog (here), and PHP isn’t in it – it’s all about Facebook apps and integration. It would be nice to see them being a little more ambitious – they are the obvious choice, and by integrating things like HipHop, Hive, Scribe, Tornado, Memcache, and of course Cassandra, they would have an incredibly compelling service on offer.


Saturday, August 07, 2010

Schema-less data and strongly-typed objects

There has been some talk on the NoSQL grapevine recently about schema-mapping and the associated issues when dealing with schema-less data stores (specifically document-databases).

To my mind the core problem is not that it’s difficult to control the schema – that, after all, is the point – but the use of statically-typed languages to access the contents. Deserializing documents back into strongly-typed binary objects will cause a problem if the structure of the documents changes. Deserializing JSON back into JavaScript objects doesn’t suffer from this problem.

It then becomes a client-side issue of understanding what to do with an object when it doesn’t have the property you’re looking for. This is a great example of why you should consider the use of NoSQL data stores carefully – caveat emptor is the guiding principle.
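
A small Python sketch of the difference. The Customer class and both document shapes are made up for the example: the second document stands for the same record after someone has moved the email address into a nested contact object.

```python
import json
from dataclasses import dataclass


@dataclass
class Customer:
    """Strongly-typed view of a document; both fields are required."""
    name: str
    email: str


old_doc = json.loads('{"name": "Ada", "email": "ada@example.com"}')
new_doc = json.loads('{"name": "Ada", "contact": {"email": "ada@example.com"}}')


def to_customer(doc):
    """Strict mapping: raises TypeError as soon as the document's shape drifts."""
    return Customer(**doc)


def email_of(doc):
    """Tolerant read: the caller decides what a missing property means."""
    return doc.get("email") or doc.get("contact", {}).get("email")
```

Here to_customer(old_doc) succeeds and to_customer(new_doc) raises a TypeError, whereas email_of copes with both shapes, at the price of pushing the “what if it’s missing?” decision onto every caller.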

(Here’s an example of the sort of thing going on to mitigate such problems – and yes, I do know that MongoDB uses BSON, not JSON – I’m just illustrating the point.)

Thursday, August 05, 2010

Times Online pricing

I received another email from the Times today extolling the virtues of its new site, and suggesting I take out a subscription. The thing is, I don’t want to read the digital equivalent of the back-breaking forest-destroying Sunday Times – however I might want to read a few choice articles or sections / categories. Which is surely the major advantage of digital over physical. I can pick the articles I want to read, and ignore everything else. Except that the Times has applied the same pricing policy to their online version. Aaargh. Idiots.

Tuesday, August 03, 2010

IT, innovation and the internet

After seeing this post by Martin Fowler, I thought I should revisit / update the article I posted here, given that the IT / innovation divide seems to be gaining traction.

My original post was a bit OTT, and primarily a reaction to the conference I’d attended the previous day. I’ve had plenty of time to think through the issues since then, and as a result I now believe in the divide more than ever.

There is an excellent article here that describes some of the issues, but I still think there is a further distinction between companies who make their technical innovation part of their corporate DNA, and those who don’t – those who pursue strategies of operational efficiency and economies of scale.

It seems to me that this distinction is clearest in the case of pure-play internet businesses. The internet is an entire ecosystem within which innovation is key. People talk about “internet time” precisely because the rate of change is so great. And in this environment, I think that the responsibility for innovation (whether that be application development or infrastructure-led) should not sit with the IT department, however clear the distinction. IT should report to the operational director (or COO), “innovation” should report to whoever is responsible for strategy and growth.

I’d almost go so far as to suggest that the Waterfall / Agile schism is a reflection of this change. Perhaps all projects that can / should be scoped in full in advance of implementation should fall under IT, and those with a more ‘uncertain’ outcome should fall under the auspices of the new department, whatever it’s called.

What is clear (to me) is that the fact that a project / program / initiative involves either a computer, or software, does NOT automatically mean that it should come under the banner of IT. When the web was starting, the IT department appropriated web development simply because they knew one end of a computer from the other.

As Ross Pettit points out “IT has no definition on its own … it only has definition as part of a business”.

Saturday, June 26, 2010

Redirect status codes (again)

My pedantry over the 303 redirect having been pointed out by my colleagues, I figure ‘in for a penny, in for a pound’. Or 301 to be precise.

The use of a canonical URL for SEO purposes is well known, as search engines are notoriously precise, and will store reputation against the exact form of a URL, including trailing slashes and case sensitivities.

The recommended best practice is to use a redirect to consolidate reputation against the canonical form. 

The important point about the redirect is that you should use a 301 status code to indicate that the redirect is permanent. This is used by the search engine to combine reputation. If you use a 302 status code the user will be redirected, which is good, but the search engine will interpret this as a temporary redirect and will keep the incorrect URL in its index as a valid content URL.
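
A minimal Python sketch of the redirect logic described above. The canonical form chosen here (lower-case host and path, no trailing slash) is just one possible convention and only makes sense if the site treats paths case-insensitively; the function names are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """One possible canonical form: lower-case host and path, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def redirect_if_needed(url):
    """Return (301, headers) when the URL is not canonical, else None."""
    target = canonical_url(url)
    if target == url:
        return None
    # 301, not 302: the move is permanent, so reputation should be consolidated.
    return 301, {"Location": target}
```

With this convention, a request for http://example.com/Products/ is answered with a 301 pointing at http://example.com/products, while a request for the canonical form passes straight through.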

This illustration from the Google SEO Report highlights the problem:

Illustration showing SEO dilution

As ever, the best reference for more information is Google, and I would recommend everyone involved in SEO read Google’s own SEO Report Card – it’s an easy read, and well worth the effort.

Wednesday, June 23, 2010

Exercising freedom of speech, or just squatting?

I’m settling comfortably into middle age so it’s about time for a Colonel Blimp moment. I have the great good fortune to have a bicycle / moped ride to work each day that takes in the Houses of Parliament and a circuit round Parliament Square. Once upon a time (and for quite a long time) there was a chap who camped out there called Brian Haw, and he was a man of principle and much respected.

In recent months however, Parliament Square has begun to look like an urban outpost of Glastonbury, and it’s completely out-of-hand and I’m outraged (albeit not from Tunbridge Wells).

Frankly, I think they’re taking the p*ss, and if I had my way they’d all be moved on each night. I’d also point out that there’s no one in Parliament at night, so they’re not demonstrating to anyone or making any point. They’re just squatting on one of the most expensive pieces of real estate in the world.

I’m all for the expression of free speech, and am more than happy for people to march up and down during the day, but I think that they should be cleared out each night – if they feel so strongly about whatever they are demonstrating against, they should be happy to come back each day. If the price of democracy is a bus ticket in each morning, so be it.

My letter to the Telegraph is in the post.

Update (06/07/2010): it has a name, it’s called Democracy Village. Turns out it’s quite a big deal – Boris is attempting to get rid of them, but they have had a stay of execution for some reason. Also worth noting that Brian Haw is not affiliated to them in any way, as stated on his website.

Facebook and the Open Graph Protocol (OGP)

A couple of days ago someone asked me whether the old “avoid iframes” mantra was still relevant, and if so why. The specifics of my response may make it into another post, but in summary I suggested that if both sides (host page and iframe target) trust each other, and are working towards a common aim – i.e. they both want the solution to work - then iframes work just fine. I gave the Facebook Social Plugins as my reference for this – the Like button can be implemented as either XFBML or an iframe, and works just fine as either**.

His response (and he may have misquoted his source) was to the effect that he’d been told that that was not the case, and that the Open Graph Protocol didn’t use iframes. I could have let it lie at that, but I trust him, so am happy to lay the blame at the door of his source, in which case some serious explaining is in order.

The Open Graph Protocol is precisely that – a protocol. It is a recommended set of HTML <meta> tag attributes that can be used to give semantic meaning to a web page. There is a healthy OGP group where members are discussing possible extensions to the current protocol, but in essence it’s a bunch of machine-readable information that can be used to categorise the contents of a web page. These tags can be used by anyone, either to ‘decorate’ their own website, or to give meaning to web pages that include the tags (if you run a service that involves managing URLs – e.g. link shortening, search engines etc.) Facebook’s own use of the OGP tags is pretty basic at the moment – if you “Like” a page, Facebook does two things:

  • It registers the fact that you Like it, and posts to your stream
  • It retrieves the page itself, and extracts information that is relevant in order to make the Like information more structured.

In a standard web page all Facebook can do is extract the few standard HTML details (page URL, title tag etc.) By encouraging web developers to include the OGP tags they can provide a much richer experience on the Facebook site – they can “know” that when you Like a movie on IMDB that you are talking about a movie, and not some abstract page on the internet. This is the essence of the Semantic Web.
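
To show how little is involved, here is a sketch of pulling the OGP tags out of a page using only Python’s standard library. The sample page, its URL and the class and function names are invented for the example; this is how any consumer of the tags might read them, not a description of what Facebook actually runs.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and attributes.get("content") is not None:
            self.properties[name] = attributes["content"]


def extract_open_graph(html):
    """Return a dict of the og: properties found in the HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


SAMPLE_PAGE = """
<html><head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://example.com/movies/the-rock" />
</head><body></body></html>
"""
```

Calling extract_open_graph(SAMPLE_PAGE) yields the title, type and URL as a plain dictionary, which is all the “semantic” information a consumer needs to know that the page is about a movie.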

So, to say that the OGP doesn’t support iframes is a bit like saying a boiled egg can’t drive a car. It’s meaningless.

** It turns out that XFBML uses iframes anyway, so you can’t get away from them. The fb:like element is converted into an iframe by the FB JavaScript. Easiest way to view this is to use Firebug, which will show you the actual DOM post-manipulation.

Thursday, June 10, 2010

HTTP status codes and the PRG pattern

Post-Redirect-Get (PRG) is a popular pattern for websites seeking to prevent users from reposting data by accident (e.g. refreshing the checkout page and then being charged again).

Just to refresh, the process is as follows:

  1. User submits data which is POSTed to the server
  2. Server processes form data, and issues a REDIRECT
  3. Browser receives the REDIRECT and then GETs the new page

Because the final step is a GET (and not a POST), refreshing the page has no effect (assuming you’re not posting data via a GET, in which case you should have your internet connection revoked).

What you may not know is that there is a standard HTTP response code for this operation – and it’s not 302, it’s 303. It’s a tiny detail, but an important one. If you’re using PRG as a pattern, make sure that whatever server tech you’re using issues the correct status code – it’s the little things that make all the difference. Just ask Apple.

(The ASP.NET MVC Controller.Redirect method uses 302, natch.)
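For the avoidance of doubt, the three steps can be sketched in a few lines of framework-free Python. Everything here is an assumption made for illustration: the `/checkout` and `/orders/` paths, the `handle_request` function and the list used as a stand-in for real storage. The only point is that the POST answers with a 303 and a Location header, and the page the user ends up on is served by a plain GET.

```python
from urllib.parse import parse_qs


def handle_request(method, path, body, orders):
    """Return (status, headers, body) for a tiny, hypothetical checkout flow.

    `orders` is a plain list standing in for real persistence.
    """
    if method == "POST" and path == "/checkout":
        form = parse_qs(body)
        orders.append(form.get("item", ["unknown"])[0])
        order_id = len(orders)
        # 303 See Other: the browser should fetch the Location using GET.
        return 303, {"Location": f"/orders/{order_id}"}, ""

    if method == "GET" and path.startswith("/orders/"):
        suffix = path[len("/orders/"):]
        if suffix.isdigit() and 1 <= int(suffix) <= len(orders):
            item = orders[int(suffix) - 1]
            return 200, {"Content-Type": "text/plain"}, f"Order {suffix}: {item}"

    return 404, {}, "Not found"


if __name__ == "__main__":
    store = []
    status, headers, _ = handle_request("POST", "/checkout", "item=book", store)
    print(status, headers["Location"])
    # Refreshing the confirmation page is a GET, so nothing is re-posted.
    print(handle_request("GET", headers["Location"], "", store))
```

Hitting refresh on the confirmation page repeats only the GET, so the order list never grows by accident.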

Wednesday, June 02, 2010

South London Geek Night – 14th July 2010

A bit of self-publicity – I’m speaking at the second “South London Geek Night” – no, I’m not telling my family – on 14th July, at the Bedford pub, in Balham.

Topic is NoSQL – it’s only 15 minutes, and the audience is mixed (the lawyer from my previous project spoke at the last one on IPR), so it won’t be overly technical – more of an overview.

If anyone has anything they particularly want to hear about, just drop me a line (or comment here).

Details here –

BBC homepage issues – is CouchDB the culprit?

Some of you may have noticed that the BBC Homepage now has a small notice in it:

BBC homepage screenshot

I know that the BBC homepage is one of the key case studies in production use of CouchDB, and the message here (in the comments) suggests that load is the problem:

“As I described in an earlier blog post, the new BBC homepage has been built on a whole new technical architecture. Since launching we’ve found an issue with the service we use to save users’ customisation settings. Although we ran a public beta for more than 2 months, this problem only became apparent when we moved the whole audience across to the new site, increasing the load on the platform 20 times. Despite thorough load testing before launch we were unable to accurately predict the type and combination of customisations that users would perform, and as a result we now need to re-architect the way we save your homepage customisation settings in a more efficient way.”

Let’s hope it’s solved, and soon.

NoSQL database as a mock data provider

We’ve been talking through some of the possible areas in which a document database (MongoDB, CouchDB, RavenDB) might be of use to us, and a new one came up today. The dev team were discussing database schema issues, and it seemed to me that we could bypass the entire conversation by using a schema-less database, not for production, but for testing / development.

Whilst the guys spend time refining the domain model, adding / removing attributes, changing relationships etc., the easiest thing to do data-wise is simply ignore it. Use a document database as a very simple half-way house between static mock data providers and a final-cut RDBMS (if that’s the solution). It’s also a great way to get the ops team (LOL) comfortable with using a database technology with which they are unfamiliar. No more excuses.

It’s also a neat way to remove developers from the database creation / optimisation process, should you so wish. They can ignore schema issues, and hand the problem to a database specialist. Or am I dreaming?

In addition, the ease of use means that a single dev server could be used to support multiple developers and test scenarios. Each developer could have their own database for their own abuse, and they could at the same time plug into a single, shared database for more strictly controlled test data. It would also allow testers to prime test databases with data using simpler tools than raw SQL. Everyone is a winner.
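As a rough illustration of the half-way house idea, here is a toy, in-memory stand-in for a document database. It is not MongoDB, CouchDB or RavenDB, and the class and method names are invented for this sketch; it only shows that documents with different shapes can sit side by side while the domain model is still moving around.

```python
import copy
import itertools


class InMemoryDocumentStore:
    """A toy schema-less store: each document is just a dict (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def save(self, doc):
        """Store a copy of the document and return its id."""
        stored = copy.deepcopy(doc)
        stored.setdefault("_id", next(self._ids))
        self._docs[stored["_id"]] = stored
        return stored["_id"]

    def find(self, **criteria):
        """Return copies of all documents whose fields match the criteria."""
        return [
            copy.deepcopy(doc)
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    store = InMemoryDocumentStore()
    # The second customer has an attribute the first one lacks: no schema change needed.
    store.save({"type": "customer", "name": "Alice"})
    store.save({"type": "customer", "name": "Bob", "email": "bob@example.com"})
    print(store.find(type="customer"))
```

Swapping this for a real document database later should only affect the provider, which is rather the point of the provider model.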

I think this probably counts as an update to a post from 2005, when I’d just got the hang of the provider model.

Sunday, May 23, 2010

The AQA Language Debate – UK v Vietnam

This article (“Exam board deletes C and PHP from CompSci A-levels”) has received a bunch of Slashdot comment, and I wouldn’t have entered the fray but for my recent trip to Vietnam.

The point of my trip was to see an offshore development team, and very good they were too. One of the things that they drew particular attention to was the fact that at university in Vietnam, they don’t study “things like Fortran”, but “commercial skills” like C++, .NET, Java, and that all students spend a year in industry as part of their course. The point being that all graduates are good to go from the day they leave – they emerge from University ready to contribute.

Contrast this with the perverse nature of the UK education system – no one can persuade me that learning VB6 is more progressive than PHP or C – it’s absolutely ridiculous, and as for the BCS approving of the decision – well that just confirms everything I’ve always thought about them (beards & sandals).

It also seems completely counter-intuitive from the point of view of the students themselves, especially at A-level. Teach them PHP and they’ve got enough to go and build a website. Teach them VB6 or Pascal and they’ve got enough to what – write an Excel macro? Which do you think they’d rather do?

Just for the record, these would be my language choices:

  • The basics:
    • C, C++
  • Commercial frameworks:
    • .NET (C#) or Java
  • The web:
    • Python, PHP or Ruby
    • JavaScript
    • HTML, CSS & XML
  • SQL - can’t really get away without it
  • Just for the fun of it:
    • Erlang, Lisp, Smalltalk, F# ?

Thursday, May 20, 2010

Vietnam thoughts

I’ve been in Ho Chi Minh City (HCMC) in Vietnam for the past few days – Saigon as was – so here are a few thoughts:

  • People are very friendly – and it doesn’t appear to be put on
  • They love mopeds – a lot
  • They have no obvious indigenous culture

I know the last point is a bit controversial, but seriously, on a tour of the city, the points of interest included:

  • Communist party office block, turned museum
  • European Catholic cathedral
  • French Post Office
  • Chinese pagoda
  • Nothing Vietnamese.

Doesn’t stop it being a nice place, however, and the people really are friendly. Only thing to know beforehand is how to cross the road, which does seem to be a genuinely Vietnamese experience. The key is to ignore the traffic, and whatever you do, don’t stop. Just walk slowly through the traffic, and it will all just run around you, like water. It’s a rather bizarre experience.

Streets of Saigon - mopeds

Streets of Saigon - Communist Party banners

Saigon City Palace

Wednesday, May 05, 2010

.NET as a “web development” language.

This is the list of current Open Graph Protocol helper libraries that have appeared in the wild since the protocol was announced at f8. Spot the .NET library. Hang on, there isn’t one.

Open Graph Protocol implementations

That would be because people who really understand websites do not use .NET. And by understand I mean people who get that HTML is a skill, and not what the junior guy in the corner does.

So why is the .NET community so backward in coming forward, again? I blame Sue – I mean Microsoft, with their stupid bloody tools (see previous post.)

(Taken from the Open Graph Protocol documentation.)

Sunday, May 02, 2010

What’s in a date?

Can you recall the date on which any online business launched? Furthermore, is there any significant web-related date that anyone can remember? When did Google launch, or Yahoo!, or Facebook? When did Amazon sell its first book?

I can’t remember the date on which any project I’ve ever been involved with went live. Clearly a project needs a set scope / duration, if only to contain the work & manage costs, but with the magic of hindsight, the actual calendar date is irrelevant. What counts is the quality of what is delivered. Google is Google because it’s an amazing product, not because it launched on … well, you get my point.

(Interestingly, the Google website doesn’t list any significant dates at all – just the year-by-year calendar of events. Wikipedia does list dates – but only for things like the date the domain was registered, the date the company was incorporated. There is, in fact, no known date on which the search engine itself “launched”. Perhaps it was always there, and Brin/Page simply discovered it, like gravity, or natural selection?)

The reason dates have such prominence is marketing. Dates are critical in marketing – where things like promotional campaigns and launch events centre around a date. So the problem seems to be the relationship between marketing and production. It’s the age-old issue (/moan) – you should market something that has been built, and not build towards a marketing plan.

I’ve decided to write about this (again!) because of a comment made on a conference call earlier this week: that the launch of a product couldn’t be delayed on account of the parent company’s marketing budget dates – in order to make use of the remaining quarterly budget we had to hit a launch date of the 12th May, irrespective of the quality of the product. The account manager was genuinely suggesting that we remove the testing period and launch direct to the public because they had booked a big promo for a given date.

I’m not saying that marketing is a worthless industry – it isn’t – it can be amazingly effective. However, what will ensure the long-term future of the product is the quality of the product, and not the campaign announcing its arrival.

(I think you can expect me to return to this theme fairly regularly, so if you work in marketing and think I’m wrong, please look away now. Or comment.)

Wednesday, April 28, 2010

Everyone celebrate, it’s Notepad day.

I would like to declare today (or possibly tomorrow, seeing as today has gone already, or even the whole week) “notepad day” as a direct result of the frustration that a developer “friendly” tool has created for a site I’m trying to get live.

It’s a pretty simple scenario – a form that needs to be filled in, which the designers have mocked up with nice text box hints, and validation behaviours. All very simple in HTML/CSS/JS terms. The problem arises because MSFT, in their infinite wisdom, decided to make it really “easy” for .NET developers to auto-magically add the correct validation hooks by adding code attributes inside the model class definitions (it’s an MVC app) – i.e. it auto-generates the HTML, a set of dynamic CSS attributes to decorate the output and some JS to manage the behaviour.

So, this model annotation:

Class annotations

Combined with this view:

View definition

Outputs this HTML and associated JS:

HTML output

JS output

As a result, when there is a problem with the UI code and you turn to your HTML developer he then tells you he has no idea where the HTML comes from – it’s a .NET thing. So you ask the .NET developer, who knows where it comes from, but does not know either how to affect the output, or what the output should ideally look like.

There is absolutely no reason for the Html.EditorFor helper – not only does it not save time, it pulls the .NET developer across the line and into the HTML. (It also means recompiling the code to change the validation, which is insane.)

This is not the first time they’ve done this – they have quite a heritage in the trying-to-be-simple-but-totally-missing-the-point stakes:

1. In VB6 there was some crazy model that involved compiled VB6 apps emitting HTML. Never went anywhere. (I can’t even find out what it was called. Although I did come across this – which has to win a prize for most optimistic course organiser – VB6 for Beginners, starting April 2010…)

2. ASP.NET WebForms – instead of respecting HTTP/HTML, they designed a model that involved replicating the eventing model of a local app by manipulating something called the ViewState and which resulted in developers spending most of their time asking themselves in what order the chain of 874 events fired, and where in the stack of 400 child controls the problem lay.

3. MVC – having finally got the message, and adopted the clean lines of the rest-of-the-world’s favoured model, they decided that implementing MVC wasn’t enough. What developers really want/need is a way of ignoring the problem that HTML is harder than it looks, and different to C#. MSFT – please STOP.

A website is more complex than a server app if only because it involves a skills laminate – designers, HTML and server-side all come together around the same output, and in order to allow each of these to do their jobs effectively they need to have clearly-defined roles. HTML developers are responsible for what is displayed in the browser, and they can’t take on this responsibility if HTML is being cooked up behind the scenes by some opaque process.

I presume this happens because MSFT are keen to look after the smaller development shops where they don’t have the luxury of split roles, and one developer is doing everything, but it’s a royal PITA for teams who want to do the job properly, and only serves to undermine the impression that non .NET designers / HTML developers / project managers / testers etc. have of the platform. Which is unfair, as it is actually rather good.

Tuesday, April 27, 2010

Did ancient Egyptians Tweet?

One of the enduring mysteries of human civilisation is the question of how the world “forgot” about hieroglyphics. Despite the fact that they were used by the most advanced civilisation of its time, for hundreds of years, people seem to have moved on and simply forgotten to take a translation with them. Without the discovery of the Rosetta Stone (which you can read about on Wikipedia if you don’t already know about it) we’d still be in the dark.*

That could never happen now, could it? Well, what happens if people stop reading / writing books, and start communicating only through Facebook updates and 140 character tweets? Evolution will see us to a point where an ever decreasing number of people actually produce any thoughts / words of their own, and everyone else simply retweets, or worse still, just clicks on the Like button.

And what happens then when for some reason (EMP, Solar flare, who knows) all that data is lost?

Will archaeologists look back on the early internet years as a new Dark Ages – where, despite an explosion in the volume of communications, very little of value survives?

Just a random thought that came up in office conversation today*, but it could happen… couldn’t it?

* Our thinking in the office was that papyrus had a similar effect on the Egyptians - they all upgraded from sandstone tablets to the fancy new "paper", only for someone to accidentally burn down the entire national archives.

Friday, April 23, 2010

Will Google ever ‘Like’ Facebook?

So the new web is semantic & social and search is important. Facebook has made its play for the first two, and the assumption is that it's going to make a play for the third (search) as part of its new web land grab (AKA Kill Google).

But even if Facebook is serving up billions of Like buttons, whilst that means it can theoretically catalogue the web and apply personalised recommendations, it still has to do the engineering. And whilst Facebook isn't short of clever developers (quite the opposite), its 300 engineers have nothing like the firepower required to beat Google at this (I know this is hardware not software, but as an indication of the size of Google it’s pretty impressive -

With valuations where they are there's nothing to prevent Facebook from acquiring the rocket scientists it requires - but why bother? 99.9% of the web is still unstructured, and making sense of that is Google's home ground (great Wired article on search query semantics here -

What is really required (by us, the users) is a combination of the two – Google’s algorithmic scientists and search experts working with Facebook's recommendations and social graph. This would give us what we want – a searchable, personal, semantic web. And this begins with Google adding the Like button to its search results. (And ends, slightly unrealistically, with Facebook allowing Google to index its data).

Of course Google doesn't own part of Facebook, Microsoft does. So the best we can hope for in the future is something like this, but using Bing - FaceBing, or perhaps BingBook?

In the end this is all about ad revenue, and no one seems inclined to share that right now, so although it makes for an interesting user experience / engineering challenge, it's not going to happen.


[Update: Not sure when this appeared, but the Facebook developers’ site search results page now sports a natty blue Bing logo. Has Microsoft out-flanked Google with its purchase?]

Thursday, April 22, 2010

What f8 means for you, me, Google & the web.

Having had a night’s sleep since my slightly hysterical twitterings from the f8 conference keynote (below), here is a slightly more considered post.

The combination of the new Facebook Like button and their Open Graph protocol could kick-start the semantic web (finally). Is this a problem for Google, or an opportunity?

(Just to recap – the Open Graph Protocol (OGP) is a set of HTML meta tags that allow developers to apply semantic markup to a web page - Facebook may have just killed off the microformats community in the process of defining their own protocol. The Like button is a way for users to interact with a site – they may have killed off Delicious and Digg with this one.)

Applying OGP tags to the HTML of a web page applies the necessary metadata to allow semantic interpretation of the page (whether it represents a “movie” or a “book” for instance). That said, there is no barrier to Google using this data to improve its indexing of the web. Just as Facebook can now see that an IMDB page represents a “movie”, so can Google. The semantic web is now a reality.

The real dilemma is this: Google’s secret sauce, and the thing that gave it its mojo in the early days, was its understanding of the “popularity” of any given web page, which it measured (and probably still does) by applying complex algorithms to the value of links in from other web pages. It’s a bit like the ATP Tennis rankings – you gain more points from winning matches, but the number of points you gain depends on who you’re playing, and at what stage of the tournament. A link to your site from the BBC website is worth more than a link from this blog. So far, so normal.
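The tennis-rankings idea can be sketched as a small power iteration in the style of the published PageRank algorithm. This is a textbook simplification and makes no claim to be what Google actually runs; the damping factor of 0.85, the function name `link_popularity` and the tiny made-up link graph are all assumptions for the sake of the example.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting each link by its source's score.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


# Made-up graph: "bbc" is linked to by three news sites, "blog" by nobody.
EXAMPLE_LINKS = {
    "news1": ["bbc"],
    "news2": ["bbc"],
    "news3": ["bbc"],
    "bbc": ["site_a"],
    "blog": ["site_b"],
}

if __name__ == "__main__":
    for page, score in sorted(link_popularity(EXAMPLE_LINKS).items()):
        print(f"{page}: {score:.3f}")
```

In this graph site_a and site_b each have exactly one inbound link, but site_a ends up with the higher score because its link comes from the better-connected page.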

Facebook is of such a scale these days that it is now in a position to apply its own re-interpretation of the web – at least any page that any of its 400 million members have come across (which you have to assume is most of it). Facebook’s Like button will give it access to explicit recommendations of pages across the web. The Like button alone allows Facebook to compile an index of the web that connects you to any page via a sequence of recommendations, each of which has a relevance ranked not only by the “influence” of the recommender, but by their relationship to you, as a person. At the same time, each page that implements the OGP tags can be indexed semantically and placed in the correct context. (And just as Google promoted the use of correct HTML through its SEO recommendations, Facebook will encourage the use of OGP through its involvement. I would strongly recommend any web developer to implement OGP tags in their website, starting today.)

The net result is a search engine that is not only based on the recommendations of people I know, but that actually understands the context and content of the pages it is indexing. This could be the holy grail of search.

This is the beginning of the next phase of the Web (3.0?) – post-Google, semantic and personal. You have been warned.


Guilty pleasures

One of the pleasures of working in a creative office as a techie is in seeing the creative process in full flow. I can’t give any specifics, but at the moment one of my favourite websites is being redesigned by the person sitting next to me.

How many people get the chance to say to the designers of a successful site - “please don’t change X” and actually have that taken into account?

It feels a bit like stepping in wet cement – you know you shouldn’t do it, but the opportunity to have your footprint captured forever (or until the next redesign) is too tempting to ignore.

So, apologies in advance about the colour scheme on www.{…}.com – I just like it that way.

Tuesday, April 20, 2010

SEO 101 – More notes from Google

Why listen to what I have to say, when Google says it themselves -

It’s a great starter document for anyone interested in what it takes to get up and running with SEO, and includes the following basics:

  • URL structure
  • Site structure / sitemaps
  • HTML best practices
  • Managing site crawlers (use of robots.txt)
  • Site submission / webmaster tools
  • Useful further reading

Less edifying is the article on hiring an SEO consultant, which includes this gem:

While SEOs can provide clients with valuable services, some unethical SEOs have given the industry a black eye through their overly aggressive marketing efforts and their attempts to manipulate search engine results in unfair ways. Practices that violate our guidelines may result in a negative adjustment of your site's presence in Google, or even the removal of your site from our index.

You have been warned (article here -

Monday, April 19, 2010

Doing Things Right (#DTR)

Today I’m starting a movement called “Doing Things Right” in response to the success of the Getting Things Done (GTD) movement.

Doing Things Right means exactly what it says. The low-hanging fruit that I’m using to kick off the movement is HTML development. As I’ve mentioned here before, everyone involved in web development should understand two things before anything else – the basics of the HTTP protocol, and how to format good HTML. 

Any web development team should have someone who is held responsible for ensuring that the HTML emitted by any dynamic server-side code (.NET, Java, Python, PHP, RoR, doesn’t matter what technology is used) is not only well-formed, but semantically meaningful. (And at the same time, let’s all hope that Webforms get caught in the cross-fire.)

I’ll keep a list here of references to good practice, and will mark any Twitter posts that are relevant with #DTR.

First off is the POSH (Plain Old Semantic HTML) wiki, which is a great place to start. I would also call out an article referenced from the POSH wiki, which is a great walkthrough of the steps taken to migrate from lazy HTML to good HTML.

Wednesday, April 14, 2010

iPad apps that impress

I’m not a fan, but I have now seen some iPad apps that impress. If you do own one, and want to show it off, try one of these:

Gap1969 – this is a really good ecommerce example, with great no-boundary scrolling – the entire shop is on one screen, and you just zoom around in all directions. Not sure how usable it is long-term, but it’s a great start.

Alice – this has already received a lot of online press – and deservedly so. You might not want to sit down for two hours to read a book on the iPad, but this example of an animated book/app is outstanding. If I had kids, this alone would make me buy one.

Toy Story – I guess it helps if you own Pixar, but again this is outstanding. It looks like interactive kids’ books are the runaway success so far (comics look great on the iPad as well).

Dr. Seuss can’t be far behind.

(I’ll keep the list updated as I come across more).

FreeAgent + GetSatisfaction = Happy Customer

I’ve started using FreeAgent to manage consultancy projects (thanks Glyn), and it’s a fantastic product – it does exactly what it needs to do, it’s incredibly flexible – you can use it to manage cashflow, timesheets, invoicing, tax/IR reminders etc.

When backfilling some detail this morning I realised that I had no billing rate set up against a project, and no obvious way of adding it in. (It was only for completeness – as FreeAgent allows you to edit the invoice in situ it doesn’t matter if fields aren’t set up correctly to begin with – which is how it should be – use the data input to guide the user, without boxing them in.)

FreeAgent uses GetSatisfaction for their feedback, so I figured instead of trying to work out how to set the billing rate, I’d just ask. Lo and behold, I received an answer within five minutes of posting the question – GetSatisfaction is as good at managing feedback as FreeAgent is at managing small business finance – together they make for a very happy customer. No hunting through the FAQs, no “” black hole – just a great customer experience, from start to end.

Thank you both, and if you’re a contractor / small business struggling with finance, use FreeAgent, and if you’re looking for a feedback management service (used to be called CRM) take a look at GetSatisfaction.

Tuesday, April 13, 2010

Is a front-end developer really a ‘developer’?

Now that HTML has risen back to the surface as a (the) first-class citizen of the web, along with its siblings, CSS and JS, it’s time to revisit its place in the developer landscape.

Several recent projects have suffered from a lack of attention to the subtleties of the front-end code, but the ubiquitous use of AJAX, the emergence of HTML5, and the enforcement of good HTML practices by Google’s indexing process (see previous posts) mean that we can no longer ignore the value of good front-end developers, and that we should appreciate that a “good” front-end developer is worth considerably more than a “cheap” front-end developer. It’s no longer a commodity, and those at the top of their game should expect to earn as much as their back-end platform peers.

All of which leads to the question of the day – is it easier to teach an HTML/CSS/JS developer some basic PHP/C# etc., or to teach a C#/Java developer HTML/CSS/JS?

I think the answer today is the former – I would expect anyone claiming to be a front-end web developer these days to have a deep understanding of languages (not just how to use JS, but a real understanding). Hard-core platform developers have to stop looking at the front-end team as being involved with the fluffy stuff.

[Best analogy I can think of is Rugby union – to quote Peter Fitzsimmons (NZ), back in the 1970/80s:

“Forwards are the gnarled and scarred creatures who have a propensity for running into and bleeding all over each other”,

“backs can be identified because they generally have clean jerseys and identifiable partings in their hair”.

And then along came Jonah Lomu, and now you can’t tell the difference – the backs are 6’7” tall and the front row endorse ‘male grooming’ products.]

Wednesday, April 07, 2010

SEO handbook – Notable app SEO report

Looking around for something to allow for online collaboration for designers (Basecamp+, allowing annotation of designs), I came across Notable, which looks just the job. Not only can you upload designs, you can import them direct from a URL, or “clip” them using the Firefox browser plug-in.

Better still, if you do upload the HTML source, it not only gives you a screenshot of the page design to annotate, but it provides a very neat SEO report – see below for its review of the Wired HTC HD2 review:

Notable SEO report for the Wired review

This is a great one-page report, and well worth signing up to Notable for.

User interface design – ultimate AJAX interaction

Whilst I’m on the subject I may as well share my favourite experience on the web – and the moment AJAX started to work for me.

I can’t honestly remember when Google rolled out the “star”, it may well have been there from the start – but it was certainly when I first started to see the web as an O/S.

Screenshot of Gmail "star" interaction

For those of you who don’t know how this works (it’s popped up all over the place since) – it works simply by clicking on the star, at which point, and with no screen refresh, the message is starred, forever.

Not out of the ordinary today, but pretty radical back in the day, and I’m still surprised by how few sites have emulated this (Twitter being the notable exception).

User interface design – rollover options

Just a very quick post about user interface design. I love Twitter from a UX point of view – particularly the “More” concept (no more paging), but I wanted to pick out something else that it seems to have pioneered – the use of hidden controls.

The Twitter interface is kept very clean by having no interaction controls (retweet, reply, etc.) visible until you mouse over a particular tweet. It’s a very simple, very neat solution to a perennial problem, and YouTube seem to have taken their lead (see screenshots below). Facebook has eschewed this approach, and has links for “Comment, Like, Add as Friend” on each post.

Twitter stream (showing rollover options):

Screenshot of Twitter stream showing rollover options

YouTube comments (showing rollover options):

Screenshot of YouTube comments showing rollover options

Facebook stream options:

Screenshot of Facebook stream showing rollover options

Saturday, March 27, 2010

SEO 101 (pt 3) – the search results page

  • Click here for part 1 – how search works
  • Click here for part 2 – anatomy of an HTML page

OK – so you’ve managed your SEO brilliantly, and your website appears on the first page of all the major search engines (that would be Google) for all of the keywords you’re monitoring (you are monitoring keywords, aren’t you?)

Unfortunately the job isn’t yet complete – good SEO may get you onto the results page, but the final decision isn’t up to Google, it’s up to the user, who has to decide which of the results most closely matches their query. Fortunately, there are several ways in which you can manage your appearance in the list, and it’s really not that hard.

Page title – it’s the first thing people see in the list of results – make sure it includes your name, and something about the site. Google only shows the first 60 characters or so, so make it pithy.

Description – it may not be used in the ranking, but it will appear on the search results page, so again, make it count, and try and imagine how it will read to someone who doesn’t know about you already – an overly-clever marketing strap-line may look good on the homepage, but may be misunderstood when taken out of context.

Site links – you can’t control site links, they are auto-generated, but you can remove specific links if you wish, using Google’s Webmaster tools. If you see something you don’t like, get rid of it.

URLs – search engines are very specific with regard to URLs. At a technical level, developers and network experts often do clever things to make sure that people are always directed to the correct page, but this may actually harm your ranking, as the search engine may split your “ranking” across the URLs (two slightly different addresses may resolve to the same logical page, but to a search engine they are different pages). In terms of SEO, the recommendation is to consolidate URLs using the standard HTTP response code 301 to redirect all traffic to a single URL. (As a side note on this one, you should make sure you are using the analytics to understand where people are coming from if you are getting a lot of traffic on an unwanted URL. Affiliate sites are notorious for this.)
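As a sketch of what consolidating URLs might look like in code, the function below decides whether a requested URL should be answered with a 301 to a single preferred address. The host `www.example.com`, the `index.html` rule and the function name are all assumptions for illustration; the right rules depend entirely on your own site, and the function is assumed to be called only for requests that reach your own servers.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical preferred host


def canonical_redirect(url):
    """Return (301, canonical_url) if the URL should be redirected, else None."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat /index.html as an alias for the directory itself.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Always rebuild on the preferred host, dropping any fragment.
    canonical = urlunsplit(("http", CANONICAL_HOST, path, parts.query, ""))
    if canonical == url:
        return None
    # 301 Moved Permanently tells crawlers to credit the canonical URL.
    return 301, canonical


if __name__ == "__main__":
    for candidate in (
        "http://example.com/",
        "http://www.example.com/index.html",
        "http://www.example.com/",
    ):
        print(candidate, "->", canonical_redirect(candidate))
```

Every variant ends up at one address, so whatever ranking the page earns is not split between them.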

The best way to achieve all of this is to start at the end and work backwards. What do you want your site to look like when it appears in the search results list? Look at your competitors and see what they do – use a bit of cut-and-paste magic to fake a screenshot that has all of your competitors on the same page, and then print it out, pin it up and make up your own mind – would you choose your own site?

SEO 101 (pt 2) – anatomy of an HTML page

  • Click here for part 1 – how search works

SEO isn’t only about the structure of your HTML pages – site structure, URL composition, HTTP response codes and PageRank** all count too – but the heart of any search engine is the indexing of HTML content.

It’s all too easy these days to auto-generate a boilerplate web page and then extend it with content, but ignoring the structure will come back to bite you, if SEO is important to your business.

Below is a (very) simple guide to the basics – and if you need to go deeper than this, check out Google’s own documentation.

Sample HTML page, illustrating SEO elements

In summary: every part of the page is important, and even if we don’t know exactly how Google treats the contents, we do know that they reward those who pay attention to detail.
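One way to pay attention to detail is to check pages automatically. The sketch below uses Python's standard `html.parser` to flag a missing or over-long page title and a missing meta description. The 60-character limit is only the approximate figure mentioned in part 3, and the class and function names (`HeadAudit`, `audit`) are invented for the example; it does not claim to cover everything the illustration showed.

```python
from html.parser import HTMLParser

TITLE_LIMIT = 60  # approximate figure; not an official limit


class HeadAudit(HTMLParser):
    """Collects the <title> text and the meta description, if present."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a list of human-readable problems found in the page."""
    parser = HeadAudit()
    parser.feed(html)
    parser.close()
    problems = []
    title = parser.title.strip()
    if not title:
        problems.append("missing title")
    elif len(title) > TITLE_LIMIT:
        problems.append("title longer than 60 characters")
    if not parser.description:
        problems.append("missing description")
    return problems
```

Running `audit` over every template in a site is a cheap way to catch pages that have slipped through with the boilerplate defaults.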

  • Click here for part 3 – the search results page

** There is some debate on the web as to how significant this is these days, something which Google themselves seem to endorse -

SEO 101 (pt 1) – how search works

(NB: If you want to know about SEO in more detail, go and visit Glyn’s blog.)

So how does a search engine work? It’s very complicated in reality, hence why Google only employs such clever people, but the principles are pretty straightforward.

Crawling – the first thing a search engine needs to do is know about all the pages it needs to search through. This involves lots (and lots, and lots) of small programs (“spiders”) scraping their way through the entire content of the internet – they follow every link, and scuttle back to base with the contents (HTML) of every page. When a new page is found, links within that page are added to the backlog of pages to crawl, and the spiders just keep on doing their thing until the job is done. Which is never. Think you’ve got it bad at work?

Indexing – once a page has been harvested by the spiders, the content within the page is indexed. This is part one of the secret sauce – using upwards of 200 distinct attributes of a page, Google will pull it all apart and strip out what it thinks it all means. The index is what is used to match your query to the library of web pages that Google knows about.

Query semantics – if I type in “bread”, am I looking to buy some online, make my own, or watch old episodes of the 1980s sitcom of the same name? Who knows, but this really is rocket science. Spooky stuff, but it includes things like common phrases, popular abbreviations, semantic deconstruction of sentences, plus knowledge about you, your country etc. A PhD in Philology probably helps with this bit.

Ranking – given the size of the internet, you can type in pretty much anything and get a zillion matches between your query and Google’s index, so the next step is putting them in some kind of order. No one really knows how this works – Google used to use something they called PageRank, which was the original secret sauce, but apparently even that is less important than it used to be (see here). Whatever it is, this is the bit that’s hard to predict, so your best bet is not to bother – neither you, nor anyone selling their services to you, can game Google (more than once!). Just stick to the basics, and make your website as simple to index as possible. (That’s not strictly true – it’s not totally opaque. Being really popular does help your ranking, hence the proliferation of “link sites” which superhighway robbers use to try and force up PageRank for a site. They don’t work, and may in fact get you removed from Google’s index altogether – avoid like the plague.)
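To make the crawling and indexing steps a little more concrete, here is a toy Python sketch. It is nothing like what Google actually runs: the three-page "web" in `PAGES` is invented, the index is a plain word-to-pages lookup rather than 200 attributes, and there is no ranking or query semantics at all.

```python
from collections import defaultdict, deque

# Assumption: a tiny in-memory "web", mapping URL -> (page text, outgoing links).
PAGES = {
    "/": ("fresh bread recipes", ["/bake", "/buy"]),
    "/bake": ("bake your own bread", ["/"]),
    "/buy": ("buy bread online", ["/bake", "/missing"]),
}


def crawl(start):
    """Follow every link from the start page, returning {url: text}."""
    frontier = deque([start])  # the backlog of pages still to crawl
    seen = {start}
    fetched = {}
    while frontier:
        url = frontier.popleft()
        page = PAGES.get(url)
        if page is None:
            continue  # broken link, nothing to bring back
        text, links = page
        fetched[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


def build_index(fetched):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in fetched.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A query for "bread" matches all three pages here, which is exactly why the ordering step matters so much on the real web.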

If you’re interested in finding out more about Google, the best place to start is Google itself – they even have some instructional videos - In fact, I should have just posted this link to begin with.

  • Click here for part 2 – anatomy of an HTML page
  • Click here for part 3 – the search results page

Eric Schmidt deserves a mention

I don’t know how this slipped through the net, but I had always assumed that when Eric Schmidt took over the reins from Sergey Brin and Larry Page that Google had lost some of its engineering roots and that the money men had taken over.

Which made it all the more impressive that it somehow managed to maintain its bewildering rate of technical progress (and prowess). I have a pet theory that tech companies run by techies do better than those run by accountants (although you do need an accountant – I’m not suggesting you don’t), which Schmidt seemed to contravene.

Turns out he’s head of the nerd-herd – co-author of a popular UNIX program called lex, and ex-CTO of Sun, so he’s the real deal. He’s also a pioneer of the 70:20:10 time management model, hence the 10% time.

Eric Schmidt, I did you a disservice, and I apologise.

(Oh, and if you run a company that makes / sells software, and you can’t read code, go buy a book. At least look like you’re making the effort.)

Friday, March 26, 2010

iPad – what’s it for again?

Sitting here in an airport (with free wi-fi – are you listening BAA?) about to embark on six hours of travelling, involving two flights and a bunch of hanging around, I feel like I should be in an iPad advert. Surely this is what the iPad was invented for?

And yet, I’m struggling to think of what I would do with it that doesn’t require both an internet connection and a physical keyboard.

A touch-only wi-fi slate really wouldn’t help me right now. I love the idea of having all my daily news appear Minority Report style, and being able to swipe my way through colourful online magazines, but at the end of the day I also need a keyboard, and the ability to work offline.

You don’t need a big screen for music, an iPod will do, so the only real value is in watching movies, which I can do already on my laptop. My laptop also has the advantage of a hinge, so whilst its base is sitting flat, the screen is at a convenient angle.

There is a solution to this, apparently. As the BBC Click presenter so gushingly put it – it’s not until you see it propped up on its stand and with the physical keyboard attached that you really get it. No, I thought – you really don’t get it – it’s now like a laptop, only worse, as all the bits that were so carefully engineered to fit together have been taken apart and spread out all over the table. As a laptop, it’s rubbish.

[Disclaimer: All of the above ignores the fact that I still want one, and it will undoubtedly be a great success. I just won’t be getting rid of my laptop any time soon.]

Thursday, March 25, 2010

Social media tracking products

[Update: Katie, from Radian6, responded to this (see comments below) in record time, so they are the winner. Being serious for a minute, it was very impressive. I’m still sceptical and think that this has more to do with Katie herself, and the fact that she’s good at her job, than the merits of any specific product, but then what do I know? Not as much as she does, that’s for sure, so why not talk to her instead - @misskatiemo on Twitter. Oh, and buy her product – it’s brilliant, as has just been demonstrated.]

So, now that I’ve become obsessed with tracking myself online (see previous post), I’ve uncovered a very healthy sub-culture in social media tracking. There are lots of products one can use to track Twitter, Facebook, Digg, etc. with a bewildering array of different feature sets and target markets. It’s clearly a nascent industry (no fixed sales pitch). Let’s see if we can help it along.

Social media tracking seems to boil down to scanning various social networks for mentions of specific keywords, retweets etc. The aim is to track everywhere your name / product is mentioned, and if you’re really on the ball you can use the tools to “respond effectively” to the general chatter. Advanced features include things like “sentiment analysis” to help you understand whether people are saying nice things or not. You could just read them of course – you can get through a hell of a lot of tweets in a short period of time if you’re really trying.
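For the curious, the keyword-scanning idea can be sketched in a few lines of Python. This is a deliberately naive illustration and not how any of the products below work: the word lists, the brand name "Acme" and the function name `track` are all made up, and real sentiment analysis is presumably far more sophisticated than counting words.

```python
import re

# Assumption: tiny hand-picked word lists standing in for "sentiment analysis".
POSITIVE = {"great", "love", "brilliant"}
NEGATIVE = {"stuck", "slow", "awful"}


def track(posts, keyword):
    """Return (post, label) for every post that mentions the keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    mentions = []
    for post in posts:
        if not pattern.search(post):
            continue  # no mention, ignore the post
        words = set(re.findall(r"[a-z']+", post.lower()))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        mentions.append((post, label))
    return mentions
```

Given a handful of posts, `track(posts, "acme")` returns only those mentioning the name, each tagged positive, negative or neutral.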

Apparently Eurostar is the case study in getting this wrong – when their trains got stuck in the winter they were very slow to respond to a very active, and understandably upset, community of marooned passengers. If only they’d bought a copy of Radian6, they’d have been fine.

If anyone reading this decides to tweet about it we may be able to drive the social media tracking industry into a recursive search about itself, which can only help to drive up their collective profile.

  • Flowdock – it’s Finnish, it’s RoR, it probably does stuff you don’t understand. Be warned. In public Beta, so it’s free – get it now.
  • Raven – it’s not written by Ayende, but don’t hold that against it. Looks good, $79pcm, free 30 day trial.
  • Sysomos – they’re “redefining social media analytics” apparently, which generally means they aren’t. Includes “Automated Sentiment” tracking. Think they pinched that from Scrumbot? No free trial (why do people do that – they’ve just lost me already – that’s 0% conversion rate from my visit – put that in your sentiment-meter Sysomos.)
  • Scoutlabs – quite pricey, at $199pcm, but with a 14-day free trial. Coloured charts, graphs, all that stuff. Looks like an attempt to make social media tracking look like watching the stock market – i.e. grown-up, and neat.
  • Radian6 – the uber-dashboard – with an annoyingly sincere video to accompany their product launch, which goes on about the “game-changing” nature of their product. I think that means it’ll be really, really expensive (pricing is still TBA). They’ve also made up their own catchy phrase for all of the noise on the internet – the Social Phone. I presume they mean a phone at the bottom of a handbag, in a noisy bar, that auto-dialled your number at 3am whilst you shout loudly down the other end trying to get someone to listen to you?
    [Update: I still think the video is over-kill, and I don’t like the Social Phone, but the product does indeed seem to work.]
  • Unilyzer – as previously posted, this one is all about the stats – though you may need a PhD to decipher them.

Of course, given the nature of these products I am assuming that someone from all of the above companies will see and respond to this post – given that that’s the point of them?

An honourable mention (and retraction of any unfair criticism) to the first person to do so.

I, Internet

In a Matrix-style revolution I have apparently merged myself into the very fabric of the internet. Or so it would appear from my personal dashboard from Unilyzer. I think it’s really for companies who want to monitor their online presence and engage with the inter-youth everywhere and anywhere, but it’s very good for a spot of personal navel-gazing.

I haven’t the foggiest what it’s telling me, and I’m not that happy about the number of zeroes in it; the fact that my name is the top item in the tag cloud also suggests that I talk about myself a lot. Although very rarely in third person – Hugo Rodger-Brown doesn’t do that.

Go get yourself one – it’s free for a single account.


Wednesday, March 24, 2010

Ecommerce Stakeholders – Mindmap available

I’ve published another Mindmeister mindmap, this time on the subject of “Ecommerce Stakeholders” -

When working on a large ecommerce deployment, it’s all too easy to concentrate on the problem(s) right in front of you – i.e. how to hit the deadline – at the expense of the bigger picture. Ecommerce websites often exist within a complex corporate structure that includes legal, financial and marketing functions amongst others, and not engaging with these groups at the earliest opportunity is a very easy shortcut to take. If you do take this shortcut, be prepared to repent at your leisure, as the finance department will gladly can your launch rather than letting it go ahead without sufficient testing.

All of the stakeholders should be included in the entire lifecycle, from requirements gathering to final pre-launch testing and sign-off (sign-off being critical in terms of stakeholder management – people need to understand what they’re getting, and how to determine when it’s ready to go).

Of course this works both ways – it would be nice sometimes if the marketing department could wait until the site is built before launching their $$$ advertising campaign, but such is life.

Anyway – it’s public, so please use and update as you wish.

Ecommerce stakeholders mindmap

Tuesday, March 23, 2010

Site design is not just Photoshop

In order to try and explain to a design team that I’m working with that the site-design-by-Photoshop approach is not delivering, I have created the mindmap below on Mindmeister. It’s supposed to illustrate all the things you should think about when designing a web UI, from the look-and-feel, through to how the page structure can affect analytics, SEO etc. I’ve made the map public, as I think it could become a useful tool for people when trying to explain to extended team members why, for instance, putting the entire site on a single page with lots of AJAX isn’t a good idea, however nice it looks.

URL is - – and it’s editable by anyone.

Effective user experience design mindmap