Saturday, March 27, 2010

SEO 101 (pt 1) – how search works

(NB: If you want to know about SEO in more detail, go and visit Glyn’s blog.)

So how does a search engine work? It’s very complicated in reality, which is why Google only employs such clever people, but the principles are pretty straightforward.

Crawling – the first thing a search engine needs to do is know about all the pages it needs to search through. This involves lots (and lots, and lots) of small programs (“spiders”) scraping their way through the entire content of the internet – they follow every link, and scuttle back to base with the contents (HTML) of every page. When a new page is found, links within that page are added to the backlog of pages to crawl, and the spiders just keep on doing their thing until the job is done. Which is never. Think you’ve got it bad at work?
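
To make the "backlog" idea concrete, here is a minimal sketch of a crawl loop in Python. It is not how any real search engine does it: the FAKE_WEB dictionary, the LinkExtractor class and the crawl function are all made up for illustration, and the "web" is just four hard-coded pages so that nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML.
# A real spider would fetch each page over HTTP instead.
FAKE_WEB = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/post-1">Post</a> <a href="/home">Home</a>',
    "/post-1": "<p>No links here.</p>",
}


def crawl(start, fetch=FAKE_WEB.get):
    """Breadth-first crawl from `start`; returns {url: html} for every page reached."""
    backlog = deque([start])  # pages waiting to be crawled
    seen = {start}            # pages already queued, so nothing is crawled twice
    pages = {}
    while backlog:
        url = backlog.popleft()
        html = fetch(url)
        if html is None:      # unknown page (a dead link): skip it
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                backlog.append(link)
    return pages


if __name__ == "__main__":
    for url in crawl("/home"):
        print(url)
```

On this toy web the loop finishes after four pages. On the real one the backlog never empties, which is the point made above.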

Indexing – once a page has been harvested by the spiders, the content within the page is indexed. This is part one of the secret sauce – using upwards of 200 distinct attributes of a page, Google will pull it all apart and strip out what it thinks it all means. The index is what is used to match your query to the library of web pages that Google knows about.
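
The simplest textbook form of an index is an "inverted index": a lookup from each word to the pages that contain it. The sketch below assumes that structure purely for illustration. Google's real index uses far more than bare words, and the names here (tokenize, build_index, search, PAGES) are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page texts, already stripped of HTML.
PAGES = {
    "/bakery": "Buy fresh bread online",
    "/recipes": "How to bake your own bread",
    "/tv": "Bread, the 1980s sitcom",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "bread")))
    print(sorted(search(index, "bread online")))
```

Searching for "bread" matches all three pages, while "bread online" narrows it down to the bakery. Matching is only half the job, though: it says nothing about which page should come first.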

Query semantics – if I type in “bread”, am I looking to buy some online, make my own, or watch old episodes of the 1980s sitcom of the same name? Who knows, but this really is rocket science. Spooky stuff, but it includes things like common phrases, popular abbreviations, semantic deconstruction of sentences, plus knowledge about you, your country etc. A PhD in Philology probably helps with this bit.

Ranking – given the size of the internet, you can type in pretty much anything and get a zillion matches between your query and Google’s index, so the next step is putting them in some kind of order. No one really knows how this works – Google used to use something they called PageRank, which was the original secret sauce, but apparently even that is less important than it used to be (see here). Whatever it is, this is the bit that’s hard to predict, so your best bet is not to bother – neither you, nor anyone selling their services to you, can game Google (more than once!). Just stick to the basics, and make your website as simple to index as possible. (That’s not strictly true – it’s not totally opaque. Being really popular does help your ranking, hence the proliferation of “link sites” which superhighway robbers use to try and force up PageRank for a site. They don’t work, and may in fact get you removed from Google’s index altogether – avoid like the plague.)
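
For the curious, the basic idea behind PageRank as originally published can be sketched in a few lines: a page is important if important pages link to it. What follows is the simplified textbook version, not whatever Google runs today. The damping value of 0.85 is the commonly quoted default, and the pagerank function and the LINKS graph are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated averaging (power iteration).

    `links` maps each page to the list of pages it links to.
    Assumption: every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by everyone, "d" by no one.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 3))
```

In this toy graph "c" comes out on top because every other page links to it, and "d" comes last because nothing links to it at all. That is the "being really popular helps" effect in miniature, and it is also why link sites exist.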

If you’re interested in finding out more about Google, the best place to start is Google itself – they even have some instructional videos. In fact, I should have just posted this link to begin with.

  • Click here for part 2 – anatomy of an HTML page
  • Click here for part 3 – the search results page
