Sunday, August 22, 2010

Google’s “High Performance Image Serving”

[Update] I stand corrected - I do now have billing enabled on my Apps account, and can confirm that images are served with all the correct response headers set. The URLs do indeed support 304 conditional GETs on the production infrastructure, which makes this a very attractive image hosting solution. Well done Google, apologies for the mis-representation.

(NB – this article is based on the development SDK (1.3.6). I haven’t yet been able to test on the production infrastructure because, for some reason, Google won’t authorize billing on my AppEngine account, and without billing the Blobstore is unavailable!)

The concept of a hosted service that manages image serving / caching, resizing & cropping is something that has cropped up in projects I’ve been involved with for the past 5 years (and the rest).

I had a hand in a startup about 6 years ago now that processed MMS (picture messages) that people sent in. At its simplest this involved storing people’s uploaded images, and then cropping / rotating and resizing the images to fit a certain profile (we were printing them out, and needed to convert landscape to portrait, and crop to a fixed aspect ratio.)

Subsequent to that experience I worked for many years in online digital entertainment, developing a platform for the processing of music / video assets, and one of the elements we also needed was the ability to take in hi-res images, shrink to an acceptable size / format (e.g. not TIFF), and then host them for delivery across a CDN.

The project I’m currently working on has a combination of the two – processing user-generated content for serving back over the web. What we need is an image processing service, backed with a large data store, and a high-performance cache.

So, it was with great excitement that I noticed that the Google App Engine SDK comes with an in-built library to do exactly this. It’s built on top of the Picasa library (it even includes the “I feel lucky” transformation), and enables cropping, resizing, rotation etc. The App Engine platform has no file-backed storage, but the datastore does include the BlobProperty type, which can be used to store binary data (such as images). A simple image processor using this took about fifteen minutes to set up (which was mainly cut-and-paste from their sample app here).
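To make the "crop to a fixed aspect ratio" step concrete, here is a minimal sketch of the arithmetic involved. As far as I recall, the Images API crop call takes its box as proportions of the image (0.0 to 1.0) rather than pixels, so the helper returns fractions; treat that as an assumption to check against the SDK docs. The function name `centered_crop_box` is made up for illustration and is not part of the SDK.

```python
def centered_crop_box(width, height, target_ratio):
    """Return (left_x, top_y, right_x, bottom_y) as fractions in 0.0-1.0.

    The box is centred and has the aspect ratio target_ratio (width / height).
    Hypothetical helper: the result is shaped like the proportional
    arguments a crop call would take, but nothing here calls the SDK.
    """
    if width <= 0 or height <= 0 or target_ratio <= 0:
        raise ValueError("width, height and target_ratio must be positive")

    current_ratio = float(width) / height
    if current_ratio > target_ratio:
        # Image is too wide: keep the full height, trim left and right.
        keep = target_ratio / current_ratio
        margin = (1.0 - keep) / 2
        return (margin, 0.0, 1.0 - margin, 1.0)

    # Image is too tall (or already correct): keep the full width,
    # trim top and bottom.
    keep = current_ratio / target_ratio
    margin = (1.0 - keep) / 2
    return (0.0, margin, 1.0, 1.0 - margin)
```

For an 800x600 landscape photo cropped to a square, this trims an eighth of the width from each side and leaves the height alone.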

(Some people may by now be thinking “Tinysrc” – the online service that resizes images for mobile screens – well, no surprises, tinysrc runs on AppEngine – this is precisely how they do it, except that they pull the images from a remote server – they are not stored.)

The downside of this approach is that the datastore has a 1MB limit per entity, which makes it only borderline useful if you’re dealing with UGC (web-optimised images should never be 1MB, but the image someone just uploaded from their new digital camera could easily be.)

Fortunately, Google provides a secondary datastore specifically for large binary objects, called the Blobstore. It’s well documented (here), so I won’t repeat that, but what I can say is that it integrates directly with the Images API (see here). There are some complex limits on the amount of data you can process, so read the article carefully, but suffice to say it can be done. (See here for a nice example of the Blobstore / Images API interaction.)

A killer function, which has been publicised this last week (as “High Performance Image Serving”), is the “get_serving_url” function in the Images API – which takes in a Blobstore object key, and returns a fixed URL that can be used as the static image URL. This looks almost like a Google CDN – the ability to serve images as static content, with the ability to crop / resize on the fly (albeit using fixed sizes) thrown in for free.
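As a rough illustration of the resize / crop-on-the-fly part: my understanding of the documentation is that a serving URL can be modified by appending `=sNN` to resize and `=sNN-c` to crop. The sketch below only builds that suffix; it does not call the SDK, the helper name `sized_serving_url` is hypothetical, and the suffix format and the set of permitted sizes should be checked against the current docs.

```python
def sized_serving_url(base_url, size=None, crop=False):
    """Append a resize/crop suffix to an image serving URL.

    Assumption: the service understands "=s<size>" (resize) and
    "=s<size>-c" (resize and crop). Which sizes are actually permitted
    is not checked here.
    """
    if size is None:
        if crop:
            raise ValueError("crop requires a size")
        return base_url  # no suffix: serve the image at its default size
    if not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive integer")

    suffix = "=s%d" % size
    if crop:
        suffix += "-c"
    return base_url + suffix


# Placeholder URL for illustration only; a real one would come back
# from get_serving_url for a given blob key.
example = sized_serving_url("http://example.invalid/IMAGE_KEY", 200, crop=True)
```

Because the size travels in the URL, one stored blob can back every thumbnail variant without any further processing code.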

[Updated – see intro] And yet… if you set up an image service using these amazing (and practically free) resources, you’ll find a fairly large hole in the implementation. It’s our old friend HTTP status codes. The fixed URL exposed by the images service does not support a 304 (Not Modified) response to a conditional GET – meaning that every time you request the image, you get the whole thing, increasing server bandwidth and client download times. (See introductory note – this may just be a development server issue – TBC.)
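For anyone unfamiliar with conditional GETs, this is roughly what the server side of one looks like. It is a generic sketch in plain Python, not Google's implementation: the MD5-based ETag scheme, the cache lifetime, and the function names are all assumptions chosen for illustration.

```python
import hashlib


def make_etag(image_bytes):
    """Build a quoted ETag from the image content (illustrative scheme)."""
    return '"%s"' % hashlib.md5(image_bytes).hexdigest()


def conditional_response(request_headers, image_bytes):
    """Return (status, headers, body) for a GET, honouring If-None-Match.

    request_headers is a plain dict of header name to value.
    """
    etag = make_etag(image_bytes)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=86400"}

    # If-None-Match may carry several comma-separated tags; a weak tag
    # (W/"...") is compared on its quoted part only.
    client_tags = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        client_tags.append(tag)

    if etag in client_tags or "*" in client_tags:
        # The client already holds this exact content: send no body.
        return 304, headers, b""
    return 200, headers, image_bytes
```

The first request gets a 200 with the full body and an ETag; a repeat request that sends the tag back in If-None-Match gets an empty 304, which is where the bandwidth saving comes from.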

I can only assume that this is deliberate – as Google gets its money from the bandwidth charge. It is however extremely annoying.

2 comments:

Russ said...

Nice feature of 'High Performance Image serving' - geo distribution. Playing with this feature I got an image hosted on 'lh5.ggpht.com'. If you traceroute this domain from two different locations you'll get served up an IP address 'closest' to you.

Hugo Rodger-Brown said...

Russ, that's great, but if it returns the image each time you request it using a 200 HTTP status code, rather than respecting a conditional GET and returning a 304, this is of limited use as a CDN, as minimising bandwidth is a key CDN requirement. Furthermore, blobs stored in the Blobstore are immutable, so they are implicitly cacheable: the content at the end of the static URL can never change, which makes the lack of 304 support even more baffling.

It's hard to understand why they have done this; the argument (made by me!) that this is a commercial decision so that Google can charge for the service via the bandwidth allocation is a hack - Google could easily have a separate billing quota for the fixed URLs.