Wednesday, April 03, 2013

Status codes are not for humans.

Last week I finally managed to provoke our tech lead into using the 'F-word' in conversation. The trigger for this outburst was none other than HTTP status codes, and my desire to invent a new one. This was clearly beyond the pale.

I have since retreated from my original stance (a custom status code), but have committed a change to our dev branch to use an existing, but rarely seen, status code - 422 "Unprocessable Entity" (more on the situation in which I return this later).

This code is an extension to the base set of codes, proposed as part of the WebDAV extensions to HTTP 1.1 in June 2007 (citation):
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.
This status code has seen increasing adoption within the HATEOAS / REST community; however, it's not without its detractors, as the comment thread with this post illustrates.

My particular crime is not just to use this code, but to use it in a classic vanilla HTML scenario - in the event of form POST validation errors.

It has always seemed to me a peculiar anomaly that whilst the PRG pattern and the use of 3xx status codes (albeit typically the 'wrong' code, 302, instead of the code specifically designed for this situation - 303) have become ubiquitous, it is still considered perfectly acceptable to return a 200 OK status when the form POST is rejected (because some validation has failed).

The common argument against this is that form validation is an application-specific issue, and therefore not the responsibility of the protocol used to transmit the data, but I think that's a thin argument at best. Additional, accepted, status codes exist for "Payment Required (402)", and "Forbidden (403)", both of which relate to the underlying application, and not HTTP per se.

The 2xx code block specifically states:
This class of status code indicates that the client's request was successfully received, understood, and accepted.
But what if the request was successfully received, understood, and not accepted? Surely this is a valid use case - in fact I would suggest that it's one of the canonical HTML/HTTP use cases. And whilst the specific rules around acceptance are application-specific, the generic concept of 'Request Declined' is ubiquitous - in 15 years of web development I've never seen an application that does not support this scenario. And the rules around this are no more application-specific than the 'Payment Required' example.

It is (IMO) unfortunate that people who are looking to support this as a valid 4xx exception are having to shoe-horn this into the 422 WebDAV extension, but my suggestion for a new 421 code (421 on the basis that it's between 422, which is the closest formal code, and 420, which Twitter appropriated and therefore also made-up) seems to have really upset people, so I'm happy to stand down on that. (And in fact, re-reading 422 now, it is basically what I'm looking for - just would rather it wasn't within the context of a WebDAV submission. My ideal outcome would be to rewrite the description of 422 in a more generic sense, and have it adopted as the de facto response in the event of a request validation error.)
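To make the proposal concrete, here is a minimal sketch of a form handler that keeps PRG on success (303) and answers a failed validation with 422 instead of 200. The names `validate` and `handle_form_post`, the field rules, and the `/thanks` URL are all made up for illustration; this is not the code I committed, just the shape of the idea.

```python
from html import escape
from typing import Dict, Tuple


def validate(form: Dict[str, str]) -> Dict[str, str]:
    """Return a field -> message map; an empty map means the form is acceptable.

    The rules here are hypothetical and stand in for any application rules.
    """
    errors: Dict[str, str] = {}
    if "@" not in form.get("email", ""):
        errors["email"] = "Enter a valid email address."
    if len(form.get("password", "")) < 8:
        errors["password"] = "Use at least 8 characters."
    return errors


def handle_form_post(form: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a form POST."""
    errors = validate(form)
    if errors:
        # Received and understood, but not accepted: re-render the errors
        # for the human, and say 422 for everything that is not a human.
        items = "".join(
            f"<li>{escape(field)}: {escape(message)}</li>"
            for field, message in errors.items()
        )
        return 422, {"Content-Type": "text/html"}, f"<ul>{items}</ul>"
    # Accepted: the usual Post/Redirect/Get step, using 303 rather than 302.
    return 303, {"Location": "/thanks"}, ""
```

The user still sees the same re-rendered form with its error messages; the only thing that changes is what the status line tells the machines.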

All of which brings me to the real purpose of this post. In researching 422, I have come across endless discussion around the rights and wrongs of HTTP code assignments, but not one post or comment on what, to me, is the main reason for adopting status codes at all (beyond the functioning of the web, of course) - which is to facilitate testing and analytics, i.e. making it easier for computers to understand what is going on without having to read HTML.

Status codes are transparent and invisible to end users, but stick out like a sore thumb in logs, which makes them invaluable for analysis. A log file that includes nothing but 200s or the occasional 404 provides very little insight into what is really going on.
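As a rough illustration of the log point, the sketch below counts rejected form POSTs per path from access-log lines. It assumes a common-log-style format with the request line in quotes followed by the status code; the sample lines, the `LOG_LINE` pattern and the `rejected_posts` name are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line and the status code that follows it,
# e.g. "POST /signup HTTP/1.1" 422
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

# Hypothetical log lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [03/Apr/2013:10:00:00 +0000] "POST /signup HTTP/1.1" 422 512',
    '10.0.0.1 - - [03/Apr/2013:10:00:09 +0000] "POST /signup HTTP/1.1" 303 0',
    '10.0.0.2 - - [03/Apr/2013:10:01:00 +0000] "GET /signup HTTP/1.1" 200 2048',
    '10.0.0.3 - - [03/Apr/2013:10:02:00 +0000] "POST /signup HTTP/1.1" 422 498',
    '10.0.0.3 - - [03/Apr/2013:10:03:00 +0000] "POST /login HTTP/1.1" 422 301',
]


def rejected_posts(lines: Iterable[str]) -> Counter:
    """Count POST requests that were answered with 422, grouped by path."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match["method"] == "POST" and match["status"] == "422":
            counts[match["path"]] += 1
    return counts
```

If every one of those POSTs had come back as 200, the same question ("which forms are people failing to complete?") could not be answered from the log at all.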

Similarly, having to interrogate the HTML body of a response to understand whether the HTTP request was "received, understood and accepted" or not is painful at best, and often misleading.
HTTP status codes aren't there for humans; they are there for non-humans, whether that be the routing infrastructure (understanding what to cache), or HTTP clients (understanding what to do next).

And remember, browsers are not the only clients.
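For a non-browser client (a test script, say), "understanding what to do next" can be as small as the sketch below. The `next_action` function and the action labels it returns are hypothetical; the point is only that the decision needs the status line and headers, and never the HTML.

```python
from typing import Dict

REDIRECTS = {301, 302, 303, 307, 308}


def next_action(status: int, headers: Dict[str, str]) -> str:
    """Decide what a scripted client should do next, from the status alone."""
    if 200 <= status < 300:
        return "done"
    if status in REDIRECTS and "Location" in headers:
        return "follow " + headers["Location"]
    if status == 422:
        # The request was understood but declined: fix the data, not the code.
        return "correct input and resubmit"
    if 400 <= status < 500:
        return "give up"
    if 500 <= status < 600:
        return "retry later"
    return "unknown"
```

With a 200 on a rejected form, the first branch fires and the client wrongly concludes it is done, unless it goes digging through the markup for an error message.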

(PS - to everyone who disagrees with me, bear in mind that I do understand your objections, I just don't agree with them. I'm trying to make the web more useful, not studying for my Masters.)

[UPDATE]

I found this in the original HTTP 1.1 specification:

HTTP status codes are extensible. HTTP applications are not required to understand the meaning of all registered status codes, though such understanding is obviously desirable. However, applications MUST understand the class of any status code, as indicated by the first digit, and treat any unrecognized response as being equivalent to the x00 status code of that class, with the exception that an unrecognized response MUST NOT be cached. For example, if an unrecognized status code of 431 is received by the client, it can safely assume that there was something wrong with its request and treat the response as if it had received a 400 status code. In such cases, user agents SHOULD present to the user the entity returned with the response, since that entity is likely to include human-readable information which will explain the unusual status.
You can make up your own mind from this (I have).
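The fallback rule quoted above is simple enough to sketch. The set of codes this imaginary client recognises and the `interpret` name are assumptions for the example; the rule itself (fall back to x00 of the class, and don't cache what you don't recognise) is taken from the quoted text.

```python
from typing import Tuple

# Codes this hypothetical client knows about; any real client's list will differ.
KNOWN_CODES = frozenset(
    {200, 201, 204, 301, 302, 303, 304, 400, 401, 403, 404, 422, 500, 503}
)


def interpret(code: int, known: frozenset = KNOWN_CODES) -> Tuple[int, bool]:
    """Return (effective_code, recognised) for a response status code.

    An unrecognised code is treated as the x00 code of its class, and the
    False flag tells the caller that the response must not be cached.
    """
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    if code in known:
        return code, True
    return (code // 100) * 100, False
```

So a client that had never heard of my made-up 421 would treat it as a 400, which is still a more truthful answer than a 200.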
