Last week I finally managed to provoke our tech lead into using the 'F-word'
in conversation. The trigger for this outburst was none other than HTTP status
codes, and my desire to invent a new one. This was clearly beyond the pale.
I have since retreated from my original stance (a custom status code), but
have committed a change to our dev branch to use an existing, but rarely seen,
status code - 422 "Unprocessable Entity" (more on the situation in which I
return this later).
This code is an extension to the base set of codes. It is defined as part of
the WebDAV extensions to HTTP 1.1, most recently in RFC 4918 from June 2007
(citation):
The 422 (Unprocessable Entity) status code means the server understands
the content type of the request entity (hence a 415(Unsupported Media Type)
status code is inappropriate), and the syntax of the request entity is
correct (thus a 400 (Bad Request) status code is inappropriate) but was unable
to process the contained instructions. For example, this error condition may
occur if an XML request body contains well-formed (i.e., syntactically correct),
but semantically erroneous, XML instructions.
This status code has seen increasing adoption within the HATEOAS / REST
community; however, it's not without its detractors - as the comment thread
on this post illustrates.
My particular crime is not just to use this code, but to use it in a classic
vanilla HTML scenario - in the event of form POST validation errors.
It has always seemed to me a peculiar anomaly that whilst the
Post/Redirect/Get (PRG) pattern and the use of 3xx status codes (albeit
typically the 'wrong' code, 302, instead of the code specifically designed
for this situation - 303) have become ubiquitous, it is still considered
perfectly acceptable to return a 200 OK status when the form POST is rejected
(because some validation has failed).
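To make the contrast concrete, here is a minimal sketch of a form handler that redirects with 303 when the POST is accepted and returns 422 when validation fails. The field names, the validation rules, the `/welcome` URL and the function names are all invented for illustration; a real application would plug the same decision into whatever framework it already uses.

```python
# Status codes written as plain integers so the sketch has no dependencies.
SEE_OTHER = 303              # the redirect code designed for Post/Redirect/Get
UNPROCESSABLE_ENTITY = 422   # request understood, but not accepted


def validate_signup(form):
    """Return a dict of field -> error message; an empty dict means valid.

    The rules here are hypothetical placeholders.
    """
    errors = {}
    if "@" not in form.get("email", ""):
        errors["email"] = "Enter a valid email address."
    if len(form.get("password", "")) < 8:
        errors["password"] = "Use at least 8 characters."
    return errors


def handle_signup_post(form):
    """Return a (status, headers, body) triple for a signup form POST."""
    errors = validate_signup(form)
    if errors:
        # Rejected: still send the error page to the user, but say so in the
        # status line instead of hiding the failure behind a 200.
        body = "\n".join(
            f"{field}: {message}" for field, message in sorted(errors.items())
        )
        return UNPROCESSABLE_ENTITY, {"Content-Type": "text/plain"}, body
    # Accepted: redirect so that a browser refresh does not re-submit the form.
    return SEE_OTHER, {"Location": "/welcome"}, ""
```

The user sees much the same page either way; the difference is only in the status line, which is the part that logs, tests and non-browser clients actually read.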
The common argument against this is that form validation is an
application-specific issue, and therefore not the responsibility of the
protocol used to transmit the data, but I think that's a thin argument at best.
Additional, accepted, status codes exist for "Payment Required (402)", and
"Forbidden (403)", both of which relate to the underlying application, and
not HTTP per se.
The 2xx code block specifically states:
This class of status code indicates that the client's request was
successfully received, understood, and accepted.
But what if the request was successfully received, understood, and not
accepted? Surely this is a valid use case - in fact I would suggest that it's
one of the canonical HTML/HTTP use cases. And whilst the specific rules around
acceptance are application specific, the generic concept of 'Request Declined'
is ubiquitous - in 15 years of web development I've never seen an application
that does not support this scenario. And the rules around this are no more
application-specific than the 'Payment Required' example.
It is (IMO) unfortunate that people who are looking to support this as a valid
4xx exception are having to shoe-horn this into the 422 WebDAV extension, but
my suggestion for a new 421 code (421 on the basis that it's between 422, which
is the closest formal code, and 420, which Twitter appropriated and which is
therefore also made up) seems to have really upset people, so I'm happy to
stand down on that. (And in fact, re-reading 422 now, it is basically what I'm
looking for - I would just rather it wasn't within the context of a WebDAV
submission. My ideal outcome would be to rewrite the description of 422 in a
more generic sense, and have it adopted as the de facto response in the event
of a request validation error.)
All of which brings me to the real purpose of this post. In researching 422,
I have come across endless discussion around the rights and wrongs of HTTP
code assignments, but not one post or comment on what, to me, is the main
reason for adopting status codes at all (beyond the functioning of the web,
of course) - which is to facilitate testing and analytics, i.e. making it
easier for computers to understand what is going on without having to read HTML.
Status codes are transparent and invisible to end users, but stick out like
a sore thumb in logs, which makes them invaluable for analysis. A log file
that includes nothing but 200s or the occasional 404 provides very little
insight into what is really going on.
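As a rough illustration of the point about logs, the sketch below counts form POST outcomes per path from access-log lines. It assumes a Common Log Format style line where the quoted request is followed by the status code; the sample lines, the paths and the function names are made up for the example.

```python
import re
from collections import Counter

# Pulls the method, path and status out of a line such as:
#   ... "POST /signup HTTP/1.1" 422 512
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def status_counts(lines, method="POST"):
    """Count (path, status) pairs for requests made with the given method."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("method") == method:
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


def rejection_rate(counts, path, rejected_status=422):
    """Fraction of requests to `path` that were answered with `rejected_status`."""
    total = sum(n for (p, _status), n in counts.items() if p == path)
    return counts[(path, rejected_status)] / total if total else 0.0


# Invented sample data, not taken from a real log.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2013:10:00:00 +0000] "POST /signup HTTP/1.1" 422 512',
    '10.0.0.2 - - [01/Jan/2013:10:00:05 +0000] "POST /signup HTTP/1.1" 303 0',
    '10.0.0.1 - - [01/Jan/2013:10:00:09 +0000] "POST /signup HTTP/1.1" 422 512',
    '10.0.0.3 - - [01/Jan/2013:10:00:12 +0000] "GET /signup HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2013:10:00:20 +0000] "POST /login HTTP/1.1" 303 0',
]

if __name__ == "__main__":
    counts = status_counts(SAMPLE_LOG)
    for (path, status), n in sorted(counts.items()):
        print(path, status, n)
    print("signup rejection rate:", rejection_rate(counts, "/signup"))
```

If rejected POSTs were logged as 200, the same question could only be answered by storing and parsing response bodies.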
Similarly, having to interrogate the HTML body of a response to understand
whether the HTTP request was "received, understood and accepted" or not is
painful at best, and often misleading.
HTTP status codes aren't there for humans, they are there for non-humans,
whether that be the routing infrastructure (understanding what to cache),
or HTTP clients (understanding what to do next).
And remember, browsers are not the only clients.
(PS - to everyone who disagrees with me, bear in mind that I do understand
your objections, I just don't agree with them. I'm trying to make the web
more useful, not studying for my Masters.)
[UPDATE]
I found this in the original HTTP 1.1 specification:
HTTP status codes are extensible. HTTP applications are not required to understand the meaning of all registered status codes, though such understanding is obviously desirable. However, applications MUST understand the class of any status code, as indicated by the first digit, and treat any unrecognized response as being equivalent to the x00 status code of that class, with the exception that an unrecognized response MUST NOT be cached. For example, if an unrecognized status code of 431 is received by the client, it can safely assume that there was something wrong with its request and treat the response as if it had received a 400 status code. In such cases, user agents SHOULD present to the user the entity returned with the response, since that entity is likely to include human-readable information which will explain the unusual status.
You can make up your own mind from this (I have).
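For what it's worth, the fallback rule in the quoted passage is simple enough to sketch from the client's side. The set of "known" codes below is an arbitrary assumption standing in for whatever a particular client happens to understand, and the function name is invented.

```python
# Hypothetical list of codes this particular client has specific handling for.
KNOWN_STATUS_CODES = frozenset(
    {200, 201, 204, 301, 302, 303, 304, 400, 401, 403, 404, 415, 500, 503}
)


def effective_status(code, known=KNOWN_STATUS_CODES):
    """Return (code_to_act_on, recognized) for a response status code.

    A recognized code is returned unchanged. An unrecognized code is treated
    as the x00 code of its class, and the False flag tells the caller that
    the response must not be cached.
    """
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    if code in known:
        return code, True
    return (code // 100) * 100, False
```

With this set of known codes, `effective_status(431)` gives `(400, False)`, matching the example in the specification, and a client that has never heard of 422 would handle it the same way: as a 400 whose body should be shown to the user.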