It seems like the unstated larger problem here (in the blog article at least; I haven't read the paper) is that cached pages are served that don't match the full HTTP request. The problem isn't just that error pages are being cached, but that certain HTTP headers that should change the server's response are being ignored when determining whether two requests are identical. Even if no one is maliciously exploiting this to get error pages cached, the cache is still breaking the website if it serves a different page than the server would have served directly!
Actually, the origin is supposed to send a `Vary` header if it changes behavior based on any header.
So, if a client sends a 20 KB `X-Oversized-Header` and the server responds with a 400, it's conceivable that the response should include `Vary: X-Oversized-Header`.
Is that "really" the right fix? Probably not. But the HTTP RFCs provide `Vary` for exactly this kind of situation within HTTP caching: the origin is varying its response based on a subset of request headers.
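To make the mechanics concrete, here's a minimal sketch (not any particular CDN's implementation) of how a `Vary`-aware cache derives its key: the URL plus the values of whichever request headers the origin listed in `Vary`. The header names used here are just the ones from the example above.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus whichever request
    headers the origin named in its Vary response header."""
    # Header names are case-insensitive, so normalize them.
    headers = {k.lower(): v for k, v in request_headers.items()}
    varied = tuple(
        (name, headers.get(name, ""))
        for name in sorted(h.strip().lower() for h in vary.split(","))
    )
    return (url, varied)

# Without X-Oversized-Header in Vary, the attacker's request and an
# innocent request collide on the same cache entry:
k1 = cache_key("/page", {"X-Oversized-Header": "A" * 20_000}, "Accept-Encoding")
k2 = cache_key("/page", {}, "Accept-Encoding")
assert k1 == k2  # the cached 400 would be served to the innocent client

# With it listed, the two requests get distinct cache entries:
k3 = cache_key("/page", {"X-Oversized-Header": "A" * 20_000},
               "Accept-Encoding, X-Oversized-Header")
k4 = cache_key("/page", {}, "Accept-Encoding, X-Oversized-Header")
assert k3 != k4
```

Of course, this also shows why `Vary` on an arbitrary oversized header is a poor fix in practice: each distinct header value mints a new cache entry, so the cache stops helping at all for that resource.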
In RFC-world (which may not be the same as the real world), there's no need for `Vary` on a 400 response, because a 400 response isn't cacheable (unless it has a `Cache-Control` or `Expires` header).
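That rule comes from RFC 7231 section 6.1, which enumerates the status codes a cache may store by default; anything else needs explicit freshness information from the origin. A rough sketch of that check (real caches consult many more rules, e.g. `no-store` directives):

```python
# Status codes RFC 7231 section 6.1 defines as cacheable by default,
# i.e. without explicit freshness information from the origin.
CACHEABLE_BY_DEFAULT = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}

def heuristically_cacheable(status, response_headers):
    """Simplified: may a cache store this response? Ignores directives
    like Cache-Control: no-store that would forbid caching outright."""
    headers = {k.lower() for k in response_headers}
    explicit = "cache-control" in headers or "expires" in headers
    return status in CACHEABLE_BY_DEFAULT or explicit

assert heuristically_cacheable(404, {}) is True   # cacheable by default
assert heuristically_cacheable(400, {}) is False  # not without directives
assert heuristically_cacheable(400, {"Cache-Control": "max-age=60"}) is True
```

Note that 400 is absent from the default-cacheable set while 404 is present, which is exactly the asymmetry discussed below.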
I feel like HTTP response codes are an attempt to squish several layers of Result monads together into one single many-valued one, where each layer of the original nested set of Results has different caching semantics.
Like, in this case, a 400 is really the origin saying that it's not even sending you a representation of the resource you requested, because you didn't make a request that can be parsed as any particular resource. It's a Left RequestFormatError instead of a Right CachableResourceRepresentation.
And, annoyingly, the codes don't line up at all with which layer of the result went bad. 4XX is "client error", sure; but 404 isn't really an "error" at all, but an eminently cacheable representation of the non-existence of a resource.
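The layering being described could be sketched like this (all of the type names here are hypothetical illustrations, not anything from the HTTP specs): each layer of failure carries its own caching semantics, instead of everything being flattened into one status-code space.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class MalformedRequest:        # ~400: no resource was even identified
    cacheable: bool = False    # caching this would poison other clients

@dataclass
class NoSuchResource:          # ~404: a real statement about a resource
    cacheable: bool = True     # a representation of its non-existence

@dataclass
class Representation:          # ~200: the resource itself
    body: bytes
    cacheable: bool = True

Outcome = Union[MalformedRequest, NoSuchResource, Representation]

def may_cache(outcome: Outcome) -> bool:
    """In this model a cache never inspects status-code ranges; the
    outcome's layer dictates the caching semantics directly."""
    return outcome.cacheable

assert may_cache(NoSuchResource()) is True
assert may_cache(MalformedRequest()) is False
```

In HTTP as it exists, both outcomes land in the 4XX range and the cache has to know per-code rules instead.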
It'd be neat to see the HTTP codes rearranged into layers by what caching semantics they require of the implementing UA, such that UAs could just attach behavior to status-code ranges. Maybe in HTTP/4?
Well, HTTP was probably not designed to work well with caches. You certainly know that the first digit of an HTTP response code indicates a grouping, where it is:
4xx (Client Error): The request contains bad syntax or cannot be fulfilled
I would argue that, since this grouping is client-specific, 4xx responses should not be cached at a proxy/CDN, since the proxy is not the client. And even the client should not cache a 404: the resource could be created the next moment.
Actually, handling an error should be cheaper than handling a proper request, simply because an error most likely means an early exit on the handling server -- which means less time-to-answer, i.e. cheaper. (This does not cover DoS attacks, which are always difficult to handle, regardless of whether the answer is an error or not.)
However, effectively we agree with derefr: HTTP status code design did not have this peculiarity of cacheable vs. non-cacheable errors in mind. This is definitely a shortcoming.
The unstated larger problem is that HTTP is a content-delivery protocol that is (ab)used to also serve as an inter-process messaging protocol.
If the messaging parts were factored out, or made the primary protocol (with content delivery implemented on top), a lot of the issues we have with security, latency, and caching would just disappear. And no, WebSockets (the way they work right now) are not going to solve this. Neither will QUIC, aka HTTP/3.
Don't rely on HTTP status codes for anything other than content-related status. The idea of REST (using HTTP codes and verbs) has, in practice, almost universally required customization and workarounds. I've never understood the insistence on REST as an API, full stop. It's just one aspect of an API, which you necessarily have to augment to build a more robust protocol.