“One of the main reasons for HHO and HMC CPDoS attacks lies in the fact that a vulnerable cache illicitly stores responses containing error codes such as 400 Bad Request by default. This is not allowed according to the HTTP standard.”
It seems that it should be feasible to cache more kinds of errors if the request that populated the cache and the subsequent request are identical. These attacks all rely on that not being the case. However, "identity" is a more slippery concept than most might think. Generally it requires putting requests into some canonical form, but defining that canonical form (especially what it excludes) requires making exactly the same kinds of distinctions that were missed to make these attacks possible. It just shifts the problem around, and introduces new potential for breakage. In the end it's no better than just following the darn standard, whose authors probably defined what was cacheable with exactly these concerns in mind.
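To make that concrete, here's a toy sketch (plain Python, not any real cache's code) of a cache key built from a "canonical form" of the request. The allow-list of "identity" headers is made up for illustration; the point is that deciding what to exclude is the same judgement call the attacks exploit.

    # Toy illustration: a cache key built from a "canonical form" of the
    # request. Which headers count as part of the request's identity is
    # exactly the distinction that goes wrong.

    def canonical_key(method, url, headers):
        # Hypothetical allow-list: only these headers are treated as part
        # of the request's identity; everything else is ignored for caching.
        identity_headers = {"accept", "accept-encoding"}
        kept = sorted(
            (name.lower(), value.strip())
            for name, value in headers.items()
            if name.lower() in identity_headers
        )
        return (method.upper(), url, tuple(kept))

    # Two requests the origin treats very differently...
    benign   = canonical_key("GET", "/index.html", {"Accept": "text/html"})
    poisoned = canonical_key("GET", "/index.html",
                             {"Accept": "text/html", "X-Oversized": "A" * 20000})

    # ...collapse to the same key, so a 400 stored for the poisoned request
    # gets served to the benign one. Tightening the allow-list just moves
    # the same judgement call somewhere else.
    assert benign == poisoned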
Depends why you're using the CDN. If it's purely for geographic access times, then passing requests that produce a 400 straight through to the origin is reasonable behaviour. If the CDN also offers DDoS protection, and a hacker can just trigger a 400 and always hit the origin, then your DDoS protection is near-useless.
I think you misunderstand me; some people don't care about the content-delivery side of a CDN as much as the caching and DDoS protection. If my web server takes 10 seconds to complete a request (yes, I've had really terrible journalistic sites that do that) and an attacker can send malformed requests that always get passed through to the origin, then I could get accidentally DoS'd, or intentionally DDoS'd with very little effort.
Some customers also believe they want this: CDNs are often about both performance (content closer to the user) and reduced origin load. Caching a (too!) broad set of error codes “achieves this”, with the caveat of a poisoned cache.
Many CDNs cache error responses for a shorter period of time. For example, an image might be cached forever, but only for 5 minutes if the request results in an error response.
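Roughly how that looks in practice (toy Python, made-up TTL values rather than any particular CDN's defaults):

    # Sketch of status-dependent TTLs: successful responses are kept for a
    # long time, error responses only briefly, so a bad entry ages out fast.

    SUCCESS_TTL = 24 * 60 * 60   # e.g. an image cached for a day
    ERROR_TTL   = 5 * 60         # errors kept for only 5 minutes

    def ttl_for(status_code):
        if 200 <= status_code < 400:
            return SUCCESS_TTL
        # 4XX/5XX: cache (if at all) for a much shorter window
        return ERROR_TTL

    print(ttl_for(200))  # 86400
    print(ttl_for(504))  # 300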
An origin error (i.e. a 5XX) is okay to cache for a while. But a 400 is an error in the request, so it should never be cached, ever. I guess that's the point of the article; I'm very surprised that CloudFront caches 400s.
404s could be cached, I think, but it's a risky one.
Theoretically, if you only serve a cached response when the full request (every header, apart from things like the request time) is identical, you could even get away with breaking the RFC and caching a 400 to save the backend generating it again. This specific problem only happens if you do all three of: 1. pass the troublesome header through to the origin, 2. cache the resulting 400, and 3. treat a new request to the same resource without the poisoned header as if it were the same as the one with the poisoned header.
The problem is that they're caching the resource based on the URI only, or on the HTTP method and URI only.
If you're caching 400s (which many people have noted is against the RFC and can be troublesome), then you need to make sure you're only serving them for a matching request. If I send a poisoned header and you cache a 400 for my full request, including that poisoned header, then that's one thing. If I send a poisoned header and you cache a 400 for all accesses to that URI, whether or not they include the header that caused it to be a bad request, that's vulnerable to this BS.
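A toy contrast of those two keying strategies (illustrative Python only, not how any real CDN implements its cache key):

    # Keying a cached 400 on method + URI alone poisons the resource for
    # everyone; keying on the full request (including the offending header)
    # only "poisons" the attacker's own request.

    cache = {}

    def key_uri_only(method, uri, headers):
        return (method, uri)

    def key_full_request(method, uri, headers):
        return (method, uri,
                tuple(sorted((k.lower(), v) for k, v in headers.items())))

    def fetch(key_fn, method, uri, headers, origin):
        key = key_fn(method, uri, headers)
        if key not in cache:
            cache[key] = origin(headers)  # stores whatever came back, even a 400
        return cache[key]

    def origin(headers):
        # Origin rejects requests carrying the oversized header
        return 400 if "x-oversized" in {k.lower() for k in headers} else 200

    # Keyed on URI only: the attacker's 400 is served to the normal visitor.
    cache.clear()
    fetch(key_uri_only, "GET", "/index.html", {"X-Oversized": "A" * 20000}, origin)
    print(fetch(key_uri_only, "GET", "/index.html", {}, origin))      # 400 - poisoned

    # Keyed on the full request: the normal visitor still gets a 200.
    cache.clear()
    fetch(key_full_request, "GET", "/index.html", {"X-Oversized": "A" * 20000}, origin)
    print(fetch(key_full_request, "GET", "/index.html", {}, origin))  # 200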
The error could be (and I'd say usually is) transient, e.g. you hit a page and something is broken; you reload and then it's fixed.
Real-life example I'm familiar with: content being served from a busy NAS that buckled under load. Some requests would time out with 504s, some would return 500s, and some would make the files appear missing, so you'd get 404s. I know, braindead design that shouldn't happen, but it does happen.