I've been abusing HTTP Status Codes in my APIs for years (slimjim.xyz)
206 points by that_james on July 13, 2022 | 322 comments



I think the article is missing some points with regard to REST.

If the API that the author is building is a REST API then the response for a non-existing resource is 404.

In the case of REST, the main idea is that if you try to GET a resource by ID then you assume that the resource exists. Thus if it does not exist => 404.

It does not matter too much which part of the URL is the one that is causing the URL to be wrong.

Thus `api/v11/employees/1` and `/api/v1/employees/100` both are wrong resources.

In the first case, asking for `/api/v11/employees/1` is not a search or find to see if there is a V11 and if inside there is employee 1. Building a URL is an intentional choice that includes assumptions, like there is an 'api', that has a 'v11' and inside there are employees and one should have the ID 1.

The same goes for the latter case with employee ID 100. If you ask for an employee with an ID, that means you know (or assume that you know) that the employee exists. Thus if it does not, it should be very clear => 404.

In both cases responding with 200 means something like "I kind of understand that you want a URL that is similar to some that I have, so it is almost okish".

But in REST this is not the case. It is like you are serving some static folders and you want to get the file 100 under /api/v1/employee and that file does not exist.

Nobody is stopping anyone from adding a response body to a 404 to indicate which nested resource is invalid. That can be added as a debug message for the developer, for example.

Of course this is IMHO and I am only addressing REST API. If the API is not supposed to be REST then do whatever you want but make sure you document it well and be consistent.
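
A minimal sketch of that idea in Python, under stated assumptions: the in-memory `EMPLOYEES` store, the `get_employee` handler, and the `detail` field are all hypothetical names made up for illustration, not anything from the article. Both a wrong path prefix and a missing ID answer 404; the body only adds a hint about which part failed to resolve.

```python
import json

# Hypothetical in-memory data store and route prefix, for illustration only.
EMPLOYEES = {1: {"id": 1, "name": "Ada"}}
KNOWN_PREFIX = ["api", "v1", "employees"]


def get_employee(path: str) -> tuple[int, str]:
    """Return (status_code, json_body) for a GET on the given path."""
    parts = [p for p in path.split("/") if p]
    # Any mismatch in the fixed part of the path is a 404.
    if len(parts) != 4 or parts[:3] != KNOWN_PREFIX:
        return 404, json.dumps({"error": "not_found", "detail": "unknown path"})
    try:
        employee = EMPLOYEES[int(parts[3])]
    except (ValueError, KeyError):
        # Still a 404; the body is only a debugging hint for the developer.
        return 404, json.dumps({"error": "not_found", "detail": "no such employee"})
    return 200, json.dumps(employee)
```

A client that only looks at the status code sees the same 404 in both failure cases; a developer reading the body can tell them apart.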


I agree - though I also agree with the author that this is an area where REST specifications are a little clunky.

Yes, `api/v1/employees/100` should return a 404, because that path represents the location of a specific entity and that entity does not exist.

Just as the author thinks it's clunky to return HTTP error codes representing application errors, I think it's clunky to apply application logic to HTTP method semantics. GET, POST, and DELETE were designed as instructions telling a web server how to handle static-ish content, and it shows in their design. Why would a GET have a body? It's a request for the server to return a specific file. This leads to breaking REST standards - for example, search endpoints that logically are GET calls (they return entities), but are implemented as POST methods because the search criteria is complex enough that you wanted a request body. Or bulk delete calls that are similarly implemented as POST methods.

REST works best in a "rules are meant to be broken" manner in my opinion. It's not a bad system (which is why it is so common), but mixing application logic with HTTP transport logic does lead to oddities.


Some applications opted for using GET with request bodies as well, for better or worse. Elasticsearch comes to mind.

There have been some attempts to extend the list of http verbs to include something that fits those use cases more naturally - SEARCH and QUERY. QUERY got more traction, if I remember correctly, but I haven't heard much about it in a while.

https://www.w3.org/2012/ldp/wiki/Proposal_for_HTTP_QUERY_Ver...


SEARCH is used by the WebDAV standard, not that I like WebDAV search syntax https://gist.github.com/CarlSchwan/66fde29d52022c0ef76aec362...


> SEARCH is used by the WebDAV standard

That’s probably part of the reason the more general one proposed for HTTP is “QUERY” and not “SEARCH”. Also, WebDAV, in following its apparent design philosophy of “more is more”, has multiple different GET-with-(required or optional)-body methods with different purposes: PROPFIND and REPORT as well as SEARCH.



>Yes, `api/v1/employees/100` should return a 404, because that path represents the location of a specific entity and that entity does not exist.

A GET request to that path should return the current state of the resource at the path. As you note, the current state of that entity is that it does not exist.

Let's look at the spec:

>200 OK - The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:

> GET an entity corresponding to the requested resource is sent in the response;

Did the request succeed?

Yes, we found the current state of the entity so we return 200

What do we return?

"An entity corresponding to the requested resource". In this case however we want to represent an entity that doesn't exist.


> A GET request to that path should return the current state of the resource at the path

There is no resource at the path.

> As you note, the current state of that entity is that it does not exist.

A thing which does not exist does not have state. Existence (and even moreso non-existence) isn’t a state, existence is logically presupposed in any description of state.


>existence is logically presupposed in any description of state

Then please explain how you can write and I can understand the phrase "A thing which does not exist".


It's a convenient linguistic shorthand for a common case which would more properly be described as “a concept of a thing which does not correspond to any actual thing”. While the equivalent linguistic shorthand has been present in most languages for quite a long time, the recognition that the confusion it can sometimes produce between concepts and concrete things is a category error is, while not nearly as old, also fairly old; it is central to Kant’s argument against the ontological argument for God’s existence, for instance.


And why shouldn't we go with the convenient linguistic shorthand instead of the esoteric ontological argument?


> And why shouldn't we go with the convenient linguistic shorthand

Who said you shouldn't? The shorthand is convenient. But it doesn't mean what other statements with the same shape mean. A thing which isn't doesn't have a state or representation.


`100` could be the employee id more than a name of the employee. So a query for id 100 returns the information associated with that id, nominally that it is currently free and not associated with any employee.

Many online REST discussions focus too much on shallow linguistic analogies with verbs/nouns.


> So a query for id 100 returns the information associated with that id, nominally that it is currently free and not associated with any employee

If you were doing a static website serving from a file system and said the same thing about a filename as a path component, you'd be seen as mad. But there is no difference when there is an id for a potential employee than a name for a potential file: if no employee/file exists when the server looks for the id/name, the correct result is 404 Not Found.

The id isn't the resource, it's part of the path to the resource (and if it is the resource in that URL, you need a different scheme for when you need to find the employee pointed to by the id and not info about the id.)


That is defined by the previous segment of the path.

The content of the file `/api/v1/employees/100` needs to be defined out-of-band somehow, since you can't actually put an employee on disk.

Moreover, part of REST is that you are not responding with static files by path, but with representations of entities; the question then is whether the entity is the employee or the id.

I agree that `/api/v1/employees/100` is not the best name for the id entity, something like `/api/v1/employee-id-status/100` might be more descriptive.


If you have POST for search and bulk delete, you are "doing REST wrong" and calling RPC by another name.


Not true.


The "doing REST wrong" was in quotes to express that the practice might be widely accepted but it is not exactly kosher.

In "true" REST, if you have a "search" endpoint, POSTing to it should create a resource, which could be a reference to the list of results. The created resource should have its own endpoint (something like `/search/results/:query_id`), and the client could then cache it.

If you don't want to do that, but still want to say you are really following REST, you could have an endpoint to represent your resources (`/products`) and use GET with filter parameters in the query string (`/products?search=my+search+term`)
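
The two options above can be sketched roughly as follows. This is an illustration only: the `PRODUCTS` list, the `SAVED_SEARCHES` store, and all three handler names are hypothetical, and substring matching stands in for whatever a real search would do.

```python
import uuid
from urllib.parse import parse_qs, urlsplit

PRODUCTS = ["red shirt", "blue shirt", "red hat"]  # hypothetical data
SAVED_SEARCHES: dict[str, list[str]] = {}  # hypothetical result store


def get_products(url: str) -> tuple[int, list[str]]:
    """GET /products?search=term: filter the collection via the query string."""
    query = parse_qs(urlsplit(url).query)
    term = query.get("search", [""])[0]
    return 200, [p for p in PRODUCTS if term in p]


def post_search(term: str) -> tuple[int, str]:
    """POST /search: create a result resource and return its location."""
    query_id = uuid.uuid4().hex
    SAVED_SEARCHES[query_id] = [p for p in PRODUCTS if term in p]
    return 201, f"/search/results/{query_id}"


def get_search_results(path: str) -> tuple[int, list[str]]:
    """GET /search/results/:query_id: fetch a previously created result."""
    query_id = path.rsplit("/", 1)[-1]
    if query_id not in SAVED_SEARCHES:
        return 404, []
    return 200, SAVED_SEARCHES[query_id]
```

In the POST variant the client follows the returned location with a plain GET, which is the part that can be cached.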


this


I very intentionally avoided the REST vs HTTP RPC debate.

This is _specifically_ about HTTP APIs. REST is not a synonym for HTTP but there are much better resources out there that rant on about the importance of hypertext and URL support etc.

This is mostly about the perspective of consumers.

>It does not matter too much which part of the URL is the one that is causing the URL to be wrong.

Your consumer disagrees, they'd like to know if their URL was fat fingered or if a record was missing. My argument is 404 is inappropriate because the web service exists, but the record doesn't.

> Thus `api/v11/employees/1` and `/api/v1/employees/100` both are wrong resources.

I can't say this is wrong, but it doesn't feel right. `/api/v11` straight up resolves to nowhere. Maybe this is an instance where Gone is better than Not Found?

> Nobody is stopping anyone from adding a response body to a 404 to indicate which nested resource is invalid. That can be added as a debug message for the developer, for example.

100% agreed. It's just a thought I had while debating with my team.

I posted this here for this exact kind of feedback :D good points raised


I think debating is good and healthy, so let me do a short rebuttal about the difference between `/api/v11/employees/1` and `/api/v1/employees/100` with an example, specifically because you say that you are talking about _HTTP_ APIs.

So I will focus on HTTP then.

Say you install an nginx/apache and then have a static structure where you have the profiles of employees saved as PDFs directly on the disk.

Then you do `GET /public/v1/employees/1.pdf` and it works and returns 200 with the content. Then you do `GET /public/v11/employees/1.pdf`; in this case all servers will return 404. The same goes for `GET /public/v1/employees/100.pdf`, which will still be 404. What if someone asks for `GET /public/v1/employeea/1.pdf`? The server will again respond with 404.

Then I go and implement a web app to replace that. I plan to keep the URLs the same but now there is an app that will return the PDF as a data stream or file.

For me, I don't see any reason to change the behaviour of the URLs just because I replaced a static app with a dynamic web app. Any HTTP server will respond in the same way, thus the current one should respond the same.

Responding like this has a compatibility reason (let's say) behind it that is neither personal nor specific to my project.


Honestly I’m amazed that you managed to find a closer for this argument. And it works. I was _firmly_ of the “204 instead of 404” camp, but I find the “swap between static and dynamic serving” quite compelling.

It’s worth being redundantly explicit that this does not extend to all cases. There are cases where a 204 is warranted. But I’m roughly convinced that it may not be as ubiquitous as I thought. Very rad.


>Say you install an nginx/apache and then have a static structure where you have the profiles of employees saved as PDFs directly on the disk.

>Then you do `GET /public/v1/employees/1.pdf` and it works returns 200 with the content. Then you do `GET /public/v11/employees/1.pdf` in this case all servers will return 404.

Which is a sensible default because the most nginx/apache can conclude is that the resource does not exist on the server. However, if we know that this server is the canonical record for these pdfs we can conclude that it doesn't exist if it's not on the server. So now we know the state (it doesn't exist) and can return its representation.


Ok, I see now the point on which we disagree:

What does 200 mean with regard to existence/non-existence?

I think 200 means something exists and 404 is the representation of non-existence.

You think (please correct me if I am wrong) that the existence or non-existence information should be in the body.

Actually I think the underlying (and more important) point is about how valuable the existence/non-existence information is for the client => how quickly should the client get feedback about this?

I think existence is very important information and thus should be a first class citizen of the data representation. Thus I want it in the status code on the same level with the body itself.

If you put it in the body then that means for me an extra step to parse the body and then see what is there. So then the existence/non-existence is on the second level.

In your case with responding with 200 + body then the 200 status becomes irrelevant and I always need to parse the body => time is lost to access the _first_ important information that should then guard my business logic to parse or not the body.

While in my case (using 200 and 404 status) the client receiving 404 knows directly (without any parsing of the body) that the request was not successful in retrieving existing data.
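
A rough Python sketch of that client-side difference, assuming a hypothetical `handle_response` helper that is handed the status code and raw body by whatever HTTP library is in use: with 200/404 the client learns about non-existence before touching the body at all.

```python
import json
from typing import Optional


def handle_response(status: int, body: str) -> Optional[dict]:
    """Return the parsed entity, or None if the server says it does not exist."""
    if status == 404:
        # The status code alone answers the existence question;
        # the body is never parsed on this path.
        return None
    if status == 200:
        return json.loads(body)
    raise RuntimeError(f"unexpected status {status}")
```

With the "always 200" approach the first branch disappears and every response has to go through `json.loads` before the client knows whether there is anything to work with.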


>I think 200 means something exists and 404 is the representation of non-existence.

But that's not what the spec says:

>200 OK - The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:

> GET an entity corresponding to the requested resource is sent in the response;

>The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

404 is not the representation of non-existence. It's the representation of not found. Something can be "not found" for many more reasons than non-existence. Which ultimately causes the person integrating your API much consternation because they have no idea if it's a "Everything worked" 404 or "My DNS is borked" 404 or "Your server's routing is borked" 404 or half a dozen other possibilities. Sure, you might add further information to your 404 response but that means you can't have generic 404=bad monitoring. Plus causes headaches for people that are working in systems that do assume 404=bad.

200 means that the request has succeeded. And in these cases it has. You requested a representation of employee 100 and you're getting one (it doesn't exist).

Even if you disagree with the word smithing the latter is far far easier to work with.


If the people who designed the web didn't want information about the application code to show up as a status code, we wouldn't have status 500.

Originally, anything with a path was meant to simulate a directory tree of static files. We build it dynamically because that's easier to maintain. But making it look and act the same by returning 404s is historically correct.

Of course things evolve and move on. You're free to do as you wish. But to me you're making a bizarrely arbitrary distinction about what part of the application is allowed to return a 404 (routing code in the framework) and what aren't (your own code). Or did you not realize that a framework like Django isn't actually part of the webserver?


This is a bit weird to me.

Your article is entitled "I've been abusing HTTP status codes" ... but... you're not "abusing" them, you're "not using" them for your APIs. (Or, said another way, you're leaving them to their normal usage for HTTP servers.)

Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm -- your article is /entirely/ in context of REST even if you avoided mentioning it.

...

Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted".


> Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm

It's doubly not. The REST architectural style (1) is protocol neutral, rather than specific to HTTP, and (2) emphasizes using the underlying protocol, whatever it is, “as is”.

> Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted".

But those are literally the same thing. A URL/URI is a “Uniform Resource Locator/Identifier”.

“I don't have a matching resource” is a 404 (unless you are distinguishing “I had a matching resource but you missed it and it's not coming back”, which is 410.) While you might use a body message to distinguish “I would never expect to have a resource with that shape URL” from “I have resources with URLs shaped like that, but not that particular URL”, both are within the usual, RFC-defined meaning of the 404 status code.


Your argument is obviously what has been normalized in REST APIs, but it's not user friendly AND it's OP's whole point. He built his entire article -- and apparently his APIs -- around avoiding 404 ambiguity.

If you hit an endpoint and get a 404... did you do it wrong? Is my documentation outdated?

Even better: What recourse do you have? how do you figure out the answer?

Your only recourse is to email me. Send me cURL commands and screenshots and sit on your thumbs until I write back.

IMHO the REST folk were blinded by the existing 404 normalcy set up by web servers.

...

Personally I think OP's idea isn't great. I think returning 200 and making me parse the response and hoping it's consistent between services is too much work compared to the simplicity of the HTTP response.

Instead, I'd change the default from 404 to 501.

    HTTP 501 - not implemented (URI is not working)
    HTTP 404 - resource not found (Joe doesn't exist in db)


501 is for unrecognized methods. As I was about to say this would then be incorrect usage, it occurred to me that you could in fact use this for an RPC system if the procedure name were used as the HTTP Method.

So instead of

    GET /api/v1/employee/100
    Accept: application/json

You sent

    GetEmployee /1000
    X-API-VERSION: 1
    Accept: application/json

Then if the actual procedure name was "get_employees", the correct response to this request would be 501, and /1000 referring to a non-existent employee would be a 404.

If making an RPC and restricting yourself to the known HTTP methods, the closest is

    GET /api/v1/employee?id=100
    Accept: application/json

which would return 404 only if the controller endpoint didn't exist, and would return whatever the application wanted if userid=100 didn't exist, such as 422 or 200 with a response indicating non-existence. It would be just like a local procedure call that could return a false value, or throw an exception instead of returning a value.


> If you hit an endpoint and get a 404... did you do it wrong? Is my documentation outdated?

Sure, you might want information in addition to that provided by the status code. And, again, rather than reinventing the wheel with some ad hoc mechanism, you can follow the HTTP spec for a solution: almost all HTTP status codes support a response body to communicate additional detail.

> Instead, I'd change the default from 404 to 501.

5xx errors indicate server problems, not request problems. If you wanted a different status code for “that path isn't structured in a way I understand” vs “I understand how I would look up something with that path but can't find it”, 400 or 421 for the former would be better than anything that is not a 4xx since they each (1) are in the correct class and report a client error, and (2) have a definition which arguably fits the scenario, even if 404 arguably fits better.
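
If one did want that split, a minimal Python sketch might look like the following. The `ROUTE` pattern, the `EMPLOYEES` store, and the `status_for` function are hypothetical, and using 400 for an unrecognized path shape is the alternative being discussed here, not the conventional default (which remains 404 for both cases).

```python
import re

EMPLOYEES = {1: "Ada"}  # hypothetical data store
# Hypothetical: the only path shape this service understands.
ROUTE = re.compile(r"/api/v1/employees/([0-9]+)")


def status_for(path: str) -> int:
    """Pick a status code for a GET on the given path."""
    match = ROUTE.fullmatch(path)
    if match is None:
        return 400  # path is not structured in a way the service understands
    if int(match.group(1)) not in EMPLOYEES:
        return 404  # path shape is fine, but no such employee
    return 200
```

Both failure codes stay in the 4xx class, so generic "4xx means client error" handling keeps working.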


I recalled there was a "not found" response and used it whilst spitballing the response above, but you're absolutely correct. 400 (Bad Request) and 421 (Misdirected Request), or even 409 (Conflict) -- as another poster mentioned -- would be great responses in that scenario

The main "issue" is that 404 is the normalized response for web servers when an endpoint doesn't exist. So it feels like one is breaking the established paradigm by using something else, but I think it's absolutely worth doing.

Certainly a bigger fan of defaulting to returning a 409 than making my API consumers parse all my response bodies.


> Thus -- as REST is /the/ canonical "hijack HTTP status codes to mean something clever" paradigm -- your article is /entirely/ in context of REST.

Oof, that's a hell of a good point. So much for that plan lol

> Anyway - I'm entirely with you on the foolishness of using 404 to mean both "your URL is messed up" and "I couldn't find the resource you wanted". Seems like, for REST, you'd want to return a 400 (malformed request) or something if your URL was borked rather than overloading 404.

Yup, that's the headache I'm trying to muddle my way through.

Really it's less "this is how to build APIs" and more "have you considered your consumer when you return data?". But I think even in that context your point stands better.

Back to the drawing board it seems.

At least I can generate more content now :D


If you go down this path (pun not intended), consider the structure of a URL to have semantic content, and yet you want to have HTTP compliant yet meaningful errors: another common scenario would be 409 conflict. This is the HTTP way to say it's not possible to process this request right now, while not suggesting it will never be possible.

This is most appropriate when the URL does not make sense with the current state of the server, but other future operations could (in theory) change the server state such that it does make sense. This might make sense if you have a user-extensible data model, where some kind of relationship is being mapped into the path structure and you want to signal that this relationship is not currently known to the query system, but _could be_ in the future.

Now, you are faced with a decision for when this ephemeral status is a semantic conflict, when it is simply a resource not found, or when it is a forbidden request for the current user.

The last is subtle and depends on other security posture. Do you want to tell your user "this is possible/available for a sufficiently privileged user, just not for you" or do you want to avoid leaking information about higher privilege roles? This is similar to the debate about whether a login UX should tell you whether you have an invalid user id versus password or just say login failed without leaking more information to a potential attacker.


> Your consumer disagrees, they'd like to know if their URL was fat fingered or if a record was missing.

Why?

How often does this really come up?

Who is typing in URLs like this manually?

If you're typo'ing it in code, are you not doing any kind of validation/testing against any kind of spec that can catch this?

Why is it up to the actual webservice returning a 404 to catch these kind of typo errors?

And I'm not saying I disagree with the argument -- I fully get the argument that was made, but practically the fact that you're caring about it suggests you're missing other components in your stack. You're producing a URL request which is outside of the spec of legal URLs for the webservice. You can validate that before you ever make a real web request against a real server.
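
As a rough illustration of that last point, a client could check a path against the documented route shapes before sending anything. The patterns below are hypothetical stand-ins for whatever spec the web service actually publishes.

```python
import re

# Hypothetical path templates, standing in for the service's published spec.
ROUTE_PATTERNS = [
    re.compile(r"/api/v1/employees/[0-9]+"),
    re.compile(r"/api/v1/employees/[0-9]+/devices/[0-9]+"),
]


def is_legal_path(path: str) -> bool:
    """Return True if the path matches one of the documented route shapes."""
    return any(pattern.fullmatch(path) for pattern in ROUTE_PATTERNS)
```

A typo such as `/api/v11/employees/1` is then rejected locally, and any 404 that comes back from the server can only mean the record itself is missing.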


What does this api do? Get? Update? Remove?

You can’t ignore the HTTP verbs, so your article doesn’t make sense. You also shouldn’t ignore status codes.

You also might get in trouble with caching.

You can easily use status codes, and provide all the detail + status field in the response. It makes consuming a lot easier


> Maybe this an instance where Gone is better than Not Found?

Gone means a resource identified by a URI existed, it no longer exists, and that resource (not the URI, necessarily) will never be available again.


I think the author is aware and I noticed the term REST does not appear anywhere in the article.

The main idea of the article is exactly that he does not agree that http error codes be used for application level errors (as REST principles recommend).

I think at this point that ship has sailed. Most HTTP based API will use http error codes in various ways. I would be surprised to not receive a 404 when requesting a resource ID that does not exist for example when consuming a new API.


> I think at this point that ship has sailed.

If you were talking about TCP/IP, then perhaps - because much of the internet's infrastructure is hard-coded (even burned into silicon) to use current standards. But application semantics aren't frozen in stone by any means.

Servers and clients built atop HTTP are still being written. Why not adopt better patterns that apply properly separate semantics for each layer?

I liked the REST ideas when they came out, particularly because they provided a much better and simpler alternative to SOAP. But I think improving patterns where we can is always a good idea.


Of course. It's just that in the general mindset, a hypothetical average dev who needs to call your API won't be surprised to receive a 404 for a missing resource. I might be wrong of course.

Either way, a proper api doc/spec should make either approach a non-issue.

Personally, I've switched to GraphQL where it makes sense. Application-level error codes are handled on the GraphQL layer, not on the HTTP layer, so you could say I've adopted that approach!


> In the case of REST, the main idea is that if you try to GET a resource by ID then you assume that the resource exists. Thus if it does not exist => 404.

This is actually more than a little scary.

If you're configuring the API client, and typo the base URL, your next automation run could decide that no resources exist anymore, and delete all kinds of things from your database.

I would really recommend designing APIs that communicate more than just the 404, and writing clients that check that extra part, to differentiate between "something is wrong" and "you requested item of type X from api Y, and that does not exist". If the client got a response that doesn't explicitly state X and Y, it should assume misconfiguration and give a more general error.

The response body could be used for this distinction, if you want to stick with 404.
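
A minimal sketch of such a client check in Python, assuming a hypothetical convention where the API puts a `missing` field naming the resource type in the body of a genuine "no such record" 404. The field name, the `Misconfigured` exception, and the `employee_exists` helper are all invented for illustration.

```python
import json


class Misconfigured(Exception):
    """The response did not come from the endpoint we expected."""


def employee_exists(status: int, body: str) -> bool:
    """Decide whether an employee exists, refusing to trust a bare 404."""
    if status == 200:
        return True
    if status == 404:
        try:
            detail = json.loads(body)
        except ValueError:
            detail = None
        # Only trust a 404 that explicitly names the resource type we asked for.
        if isinstance(detail, dict) and detail.get("missing") == "employee":
            return False
        raise Misconfigured("404 without the expected body; check the base URL")
    raise Misconfigured(f"unexpected status {status}")
```

An automation run built on this would stop with an error on a typo'd base URL instead of concluding that every record has disappeared.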


>If the API that the author is building is a REST API then the response for a non-existing resource is 404.

The problem is that this error then overlaps with server path routing issues, DNS problems, and general network issues. Even if it's logically correct it makes dealing with your API annoying.

>Nobody is stopping anyone from adding a response body to a 404 to indicate which nested resource is invalid.

But then we've lost standardization which is the whole point of the error codes to begin with.


> But then we've lost standardization which is the whole point of the error codes to begin with.

Returning 200 and then using response body to denote missing resource is no different. So you have to choose, either 404 can be invalid path and missing employee; or 200 can be valid data or missing employee. Personally I would prefer the former as 200/OK then indicates success.


This is getting into semantics, but IMHO a 200 OK with an empty body is the most correct response in this scenario. Everything worked so you get 200 OK and the most accurate representation of a resource that you know doesn't exist is an empty body.

404 Not found is not exactly correct. I argue that the server found a representation of the object with that representation being "It does not exist".


404 is consistent with the way an HTTP server works if what you build is not a dynamic app but a static website.

If you try to request a file `/public/pdfs/100.pdf` and that does not exist, what does the server respond? 404

What does the server respond if you try `/public/dpdfs/1.pdf`? Still 404, as that path does not exist on the local storage.

What is the difference for a client if 100.pdf is an actual file or a data stream served from a web framework? There should be no difference.

Choosing to behave when building a dynamic app the same way as the static helps a lot with integrating with multiple other services (eg. caching, observability ...)


The difference is what you want to tell the client:

>The 200 (OK) status code indicates that the request has succeeded. The payload sent in a 200 response depends on the request method. For the methods defined by this specification, the intended meaning of the payload can be summarized as:

>GET a representation of the target resource;

>The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

If you want to give the client the representation of the target resource (i.e. it doesn't exist) then send 200 and a body indicating it doesn't exist.

If you want to tell the client you couldn't find a representation for the target resource then send 404


So let me ask you this what about the following request:

Let's say that you have a nested resource/path that will return devices owned by the employee so you will have this path:

`GET /api/v1/employees/1/devices/1` and this might return the first device owned by the employee.

Now what do you think the following request should return:

`GET /api/v1/employees/100/devices/1` where the employee with ID 100 does not exist but there is a device with the ID 1 owned by some other employee?

or

`GET /api/v1/employees/100/devices/1000` where both the employee with id 100 and the device with id 1000 do not exist?

What do you think the response should be? Still 200?


>`GET /api/v1/employees/100/devices/1` where the employee with ID 100 does not exist but there is a device with the ID 1 owned by some other employee?

So you're asking for the device with id 1 owned by employee 100. The answer is that the device exists but is not owned by employee 100 because there's no employee 100. So return 200 plus however you want to represent "the device exists but is not owned by employee 100 because there's no employee 100".

>`GET /api/v1/employees/100/devices/1000` where both the employee with id 100 and the device with id 1000 do not exist?

Same as above but subbing in "because employee 100 and device 1000 don't exist" as appropriate


> If you want to give the client the representation of the target resource (i.e. it doesn’t exist)

Nonexistence is not a representation of the target resource. A resource that does not exist does not have a representation.


For no content, I'd argue 204 is even more correct.


But we do have content to send. Namely, that the server knows that the resource the client requested doesn't exist.


That’s not a “representation of the resource”. That’s a fact about the state of the universe, to wit, that it contains no such resource. Which is what is communicated by a 404.


Again, 404 does not mean that the universe contains no such resource. It means that this specific server did not find the resource.


> Again, 404 does not mean that the universe contains no such resource.

It means that the server did not find a current representation of the resource and that the server accepts responsibility for providing an authoritative answer to whether it exists (if the latter is not the case, the most correct response is 421.) Aside from that and the combination of being unwilling to provide a default representation, not having a representation consistent with the Accept header provided by the client, and preferring not to distinguish this case from non-existence with a 406 response (which, like the situations to which 421 applies, is an edge case), the reason for a resource not being found is overwhelmingly that it does not, in fact, exist.

It is true that there are some other things that a 404 might mean, but “does not exist” is not only within the space of things covered by “Not Found”, it is by far the most common reason for it.


>If the API that the author is building is a REST API then the response for a non-existing resource is 404.

The problem is that this error then overlaps with server path routing issues, DNS problems, and general network issues. Even if it's logically correct it makes dealing with your API annoying.

As others have mentioned, you can return anything you want in the body of a 404 to clarify. However the other issues you all mentioned should really be in the 500s with errors.


>you can return anything you want in the body of a 404 to clarify

This isn't how the world works though. Much development happens on COTS platforms or internal frameworks or whatever. Many of those will have generic error handling, logging, alerting, etc. based off the response code.

Your pedantically correct 404 with a clarifying body ruins my day (really weeks) because now I have to chase whoever built those things to fix my problem. I am annoyed and curse your name.


Use a better framework? In any good framework, returning a json body with a status code should be a one-liner.

One topic the article didn't even touch was flagging bad parameters in the request which didn't make sense on the application level. HTTP has 422 for that. I commonly write things like

    validate_my_param($c->req->params->{format})
       or $c->detach(HTTP => 422, ['Invalid format']);
(the HTTP view renders that string either as text/plain or json as {"message":"Invalid format"} depending on the accept header of the request.)

I've used that for years and been quite happy with the downstream results.

Edit: I should elaborate on the downstream results.

On the javascript side, the ajax methods often contain separate success and failure callbacks. You often need to handle the failure callback to ensure the user knows that something broke. If you also have an error path in the success callback, it clutters the code.

    ajax({
      success(data, ...) {
        if (data.wasActuallyAnError) {
          // lines
          // of code
          // to handle
          // the error
        }
        else useTheData(data);
      },
      error(response, ...) {
        // lines
        // of code
        // to handle
        // the error
      }
    })
vs.

    ajax({
      success(data, ...) {
        useTheData(data);
      },
      error(response, ...) {
        // lines
        // of code
        // to handle
        // the error
      }
    })
This problem is not insurmountable, of course. You could wrap up your error code in helper functions and reduce the boilerplate. You could write a completely generic error handler for your front-end framework and automatically include it in all ajax calls. You could also write a custom ajax method that interprets the 200 status error messages and diverts to the error callback. But I still think the 404/422 status code is a better pattern to start from, since most of the client-side frameworks I've used switch code paths based on the status code.
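
A minimal sketch of that last option, for what it's worth. It assumes a hypothetical `transport` function that takes `success`/`error` callbacks like the `ajax` calls above, and reuses the made-up `wasActuallyAnError` flag; neither comes from a real library.

```javascript
// Hypothetical wrapper: `transport` stands in for whatever ajax function the
// framework provides; it is assumed to call success(data) or error(response).
function ajaxWithErrorDiversion(transport, { success, error }) {
  transport({
    success(data) {
      // Treat a 200 whose payload flags a failure as an error.
      if (data && data.wasActuallyAnError) {
        error({ status: 200, body: data });
      } else {
        success(data);
      }
    },
    error,
  });
}

// Fake transport for illustration: always "succeeds" with the given payload.
const fakeTransport = (payload) => (callbacks) => callbacks.success(payload);

const outcomes = [];
const callbacks = {
  success: (data) => outcomes.push(["success", data.value]),
  error: (response) => outcomes.push(["error", response.body.message]),
};

ajaxWithErrorDiversion(fakeTransport({ value: 42 }), callbacks);
ajaxWithErrorDiversion(
  fakeTransport({ wasActuallyAnError: true, message: "Invalid format" }),
  callbacks
);
console.log(outcomes);
```

The calling code then only ever has one error path, which is the same thing the status-code approach gives you without the wrapper.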


>Use a better framework?

Oh sure. I'll just call up the CTO and tell them to scrap the product they spent $10 million on because it's not pedantically correct in parsing HTTP status codes.


> Many of those will have generic error handling, logging, alerting, etc. based off the response code.

Yes, and then parse the appropriate response payload, in the context of the response code. That's how, e.g., you get validation errors for a submitted form that just failed submission. If you're just assuming that everything != 200 is an error, it is your own assumption, not a framework shortcoming. As an example, most of the complex API systems I've developed would actually ignore the response code completely (so, basically the opposite of what you assumed), because the responses need to have further context information in case of errors, such as an application-specific error code, context-specific error messages (e.g. form field errors) or just translation-aware detailed error messages.
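
To make that concrete, here is a rough sketch of a client reading such a payload. The envelope shape (`ok`, `errorCode`, `message`, `fieldErrors`) is purely an assumption for illustration, not any standard format.

```javascript
// Hypothetical envelope: { ok, data, errorCode, message, fieldErrors }.
// None of these field names come from a standard; they are assumptions.
function interpretEnvelope(envelope) {
  if (envelope.ok) {
    return { kind: "success", data: envelope.data };
  }
  // Form-style validation failures carry per-field messages.
  if (envelope.fieldErrors && Object.keys(envelope.fieldErrors).length > 0) {
    return { kind: "validation", fields: envelope.fieldErrors };
  }
  // Everything else is reported through an application-specific code.
  return { kind: "failure", code: envelope.errorCode, message: envelope.message };
}

console.log(
  interpretEnvelope({ ok: false, fieldErrors: { email: "Not a valid address" } })
);
```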

On a vaguely related note, most people working with HTTP API implementations seem to forget that GET requests can in fact have a request body. Most high-level clients/libraries will assume you won't use it, but it doesn't mean you can't use it.


> this error then overlaps with server path routing issues, DNS problems, and general network issues.

No webserver will ever return a 404 for any of those cases.


That is exactly the point I bring up as a reason not to do this. However, a viable (but often difficult or unsupported) way to solve this would be changing the HTTP status message so the intent of the error is clearer. Most frameworks like Django hardcode the response messages.


Maybe I'm not understanding this, but wouldn't network issues take priority naturally? For example, if there's a problem with a database connection, you'd return 500. How would you know if the resource even exists?


Network issues between the client and the server.


Those will never return a 404, though.


If your DNS server has the wrong IP for a URL it will.


Historically 404 is returned for static websites. Ergo it is the standard.


I'd say this makes the author's point entirely valid, it only shifts the argument in a broader direction: it's not the application abusing HTTP status codes: it's the REST standard itself.

Which, in all fairness, we always knew. It's an overloading of an existing technology, a bit hacky as it is, and things like gRPC are the answer to this, but the simplicity of HTTP and REST is still the demise of standardization.


> it's the REST standard itself.

REST is an architectural style, not a standard.

> It's an overloading of an existing technology, a bit hacky as it is

REST is not tied to a particular technology or protocol, and is not “overloading” whatever protocol(s) it is used with; the architectural style specifies using the subset of the available features of the underlying protocols that corresponds to REST semantics consistent with the specifications of those protocols.


> Returning a 2xx code immediately tells the client that the HTTP response contains a payload that they can parse to determine the outcome of the business/domain request.

There is nothing in the protocol that mandates only 2xx status codes are parsable. Instead, the Content-Type tells the client what kind of content the body has, regardless of the status code. And the content type will most probably be what the client asked for (if not, then the server announces that the response content is not subject to negotiation, something that you rarely see nowadays).

In general I think this post reflects one of the mentalities that appeared around 10 years ago regarding HTTP APIs (another was the extreme hypermedia-driven APIs). But I really think nowadays we can do much better: we can deliver a practical API that works 99% of the time.

By the way in the latest HTTP RFC (RFC9110) status code 422 is finally officially defined (previously was part of the WebDAV extension), which means that the war 422 vs 400 for validation and other semantic errors is over and we have a clear winner. But 200 vs 404 for something that was not found? Well, that war ended long ago, I think.


I think 404 being a common and useful server error is the issue. Had they/REST aligned on 204 No Content (for things like the employees/100 example) or something 2xx and less common, I think it wouldn’t be much of an issue at all. I still think it’s actually not much of an issue. Of all the quirks out there, this creates little pain.


> Had they/REST aligned on 204 No Content (for things like the employees/100 example)

Who is “they/REST”, and where have “they/REST” aligned on anything? All REST says is “use the underlying protocol without messing with its semantics, but only the parts that correspond to REST semantics”.

If the resource exists and is accurately represented by no data, then 204 fits for a GET, but that’s a different case than where there is no matching resource to the URL. 204 is mostly for modifying actions (e.g., PUT, DELETE) that don’t need to return more than the fact of success.


I've never seen a response indicating the content is not subject to negotiation. I generally only see that the response is not acceptable (i.e. the client has indicated they can't accept it) and the server skips processing the request.


> There is nothing in the protocol that mandates only 2xx status codes are parsable

I think a kinder reading of this point is "a 2xx response means you can parse the thing you were expecting to parse"


That should be indicated by the Content-Type header, not the status code. If you get a 2XX response but the Content-Type isn't what you expect, you probably shouldn't try to parse it. I've seen misbehaving APIs return 2XX when they really should've returned 503.

Often this is to get around the inability of some cloud-based load balancers to accept 5XX status codes as healthy responses. Take AWS ELB/ALB, for example. There are conditions under which a target instance knows the underlying issue isn't related to its own health, such as during planned maintenance, upgrades, or database failures. In these situations, it would be desirable to return 503 and configure ELB/ALB to accept that as a healthy response. Since AWS won't let you do that, some applications will just return an empty or bogus 200 response during upgrades or maintenance.


Ah, you got me on that one, fair point.

But then it would be both, wouldn't it?

I know I have a JSON payload representing a domain response because of the 2xx response code AND the Content-Type header?


You have to validate that the response is well-formed even if you parse it. There's no harm in trying to parse it if the Content-Type indicates it's JSON (or XML, or whatever else you're expecting). You can then use that result--or lack thereof, in case you couldn't parse it--to determine why you got a particular status code.

If a resource isn't found for any reason, 404, 410, or 451 is the correct response. If you want to clarify why it's not found, that should be included in the response body. Don't return 200 while simultaneously reporting an error--that's just bad form. 2XX means everything is good, 4XX means problem on my end, 5XX means problem on the API's end. It's an easy way to tell at a glance who's likely at fault. Yes, status codes are always going to be ambiguous, but that's why there are response bodies alongside them. If the Content-Type header is something you recognize, you can at least attempt to automate that disambiguation process.
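
As a rough illustration of that at-a-glance reading, here is a minimal sketch. The function name `likelyAtFault` and its return labels are made up for this example, not taken from any library.

```javascript
// Map a status code to the party most likely at fault, by status class.
function likelyAtFault(status) {
  if (!Number.isInteger(status) || status < 100 || status > 599) {
    throw new RangeError(`not an HTTP status code: ${status}`);
  }
  if (status >= 500) return "server";
  if (status >= 400) return "client";
  return "nobody"; // 1xx, 2xx and 3xx are not errors
}

console.log([200, 404, 410, 451, 503].map(likelyAtFault));
```

It is deliberately coarse: the status class only says who to look at first, and the response body is where the actual reason would go.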


> If a resource isn’t found for any reason, 404, 410, or 451 is the correct response.

Nitpick, but 421 should be on this list, although the circumstances where you would need this should be extremely rare.


> I think a kinder reading of this point is “a 2xx response means you can parse the thing you were expecting to parse”

It doesn’t though.

Even with an Accept header on the request, it is not impermissible for a server to return a 2xx result with a different format indicated by the Content-Type header if it cannot, for whatever reason, return a result in a requested format (it may also, consistent with the HTTP spec, return a 406 Not Acceptable or, if it wants to be cagey about a resource representation existing if there is none matching the Accept header, 404 instead.)

If you want to know whether and how you can safely parse the response, the HTTP compliant way is to read the Content-Type header. Otherwise, you are relying on assumptions (which may be valid for out-of-band, non-HTTP reasons) about behavior that are outside of the spec.
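
A minimal sketch of what "read the Content-Type header" can look like on the client side. The helper names are invented for this example, and treating any `+json` suffix as JSON is an assumption about what the caller wants.

```javascript
// Extract the media type from a Content-Type value, dropping parameters
// such as "; charset=utf-8".
function mediaType(contentType) {
  if (typeof contentType !== "string") return null;
  const type = contentType.split(";")[0].trim().toLowerCase();
  return type === "" ? null : type;
}

function isJson(contentType) {
  const type = mediaType(contentType);
  return type !== null && (type === "application/json" || type.endsWith("+json"));
}

// Decide how to read a body from the header alone, irrespective of status.
function readBody(contentType, bodyText) {
  if (isJson(contentType)) {
    try {
      return { format: "json", value: JSON.parse(bodyText) };
    } catch {
      // The header claimed JSON but the body was not well-formed.
      return { format: "invalid-json", value: bodyText };
    }
  }
  return { format: "text", value: bodyText };
}

console.log(readBody("application/json; charset=utf-8", '{"message":"Invalid format"}'));
console.log(readBody("text/html", "<h1>Service unavailable</h1>"));
```

The same function works for a 200, a 404, or a 500, which is the point: the status code never enters into the parsing decision.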


This is a whole lot of nice-sounding theory but ultimately in practice this is just a downright mess to handle in a real application calling the API. If you are using the Angular httpClient for example, a 404 immediately throws an observable error when your app really should be telling the user that there are no results for that query. This crosswires a potential server-level error (broken routing etc) with a request-level error in error handling and would make it way harder to determine the cause of the error and lead a dev to just write `status.code === "404" ? 'User does not exist' : ...`

Did I mention that httpClient, by default, doesn't let you get 404 error bodies?

But ultimately, it is all just ideas on 'how neat it would be to use codes!' when in practice it is so, so much better to just drop it and use the codes for more literal values. Imagining a users/{X} as a 404 for invalid 'X' is fun... but like... the server actually defines it as something like /users/:userId and it isn't actually an invalid route and will not be caught by the default 404 handling. It's a conceptual dance.


> Did I mention that httpClient, by default, doesn't let you get 404 error bodies?

Why would an HTTP client not let you get HTTP response bodies for statuses that usually send bodies? I could understand it for a 201, and definitely for a 204, but for a 404 it just seems like bad design of the client.


What I am hearing is “Angular httpClient is badly defined”, which is the kind of risk you run into a lot with big monolithic highly-opinionated frameworks.


jQuery gives you the body. ExtJS gives you the body. Webix gives you the body. Maybe this is an Angular problem?


Don't get me started on HTTP 300 support


> By the way in the latest HTTP RFC (RFC9110) status code 422 is finally officially defined (previously was part of the WebDAV extension), which means that the war 422 vs 400 for validation and other semantic errors is over and we have a clear winner.

Unfortunately, the way 422 is written implies that the sent body/content has error(s) and not the header. It's close, but I still feel that for GET requests it's wrong.


The standard says nothing about bodies:

> The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.

https://datatracker.ietf.org/doc/html/rfc4918#section-11.2

Additionally, bodies are allowed on GET requests by the standard, though they are not commonly used because of bad middleboxes. However, many GET requests include query params or other parts of the URL to be parsed, and it's a completely reasonable interpretation of the standard to return 422 if those are not well-formed according to application rules.
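
Something like the following sketch is what I have in mind. The `format` parameter, its allowed values, and the result shape are all assumptions made up for the example.

```javascript
// Hypothetical application rule: `format` must be one of these values.
const ALLOWED_FORMATS = new Set(["json", "csv"]);

// Check the query string of a GET request against the application rule.
function checkFormatParam(requestTarget) {
  let parsed;
  try {
    // The base is only there so relative request targets can be parsed.
    parsed = new URL(requestTarget, "http://example.invalid");
  } catch {
    // The request target itself could not be parsed at all.
    return { ok: false, status: 400, message: "Malformed URL" };
  }
  const format = parsed.searchParams.get("format");
  if (format === null) return { ok: true, format: "json" }; // assumed default
  if (!ALLOWED_FORMATS.has(format)) {
    // Syntactically fine, but not meaningful to the application.
    return { ok: false, status: 422, message: "Invalid format" };
  }
  return { ok: true, format };
}

console.log(checkFormatParam("/api/v1/employees?format=xml"));
```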


I can't believe that you haven't read the f***ing standard:

"HTTP messages often transfer a complete or partial representation as the message "content": a stream of octets sent after the header section, as delineated by the message framing." - RFC 9110 (https://www.rfc-editor.org/rfc/rfc9110.html#name-content)

Earlier standards even used the "body" and "content" in the same context:

"The presence of a message body in a request is signaled by a Content-Length or Transfer-Encoding header field. Request message framing is independent of method semantics, even if the method does not define any use for a message body. ... When a message does not have a Transfer-Encoding header field, a Content-Length header field can provide the anticipated size, as a decimal number of octets, for a potential payload body. For messages that do include a payload body, the Content-Length field-value provides the framing information necessary for determining where the body (and message) ends." - RFC 7230 (https://www.rfc-editor.org/rfc/rfc7230#section-3.3)

This is probably due to historical reasons - MIME (email standard) uses "content", the original draft uses "body".


On that earlier standard part I wouldn't agree here: the fact a "Content-Length" field is used doesn't necessarily give a glossary definition for "content". But I'll be damned, you're absolutely correct, RFC 9110 leaves no room for doubt. This is huge and leaves a terrible taste in the mouth; the word content is far too common to imprint a definition like this. Were the people who wrote every single occurrence of "content" in the standard aware of this? Should we take guesses and interpret HTTP 422, for example, in the spirit of the law? This just makes ambiguity and error far too likely in my opinion; it's appalling to see it in such an important standard without any further explanation at all.


RFC 4918 also doesn't use the word "content"; it in fact says "contained instructions" and "request entity", which are not defined terms in either RFC you linked.

> and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions.

Emphasis mine


There's nothing in the protocol, but in general, you should not assume non-2xx status codes have parseable payloads (more specifically, you should not assume the format of a non-200 status code).

Reason being that any intermediary step in the chain of resolving your request can return the response. If your load balancer has fallen over? You'll get a 500 with a plaintext body. Don't parse that as JSON.

(Technically, any intermediary step might also return you a 2xx with a non-parseable body, but that scenario is far, far more rare... Mostly see it in completely misconfigured servers where instead of getting the service you're trying to talk to you get the general "Set up your site" landing page).


> you should not assume the format of a non-200 status code

You should never assume format based on status code at all! You should detect it based on the Content-Type header.

> You'll get a 500 with a plaintext body. Don't parse that as JSON.

Any intermediary which returns plain text with an application/json Content-Type is badly, badly broken.


> You should detect it based on the Content-Type header

In approximately a decade of working on JavaScript and TypeScript UI code, I can count on one hand the number of RPC handlers I've seen that inspect Content-Type in the success codepath.

... for that matter, I can count on one hand the number of RPC handlers I've seen that inspect Content-Type in the failure codepath also. The common behavior is to dump body to text, log error, and recover.


> There’s nothing in the protocol, but in general, you should not assume non-2xx status codes have parseable payloads (more specifically, you should not assume the format of a non-200 status code).

There’s absolutely something in the protocol, and you should absolutely use the Content-Type header to determine whether there is a body and whether it is in a parseable format irrespective of the status code, except for the minority of status codes defined as not having a body (e.g., 204 & 205.)


Unfortunately real-world use is `Content-Type: application/json` for everything.


> There is nothing in the protocol that mandates only 2xx status codes are parsable.

I think my overly defensive view of this for real-life code is that error states are inherently situations where the normal code contracts break down and that I must make fewer assumptions about the response, for example that they are well-formed or even match the requested content-type.

The number of times that I've encountered a JSON API that can suddenly return HTML during error states or outages is too damn high. So unless you give me a 2xx I'm not going to assume I got something intelligible.


> I must make fewer assumptions about the response, for example that they are well-formed or even match the requested content-type.

I think that you should always assume that an HTTP response is a well-formed HTTP response (otherwise you can't even trust that the 404 itself is correct); and you should never assume that the received MIME type is the same as the ones you indicated you accepted; you should always check the Content-Type header.


I'd say most web APIs I've used or developed recently return JSON even with 4xx or 5xx error codes. What can be annoying is knowing what JSON schema to parse with depending on the status code, as not even the Content-Type header will tell you that. APIs (esp. those behind load balancers) that sometimes return HTML and sometimes JSON are far too common though - the problem there is that the JSON responses are appropriate for programmatic consumption (terse/semantically well defined) where HTML ones typically aren't. But even if the Content-Type header is abused (application/octet-stream anyone?) it's not hard to write code that copes with either. One API I used recently returned JSON in some cases and CSV format in others, with no distinction between status code OR content type!


I don't see how using 404 is abusing it. If the client requested `/api/v1/employees/<non-existing-employee>` this means that the entire resource (identified by the full URL, including authority) is non-existing (i.e. Not Found). Both the technical and business requirement are the same here: what the client is trying to access cannot be found. There's no difference between that and `/api/v999/employees/1`, as that resource also does not exist, even if `/api/v1/employees/1` does.

You should detail the error further in the 404 response, and you can say "this entire path prefix could not be found" or just "the path prefix is fine, but this specific employee could not be found". The spec does not prevent you from informing that only the representation of that specific resource is not found. [1]
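
Roughly, as a sketch: the route pattern, the in-memory `employees` map, and the `error` labels below are all invented for illustration.

```javascript
// Hypothetical in-memory data and route pattern, for illustration only.
const employees = new Map([[1, { id: 1, name: "Ada" }]]);
const EMPLOYEE_ROUTE = /^\/api\/v1\/employees\/(\d+)$/;

// Both failure cases are a 404; the body says which part was not found.
function handleGet(path) {
  const match = EMPLOYEE_ROUTE.exec(path);
  if (!match) {
    return { status: 404, body: { error: "route_not_found", path } };
  }
  const id = Number(match[1]);
  const employee = employees.get(id);
  if (!employee) {
    return { status: 404, body: { error: "employee_not_found", employeeId: id } };
  }
  return { status: 200, body: employee };
}

console.log(handleGet("/api/v999/employees/1"));
console.log(handleGet("/api/v1/employees/100"));
```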

Using 200 for that is a bit of a cop out, you lose all of the semantics of the response and it's all pushed to out-of-band definitions (like your API's documentation). It could just as easily be argued in the article's example that every single 4xx error is just a 200 with a '{"result": false, ...}' body.

---

[1] - https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.4


I tend to agree with this perspective. It's not HTTP's responsibility to distinguish between infinite permutations of non-existing resources, only that the resource does or doesn't exist.

Returning a 200 is counter-productive because now any sort of logging, metric system, or middleware is going to be ineffective unless you write some custom code to parse out the body and decode whatever custom error resource the developer decides to invent. And in that case why are you even using HTTP as a transport protocol in the first place if you're going to violate 30+ years of precedent.


Exactly. And the same goes for all other status codes.


Boy. This is the best explanation of this belief I've come across yet. It's coherent and reasonable, even if I disagree.

My highest scoring StackOverflow answer [1] (and my most controversial) is on exactly this topic and _no one_ in any competing answer has given _any_ sensible explanation of their reasoning. I've genuinely never understood their thought process, despite trying really, *really* hard. This explains it a bit, even though I totally disagree.

There are other tools for helping differentiate different varieties of 404 or 401. There are status codes, there's _nothing_ preventing you from returning a payload with explanation or overriding the Reason Phrase. If your HTTP client is too fragile to handle a very common usage pattern for HTTP status codes then pick a different one.

---

[1] - https://stackoverflow.com/questions/11746894/what-is-the-pro...


I appreciate the kind words, thank you.

I've said it in other responses, but maybe I should have been specific that I was referring to HTTP RPC (explicitly not citing REST for a reason).

Perhaps a combination of both is best, but in my experience a nice clean break between these 2 layers at the protocol level helps clients simplify their handling of responses.

2xx: Go ahead and parse (another commenter pointed out Content-Type. That one always gets me)

4xx/5xx: Something bad happened and the request wasn't processed by the server, either because it blew up (5xx) or because you did something _very_ wrong (4xx)

The objective is clarity to the caller.

It's a little obtuse but I think it works well with that objective in mind.


If it's not REST then shouldn't the url be /api/v1/employees?id=1000 instead of encoding the employee id in the URL? In this case the path is the endpoint, so an employee id is a parameter and not a controller action.


> If it’s not REST then shouldn’t the url be /api/v1/employees?id=1000 instead of encoding the employee id in the URL?

The query portion of a URL is, in fact, part of the URL, so that is encoding the employee id in the URL, and REST doesn’t care about how URLs are constructed, in the first place (in fact, excessive concerns about out-of-band knowledge about how URLs are constructed mostly are a sign of not doing HATEOAS, in contravention of the REST architectural style.)


Yes I used the wrong word. I meant "instead of encoding the employee id in the path?"

And instead of "if its not rest" I meant (in the context of the article) "if it's HTTP RPC" because the author clarified that they were talking about RPC rather than REST.

Remote Procedure Calls typically identify "endpoints" such that a specification of the endpoints could be compiled into a host language's function-calls. The logical way to do that with HTTP is to have a path be the function, and HTTP parameters (whether url or posted) be the parameters of the function.

Using a URL that has both a resource ID encoded in the path and also query parameters is a mixture of REST and RPC. Since the example given in the article does not show parameters, I think it is reasonable to call it REST, as many other commenters here have inferred. The author claiming it is RPC doesn't jibe with my expectations for RPC.

Maybe I'm just complaining about calling it RPC. If it's neither RPC nor REST then it's just some random mishmash of HTTP-ish stuff, and I don't think that's a particularly strong argument for "things should be done like this".


I tend to split route params and query params by their function:

Query params tell me something _about_ how I want the data filtered, displayed, or grouped.

Route params _identify_ the object or collection I'm retrieving.


> Route params _identify_ the object

Well... in REST they do :-)

HTTP is designed to give access to a tree of files. The REST design says "let's fully embrace the idea that we are representing our data and program state as a virtual tree of files." You can embrace or reject that to various degrees, but the closer a design is to a virtual tree of files, the more "RESTful" I say it is.

In an RPC ("procedure call") paradigm, you have a collection of named procedures, each of which takes parameters/input and returns a result/output. Anything that isn't the procedure name is a parameter to the procedure. Some are mandatory parameters, and some are optional parameters, but in the most straightforward implementation of "HTTP RPC", the path indicates which procedure, and parameters to the procedure should be HTTP parameters (or the request body). Also note that in this case the HTTP status '404' would indicate the presence or absence of the RPC endpoint, like the author wanted, and the procedure would be expected to return 200 for a within-spec return value, or possibly 422 or 5xx to represent an application-level exception to be propagated to the caller.
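
A small sketch of that straightforward "HTTP RPC" shape. The `/rpc/getEmployee` path, the `applicationError` marker, and the result shape are assumptions for the example, not anything from the article.

```javascript
// Hypothetical procedure table: the path names the procedure, the query
// string carries its arguments.
const procedures = new Map([
  ["/rpc/getEmployee", ({ id }) => {
    if (!/^\d+$/.test(id ?? "")) {
      const err = new Error("id must be numeric");
      err.applicationError = true; // marks an exception meant for the caller
      throw err;
    }
    // "Not found" is a within-spec return value here, not an HTTP error.
    return id === "1" ? { found: true, employee: { id: 1 } } : { found: false };
  }],
]);

function dispatch(requestTarget) {
  const { pathname, searchParams } = new URL(requestTarget, "http://example.invalid");
  const procedure = procedures.get(pathname);
  // 404 only ever means "no such procedure".
  if (!procedure) return { status: 404, body: { error: "no such procedure" } };
  try {
    return { status: 200, body: procedure(Object.fromEntries(searchParams)) };
  } catch (err) {
    const status = err.applicationError ? 422 : 500;
    return { status, body: { error: err.message } };
  }
}

console.log(dispatch("/rpc/getEmployee?id=100"));
console.log(dispatch("/rpc/getEmployee?id=abc"));
console.log(dispatch("/rpc/noSuchProcedure"));
```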

There's nothing stopping us from mixing the paradigms, and in fact nobody is forcing anyone to use any paradigm at all, since we can literally do whatever we want with the HTTP path, uri, method, and headers. But when the original article claims to be doing RPC and not REST, I think that is just sowing confusion onto the topic.


I really enjoyed that SO thread, and your answer. In my opinion your take is correct, to the extent that it is fascinating that it is a controversial topic. The majority of people (and by people, likely limited to software developers) seem to not fully understand HTTP status codes. This is of course fine; the fascinating part is the extent to which they insist they do. Whenever the majority, in any technical field, opines something different, I assume they are right and I've missed something. But having designed APIs for years, I can safely say that it doesn't apply here.


Also, the OP article does explain well what could cause this misunderstanding. HTTP resource nouns shouldn't ever be "guessed". The resource either exists or it doesn't. And if it exists, it either has data, or it doesn't.

404: it doesn't exist.

204: it exists and has no data to parse.


Wow, 452 upvotes and 89 downvotes on your answer. Very controversial, indeed!

While I agree with you, I sometimes misuse 200 OK even for actual errors. Most times because whatever internal library I've used to do the requests has made it hard to parse the error data if it happens (it just throws a general exception instead when it encounters a 4XX or 5XX). Or sometimes because the request failing is somewhat expected, but the client library then insists on doing lots of error handling when it fails (logging, alarms, circuit breakers and whatnot), polluting everything.


TIL: You can view upvote/downvote count by clicking on the score. I only needed 12 years on SO to find this out...



I have like 25k points so I probably had this for a while now.


Looks like HN people are changing that as we speak


Reading this discussion on hn and stackoverflow, I feel sometimes software development is more philosophy than engineering.


Yes, software dev absolutely is closer to philosophy than engineering.

There is such a thing as software engineering, but that isn't what most of us do (nor is it what we should be doing, IMO).

NASA needs software engineers. Low-level critical systems need them, so the G-MAFIA has a few.

For most business purposes, though, the iterative exploratory "let's get a UI out there and see how it works" is exactly what we need.

It's a useful, cost-effective strategy when lives aren't on the line, and it isn't engineering.

(My current job title is "Software Alchemist")


Only if you let it. Developers tend to overthink things. Be careful where you spend your energy.

Because honestly the answer is, as long as you are consistent, it really doesn't matter if you use status codes or not.


I very much disagree with this. This is both semantically wrong and impractical at the same time.

You can and probably should provide a payload for further processing in some cases, but that's an orthogonal concern.

You are also writing an HTTP server, which provides resources, expects certain formats, provides certain formats, tells intermediate layers and clients about what to cache for how long, says whether a new resource has been created or _will be_ created. You might be redirecting or you might be telling a client that they need to retry a request because there has been a conflict. Your response might be processed by different layers, some of which only care about the status codes, while others will inspect the payload to make further decisions. The list goes on.

Status codes are there so you can do all of these things and more in a standardized fashion. Even just from a human consumption perspective it is useful to just look at the code and immediately have an intuition of what happened.

Also many of the status codes make no sense at all if an application server doesn't provide them. A reverse proxy doesn't know whether there has been a transactional issue on the application server, so it cannot say 409 without that information. A caching layer doesn't know about the domain model so it cannot say whether a specific resource can be cached forever.


I think the issue here is that I should have explicitly stated this was about HTTP RPC (I am steering clear of REST for a reason).

In the context of RPC, is it still impractical? It feels clearer to me, but I might just be continuing my own confusion now


> In the context of RPC, is it still impractical?

If you are defining a bespoke RPC protocol that incidentally uses some parts of HTTP without much concern about the spec but only concern with what existing infrastructure will do with requests, you have…lots of freedom to design things however you want.

OTOH, a lot of work has gone into handling almost every conceivable aspect of information interchange, within a REST style (because REST was both derived from the architecture of the Web and used in updating HTTP for HTTP/1.1) in the HTTP standards (there are some gaps still, like that addressed by the draft QUERY [0] method), why reinvent the wheel, unless you are specifically dealing with use cases squarely within the gaps?

[0] https://www.ietf.org/archive/id/draft-ietf-httpbis-safe-meth...


My comment makes sense if you lean on REST/HTTP and want to leverage the uniform communication and layered architecture. The more bespoke you get, the less you want to care about what I said above. I interpreted your post as general if that makes sense.

In any case, consistency is key. Personally I think it is easier to lean on standards in order to be consistent. Which is why I made some examples of status codes that can only be formed by an application server. I think "how do I avoid surprises, workarounds and inconsistencies" is a good question to make decisions around what we're discussing here.


This is a terrible idea for observability. It would be difficult or impossible with most observability platforms to make it parse the http response to determine error responses, but when my systems are sending out a ton of 422s or 400s that's a useful signal to me that something is going on. I'd highly recommend using status codes to indicate application level behavior.


I think I agree with the article, because I just got bitten by exactly this. My API was returning 404 as a fairly common response when a user makes a typo, but Twilio's observability code was treating that as a serious error it needed to alert me about.


> My API was returning 404 as a fairly common response when a user makes a typo, but Twilio's observability code was treating that as a serious error it needed to alert me about.

Then Twilio's observability code is broken. Requesting a non-existent resource is not a serious error if resources are requested by hand-constructed URLs.


This is one of those tricky philosophical points on web admin, because there are two sources of 404s:

- random clients on the Internet poking around at stuff, which you have no control over and causes no harm so should not alert

- an error in the links or code of the pages you ship to clients, which will cause a consistent degraded user experience and you should be notified about.

In practice, the best way to solve this I'm aware of is not via observation of server-emitted 404s at all... It's via a back-channel for clients to report errors (possibly authenticated, so randos on the web can't fudge your stats) and then tie alerting to that back channel. So you don't track 404s to /foo, but you do alert on your clients screaming at you that they got a 404 trying to access /foooo.

Of course, this solution requires your end-users to enable JavaScript.


You can monitor for the second case and react accordingly, you just don't monitor for them in the same fashion as 5xx class errors (which yes, despite what this article is about, I'm still using that term).

For 5xx errors you should typically monitor against a baseline of 0.

4xx class errors you want to monitor for substantial deviation from previous averages. This is a good indication if you broke something, vs client behaviour changed. Remember, monitoring isn't always going to "give you the answer", as much as alert you to a possible issue so you can investigate further.

More so, in a SaaS application, monitoring for deviations in 4xx rate by client is also helpful. An alert on a single client is likely not a system-level issue, but a deviation across all clients likely is.
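
A rough sketch of that alerting rule, assuming per-window counts are already collected somewhere. The function name, the 3x `factor`, and the way the baseline is computed are all made up for illustration, and it only checks for upward deviation:

```python
from statistics import mean


def should_alert(count_5xx, rate_4xx, past_4xx_rates, factor=3.0):
    """Return the reasons (if any) to alert for one monitoring window.

    count_5xx: number of 5xx responses in the window (baseline is 0).
    rate_4xx: fraction of responses in the window that were 4xx.
    past_4xx_rates: 4xx fractions observed in earlier windows.
    factor: hypothetical threshold; alert when the current rate is
            more than `factor` times the historical average.
    """
    reasons = []
    if count_5xx > 0:
        reasons.append("5xx above baseline of 0")
    if past_4xx_rates:
        baseline = mean(past_4xx_rates)
        if rate_4xx > baseline * factor:
            reasons.append("4xx rate deviates from previous average")
    return reasons
```

Running the same check once per client and once across all clients would give the per-client versus system-wide split described above.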


Wait - it's exactly the "random clients... poking around" that I'd want the alerts for! 404s occurring due to intentional but broken requests is fine - the caller will know about it and deal with it as necessary. But either way, it is a reasonable argument in favor of not using 404 in the case an endpoint was matched but the specific resource id/path was not. It's not entirely dissimilar to the distinction between "file not found" and "directory not found".


If you alert on 404s to URLs that nobody should expect to exist, you create a vector by which malicious actors can wake up your oncall staff at 3AM. That's probably not something you want to do to your oncall staff.


Depending on what sort of service you're running that might actually be justifiable, but I wasn't assuming a threshold breach for 404s would be waking people up in the middle of the night. At any rate, any such alert is always somewhat vulnerable to that problem, regardless of how the error codes are being triggered. Sounds more hypothetical than likely-in-practice.


Does that mean Twilio sends an alert every time some random webscraper tries to GET some favicon or /admin path that doesn't exist on the server? Doesn't that happen hundreds of times each day?


This is probably the biggest reason I like the author's approach: a lot of tools have assumptions about what, eg, a 404 means that might not match what it means as an application error. For example, my API was also returning 404s quite commonly as my frontend checked the existence of various records. As a result, my chrome console was flooded with red error alerts about failing requests (404s), even though each request had "succeeded" just fine.

In another case, I had a site that used http basic auth. An xhr api request returned an expected 403, which resulted in the browser suddenly concluding the basic auth (which was unrelated to the api call) was invalid and the user needed to be reprompted for credentials again.

Both of these could be argued as browser problems, but that's the point: browsers (and observability tools and many other things) think they know what a given http status means. Using http statuses for app-level errors often breaks that.


> An xhr api request returned an expected 403, which resulted in the browser suddenly concluding the basic auth (which was unrelated to the api call) was invalid and the user needed to be reprompted for credentials again.

That sounds like it was a mistake on the part of the api. The browser should only prompt for credentials if the response headers include a WWW-Authenticate header, but that header should only be included with a 401 response (at least according to MDN).

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/WW...
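
As a minimal sketch of that rule (the function name and the `realm` value are assumptions, not anything from the API being discussed), the challenge header goes with the 401 and is left off the 403:

```python
def auth_failure_response(authenticated):
    """Build (status, headers) for a rejected request.

    401: no valid credentials were presented, so send a challenge.
    403: the caller is known but not allowed, so send no challenge.
    """
    if not authenticated:
        # The realm string here is a placeholder.
        return 401, {"WWW-Authenticate": 'Basic realm="api"'}
    return 403, {}
```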


> It would be difficult or impossible with most observability platforms to make it parse the http response

Yea, because they've all been built on top of the same faulty assumption. This isn't a counter argument to the OP in any way and I'm kind of surprised you're not the only one bringing this up as some kind of gotcha.

Obviously platforms would need to handle and treat http and application layers differently. This is as much a given as the fact you'd have to update all your client code to wrap and unwrap the application layer on top of the http layer.

The observability of your application would actually increase as you'd be able to differentiate between an increase in application failure modes from an increase in http failure modes. That's actually really valuable.


> when my systems are sending out a ton of 422s or 400s that's a useful signal to me that something is going on

Why not ping the monitoring when you encounter the business error?

From that perspective, a stream of HTTP errors could indicate a more specific kind of infrastructure failure vs just clients sending garbage.

Happy to admit I'm wrong though, it's why I posted it here, see what others think


Yikes, no. Other way around in fact!

Observable errors mean that there is an actual error and triggers error handling to the user. By saying, that for example, a search for a user in a list returns no results is a 4XX you are implying that the path is an error when the response is in fact, valid and useful. More than that, you now have cross-wired a REAL 4XX error (for example the API URL was changed and no result will work to it) with this fun pretend 4XX concept that ends up confusing your error handling ('Endpoint cannot be reached with this URL.') with ('Your result has no entries.'). Finally, conceptually speaking, it's not a 404. The URL you requested was handled by the endpoint you intended to reach. That is the exact opposite of a 404 for example.

To me, conceptually it's just not an error in most cases and even when it is, it's likely an error on the other side.


Imagine that we didn't have a fancy dynamic system and we were still using files for web pages like the good old days. In that scenario, what would happen if someone accessed the non-existent `/api/v1/employees/100` path? It would return 404 Not Found. From that perspective, I think it's hard to justify that the original intentions of HTTP would want you to return a 200.


Indeed. The application contemplated by HTTP is hypertext. If we are being sticklers for the RFCs, modern applications implementing non-hypertext business logic via HTTP are abusing more than just response status codes.

But if we relax the “hypertext” presumption a bit and consider HTTP a protocol for the transfer of resources, surely an API can be implemented such that using response codes is sensible and useful. REST and SOAP for CRUD operations can still work just fine, and response codes can still make sense in the context of the API itself, even if the content is JSON describing business logic state.

It’s a messy world, so we have to be liberal in what we accept, but using status codes is not inherently abuse. Any ambiguity might just be a symptom of leaky abstractions in the API.


In this context you would also have to include discoverability of URLs.

In this philosophy you would not assume how employee URLs are named; you would instead request the list of valid employee URLs from a URL previously returned by `/api/v1/`.

In this way you only ever call URLs that the api provided you with, so that you are not expected to find out by trial and error whether `/api/v1/employees/100` exists or not; the api itself has already told you this and a 404 is the response to trying to build your own URLs by hand.


>non-existent `/api/v1/employees/100` path

But the path does exist. The employee record represented by that path doesn't exist.


No, the resource /api/v1/employees/100 does not exist.


Yes, and so we return 200 + the current representation of the resource, i.e. it does not exist.


And 404 is the appropriate response code for a resource that was not found at the path requested.


I guess in a pedantic sense, all possible paths “exist.”


No, paths that reach a server and get processed exist.


Define "processed"?

A request with a path that reaches a server, is looked up in the routing table or filesystem, resulting in no resources found, is expected to return "404 Not Found".

You can say the path "exists", but only as a label. All labels can exist. Only a subset of these are semantically valid resource locator paths, and only a further subset reference objects that exist.

Colloquially, https://news.ycombinator.com/this/path/does/not/exist (path "/this/path/does/not/exist") "does not exist" because we're talking about the requested resource.

Of course the path exists, I just wrote it and it's semantically valid.

That's not an interesting question though, and if API responses were this shallow -- just a "semantic correctness checker" -- then response codes would be useless.


>Define "processed"?

Why? We all know what it means.

>Colloquially, https://news.ycombinator.com/this/path/does/not/exist (path "/this/path/does/not/exist") "does not exist" because we're talking about the requested resource.

That path is 404 - Not found and returns "Unknown"

https://news.ycombinator.com/item?id=52081900

Returns 200 - No such item.

In 10 years when we have that many threads it will return content.

As I've said in several other places it depends what you want to tell the client. 404 if you want to tell them you didn't find the resource. 200 if you want to tell them the resource doesn't exist.


I did not mean to hold up HN as a measure of correctness, and I suggest that it is not a good idea to do so.

> 404 if you want to tell them you didn't find the resource. 200 if you want to tell them the resource doesn't exist.

404 is a client error, like all 4xx codes. The requester can correct their request. It does not indicate that the server ~"tried to find but may have overlooked" a resource, it means that the server can authoritatively state that the resource is not findable.

This is essentially, "does not exist at this time". Resources that do not exist today, might exist in the future, or they might not. This is true for both /users/1000 and /some/other/path.

I interpret the RFC to support my belief:

  6.5.4.  404 Not Found

   The 404 (Not Found) status code indicates that the origin server did
   not find a current representation for the target resource or is not
   willing to disclose that one exists.  A 404 status code does not
   indicate whether this lack of representation is temporary or
   permanent; the 410 (Gone) status code is preferred over 404 if the
   origin server knows, presumably through some configurable means, that
   the condition is likely to be permanent.
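
To make the 404 versus 410 distinction in that passage concrete, here is a minimal sketch. The `LIVE` and `DELETED` sets are hypothetical stand-ins for whatever bookkeeping a server happens to keep:

```python
LIVE = {"/users/1"}        # resources with a current representation
DELETED = {"/users/2"}     # tombstones: known to be permanently gone


def status_for(path):
    """Pick a status code for a GET of `path`."""
    if path in LIVE:
        return 200
    if path in DELETED:
        return 410  # server knows the absence is likely permanent
    return 404      # no current representation; says nothing about the future
```

Both `/users/1000` and `/some/other/path` land in the same 404 branch, which is the point being argued.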


>I interpret the RFC to support my belief:

Impressive given that it explicitly contradicts your point:

>the server can authoritatively state that the resource is not findable

>origin server did not find a current representation

Those two are not the same thing.


It does not contradict my point at all.

Those two phrases do not lexically parse in identical ways, but we should not interpret the RFC to be saying the alternative of "I really tried, but boy filesystems are hard and maybe the NFS mount disappeared, and well shucks, couldn't find it! 404!".


We should interpret the RFC as written which is that this particular server could not find the resource. That does not mean the resource is not findable by this server at a different path or by a different server at a different host.


I really do not understand your argument.

There is no defined equivalency between host1/one/path and host2/other/path, even if the resources returned could potentially be the same.

There is no retry formula for a 404 at one URL that might return 200 at another URL.

There is no defined meaning for the same resource at another location. HTTP 404 makes no attempt to comment on the uniqueness or commonness of a resource. No comment on whether that resource existed in the past, is expected to exist later, or in whose imagination it could conceivably exist in the future.

The HTTP response simply states whether the resource was found to exist at the specified location at the time of request. Any further conclusions drawn from a 404 are speculative.


What does "get processed" mean? Plenty of web server frameworks allow generic handlers or middleware that can handle any path passed to them.


That's a succinct definition, happy to admit that's a better argument than mine


It's the difference between "There is nothing handling this route at all" and "there is something handling this route but the object isn't found." To me the former is an HTTP 404 and the latter is an application "Not Found."

But I also very much don't like playing the "if you expose your app over HTTP you should assimilate HTTP semantics and do a fuzzy lossy map of your application to HTTP verbs and HTTP status codes." I leave the HTTP semantics to the front end web server and live on top.


> It's the difference between "There is nothing handling this route at all" and "there is something handling this route but the object isn't found."

'Nothing handling this route' has no meaning, because routes have no meaning in a hypertext application. Clients should not generally be constructing URLs by hand; they should be receiving them from the servers they communicate with.

In the example in the article, an application dealing with employees should not be constructing URLs by appending employee IDs to strings; rather, every reference to an employee in the application should be to a URL rather than an ID. So when it requests a list of employees, it receives the equivalent of {/api/v1/employees/1, /api/v1/employees/2 … /api/v1/employees/N} rather than {1, 2 … N}.

> I also very much don't like playing the "if you expose your app over HTTP you should assimilate HTTP semantics and do a fuzzy lossy map of your application to HTTP verbs and HTTP status codes."

If you are building a hypertext application, then you should build a hypertext application. It's completely possible. Off the top of my head, protocols such as ACME (used by Let's Encrypt) are good examples to follow.
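
A toy sketch of what "follow URLs, don't build them" looks like on the client side. The `RESOURCES` dict is a hypothetical in-memory stand-in for the server, and the names and payload shapes are invented:

```python
# Hypothetical in-memory stand-in for a server; keys are URLs.
RESOURCES = {
    "/api/v1/employees": {
        "employees": ["/api/v1/employees/1", "/api/v1/employees/2"],
    },
    "/api/v1/employees/1": {"name": "Ada"},
    "/api/v1/employees/2": {"name": "Grace"},
}


def get(url):
    """Return (status, body); 404 when nothing lives at `url`."""
    if url in RESOURCES:
        return 200, RESOURCES[url]
    return 404, {"error": "not found"}


def employee_names():
    """Follow the URLs the index hands out instead of appending IDs."""
    _, index = get("/api/v1/employees")
    names = []
    for url in index["employees"]:
        status, employee = get(url)
        if status == 200:  # a 404 here means it vanished since listing
            names.append(employee["name"])
    return names
```

A client written this way only ever sees a 404 when a URL it was given has since stopped resolving.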


Yep, there's absolutely no reason an application ought to be guessing URLs, and therefore no reason it should ever be requesting /api/v1/employees/69 where employee 69 doesn't exist. If it does, it's playing silly buggers exactly as much as if it were to request /api/v69/nice/. Any user resources it needs to access will have had URLs provided to the application by another page.


Example Code:

   users = http.client.get("https://yourapi.com/users")

   # Frobulate each user
   for user in users:
      user = http.client.post(user.frobulate_url)
      # While this is running somewhere on the other side of 
      # the world someone deletes one of the users. Oops.

Assuming users (i.e. developers) will just never mistype a URL, so you don't have to give useful feedback, is just being a bad netizen.


That's a fair example. And returning 404 (Not Found) or 410 (Gone) is the most useful kind of feedback. More explanation in the body of the 404 response is helpful too, of course.


> Clients should not generally be constructing URLs by hand

I'm not sure I get this, every API doc is like "go to /users for the users and here's the methods we support, the payloads, and responses" If someone mistypes it and tried to get "/user" I want to send a 404 to be like "there is nothing here."

> In the example in the article, an application dealing with employees.

This is all very nice when your domain maps nicely to objects. My litmus test for this is what would the semantics of mysql.net/query?db=mydb,q='select * from table;' look like.

* If the result is an empty record set should it return 404? Ew. I think it should be 200 with the response being `[]`.

* If you're not allowed to access a table should you get 403?


"Select *" is like find/ search - any API that could logically match multiple entities and return a list should stick to 200 in the case of an empty set. But for an API designed to return just the specific single resource requested there's a decent case for a 4xx status code if it doesn't exist (400/404/410/422 all being justifiable depending on your preferences).


Your example is just a filtered index, you would receive a 200 with no results just like any other index route with a filter applied.
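
The distinction drawn in this subthread, sketched with two hypothetical handlers over made-up data: a filtered index answers 200 even when empty, while a lookup of one specific resource answers 404 when it is missing.

```python
EMPLOYEES = {1: {"id": 1, "name": "Ada"}}  # hypothetical data


def search_employees(name):
    """Filtered index: an empty result set is still a successful answer."""
    matches = [e for e in EMPLOYEES.values() if e["name"] == name]
    return 200, matches


def get_employee(employee_id):
    """Single-resource lookup: a missing ID is reported as 404."""
    employee = EMPLOYEES.get(employee_id)
    if employee is None:
        return 404, {"error": "employee not found"}
    return 200, employee
```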


Isn't it enough that there are originalists for the constitution, do we now have to have "what did the founding fathers^H^H^H^H^H creators of HTTP mean"-people? That was 30 years ago. The web has evolved since then. What was a good solution then may not be one now. How people work with HTTP now and then is completely different.


I'll take "People who really need to understand the principle of Chesterton's fence" for $200, Alex...


Or, people who should be designing a protocol specific to their application, rather than abusing HTTP. Really, it's okay! You don't actually have to use HTTP for everything!


The most recent specification of HTTP semantics is about a month old.


I've definitely seen RFCs that specify HTTP status codes should be used to represent the actual application layer. The SCIM protocol, for example, specifies to return a 404 if a user cannot be found, not 200. That's definitely in the business domain, not the network domain.

One other small gripe I have with this approach is that it reduces the visibility of a failing request in the application layer, which is useful for debugging. It's not uncommon for one of my apps to issue many requests to the app server and if everything returns a 200, I have to look through the various payloads to find the request that actually failed.

I guess my question is: is it really abusing HTTP statuses if it works well overall? I rarely find myself wondering whether a request failed in the network layer or the application layer, I could probably count on one hand the number of times this scenario has bitten me in my career. I don't really see the point in throwing out something that works over semantics in an RFC.


In the original context of HTTP being a way to distribute interlinked documents, a web application backend sending a 404 for an invalid database identifier is equally as valid as sending a 404 for an unrouteable path. The HTTP spec does not support a distinction between the two cases.

If you absolutely needed separate status codes for the two cases, you probably could use 410 Gone or 501 Not Implemented for unrouteable paths and 404 Not Found for bad IDs.


Static web servers return 404 for resources that don't exist on the file system.

For example, https://news.ycombinator.com/s.gif -> 200

But https://news.ycombinator.com/t.gif -> 404

I don't see why /api/v1/employees/3.1415926 shouldn't return a 404 either.


Nothing prevents you from setting both a non-2xx API header, and still making the difference of "invalid path" vs "entity doesn't exist" clear in the response body.
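
Something like the following sketch, where both cases are a 404 and the body carries a machine-readable reason. The `code` field names and the routing logic are invented for the example, not taken from the article:

```python
import json

EMPLOYEES = {1: {"id": 1, "name": "Ada"}}  # hypothetical data


def handle(path):
    """Return (status, json_body) for a GET of `path`."""
    parts = path.strip("/").split("/")
    # Anything that is not /api/v1/employees/<id> is an unknown route.
    if len(parts) != 4 or parts[:3] != ["api", "v1", "employees"]:
        return 404, json.dumps({"code": "unknown_route", "path": path})
    raw_id = parts[3]
    # The route matched, but no such employee exists.
    if not raw_id.isdecimal() or int(raw_id) not in EMPLOYEES:
        return 404, json.dumps({"code": "employee_not_found", "id": raw_id})
    return 200, json.dumps(EMPLOYEES[int(raw_id)])
```

Monitoring and proxies still see a plain 404, while a client that cares can branch on `code` instead of matching on human-readable message text.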


In regards to interoperability it does matter I think, if it's non-standard then it won't "just work" in an application that's expecting the standard. Practically though, APIs are a wild west, and almost no-one implements status codes right. So the idea of standardized interoperability was out the window a long time ago.


> In regards to interoperability it does matter I think, if it's non-standard then it won't "just work" in an application that's expecting the standard.

The standard is to return a 404 for a resource that is not found. The standard is not to return a 200.

An application which expects a 200 for a not-found resource is not expecting standard behaviour, and is dead wrong.


I never suggested anyone send or expect a 200 for a resource not found, I suggested following the standards, so we're in agreement.


"Practically though, APIs are a wild west, and almost no-one implements status codes right."

This is the conclusion I've come to. What the RFCs say the status codes mean doesn't matter anymore. There isn't anywhere near enough consensus on what they mean or how to treat them to base a system off of in the abstract.

What can specifically matter is how systems in the concrete will react to them. Do you have something that needs to be CDN'd? Then you'd better return status codes that make that work. Is it going to be proxied? Then you'd better return status codes that make that work. Is it never going to be either of those things and only directly accessed by internal users? Then use status codes in a way that works for them. Do you have a monitoring system? Use status codes in a way that works for those systems.

Do you have conflicting requirements as a result of those things? Bummer. Sucks to be you. Best get to figuring out what to do about that, but one thing I can tell you is that carefully consulting the RFCs about what status codes "mean" won't be much of a help in such a situation. Maybe a little. But not much.

To the extent you want to reply with a "but what about...", I said, if it works, use it. While it is the wild west, there are certainly some patterns. 200 vs. "a non 200" response certainly has patterns of meaning to it, and if playing into those patterns works for you, great! Do it. But if I encounter a case where it doesn't, I cry precisely zero tears and do what does work.

The time to be moralistic and prissy about what status codes "really mean", if it ever really existed (to be honest, all there really ever was was "return one of the 5 or 6 codes the browsers understand when appropriate, and the search engines will follow the browsers"), is long gone. You can be wistful about how nice it could theoretically be if we just all agreed on what they all mean, but that's just not going to happen in a world where I can barely get people to agree that numbers shouldn't be strings in JSON, let alone precise details about what exactly a hundred error codes mean across such a diversity of systems and users.

I could even argue they are a failed element of the protocol. Such an argument would center around the futility of trying to represent such a rich variety of possibilities, along with such a rapidly changing variety of possibilities, into a single number. It's hopeless. You can't enumerate all the possible successes and failures of such a broad protocol like this, and it didn't help to try. It's just another instance of the "shared ontologies are fundamentally unscalable" problem.


I think there are some standards fans whipping out the downvotes, but I think jerf and I are both very aware that this is not how it SHOULD be. But it is how it is. I built an API scraper years ago now, and I gave up, APIs are not standardized, they're not consistent. I dreamed of an interoperable library of APIs you could pipe around to create new applications. It's too much work to manage all the differences in how people have built their APIs. But as jerf pointed out, there can be myriad reasons standards end up playing second fiddle.


I think the HTTP standard should be a bit less detailed than it tries and fails to be, but I do think at least OK, Not Found, Unauthorized, and a couple of others are good. Basic proxy control also would be good in general, although those should be moved into headers. (Which they partially are. But I think they should maybe be moved all the way instead of being spread out the way they are. This is, of course, in some hypothetical perfect world where I can just rewrite the standards without having to worry about any existing code, which is not the world either I or the current standards authors live in.)

But if you sit down and look at the RFC, and really try to understand what quite a lot of the codes are, they're just useless. Not necessarily to everyone, but they're useless to me, because many of them clearly involve some sort of not-universally-shared context and probably mean something very, very precise to somebody, but not me, not my API users, not the writers of the APIs I consume, etc.

Plus, you know, ultimately those codes were for documents, not for APIs anyhow, and that shows too, in some basic missing stuff for API calls.

I'd love it if this wasn't the case, absolutely. But it is, and yelling at people to "conform to this RFC harder" is just a waste of time on multiple levels.


in that vein: status is pretty useful, seems like a good solution is using two separate non-200 status codes to differentiate between employee not found vs typical 404 no such endpoint/url/path.


unfortunately HTTP doesn't have good status codes to make the distinction


If you’re a public API, please don’t do this.

I’ve personally integrated with almost 50 different SaaS APIs in the last 2 years.

The worst ones to work with were the ones returning errors in 200.

I don’t want to parse and write switch statements for your strings to understand whether it’s authentication, authorization, not found or any other error.

Now I have logic tied to strings you return and I can’t wait until someone decides to change the error messages they return. People rarely assume there’s an API contract in error messages.


> The worst ones to work with were the ones returning errors in 200.

I've had similar experiences.

Status codes are logical, machine-oriented things. At the very least, 4xx should denote client errors (that implicitly should not be retried without change) and 5xx should denote server errors (where it could make sense for the client to retry).

I've integrated with APIs where 2xx can denote an error, so clients had to parse the body or headers of the response to determine the real status some other 'inventive' way. Please, don't do this.
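
For what it's worth, this is roughly all the logic a client needs when the status class can be trusted. It is a sketch with made-up action names, and real clients usually special-case a few codes on top of it:

```python
def classify(status):
    """Map an HTTP status code to a coarse client action."""
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 500:
        return "fix_request"  # client error: do not retry unchanged
    if 500 <= status < 600:
        return "retry"        # server error: retrying may help
    return "other"            # 1xx/3xx are left to the HTTP library
```

Once a 2xx can also mean failure, none of these branches is reliable without parsing the body first.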


> People rarely assume there’s an API contract in error messages

But then what's the point of an API contract if it's not describing the returned data? What I'm arguing is the opinionated payload provides a lot of the same value. Or am I missing something?

The monitoring systems would naturally not be stoked to see 2xx codes containing errors, but I'd just ping the monitoring system out of my application server anyways. Not sure if that's better or worse though.

I've done like little to no platform engineering and I may be gravely underestimating the consequences of doing this, but it works well with prometheus.

Perhaps I am a fool though :) but how else would I find out if I didn't put my ideas out there :D


Not who you're replying to, but I have input.

  But then what's the point of an API contract if it's not describing the returned
  data? What I'm arguing is the opinionated payload provides a lot of the same 
  value. Or am I missing something?
I guarantee you that you have not actually contracted your error messages. You have a typo, you'll change the phrasing to better describe the problem, you'll need to add more info for ancillary problems, etc. I doubt you'll bump the rev on your API when "The resource you requested was not found" gets tweaked to "We could not find anything by that ID!" in the name of 'friendliness', but that's what you would need to do for me, as a client, for me to not have to ship a hotfix because you changed your error contract and now my users are seeing "Something went wrong, our engineers are looking at it" instead of my nice branded "404 - Not Found" page.

  You try employee 1, fantastic, it works!
  You try employee 100, not fantastic, it 404’d.
  Huh?
  Why do I get a 404 here? The path is clearly correct, otherwise employee 
  1 wouldn’t have worked either.
  “Ah”, you may be thinking “but it clearly means that the employee wasn’t found!”
  No, there’s nothing clear about that. If I were to call /api/v11/employees/1 
  I would get the exact same error. As an API consumer, all I want to do here 
  is raise my middle finger.
  But as an API producer, this results in a conundrum: What am I supposed to do then?
To start off with, stop worrying about your clients fat-fingering your API namespace. That is not your concern. You don't give a shit if they spend a _week_ hammering the wrong domain and getting 404s, why the hell would you care about them hitting a non-existent v11? You don't do anything beyond direct this confused user to your docs.

If for _some reason_ you want to give more info for requests to `/api/v11/*` or whatever other non-paths you want to handle, just serve a 400 response back and let your consumers figure out what they screwed up; but I argue you have more important things to do with your time.

Also, why are you worried about differentiating your server errors from client network failure? That's for the client app developer to handle. Don't worry about it. Your obligation begins and ends with a connection to your service.

  Opinionated payloads should be mandatory
  Returning a 2xx code immediately tells the client that the HTTP response
  contains a payload that they can parse to determine the outcome of the 
  business/domain request. That is to say
    - client checks HTTP response is valid (2xx status)
    - client can confidently parse the response and make a domain oriented
      decision, as opposed to a technical one
  This makes your client happy. Very, very happy. Using our above examples, 
  here is what we would see:
If I was reviewing an API for an integration and I ran across this blog post and/or descriptions of this behavior in the API docs, your service would go straight into the "won't integrate with" pile. I'm simply not interested in the problems this paradigm will generate for us. I've gone down this road many times, these days it's use the HTTP Spec or GTFO.

  My API is clean, easy to understand and easy to debug. A client no longer
  needs to send me a request to ask for clarity on an endpoint that sometimes
  returns a 200 and other times returns a 404.
Ah, there's the nut. Instead of adding descriptive error bodies to your 404 responses you threw the paradigm out the window and added descriptive error bodies to 200 Success responses. If you're not providing API packages for your users to hide this unexpected behavior, they are not "very, very happy". No one is "very, very happy" as they add yet more Magic Strings with which to infer what their remote resource means when it says "200 Success: Failure"


i mean they should at least have a sub error code right? instead of strings. that sounds insane


I don't want to be too hard on the author here but this article is really misguided. Status codes are a very useful tool for observability and error handling, and abusing HTTP 200 is a quick path to making your life difficult. Response bodies should disambiguate some of the different cases for the error (this path is wrong vs this employee doesn't exist).

In general, applications using REST should follow these semantics.

2XX - Request was understood and we found what you were looking for

4XX - Something was wrong with the request on the client side. The client should take some action before retrying.

5XX - Something was wrong with the server and you can retry this request sometime later and it may work.
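
A minimal sketch of how a client might act on those three classes without looking at the body at all. The function name and the action labels are made up for illustration; they are not from any particular library.

```python
def classify(status: int) -> str:
    """Map an HTTP status code to a coarse client-side action."""
    if 200 <= status < 300:
        return "ok"            # request understood, use the response
    if 400 <= status < 500:
        return "fix_request"   # client must change something before retrying
    if 500 <= status < 600:
        return "retry_later"   # server-side problem; back off and retry
    return "other"             # 1xx/3xx fall outside the three classes above


for code in (200, 404, 503):
    print(code, classify(code))
```

If every failure comes back as a 200, this whole branch collapses into "ok" and the decision has to move into body parsing.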


I think the interesting part is that:

2XX - Request was understood and we found what you were looking for

can be broken down into two parts:

1. Request was understood

2. We found what you were looking for.

This guy seems to be advocating that completing a search and there being no results is actually a successful request, and so should be responded to with 200.

The problem though is that the spec (RFC7231) requires a representation of the resource to be sent in response to a GET.

If there's no payload, then you can send a 204 instead, but this raises its own challenges - a 204 just means that the server is not sending content, not that no content was found (which is a subset of 'not sending content').

I think on balance, 404 is correct - the spec indicates that it should be used where there's no current representation, which I think is an accurate description of a failed search.


Completing a "search" SHOULD return a 200 with an empty results set. But a search is "/api/employees?name=Bob", not /api/employees/1199. The former is an endpoint that exists but was unable to find data: it should return the correct data structure normally returned for searches, but with no results. The latter is a direct link to a particular resource, which should 404 if it doesn't exist (as if any other file at a particular path doesn't exist).
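
Roughly, the distinction could look like this. It is only a sketch: the in-memory `EMPLOYEES` dict, the function names, and the `results` key are assumptions for illustration, and each function returns a `(status, body)` pair instead of talking to a real web framework.

```python
import json

# Hypothetical in-memory data standing in for a real employee store.
EMPLOYEES = {1: {"id": 1, "name": "Alice"}}


def get_employee(employee_id: int) -> tuple[int, str]:
    """Direct resource lookup, e.g. /api/employees/1199."""
    employee = EMPLOYEES.get(employee_id)
    if employee is None:
        # The URL names a resource that does not exist.
        return 404, json.dumps({"error": f"No employee with ID {employee_id}"})
    return 200, json.dumps(employee)


def search_employees(name: str) -> tuple[int, str]:
    """Search, e.g. /api/employees?name=Bob."""
    matches = [e for e in EMPLOYEES.values() if e["name"] == name]
    # The search itself succeeded, so the usual structure comes back,
    # just with an empty result list.
    return 200, json.dumps({"results": matches})
```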


> Status codes are a very useful tool for observability and error handling

Adding to this, it is far easier for clients to parse response codes than bodies. TFA makes the following claim:

> Returning a 2xx code immediately tells the client that the HTTP response contains a payload that they can parse to determine the outcome of the business/domain request.

Which is wrong. Response codes say nothing about the body.

Parsing the response body requires much more work: determining the format of the response received by the client, parsing it, validating it, handling edge cases, and so forth.

Further, it is possible, however unlikely, that the client does not accept application/json. Now what? Now there is yet another problem you have to solve.

And don’t forget the server has to generate all this as well.

All of this could have been avoided simply by returning a simple 404.


> Which is wrong. Response codes say nothing about the body.

Actually, that's not correct. A response code can say a fair bit about the body, especially about whether it exists or not, but in some cases also the nature of the body. For example, a 203 says that the body of this request is not the same as the body sent by the origin server. A 204 tells the user agent that there's no body. 400s and 500s SHOULD have a body which includes an explanation of the error.


> 400s and 500s SHOULD have a body

I agree about 4xx bodies, generally. They indicate that the request was malformed or unsuccessful in some way. The requester can fix this problem.

I think a 5xx should always represent an error on the server side that the requester cannot fix. It is always a bug or system failure to return a 5xx, which should be fixed by the API operator. Therefore no details are required or desirable to return to the requester.


That may be your opinion, but the spec is clear:

6.6

[...] Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation[...].


"SHOULD" has a very specific meaning in RFCs. Contrast to "MUST".

Unless the user can fix the problem, and you want them to try, an explanation of a 5xx error adds no value. Again, I don't think 5xx is ever a valid response from an API -- it's always a server-side error (code or infrastructure) that needs fixing, often with high priority.

In many environments (e.g. medical, financial, or other secure/private), providing details on server errors is strongly discouraged. Even when the data isn't sensitive, I think you're asking for trouble by broadcasting server-side problem details to users.


Fair enough. You've convinced me on this - especially on the 5xx I think it's a very good point that a description of the error could potentially have undesirable consequences.

I think my original point stands though, that status codes do tell you about the body in some circumstances. Or SHOULD tell you something about the body at least :)


SHOULD is not MUST [0], and “The user can do nothing about the problem and is likely to experience negative utility from any attempt to provide more detail than what the status code provides” is arguably a valid reason not to send a body with 5xx status responses.

[0] https://www.rfc-editor.org/rfc/rfc2119.html


Per my response to OP, I've been convinced on this.


The spec may say one thing in this context, but that is only guidance. Technically, the response header says nothing about the body: they are entirely separate things. You can have any response code, and a body may or may not be present, regardless of the spec.

So yes, it is correct to say that response codes say nothing about the body.


In the case of a missing user ID, if a more verbose response is desirable, why not put the error message in the body of the response?

404: Route not found vs 404: User ID not found seems like it resolves all ambiguity, no?


Straight answer: HTTP 2 and 3 killed that (they only send codes, not an arbitrary message)


In the headers, yes. But you can still put whatever you want in the response body alongside that 404 code in the header


They can still send bodies, which is where GP said the message should be.


I disagree that returning a 404 for `/api/v1/employees/100` is wrong. If `/api/v1/employees/100` is the resource that is being requested, yet the record doesn't exist, then the resource doesn't exist. Much like `/some_photo.jpg` not existing would return 404 if missing too.

What if we changed `/some_photo.jpg` to `/photo?id=some_photo` and it didn't exist? 404? Okay, now what if we change `/photo?id=some_photo` to `/photo?id=100`? What if we then change that to `/photo/100`? At which point does it no longer become okay for the request to be tied to the resource?

`/api/v11/employees/1` may be 404, and `/api/v1/employees/100` may be 404 too, because neither of them is found. If anything, the problem is that HTTP status codes are limited and haven't really kept up with technology. We have a few additional codes, like with Cloudflare, but for the most part, there is no community project or standard for expanding HTTP Status Codes.

Perhaps there should be.


Many of your examples are changing the query string, not the resource path. That's an important designation in REST semantics. REST APIs shouldn't use query strings for accessing singular records with IDs, that's what direct resource paths are for. Anything accessed with a query string should be treated as a search.

/api/v1/employees/100 - direct path to a resource. 404 if it doesn't exist.

/some_photo.jpg - direct path to a file/resource. 404 if it doesn't exist.

/photo/100 - direct path to resource: photo ID 100. 404 if it doesn't exist.

/photo?id=100 - search for a photo with ID "100". Return a code 200 with an empty results set if that ID is not found. (With optional application-specific error feedback in the response body.) 404 if /photo doesn't exist at all.


The spec says:

>The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

So when you say:

>yet the record doesn't exist, then the resource doesn't exist.

So you do have a current representation for the target resource. Namely, that it doesn't exist.


> So you do have a current representation for the target resource.

No.

> Namely, that it doesn’t exist.

That isn’t a representation; it is a fact about the universe (not the resource, because existence is not a predicate) which is inconsistent with there being a representation. A resource must exist to have a representation.


Yes, and sending a 404 status code is the correct and succinct way of expressing that fact.


404 says you didn't find it. Not that it doesn't exist.


> 404 says you didn’t find it. Not that it doesn’t exist.

Since a resource that does not exist cannot have any representation, it is therefore impossible for a representation to be found of such a resource. In fact, other than in the case where 421 is the correct response (“don’t ask me, I’m not responsible for that URL”) or where 406 is a more specific response than 404 (“I have the resource requested, but can’t produce a representation in the format you have requested, and don’t want to waste both of our time sending you a format you may not be able to use”), the only reason for a 404 is that the resource does not exist.


Exactly, my thought as well. Particularly, from QA perspective (automation or performance testing), testers often assume `200` means everything is fine; let's move on. But in reality, it is not.


API calls and document requests are very different. The latter are primarily interpreted by CDNs and browsers and need to work for them; the former are called by code and handled by code.


For me it boils down to this:

If you ask for an employee that doesn't exist, is it a failure? Or is it just a negative, but expected response?

Perhaps that's a stupid question though.


If you write an API designed to return a single entity matching the path provided, it's absolutely an error if it doesn't exist. If your API returns the simplest possible JSON representation in a success case, what JSON should it return if it doesn't exist?


Nah, this guy is wrong. The reason that he's wrong is that URLs represent resources, such as, in his example API, employee data. If you submit a request for a resource that doesn't exist, then 404 is the correct response. His problem is in thinking that HTTP is just a transport layer for arbitrary applications to use for whatever they want. It's not; it's a framework for particular types of applications, using REST and HATEOAS. He's trying to write some kind of RPC application using REST syntax, but ignoring REST semantics.


This.

REST isn't RPC lite. If your mental model is that an HTTP GET is a function call, you think of 4xx's as exceptions, but that is not the correct mental model.


REST is not HTTP! I didn't mention REST for a reason :)

But that's a good point about the transport layer.


> REST is not HTTP!

REST over HTTP is HTTP used according to spec, because the REST architectural style involves using the underlying protocol(s) per spec, but only the subset of such protocols that corresponds to REST semantics (but in HTTP’s case, since HTTP/1.1 is itself designed around REST principles and HTTP/2 and HTTP/3 maintain HTTP/1.1 semantics, that’s pretty much the whole spec.)


This misses the point of HTTP for things like CDNs, caching, etc. Looks like an anti-pattern.


So as a client, being able to check whether I likely have a usable result without parsing out the body has no value whatsoever? I'm not sold.

I think we need to be using HTTP error codes better at the API layer, not ditching them entirely. Building on this example, for an invalid user, how about an HTTP 204 and an empty response body? That gives us a clear "no such user exists" without having to parse JSON out first.

EDIT: for the record, I think that on some level, REST APIs themselves are an abuse of HTTP. JSON didn't even exist for much of the public Internet's first decade, and certainly the early work on HTTP never mentioned such a thing as a REST API. We're already using HTTP for so many things in so many ways that this is somewhat of a Lilliputian discussion.


The whole argument falls apart for me when it's acceptable to 404 when an api version isn't found, while it's not acceptable to 404 when an employee isn't found.

Why would we handle not finding an employee differently from not finding an API version?

If we're saying employees/100 should return 200 because it "could" exist, but doesn't at the moment, then why would we ever return anything other than 200? Any route "could" come to exist at a future point.


I am ambivalent about 200 for employee not found, but that is clearly a different category of error compared to an api version not existing. for employee not found, the error is purely in data: there is no responsive data, but the request itself was made to an existing endpoint/url/route correctly. no api version existing is an improperly made request: whether or not there is data cannot be determined as the request itself was problematic, so we never even got there.


The endpoint you called doesn’t exist. 404.

If you want to do something the author suggests, just abandon REST pathing and do a service call like api/v1/getemployee?id=100

Then the endpoint api/v1/getemployee exists, there’s just no results for your query.


The query is part of the URL used, along with the path, for specifying the primary resource. The part of a URL representing a secondary resource and whose representation is dependent on the result of a retrieval action on the primary resource is the fragment, not the query; see RFC 3986 (query) https://www.rfc-editor.org/rfc/rfc3986.html#section-3.4 (fragment) https://www.rfc-editor.org/rfc/rfc3986.html#section-3.5
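
To make the split concrete, here is a small sketch using Python's standard `urllib.parse`. The URL is a made-up example; the point is that the path and query together form what the client sends as the request target, while the fragment stays on the client side.

```python
from urllib.parse import urlsplit

# Example URL; the host and the fragment name are placeholders.
parts = urlsplit("https://example.com/api/v1/getemployee?id=100#contact")

# Path plus query is what identifies the primary resource in the request.
request_target = parts.path + ("?" + parts.query if parts.query else "")

print(request_target)   # /api/v1/getemployee?id=100
print(parts.fragment)   # contact
```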


The immediate rebuttal to this whole article is that you can have custom response bodies on any status code, not just when the status code is 200, so you can satisfy both kinds of users by having `/api/v1/employees/100` with `404` as the status code and `{ "success": false, "reason": "No employee with ID 100 exists" }` while `/api/v11/employees/1` can fail with `404` as the status code and `{ "success": false, "reason": "No such path: /api/v11" }`

Now for those who are exploring your API with curl or are reading logs from their service they can see the human-legible information that may help them debug the error, while their application logic can continue to use `2xx` for success, `4xx` for my own failure and `5xx` for a server failure that I should retry after some back-off time, allowing it to do the most sensible thing it can in the situation, rather than explode in an uncontrolled fashion when an endpoint the developers only got successful 200 responses from during development suddenly encounters a 200 response with an error payload.
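
A rough sketch of what that could look like on the server side. The route pattern, the `EMPLOYEES` dict, and `handle_get` are hypothetical names for illustration, and the "no such path" message here echoes the full requested path, which is just one possible choice.

```python
import json
import re

# Hypothetical data and route pattern, for illustration only.
EMPLOYEES = {1: {"id": 1, "name": "Alice"}}
EMPLOYEE_ROUTE = re.compile(r"^/api/v1/employees/(\d+)$")


def handle_get(path: str) -> tuple[int, dict]:
    """Return (status, JSON-serialisable body) for a GET request."""
    match = EMPLOYEE_ROUTE.match(path)
    if match is None:
        # Unknown route: still a 404, but the body says why.
        return 404, {"success": False, "reason": f"No such path: {path}"}
    employee_id = int(match.group(1))
    employee = EMPLOYEES.get(employee_id)
    if employee is None:
        # Known route, missing record: same status, different reason.
        return 404, {"success": False,
                     "reason": f"No employee with ID {employee_id} exists"}
    return 200, {"success": True, "employee": employee}


for path in ("/api/v1/employees/1", "/api/v1/employees/100", "/api/v11/employees/1"):
    status, body = handle_get(path)
    print(status, json.dumps(body))
```

Application logic can branch on the status alone, and the `reason` string is there for humans reading logs or curl output.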


> The immediate rebuttal to this whole article is that you can have custom response bodies on any status code

Not according to the spec, though you can for most status codes. (Consider, for example, 205.)


Hard pass - this is something that may be narrowly/semantically correct, but I’m not going to decode an HTTP response to figure out what business logic should happen after a bad lookup.

I’ll also point to other status codes like 400 Bad Request or 401 Unauthorized which give a clear reason for why something didn’t work as expected. Under the author’s framework all 4xx responses should be 200.


Honestly, if you're abusing it that way I'll probably be using error 400 for all semantically-invalid requests. This is also compliant with the definition.

> The 400 (Bad Request) status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).


I don't think there's a correct answer, this is just an opinion I stumbled into this morning.

The objective mostly was to try help form a way of reasoning about HTTP APIs from the consumer's perspective

> error 400 for all semantically-invalid requests

Counter point: Passing `a` to the example endpoint is not semantically-invalid from the perspective of HTTP, it's a perfectly valid URL. Nothing about that request is invalid, it's only invalid to the business layer, which has nothing to do with the HTTP layer. Put another way: The server _is_ processing the request, that's how it sends back the "employee not found" record. I guess it depends where you want to draw the boundary of the server.

Which is not to say you are wrong if you return a 400, but I suspect you might get a couple queries about it from consumers.

The objective was mostly about clarity. Maybe I'm being over-simplistic, but I like the idea of using protocol errors for protocol problems and domain errors for domain problems. Just keeps client-side logic cleaner in my experience


I personally disagree (but I agree that there might not be a "correct" answer - unless the IETF stepped in and defined exactly what a bad request is). If the server knows that "a" is not valid, then from the server's perspective it's an invalid request - therefore it's semantically-invalid for that server (and a minor nitpick - "a" is invalid in almost all servers, it should be "/a"). Client errors need not come from the user agent - they encompass even user errors.


That's a very good point.

It's also why I raised the issue of where your "server boundary" is. I dunno how else to phrase it, but what I mean is the separation of domain logic and technical logic.

In my understanding, 400 denotes an HTTP request that doesn't comply with the RFC (missing headers, bad format etc) but I also don't disagree with what you're saying, because the user did send a bad request.

Of course, you could always do both :) Send the opinionated payload with a 400 error.

As long as your consumer can reason about what went wrong, then I guess that's in line with the objective I was trying to get across.


Fair enough (and that's why I edited the post to state that the document isn't clear about the intent of code 400). Just don't send the server's default 4xx error, that'll confuse everyone about where it went wrong.


> I don't think there's a correct answer

I think there is a correct answer: use the 404 to indicate that a requested resource does not exist, and use a 200 to return a requested resource that does exist; and use URLs to represent resources.

In other words, follow the HTTP RFCs.

And never, ever, ever EVER use a 200-series status code to return an error.


>Which is not to say you are wrong if you return a 400

5XX - Server did something wrong

4XX - Client system did something wrong

2XX - Everything worked

IMHO, what's missing is a code for "All systems worked but the user did something wrong" for scenarios like looking up a non existent ID or if their CC details don't work.


I've always interpreted 422 to fit this case: a well-formed request that cannot be processed due to the content in the payload.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422


Unfortunately, literal reading of 422 implies that the sent content has error(s) and not the header. It's close, but I still feel that for GET requests it's wrong.


> This approach throws ambiguity straight out of the window.

(+) This approach reduces ambiguity between failed routes and missing resources

(-) This approach adds ambiguity by making missing resources look like they exist


IMO it does not matter much. You'll write some middleware around API anyway. Handling HTTP codes or handling JSON fields - the only thing that I wish is consistency, so I can handle it all in one place.

But what does matter is expectations. People naturally consume lots of HTTP APIs everywhere. And doing things the way everyone else does is important.

Everyone's using HTTP codes for their API. That's reality. So I think that it's better to stick to this format until we go full circle and get back to SOAP 3.0 or something.

If you asked me, I'd prefer a distinct set of statuses for APIs, so nginx will never return them, no matter what (mis)configuration happened. Right now if I misconfigure my reverse proxy, it'll return 404 for everything. This should be a gateway error, but it's not, and some clients will happily carry on thinking that everything is not found, which is not true. So something like 1200/1404/whatever for APIs and 200/404 for other things. And the author's approach will not help, because it's equally easy to misconfigure nginx to return 200 with some garbage SPA HTML for every request. Well, at least there's hope that failing to parse it as JSON will signal to the client that something's not quite right.


404 means "this URL doesn't point to a known resource".

HTTP doesn't and shouldn't make a distinction between "this URL maps to a known route on my API backend but the ID doesn't exist on the database" and "this URL is not mapped on my API backend", this is leaking implementation details to the client.

This not "abusing" status codes, this is the expected behaviour for an HTTP server.


I fundamentally disagree with this as being the correct code. An HTTP GET request means "give me this resource", not "does this resource exist"; however, it is all about context.

It is your API, it is your decision to decide if you have looked for the resource and were unable to find it, so return an error, or you have looked for the resource and found a placeholder that says there could be a resource here, but currently isn't and successfully return that placeholder.

To suggest people are wrong or abusing the protocol for not choosing to implement placeholders in their application is incorrect. To suggest people may benefit from implementing a placeholder instead of a 404 error is useful.

My personal experience is that the error status code here is much more useful than the message. It's a big red/orange flag that says don't do what you normally do with this resource. But if your API lends itself particularly to being queried for things that often don't exist then those red flags become noise that mask other things you really do want to flag as errors.


It really doesn’t matter how the API behaves unless there are external users.

When you need to convince other people to use your API then those people will bring their own expectations of how the API behaves and then you need to manage those expectations. If your API is esoteric in its design choices other people might find it unexpected, curse and move on to do something else.


The suggestion at the end looks inconsistent:

  /api/v1/employees/100 -> StatusCode: 200

vs

  /api/v11/employees/1 -> StatusCode: 404

Should these both be "$thing not found" errors and indicated in the same way? The first example is "employee not found" and the 2nd is "API version not found" :)


This reminds me of the “HTML should be semantic” and “CSS class names should be semantic” crowd. People aren’t robots, and we can’t/don’t write crystalline code, nor arguably should we. The status codes are extraordinarily convenient, both to set and to read, and the ship has sailed already on whether to use them this way.


I think this is contrary to what most would expect, and therefore "wrong" or at best annoying to work with.

You can still include a payload with a 404 describing what is missing or why it wasn't found.

If I see a 2xx status code, I assume the request succeeded. OP's way forces an additional check, more code, more potential for bugs.


Opening with "You're using X wrong" or "You're doing X wrong" is a good way to get a reader to ctrl-w and move on. In my experience it is more often than not followed by questionable advice or incorrect assumptions, so my instinct is to just go "ah, one of these" and tab away.


Many moons ago I worked with a system like this. An XML based API that returned 200 because "the server and/or service isn't broken" but the body would contain errors.

I understand what the owner of the API is trying to convey. "My service is up, and the URL itself exists" (status 200), but in the body "the data you requested doesn't exist" (result: true, error: "doesn't exist"). If you are stubborn enough, you can even think you're right.

But by Odin's Beard this was one of the hardest APIs I had to work with. It doesn't help that most HTTP clients assume a sane* API expecting error codes based on the resources.

Note: I say sane, but within this context, this is of course subjective. For me, error codes corresponding to the resources is sane, but to OP this is clearly not the case.


Not only that, but you're using nonstandard ports for your WebSocket connection (wss://blog.slimjim.xyz:1313/livereload)

Since this makes my firewall dialog pop every couple of seconds after I close it, I prefer to close the page without reading it.

Why not just serve it over 443 as well?


Heh - indeed. Methinks the emperor has no clothes.


The author simply disagrees with the HTTP RFC and a couple of decades and many thousands of implementors.

There's nothing wrong with that, and many implementors and applications have certainly taken the tack the author suggests -- i.e. use HTTP strictly as a simple transport layer rather than a representation of application state -- GraphQL is a great example.

But the rfc is pretty darn clear: "The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists."

I can't find a reasonable interpretation of this that somehow means if the server can't find the document you asked for, you should still return 200.


The author doesn't disagree with the HTTP RFC at all.

All the author disagrees with is mixing the http server layer and application client layer status codes in the same client response being given to users.

The only argument anyone should be having here is if they believe their api should speak HTTP to their client or if their api should speak through HTTP to their client.

Personally?

I think trying to shove business level response logic in HTTP (or REST for that matter) to be monkey butt level stupidity.

HTTP was built as a RESTful protocol for serving files up to web browsers. And it works great for that.

However, it was not designed to handle the 25 unique and user-readable or not-user-readable errors my one /authenticate route can throw back at the user at different times.

REST, and HTTP by extension, excel at simple CRUD or file serving operations. But they're just not suitable for these kinds of APIs.

Here's an example:

If my HTTP speaking API needs to respond with an error, how should my client handle it?

How should I differentiate errors that should be displayed to users and errors that should be used only for debugging?

If you need to suggest that my client needs to parse out the errors or differentiate between server and api errors in order to be able to show these errors to users then you've sort of proved my point that HTTP isn't a suitable solution to this problem - you're already wrapping an extra non-HTTP protocol inside HTTP, you're just only doing it for errors instead of all application responses.


I've started doing this after dealing with a 'smart' load balancer. The good old SOAP protocol requires all SOAP faults to return a 500 status error. And people tended to map SOAP Faults to programming language exceptions representing business errors.

You know what happens next. If a client generates too many business errors, the load balancer tends to throw out a server, which doesn't bother the client in the slightest, except it has to log in again for the next request. Keep this up for a few minutes, and it makes all but one server disappear from the pool, and the only survivor spends half its time negotiating connections.


Posting another comment rather than updating my last one: Thanks to all for the feedback.

After knocking it around in my head, I concede I am wrong :) HTTP is, after all, an Application layer protocol.

Whilst I remain unconvinced by some of the arguments, the one that got through to me was mostly about the reasoning behind using an application layer protocol in the first place: standards. And this breaks the crap out of those standards.

The correct answer is probably closer to a combination of status codes and a _clear_ response message (as well as the correct Content-Type header!)

An empty 404 is ambiguous, which is surprising to nobody :) fair points all round


Everyone who is responding "well, the author just doesn't really understand REST", is pretty much correct, but I feel like they're ignoring the broader point of practicality. I absolutely have seen the problem the author describes, not to mention other issues that come from the fact that using response codes in REST doesn't really separate "technical" problems from "business/domain" problems.

Just another reason I'm a big fan of GraphQL - error handling is much more clearly defined, with responses returning a 200 and the body defining the error.


Sorry but this just sounds like "hey if you could talk to my server in any capacity then it's 200!". Which means that all HTTP status codes immediately collapse into a weird state of 200. No thank you.


I wouldn't say you're using status codes wrong - I'd say there's a mismatch between how you're using URLs and how you're using status codes.

If your API was: `GET /api/v1?employee=1`, then returning 200 with a `{ result: false }` for employee 100 makes sense, because a resource identified by the URL `/api/v1` exists, but the parameter asks for something that doesn't. And `GET /api/v11?employee=1` returning 404 is also consistent, as there is no `/api/v11` resource.

However, by setting your API to `GET /api/v1/employees/1` you're saying that each employee is a resource identified by a URL, and so using HTTP status codes to say whether that resource exists is consistent with the way that API uses URLs.

The way you're using status codes isn't intrinsically wrong. It would make absolute sense if your API used URLs in a way that complemented that usage. The only problem is that you aren't.

I am wondering, do you have one endpoint per API version, or one endpoint per resource type? i.e. Should `GET /api/v1/xyzzy/1` return 200 or 404 if there is no `xyzzy` resource type? What if you decide to change your architecture and split a monolithic API into multiple services? Or do the opposite?


So, to address this on its own terms, talking about HTTP as an 'application programmer', ignoring browsers and REST and the whole value add of the HTTP stack in terms of caching and security and content negotiation... just thinking of HTTP as a way for a client to call a server and get back a message:

This approach is the HTTP-server equivalent of exposing a function that sometimes returns null.

When you call a function like that in your code, the call succeeds and returns, but the caller needs to now inspect the returned object to see if it is actually useful.

In this case, the HTTP call succeeds, but you now have to parse the result to figure out whether you got something useful.

In code, there are a few alternatives to having a function returning null: throwing an exception, the null object pattern, or the optional type/maybe monad.

Using an HTTP status code lets your client choose an HTTP client implementation that manifests as one of these, according to their preference, simplifying their code.

So just from an application design point of view... if you wouldn't build a function with that return signature, why would you build an HTTP endpoint that does that?
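As a rough illustration of that point, here is a sketch of how one 404 can surface in three different client styles. The `Response` class, `NotFound`, and the three helper names are all invented for this example and do not come from any particular HTTP library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response: status code plus parsed body."""
    status: int
    body: Optional[dict] = None


class NotFound(Exception):
    """Raised by the exception-style helper when the server answers 404."""


def get_or_raise(resp: Response) -> dict:
    # Exception style: a 404 interrupts the normal flow.
    if resp.status == 404:
        raise NotFound("resource does not exist")
    return resp.body or {}


def get_or_none(resp: Response) -> Optional[dict]:
    # Optional style: a 404 becomes None, and the return type forces a check.
    if resp.status == 404:
        return None
    return resp.body or {}


def get_or_default(resp: Response, default: dict) -> dict:
    # Null-object style: a 404 becomes a harmless placeholder value.
    if resp.status == 404:
        return default
    return resp.body or {}
```

With a 200-for-everything API, none of these helpers can be written without first parsing the body.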


Why do people keep trying to reinterpret HTTP? Honestly it's even worrying that so many of these kinds of articles pop up lately.


I've been in a SysAdmin/SRE type role for many years and I've used 200 status codes & non-200 codes as part of an automated site-check script for most of it. I've used the below script to automate scanning of 3000+ sites. (other optimizations exist, but this gets the job done quite nicely)

I like to do 2 layers of checks, (1) Is the site returning a 200? (2) Can the site return a specific file?

This script will send me an email alert if the site is not returning a 200 status code, but it's easy to daisy-chain other commands too.

Here's an example of (1):

  #!/usr/local/bin/bash
  # Reads one URL per line from $INPUT and emails an alert for each one
  # that does not answer a HEAD request with a 200.

  INPUT="/usr/sync/urls.txt"
  #DOWN="down.txt"
  TODAY="$(date +'%Y-%m-%d_%R:%S')"

  while IFS= read -r line; do
    # Discard the response and print only the HTTP status code
    status_code=$(/usr/local/bin/curl -o /dev/null --silent --head --write-out '%{http_code}' "$line")

    if [ "$status_code" != "200" ]; then
      echo "$line is down since $TODAY" | mail -s Site_DOWN email@example.com
      #echo "$line at $TODAY is DOWN" >> "$DOWN" #example to output to a file.
      #You can add further actions here
    fi
  done < "$INPUT"


Well I suppose this solves a problem with differentiating between malformed payloads and such compared to API-specific errors. But in reality, you're just multiplexing the errors inside the 200 response without standard categorization. HTTP errors are nice because they give the user immediate direction on where to debug their problem. If an API returned 200 with error: 'unavailable to process a request' I'd argue that's way worse than returning 403 with no message at all.

Ultimately I think why this is so confusing is because HTTP error codes are confusing and they should be clearly defined and/or include more codes specific to application errors rather than the network. It makes life much easier when you can see immediately the type of error you receive and if I had to depend on my API producer to write sensible error messages using 200 return codes I'd lose my mind.


On the contrary, I was happy that with REST, we finally didn't put something on top of the current already kinda big pile, but used the topmost layer in a sensible way. 4xx responses can also have a body, which could explain to the caller whether it's the entity or the endpoint that doesn't exist.


Right, but why stop there: going to different URLs for different resources is also nuts. REST is fundamentally broken from this conflation of concerns across different layers.

GraphQL — where you are sending and receiving data (and application errors) to a specific endpoint is much much saner.


No, REST is good, but you should either use it or not. If you're not going to use it, then you really ought to be sending and receiving data to a specific endpoint. And once you're doing that, it actually starts to be questionable whether you need or want HTTP at all (ignoring the hellworld of middleboxes we live in).


> going to different URLs for different resources is also nuts

It’s not, and if you want to know why it is not, think about what “URL” stands for.


I encountered this type of architecture in one of our company's internal services. At first I thought the developers just didn't know about other status codes, but now I read this and I can see there is some rationality behind it.

My take is that this kind of architecture might be fine for personal applications, but once you start working with other people this can get infuriating very fast (I know I was). Others already mentioned why this would be a bad idea (caching, etc.), but for me the most important thing is that it's not conventional. I don't know of any open source web framework that handles http statuses this way out of the box. If someone does please comment so I can have a look.


The idea described here is implemented in e.g. TikTok's API, and as a consumer of that API (in Java at least), it has been awful to code against.

Because most people abuse status codes, most tooling is built around status codes. All of the Java HTTP libraries, for instance, throw very nice and easily-handle-able exceptions for non-200 status codes, but do nothing at all for 200 codes. So, now, we need to:

* try/catch the call anyway, because it might fail
* parse the result
* figure out if the result is an error or OK
* throw / return early from error results
* separate that throw from the try above

It's all so much hassle to avoid a mistake that _client libraries don't make_ : using the wrong URL in the first place.
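The steps above can be sketched roughly like this (in Python rather than Java, for brevity). The `send` callable and both exception names are hypothetical; the `result` and `errorMessage` fields are assumed to follow the payload shape discussed elsewhere in this thread, not any real API.

```python
import json


class TransportError(Exception):
    """The HTTP call itself failed (hypothetical name)."""


class ApiError(Exception):
    """The call returned 200 but the body reports a failure (hypothetical name)."""


def fetch_employee(send):
    # `send` is a hypothetical callable returning (status_code, raw_body_text).
    try:
        status, raw = send()  # the call can still fail outright
    except OSError as exc:
        raise TransportError(str(exc)) from exc
    if status != 200:
        raise TransportError(f"unexpected status {status}")

    payload = json.loads(raw)  # parse the result
    if not payload.get("result"):  # work out whether it is an error or OK
        # Raised outside the try above so the two failure kinds stay separate.
        raise ApiError(payload.get("errorMessage", "unknown error"))
    return payload
```

Every caller ends up with two error paths to handle, even though only one of them is something the status code could not already have expressed.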


As a Service Provider, the ambiguity of 404's we return is purposeful given that callers can be adversarial. All of these borked protocols were designed when we were all young and naive and all was peace and love. We have to live with them and deal with them as they are.

If you care about your callers and they're internal trusted partners, you can return a traceId header they can use to diagnose the 404 to find out if the request was malformed somehow or the requested ID was truly missing or something else bad happened. Splunk can be a pretty rad thing when everyone is on the OpenTelemetry bandwagon. Otherwise, assume the caller is evil and treat them accordingly.


Why is this person making up their own rules rather than following the standard? https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

You should return the best-fitting error code and an error response in an expected format for the end user you are serving (either your own frontend, or your own api users).

Using 200 for everything means you can't use a whole lot of enterprise server / error tracking software to monitor failures related to the business logic, which are generally as important as most technical failures.


> HTTP is just a protocol defining behavior that belongs in layer 7. It’s not a transport layer in a technical sense, but from the perspective of an API it’s mostly just TCP with extra steps

That's basically arguing against REST and for RPC over HTTP


I'd really like to see the 402 method implemented with Lightning on a machine learning model site for authentication and use of the model. This way, other models could make calls on behalf of the queries they receive to their networks.

https://github.com/lightninglabs/aperture

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/402


I agree with this, and in a way where I don't really see anyone who disagrees as a peer. We can agree to disagree, but it's condescending on my part.

On one team we once also came up with a compromise where we would use 2 status codes. 200 for all success, 400 for all errors. However I would prefer HTTP status codes just go away altogether. Those are for the web browser software to do things like navigation, not for APIs. Same with HTTP verbs.

Aside from the theoretical reasons why status codes aren't good for APIs, there's also a practical reason. Using the same status code for all intentional results is best because it allows the clients to write simpler error handling code. If they get a 200, then they also get a payload describing success or failure. If you get any other HTTP status, it's an unexpected or unknown error. If you were to instead use many HTTP statuses, each one comes in the known and unknown varieties, where the error description payload may or may not be there. In the simplest cases, this is equivalent. But in a more complicated case, where you may have more than 2 HTTP errors and also more than 2 application-level errors described in the payload, it becomes difficult to handle all the errors correctly.

Since I'm obsessed with telemetry data, as any seasoned engineer eventually will be, the conflation between HTTP error data and application error data is vexing in logs and telemetry. In older APIs I work with, the errors are all conflated in this way--errors were being tracked and categorized in some cases simply by their HTTP status. But there's 400s, and then there's 400s. There were unexpected 500s, and then there were 500s we understood and could have added data to. Sometimes a transient 500 could actually be understood as a 400. The results are not correct, and they would never be correct because it is not manageable.

People get so enamored with trying to make REST APIs into this rigorous thing. The REST thesis was an old academic trick where Fielding just described the way web browsers work, but changed all the specific nouns into spooky abstract ones. Most of the ceremony around HTTP trappings like status codes and verbs are there for the browser and are white elephants, obstacles, when trying to write an API.


> I agree with this, and in a way where I don't really see anyone who disagrees as a peer.

Oddly enough, I disagree with this, and in a way where I don't really see anyone who agrees as a peer.

Why? This approach fundamentally misunderstands the HTTP model. HTTP is not RPC. Again, HTTP is not RPC. Nor is it, as the article claims 'mostly just TCP with extra steps.' HTTP is a protocol for hypertext transfer which enables one to design applications not as APIs, but as state machines.

> If they get a 200, then they also get a payload describing success or failure. If you get any other HTTP status, it's an unexpected or unknown error.

That's just incorrectly written server software. A 400-series error may/should carry a payload. A 500-series error may/should carry a payload.

Use HTTP the way it is intended and designed, and these things are easy; misuse it and they are hard (and as you note, unmanageable). So … don't do that!


HTTP is the only protocol that's supported by browsers. I don't need hypertext in my APIs. I want RPC in my APIs. But browsers force me to use HTTP. So don't complain if my RPC APIs don't fit into the whole REST philosophy, as I don't care about philosophy, I'm engineer and I care about technical arguments. For all I care HTTP is a message protocol with headers to carry some attributes and some bytes in payload. For all I care, between client and server could be some proxies which can cache something (and thanks to HTTPS, I can be sure that all proxies are controlled by me). For all I care, between client and server could be some analytics software which will analyze traffic and might mark node as unhealthy based on 5xx HTTP codes or something like that. That's technical arguments and I do consider it.

Spending time arguing whether 400 or 404 or 422 is a better fit is not engineering. You must return an exact string error code and that's about it. Whether that string error code is wrapped in a 400, 404 or 422 error does not matter. I, personally, use 400 errors everywhere because they're unlikely to occur from misconfigured reverse proxies and they won't confuse my clients which treat everything that's not 400 as gateway error. I think that I should start using HTTP 202 for all successful responses as well.


> I don't need hypertext in my APIs. I want RPC in my APIs.

You may want RPCs, but you need state transfer instead, for reasons which are too long to go into here but are explained pretty well in the original REST thesis.

> I'm engineer and I care about technical arguments

IIRC the REST thesis gives technical arguments against RPC, but in general the issue is that RPC is broken in a 'fallacies programmers believe about networks' way. The right way to perform actions across a network is not remote procedure calls, but transferring state representations.


> People get so enamored with trying to make REST APIs into this rigorous thing.

It is a rigorous thing, though. It's a standard, and one that has existed for a very long time.

> I would prefer HTTP status codes just go away altogether. Those are for the web browser software to do things like navigation, not for APIs. Same with HTTP verbs.

Verbs and status codes are fundamental for APIs that consume REST endpoints.

I totally understand thinking that vanilla REST is not the perfect model for every case. Maybe you should be using SOAP, or GraphQL, or HATEOAS. What a seasoned engineer shouldn't be doing is abandoning a shared specification and expecting everyone else to comply.

If I am using a third party's API and they serve nonsensical status codes (4xx when their DB is down, 2xx to let me know something failed), I will reach out and ask for them to fix their bug. If they tell me that the error code is deliberate, I will probably look for a more sensible vendor to work with instead.


Principle of least surprise would have you follow the spec. How much easier is it when you have to explain to everyone who uses your API that you have your own interpretation of how status codes are supposed to work?


This is like saying that if /web/index.html exists we shouldn't return 404 when /web/junk.html does not exist. That's clearly nonsense and not how http was designed to work.


That was what I meant by "actual web servers" :D poorly described though, fair enough.

I should have been clearer about this relating to HTTP RPC. I updated the post.

That said, after reading the responses here, I can see that what I've actually achieved is making the response harder to determine for the client, which is antithetical to the objective.


I've been low-key abusing emojis in binary protocols: result=0x3a29


Feels like a strawman argument - returning a 200 with an error payload vs an opaque 404 are not the only options.

If you need to differentiate between a technical error and a business domain error, we have other tools: A whole list of 4xx errors can be used for slightly different semantics (400 for "Bad request", 422 for "Unprocessable entity", etc). And we can attach a response body to a response of any status code, not just a 200. I'd definitely try these tools first rather than break HTTP semantics.


Author thinks only a 200 response can contain a body or something


That's not the takeaway I was after, I was pretty certain I had clearly described my intention of debating the usage of HTTP status codes as domain error messages.

Not really sure why you think I'm of the opinion only 2xx codes can have responses, perhaps I could have been clearer.


It's because of these things:

1. If I were to call /api/v11/employees/1 I would get the exact same error.

2. Your examples on the bottom of the post shows a response code and body for 2xx but not a body for the 404.


I don't understand. Using the definition in the article, wouldn't other 4xx statuses like 401 or 403 also be business/domain errors and not technical errors?


What I like about this approach is that if you are using any other protocols for transport, everything becomes instantly portable. Imagine having the same logic serving objects per IDs through HTTP and websockets. REST makes things more clumsy. Though I would concede that doing things the REST way, as in the app server participating in the HTTP headers/status, enables caching and all sorts of things that would be hell to do otherwise.


Depends on the API I guess.

If you are using graphql or jsonrpc or something, then sure return a 200 "The request was ok, but go look in errors"

If you are using REST then I think it is fine to return a 404 - "The thing you wanted doesnt exist". That seems correct to me, the thing you wanted to get doesn't exist.

The fact that /non-existant-path also returns a 404 is also correct, the thing/resource you wanted doesn't exist.


There is a standard for the "solution" in the article: passing the parameter in the body of a POST, like a remote call execution. That way you always return 200 if the call is correct, and the content tells you the result or errors. The REST standard uses the error codes as part of it, and using a standard wrong will mean that you can't use most standard tools, which is probably worse.


> The REST standard

There is no such thing as a "REST standard" and that's why we've been having all these pointless arguments for the last 15 years. If Fielding really wanted web developers to use REST he would have written a normative spec about REST.

Seems to me that REST was never about developers writing web API. The whole "smart client" capable of API discovery through "hyperlinks" is... a browser with a user clicking on those links. Except that API aren't consumed by browsers but other apps that are neither as smart nor complex.


> Seems to me that REST was never about developers writing web API.

You are correct. <https://news.ycombinator.com/item?id=23672561>


There is a wonderful RFC which solves the same problem - Problem Details for HTTP APIs. This approach is quite popular in good quality APIs. I was quite surprised that nobody mentioned it before.

- https://datatracker.ietf.org/doc/html/rfc7807
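For anyone who hasn't seen it, a minimal sketch of what such a response might look like. The member names (`type`, `title`, `status`, `detail`, `instance`) and the `application/problem+json` media type come from RFC 7807; the `problem_response` helper and the example values are made up for illustration.

```python
import json


def problem_response(status, title, detail, type_uri="about:blank", instance=None):
    """Build (status, headers, body) for an RFC 7807 problem document.

    Hypothetical helper, not part of any framework.
    """
    problem = {"type": type_uri, "title": title, "status": status, "detail": detail}
    if instance is not None:
        problem["instance"] = instance
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(problem)


# The 404 keeps its standard meaning, and the body says what was missing.
status, headers, body = problem_response(
    404,
    "Not Found",
    "No employee found for ID 100",
    instance="/api/v1/employees/100",
)
```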


I agree with the author. HTTP is used as a transport protocol for your API, not the API itself. It's one layer lower. You don't expect to have to look at what TCP flags you got as part of the API, nor should you care about what HTTP errors you get as part of your API

HTTP transports content, and the content is your API. HTTP should not be part of your API


In typical business applications you often have two endpoints: One to fetch a single resource and one to fetch all resources of that type.

Why not merge them and work with lists? I think a lot of people would intuitively understand the following:

GET /users?ids=1

{ "users": [] }

GET /users?ids=1,2,3,4

{ "users": [ { "id": 3 } ] }

Transport the ids in any way you like, that's not my point.
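A rough server-side sketch of that idea, assuming a hypothetical in-memory `USERS` store in which only user 3 exists (matching the example responses above). The `get_users` function name is made up.

```python
# Hypothetical in-memory store; only user 3 exists.
USERS = {3: {"id": 3}}


def get_users(ids_param):
    """Handle GET /users?ids=... by returning only the users that exist."""
    wanted = [int(part) for part in ids_param.split(",") if part.strip()]
    return {"users": [USERS[i] for i in wanted if i in USERS]}
```

An unknown id simply drops out of the list, so "not found" never needs its own status code.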


I'm with the author here. I've had this conversation with the other devs on my team a few times and each time it lands on 404 if there's no data. As the dev who makes the client-side applications, I find that's just not as useful. It's more convoluted because the server had no error, yet it returns an error code, so as a client-side developer I'm left to decide: was my URL invalid and the server really did 404, or was there no payload for the request so I received a 404?

Ideally, the bad paths all are caught during dev-time but that just can't be assumed to always happen, it doesn't account for any of the risk involved with other systems changing and then 404's popping up. It could happen for other unexpected reasons so why should we assume it won't, it seems like we're just covering our eyes to the problem.

For a short example, imagine a brand-new account was made in a system. Suppose it makes a GET request for a specific component of its account that is created asynchronously to the rest of the account creation (e.g. a 3rd party billing service) and receives a 404 from the server. There are different interpretations the client could have of this error code. Does it mean the component was truly not created and needs to be? Could it be that, as a client, we know implicitly of this delay and fudge a 404? Using 404 as a catch-all for these cases seems error-prone and less informative than returning the true results of the GET request. I posit that any good client-side software handles 0 results from a request with an empty state, 1 or more with a results state, and 4XX with the error state(s). It's clearer, more trustworthy and easier to debug.
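A tiny sketch of that three-way split on the client, with a made-up `view_state` function and state names:

```python
def view_state(status, items):
    """Map a response to one of three UI states (hypothetical names)."""
    if 400 <= status < 500:
        return "error"
    return "results" if items else "empty"
```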


There’s a 202 response code for “not yet fully created” https://stackoverflow.com/questions/11746894/what-is-the-pro...


It's all logical until you are dealing with an outage when your clients can't tell the difference between a wrong endpoint and a missing item.

Example: https://news.ycombinator.com/item?id=31849488


I think the first problem with the examples is that cool URIs don't change. This goes double for API paths.

Putting the API version in the path of a request is not a good solution. Doing it introduces exactly the kind of ambiguity that could otherwise be avoided.


REST stands for Representational State Transfer, so the HTTP status code has business meaning for the "state" itself. 404 is not ambiguous. The real ambiguity is from 404, 400, 422 errors that you choose to return based on your validation logic.


Y'all have given me a lot to chew on.

I would like to point out a few of the detractors are conflating REST and HTTP RPC, I avoided using the term REST for a reason :)

BUT, that being said, lots of good arguments against this stance. I appreciate the feedback :)


Isn't the solution to this just 400 for a bad path, and for not found a 404 plus a body of "entity not found"? Having to always check result: true is my biggest annoyance in APIs designed like that.


Returning a 404 makes more sense and you can still include a response body

  status: 404, 
  body: {
      "result": false,
      "errorMessage": "No employee found for ID 100"
  }


Reminds me about a comment on graphql couple days ago: https://news.ycombinator.com/item?id=32037909


Here’s the issue:

  /api/v1/employees/<employee_id>
These URLs, which philosophically we’re trying to convince each other are opaque resource identifiers, are actually structured requests.

There is an API being provided, specifying how to craft the URL, among other things. It’s not a “resource” at all, it’s a function call, and the difference becomes extremely evident as soon as you start needing to add variations to the request using query parameters.

Good APIs have few nouns and many verbs. That’s exactly the opposite of REST, which says you should have many nouns (URLs) and few verbs (HTTP methods).

REST is a misguided idea that an academic had years ago. It doesn’t work well and it certainly doesn’t represent some philosophical ideal. We should stop pursuing it.


I think the best way to use REST is to not be cultish about it, which just results in more discussions and unread RFCs on your org wiki. I always end up erring on the side of more documentation.


Have they never heard of codes other than 2xx and 4xx?


I even put 500 in the post :)

I was hoping to start a discussion around using HTTP status codes as domain error codes, as opposed to an opinionated payload.

Maybe not as clear as I could have made it.


In HTTP, URLs are opaque identifiers. If you want to isolate semantics from a URL, then use query parameters.


Yes! This is absolutely one of my biggest pet peeves.

A 404 should not be shown on a valid path with an empty result. That’s exactly what 204 is meant for.

Moreover, a 404 in many REST clients will bubble up an error in the application code because it thinks there’s something wrong (you’re trying to reach something in the wrong place). That’s not the case for an empty result, any more than an empty list is an error.


From the perspective of an HTTP server, a resource either exists, or doesn't. There's no concept of a "valid path".

If you do GET /a/b/c, if /a/b/c exists you get a 200, if it doesn't you get a 404 (and if you're not allowed to access it you get a 401 or 403 depending on some other details, and if the server has a problem while replying you get a 500 etc).

This is a very important concept about what an HTTP URL means, and how it is understood by HTTP servers and middle-boxes (caches, proxies etc).


That’s just it though, the resource does exist. The resource is the API to retrieve a result.

The result being empty doesn’t mean the resource doesn’t exist, in the context of an API.


> That’s just it though, the resource does exist. The resource is the API to retrieve a result.

That is not true if the API is RESTful - in that case, the API is the resource. An item with the requested URL either exists or doesn't. If it doesn't, then return 404 and (if possible) give some more information.


Reading the spec, that's definitely not what 204 is meant for.

https://httpwg.org/specs/rfc7231.html#rfc.section.6.3.5

> The 204 (No Content) status code indicates that the server has successfully fulfilled the request and that there is no additional content to send in the response payload body.

I don't see how that could be interpreted to mean "You have requested a resource which does not exist". It's intended for use cases like "Your request to save the file has been received, no further content will be sent".

Or am I misreading the spec?


I don't know if it's still true, but back in the day returning 204 would mean the browser page won't refresh. It was one of the ways to send data to the server without reloading the page.

Sending a 204 with a body definitely breaks the spec.


The OP is neglecting the important property that non 2xx responses can contain a body.


Wouldn't 426 be better for nonexisting api roots?


Sho, I don't know. Maybe?

I've never seen one in the wild, but looking at the docs, is that error not reserved for using the wrong protocol (such as HTTP/1.1 when the server only accepts HTTP/2 or something)?


Officially yes, but if you want to misuse status codes this would be a better solution in my eyes.


Interesting take.

Feels cleaner to me, and while I want to argue about the opinionated payload bit, I suppose that's something that should be clearly defined in the contract anyway.

Maybe I've been abusing status codes too...


That’s how GraphQL is doing when used over HTTP and it feels a bit wrong in the beginning but I also think it is better. And you don’t have to fit your errors into HTTP semantics.


GraphQL does that [everything status 200] because it is designed to work with Browser agents on-purpose self-inflicted technical limitations.

Browsers don't return (in XHR/fetch requests) any response in the 400-599 status range.


I think you are not correct on this.


> That’s how GraphQL is doing when used over HTTP and it feels a bit wrong in the beginning

GraphQL is maldesigned; it is an abuse of HTTP and should not be emulated.


Why do you think so?


Clear, unambiguous and - wrong.


> Why do I get a 404 here?

> [...]

> As an API consumer, all I want to do here is raise my middle finger.

Why? As an API consumer this makes sense to me. The author doesn't go into any detail on why it presumably doesn't make sense to them, nor what problems it presents from a client implementation standpoint?

> RFC 7230 defines HTTP as an Application Layer protocol, which means it should represent application logic, right?

No. What? No. Why on earth would it mean that. It's a protocol at the application layer - it has nothing to say or do w.r.t. the internal implementation details of that application. Why on earth would it?

None of this article makes any sense whatsoever. The author is making some wild assumptions that seem entirely unique to themselves - I've certainly never seen anyone else face these logical challenges.

> In any API call, there are 2 problems to solve as a client when you are processing the response:

> 1: Did the technical request succeed?

> 2: Did the business/domain request succeed?

Where in your client code are you trying to solve these problems? Is it in the same place? If so, you've over-generalised / over-DRY-ed your error handling.

If you've approached writing your client in a sane manner, you have business/domain context at the request site - you can tell the granularity of your request validity by the code location where you're handling the response.

As a developer of client code, 4xx means "I did something wrong". It's not the server's responsibility to know how I've implemented URL-building and whether or not I'm using dynamic version strings. Take the author's example:

/api/v1/employees/100 -vs- /api/v11/employees/1

Here's 2 ways I could build that URL (pseudocode):

  base = '/api/v1'
  section = '/employees'
  url = '$base$section/$id'

  v = '1'
  url = '/api/v$v/employees/$id'

The author's assuming something like the 2nd way, and supposing the API should be aware your client code could have supplied an invalid value for $v. But what if it's the 1st and you've supplied an invalid value for $base? Expecting knowledge of your client implementation on the side of the API implementers is nonsense.

---

As for the solutions proposed, there are two problems:

Problem 1 (harmless enough): The error context provided in the 200 response body will only be specifically useful for the author's example client implementation - many different clients will consume your API and their implementations will differ. For some of them, the context will be of less/no use (coupled with added developer confusion from the unexpected statuscode).

Problem 2 (harmful): This kind of context is going to enable OSINT user & metadata enumeration in a very large number of API applications, so recommending this as general advice is actively dangerous.


slow clap


I have been doing things the same way for a while. In my RPC API everything is POST (even for "getting" something), and everything returns 200.

I don't like the design of REST. I don't think status codes should have meaning to an application. Why? Because there is not a defined status code for every type of problem.

One thing I have often run into is: what status code are you supposed to use when business rules aren't followed? Let's say the resource exists, so it's not a 404, and the request is properly formatted as JSON and all the fields have valid values, so it's not a 400. But the user is trying to do an invalid action, like booking an appointment slot that is already full. What status code are you going to use?

For this reason, I've seen two types of code bases that use status codes. The first one is error handling soup:

    if (res.statusCode == 404) {
        // Display a "not found" error to user
    } else if (res.statusCode == 400) {
        // There will probably be some validation errors in the body, so display those
    } else if (res.statusCode == 403) {
        // show a "forbidden" error to user
    } else if (res.body.error) {
        // Aha! We have some error that doesn't have a defined status code.
        // This block will contain a completely different type of error handling,
        // based on information found in the body.
    }
And it's just a mess. The second one is a bit better, where they only care about 200 or not-200, and pass error information through the body:

    if (res.statusCode != 200) {
        // Error handling reading information from res.body
    }
But why put in the work to use correct status codes on the server side if it essentially comes down to a boolean value?

So my solution is to always have a boolean value called "ok", and if "ok" is not true there's always a human-readable error you can show to the user.

    if (!res.body.ok) {
        // Show res.body.error to the user
    }
There's a bit more to it since I also account for passing back field-by-field validation errors, but the point is that I'm always returning 200 and I am always reading error information out of the body. If some resource doesn't exist, the error message will say that. If the user is forbidden from doing something, the error message will say that. If the appointment slot is full, the error will say that.

There are only two places where I use status codes.

If some unexpected error happens (like the database is down), I return 500. If my frontend ever sees 500 it sends the error to my error reporting system.

If the user is not authenticated I return 401. If the frontend ever sees 401 it automatically redirects the user to the login screen.

Importantly, both of these things are hidden away in a library I wrote, so my application code never thinks about them. It just thinks about !ok.

Status codes work for things that are generic from the point of view of the client, in the sense that the client doesn't care if my database is down or if I misconfigured something or if I ran out of memory. It only cares if "something bad happened that needs to be reported to the devs", which is 500, or "this user is no longer logged in so they need to log in", which is 401.

For everything else, the client does care about the specifics of what the error means, so I need to pass it that information. Status codes don't work for that.
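Roughly, the wrapper described above might look like the sketch below. The `send` and `report_error` callables, the `NeedsLogin` exception, and the fallback message are all stand-ins invented for this example; the real library is not shown in this comment.

```python
class NeedsLogin(Exception):
    """Hypothetical signal that the caller should redirect to the login screen."""


def call(send, report_error):
    # `send` returns (status_code, body_dict); `report_error` forwards a message
    # to an error reporting system. Both are hypothetical stand-ins.
    status, body = send()
    if status == 401:
        raise NeedsLogin()
    if status == 500:
        report_error(f"server error {status}")
        return {"ok": False, "error": "Something went wrong. Please try again."}
    # Everything else is a 200 whose body always carries "ok",
    # plus a human-readable "error" when ok is false.
    return body
```

Application code then only ever checks `ok` on the returned dict, for example showing `error` to the user when it is false.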


Stupid post


I think this is essentially incorrect. However. It can sometimes be practical to wrap a model in a result object. I agree with that.

The thing with response codes is that they are often logged in contexts where it is impractical to read and parse the response body. Like for rate-limit rules in layer 7 proxies and so on.

404 are overused though. Sometimes it is better to design around the problem to only get “real” 404:s so to say. 204:s can also be used even though any REST fanatic will disagree with that.


It is always baffling what you get downvoted for on HN. To have a get by id endpoint that does not return 404 if the entity is not found. Well, I fired people for less than that.



