How to (and how not to) design REST APIs (github.com/stickfigure)
192 points by stickfigure 11 months ago | 148 comments



This falls down as soon as it makes a fundamental misunderstanding of what makes a REST api into a REST api.

It gives this as a ‘bad’ example:

   GET /v3/application/shops/{shop_id}/listings/{listing_id}/properties
With the justification that “The {listing_id} is globally unique; there's no reason for {shop_id} to be part of the URL.”

No, the point of the API is that /v3/application/shops/{shop_id}/listings/{listing_id}/properties is a globally unique identifier. Your belief that parts of that id have global meaning outside the context of that identifier is irrelevant - that path is the identifier for the resource.

And having hierarchical paths is useful because you can do things like manage permissions on parts of the hierarchy - users might have permission to check listings in certain shops and we can characterize that as them having permission on /v3/application/shops/{shop_id}/listings/*.

Directory structures of resource identifiers are good and logical and not a ‘bad’ API design practice at all. You might as well argue the UNIX file system is a bad design because all the files have a unique inode id so paths are completely unnecessary.


You apparently have not actually used Etsy's API.

No, the shop id is not in fact part of the globally unique identifier of an Etsy listing, and the properties are not dependent on the shop. Etsy listings have a 1:N relationship with Etsy shops.

The API was a mistake, which they are slowly correcting - they've already changed:

    GET /v3/application/shops/{shop_id}/listings/{listing_id}
to:

    GET /v3/application/listings/{listing_id}
...and I presume they will eventually change the rest of the listing-related endpoints over time.

Managing permissions using the hierarchy of a URL is silly at best, dangerous at worst. The first thing any attacker will do is plug in an alternative shop id and see if it grants access to the non-permitted listing. If permissions are attached to the shop (and for Etsy, they are) the server needs to load the listing, figure out the associated shop, and then check permissions. The client cannot be trusted to provide the correct shop id, so there's no point in asking for it.
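A minimal sketch of the check described above, where the server derives the shop from the listing itself instead of trusting a shop id in the URL. The tables and the function name are hypothetical stand-ins for a real database lookup, not Etsy's implementation.

```python
# Hypothetical in-memory data; a real service would query its database.
LISTING_TO_SHOP = {101: 1, 102: 2}        # listing_id -> owning shop_id
USER_SHOPS = {"alice": {1}, "bob": {2}}   # user -> shops they may manage


def can_access_listing(user: str, listing_id: int) -> bool:
    """Authorize via the listing's real shop, never a client-supplied one."""
    shop_id = LISTING_TO_SHOP.get(listing_id)
    if shop_id is None:
        return False  # unknown listing
    return shop_id in USER_SHOPS.get(user, set())


print(can_access_listing("alice", 101))  # listing 101 belongs to alice's shop 1
print(can_access_listing("alice", 102))  # listing 102 belongs to shop 2
```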


That’s a critique of Etsy’s API not of good REST resource identification.

No, plugging in a shop you have permission to doesn’t work if your resources are hierarchical, any more than plugging in ~/passwd lets you read /etc/passwd because you have read access to your home directory. Those are different resources: one of them exists and is locked down, and the other one doesn’t exist.
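A rough sketch of that argument, assuming listings are stored under the full (shop_id, listing_id) path so that the path really is the identifier. The data and the function name are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical data: listings keyed by the full (shop_id, listing_id) path.
LISTINGS = {(1, 101): {"title": "mug"}, (2, 102): {"title": "scarf"}}
USER_SHOPS = {"alice": {1}}  # user -> shops they have permission on


def get_listing(user: str, shop_id: int, listing_id: int) -> Tuple[int, Optional[dict]]:
    """Return (status, body) for GET /shops/{shop_id}/listings/{listing_id}."""
    if shop_id not in USER_SHOPS.get(user, set()):
        return 403, None  # no permission on this part of the hierarchy
    listing = LISTINGS.get((shop_id, listing_id))
    if listing is None:
        return 404, None  # this path names no resource
    return 200, listing
```

Under these assumptions, substituting a permitted shop id in front of someone else's listing id resolves to a path that does not exist, so it yields 404 instead of the other shop's data.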


> Managing permissions using the hierarchy of a URL is silly at best, dangerous at worst.

Or perhaps what you call silly is just you being unaware of what you don't know. There are valid cases to handle permissions using the structure of URLs. As well, the danger you allude to comes from handling it naively. Even the hypothetical attack you suggest might be among the first things any non-tech-savvy person might think of trying.

The scenario you're describing above is simply one of dealing with redundant information in a situation where inferring the whole from the part is not detrimental (for the platform). A case can certainly be made that with that simplification, some optimization opportunities are also lost. Perhaps Etsy doesn't need them. Others might.

> The client cannot be trusted to provide the correct shop id.

The client cannot be trusted period. If I provide a signed cookie that contains a list of authorized shops and they return something else, good thing that cookie is signed. Also good thing the cookie contains the shops, no need to touch the disk if the URL doesn't match the list.
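A minimal sketch of the signed-cookie idea using Python's standard hmac module. The cookie layout (base64 payload, a dot, then a hex MAC), the secret, and the function names are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: a server-side key, never sent to clients


def sign(shops: list) -> str:
    """Encode the authorized shop ids and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(shops).encode())
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + mac


def authorized_shops(cookie: str) -> set:
    """Return the shop ids in the cookie, or an empty set if the signature fails."""
    payload, _, mac = cookie.partition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return set()
    return set(json.loads(base64.urlsafe_b64decode(payload)))
```

A request for /shops/{shop_id}/... can then be rejected without any storage lookup when shop_id is not in authorized_shops(cookie).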


You apparently haven't read Fielding's paper. Etsy isn't doing it right. If you read Fielding's paper, OP's point is correct. The whole URL is a resource.


The author gives a reason for that recommendation

> [having shop_id in the URL] inevitably causes problems when your invariant changes down the road - say, a listing moves to a different store or can be listed in multiple stores.

Basically the choice is between having a perpetual unique URL to a listing or multiple ones, maybe valid at the same time and some of them maybe invalid in future, when a listing is removed from a shop.

A visitor with a valid unique listing id will always be able to look at the product. If there is a shop id in the URL, that URL might become invalid, and the visitor loses access to the product and has to search for it again, adding friction. With the globally unique id, the visitor will discover that the product is offered by another shop (maybe a new one from the same tenant?), which is usually not important.

Permissions for the listing could be handled by matching the shops a user has access to with the shops the listing belongs to.


That’s what 302 responses are for.

REST does not mean ‘parameters in the path not in the query string’.


I don't understand.

Suppose that we have

  /shops/1/listing/1
  /shops/2/listing/1
where listing 1 is an id local to those shops, they are two different listings. What happens to those URLs when those shops remove those listings? Both of them should return 404.

Then the listing appears in shops 3 and 4.

  /shops/3/listing/1
  /shops/4/listing/1
If we have four different records in the listings table of the database there is usually no way to relate those listings, unless we inspect all the records after creates and updates looking for exact matches. So we can't redirect.

Let's say that those listings 1 are globally unique ids. There is only one record for them in the database. When that listing is removed from shops 1 and 2 and later appears in shops 3 and 4, which shop do we redirect the original URL to? 3 or 4? We have only one choice, and if we redirect to shop 4 the owner of shop 3 won't be happy, and vice versa. We can add some reference in the JSON response that will look like 302 Location: /shops/4/listing/1 with a body including {"also_sold_by": [3]}, but again, why arbitrarily pick the main URL?

But if we have a URL like

  /listings/1
we can return a reference to shops 3 and 4 in its JSON response. Everybody is happy.
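A small sketch of what that could look like, assuming a many-to-many table between listings and shops. The field names and the lookup table are hypothetical.

```python
import json

# Hypothetical many-to-many table: listing_id -> shops currently selling it.
LISTING_SHOPS = {1: [3, 4]}


def get_listing(listing_id: int):
    """Return (status, JSON body) for GET /listings/{listing_id}."""
    shops = LISTING_SHOPS.get(listing_id)
    if shops is None:
        return 404, json.dumps({"error": "no such listing"})
    body = {"id": listing_id, "shops": [f"/shops/{s}" for s in shops]}
    return 200, json.dumps(body)
```

The listing URL stays stable when the set of shops changes; only the "shops" array in the body moves.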


I don’t know if it is a use case for Etsy, but in the first scenario, doesn’t it make having different prices for a listing easier? You will have a “shop item” table where you tell the system which items are in which shop, and can attach shop-specific info.

So you have: |shopid|listingid|itemid|

The item is global and the listing is local to add shop specific info, or even whether that shop carries that item?

Of course this might not be a valid use case, or I may misunderstand the meaning of listing.


You're sacrificing usefulness for purity. Never a good bet. I agree with the author.


I’m not arguing for purity I’m arguing that these guidelines are not good guidance for designing ‘REST APIs’.

If you are designing a ‘REST API’ you have already committed to ‘purity’. If you follow this guidance you are not designing a REST API you are designing a JSON over HTTP api with parameters in the query string.


“RESTful” API design is mostly bike-shedding.

There’s no standard. Every REST API looks different. Clients have to refer to documentation anyway, so consistent URL patterns achieve nothing. People waste large amounts of time over totally inconsequential minutiae like whether to use singular or plural words in URLs.

Separating idempotent calls from non-idempotent calls is useful, but REST overcomplicates this. All that’s needed is read and write calls, yet REST has get, post, patch, put, delete…

REST is also inefficient. Clients could read the data they need in one HTTP request, but most “RESTful” APIs force clients to make many requests for the sake of what is essentially aesthetics.


Agree. But I pick REST (or “json over http”) any day of the week instead of graphql, soap, grpc, etc.


Graphql just for the sake of graphql is a disaster for backend engineers.


I want to hug you right now.


You are supposed to use those 3 through some kind of heavy tool, while it's well understood that you do REST with just an HTTP library.

Rest is simpler, but comes at the cost of a lot of nice things like automatic endpoints generation and type verification. The problem is that the heavy tooling tends to not be there or not work correctly. But this is not a win for that kind of simplicity, that's a reason to improve the protocol design.



JUP - Just Use POST


I’d just like to interject for a moment. What you’re referring to as REST, is in fact, JSON/RPC, or as I’ve recently taken to calling it, REST-less. JSON is not a hypermedia unto itself, but rather a plain data format made useful by out of band information as defined by swagger documentation or similar.

Many computer users work with a canonical version of REST every day, without realizing it. Through a peculiar turn of events, the version of REST which is widely used today is often called “The Web”, and many of its users are not aware that it is basically the REST-ful architecture, defined by Roy Fielding.

There really is a REST, and these people are using it, but it is just a part of The Web they use. REST is the network architecture: hypermedia encodes the state of resources for hypermedia clients. JSON is an essential part of Single Page Applications, but useless by itself; it can only function in the context of a complete API specification. JSON is normally used in combination with SPA libraries: the whole system is basically RPC with JSON added, or JSON/RPC. All these so-called “REST-ful” APIs are really JSON/RPC.

respectfully, https://htmx.org/essays/#hypermedia-and-rest


I'm on your side, but every time you say "a hypermedia" it kills me.


I'm not a native English speaker so, honest question: isn't h a consonant with its own distinct sound, so a instead of an?


> isn't h a consonant with its own distinct sound

Usually, including in “hypermedia”, but “hypermedia” is either an adjective or a mass noun, not a countable noun of which you can have a single instance.

“a hypermedia...” looks like you are using it as an adjective to modify a countable noun, and when there is no noun it looks like you forgot the noun; “JSON is not a hypermedia unto itself...” should probably be something like “JSON isn’t hypermedia unto itself...” (I’d actually prefer “JSON, on its own, isn’t hypermedia”, but that goes beyond the issue with the use of the indefinite article).


We take english grammar from the corrupt! The rich! The oppressors of generations who have kept you down with myths of uncountable nouns... and we give it back to you... the people.


It depends on the sound of the h. "A house" vs "an honor"


sorry, an hypermedia


That's not the problem. It's similar to the effect of a garden path sentence: when I encounter the word "a" and then "hypermedia", I'm always expecting something to follow (e.g. "a hypermedia system", "a hypermedia format", etc). Encountering "hypermedia" as a singular noun makes me wince.

(The effect isn't ameliorated by modifiers, e.g. "a natural hypermedia".)


oh, sorry!

an hypermedium


Similarly if I ever heard someone say _an_ hotel...


Found a contradiction that I don't understand. From rule 1:

    # GOOD
    GET /products   # get all the products
    GET /products/{product_id} # get one product
    
    # BAD
    GET /product/{product_id}
But then in Rule 2:

    GET /shop/{shop_id}/listings              # normal, expected
Shouldn't that be "/shops/{shop_id}/listings"? Or is it plural only if you can actually GET the path (i.e. there's no GET for just "/shop") and otherwise it should be singular?


Plural or singular seems far too marginal to be good or bad. I use singular.


Same. My table names are also singular.


I use singular too, but I always wonder about those sticking with plural - what's the convention for words whose plural and singular are the same?

ie - like 'staff' or 'species' or 'aircraft'.

Then I can add suffix to those singular ie - 'staffList', 'speciesList' etc


This is the Etsy API, but actually was a typo on my part. They use the plural /shops (as I showed in the other Etsy example). I've corrected the original article, sorry about that!


Rule #1 is terrible advice.

Avoid plural nouns in English API endpoints because English is full of irregular plurals. For example:

goose -> geese, child -> children, index -> indices, vertex -> vertexes, analysis -> analyses

This makes English plurals unpredictable, especially for non-native speakers, and hurts API consistency and discoverability.

Also consider that for a CRUD interface you may need the singular form anyway (POST api/student/create), and adding the plural means doubling the API route namespace.

It's cleaner and simpler to stick with singular nouns.


You use plurals anyway to fetch collections:

    GET /students
So you can't escape the problem unless you want `GET /child` to fetch multiple children.

Also, you should avoid verbs in URLs (IMHO, of course). You're adding to the students collection, so post to students:

    # BAD
    POST /student/create

    # GOOD
    POST /students
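A minimal sketch of the "post to the collection" style, assuming an in-memory store and a sequential id scheme, both invented for illustration. The new resource's URL is reported in a Location header with a 201 status.

```python
# Hypothetical in-memory collection standing in for a real datastore.
STUDENTS = {}


def post_students(body: dict):
    """Handle POST /students: add to the collection, point at the new member."""
    new_id = len(STUDENTS) + 1  # assumption: simple sequential ids
    STUDENTS[new_id] = body
    return 201, {"Location": f"/students/{new_id}"}


def get_student(student_id: int):
    """Handle GET /students/{student_id}."""
    student = STUDENTS.get(student_id)
    return (200, student) if student is not None else (404, None)
```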


An API is not an essay, in OOP you write Array<Student> and not Array<Students> and yet you understand the type is about an array of students. Getting hung up on grammar in an API is probably the dumbest problem to have.

If you think `GET /student` is confusing, or more importantly, structurally restrictive as an API, you can think about it as `GET /student/filter` where the "filter" may be a specific student id, or a range of ids, or other conditions such as `GET /student/top` or `GET /student/graduated` and then all students will be just the filter "all" or: `GET /student/all`.

As for `POST /student/create`... it doesn't matter. To use one of Fielding's own examples from his blog, how'd you turn a lamp on and off via REST? Would you be like `POST /lamp`? No. It's unclear WTF is happening.


I think just about anyone who uses a name referencing an Array<Student> will give it the name "students".

And it does often end up mattering, for clarity at its use sites, where you won't have the type declaration to help you out, and for having "student" available as a name in the same scope, as it's typical to pull an item out of a collection.

/student/all looks particularly icky to me. For getting a student, it seems unlikely that we'd identify one using these words, but in other domains, they may end up conflicting with another resource. Whatever you end up doing about that, it'll surely be gross.

I think plurals are better so nyah! Heh.

Also, it's totally my job to get hung up on these kinds of details. Clarity, avoiding collisions, enabling easy expansion, and especially averting future breaking changes to deal with the aforementioned matter to others.


API endpoints are not variables. Variable names are local and disposable, they don't even survive the compilation process in most languages.

Meanwhile, APIs, as interfaces, are forever. And the dumbest thing to do is to decide to have two names for one type in an interface, because grammar happens to have single and plural version for words. Why would you do that? APIs have no grammar, they're not sentences, they're made of identifiers that need to uniquely identify something. So stop trying to force grammar in.


If I use turning a lamp on and off, I'd do GET /lamp and in the response there should be an href with a rel of "on" that I can follow (in JSON land).

In HTML, I might have a FORM that does a POST of the lamp's switch to /lamp/switch.

Because the switch is the resource that you're trying to manipulate when turning a lamp on/off, not the lamp itself.


POST /lamp/switch is not any better than POST /lamp as neither communicates what you're doing.


> Fielding's own examples from his blog, how'd you turn a lamp on and off via REST? Would you be like `POST /lamp`? No. It's unclear WTF is happening.

No, of course you'd be like `PATCH {"light": "off"} /lamp`!

Kidding of course but it's true that REST purity does not make for intuitive APIs in complex real-world problem domains.


In HTML, you could have a FORM with the URL of /lamp/switch that you PUT.


PUT is not a valid method from an HTML form. Only GET or POST are permitted.


This kind of reveals a little secret of many of the REST gurus. They have no clue what they're talking about.


So you're putting a switch. Great API design.


> one of Fielding's own examples from his blog

Where?


While I do agree with this in almost all cases, I have found scenarios where there are actions that don't map easily to a HTTP verb and need something more explicit.

What I've generally done in these cases is pretty similar to https://cloud.google.com/apis/design/custom_methods which also explains the problem better than I can.

I'd be interested as to how you'd solve some of these problems without an explicit verb in the path.


I'm not a purist; for unusual edge cases, I'll put a verb (or something appropriate to the context) at the end of the path. But `create` isn't unusual, just POST to a collection.


"POST /students" is a create action, but verbs are fine for individual entities, for example "POST /students/ID/enroll".


I use Nouns exclusively in APIs, not verbs. URLs define resources. Resources are (99% of the time) Nouns.

So POST /students/{id}/enrollment

Or POST /students is the act of enrolling a student, so the returned Location might be /students/{id}/enrollment to reflect the current state of that resource.

The other details of the student might be at URLs like /students/{id}/details, /students/{id}/results, /students/{id}/courses etc etc

If I end up having part of a "sub-resource" in the "main" resource, then I try to always have an href, otherwise you have to put all of the information.

So GET /students/{id} might return a JSON object with an embedded "enrollment" object, but that embedded object would have an href to the full enrollment resource.
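A sketch of the representation described above: a partial embedded "enrollment" object that carries an href to the full sub-resource. The field names are assumptions.

```python
def student_representation(student_id: int, name: str, status: str) -> dict:
    """Build the body for GET /students/{id} with a linked sub-resource."""
    base = f"/students/{student_id}"
    return {
        "href": base,
        "name": name,
        # Partial copy of the sub-resource plus a link to the full version.
        "enrollment": {"status": status, "href": f"{base}/enrollment"},
    }
```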


how do you differentiate between plural vs singular of:

`GET /staff`

?


I don't? It's fine. I'm also fine just adding an 's' to many words that have unusual plurals; English is flexible, and "persons" is a perfectly acceptable substitute for "people".

That said, I don't love your example. Staff does have a plural, staffs - as in, the separate staffs of multiple organizations.


what's your opinion of using suffix like '_list' to differentiate it ?

ie:

GET /species

and

GET /species_list

?


Seems weird.

Of all the rules, #1 is by far the most arbitrary and least important. But it's also a thoroughly established convention. If you want to present "this is a normal, boring API with few surprises" to your clients, I wouldn't recommend odd collection suffixes.

But it's not going to fundamentally change the usability of your API, unlike many of the other rules.


The case of having a singular at the end of a GET is so rare that it should be easy to disambiguate. I have a project where one of the main objects is a "series", it's clear what "GET /series" and "GET /series/ID" means to anyone that has seen a REST API.


Does “GET /students” return all the students in the system? Probably not.

So in fact you’re fetching some subset of students anyway, and the size of the returned set might be one or zero depending on your query.

Given that, “GET /student” seems just as meaningful because neither the singular nor the plural can fix the ambiguity about what you’re actually getting.


Yes, I would expect GET /students to return all of the students (or at least, all of the students visible to me). Typically with query parameters for filtering:

    GET /students?min_age=20
Alternatively, `students` might be a collection attribute on another resource:

    GET /classes/{class_id}/students


I expect singular to return one, and plural to return a set, which could range from 0...n.

Likewise, I wouldn't expect a singular to return a single object wrapped in an array, but I would always expect "/plural" (with no further qualifier in the url) to return an array, regardless of 0, 1 or more results.

Why would returning a full set be a condition of whether or not plural is ambiguous?


> for a CRUD interface you may need the singular form anyway (POST api/student/create)

Why? What's wrong with api/students/create?

> Avoid plural nouns in English API endpoints because English is full of irregular plurals.

I don't buy this. I mean, yes, it's true, but how often do people really need to write these endpoints after initially writing the client code?


Verbs in a URL are an API "smell" for me.

URLs refer to a resource that you can manipulate. What resource is /students/create referring to?


You really should not include the action in the URL ie rather than

GET api/student

POST api/student/create

DELETE api/student

it should be

POST api/students

GET api/students

DELETE api/students


> index -> indices vertex -> vertexes

I guess it’s in support of your point, but if you’re going to pluralise index as “indices”, why wouldn’t you use “vertices” for vertex?


It gets even more unpredictable for everyone involved when it's a non-native speaker who writes the API schema.


I’d add:

* If you’re going to forbid people changing a parameter with a PUT or PATCH request, then the schema for these shouldn’t list them as parameters. This seems to creep in to APIs constantly as people are lazy and will use the same serializer method as for POST with an additional check somewhere in the code that changes the response. Just don’t do it!

* Don’t change the response format based on query parameters. It makes it hard for typed languages to use the API because the client has to handle all of the weird response types you’ve got. Inevitably you end up with more and more getting added, and any client becomes crazily complicated. 99% of the time it’s not worth the bandwidth saving - and if there’s lots of useless information that clients don’t want, it’s worth thinking about whether the API design is right in the first place.

* Stick to one mechanism for doing things. Pagination and sorting behaviour should be the same for all endpoints. The end user doesn’t care that you’re a hip microservices company where teams don’t talk to each other - if the APIs behave weirdly and inconsistently between themselves, it will be hard to use.


I very much agree with your first and third point, from experience. As for the second one — if consuming dynamic data structures is hard in typed languages, maybe they are not the right tool for that particular job?

What I have seen is endpoints trying to corral their responses into one-size-fits-all schemas in the situation you're describing, with predictable outcomes. Lots of overhead in most situations, tricky documentation, lots of optionals.

Under that premise, I have to say that at least for generic APIs with many differing clients, the idiosyncrasies of typed-language clients would not rank too highly on my list of design considerations — not when they are in the way of simpler, easier to understand responses.


As for the second point, that's what Accept header is for. And I personally never had much trouble in Go with deserializing all those "weird response types" but it may depend on one's coding style.

> I have to say that at least for generic APIs with many differing clients, the idiosyncrasies of typed-language clients would not rank too highly on my list of design considerations

Hey, would you like to consume an exchange format that has meaningful distinction between strings and atoms? Those come from the dynamically-typed languages area!


So, for 2, what I mean is stuff like:

    GET /api/object/<id>?withAdditionalMetadata=1&expandChildren=1&.....
So then the OpenAPI schema has to be something like:

    schema:
      oneOf:
        - $ref: '#/components/schemas/Object'
        - $ref: '#/components/schemas/ObjectWithMetadata'
        - $ref: '#/components/schemas/ObjectWithChildren'
        - $ref: '#/components/schemas/ObjectWithChildrenAndMetadata'
So inevitably the client ends up being quite complex to handle this.
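A rough sketch of the client-side branching this implies, assuming the two flags add "metadata" and "children" fields to the response (those field names are guesses). Each further optional flag doubles the number of shapes a client has to recognize.

```python
def schema_variant(obj: dict) -> str:
    """Guess which oneOf branch a response matches from the keys it carries."""
    has_meta = "metadata" in obj      # assumed field for withAdditionalMetadata=1
    has_children = "children" in obj  # assumed field for expandChildren=1
    if has_meta and has_children:
        return "ObjectWithChildrenAndMetadata"
    if has_children:
        return "ObjectWithChildren"
    if has_meta:
        return "ObjectWithMetadata"
    return "Object"
```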


Some good points - particularly about not returning arrays (I've made that mistake!)

But I feel 410 instead of 404 is pretty controversial:

> There are many layers of software that can return 404 to a request

Anything in your stack can return any HTTP error code - I don't see why 404 is special.

> When calling (say) GET /things/{thing_id} for a thing that doesn't exist, the response should indicate that 1) the server understood your request, and 2) the thing wasn't found. Unfortunately, a 404 response does not guarantee #1.

The server is free to return other codes for other classes of problems. The server could return 400 for a bad request, and leave 404 for "thing wasn't found", indicating it understood the request but it wasn't found.

Also surprised not to see RFC 7807 / RFC 9457 (Problem Details) mentioned in the "structured error format" section.
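For reference, a sketch of a Problem Details style 404 and one way a client might tell it apart from a 404 produced by some other layer. The members (type, title, status, detail) and the application/problem+json media type come from the RFC; the helper names and the detail text are made up.

```python
import json

PROBLEM_JSON = "application/problem+json"


def not_found(resource: str):
    """Build a 404 response whose body follows the Problem Details layout."""
    body = {
        "type": "about:blank",  # the RFC's default when no specific type applies
        "title": "Not Found",
        "status": 404,
        "detail": f"No resource at {resource}",
    }
    return 404, {"Content-Type": PROBLEM_JSON}, json.dumps(body)


def is_application_404(status: int, headers: dict) -> bool:
    """True only for a 404 that carries the application's own error format."""
    return status == 404 and headers.get("Content-Type", "").startswith(PROBLEM_JSON)
```

This only helps clients that actually inspect the content type, which is the objection the article raises about lazy client code.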


> Anything in your stack can return any HTTP error code - I don't see why 404 is special.

I'm surprised you don't - in my experience 404's are by far the most common response to get when you haven't wired things up correctly. Sure anything in the stack _can_ return any code and response they want, but you're still much more unlikely to come across a 410 rather than 404. If that unlikeliness saves you support calls down the line then that's pretty good.


With REST 404s due to missing resources (non-existing ID), you should generally get corresponding error information in the body (as also described in TFA), and clients should log/display that information. That should make enough of a difference. There’s a lot of “should” here, of course, but instead of teaching developers to not use 404, it would be better to teach them to create and handle error responses appropriately.


I really feel like you're ignoring the whole point, which is that this is about helping you out when things go wrong. Yes, it'd be great if we could teach people to never write bugs and that they should always check the response body, but that's just not what happens in reality. Being able to immediately differentiate the problem based on the response code is very useful when you get a bug submitted from someone saying they get a 404 and it doesn't work.


There are two reasons behind 404 being a bad response code to use for empty results.

- Did I get a 404 on this endpoint because the endpoint doesn't exist? Or did I get that because the object I was looking for doesn't exist? Great, I need to dig into the response body to find out, indicate that I can either get a 200 or a 404 with this endpoint, and deal with the odd case where the API returns HTML regardless of the MIME type in the Accept header if the endpoint itself is not there because "fuck you, couldn't be bothered".

- Some HTTP libraries will consider anything that's not a 1/2/3xx an error. That can be annoying to deal with.


If anything that's not 1/2/3xx is a problem then 410 won't be a solution. And I doubt that an HTTP library that has trouble handling 4xx can handle 1xx correctly.


That rule is a hot take.

> You could use 404 but return a custom error body and demand that clients check for a correct error body. This is asking for trouble from lazy client programmers. It might or might not be "your fault" when clients see eventually inconsistent data, but the support calls they send you will be real.

Make sure that your 404 responses were always documented, then tell them to RTFM.


The problem is that the support call comes in as "your DELETE call isn't actually deleting". Sure it's not your fault, but it imposes a cost on you to investigate. And of course the first time you go directly to RTFM without checking will be the time it actually is your bug.

404 is special because it's so incredibly common. Why take the risk? There are other perfectly good error codes that - in practice - don't have this issue.


I prefer to consider 404 as protocol error and missing thing as business error. That way 404 signals wrong endpoint and 200 + error message + empty result set signals wrong id.

Or even more simple: Anything other than 200 means check infrastructure docs and if you don't like the 200 check the business requirements.


As a client I generally dislike APIs that use 200 for error conditions. The problem is that API implementors often change the structure of the response.

    GET /thing/THG123
    # on success:
    {"id":"THG123", "name":"thingie"}
    # on failure:
    {"error":"no such thing"}
    
Working in typed languages, this requires parsing the response, determining success or failure, then reparsing the response into the appropriate type. Annoying.

Of course it's not always like that, some APIs will put both the error and data in a wrapper object and one field or the other will always be null:

    {
        "error": null,
        "result": {"id":"THG123", "name":"thingie"}
    }
This is less annoying but it's still tedious. We could eliminate the wrapper if we only had an out-of-band signal to indicate whether the client should expect a success response or an error response... like maybe an HTTP status code? I mean, it's right there, why not use it?
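A sketch of the typed-client view of this, assuming the two body shapes from the example above: the status code picks the target type before the body is mapped, so no wrapper and no second parse are needed.

```python
import json
from dataclasses import dataclass


@dataclass
class Thing:
    id: str
    name: str


@dataclass
class ApiError:
    error: str


def parse(status: int, body: str):
    """Use the status code as the out-of-band signal for which type to build."""
    data = json.loads(body)
    if 200 <= status < 300:
        return Thing(**data)
    return ApiError(**data)
```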


Well, sure, without wrapper object it's annoying. But what's the difference whether you check response.data.error or response.status ?


> Some good points - particularly about not returning arrays.

I don't get that one; why is an object with an array property more evolution friendly than an array of objects?


Because you can add new properties for response-level global information on an object, but not on an array.
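A small illustration, assuming the collection lives under an "items" key (a name chosen here for the example): a client written against the first version keeps working when a response-level field such as hasMore appears later.

```python
import json


def read_items(body: str) -> list:
    """Client code written against v1; it ignores any other top-level fields."""
    return json.loads(body)["items"]


v1 = '{"items": [{"id": 1}]}'
v2 = '{"items": [{"id": 1}], "hasMore": true}'  # field added later

print(read_items(v1) == read_items(v2))
```

With a bare top-level array there is no place to put hasMore without changing the response type.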


Obviously if it was global, but if not, you'd include it in the objects in the array, no? This is not a question of schema evolution per se.


Yes. The TFA argument is about top-level information, for example when paging through a collection. Personally I think HTTP headers could fit the purpose for such meta information.


I highly recommend anyone to read Google's [AIP](https://google.aip.dev/). There's even a grpc schema linter for it. Put more focus on the resource data design than nitpicking on transport details. I would consider the best lessons to be:

- Optional but supported user defined identifiers, it's so frustrating to work with API that passes you back an identifier.

- String identifier (names) for resources, with some kind of type namespacing, i.e. the prefix in the author's document

- Consistent set of fields (create_time, update_time, annotations, ...)

- Avoid dynamic map (this is a JSON self-inflicted wound)


Second this. Reading this while working at Google made me better at design. Some that stand out to me are:

Resource Oriented Design: https://google.aip.dev/121

Declarative Friendly APIs: https://google.aip.dev/128

Declarative friendly makes writing scripts, pipelines so much better because of idempotency. It also pairs very naturally with resource Oriented design.

Long Running Operations: https://google.aip.dev/151

LROs are applicable to any request that runs longer than a second or a couple of seconds. Having a unified interface can be very powerful for implementing offline task workers and pipelines.

Filtering: https://google.aip.dev/160

This one is probably controversial as it makes implementing basic filtering quite a bit harder. I haven't quite seen the issues it's supposed to solve play out in practice but it's interesting nonetheless.


I feel like at this point I have heard convincing arguments for and against basically every point in this article, every article like it, and every comment on them.

Hot take: it doesn't matter. If the user cared enough about your decisions to file a bug report or a complaint, you're doing something right. Consider that a success.

Stop trying to shoehorn a creative outlet into your day job. Pick someone else's terrible design and stick to it.

Frankly I am shocked to see an article on REST design on HN in 2023. We sort of figured this one out.


Regarding rule #4 (DON'T return arrays as top level responses), I feel that meta information returned about the collection really fits the responsibility of HTTP headers. This is similar to existing response headers like Content-Length and Content-Range. REST clients already work with “out-of-band” information like HTTP statuses and If-Modified-Since. Do we always have to add yet another layer to nest meta information in?
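
To make the two options concrete, here is a minimal sketch. Plain objects stand in for HTTP responses, and the `X-Has-More` header name and the `data` property are assumptions made up for illustration; only `hasMore` echoes the article.

```javascript
// Hypothetical helpers: plain objects stand in for real HTTP responses.

// Option 1: metadata travels in a (made-up) header, the body stays a bare array.
function respondWithHeaders(items, hasMore) {
  return {
    status: 200,
    headers: { "Content-Type": "application/json", "X-Has-More": String(hasMore) },
    body: JSON.stringify(items),
  };
}

// Option 2: metadata is nested in an envelope object next to the items.
function respondWithEnvelope(items, hasMore) {
  return {
    status: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ data: items, hasMore }),
  };
}

const viaHeaders = respondWithHeaders([{ id: "1" }], true);
const viaEnvelope = respondWithEnvelope([{ id: "1" }], true);
```

With option 1 the client has to read both the headers and the body; with option 2 everything survives being saved or forwarded as a single JSON document.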


I struggle with this concept with RabbitMQ as well. The AMQP 0-9-1 protocol used by RabbitMQ has a headers table at the protocol level for user-defined key-value pairs that can be associated with the message payload. The same question applies here, what should go in this protocol header table vs on the message.

One concern I have about using these user-defined headers is that in my designs I'll typically remove the payload from the AMQP envelope and propagate just the payload to the business logic. What should I do if the headers need to be included in the business logic? It seems risky to use the headers at the protocol level.

Any thoughts?


Assuming that these key-value pairs are not necessarily related to the request semantics (e.g. like `hasMore` in TFA), and also are an unbounded set of potential user-defined key names, then I think it’s appropriate to nest them in the response body. It also sounds like the response is a single message in your case, so the array question doesn’t come up.

You could also place these user-defined headers in a special property of the message, e.g.:

    { // message object
        "_headerTable": { … },
        // actual message properties here
    }
It is a common convention to use underscore-prefixed JSON properties for such meta data.


While you may be right about headers, I think the point the article makes about easily adding more fields to the result without breaking backwards compatibility is pretty compelling.


It’s a good rule for non-collection resources and for the elements of collection resources, but I was specifically thinking about collections, where all top-level information (apart from the elements of the collection themselves) is necessarily meta information about the returned data, and not part of the returned data proper.


> REST APIs

"You keep using that word. I do not think it means what you think it means".

https://ics.uci.edu/~fielding/pubs/dissertation/top.htm


You might point out what "true REST" means based on Fielding's paper, but at this point a "REST API" has a (somewhat) clear meaning derived from 2 decades of widespread usage to mean exactly what the author used it for.

I love etymology in general, but don't fall into the trap of thinking that the origin of a word or an expression is the only correct meaning.


The "v3/application/shops/{shop_id}/listings/{listing_id}/properties" structure is typically important for database partitioning when using e.g. DynamoDB. You have to include both the partition key and the sort key in the URL to access the database item. Otherwise the database doesn't scale out as intended.

I also disagree with the advice to always use Arrays instead of Map objects. It is very difficult to partially update Arrays in an idempotent way. When you use a Map object, you can update individual items in the object by their keys. That is why I think you should avoid Arrays in API data structures as much as possible.


Why would you update your input. It's just input, you can locally have maps. Say for example when you read SQL results, it's a list of results, not a map of results. Then you make a map of whatever key you need locally. Or even several of them.

    let myCustomIndex = listOfResults.reduce((i, r) => {
        i[r.pk] = r;
        return i;
    }, {});
In general we call serialization "serialization" because we put things in serial order and send that sequence over the wire. You can't send maps over the wire. A map is a structure in memory optimized for direct access and modification. While JSON maps (objects) are just sequences of keys and vals, encoded as text, not an actual hashmap or a b-tree, as we surely understand.

So you may as well save the redundancy and send a list of results, then format and index it however you please locally.

The fact JSON has maps (objects) at all is a lot less useful than people realize. It's mostly useful for the purpose of denoting "that's a key" and "that's a value". But actually all the work is done after the JSON is being read. You can just as easily send a "map" like this:

    ["key1", "val1", "key2", "val2", ...]
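
As a rough sketch of the receiving side, a client could rebuild a lookup object from such a flat list like this (`pairsToObject` is a hypothetical helper name, not something from the article):

```javascript
// Rebuild a lookup object from a flat [key, value, key, value, ...] array.
function pairsToObject(flat) {
  if (flat.length % 2 !== 0) {
    throw new Error("expected an even number of elements");
  }
  const result = {};
  for (let i = 0; i < flat.length; i += 2) {
    result[flat[i]] = flat[i + 1];
  }
  return result;
}

const decoded = pairsToObject(["key1", "val1", "key2", "val2"]);
// decoded is now { key1: "val1", key2: "val2" }
```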


My point about arrays vs maps is based on the idea that you store the data in e.g. DynamoDB as the same structure as it is represented in the API. If you store an array in an attribute of the database item, you cannot update its contents in an idempotent way. Inserting or appending to the array multiple times (using the relevant DynamoDB functions) causes it to grow more and more each time. Whereas when you use a map object, updating a specific key in the object is an idempotent operation and has no effect when repeated.

I realise this is pretty DynamoDB specific. But there are also other points to be made against arrays, such as being forced to maintain their order in any kind of database, which can be quite bad for performance. When using a key value map object, there is no guarantee of the order of the items and users of the API will reflect this in how they use the API, relying on the object keys instead of array indices or array order.
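
The idempotency difference can be sketched without any database. The following uses in-memory objects only; `appendToArray`, `setInMap`, and the `tags` attribute are hypothetical names, and nothing here calls DynamoDB.

```javascript
// In-memory stand-ins for a stored item; no real database is involved.
function appendToArray(item, entry) {
  return { ...item, tags: [...item.tags, entry] };
}

function setInMap(item, key, value) {
  return { ...item, tags: { ...item.tags, [key]: value } };
}

// Simulate a client sending the same update twice (e.g. a retry).
let arrayItem = { tags: [] };
arrayItem = appendToArray(arrayItem, { id: "a", label: "red" });
arrayItem = appendToArray(arrayItem, { id: "a", label: "red" });

let mapItem = { tags: {} };
mapItem = setInMap(mapItem, "a", { label: "red" });
mapItem = setInMap(mapItem, "a", { label: "red" });

console.log(arrayItem.tags.length);            // 2: the retry duplicated the entry
console.log(Object.keys(mapItem.tags).length); // 1: the retry had no further effect
```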


OK but transfer format and working representation shouldn't be the same. That is a bad goal to have. A working representation is sparse, indexed, you can jump to places and change parts. While input (and output) are streams of dense data.

Adding redundancy to input/output so you don't have to make local decisions for your working representation may seem like a simplification, but you're basically keeping yourself from doing what you need to do in order to work effectively, and burdening input/output with concerns that don't matter on the wire.

We keep seeing this idea come back again and again where "you don't need services" or "you don't need controllers" or "you don't need mapping", so just, you know, grab the database and hose it out over HTTP and into clients as-is. But this always is one of those "immediate gratification" choices that ends up biting you in the ass not long after. It looks great in slides and demos though.


To me it's a good goal. I want to keep things simple and minimize my work. It works very well with a document database like DynamoDB. Even if often there is a small translation going on, removing a few attributes or adding a few to achieve backward compatibility, but mostly passing them 1:1.


We have the same goal. I'm just saying that what's simple for local representation complexifies the transfer format, and what's simple for transfer bogs down the local representation.

Essentially we're discussing the cohesiveness vs decoupling of a transfer format and local representation. But purely on the objective side I have one strong argument: the transfer format of HTTP-based APIs is designed to be client neutral. It won't be just JS in the browser. It may be Swift on the Phone or .NET on a laptop.

And so tightly coupling the transfer format to a specific client can be a good choice only in the narrow scenario where you control both the API and the client, and there's no other client. You can always extract more from such a scenario. For example you may not use JSON at all; you can use a custom binary format that directly dumps whatever you want in your app.

But in the general scenario where the API is an API, and the client is a client, what I said stands.


Regarding your point about the DB partitioning, isn't it the backend's job to figure out the shop ID based on the listing ID? I don't really see that as a valid argument
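
One way to picture "the backend figures it out" is a small server-side index from listing ID to shop ID. This is a sketch under assumptions: the index, the shard layout, and every name below are hypothetical and say nothing about how Etsy or DynamoDB actually work.

```javascript
// Hypothetical index: listing ID -> shop ID (the partition key).
const listingToShop = new Map([["lst_1", "shop_a"]]);

// Hypothetical shards, one Map of listings per shop.
const shards = {
  shop_a: new Map([["lst_1", { id: "lst_1", title: "Mug" }]]),
};

// GET /listings/{listing_id}: the server resolves the partition key itself,
// so the client never has to supply (or be trusted with) the shop ID.
function getListing(listingId) {
  const shopId = listingToShop.get(listingId);
  if (shopId === undefined) return null;
  return shards[shopId].get(listingId) ?? null;
}

const found = getListing("lst_1");
const missing = getListing("lst_unknown");
```

The cost is one extra lookup (or a cache hit) per request, in exchange for a URL that does not leak the storage layout.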


Rule #11, and the general text there on idempotence was very interesting and I definitely learnt nicer ways of handling this, thank you.


It all turns into RPC anyway, I just can’t care that much about REST anymore


I'm curious about your thoughts on omitting properties vs. setting them to `null` for a public facing API. I'm especially curious about your feelings on mixing and matching these strategies. Should you always stick to one strategy or is it ok to sometimes omit and sometimes use null to express the same idea that "the data is missing"?

An example of a null property:

    {
        "firstName": "John",
        "middleInitial": null,
        "lastName": "Doe"
    }
Compared to the omission of the property:

    {
        "firstName": "John",
        "lastName": "Doe"
    }
Essentially the idea discussed in this Stack Exchange post (ignoring the use of empty strings as an option): https://softwareengineering.stackexchange.com/questions/3437...
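
For what it's worth, the two shapes are distinguishable on the client, which is why mixing them can surprise people. A small sketch using the example objects above (`describeField` is a hypothetical helper):

```javascript
const withNull = JSON.parse('{"firstName":"John","middleInitial":null,"lastName":"Doe"}');
const omitted = JSON.parse('{"firstName":"John","lastName":"Doe"}');

// Distinguish "present but null" from "absent".
function describeField(obj, field) {
  if (!(field in obj)) return "absent";
  return obj[field] === null ? "null" : "present";
}

console.log(describeField(withNull, "middleInitial")); // "null"
console.log(describeField(omitted, "middleInitial"));  // "absent"

// A client that wants to treat both cases the same can normalise with ??.
const middle = omitted.middleInitial ?? null;

// On the way out, JSON.stringify drops undefined properties but keeps nulls.
console.log(JSON.stringify({ a: undefined, b: null })); // {"b":null}
```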


Experienced point 6 with the GitHub API. “Repository” is a core Git(Hub) primitive, and across the entire API surface (they have an OpenAPI spec, 40 MB in size total), last I counted it was 48 different versions of repository. For example, what’s considered a repo owned by a user might be different from that of an organisation. There is no sane recourse but automatic code generation, which is a ton of effort in itself (tooling isn’t great).


Since you invited comments:

Rules #1, #2, #3: I don't feel these are rules as much as aesthetic opinions. The only thing that matters is for URLs to be unique, and no client should rely on parsing slashes on URLs to derive how data is structured.

Rules #4, #5: Very specific to how data is serialised, I wouldn't take it as a rule, although the future-proofing argument is good.

Rules #6, #7: I feel these are sensible rules in general, not even specific to REST.

Rule #8: 404 or 200 is context specific, but what I would say is that if you can represent the absence in the resource do it, otherwise use 404.

Eg: if the student 99 doesn't exist, GET /students/99 returns 404; but if you want to represent there are no students, GET /students/ can return 200 OK with a [] body – since that _is_ the representation of such information. Many APIs fail here, returning 404 on a resource like GET /students/ that is expected to always exist.

And definitely don't use 410 Gone in a way that's not in the RFC.

Rule #9: I believe it's a good idea to minimize the variation in resource representation in general, this benefits both the client and the server by allowing the reuse of cached data.

Rule #11: Agree. I would go further and even ignore mechanisms like 24 hours temporary idempotency keys - just straight allow clients to PUT a resource with whatever ID, following advice from Rule #6, and be done with it.

All in all, this shows "REST" really means different things to different people at this point, we probably need better definitions for the good practices at the different levels (data structures, HTTP compliance, serialisation).
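
To illustrate the Rule #11 point about letting clients PUT a resource under an ID they choose, here is a minimal in-memory sketch. The `putOrder` handler, the `/orders` resource, and the ID format are all hypothetical.

```javascript
// Hypothetical in-memory store keyed by client-supplied string IDs.
const store = new Map();

// PUT /orders/{id}: create-or-replace. Repeating the call leaves the same state,
// so no separate idempotency key is needed.
function putOrder(id, order) {
  const existed = store.has(id);
  store.set(id, { ...order, id });
  return { status: existed ? 200 : 201, body: store.get(id) };
}

const first = putOrder("ord_client_123", { total: 42 });
const retry = putOrder("ord_client_123", { total: 42 });
// first.status is 201, retry.status is 200, and the store still holds one order.
```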


Doing REST microservices is incredibly slow because of the amount of work it is to agree on what a "clean" and "consistent" api looks like for each service. It's just such an endless well of trying to establish best practices without refactoring constantly.

It's worth it for your public API, but it's such a huge time sink for internal APIs.


I would be interested in what you are using internally that is allowing you to move quicker.


We need to forget this word REST, because all advice about it is superlame, regardless of the original paper.


I don't agree with all the choices, but it's a good post in that it presents a number of things to consider when making an API. It can be used by a team to make/check a tech doc and choose differently as the team sees fit.


I work on enterprise observability, professionally. I think conflating service names (such as /product) and service parameters (such as /product/{product_id}) in the same syntax is a big mistake.

The reason is you want tools to be able to tell a difference and discern how to categorize by service (for reporting). Also, there are privacy implications for monitoring tools, since query parameters might contain sensitive data.

There already are query parameters in the URL, that is better. I wish people never went with the idea of putting parameters in the path.


Regarding #2, I'm guessing the listings are sharded by shop_id, so without it you'll need to query all database shards.

Is there a better place to include the sharding key in a REST request?


This is a great list! One edge case, though, for #6:

If you need IDs to be ordered, alphanumeric ordering can be a problem: given "1", "2", ... "11", the string "11" sorts before "2". It is simple to fix, either by prefixing zeros or by casting to a number. It was a lingering bug at my workplace, oddly hard not to fix, but to communicate across teams.
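
A quick sketch of the problem and both fixes, assuming the IDs are purely numeric strings (the pad width of 4 is arbitrary):

```javascript
const ids = ["1", "2", "11"];

// Default sort compares strings, so "11" lands before "2".
const lexical = [...ids].sort();
console.log(lexical); // ["1", "11", "2"]

// Fix 1: cast to numbers inside the comparator.
const numeric = [...ids].sort((a, b) => Number(a) - Number(b));
console.log(numeric); // ["1", "2", "11"]

// Fix 2: zero-pad to a fixed width so string order matches numeric order.
const padded = ids.map((id) => id.padStart(4, "0")).sort();
console.log(padded); // ["0001", "0002", "0011"]
```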


are we beating a dead horse here?

Haven't we talked about building REST APIs enough yet?


There’s a new generation that didn’t bikeshed about singular or plural nouns in the API endpoints. Let them be?


Hello there. Have you heard of our lord and savior, GraphQL?


unequivocally


APIs with inconsistent object schemas and lacking error messages are the absolute worst. I've been spending some time working with Keycloak lately. Adding properties that look simple to add in the GUI is an absolute nightmare via the API, made even worse by barely-existent documentation.


Regarding #9 BE consistent: this is the consequence of allowing chaos to take over. There are far fewer individuals trying to reduce complexity compared to those who are adding to it. And the effort required to resolve these inconsistencies is 10x that needed to create them.


I found the JSON API spec[1] recently. It is kind of a better standard for REST APIs. It is a bit rough to handle client side, but once you get the hang of it, it is a breeze to use.

[1] https://jsonapi.org/


Liked the idea of using wiki as blog


He actually writes about that. I happen to like it too.

https://github.com/stickfigure/blog/wiki/GitHub%27s-wiki-mak...


A missing rule is "DON'T use strings for timestamps". Which implies "Rule #6: DO use strings for all identifiers" is not good advice.


I disagree that that should be a rule. String timestamps are fine as long as you pick a common standard for the format. Preferable to Unix epoch-seconds or -millis or whatever, at least, if that's what you're suggesting instead. If you're serving over HTTP you clearly don't need the minuscule efficiency gain, and those are a pain (for a human) to read & write.


If a human is reading and certainly writing your JSON then something is wrong. Choosing a number isn't about efficiency; it's about correctness. A lot can go wrong when you choose a string for a timestamp.

The article references Stripe in a few places for examples of good designs. Guess what, they use numbers for timestamps. And not for performance reasons.


> If a human is reading and certainly writing your JSON then something is wrong.

It just means someone’s working with it. Developing an integration, reading logs or a dump or a raw backup, troubleshooting something, etc. This happens plenty with any system that actually gets used, and not (necessarily) because something’s gone wrong. And timestamps have a way of making it into things like query strings, so it may not even be someone writing your JSON by hand. Numeric timestamps are better for naïve automatic sorting/ordering, string is better for reading and writing.
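
For reference, converting between the two representations is cheap either way. A small sketch, assuming epoch seconds and ISO 8601 strings in UTC (the value 1700000000 is just an arbitrary example):

```javascript
// Epoch seconds -> ISO 8601 string (always UTC, always the same width).
const epochSeconds = 1700000000;
const iso = new Date(epochSeconds * 1000).toISOString();
console.log(iso); // "2023-11-14T22:13:20.000Z"

// ISO 8601 string -> epoch seconds.
const roundTrip = Date.parse(iso) / 1000;
console.log(roundTrip); // 1700000000

// Strings in this one fixed format also sort chronologically as plain text.
// That stops being true once formats or UTC offsets are mixed.
const sorted = ["2023-11-14T22:13:20.000Z", "2023-01-02T00:00:00.000Z"].sort();
console.log(sorted[0]); // "2023-01-02T00:00:00.000Z"
```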


Rule #6: DO use strings for all identifiers

What about booleans, like { "success": false },

you opt to convert this to

{ "success": "false" }?


Rule #0: don't design REST APIs, use IDL/RPC with http transport.


How do I make a JSON/HTTP API that I won't be unhappy with?


Regarding #8 - just do not use http status codes for application errors. They are for routers, caches and proxies. Your application should pretty much only return 200 even on errors.

Edit: Bring on the downvotes. I will die on this hill.


> Regarding #8 - just do not use http status codes for application errors. They are for routers, caches and proxies. Your application should pretty much only return 200 even on errors.

What is an application error? If a user tries to query a resource they are not authorized to access, then returning a 401 is appropriate; if the resource doesn't exist then 404 is also appropriate, in theory (maybe not in practice for security reasons but whatever). Nothing wrong with that.

HTTP codes are not made only for routers, caches and proxies, HTTP was made for user agents such as browsers, HTTP is one of the foundations of REST.

and I didn't downvote you, I'm just asking a question.


To be fair I was being a tiny bit inflammatory. I think it's appropriate to use HTTP codes sparingly. Personally I'll use 401 and 500 but everything else is 200. And if the API is meant for public consumption by 3rd party devs then meeting expectations is more important than my personal philosophy.

But basically the overlap between the classic HTTP status codes and your API's functionality is IMO coincidence. Unless you're building a BLOB store or HTTP middleware you probably do not have enough overlap for it to be truly appropriate for your domain. HTTP is the envelope. It doesn't need to mix with your custom JSON API.


I don't know why you're being downvoted. I think you're completely wrong, but it's not like you're being rude about it.

Status codes are not necessarily useful for the developer directly, as they're a second channel for the same information, but they are useful for middleware of all kinds (your argument applies equally to browsers showing errors to users, and the same counterarguments apply to programmers). For example, the HTTP library I'm using has an error_for_status function which conveniently raises a runtime error, without me having to dig into the response to do that manually. Also, if you invent more kinds of errors later, or even if you just haven't published a master table of errors where I can see it, the status code will still let me extract useful semantic information out of an error kind my code has not been written explicitly to handle.
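
A sketch of what such a helper buys you: generic code can act on the status class alone, even for error kinds it has never seen. `errorForStatus` below is a hypothetical function written for illustration, not any particular library's API, and treating 429 and 5xx as retryable is an assumption.

```javascript
// Hypothetical helper: throw for any 4xx/5xx, otherwise pass the response through.
function errorForStatus(response) {
  if (response.status >= 400) {
    const kind = response.status >= 500 ? "server" : "client";
    const err = new Error(`${kind} error: HTTP ${response.status}`);
    err.status = response.status;
    // Assumption: server errors and 429 are worth retrying, other 4xx are not.
    err.retryable = response.status >= 500 || response.status === 429;
    throw err;
  }
  return response;
}

// The caller needs no table of application-specific error codes.
function describe(response) {
  try {
    errorForStatus(response);
    return "ok";
  } catch (err) {
    return err.retryable ? "retry later" : "give up";
  }
}

console.log(describe({ status: 200 })); // "ok"
console.log(describe({ status: 418 })); // "give up"
console.log(describe({ status: 503 })); // "retry later"
```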


You need to give concrete reasons or examples of cases where an HTTP status code that suits an application error would cause failures in routers, caches, or proxies.

Here is a reverse one: say you have a URL that has a slug, maybe one containing the week of the year, like /weeks/38.

You delete that week from the app, and from then on /weeks/38 returns 200 with an error body, so all default caches will cache that response. Later you decide you actually want that week, so you recreate it. If you haven't configured the cache in front of the API, it will keep returning the previous response: the 200 plus whatever error payload you had.

With a 404, by contrast, the majority of cache services I put in front of any API will by default bypass the cache for the 404 and cache the response for a 200.


I see SOAP has entered the chat.


*graphql


The author makes a good point about not throwing 404s.. a logical extension of this is that it should apply to all the other error codes. Does REST encourage the use of http error codes?


So many disagree, so why not graphql


the advice... sounds good.

the stickfigure style... Whaddaya call that, tech beatnik? :P Just messin with ya.


Telling someone you should not return 404 for not found is a great way to not get hired at most tech companies.


Half of the points are complete bs. URLs don’t matter in REST. What matters are link relations.


I would go even further and say that REST is BS. Most people understand REST as a format for URLs and particular semantics for HTTP methods. Yet it can be perfectly OK to design an API like person?id=123. This is the way that PostgREST does it, which works, and makes certain things like entities with composite primary keys way easier.

Others, like OP, think REST isn't going far enough, in that HATEOAS is what "really" makes something RESTful. This doesn't really work in practice, as if changing the schema of some JSON response could magically change the behaviour of an application. If you want to lift all the logic to the server, I guess you can use htmx, but then you are not building an API anymore but a remote rendering engine.


You are still building an API, it's just a hypermedia API:

https://htmx.org/essays/hypermedia-apis-vs-data-apis/

REST was coined to describe the web. It has been misapplied to JSON APIs over HTTP, and the way we got here is a funny story:

https://htmx.org/essays/how-did-rest-come-to-mean-the-opposi...

other related essays:

https://htmx.org/essays/hypermedia-clients/

https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...

https://intercoolerjs.org/2016/01/18/rescuing-rest.html


Care to elaborate briefly on why they do?


"REST" is supposed to be an architectural style reflecting the Web as used by humans. Humans don't normally manually construct URLs. They navigate based on links from an entry point. (See also: https://hypermedia.systems/hypermedia-components/#_self_desc... )


The vast majority of people using REST do not follow its original definition, so the original definition doesn’t matter anymore.

It’s like human languages: REST is whatever we make of it, regardless of what academics say.


Sure. But then what words do you use to describe actual REST?

This keeps happening and all it does is cause confusion.

It's one thing to make up new words for new concepts so you can more easily refer to them in a conversation.

It's another thing entirely to give a word a vague but similar meaning without any clear way to differentiate between the original specific and new extremely nebulous concept.


HATEOAS adds that to REST.


H in HATEOAS stands for hypermedia but REST APIs must be hypertext driven.


There are a bunch of bad rules here...

Rule 1 - "DO use plural nouns for collections" - is an entirely arbitrary opinion.

Rule 2 - "DON'T add unnecessary path segments" - I agree with the rule, but the examples are bad because, e.g., "/listings/{listing_id} " and "/shop/{shop_id}/listings/{listing_id}" mean two different things (or at least they should). Now, "/shop/{shop_id}/listings/{listing_id}" is a complex path, so if your API doesn't need it, then I agree, don't include it. But if it does, then it would be bad to not include it.

Rule 3 - "DON'T add .json or other extensions to the url" I mostly agree with the rule, but on the grounds of keeping things simple. Here, keeping to the standard (which means using Accept). But things like supporting a ".json" suffix are nice for cases where you want to give people (not programs) access to the different representations (should be in addition to Accept).

This justification for rule 3: "URLs are resource identifiers..." is simply not true, at least for any reasonably useful definition of identifier. A URL points at a resource, that's it.

Rule 4 is good. ("Rule #4: DON'T return arrays as top level responses") You want to keep the door open to adding metadata in the response body that will be very easy for clients to accept in a backwards-compatible manner.

Rule 5 "DON'T return map structures" doesn't really make sense. Now, you shouldn't do it just to provide a lookup index -- the id should really be the inherent id of the data -- but it's a logically valid way to structure data and your API should strive to match the logical structure of the data. Also, the arguments here are not great.. e.g., "Converting an array of objects to a map is a one-liner in most languages"... that's true, but so is the converse. The openapi example doesn't make sense either. openapi v4 could have simply added a "name" property to the object in the v3 structure, right next to the "post" property -- just like the hypothetical list-based API. I would assume openapi has other reasons for the restructure, because the map-based API doesn't force it.

Well, I'll stop there. It's not all bad, but just don't take these rules to the bank.

Wait one more: Rule 8 "DON'T use 404 to indicate not found"

Come on now, why even write something like that?

The rule is more like, don't use 404 poorly. DELETE should be idempotent (that's a good rule), which means an attempt to DELETE something that could exist but doesn't happen to exist right now, isn't an error, and should return a 2xx code. A 404 response for an attempt to delete on a route that doesn't exist makes sense (well, I guess unless your API is so dynamic that routes can be created and deleted on the fly, in which case even there you'd return a success code).
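
A minimal sketch of that behaviour, assuming an in-memory store and a fixed set of routes (`handleDelete` and the `students` collection are hypothetical names):

```javascript
// Hypothetical in-memory resources and a fixed set of known collections.
const resources = new Map([["42", { id: "42" }]]);
const knownRoutes = new Set(["students"]);

// DELETE /{collection}/{id}
function handleDelete(collection, id) {
  // An unknown route is a genuine 404.
  if (!knownRoutes.has(collection)) return { status: 404 };
  // Deleting something already absent is a no-op, so a retry still succeeds.
  resources.delete(id);
  return { status: 204 };
}

const firstDelete = handleDelete("students", "42");
const secondDelete = handleDelete("students", "42");
const badRoute = handleDelete("teachers", "1");
// firstDelete and secondDelete both have status 204; badRoute has status 404.
```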



