PostgREST (postgrest.org)
771 points by deepersprout on Nov 3, 2019 | 237 comments



Related graphql implementations with similar concepts:

- https://www.graphile.org/postgraphile/

- https://hasura.io/

Love the idea of having APIs flow out of a single set of schema definitions. The Rails style of speccing a model, migrations, and controller/serializer or graphql types feels overly verbose and repetitive.

To me, the biggest thing these groups could do to speed adoption is flesh out the feature-development and testing story. For instance, the postgraphile examples have development scripts that constantly wipe the DB, and there are no tests. Compared to Rails, it's hard to imagine how you'd iterate on a product.

Are there other reasons this hasn't seen more widespread adoption? Is there some inherent architectural flaw or just not enough incremental benefit?


We've been using it for about a year in production across four projects and it's been a major boon for productivity because it allows our frontend developers to build simple backend features on their own. I think there are a lot of projects where it would be a great fit, and I'm sure many companies could save a lot of time by adopting it as part of their toolkit.

Also, it doesn't have to be all or nothing. For example, an API endpoint might need multiple mutations inside a single transaction; we implement that in a small custom backend and route between it and PostgREST using nginx.

Other things we use:

- dbmate for migrations (a sketch of a migration file is below): https://github.com/amacneil/dbmate

- postgrest-client, a library I developed for constructing type-safe queries to Postgrest in Elm: https://github.com/alex-tan/postgrest-client
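A dbmate migration is just a plain SQL file with up and down sections; a minimal sketch, with a made-up table, looks like this:

  -- migrate:up
  create table widgets (
    id   bigserial primary key,
    name text not null
  );

  -- migrate:down
  drop table widgets;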


You are addressing one of the issues on my mind: Migration.

In many situations, some kind of formal migration management is obligatory and my guess is that with PostgREST you will often want to test your migrations on a current database snapshot before deploying it in production...


I think the approach is fine for prototyping, but the architectural flaw that can make it a bad choice for production systems is that it ties your APIs to your data model. When you need to evolve your API or extend your data model, you get to choose between dealing with potentially massive data migrations and downtime, or adding a third layer of glue in the form of database views. You would probably need to adopt a rule of no direct-to-table access, keep a separate schema per API version containing views, and deal with the pain of rewriting the views for all the supported API versions whenever you change your data model. Or scale the database horizontally, with views on foreign data wrappers that pull from the real data sources. Which is why I think this sort of solution is marketed as a way for a Database Administrator (singular) to expose an API, rather than as a scalable approach to application development.


> You probably would need to adopt a rule of no direct-to-table access

That's one of the oldest RDBMS best practices: no direct-to-table access, with every application (or class of business user with direct access to the DB) going through a tailored selection of views, so that apps/users are largely isolated from DB changes.
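A minimal sketch of that pattern (all names here are invented): the application role only ever sees views in its own schema, never the base tables.

  create schema api_v1;

  create view api_v1.customers as
    select id, name, email
    from internal.customers;   -- base table stays private

  grant usage on schema api_v1 to app_role;
  grant select on api_v1.customers to app_role;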

> and a separate schema per API version containing views

You'd probably only need a separate schema for semver-major API versions; minor versions would necessarily be supersets, which could be accommodated by backwards-compatible schema extension.

> and dealing with the pain of rewriting the views for all the supported API versions when you need to change your data model.

With a well-designed normalized relational model, most changes to the base model are adding attributes or tables, which will have zero impact on view definitions to maintain an existing API. The next most common is factoring an existing attribute out to a different table because what was conceived of as a 1:1 relationship becomes 1:N, which requires adding a join to view defs where that attribute is involved.
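As a concrete (hypothetical) example: if a customer's single phone number is factored out into a separate phones table, the v1 view keeps its old shape and just gains a join.

  -- before: customers(id, name, phone)
  -- after the split, the v1 view preserves the old API shape:
  create or replace view api_v1.customers as
    select c.id, c.name, p.number as phone
    from customers c
    left join phones p on p.customer_id = c.id and p.is_primary;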


It’s funny how “best practices” get mirrored basically everywhere up and down the stack. Everything at some point will have some way of decoupling data-access and data-structure, you just choose the level you want to do it at (toolkit, ORM, network api, database, schema/view, tablespace, filesystem...) and the number and granularity of layers, picking what you are most comfortable with.

It’s like, no matter how low-level you go, at some point someone had a problem with components being tightly coupled and introduced an indirection layer “for next time”.


How do you deal with write operations in this approach?


> How do you deal with write operations in this approach?

The same as read operations: with views, which in Postgres are automatically updatable if they are a simple thin layer over base tables, and which can otherwise be made updatable by means of appropriate triggers defining the meaning of INSERT/UPDATE/DELETE on them, just as the view definition defines the meaning of SELECT.
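A hedged sketch of the trigger-based variant (table and view names are made up; simple views need none of this because Postgres makes them updatable automatically):

  create function api_v1.customers_insert() returns trigger as $$
  begin
    insert into internal.customers (name, email)
    values (new.name, new.email)
    returning id into new.id;
    return new;
  end;
  $$ language plpgsql;

  create trigger customers_insert
    instead of insert on api_v1.customers
    for each row execute function api_v1.customers_insert();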


> You probably would need to adopt a rule of no direct-to-table access, and a separate schema per API version containing views, and dealing with the pain of rewriting the views for all the supported API versions when you need to change your data model.

My understanding is there is no direct-to-table access with PostgREST; you're only ever interacting with views. Keep your view api versions in separate schemas. If you change your tables, yes, you'll need to update your views, but I don't see how that's much different than having to modify ORM code to accommodate schema changes.


The examples in these docs start with querying tables, and only afterwards go "oh, it also works with views, if you need more complex filtering".


Right, and from my experience the automatic CRUD endpoints don't deal well with views unless you annotate the view with postgrest-specific comments, which seems rather icky to me.


you are confusing postgrest with postgraphile :)


Good point. Whoops.


You can’t just throw complexity at people from the first lines of documentation, especially when talking about new ideas.


I'd love to point out that in a strictly RESTful design, your client is supposed to be discovering the data model from queries, not embedding it in code. Ergo data and schema migrations should be no trouble at all.

Not sure I understand how any other approach to provisioning a data API is going to be able to avoid requiring a data model.


In practice it is extremely rare to find a pure REST model. It is hard to do properly.


I actually think the one schema to rule them all approach is a bad one. Different parts of the stack have different use cases and require different data representations. I think it's a fundamental flaw many ORMs fall into.

As for postgrest and GraphQL, queries and data transformations are proof that your DB schema shouldn't be forced up the stack and onto the clients. I would argue that GraphQL strives for the exact opposite of a single schema and tends toward a 'clients should choose their own API' approach.


You haven't really looked into postgrest; it does not force the database schema onto the client. You expose a schema with views that are tailored for the frontend and can be totally different from the underlying tables that hold the actual data.


I admit I haven't, but I totally agree with you. I'm attempting to say that postgrest and GraphQL are _not_ examples of APIs from 'a single schema definition', in the sense of the previous incarnation of that desire, where an ORM would tightly couple your imperative types to your normalized schema. These new technologies have a holistic/referential public API, but they allow client, server and DB to all have different schemas, and I just want to highlight the difference.


Yes, but the constraints SQL has w.r.t. how you represent certain datatypes can lead to problems. Sure, in postgres you can fix that with a variety of compound types, and fully new types implemented as pg extensions, which can scare people off.

Still, there are a ton of applications for which just the basic types are good enough and the server is just a thin wrapper.

In the end, the main requirement for using it efficiently is your skill with more than basic SQL/plpgSQL (stored procedures, trigger functions, views and knowledge about the effects of the isolation level used). All well documented in postgres.


> In the end, the main requirement for using it efficiently is your skill with more than basic SQL/plpgSQL

Another way I've heard this phrased is "It's much easier to hire JS devs than PL/pgSQL devs."


Yes, this is the true reason why these elegant solutions don't see so much adoption.

Worse is better. Throwing cheap JS devs at everything with heaps of imperative code is cheaper than elegant solutions.

Even long term this is true. Tech debt in code that people can understand and modify is cheaper than tech debt in an elegant DSL that requires learning and mental ability beyond if-then-else.


There's a postgres extension called plv8 that essentially allows one to write database functions in JavaScript, instead of plpgsql.
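A tiny hedged sketch of what that looks like (assuming the plv8 extension is installed; the function is invented):

  create extension if not exists plv8;

  create function to_slug(title text) returns text as $$
    // plain JavaScript running inside Postgres
    return title.toLowerCase().trim().replace(/[^a-z0-9]+/g, '-');
  $$ language plv8 immutable;

  select to_slug('Hello PostgREST');  -- hello-postgrest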


Oh, absolutely. Same with Python, Tcl, Perl... but that's not the part I'm talking about. The business logic being encoded as PL/* functions is just fine. But having to write complex triggers that wrap those functions, and making sure the state of the functions is as expected, and all of the other wiring is just not something most people are used to.

I've recently seen an uptick in "Postgres-native" stuff being built, though. So maybe there's hope for the future.


What do datatypes have to do with this? For both PostgREST and any other API layer the situation is the same. The data sits in the db/tables (so both parties are restricted to the types available in pg) and on the other end you need to output JSON. The fact that you can have some "smart" internal datatypes in other API layers has zero advantages; both systems have to basically take data from the db and present it as JSON, so the intermediate types are irrelevant.


you can write your stored procedures in JS or Python if you don't like plpgsql that much :)


> Are there other reasons this hasn't seen more widespread adoption? Is there some inherent architectural flaw or just not enough incremental benefit?

I'm a fan of the concept and would love to have a middle layer as thin as postgrest, but authentication/authorization has been my roadblock. I still use node as an app server because of passport. I had not heard of the postgrest starter kit before today and will check out https://docs.subzero.cloud/athentication-authorization-flow/

If they correct the typo before I get to reading it, it will be https://docs.subzero.cloud/authentication-authorization-flow...


you could use Auth0 [1] to authenticate your clients using all the identity providers (FB/Google, etc.) and then just have the API clients use that JWT when talking to postgrest

If however you do only want to authenticate the users yourself, the login function can be very simple [2]

1. http://postgrest.org/en/v6.0/auth.html#jwt-from-auth0

2. https://github.com/subzerocloud/postgrest-starter-kit/blob/m...
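Regarding [2]: a rough sketch of what such a login function can look like, assuming the pgcrypto and pgjwt extensions and a hypothetical basic_auth.users(email, pass, role) table (the starter kit's version differs in the details):

  create function login(email text, pass text) returns text as $$
  declare
    _role name;
  begin
    select u.role into _role
    from basic_auth.users u
    where u.email = login.email
      and u.pass = crypt(login.pass, u.pass);   -- pgcrypto

    if _role is null then
      raise invalid_password using message = 'invalid email or password';
    end if;

    -- pgjwt: sign a JSON payload with the server's secret
    return sign(
      json_build_object(
        'role', _role,
        'email', login.email,
        'exp', extract(epoch from now())::integer + 3600),
      current_setting('app.settings.jwt_secret'));
  end;
  $$ language plpgsql security definer;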


I am using postgraphile in library mode with Keycloak (an open source project, https://www.keycloak.org/) and passport.js instead of Auth0, and it works great.


I would expect authorization and validation to be a real pain with PostgREST (I don't know enough about it to tell definitively, though).

Even row-level access control, while not widespread anyway, is often not enough. Complex validation in SQL can be done, of course, but SQL isn't exactly meant for these things, IMHO.


This is an interesting read on validation in postgres: https://begriffs.com/posts/2017-10-21-sql-domain-integrity.h...


Stored procedures often offer a better and more efficient way to implement data validation than non database native alternatives.
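To make that concrete, a hedged sketch with invented names: a lot of validation fits into domains and check constraints before you even need a procedure, with triggers/procedures reserved for cross-row rules.

  -- declarative validation, enforced by the database for every writer
  create domain email_address as text
    check (value ~* '^[^@\s]+@[^@\s]+\.[^@\s]+$');

  create table subscribers (
    id    bigserial primary key,
    email email_address not null unique,
    age   integer check (age >= 18)
  );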


It may indeed be efficient.

Maybe I just don't like the idea of putting all my application logic into SQL statements.

What I like about Django's ORM, for example, is that you can declare a field to be an IP address, and this not only does the validation (which SQL can do easily, I am sure), but the declaration also carries over to HTML forms, the admin interface, REST API serialization and HTML templates.


why would you "expect" that if you don't know about it?


I know more than basic SQL. Maybe I am indeed not an accomplished enough master in SQL, but I can't see why I would ever prefer SQL over Python in terms of defining application logic.


This is a solved problem https://github.com/subzerocloud/postgrest-starter-kit

It's just that the tools you linked didn't prioritize the development workflow from the point of view of a backend developer (code in files, git, tests, migrations) but from the frontend developer's perspective (a UI to create tables).


The SubZero starter kit is great, and made this whole architecture seem practical to me. Thanks for your hard work here! (I'm currently using PostGraphile, but the starter kit is still an important tool for me.)


Have you tried the PostGraphile Starter? It’s not officially released yet but you can find out more in the Discord chat #announcements channel. https://discord.gg/graphile



do you happen to know if there is a video presentation about this project? after skimming the docs and checking the comments here, i am still unsure what this project has to offer that you don't already have or that you can easily piece together. thanks


How do you easily piece together a (powerful) REST API just from the definitions of the tables in the database?

Very very old videos but you can get the core idea https://begriffs.com/posts/2014-12-30-intro-to-postgrest.htm...

https://begriffs.com/posts/2016-03-20-postgrest-tour.html


thanks


I've been using Postgraphile along with Apollo Client with code-generated types for queries. When I change the db schema, I get compile errors in my client. Feels like I'm cheating


I'm doing something similar, but using type-graphql + apollo server + graphql-code-generator + mikro-orm (v3).

I just define my model classes. Graphql resolvers look like regular rest endpoints. Everything else is generated from there and I even get compile errors for my apollo client queries too.

I really love the typesafety of it all!


That's just bad schema definitions then; you should have defaults, and mutations made with only a subset of parameters shouldn't fail. Queries shouldn't fail anyway.

If you are actually changing variable names then your client would fail regardless of what you're using


We used Hasura for a while, but it was too easy for clients to write queries which would take it forever to complete.

Postgrest was better, but we had trouble getting it to work with JWT, and have a reasonable permissions model.

Postgrest was DAMN fast though, it was really nice to work with.

In the end we had to write our own.


FYI query whitelisting is on its way in Hasura which pretty much solves the first issue


Query whitelisting has been available since beta 1. It is called Allow Lists :) https://docs.hasura.io/1.0/graphql/manual/deployment/allow-l...

We recommend that every production app set up Allow Lists to prevent arbitrary querying.


> The Rails style of speccing a model, migrations, and controller/serializer or graphql types feels overly verbose and repetitive.

For the record, ActiveRecord models do not specify, for example, table schemas - they are loaded from the database at runtime, and migrations are not really necessary - they provide schema versioning with a rollback possibility, something that AFAIK databases don't usually offer. Rails (and Ruby) is actually so non-verbose and non-repetitive that a lot of people criticize it for that very reason, usually complaining about it being too "magical".


> they provide schema versioning with a rollback possibility, something that AFAIK the databases don't usually offer

But stand-alone migration tools like Sqitch do.[0]

[0] https://github.com/sqitchers/sqitch


I haven't used Rails a lot, but in Django you get a lot more from the ORM models than just database queries.

The Django ORM is used pervasively throughout django and third party libraries. You can compose and reuse querysets in Templates, Form validation, authorization and so on. You can create a CRUD admin interface and a REST API with almost no extra configuration.

It also has a very mature migration system, meaning you can have your migrations in code, and apply them in production or in multiple environments.

One problem I expect with PostgREST (haven't used it yet) is that validation and authorization may suffer. At the very least this will have to be configured either as SQL data or as SQL stored procedures, with SQL not being the strong suit of most frontend people.

They probably have safeguards against SQL injections through their API. Logic validation will be harder, and Postgres only supports rudimentary permissions, so all of this has to be implemented in stored procedures, also.

My main concern with PostgREST is that the database is stateful and persistent, and that you have to put more code in there. Which means every deployment is going to be a migration, no matter what. You'd better not forget any of the state changes you made in your development database. Version management will certainly be harder.

Please don't get me wrong. I am sure PostgREST has perfectly valid usecases, and I like Postgres a lot. I am just dropping a few thoughts on why I would prefer a traditional application between database and frontend.


There are trade offs when you couple your front end to your database structure.

Additionally, if you are using a lossy intermediate query representation instead of plain SQL, you will end up losing a lot of the database's performance and find yourself doing complex joins on your frontend.


This is a common misunderstanding and I think a big reason why people are reluctant to try this type of architecture. The idea here is not to couple client code to the database structure. You should expose schemas that contain views on top of the basic schema, which provides good encapsulation and prevents both of the issues you bring up. (This was common 25 years ago, when server-side code often amounted to a db schema containing stored procedures.)


That’s a step in the right direction but I’m not sure if it really solves the problem.

But as I said, there are trade offs. I didn’t say it’s always a bad idea.


I agree there are always trade-offs, but at least as far as "losing a lot of the database's performance" and "doing complex joins on your frontend" go, I don't think there is anything in the PostgREST approach that would push in that direction. For more background on the performance part (Oracle-specific, but most of it carries over) check out http://www.prohuddle.com/webinars/ToonKoppelaars/ThickDB.php


It's amazing that at PostgREST we have reached the same pattern for exposing the API (views + sps).

Thanks a lot for sharing.


When it comes to these tools, all your statements are wrong; give them 5 minutes and check out their capabilities (there is no coupling and no joins in the frontend) :)


> Are there other reasons this hasn't seen more widespread adoption?

This approach had widespread adoption in the 1990's, most notably via PowerBuilder[0].

> Is there some inherent architectural flaw or just not enough incremental benefit?

Yes, there is an inherent architectural flaw with two-tiered systems in that they force a system to choose to place complexity either in the GUI or in the RDBMS. This often leads to "Jenga Programming" maintenance techniques which cause the code base to take a life of its own.

It turns out that this does not scale.

0 - https://en.wikipedia.org/wiki/PowerBuilder#History


PostgREST's approach is not like PowerBuilder's.

PowerBuilder didn't encourage you to know your DB ins-and-outs and use all of its features. It was DB agnostic and through its DataWindow you got a RAD way of doing CRUD for a desktop app but you still needed to add application logic in PowerScript.

PostgREST on the other hand pushes you to know PostgreSQL.

A more suitable analogy would be to something like Oracle Apex[1].

[1]:https://en.wikipedia.org/wiki/Oracle_Application_Express


> PostgREST's approach is not like PowerBuilder's.

My intent was not to equate PostgREST with PowerBuilder, but instead to use the latter as a well-known example of a two-tier system.

> PowerBuilder didn't encourage you to know your DB ins-and-outs and use all of its features.

While PowerBuilder was "DB agnostic", as you identified, non-trivial applications relied heavily on RDBMS logic (stored procs, views, etc.). So while PB may not have "encouraged" RDBMS specific implementations, those which I encountered were never without.

Granted, the client connection protocol vastly differs between the two offerings. Yet the system architecture is shared IMHO.


Why do you think this is a two tier system?

Even here [1] I count 4 places to put logic besides the frontend.

1. https://github.com/subzerocloud/postgrest-starter-kit/blob/m...


> Why do you think this is a two tier system?

The first sentence of the site reads thus:

> PostgREST is a standalone web server that turns your PostgreSQL database directly into a RESTful API.

If there is a database on one end and a client on the other, with no intervening system between them, that is the definition of a two-tiered system.


Sure, you can use it as two-tier if that does the job, but it does not mean you are forced to; you can very well have logic in the proxy layer and then have logic below the db layer (workers reacting to database changes). If I apply the same logic (two-tier bad), I can say every other way of building an API is two-tier, since you have the frontend and then the rest of the code in php/node... the db would not really count as a layer, since in those scenarios people use it as a dumb store.


> Sure you can use it as two tier if that does the job but it does not mean you are forced to ...

The project home page declares this to be its raison d'être.

> ... you can very well have logic in the proxy layer and then have logic below the db layer ...

Why have a "proxy layer" to "then have logic below the db layer"? Why not just let PostgreSQL manage the data and have...

wait for it...

A system which defines the application logic? I mean, that is what you described in a roundabout way.

> If i apply the same logic (two tier bad) that i can say every other way of building an api is two tier since you have the frontend and then the rest of the code in php/node... the db would not really count as a later since in those scenatios ppl use it as a dumb store

No, what you describe as "an api" (which assumes an external client that interacts with it) and "then the rest of the code" (which is colloquially known as application logic) using PostgreSQL (what you call "a dumb store"), is known in the industry as a "three-tiered system."

Not a "two-tiered" system.

This is because there are three significant actors involved in system operations:

1 - something that requests the service provided (browser, phone, desktop binary, etc.).

2 - something that provides the service (application logic).

3 - something that persists relevant state as determined by #2.

See what happened there?

Three distinct things; a client, something that does stuff, and something that stores stuff.


I couldn't stop thinking REST could be derived from sql..

Similarly after a week of vuejs I couldn't stop thinking redux/vuex should be an embedded sqlite ...


You might find CouchDB and PouchDB interesting then. CouchDB works over REST and there's a client-side version called PouchDB that you can selectively sync between.


> Are there other reasons this hasn't seen more widespread adoption?

I think it's because PostgreSQL got major features for web development — like JSON and RLS — not too long ago and most developers are just catching up.

JSON was added back in 2013[1] and RLS back in 2016[2].

[1]: https://www.postgresql.org/docs/release/9.3.0/

[2]: https://www.postgresql.org/docs/release/9.5.0/


JSON support wasn't all that great until JSONB columns were added in December 2014[1]. I think that was when devs started taking it seriously.

[1] https://www.postgresql.org/docs/9.4/release-9-4.html


The Graphile folks are working on Graphile Migrate to improve the migrations story: https://github.com/graphile/migrate

Separately, I've got a fairly advanced setup of schema/seed lifecycle flow in a Graphile-powered application that will likely be open-sourced at some point. The simplicity of the architecture makes these things relatively easy to spin up from scratch.


Thanks for mentioning Graphile Migrate. Migrations are often the biggest issue people have with something like PostgREST and PostGraphile; I’ve used a lot of the various migration tools and Graphile Migrate is the seed of my vision for a much more compelling migration story for everyone in the PostgreSQL community (especially people who don’t mind writing SQL but find the current up/down pattern too verbose/clunky as I do). There’s lots more to come, so make sure you follow the project if you’re interested.


How does it compare with marv [0]?

[0] https://www.npmjs.com/package/marv


Any equivalent in MySQL? I have a legacy project in MySQL 5.1 I need to keep up. I have been using Peewee to generate the structure of the DB but would like to do direct API calls like PostgREST.



xmysql - Creates instant REST APIs over a mysql database.

https://github.com/o1lab/xmysql/

Give it a try.

(I'm the author)


o1lab, FYI, I considered xmysql recently, for my needs it's not currently enough because I'd like to have access rights management. IMHO utilizing the DB users for that is ideal - authenticate the mysql Db users in your rest service session. (PostgREST seems to do this)


ACL and other features are work in progress. Please watch repo for updates.


Deploy Kong in front?


Inherent architectural flaws:

- Complete lack of separation of concerns: you can see this in the terrible idea of simply slapping a transformation procedure onto a non-specified part of the codebase. Every time you do this you end up having to maintain a codebase which has no architecturally defined place anywhere. See: https://postgrest.org/en/v6.0/how-tos/casting-type-to-custom...

- There is no ORM that you can control. That doesn't mean no transformation is needed, though; it just means you lose control over it. All the work which was once under your control is now handled by the database layer. Of course you can go to that layer and change the logic of the framework yourself (by sending strangely formatted HTTP requests according to the docs), but then you gain nothing by using this framework. Of course this problem stems from the same lack of separation of concerns. So basically the framework simply ignores the fact that there is an inherent object-relational impedance mismatch. Of course the authors are aware that there is a mismatch, but they simply sweep it under the rug by saying that:

  "The PostgREST URL grammar limits the kinds of queries clients can perform. It prevents arbitrary, potentially poorly constructed and slow client queries. It’s good for quality of service, but means database administrators must create custom views and stored procedures to provide richer endpoints."
This will come back and bite you in the ass when you try to work with anything slightly complicated, like inheritance, for example. So the authors basically dump every single person in the world into the category of poor programmers, and they offer their excellence in place of your lack of skill or talent. Go figure! I am always very skeptical when I read something like this. See: https://postgrest.org/en/v6.0/api.html#resource-embedding

https://postgrest.org/en/v6.0/api.html#custom-queries

- No testability: pl/pgSQL is a really poor language to write unit/integration tests for, mostly because it's ancient and it's procedural and there are no native frameworks to do so. Of course you can use pl/Python and others, but then again, why would you when you can write the same thing in Python without embedding the code in Postgres, and the Python ORM would take care of having the same performance on the database level for you.

- Lack of tools for migrations, etc. The list goes on.


Coming from a traditional layered approach I can see why this "de-layered" approach can seem wrong. After all, I had to ditch the knowledge I had about Django ORM, Sequelize, Hibernate (I wish I could forget entirely about the Spring/JPA books I've read, but I still remember some parts). But as mentioned in other comments, if you give it 5 minutes—and embrace the db—perhaps you'd see the advantages of this approach over more traditional ones.

> There is no ORM that you can control

You don't need an ORM to control your database. If you treat the database as the source of truth (this would be the architecturally defined place) you can do data transformations with plain and short SQL. The great thing about this is that you can leverage PostgreSQL's rich functionality and extension ecosystem, and you don't have to worry whether your ORM supports a pg type or whether you need a plugin for that.

> by sending strangely formatted HTTP requests according to the doc

The "strangely formattted request" wouldn't be so strange if you note that PostgreSQL cast operator is "::"(e.g. `select col::json from table`).

> Python ORM would take care of having the same performance on the database level for you.

That's just false. ORMs have to target the lowest common denominator of databases. They don't use the full potential of pg (things like LATERAL joins, json_agg, etc). PostgREST queries are specifically tailored for pg and they're being constantly improved.

> No testability

I've shared this before, but by using pgTAP[1] you can get native SQL tests.
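A minimal sketch of what a pgTAP test can look like (assuming the pgtap extension and a hypothetical customers table):

  begin;
  select plan(2);

  select has_table('public', 'customers', 'customers table exists');
  select is(
    (select count(*) from customers where email is null),
    0::bigint,
    'every customer has an email');

  select * from finish();
  rollback;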

HNers have shared migration tools in other comments.

[1]: https://pgtap.org/


> Coming from a traditional layered approach I can see why this "de-layered" approach can seem wrong.

One of the most significant architectural defects of this approach is that by demanding the persistent store provide functionality beyond solely managing data, it both conflates disparate concerns (business logic + data management) and unduly complicates system evolution.

> If you treat the database as the source of truth(this would be the architecturally defined place) you can do data transformations with plain and short SQL.

And if you treat the database as the source of truth for managing data, then the idea of making it an application engine would be anathema.


> disparate concerns (business logic + data management)

We argue that all "business logic" is in fact "data logic" and they're the same concern. There are several comments on this post on how to evolve this approach(only use VIEWs/sps in a single schema and expose this to clients).

> And if you treat the database as the source of truth for managing data, then the idea of making it an application engine would be anathema.

Why? Could you elaborate more on that?


> > And if you treat the database as the source of truth for managing data, then the idea of making it an application engine would be anathema.

> Why? Could you elaborate more on that?

Sure.

Your previous declaration of:

> We argue that all "business logic" is in fact "data logic" and they're the same concern.

Is both the source of my position and your question.

For at least the last 50 years of which I am aware, the idea of "data rules all" has inextricably lead to migrating system logic closer to where the data resides. In the last 35-ish years, this meant defining application logic in stored procedures executed in an RDBMS (such as what you describe).

This architectural philosophy is commonly known as "client/server", or "two-tier systems," and has been studied extensively (as I am sure you are aware). What makes this relevant to the argument found in the "Motivation" section on the PostgREST site is:

  Using PostgREST is an alternative to
  manual CRUD programming.
And then:

  The PostgREST philosophy establishes
  a single declarative source of truth:
  the data itself.
To achieve that, the logic must reside within the RDBMS in order to fulfill the project goal set forth in the very first line of the site's home page:

  PostgREST is a standalone web server
  that turns your PostgreSQL database
  directly into a RESTful API.
For all intents and purposes, PostgREST could be considered an RDBMS connection pool, with the connection protocol being HTTP. Note that this is not a judgement, only an observation.

Now, consider a different system architecture commonly known as "three-tier" or "multi-tier." The philosophy of this type of system is largely defined by separation of concerns. This is to say, have each tier be responsible for a single "layer" involved in system definition.

In multi-tier systems, the persistent store (RDBMS) is solely responsible for ensuring data integrity. It may, in fact, service disparate components as well as disparate components may use more than one persistent store. An example of the latter is PostgreSQL and Redis deployed in concert. Programs, components, and/or systems which interact with a persistent store are responsible for providing the behaviour expected by their clients whilst the persistent store is expected to enforce rules regarding the relationships it manages.

Since the persistent store is only asked to fulfill a data integrity contract, having it also attempt to be an application engine would clearly conflate concerns, limit what persistent stores could be used, as well as result in assumptive implementations. Hence anathema to those who would treat the persistent store (database) as the source of truth strictly for the one concern which it was designed to address:

Managing data.


I see what you mean. This is the view that considers that the database should only be a Persistence Layer and not a Processing Engine.

There is a great talk[1] that @felixyz shared regarding the shortcomings of the Persistence Layer approach and the advantages of the Processing Engine approach (which we at PostgREST embrace).

In summary, having the "business logic" away from the db results in worse performance: more network roundtrips, ineffective cache utilization, wasted CPU cycles in traversing your programming language VM(plus all libs) code paths. Turns out this actually puts more strain on the DB and makes your whole system scale worse.

> Hence anathema to those who would treat the persistent store (database) as the source of truth strictly for the one concern which it was designed to address: Managing data.

I think marrying these concerns and having them in a single place is more of a benediction for OLTP systems development and maintenance. After all, PostgreSQL has gotten more powerful features over the years and it has evolved to do more than data management(LISTEN/NOTIFY, JIT, etc).

I really recommend watching that talk. Btw, slides here[2].

[1]: http://www.prohuddle.com/webinars/ToonKoppelaars/ThickDB.php

[2]: https://www.nloug.nl/downloads/ogh20170615_toon_koppelaars.p...


This is still a horrible idea.

No IDE support, no debugging support, poor language choice. It's just simply screaming NO NO NO NO NO NO NO on every level of the framework.

Not to mention that these days it is more important than ever to make the code also executable in cloud environments like GCP Cloud SQL or AWS RDS.

> You don't need an ORM to control your database.

You are actually wrong about that and you are basically the living proof that I was right. You didn't even try to address what I was writing about object relational impedance mismatch and inheritance.

> you can leverage PostgreSQL rich functionality

I think this is a slightly outdated world-view. As I mentioned above these days the most important driving factors are not what a specific DB vendor supports but mostly what the cloud providers do. And many of the cool things are only supported by plugins in Postgres anyway.

> you can do data transformations with plain and short SQL

Guess what? You can do the same with ORMs too! In any language which has a Postgres DB driver! No one is stopping you from doing that.


> object relational impedance mismatch and inheritance

Can you elaborate on that? PostgreSQL is an object-relational DBMS, btw; it supports inheritance[1]. Though I don't know exactly what impedance you are referring to, since PostgREST doesn't use an ORM.

[1]: https://www.postgresql.org/docs/12/tutorial-inheritance.html
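For reference, a quick sketch of table inheritance in plain DDL (tables invented, mirroring the linked tutorial):

  create table cities (
    name       text primary key,
    population integer
  );

  -- capitals inherit all columns of cities and add their own
  create table capitals (
    country text
  ) inherits (cities);

  -- querying the parent also returns the rows stored in capitals
  select name, population from cities;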


Somebody in our team put this on production. I guess this solution has some merits if you need something quick, but in the long run it turned out to be painful. It's basically SQL over REST. Additionally, your DB schema becomes your API schema and that either means you force one for the purposes of the other or you build DB views to fix that.


> or you build DB views to fix that.

That's what VIEWs are for! Well, one use-case of VIEWs, anyways.

There's nothing wrong with the schema as the API since you can use VIEWs to maintain backwards compatibility as you evolve your product.

Put another way: you will have an API, you will need to maintain backwards compatibility. Not exposing a SQL schema as an API does not absolve you or make it easier to be backwards-compatible.

You might argue that you could have server-side JSON schema mapping code to help with schema transitions, and, indeed, that would be true, but whatever you write that code in, it's code, and using SQL or something else is just as well.


How do you do CRUD with views? I know Reads are what views do.


> How do you do CRUD with views? I know Reads are what views do

Both automatically updatable views (supporting insert/update/delete by default because their structure provides a clean mapping to the backing table) and views made updatable through instead-of triggers exist in Postgres.


  CREATE TRIGGER <name>
  INSTEAD OF INSERT OR UPDATE OR DELETE
  ON <view-name>
  FOR EACH ROW EXECUTE FUNCTION <trigger-function>();


You can have insert/update trigger on views. You shouldn't but you can.

More realistically, stored procs would do the CUD parts.


> You can have insert/update trigger on views. You shouldn't but you can.

You can, and there is no reason you shouldn't.

> More realistically, stored procs would do the CUD parts.

Why are stored procs more realistic?


Triggers are implicit, have side effects and are not deterministic. They are confusing and surprising.

A procedure call is explicit, a trigger is implicit. You don't call a trigger, it just happens as a side effect of something else. People tend to forget implicit things. Suddenly you notice that something is acting strangely or slowly in your application. You can look at your functions and procedures and try to find the problem. But if your application has triggers all over the place, how do you know what is going on? A trigger can change a dozen rows, which in turn can change other rows, so changing a single row can fire thousands or millions of triggers. Also, triggers are not fired in a particular order, the database is free to change the query plan according to what it thinks is best at the moment, so triggers are not deterministic. Triggers can sometimes work and sometimes not.

Almost everything that can be done with a trigger can be done with a procedure, but explicitly, deterministically and in most cases without side effects.


> You don't call a trigger

You call an instead of trigger implementing updatability of a view, or the select query defining a view, just as much as you call property setters or getters in OOP languages.

With the DB triggers, as in many OOP languages (C#, Python), this is an implementation detail obscured from the calling site, which is good for loose coupling, modularity, etc.

Your objections, while IMO still overblown, have relevance to some uses of triggers (they are particularly applicable to AFTER triggers and BEFORE triggers other than those implementing constraints, but least applicable to INSTEAD OF triggers implementing view updatability, which is what we are discussing here.)


> much as you call property setters or getters in OOP languages

Good design is obvious and orthogonal[1]. If you write setters in an OOP language in such a way that they do surprising things, i.e. not just setting a value, then I would call that bad design.

> which is good for loose coupling, modularity, etc.

What do you gain by using triggers in this case? All you get is mental overhead, because whenever you use DML you have to keep in mind that there might be a trigger hiding somewhere that does strange things. If you call a procedure instead, you make it clear that you want to do more than just a simple update or insert.

> [...] it is possible for the method call to make use of concurrency and parallelism constructs [...] to do a unknown number of things in an unknown order

Why would I want this? I want my code simple[2], stupid and obvious, and not convoluted, clever and surprising[3].

[1] https://stackoverflow.com/a/1527430

[2] https://www.youtube.com/watch?v=rI8tNMsozo0

[3] https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...


Why do you think triggers must be astonishing (but OOP not so)??



There's nothing wrong with that if that's the logic you want!


> Triggers are implicit, have side effects and are not deterministic.

> ...

> A procedure call is explicit, a trigger is implicit.

Method dispatch (whether the language be OOP or otherwise) is also "implicit".

Think of triggers as controlled code injection.


You still have to call a method if you want it to do something. It doesn't simply happen as a side effect of something else. A method call only ever does one thing at a time, not multiple things in a random order. Read this if you still think triggers are a good idea: https://blogs.oracle.com/oraclemagazine/the-trouble-with-tri...


> A method call only ever does one thing at a time, not multiple things in a random order.

That's...not at all guaranteed if it is possible for the method call to make use of concurrency and parallelism constructs. It's actually not at all uncommon for method calls to do a unknown number of things in an unknown order (and it can even have an unknown number of them still in progress when the method returns!)


Exactly.


Yeah, and in SQL that's INSERT/UPDATE/DELETE on a table/view. Syntactically, using a stored procedure is really no different, and it can have function calls you don't know about because its body could dispatch just like triggers do.

This whole thing about triggers being implied/hidden is not based on anything other than taste.


> and it can have function calls you don't know about because its body could dispatch just like triggers do.

Please give an example of a stored procedure that does something else than what the code says it will do.

> This whole thing [...] is not based on anything other than taste.

Good taste: Insert, update and delete doing exactly what the statement says. Procedures doing exactly what the code says.

Bad taste: Insert, update and delete surprising you with magic tricks.


> > and it can have function calls you don't know about because its body could dispatch just like triggers do.

> Please give an example of a stored procedure that does something else than what the code says it will do.

That's not what I wrote. I wrote that the caller of an SP need not know anything about the SP's body, any more than the caller of a DML statement need know anything about triggers. Of course that is true. And of course it's also true that the caller could know all those details in either case.


Let's say customer 1234 wants us to delete his/her customer record, so we issue the following statement:

  DELETE FROM customers WHERE customer_id = 1234;
The statement runs successfully and of course we would expect the customer record to be deleted, because that's what we just said, isn't it? However, the customer is still there somehow. What has happened? Was our statement wrong?

It turns out our applications has triggers and by surprise the DELETE changed into

  UPDATE customers SET contract_status = 'terminated' WHERE customer_id = 1234;
just because a few years ago someone thought that the logic of terminating a contract should be wrapped in a now long-forgotten trigger that magically transforms a straightforward DELETE into an UPDATE.

If the developer had instead put the logic into a procedure like terminate_contract(customer_id), we wouldn't have had this problem in the first place.


There's nothing wrong with having a trigger that turns deletes into updates, if that's what you want. Why would you be surprised that you can do this, or that it's been done?

Your schema is an interface here, so read its docs!

EDIT: BTW, it's quite reasonable to have a policy to not delete, but mark deleted/terminated, rows in certain tables. This is especially important for security-relevant tables where the names of entities are liable to get written down in external (to the DB) ACLs. It's quite reasonable to implement such a policy in the DB as triggers -- indeed, there's no better way to do it, since otherwise a user who doesn't read the docs (like you) but does have privilege, might break the database.

So, actually, I really think you've completely misunderstood. What you thought was a problem was likely a solution to a bigger problem that you were not aware of. If you removed that trigger, you likely broke something. Meanwhile you're convinced triggers are a bad idea and spread that around as gospel.


> Why would you be surprised[...]

The statement says DELETE and that is what I want; the RDBMS executes it and tells me "deleted 1" (yes, it does). What am I supposed to think?

> Your schema is an interface here, so read its docs!

What docs? Apparently triggers are obvious and self-documenting, or at least that's what the guy who wrote them thought.

> there's no better way to do it, since otherwise a user who doesn't read the docs (like you) but does have privilege, might break the database.

Why properly revoke the deletion permission and offer terminate_contract(customer_id) as an alternative if you can use a magic trick instead?


> > Why would you be surprised[...]

> The statement says DELETE and that is what I want; the RDBMS executes it and tells me "deleted 1" (yes, it does). What am I supposed to think?

Well, you know about triggers!!! So you're supposed to think that the expected thing happened, whatever the schema designer intended.

Whoever wrote that trigger did it for a very good reason, probably the one I gave earlier (namely: to prevent name reuse, and also because gee, you might create a new contract with that customer later and not want to start from scratch).

> > Your schema is an interface here, so read its docs!

> What docs?

Those written by the people who wrote that trigger, designed the tables, your whole schema.

> Why properly revoke the deletion permission and offer terminate_contract(customer_id) as an alternative if you can use a magic trick instead?

(I assume you meant to write "Why _not_ ...".)

Well, for one, because someone who doesn't understand these things might have enough privilege and ignorance to do a real DELETE and break your data.

Moreover, the logical operation here is a DELETE! Just because you mark the row deleted instead does not make it not a logical deletion.

You're choosing to take one aspect of SQL RDBMSes extremely literally while then also choosing to ignore or malign another part. That's not a recipe for success.


> Well, you know about triggers!!!

Yes, and I also know about other nefarious things like monkey patching in some OO languages. Doing such things is antisocial behavior and something that 99% of developers don't expect.

> Those written by the people who wrote that trigger, designed the tables, your whole schema.

As I said, there was no documentation.

> you might create a new contract with that customer later and not want to start from scratch

Or you might get sued for non-compliance with the GDPR.

> because someone who doesn't understand these things might have enough privilege and ignorance to do a real DELETE and break your data.

Or you revoke the privileges and create a straightforward procedure. No magic, no surprises. And you can still grant delete privileges to a role that can delete customer records for GDPR reasons.

> You're choosing to take one aspect of SQL RDBMSes extremely literally while then also choosing to ignore or malign another part. That's not a recipe for success.

The difference is that SELECT, INSERT, UPDATE, and DELETE are straightforward features that do exactly what you say, unless you use another malicious feature that leading database experts recommend not to use:

avoid all triggers, all of them, they are evil. — Tom Kyte

My advice is to avoid using TRIGGERs — Joe Celko


I agree about monkey patching (but not code injection).

> As I said, there was no documentation.

Well, there's your problem :)

> Or you might get sued for non-compliance with the GDPR.

Non-sequitur. That's just an argument that your lawyers need to be involved in matters of data retention.

> malicious feature

That's utter nonsense. If it was "malicious", it would have been removed by now. All the evidence says that they are useful.

What next? No foreign keys? Those are typically implemented as internally-generated triggers. That must make them Doubly-malicious, or something.


> There's nothing wrong with having a trigger that turns deletes into updates, if that's what you want.

While I wouldn't use it on base tables, a view that provides a "current" picture, where deletes become updates (or even inserts of a deletion event) into a base table that (potentially along with others) provides a complete historical picture of the data of which the view presents a current snapshot, is a useful approach.
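A hedged sketch of that pattern, with invented names: clients delete from the "current" view, and the base table just records the state change.

  create view current_contracts as
    select id, customer_id, terms
    from contracts
    where status <> 'terminated';

  create function terminate_on_delete() returns trigger as $$
  begin
    update contracts set status = 'terminated' where id = old.id;
    return old;
  end;
  $$ language plpgsql;

  create trigger current_contracts_delete
    instead of delete on current_contracts
    for each row execute function terminate_on_delete();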


That's fine, though if you have to worry about sufficiently-privileged and insufficiently-knowledgeable DBAs/devs, then it's best to have such triggers on base tables. After all, you want referential integrity at all times -- it should take drastic action to perform operations that don't adhere to your business rules.


Yes, a DELETE trigger on a table that reports successful deletion but doesn't actually delete the row referenced, leaving it still in the table, is bad design.

It's not evidence that triggers are bad, it's just bad design.

(An INSTEAD OF DELETE trigger on a view that marks a base table entry as logically deleted, causing the corresponding data to be removed from the view is, OTOH, not bad in the same way.)

> If the developer had instead put the logic into a procedure like terminate_contract(customer_id), we wouldn't have had this problem in the first place.

Or if they put it as an INSTEAD OF DELETE trigger on a current-contracts view, with the design described above; conversely, a badly-named stored proc would have the same basic problem as the trigger at issue here. This isn't a trigger problem.


I disagree. The logical operation is "delete", and "mark deleted" is a perfectly fine concrete implementation of "logical delete".


> You shouldn't but you can.

Shouldn't? Why?


They're non-obvious and can bite you in several ways. A lot has been written about why triggers are bad (random example: https://www.itprotoday.com/development-techniques-and-manage...). Back before service layers were common and desktop clients connected to the database directly they were common and just about everyone has some war stories of triggers breaking things.


A lot of things are non-obvious, especially when you're not aware of them and don't expect them.


What's wrong with views (which should have been used from the start)? What were the pain points?


They are great when used well, but non-materialized views can kill performance with large data sets.


That's the same as saying "unoptimized selects can kill performance with large data sets". Of course they can. That's what optimization is for.

We have quite large amount of data (100TB+ and trillions of rows at this point [0]) and no problem with views.

[0] https://www.citusdata.com/customers/pex


A view is nothing but a query, so if the view is "killing" the performance for you, running the same query from the client will not change anything; the performance will get "killed" in the exact same way.


Yes, if you are running the same query. Some of the worst use of views I've seen involve massive joins without filters, and then filtering further down, so you end up working with a recordset in the millions of records rather than a few thousand.


I understand the problem you are describing; I would say that is the wrong type of view to create. If one plans to write filters on top of that view, then it should be written so the filters can be inlined into the view (which would basically give you the finished query you would be sending from the layer above).


That kind of reminds me of the classic doctors joke though. If it hurts when you do that, don't do that.


It's kind of unclear what problem you are trying to describe. Views shouldn't confound the query planner, and creating views with "filters" sounds like probably a mistake--query the view with the predicates you need then.


So filtering later outside of the db? If you filter on a view the optimiser should be able to sort it out - as others have mentioned, it’s no different from a regular query.


I think PostgREST is the first big tool written in Haskell that I’ve used in production. From my experience, it’s flawless. Kudos to the team.


Having a bit of experience with OCaml, I hoped to see what production-ready Haskell code looked like with this library. I tried to read some files of the project and... IMHO "production-ready" Haskell code is still not easily readable; for example, the main file for the tests:

https://github.com/PostgREST/postgrest/blob/master/test/Main...

and

https://github.com/PostgREST/postgrest/blob/master/test/Quer...

I don't know, maybe it lacks comments? The code is really not easy to follow if you are not using Haskell 100% of your coding time.

While the library may work well in practice, it's a maintainability red flag and, by using this library, you rely on rare Haskell programmers for the future.


There are companies that use and sponsor[1] PostgREST development now.

There are also Haskell shops that use it. So, if a bug happens, at worst you'd have to open an issue, offer a bounty and watch it get solved.

> Having a bit of experience with OCaml

When I started as a PostgREST contributor I had zero experience on Haskell(not even a college course). I think the bar to contribution is not that high given that you know the domain — PostgreSQL and REST.

Not to say the codebase couldn't be improved. Commenting and refactoring is happening continuously.

[1]: http://postgrest.org/en/v6.0/#sponsors


I’m just a user and I have no visibility into the internal code (I can’t code in Haskell).

What I feel, as a user, is that most features that I use are already implemented and unlikely to bit rot, since PostgreSQL itself doesn’t change a lot.


It is hard to read if you don't have some knowledge of Haskell, indeed, and the point about comments is true, but it's not any harder than following other codebases if you don't know the particular language, so I don't think this is a strong argument.

Another point is: it's not a library and you are not the one maintaining it :) the same way you are not maintaining, but still using, things like postgresql, nginx, redis, rabbitmq.

I bet it's a lot easier to learn Haskell and patch PostgREST than to know C for 10 years and patch postgresql :)


> you are not maintaining, but still using things like postgresql,nginx,redis, rabbitmq

I don't maintain them because they are written in either C or C++, which has many more practitioners, guidelines and tools to trust.

On the other hand, I can easily see people using Postgres extensions having to quickly make a patch to fix a bug or change a behavior, extensions being smaller.

The issue is that, eventually, you would like to patch, review or audit the extension. All of these operations will require you to find third-party Haskell developers, Haskell auditors, Haskell reviewers, who are rare in the job market, and therefore it represents a risk for your project.

If the core developers stop maintaining the extension, no one else might be available to maintain it, and now you have technical debt and code that no one can fix.


> no one can fix

Really? No one? I learned Haskell (my first FP language) and rewrote the core of PostgREST (in my spare time) in about 6 months... so stop scaring people :)


Often you do not want users to have access to a whole table, but only to posts made by the user, or posts addressed to the user. I could however see this replace Excel apps. But then you will also have to generate the user interface for it to be useful. The developer should only have to specify the views, the rest can be automated. I once made such a tool in order to save a few hundred man-hours on a tight budget, and it worked fairly well. But for most apps you want to customize every layer.


You can do that with row-level security. The PostgREST documentation has examples for that specific use case: https://postgrest.org/en/v6.0/auth.html#roles-for-each-web-u...
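
A rough sketch of what that looks like (the table, role and claim names here are made up, and PostgREST exposes JWT claims as request.jwt.claim.* settings):

    create role web_user nologin;

    create table posts (
      id     serial primary key,
      author text not null,
      body   text not null
    );

    alter table posts enable row level security;

    -- each web_user only sees (and can only modify) their own rows
    create policy posts_owner on posts for all to web_user
      using (author = current_setting('request.jwt.claim.sub', true));

    grant select, insert, update, delete on posts to web_user;
    grant usage, select on sequence posts_id_seq to web_user;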


Is column based authorisation possible?

What about group/role based security concepts?


You can create a view with a subset of columns and grant permissions on the view.
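
For example (assuming hypothetical api/private schemas and a web_user role; not taken from the docs), either a view or Postgres' built-in column-level grants will do:

    -- option 1: a view exposing only safe columns
    create view api.users as
      select id, name, avatar_url from private.users;
    grant select on api.users to web_user;

    -- option 2: column-level privileges directly on the table
    grant select (id, name, avatar_url) on private.users to web_user;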


I feel like this is just moving business logic /back/ into the database.

It's very similar to what we were doing with stored procs 15 years ago and just moves the problem from business logic back to the database layer. Given the choice, I'd prefer to write constraints in !SQL, personally.


> Is column based authorisation possible

Yes.

> What about group/role based security concepts?

Yes, those have been standard in RDBMSs for decades.


This is very nice! For many CRUD apps you could skip the back-end side entirely!


It sounds like this would limit scalability quite a bit because you'd either have to keep a DB connection open for each active user or close connections rather aggressively.


PostgREST performs authorization at transaction level, not at connection or session level, so a connection can be used by thousands of active users simultaneously.
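
Roughly speaking (not the exact statements, just the idea), each request runs in its own transaction on a pooled connection:

    begin;
      -- the role and claims come from the request's JWT and last only for this transaction
      set local role web_user;
      set local "request.jwt.claim.sub" to 'alice';
      select id, body from posts;   -- runs under web_user's grants and RLS policies
    commit;  -- the connection is clean again and can serve a different user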


Oh that's very interesting indeed! I thought each open connection remained linked to exactly one authenticated user account until it is closed.


Row-level security is a feature of the database (PostgreSQL); those rules are written and enforced by the database and have nothing to do with PostgREST or how it connects to the database.


I am aware of that but I thought that this approach would effectively prevent sharing pooled connections between different users. But taffer says otherwise, so that solves the problem I was wondering about.


I like this idea. Especially helpful for prototyping a web UI against an arbitrary existing dataset. PostgREST is much more full featured and as commented by a few others “production-ready”(?) but if you’re into this and looking for something a bit more naive but just as accessible I wrote a similar utility to expose some Postgres data over http: https://github.com/daetal-us/grotto


I'd say it has been production-ready for some years now. There are some documented cases of companies using it in production here: http://postgrest.org/en/v6.0/#in-production.


I really like this approach for mostly crud apps. What is missing is

- something to conveniently version control database objects

- something to conveniently debug stored procedures. Maybe directly from vscode or your preferred editor.

If those two things get solved somehow, pg could be a really awesome application server.


For the first one, read my other comments; this is solved. For the second one, you can start here: https://www.pgadmin.org/docs/pgadmin4/4.13/debugger.html


> for the first one, read my other comments, this is solved.

> The second one, you can start here https://www.pgadmin.org/docs/pgadmin4/4.13/debugger.html

I think Starter Kit and the pgAdmin debugger lack in convenience. If you write C# or node js code in your preferred editor, you can debug it there. You can debug your express routes, webapi or resteasy controllers in vscode/vs/eclipse/intellij without leaving the file you later commit to git.

Starter Kit and the pgAdmin debugger are fine tools, but they come nowhere close to how you work with a js, C#, java, python or whatever you like codebase.

The development workflow with stored procedures imho is broken, and I think that is one of the main reasons people do not use them much.


It's true that a tool developed recently by 1-2 people is not as convenient as the tools developed by armies of developers over decades :) But when it comes to the PostgREST way of building APIs, debugging does not have the same meaning as in other ecosystems.

An API backed by postgres+postgrest is 80% table and view declarations... how do you debug a view? It makes no sense. You just define it, run "select * from view" (even from your IDE) and check whether you get what you expect. That's why one can do a lot (develop complex APIs) with less (limited debug tools).


You don't debug views of course. But with postgREST you have to write your logic in stored procedures. So say I want to write an accounting app that has an invoice function. A view is not enough to create an invoice, because there are complex rules to apply to the data, so I have to write a SP. I need to debug that stored procedure, and if I have to install pgAdmin and switch from my very much loved and customized editor to debug my invoicing procedure, the workflow is broken, and so it comes that I will write that invoicing function in C#, java, python or js instead, because the tooling is better.

Compare that to a node application: just open vscode, start editing away and press F5 to test it. If you want to debug it, add a breakpoint and step away. When you are done you commit and push to git.

It should be the same with stored procedures.


Let's start with the fact that most data-centric APIs have an 80/20 read/write split, so there are virtually no stored procedures in 80% of your API, and no need to debug 80% of the code :)

So you "might" need stored procedures only for your write path, and even then only when the input data needs to be split and sent to different tables.

The complex "rules" are nothing more than "constraints" on your data; they are split between the columns of the table and become so simple that there is almost nothing to debug.
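
For example (a made-up invoices table, just to show what "rules as constraints" means):

    create table invoices (
      id          serial primary key,
      total       numeric not null check (total >= 0),
      issued_date date    not null default current_date,
      due_date    date    not null,
      check (due_date >= issued_date)
    );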

If it were the case that with postgrest one needs to write complicated stored procedures all over the place you'd be 100% correct. The thing is you don't need them in most cases and when you do they are short simple functions that deal with focused things so there is way less chance to get them wrong.

This has been my experience with using this type of stack for apps like project management/invoicing.... etc (basically basecamp+freshbooks)


Basically you are saying "I don't need stored procedures, therefore neither should you".

If you don't use them clearly you don't need to debug them. That does not change what I said earlier: a difficult development workflow hinders their adoption.


not exactly what i meant.

It's true that it's not a polished workflow to debug stored procedures, because you have to jump from your editor to pgAdmin, but I don't think this is such a big deal for two reasons.

- The PostgREST architecture is such that you rarely need stored procedures (not just in my experience, in all the projects); it's the exact same way people use databases without them (they just send queries to the db, same here). 90% of the code in this type of project is table/view definitions (with constraints) and the appropriate GRANT/RLS statements, so there is very little imperative code to debug.

- The type of stored procedures used is quite simple, isolated and mostly self contained, a single function, maybe calling some other helper functions (you are always 2 levels deep at most) so it's not like in other envs where you have to follow the code jumps between hundreds of functions and classes.


> something to conveniently version control database objects

I read your comments but couldn't find the answer. What did I miss?


https://github.com/subzerocloud/subzero-cli

Your code is in files (so Git...); you edit the files and save them, the new version is loaded into the database, and you can make the new API call and see it in action.


I am (slowly) working on a project with a similar tool (PostGraphile) to eliminate most of the CRUD stuff. One thing I always wondered: how would you version control the schema itself? I settled on Sqitch - https://sqitch.org/


Interesting to me that this is written in Haskell!

I highly recommend reading the source code https://github.com/PostgREST/postgrest


How do you version your API with this kind of tooling?

As in, how do you change the data model without breaking clients?


You don't expose your tables directly; you expose a schema that consists only of views and stored procedures. If you really need a totally different version, then you just expose a new schema, but more often it's the same situation as in the GraphQL ecosystem: you just add a new column/view/procedure and don't delete the old one. Postgrest has the same power to describe what you want as a GraphQL API would have (by using its select parameter).
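
A sketch with made-up names (the point being that PostgREST only ever sees the api schemas, never the private tables):

    create schema private;
    create schema api;     -- the schema PostgREST exposes

    create table private.todos (
      id       serial primary key,
      title    text not null,
      done     boolean not null default false,
      due_date date
    );

    create view api.todos as
      select id, title, done from private.todos;

    -- later, a breaking change goes into a new exposed schema instead
    create schema api_v2;
    create view api_v2.todos as
      select id, title, done, due_date from private.todos;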


It’s interesting that the project’s own tutorials do not discuss any of that and not only demonstrate making the API schema exactly equal to table schemas, but go further and claim that adding intermediate business logic or using tools that mediate between the data and business logic, like ORMs, are bad abstractions that should be intentionally avoided.

I mean, I agree with the strategy you state about views or stored procedures, but those are just in-database ways of achieving the same kinds of things you might prefer to write in a different language (thus ORM or query engine) because it puts the app or business logic all into the same version controlled system, leverages programming language ecosystems and tools that are often way more valuable than raw database programming (even in Postgres), etc.

Basically, if PostgREST needs you to do the old tricks of views & stored procedures to manage an abstraction layer that safely allows the underlying data schema to change, I just don’t see the benefit over doing this in a much better language ecosystem, like Python, and using much better web server tools to generate the APIs.

PostgREST looks much more useful for quick prototypes, internal use cases where schema breakage might be OK occasionally, or just mirroring & monitoring data as-is for ops and diagnostics. From a performance perspective, it might be fast enough for production, but that’s almost never as big a concern as managing the intermediate abstraction layer and associated app tooling.

Does not look like a good idea for production applications that need an intermediate API layer adapting the data to the use case.


So there are different schools of thought about using relational databases here and I think this is where a lot of the tension comes from.

I think everyone agrees in principle that in a service-oriented architecture you need well-defined, safe, hardened interfaces between services. In an ORM world, the assumption seems to be that the database itself isn't really a service with a well-defined interface, but rather a private data store that just accepts whatever SQL you throw at it.

But what if you think of the database itself as a service? If that's the case, then your service interface should definitely not be arbitrary SQL. This is where you introduce views and stored procedures, which change your DB from a private implementation detail that you have to hide behind a service boundary to a service that sets its own boundaries.

In this world, your REST services have an HTTP client to make service calls to each other, and they have a Postgres client to make 'service calls' to your database. PostgREST is just a deterministic proxy that adapts one service protocol to another, the same way you would use grpc-gateway if you had gRPC services that you wanted to call from REST clients.

I don't think PostgREST obviates the need to write intermediate API layers, at least if those intermediate API layers are doing anything interesting. It may obviate the need to write API layers that only parameterize SQL statements and serialize JSON responses. But that's a good thing to obviate.

And yeah, you should definitely version control your DB schemas, views, and stored procedures. We aren't barbarians :)


Very well said (I'll have to remember this way of explaining it). IMO in most projects, the intermediate layers never do anything interesting so that's why postgrest (as a proxy) is a good fit


> IMO in most projects, the intermediate layers never do anything interesting so that's why postgrest (as a proxy) is a good fit

I try to avoid generalizations about "most projects", because different people have different experiences and it's hard to make a good argument about which case is more typical. At best I think you can lay out the toolbox and explain where PostgREST fits in the toolbox. Whether or not you should use it on a particular project depends on the particular project, and I have no idea what mlthoughts2018 is working on or has worked on in the past, so that's an entirely different question :)


> “I don't think PostgREST obviates the need to write intermediate API layers, at least if those intermediate API layers are doing anything interesting.”

I’m just pointing out that PostgREST’s own tutorials _do not_ say this or even appear to agree with it. I know experienced DBAs & engineers would follow these design ideas, but PostgREST makes it sound like your REST API can literally just be the very table it’s querying from. Hence why I qualified which use cases that might be OK for in my first comment.


Why would you prefer that? In all other intermediate layers you would write "imperative" stuff for how to transform the data and enforce its correctness. In the DB you "define" how it should look/behave and who has access.

As long as the "business" logic is directly related to the data in question, the code in the database will always be faster, shorter (by a lot) and easier to understand (because it's shorter by orders of magnitude).

I know your reaction comes from years of tutorials telling you not to do this, but that was way back when MySQL didn't even have views. Things are a lot different now; databases are not dumb stores.

Give it 5 minutes and try a small project, you'll be surprised by the power


> “In all other intermediate layer you would write "imperative" stuff for how to transform the data and eforce it's correctnes.”

No, it could be functional, declarative, OO, whatever, depending on the language used. Personally, after a lot of years of Haskell & Scala experience in large companies, I think functional & declarative programming are way overhyped, and these types of designs do not actually offer the benefits they are claimed to. Given this, I see zero reason to care if the abstraction layer is some OOP tool. That is absolutely fine.

> “As long as the "business" logic is directly related to the data in question, the code in the database will always be faster, shorter (by a lot) and easier to understand (because it's shorter by orders of magnitude).”

This is comically wrong. I remember the ~100 line long MSSQL stored procedure just to compute (poorly) the median of a column.

Expressing things in the languages supported with the database (even in Postgres, even with language extensions) requires waaaaay more code than using application tools in the application language.

I remember how much easier our lives became at an old finance job when we finally ported a ton of stored procedures to instead use pandas for math operations in Python. The cost of passing the data through processing jobs that required serializing it to servers where the jobs could run, transforming it, then transporting it back, was absolutely worth it because we could remove thousands and thousands of lines of stored procedures that were incredibly bug prone, impossible to debug, and totally lacking the linear algebra and analytics functionality we needed to expand the app.

Moving the logic out of the database was 100% motivated by making it safer, less buggy, simpler & easier to test code, and gaining expressive functionality totally impossible to express in the database.


I guess each person's opinion is colored by past experiences; I too can come up with stories of how backend APIs with tens of thousands of lines got reduced to a handful of views and stored procedures. But it's true this approach does not work as well when you have to do linear algebra on your data (my feeling is that this requirement does not come up in that many projects).

On the debugging part it's partly true: the workflow (debugging in the db) is not as polished as in other environments.

About the "fast" part, I would not say "comically wrong"; if it were, there'd be no reason for SQL beyond "select * from".


Let's not forget that Postgres has first-class support for Python [0] (unlike JavaScript etc which are maintained, or not, by third parties). I don't advocate going for Python for simple stuff but when you do use it, it has the added benefit of having plain ol' Python functions that you can unit test thoroughly (including with the wonderful Hypothesis [1] property testing framework).

[0] https://www.postgresql.org/docs/12/plpython-funcs.html [1] https://hypothesis.works/
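
A trivial sketch of what that looks like (assuming the plpython3u extension is installed):

    create extension if not exists plpython3u;

    -- a plain Python function living in the database
    create function pymax(a integer, b integer) returns integer as $$
      return max(a, b)
    $$ language plpython3u;

    select pymax(1, 2);  -- 2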


The Postgres support of Python is usually way overhyped. I’ve done a lot of crazy things with it, like even compiling my own extension modules using Postgres C libraries to write Python native functions that vectorized business logic over columns like the way numpy works.

It’s extremely hard to manage the Python environment itself with Postgres, as Postgres has to be compiled with it. If you’re working on scientific or analytics applications where you are spinning up a new conda environment all the time, adding & changing package dependencies, etc., and you need to keep your external-to-database Python app logic synchronized with the internal-to-database Python, it’s virtually unusable.

Imagine needing to ship new versions of an in-house machine learning library into the database, test it, update that library’s own dependencies inside the database’s python, etc. It becomes a crazy packaging nightmare very fast.

The plpython extension is basically just a cute toy. Occasionally useful or a small script or single transformation that needs Python standard library functions, but anything beyond that and it becomes very unscalable & unmaintainable very fast.


That seems more related to a lacking devops approach than to plpython capabilities.

pg coupled with Nix[1] could potentially solve all of these issues. nixpkgs includes a large collection of python libs that you can use to get reproducible environments.

[1]: https://nixos.org/nix/


I think you are misunderstanding. There’s plenty of tooling for good Python env / package management, but when you need a custom compiled extension module with dependencies to maintain parity both in the database (recompiled with the custom extension packages each round trip) and outside the database, it’s a big problem.

The problem is not generally making an environment, it’s that different projects need different environments but need the same database.


> shorter

Yes, until the very first time you want to reuse some of the code. From then on, any initial advantage melts into an unmaintainable mess of semi-imperative, semi-declarative copy-paste horror.


reuse code where? In the db, why couldn't you? Or do you mean to reuse the logic in the db in some other place?

By this logic one can say "Ruby is bad because I can't reuse logic I wrote for the backend in my frontend SPA". This is the type of logic (data logic) that you should not need to reuse in other parts.

Maybe this comment does a better job of explaining the architecture: https://news.ycombinator.com/item?id=21436425


Thanks for elaborating. That sounds like a valid strategy.


Does every software out there have to provide all the specific features the entire world needs? Many APIs aren't versioned and that's fine.


Why not use a more descriptive title?

For example:

"PostgREST: a web server that turns a PostgreSQL database into a REST API"


HN policy is generally to use the literal title of the linked website, deficient as it may be.


Given it's the project website, and it's prominently visible, adding the tagline ("Serve a RESTful API from any Postgres database") would be fair to be part of the title.


I'm using the JOOQ type-safe SQL generator with PostgreSQL. My application server build script runs PostgreSQL in a Docker container, creates the database and tables, applies all migrations (via Flyway), and then invokes JOOQ which connects to the database and creates Java classes based on the tables and columns in the database.

JOOQ mostly prevents SQL syntax errors, column name errors, column type errors, supplying the wrong number of arguments, etc. These become compile-time errors.

With PostgREST and other JSON APIs, you only get run-time errors. And you rely on test coverage to check code correctness.

I prefer compile-time errors to run-time errors. I find that software utilizing compile-time checks is easier to maintain.


PostgreSQL already gives you SQL syntax errors (try creating a VIEW with a misspelled SELCT), column type errors (try doing a `select 'asdf'::int;`), and wrong-number-of-arguments errors on a stored procedure call (try passing one more argument to `select int4_sum(2, 3);`).

Thanks to PostgreSQL transactional DDL[1] you would get all of these errors at creation-time and without any change to your database if any migration is wrong. There's no need for a SQL codegen to get this already included safety.
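
For instance, a hypothetical migration (made-up table and view): if any statement fails, the whole thing rolls back and the database is untouched.

    begin;
      alter table invoices add column currency text not null default 'EUR';
      create or replace view api.invoices as
        select id, total, currency, due_date from invoices;
    commit;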

Btw, PostgREST is not only a JSON API. Out of the box, it supports CSV, plain text and binary output and it's extendable for supporting other media types[2]. If you have to output xml by using pg xml functions you can do so with PostgREST.

[1]: https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...

[2]: http://postgrest.org/en/v6.0/configuration.html#raw-media-ty...


Somewhat related discussion from a week ago: https://news.ycombinator.com/item?id=21362190


I love the idea - and it's definitely something I'll put through the paces on one of my projects shortly. Being able to separate the schema from data ingestion, and data transmission is a very powerful scale option for one of the things I'm playing with.


This is so smart. Common sense to the max! Sad I didn't think of it.


Don't do this with a public API with third-party clients!!

This way you are directly tying the REST API to your database schema. The whole point of having a public API (you know, Application Programming Interface) is that you can serve your data in a controlled way, maybe totally different from your schema. The moment you change your schema even a little, congratulations, you've broken all your clients.


So why use REST at all at this point? What is the benefit REST is bringing to the table here?

Seems like if you want a declarative API you might as well do something like a local read replica a la Firebase. Seems like the natural progression of these API as single schema technologies.

Is the main reasons for sticking to REST here compatibility or is there something in the RESTful design we want to hold on to?


REST is pretty stupid, but it works over HTTP(S) and is stateless. And tools that use HTTP have nice abstraction layers already, and are very common, so it becomes simple to use.

Personally, for talking with a web front-end I would use WebSockets with long-polling as a fallback, and use JSON instead of a query string for querying. It does however require yet another abstraction layer, and is more brittle and less secure than REST.

REST is a school-bus. Other methods are like exotic sports-cars.


Well sure it's common, but my question is whether that's the only reason. If we're going down the path of declarative requests and the like, why not push it further like Firebase has done? I'd prefer that a lot more if there were a self-hosted/open-source alternative.


What’s the benefit of using this over just using a small framework like Flask/Express with a Postgres lib?


Derek recently wrote about it [0]. The fact that you can have the thinnest client between your front-end and backend makes things incredibly flexible. If you learn PG properly you can do 100% of data preparation on the database side and just expose it through an API. If you decide to change something, you change a view and the API immediately reflects it. No code changes, no redeployments.

[0] https://sivers.org/pg2
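
For example (made-up names), tweaking what an endpoint returns is just redefining a view:

    -- /products immediately serves the new shape; nothing else is redeployed
    create or replace view api.products as
      select id, name, price, round(price * 0.8, 2) as sale_price
        from private.products;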


That is all true, but postgrest goes a step further than Derek's approach (where you basically need to write a lot of stored procedures) and gives the frontend access to a safe subset of SQL, so that the frontend is not strictly limited to the capabilities of the procedures that live in the db (thus eliminating the need for most of them).


Derek's approach is one that can be carried between many old versions of PG. Today, you can achieve a lot more with generated columns, (materialized) views, partitioning and other fun features added in recent versions.

In any way, we used this approach in our company dealing with billions of rows of data and this allowed us to scale way past our "weight class".


Thanks for this. People are always skeptical of this approach, not because they tried it and failed (or even thought about it for 5 minutes) but because they read some blog post somewhere.

Not to say that this solves everything, though; there are cases where it does not work (as someone commented correctly, giving an example where they needed to use linear algebra over the data).


That's what generated columns [0] are for ;) There is very little you can't do on the database side these days. And if all of that fails, then stored procedures and triggers can solve the rest. We [1] as a rule don't allow any data to be post-processed once it leaves the database, and we run databases of 100TB+ with trillions of rows with absolutely no problems.

[0] https://www.postgresql.org/docs/current/ddl-generated-column... [1] https://www.citusdata.com/customers/pex
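
For example (PostgreSQL 12+, hypothetical table):

    create table order_items (
      price_cents int not null,
      quantity    int not null,
      total_cents int generated always as (price_cents * quantity) stored
    );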


Is it microservice-like, or some other crap?

I had a situation where I was implementing something quite simple: a URL shortener. I didn't use PostgREST, but I decided to use an ORM, because it was a simple CRUD operation. It had an option to either use a generated URL or allow the user to specify a custom one. And it worked as expected.

But then, once it was completed, I decided to add extra functionality, for example extra statistics, like what IPs were accessing it and how often, adding expiration times, etc.

I realized that the ORM encouraged me to implement all of my logic in the application, even when I would actually put less load on the database and make things simpler if I let the database do many things for me and used the types and functionality it provides. I am not talking here about using stored procedures; I could do all operations in 1, at most 2, SQL statements, while the ORM had to send multiple. In the end I dropped SQLAlchemy (this was Python code) and just used psycopg2 directly; I didn't even bother with wrappers, just used the built-in pools. It was also easier for me to make my code use two endpoints for reading and writing, so I can scale it better.
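
E.g. recording a visit, checking expiry and fetching the target can all be one statement (hypothetical schema, just to illustrate):

    update short_urls
       set hits = hits + 1, last_visit = now()
     where slug = 'abc123'
       and (expires_at is null or expires_at > now())
    returning target_url;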

I realized that the ORM did not save me much code at all; it was the same amount of code with or without it, and without the ORM I had greater control over what I wanted to do.

I previously believed that an ORM stands in the way when your application gets bigger but is good for small projects. Here I realized that it doesn't bring many benefits even for simple projects.

I think a REST interface like this is doubling down on what an ORM tries to do. Maybe it could be beneficial in environments that don't have libraries to communicate with a database and can only make HTTP requests?


It's not doubling down on the ORM part; it's not an ORM replacement, it's a replacement for the entire API layer (which uses an ORM internally).

I agree with the other point: this is meant to be used when your client cannot have direct access to the db (browsers, 3rd parties). If you can connect to the db and use SQL, do that.


You should try SQLAlchemy Core. It does not have the drawbacks of an ORM but still saves you the trouble of writing SQL queries as raw strings.


It's better but still not great, because it tries to make your SQL independent of the database, so some things are harder to express that way. You have to understand your database, and then you need to understand how to express it in SQLAlchemy to get your desired SQL statement.

PyCharm actually has database support: you can configure it to connect to your database and it will fetch the database schema. After that it will recognize SQL statements inside strings, offer autocomplete, and even take them into account when refactoring your code.

IMO this is how impedance mismatch should have been handled from the beginning.


I am not sure it is a good idea to add PostgREST to your stack. Since PostgREST can only interact with your DB, you would probably end up calling PostgREST from another REST service. In that case, you would be better off using an ORM.


I've wanted this for a long time.

Is there anything like this for Microsoft SQL Server?


You could use PostgreSQL Foreign data wrappers[1] and leverage PostgREST for a SQL Server schema.

tds_fdw[2] works pretty well for this(I've used it in a project related to open data). Basically, you'd have to map mssql tables to pg foreign tables[3] defined on a pg schema. Lastly expose this pg schema through PostgREST.

[1]: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

[2]: https://github.com/tds-fdw/tds_fdw/

[3]: https://github.com/tds-fdw/tds_fdw/blob/master/ForeignTableC...
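
Roughly (option names as I remember them from the tds_fdw README; the server, credentials and table are made up, so double-check against the docs):

    create extension tds_fdw;

    create server mssql_svr foreign data wrapper tds_fdw
      options (servername 'mssql.example.com', port '1433', database 'Sales');

    create user mapping for current_user server mssql_svr
      options (username 'report_user', password 'secret');

    create foreign table api.orders (id int, customer text, total numeric)
      server mssql_svr
      options (schema_name 'dbo', table_name 'Orders');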


So basically this is like Firebase but on an RDBMS, which is quite awesome.


This is intriguing, but how does one go about scaling this? Relational DBs are often where scaling breaks down and sticking more of the application in that problematic part of the stack seems like it could end poorly...


How is the REST API document generated? Looks neat!


This should have been: PostgGRPC


Have you just discovered this and posted to HN? :)


That's roughly the idea of this site, right :-)


Older projects regularly find their way to the front of HN.

For example, recently the NeverSSL project was on the front page. It has been submitted 10 times over the past 3 years (the homepage, not updates), and has been on the front page of HN before.


So?


Nothing wrong with it, it's just weird to me seeing a link directly to the homepage of a product that's been around for 5 years now, and not something like "how to do x with postgrest". Don't get me wrong, it's a good thing, just a little weird (in a good way) to me.


I guess Hacker News is a little quirky like this. Your first comment seemed weirder.


Yeah I know, I didn't know how to exactly express the feeling (nice surprise).


The frequency by which this project appears at the top of HN does not correlate with its production usage and thus feels like gorilla marketing, at least to me.


I don't see why you're going so ape about it?


Looking through the search results, it seems like it's been potentially high on the front page 3 times in 4 years.


That was the surprise part behind my other (downvoted) comment: surprised that it still comes up on the homepage with direct links (as opposed to some new development around it).


Damn gorillas.


It's not marketing (100% sure :). Do you have some info on its usage in production?


"Object-relational mapping is a leaky abstraction leading to slow imperative code"

So they added REST on top of an ORM, a few more layers of data transformation and an even leakier abstraction, so the poor dev doesn't have to worry about "low level" SQL.

I lost count of how many different libs/frameworks I've seen that exposed CRUD through HTTP; all failed miserably, because it is actually a very dumb idea.


postgrest is not (and does not use) an ORM

Postgrest is more like a compiler: it takes one language (REST) as input and produces another language (SQL) as output.

It has 0 relation to the ORM concept
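
A rough illustration (made-up films table; the real generated statement also handles JSON aggregation, limits, etc.):

    -- GET /films?select=title,year&year=gte.2000&order=year.desc
    -- compiles to roughly:
    select title, year
      from api.films
     where year >= 2000
     order by year desc;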


"ORM" is more like a compiler, it takes one language/(api can be seen as a language) as input and outputs another language (SQL) as output.


totally in agreement with you here. an ORM could also be viewed as a compiler. the original comment didn't deserve the backlash.


"PostgREST is a standalone web server that turns your PostgreSQL database directly into a RESTful API." How did you manage to understand the completely opposite from the very first definition?


Because I know the codebase, maybe :)? How does the tagline say "postgrest is an ORM", and how am I interpreting it in the exact opposite way?


This is nothing like what you're describing.

Postgrest is not an ORM, it just takes rows from your database and sends them to the client.

It encourages you to write SQL. Write a decent SQL schema. Write SQL views on top of that schema to provide the rows to your API. Even write SQL functions to modify the data in reaction to client calls!

If you hate ORMs and love SQL (especially Postgres, because it's amazing) this is the tool you should be using.


Postgrest is actually an RRM - a REST-relational mapper.

My point is: exposing the db over HTTP (REST, GraphQL, ...) is a bad architectural design and a bad idea.


Between postgrest/postgraphile/hasura there are 30k stars; have all those people fallen for this "dumb" idea :)?

You say "expose db" and imply that one would directly give access to the core tables, thereby coupling the api to the data model (one should expose only views and stored procedures) and making the api inflexible.

Or are you saying this is insecure, and if so, can you describe how one would "break" this?


> My point is - exposing db over HTTP (REST, GraphQL, ...) is bad architectural design and idea.

I'm with you that exposing a fairly free-form query language over HTTP seems like something that's gonna bite several someones in the ass pretty hard at some point (GraphQL) but what's that got to do with PostgREST?


Actually... :) Postgrest has the same "capabilities" as GraphQL, so if one is not careful and just (naively) exposes tables with millions of rows with just postgrest (and no proper indexes) it can get bad fast, since the client could create queries that join and return a lot of data.

That's why one does not deploy postgrest alone but uses it in a setting like this https://github.com/subzerocloud/postgrest-starter-kit

where, at the proxy level, you can say things like "return 400 if the client did not provide at least 2 filters for the table" or "allow filtering on columns a and b because they have indexes, but not on c".

But again, those are things to consider when you have datasets in the hundreds of thousands or millions of rows; you don't need to care about this when you have tables with tens of thousands.


Well sure they have the same capabilities from a certain perspective, in that both ultimately send data from a database to a client, but GraphQL does this entirely through a fairly free-form query language with either auto-generated or custom mapping code to turn that into DB queries (though in all cases it has to handle a fairly flexible query set or "shape", or else what's the point of using GraphQL), while PostgREST requires views and stored procedures to handle all but the most trivial cases. I think the latter's way less likely than the former to cause unexpected problems or force tons of defensive coding and cause anxious, sleepless nights—PostgREST seems to me closer to what's normal for any more traditional DB-to-HTTP-client CRUD glueware (Rails, Django—like 99% of the dynamic web, really) when it comes to inherent risk factor.

Exposing a poorly-indexed table can happen in any ol' Rails API app with the usual lazy 3-line controller methods bouncing about its codebase—that's the same level of risk in any system using a database, more or less. I don't see PostgREST as being more likely to cause that to happen. GraphQL's trickier, especially from a performance perspective, because the scope of a query is so hard to determine or account for up-front, and it's hard to confidently answer the question "what's the largest possible scope of a request to this endpoint?".


Have you looked into postgrest's "select" parameter? Your comment about the "scope" of the query applies here too.

Also, GraphQL is a protocol and you can't really say it will have poor performance; it all depends on the specific implementation of a particular schema. You can have a GraphQL schema where each query gets translated into a single sql query [1], just like with postgrest. One thing that is true is that the reference implementation for graphql servers (but only because it has to be general) pushes one in a direction where a single graphql request might generate a storm of sql queries, but it's on the developer to understand and write proper resolvers.

[1] https://github.com/subzerocloud/subzero-starter-kit


So what is the alternative that you propose?


> I lost count of how many different libs/frameworks i saw that exposed CRUD through HTTP, all failed miserably, because it is actually very dumb idea.

The whole point of HTTP is CRUD (Create Read Update Delete). PostgREST maps the HTTP Methods POST/GET/PATCH/DELETE to SQL's INSERT/SELECT/UPDATE/DELETE.


PostgREST actually pushes you to know your PostgreSQL. It aims to be a transparent, yet safe/sandboxed, layer over PostgreSQL.

The generated SQL statements over HTTP requests are there to help reduce SQL boilerplate but you're expected to write custom views, functions, triggers, constraints, etc and understand "low level" pg.


This is a direct mapping of the schema into a REST format to be consumed by frontend applications. It doesn't get much "SQL"ier as far as the frontend is concerned. What alternative would you present?



