Hacker News

(I work at Notion, one of the larger Notion clones)

We experimented with using partitioned tables and doing the fancy postgres_fdw thing when we sharded our tenant-clustered tables in 2020. Even with all the Postgres instances in the same region we found the approach unwieldy. Routing queries in the application gives you a lot more control versus needing to teach Postgres the exact logic to route a query, plus do a (dangerous? one-way? global?) schema migration whenever you want to change the routing logic.
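The "routing queries in the application" approach can be sketched roughly like this (a generic illustration with hypothetical names, not Notion's actual implementation): hash the tenant/workspace ID with a stable hash so every app server agrees on the tenant-to-shard mapping, then connect to the chosen shard directly.

```python
# Sketch of application-level shard routing (hypothetical names;
# not Notion's actual code). The application, not Postgres, decides
# which physical instance a tenant's queries go to.
import hashlib

SHARD_DSNS = [
    "postgres://shard-00.internal/app",
    "postgres://shard-01.internal/app",
    "postgres://shard-02.internal/app",
    "postgres://shard-03.internal/app",
]

def shard_for(workspace_id: str, num_shards: int = len(SHARD_DSNS)) -> int:
    """Deterministically map a workspace ID to a shard index.

    Uses a stable hash (not Python's built-in hash(), which is salted
    per-process) so every app server computes the same mapping.
    """
    digest = hashlib.sha256(workspace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def dsn_for(workspace_id: str) -> str:
    """Connection string for the shard that owns this workspace."""
    return SHARD_DSNS[shard_for(workspace_id)]
```

Changing the routing logic here is an ordinary application deploy, which is the contrast being drawn with baking the same logic into Postgres partition definitions.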

We touched on this briefly in our sharding blog post here: https://www.notion.so/blog/sharding-postgres-at-notion

Another reason to avoid complex routing in Postgres is risk. If something goes wrong at 3am in a load-bearing cross-Postgres-instance query, how easy will it be to mitigate if the routing is happening in Postgres (with whatever advanced clustering system: Citus, pgzx, postgres_fdw) versus if the routing is happening in the application? For example, what if there’s a network partition between the “global” Postgres instance and the us-east-2 cluster? Maybe you’re Postgres wizards and know how to handle this with a quick schema change or something in the psql CLI, but I’d bet more on my team’s ability to write fault tolerance in the application versus in Postgres internal logic.
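One way the "fault tolerance in the application" point plays out in practice (a generic sketch, not anything specific to Notion's stack, with `query_shard` standing in for a real driver call): the app can fan a query out to shards, catch a per-shard failure, and serve partial results, whereas a single cross-instance FDW query tends to fail wholesale.

```python
# Generic sketch: fan a query out to several shards and degrade
# gracefully when one shard is unreachable, instead of failing the
# whole request the way one cross-instance query would.
from concurrent.futures import ThreadPoolExecutor

def query_shard(dsn: str, sql: str) -> list:
    # Placeholder for a real driver call (e.g. psycopg). Raises on
    # connection failure just as a real driver would.
    raise ConnectionError(f"cannot reach {dsn}")

def fan_out(dsns: list[str], sql: str) -> tuple[list, list[str]]:
    """Run sql on every shard; return (rows, unreachable_shards)."""
    rows, failed = [], []
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(query_shard, dsn, sql): dsn for dsn in dsns}
        for future, dsn in futures.items():
            try:
                rows.extend(future.result(timeout=5))
            except Exception:
                failed.append(dsn)  # log, alert, serve partial results
    return rows, failed
```

Whether partial results are acceptable is a product decision, which is exactly the kind of call that is easier to make and change in application code than inside Postgres.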




Thanks a lot for commenting and pointing me to the blog post. I do think I've seen it before but forgot about it. I've re-read it now and it's a great read!

From what I understand you decided to do sharding in the application code, and given the current state I think that makes total sense and I'd have probably done the same.

Part of my point with the blog post is that there is a built-in horizontal sharding solution in vanilla Postgres (partitioning + FDW), but it's currently badly lacking when it comes to cluster management, schema changes, distributed transactions, and more. If we put work into it, perhaps we tip the balance and the next successful Notion clone could choose to do it at the DB level.


Thanks for this commentary! I'm at a startup where we are preparing to shard Postgres. I'd be curious whether you're familiar with AWS Limitless, and how you would have approached deciding whether to use it vs. the approach in the blog post had it existed back in 2021.


I’m a solid “no” on any primary-datastore database thingy that has under 5 years of industry-wide production workload experience, and would seriously weigh it against something with 10+ years of industry use.

In 2019 when I was interviewing at companies for my next position I heard from a few places that the original Aurora for Postgres lost their data. It seems like the sentiment on mainline Aurora has improved a bit, but I would never bet my company’s future on an AWS preview technology. Better the devil you know (and everyone else knows).


What about CockroachDB? There are real-world, large-scale deployments of it (e.g. Netflix) going back more than 5 years, easily.


It might be a good choice, I don't know enough about either the technology or the market for CockroachDB expertise.


My biggest concern with Limitless – other than inherent performance issues with Aurora – is that according to their docs, it’s built on Aurora Serverless. IME, Serverless anything tends to get absurdly expensive very quickly.



