
I'm one of the makers of ParadeDB, a modern alternative to Elasticsearch. We build Postgres extensions to do fast search (pg_bm25) and analytics (pg_analytics). I love Postgres. If you have a small workload, like a startup, it certainly makes sense to stay within Postgres as long as you can.
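To make "stay within Postgres" concrete for search, here's a minimal sketch using only the built-in full-text primitives (the articles table and its columns are made up for illustration):

    -- stock Postgres 12+: a generated tsvector column plus a GIN index
    ALTER TABLE articles
      ADD COLUMN tsv tsvector
      GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      ) STORED;

    CREATE INDEX articles_tsv_idx ON articles USING GIN (tsv);

    -- ranked search; ts_rank isn't BM25, but it covers a lot of ground
    SELECT id, title, ts_rank(tsv, query) AS rank
    FROM articles, to_tsquery('english', 'postgres & search') AS query
    WHERE tsv @@ query
    ORDER BY rank DESC
    LIMIT 10;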

The problem is that, at scale, Postgres isn't the answer to everything. Each of the workloads you put in Postgres starts to grow very specific requirements, you need to isolate systems to get independent scaling and resilience, and so on. At that point you need a stack of specialized solutions for each requirement, and that's where Postgres starts to no longer be enough.

There is a movement to build a Postgres version of most components of the stack (we are a part of it), and that might eventually give you a world where you can use Postgres at scale for everything. But really, each such solution becomes quite a bit more than Postgres, and I doubt there will be a Postgres-based solution for every component of the stack.




what is "at scale"? Is there a specific metric or range of metrics that raises a flag to begin considering something else? For example, in the olden days when it was my problem, page load times were the metric. Once it got high enough you looked for the bottleneck, solved it, and waited. When the threshold was broken again you re-ran the same process.

Is there an equivalent for postgres?


This bugs me every time performance comes up. No one is ever concrete, so they can never be wrong.

If Michael Jackson rose from the dead to host the Olympics opening ceremony and there were 2B tweets/second about it, then postgres on a single server isn't going to scale.

A CRUD app with 5-digit requests/second? It can do that. I'm sure it can do a lot more, but I've only ever played with performance tuning on weak hardware.

Visa is apparently capable of 5-digit transaction throughput ("more than 65,000")[0], for a sense of what kind of system reaches even that scale. Their average throughput is more like 9k transactions/second[1].

[0] https://usa.visa.com/solutions/crypto/deep-dive-on-solana.ht...

[1] PDF. 276.3B/year ~ 8.8k/s: https://usa.visa.com/dam/VCOM/global/about-visa/documents/ab...


minor nit: 9K TPS for Visa are business transactions - TBD how many database transactions are generated...

(still, modern PostgreSQL can easily scale to 10,000s (plural) of TPS on a single big server, especially if you set up read replicas for reporting)
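As a rough sketch of the read-replica part (PG 12+; host and database names are made up): restore a base backup of the primary onto the standby, then

    # mark the data directory as a standby
    touch $PGDATA/standby.signal

    # postgresql.conf on the standby
    primary_conninfo = 'host=primary.internal user=replicator'
    hot_standby = on    # serve read-only queries

    # point reporting/BI connections at the standby, OLTP at the primary
    psql "host=standby.internal dbname=app"

Reporting queries then run on the standby without touching the primary's write path.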


Yeah, I don't mean to say Visa can run global payment processing on a single postgres install; I'm sure they do a ton of stuff with each transaction (e.g. for fraud detection). But for system design, it gives an order of magnitude for how many human actions a global system might need to deal with, which you can use to estimate how much a wildly successful system might need to handle based on what processing is needed for each human action.

For similar scale comparisons, reddit gets ~200 comments/second peak. Wikimedia gets ~20 edits/second and 1-200k pageviews/second (their grafana is public, but I won't link it since it's probably rude to drive traffic to it).


yes, we're in violent agreement!

interesting re reddit, that's really tiny! but again, I'm even more curious how many underlying TPS this turns into, net of rules firing, notifications, and of course the bots that read and analyze these comments. Still, that isn't a scaling issue, because all of this work can be done async on read replicas, which means approximately unlimited scale with a single database under management (e.g. "here's this particular comment ID, wait for it").

Slack experiences 300K write QPS: https://slack.engineering/scaling-datastores-at-slack-with-v...


The truth is that it really depends on your application workload. Is it read-heavy or write-heavy? Are the reads more lookup-heavy (e.g. give me this one user's content) or OLAP-heavy (e.g. `group by`s aggregating millions of rows)? Is read-after-write an important problem in your application? Do you need to support real-time/"online" updates? Does your OLAP data need to be mutable, or can it be immutable (and therefore compressed, columnarized, etc.)? Is your schema static or dynamic, and to what degree?

I agree with others that a good simplification is "how far can you get with the biggest single AWS instance?" And the answer is: really far, for many common values of the above variables.

That being said, if your workload is more OLAP than OLTP, and especially if it needs to be real-time, Postgres will struggle to give you good performance without maxing out I/O and memory. Hence "it really depends on your workload", and hence why it's common to "pair" Postgres with technologies like ClickHouse (OLAP, immutable, real-time) or RabbitMQ/Kafka/Redis (real-time, write-heavy, persistence secondary to throughput).
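For a sense of the OLAP shape I mean, this is the kind of query that starts to hurt on a vanilla row-store Postgres as the table grows (schema is hypothetical):

    -- dashboard-style aggregation over tens of millions of rows
    SELECT date_trunc('day', created_at) AS day,
           country,
           count(*)          AS orders,
           sum(amount_cents) AS revenue_cents
    FROM   orders
    WHERE  created_at >= now() - interval '90 days'
    GROUP  BY 1, 2
    ORDER  BY 1;

A columnar store scans only the handful of columns involved; a row store pays for every column of every row it touches.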


For me with any kind of data persistence backend, it's when you go from scaling vertically to horizontally. In other words, when it's no longer feasible to scale by just buying a bigger box.

I don't know that there is a canonical solution for scaling Postgres data for a single database across an arbitrary number of servers.

I know there is CockroachDB, which scales almost limitlessly and supports the Postgres client protocol, so you can call it from any language that has a Postgres client library.


For scaling, has anyone here used hash-based partitioning to scale horizontally?

In principle, it seems like it should work to allow large-scale distribution across many servers. But the actual management of replicas, deciding which servers to place partitions on, redistributing when new servers are added, etc. could lead to a massive amount of operational overhead.
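For concreteness, single-node hash partitioning in stock Postgres looks like this (table name made up); spreading and rebalancing those partitions across servers is exactly the part Postgres leaves to you:

    CREATE TABLE users (
        id    bigint NOT NULL,
        email text
    ) PARTITION BY HASH (id);

    CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);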


As other sibling comments noted, Citus does this pretty well. Recommend reading through their docs and use-cases. There are some migration/setup costs, but once you have a good configuration, it mostly just works.
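Roughly, the setup looks like this (table and column names are made up; create_distributed_table is the actual Citus call):

    CREATE EXTENSION citus;

    CREATE TABLE events (
        tenant_id bigint NOT NULL,
        id        bigint NOT NULL,
        payload   jsonb
    );

    -- shard rows across the worker nodes by tenant_id
    SELECT create_distributed_table('events', 'tenant_id');

Queries that filter on tenant_id route to a single shard; the rest fan out across workers.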

Main downside is that you have to either self-manage the deployment on AWS EC2 or use Azure's equivalent of AWS RDS (Citus Data was acquired by MS years ago).

FWIW, I've heard that people using Azure's solution are pretty satisfied with it, but if you're 100% on AWS going outside that fold at all might be a con for you.


Citus is indeed an example of "distributed Postgres". There are also serverless Postgres offerings (Neon, Nile, AWS Aurora) that do this.

If you are interested in partitioning in an OLAP scenario, this will soon be coming to pg_analytics, and some other Postgres OLAP providers like Timescale offer it already.


Hash-based partitioning makes repartitioning very expensive. Most distributed DBs now use key-range-based partitioning. IIRC, Dynamo, which introduced this concept, has also made the switch.
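For contrast, key-range partitioning in stock Postgres looks like this (names made up); splitting a hot range only touches that range, whereas changing a hash modulus reshuffles every row:

    CREATE TABLE events (
        id         bigint,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    CREATE TABLE events_2024_02 PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');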


I think that's what Citus does.


This looks great, I'll add it to my list.

I've gone far out of my way not to use Elasticsearch and to push Postgres as far as I can in my SaaS, because I don't want the operational overhead.


This is exactly why we built ParadeDB :)



