> Assuming you know how your data needs to be queried when you create your table, then the hash/range key combination could be the answer to your query problems. However we did not.
DynamoDB's purpose is to allow you to easily scale a known query pattern. I definitely don't recommend it as a first solution when you don't know how the data is going to be accessed. Or when everything fits on one PostgreSQL box.
We've had a lot of success starting out with a big PostgreSQL box and migrating use cases to DynamoDB when a table gets too big and has a query pattern that fits (essentially K/V lookup with some simple indexes).
The good news is that postgres is going to get multi-master replication soon in postgres 10, so a lot of the pain points when it comes to scaling our writes will hopefully be solved.
I think the poster of that claim is confusing "plain" logical replication and multi-master replication. There's a lot of infrastructure in common between both, but the conflict resolution part of async MM didn't make it into postgres 10...
How does multi-master replication solve write scalability issues? I'm not familiar with the postgres implementation, but don't all writes have to happen on all servers? And if all writes need to happen on all servers, how does that improve write scalability, when the write load on each server is the same as it would be without multi-master replication?
> How does multi-master replication solve write scalability issues? I'm not familiar with the postgres implementation, but don't all writes have to happen on all servers?
It doesn't generally. You can often increase total throughput because independent transactions can be batch-committed when replicated, which helps noticeably, but that's more of a constant-factor improvement than a real scalability improvement.
What it does often allow you to do, however, is:
a) Address latency issues with writers distributed across the globe. If you have masters on each continent, your total write rate can often be massively higher than if application servers on each of those continents all write to a single centralized database.
b) Partially replicate your data globally, partially not. Quite often you'll want some data available everywhere (typically metadata, security-related data, etc.), while keeping a lot of the bulk data available only in certain places.
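For (b), logical replication in v10 gives a rough, one-directional flavour of what selective replication looks like in postgres terms: you publish only the tables that should go everywhere, and the bulk tables simply aren't part of any globally subscribed publication. A minimal sketch, with made-up table names:

-- publish only the globally needed metadata tables; bulk tables stay local
CREATE PUBLICATION global_metadata FOR TABLE users, permissions;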
> > a) Address latency issues with writers distributed across the globe
> How? Surely to provide ACID compliance all writers need to co-ordinate a lock if they want to update a record.
Well, there's different systems out there. What you want for something like this is asynchronous multi-master. Which, as you observe, allows for conflicts to occur, which need to be resolved somehow.
> Same with incrementing IDs etc.
The solution for those is usually to have each node assign sequences with different start values, to have a system that distributes "chunks" of values (that's what BDR, an async MM version of postgres, used to do), or to avoid sequences in favor of UUIDs or similar.
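A minimal sketch of the offset-sequence approach for two async MM nodes, with a made-up sequence/table name, and gen_random_uuid() from pgcrypto as the UUID alternative:

-- on node A: hand out odd ids
CREATE SEQUENCE orders_id_seq START WITH 1 INCREMENT BY 2;
-- on node B: hand out even ids
CREATE SEQUENCE orders_id_seq START WITH 2 INCREMENT BY 2;
-- or avoid sequences entirely:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE orders (id uuid PRIMARY KEY DEFAULT gen_random_uuid());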
> postgres is going to get multi-master replication soon in postgres 10
No it's not. Postgres barely has decent replication as is. Logical replication is a good step beyond (while still missing major functionality) but nowhere near what it takes for multi-master replication.
> No it's not. Postgres barely has decent replication as is.
You're overly negative whenever postgres comes up on HN. Obviously the v10 stuff is not multi-master and the project definitely is not claiming it is, but saying that Postgres "barely has decent replication as is" is a bit absurd given that it works quite well in a lot of high-throughput scenarios at a lot of companies, and has for 5+ years. There are a lot of valid complaints about it you can make (especially around flexibility and operational issues), but it's definitely not just "barely decent".
I'm not saying that you have to be a fan. It's fine to say the replication isn't good enough for everything, or to just ignore postgres.
> but nowhere near what it takes for multi-master replication.
FWIW, there's async MM built on top of the relevant pieces, and it's used in production. It's not ready for wide audiences, but arguably (and I think one can very validly disagree about that) it's closer than "nowhere near".
What is the point of this comment? And then linking to my past comments? Is it wrong to have perspective and criticism or to express it multiple times?
I get that you're a committer but "overly negative" is as much your opinion as what I say, and my comments are the truth as I see it. I've even spent an hour talking to Ozgun about databases and moving to Citus before so my statements come from being a user, operator and maintainer of dozens of different database technologies in production for years, including postgres right now, and from all that experience it is definitely the most behind when it comes to replication/HA compared to anything else I use.
All of those threads have plenty of upvotes and other user comments so it's clear I don't stand alone here. Perhaps instead of policing my statements, time would be better spent on improving the features and experience of the product.
To let you know that I find your comments overly negative, and that they might have a bigger impact if formulated in a bit more relative manner? I mean:
> Postgres is probably the weakest of all relational databases when it comes to scalability, both vertical and horizontal.
Doesn't strike me as a particularly useful comment, given the (to me) obvious hyperbole. If you'd instead, at least every now and then, explain what precisely you find lacking, people - like me - could actually draw useful conclusions.
I only said something because I remembered your name from a couple times reading quite negative and undifferentiated comments. And I linked to the comments because I searched for them to validate whether my memory of the past comments was/is correct.
> Perhaps instead of policing my statements, time would be better spent on improving the features and experience of the product.
That postgres is behind in scaling (multi-core, replication, sharding, multi-master) against all modern relational databases is verifiable truth. Your judgement is clouded by your proximity to the project. Here's just 1 prior thread between us: https://news.ycombinator.com/item?id=13841496
-> Enable 1-line command to setup basic replication (whole node or per db level) without having to edit anything else. Look at redis or memsql for inspiration.
-> Enable 1-line command to easily failover, remove hotstandby requirements.
-> Add DDL support to logical replication because I guarantee this will be the #1 source of problems for anyone that tries to use this.
-> Fix the poor defaults in all the configs.
-> Document the existence of pg_conftool and make it a first-class tool that can also edit pg_hba.conf automatically so I no longer have to deal with any config files. Store any other settings in the database cluster itself. Inspiration from sql server on linux or cockroachdb here.
IT/admin UX at this point is more important than the diminishing core upgrades. Postgres is good enough internally but has sorely fallen behind on usability, with obsolete defaults, convoluted configs and way too many 3rd party tools needed for basic scaling. Saying "use X tool" is effectively the same as "just use Y database that already does it". MySQL, MariaDB, even SQL Server are not standing still.
Solid in-cluster and cross-dc replication is the reason we stuck with memsql (+ built-in columnstore), not because it was possible but because it actually works easily and reliably with their fantastic memsql ops program. I've shared all of this feedback online over the years (along with many others) so the lack of understanding is really more a reflection of the postgres team and poor priority planning.
Just saw this now, I do wish HN had a proper notification feature...
> That postgres is behind in scaling (multi-core, replication, sharding, multi-master) against all modern relational databases is verifiable truth.
No, it's really not. It's hardly best in class in those areas, but the worst? Meh. That's just my point: you're making way too strong statements.
Just to start with your first point: multi-core. MySQL has just about no intra-query parallelism, whereas postgres 9.6 has some limited support, and postgres 10 is decent, although far from great. Postgres' inter-query parallelism was for a long while better than MySQL's, then MySQL was better for a couple of years, and in the last couple of years it's gone back and forth and depends on the specific use case. So how is postgres verifiably worse than all other "modern" (whatever that means) relational databases?
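(For anyone wondering what "some limited support" means in practice: 9.6 added parallel sequential scans, joins and aggregates, gated by max_parallel_workers_per_gather. A rough illustration, where big_table is a placeholder and whether the planner actually picks a parallel plan depends on table size and cost settings:

SET max_parallel_workers_per_gather = 4;
-- with a large enough table the plan shows a Gather node over a Parallel Seq Scan
EXPLAIN SELECT count(*) FROM big_table;
)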
> Your judgement is clouded by your proximity to the project.
I do remember that, but there you're again being super reductionist. You're arguing that having to do
CREATE PUBLICATION all_tables FOR ALL TABLES;
before subscribing to replication somehow makes logical replication fundamentally too complicated ("overbuilt mess").
Yes, it's true that the logical replication stuff has some serious limitations. A significant subset already has patches out there for the next version, but that doesn't help current users.
The streaming replication support that has existed since 9.0 is pretty decent however, and it has gotten a lot easier to use in the last releases, especially with the saner defaults in v10.
> Enable 1-line command to setup basic replication (whole node or per db level) without having to edit anything else. Look at redis or memsql for inspiration.
It's two commands for both physical and logical replication now. Not perfect, but not too bad either.
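For logical replication that boils down to roughly this (subscription name and connection string are placeholders):

-- on the publisher:
CREATE PUBLICATION all_tables FOR ALL TABLES;
-- on the subscriber:
CREATE SUBSCRIPTION app_sub CONNECTION 'host=primary.example.com dbname=app' PUBLICATION all_tables;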
> Add DDL support to logical replication because I guarantee this will be the #1 source of problems for anyone that tries to use this.
Yup, I've argued just that point, and I've implemented it in a precursor of what's now in core for logical replication. I sure hope that's going to largely get into v11.
> Enable 1-line command to easily failover, remove hotstandby requirements.
Not sure what you mean by "hotstandby requirements"? The defaults? Those have been fixed in v10. Or not being able to do certain things on a standby?
> Fix the poor defaults in all the configs.
We're making some progress, e.g. everything's now by default ready for streaming replication, but there's a lot more to do. It's sometimes hard because the many different use cases ask for different defaults.
> Solid in-cluster and cross-dc replication is the reason we stuck with memsql (+ built-in columnstore)
What specifically are you seeing lacking w/ cross-dc replication? I completely agree that there's UX/complexity issues around replication, but I don't see how those affect cross-dc specifically? There's plenty of people quite happily using cross-dc replication, once it's set up.
> I've shared all of this feedback online over the years (along with many others) so the lack of understanding is really more a reflection of the postgres team and poor priority planning.
Or perhaps also a question of funding, time and communication style.
Wouldn't multi-master replication be bad for write throughput? Depending on the scheme, every write will have to be committed on all machines, therefore increasing write operation complexity.
If you want to scale up your writes, sharding would be a better solution.
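A minimal sketch of the idea (shard count and key are made up): route each row by a hash or modulo of its key so every primary only commits its own slice of the writes.

-- application-side routing rule, e.g. user_id modulo the number of shards:
-- shard 0 holds user_id % 4 = 0, shard 1 holds user_id % 4 = 1, and so on,
-- so each primary sees roughly a quarter of the total write load.
SELECT 123457 % 4 AS shard_number;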