Hi! I'm one of the two authors here. At Materialize, we're definitely of the 'we are a bunch of voices, we are people rather than corp-speak, and you get our largely unfiltered takes' flavor. This is my (and George's from Fivetran) take. In particular this is not Frank's take, as you attribute below :)
> SQL is declarative, reactive Materialize streams are declarative on a whole new level.
Thank you for the kind words about our tech, I'm flattered! That said, this dream is downstream of Kafka. Most of our quibbles with the Kafka-as-database architecture have to do with the fact that it neglects the work that needs to be done _upstream_ of Kafka.
That work is best done with an OLTP database. Funnily enough, neither of us is building OLTP databases, but this piece is largely a defense of OLTP databases (if you're curious: yes, I'd recommend CockroachDB) and of their virtues at the head of the data pipeline.
Kafka has its place - and when it's used downstream of CDC from said OLTP database (using, e.g., Debezium), we could not be happier with it (and we say so).
The best example is foreign key checks: the Kafka-as-database approach fares badly if you ever need to enforce them (which translates to checking a denormalization of your source data _transactionally_ while deciding whether to admit or deny an event). This is something you may not need in your data pipeline on day 1, but adding it later is a trivial schema change with an OLTP database, and exceedingly difficult with a Kafka-based event-sourced architecture.
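For concreteness, here is roughly what "adding it later" looks like against a Postgres-compatible OLTP database via node-postgres; the table and column names are made up for illustration:

```typescript
import { Client } from "pg";

// Hypothetical schema: line_items.product_id should now reference products.id.
// Retrofitting the check is one DDL statement; from then on, every write is
// validated transactionally by the database itself.
async function addForeignKey() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query(`
      ALTER TABLE line_items
        ADD CONSTRAINT line_items_product_fk
        FOREIGN KEY (product_id) REFERENCES products (id)
    `);
  } finally {
    await client.end();
  }
}
```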
> Normally you'd have single writer instances that are locked to the corresponding Kafka partition, which ensure strong transactional guarantees, IF you need them.
This still does not deal with the use case of needing to add a foreign key check. You'd have to do something like the following (a rough sketch in code follows the list):
1. Log "intents to write" rather than writes themselves in Topic A
2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from; reads of this denormalization have to wait until the intent from Topic A has propagated into it.
3. Convert those intents into commits.
4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.
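For a sense of the shape of that, a minimal sketch with kafkajs; the topic names, message formats, and the `denormalizedViewContains` lookup are all invented for illustration, and step 4 is deliberately left as a comment:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "order-writer", brokers: ["localhost:9092"] });
const producer = kafka.producer(); // producer.connect() omitted for brevity

// Step 1: log an intent to write, not the write itself, into Topic A.
async function logIntent(orderId: string, productId: string) {
  await producer.send({
    topic: "topic-a",
    messages: [{ key: orderId, value: JSON.stringify({ kind: "intent", orderId, productId }) }],
  });
}

// Step 2: a separate consumer maintains a denormalization fed from Topic B
// (here, "which product IDs exist"), and it must only be consulted once the
// intent from Topic A has propagated into it.
async function denormalizedViewContains(productId: string): Promise<boolean> {
  // ...read the view derived from Topic B up to the required offset...
  return true; // placeholder
}

// Step 3: turn the intent into a commit (or an abort).
async function resolveIntent(orderId: string, productId: string) {
  const kind = (await denormalizedViewContains(productId)) ? "commit" : "abort";
  await producer.send({
    topic: "topic-a",
    messages: [{ key: orderId, value: JSON.stringify({ kind, orderId }) }],
  });
}

// Step 4 is where the real cost lives: crashes between intent and resolution,
// duplicate resolutions, abandoned intents, readers that must skip uncommitted
// intents, and so on.
```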
If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds. And hopefully, yes, have a reactive declarative stack downstream of that as well!
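As a sketch of the CDC side, here is a consumer of change events with kafkajs; the topic name and the Debezium-style `before`/`after` envelope are assumptions about your setup:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "read-side", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "orders-materializer" });

async function run() {
  await consumer.connect();
  // Topic produced by, e.g., Debezium or a CockroachDB changefeed.
  await consumer.subscribe({ topic: "db.public.orders", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const change = JSON.parse(message.value!.toString());
      // The OLTP database already enforced constraints before this event existed;
      // downstream views just react to committed state changes.
      applyToReadModel(change.after ?? null);
    },
  });
}

function applyToReadModel(row: unknown) {
  // update a cache, search index, materialized view, ...
}
```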
> 1. Log "intents to write" rather than writes themselves in Topic A 2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A. 3. Convert those intents into commits. 4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.
People do do this. I have done this. I wish I had been more principled with the error paths. It got there _eventually_.
It was a lot of code and complexity to ship a feature which in retrospect could have been nearly trivial with a transactional database. I'd say months rather than days. I won't get those years of my life back.
The products were built on top of Kafka, Cassandra, and Elasticsearch, where, over time, there was a desire to maintain some amount of referential integrity. The only reason we bought into this architecture at the time was horizontal scalability (not even multi-region). Kafka, sagas, and 2PC at the "application layer" can work, but you're going to spend a heck of a lot on engineering.
It was this experience that drove me to Cockroach and I've been spreading the good word ever since.
> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.
This is the next chapter in the gospel of the distributed transaction.
>> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.
> This is the next chapter in the gospel of the distributed transaction.
Actually, it's the opposite: CDC helps to avoid distributed transactions; apps write to a single database only, and other resources (Kafka, other databases, etc.) are updated based on that asynchronously, in an eventually consistent fashion.
I mean, I hear you that people do that and it does allow you to avoid needing distributed transactions. When you're stuck with an underlying NoSQL database without transactions, this is the best thing you can do. This is the whole "invert the database" idea that ends up with [1].
I'm arguing that that usage pattern was painful, and that if you can have horizontally scalable transaction processing plus a downstream stream of events from committed transactions, you can use that to power correct, easy-to-reason-about materializations as well as asynchronous task processing.
Transact to change state. React to state changes. Prosper.
> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.
100% agree this is the way to go. Instead of rolling your own transaction support, you get ACID for free from the DB and use Kafka to archive changes and subscribe to them.
It seems like Kafka could be used to record requests and responses (upstream of the database) and approved changes (downstream of the database), but isn't good for the approval process itself?
Hey Arjun,
Thanks for taking the time to reply!
I didn't mean to suggest that this was Frank's opinion;
I merely explained to the other user, who seemed to have a more negative attitude towards Materialize, why I am extremely excited about it: namely, Frank being the CTO and having an extremely good track record in terms of research and code.
I think the general gist of "use an OLTP database as your write model if you don't absolutely know what you're doing" is completely sane advice; however, I think there are far more nuanced (and in a sense also more honest) arguments that can be made.
I think the architecture you've sketched is over-engineered for what you'd need for the task. So here's what I'd build for your inventory example, IFF the inventory were managed for a company with a multi-tera-item inventory that absolutely NEEDS horizontal scaling:
One event topic in Kafka, partitioned by the IDs of the different stores whose inventory is to be managed.
This makes the (arguably) strong assumption that inventory can never move between stores, but for, e.g., a grocery chain that's extremely realistic.
We have one writer per store-ID partition, which generates the events and enforces _serialisability_, with a hot writer failover that keeps up to date and a STONITH mechanism connecting the two. All writing REST calls / GraphQL mutations for its store-ID range go directly to that node.
The node serves all requests from memory, out of an immutable data structure (e.g. Immutable.js, Clojure maps, DataScript) or an embedded DB that supports transactions and rollbacks, like SQLite.
Whenever a write occurs, the writer generates the appropriate events, applies them to its internal state, validates that all invariants are met, and then emits the events to Kafka.
Kafka acknowledging the write is potentially much quicker than acknowledging an OLTP transaction, because Kafka only needs to get the events into the memory of 3+ machines, rather than onto the disk of 1+ machines (I'm ignoring OLTP validation overhead here because our writer already did that).
Also, your default failure resistance is much higher than what most OLTP systems provide in their default configuration (e.g. it's equivalent to Postgres synchronous replication).
Note that the critical section doesn't actually have to be the whole "generate event -> apply -> validate -> commit to Kafka" code. You can optimistically generate events, apply them, and then invalidate and retry all other attempts once one of them commits to Kafka. However, that also introduces coordination overhead that might be better spent mindlessly working off requests one by one.
Once the write has been acknowledged by Kafka, you swap the variable/global/atom with the new immutable state or commit the transaction, and continue with the next incoming request.
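A condensed sketch of that writer loop, using Immutable.js for the in-memory state and kafkajs for the topic; every name, the event shape, and the invariant are invented for illustration, and failover/STONITH is out of scope here:

```typescript
import { Map as ImmutableMap } from "immutable";
import { Kafka } from "kafkajs";

type Event = { type: "ItemRemoved"; storeId: string; itemId: string; qty: number };

const kafka = new Kafka({ clientId: "store-writer", brokers: ["localhost:9092"] });
const producer = kafka.producer(); // producer.connect() omitted for brevity

// Current state for the partition this writer owns: itemId -> quantity on hand.
let state = ImmutableMap<string, number>();

function apply(s: ImmutableMap<string, number>, e: Event) {
  return s.update(e.itemId, 0, (qty) => qty - e.qty);
}

function invariantsHold(s: ImmutableMap<string, number>): boolean {
  // Example invariant: inventory never goes negative.
  return s.every((qty) => qty >= 0);
}

// One write request, handled serially by the single writer for this store ID.
async function handleRemoveItem(storeId: string, itemId: string, qty: number) {
  const event: Event = { type: "ItemRemoved", storeId, itemId, qty };
  const next = apply(state, event);               // apply to the immutable state
  if (!invariantsHold(next)) throw new Error("rejected: would violate invariant");

  await producer.send({
    topic: "inventory-events",
    messages: [{ key: storeId, value: JSON.stringify(event) }], // key -> partition per store
    acks: -1,                                      // wait for the in-sync replicas
  });

  state = next;                                    // swap state only after Kafka acks
}
```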
All the other (reading) requests are handled by various views on the Kafka topic (the one causing the inconsistencies in the article). They might be lagging behind a bit, but that's totally fine, as all writing operations have to go through the invariant-enforcing write model anyway.
So they're allowed to be slow-ish, or to have varying QoS in terms of freshness.
The advantage of this architecture is that you have few moving parts, and those are nicely decomplected, as Rich Hickey would say: you use the minimal state for writing, which fits into memory and caches; you get zero lock contention on writes (no locks); you make the boundaries for transactions explicit and gain absolutely free rein over constraints within them; and you get serialisability for your events, which is super easy to reason about mentally.
Plus you don't pay performance penalties for recomputing table views for view models (if you don't use change capture and Materialize already, which of course one should ;] ).
The Two Generals problem dictates that you can't have more than a single writer in a distributed system for a single "transactional domain" anyway. All our consensus protocols are fundamentally leader election. The same is true for OLTP databases internally (threads, table/row locks, etc.), so if you can't handle the load on a single zero-overhead writer that just takes care of its small transactional domain, then your database will run into the exact same issues, probably earlier.
Another advantage of this that so far has gone unmentioned is that it allows you to provide global identifiers for the state in your partitions, which can be communicated in side-effect-full interactions with the outside world.
If your external service allows you to store a tiny bit of metadata with each effect-full API call, then you can include the offset of the current event, and thus of the state.
That way you can subsume the external state and transactional domain into the transactional domain of your partition.
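A sketch of that idea, assuming the external service accepts a client-chosen idempotency key or metadata field (the endpoint, headers, and parameters here are hypothetical):

```typescript
// The writer knows the Kafka offset of the event that caused this side effect.
// Sending that offset along ties the external call to a specific point in the
// partition's history: on recovery, you can replay from a known offset and ask
// the service "have you already seen offset N?" instead of guessing.
async function chargeWithOffset(orderId: string, amountCents: number, eventOffset: string) {
  await fetch("https://payments.example.com/charges", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Hypothetical idempotency mechanism; many external APIs offer something similar.
      "Idempotency-Key": `inventory-partition-7:${eventOffset}`,
    },
    body: JSON.stringify({ orderId, amountCents, metadata: { eventOffset } }),
  });
}
```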
Now, I think that's a much more reasonable architecture, one that at least doesn't have any of the consistency issues.
So let's take it apart and show why the general populace is much better served with an OLTP database:
- Kafka is an ops nightmare. The setup of a cluster requires A LOT of configuration. Also ZooKeeper, urgh; they're luckily trying to get rid of it, but I think they only dropped it this year, and I'm not sure how mature that is.
- You absolutely 100% need immutable data structures, or something else that manages transactions for you, inside the writer. You DO NOT want to manually roll back changes in your write model. Defensive copying is a crutch, slow, and error-prone (cue JS, urgh...: {...state}).
- Your write model NEEDS to fit into memory. That thing is the eye of the needle that all your data has to go through. If you run the single-event-loop variant, latency during event application WILL break you.
If you do the optimistic concurrency variant, performing the validation checks might be as expensive as, or more expensive than, recomputing the events from scratch.
- Be VERY wary of communication with external services inside your write model. It introduces latency, and it breaks the transactional boundary you set up before. To be fair, OLTP databases also suffer from this, because it's distributed consistency with more than one writer and arbitrary invariants, which this universe simply doesn't allow for.
- As mentioned before, it's possible to optimistically generate and apply events thanks to the persistent data structures, but that is also logic you have to maintain, and it is essentially a very naive and simple embedded OLTP. Also, be wary of what you think improves performance vs. what actually improves it.
It might be better to have one cool core than 16 really hot ones doing wasted work.
- If you don't choose your transactional domains well, or the requirements change, you're potentially in deep trouble. You can't transact across domains/partitions; if you do, they're effectively the same domain, and you potentially overload a single writer.
- Transactional domains are actually not as simple as they're often portrayed. They can nest and intersect. You'll always need a single writer, but that writer can delegate responsibility, which might be a much cheaper operation than the work itself.
Take bank accounts as an example. You still need a single writer/leader to decide which account IDs are currently in a transaction with each other, but if two accounts are currently free, that single writer can tie them into a single transactional domain and delegate it to a different node, which will perform the transaction, write it, and return control to the "transaction manager".
A different name for such a transaction manager is an OLTP database (with row-level locking); a toy sketch of this delegation follows the list.
- You won't find as many tutorials; if you're not comfortable reading scientific papers, or at least academic-grade books like Martin Kleppmann's "Designing Data-Intensive Applications", don't go there.
- You probably won't scale beyond what a single OLTP DB can provide anyway. Choose tech that is easy to use and gives you as many guarantees as possible, if you can. With change capture you can also do retroactive event analytics and views, but you don't have to code up a write model (and an associated framework, because let's be honest, this stuff is still cutting edge, and it really shines in bespoke custom solutions for bespoke custom problems).
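To make the bank-account bullet above concrete, here is a toy in-process version of that delegation (purely illustrative; a real transaction manager also needs timeouts, fencing, and crash recovery):

```typescript
// A single writer acting as a "transaction manager": it only decides which
// account IDs are currently tied together, then hands the actual work off.
const locked = new Set<string>();

type Transfer = { from: string; to: string; amountCents: number };

async function coordinate(t: Transfer, worker: (t: Transfer) => Promise<void>) {
  if (locked.has(t.from) || locked.has(t.to)) {
    throw new Error("accounts busy, retry later"); // or queue the request
  }
  locked.add(t.from);
  locked.add(t.to);
  try {
    // Delegate: another node performs the transfer and writes its events.
    await worker(t);
  } finally {
    // Return control to the "transaction manager".
    locked.delete(t.from);
    locked.delete(t.to);
  }
}
```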
Having such an architecture under your belt is a super useful tool that can be applied to a lot of really interesting and hard problems, but that doesn't mean that it should be used indiscriminately.
Afterthought: I've seen so many people use an OLTP database but then perform multiple transactions inside a single request handler, just because that's what their ORM was set to. So I'm just happy about any thinking that people spend on transactions and consistency in their systems, in whatever shape or form, and I think making the concepts explicit, instead of hiding them in a complex (and, for many, unapproachable) OLTP/RDBMS monster, helps with that (whether Kafka is less of a monster is another story).
I think it's also important not to underestimate the programmer convenience of working with language-native (persistent) data structures. The writer itself, in its naive implementation, is something one can understand in full, and not relying on opaque, transaction-breaking ORMs is a huge win.
PS:
Plz start something collaborative with ObservableHQ; having reactive, notebook-based dashboards over reactive Postgres queries would be so, so, so, so awesome!