Kafka Is Not a Database

AndrewKemendo · on Dec 8, 2020

Alternatively from Jay Krebs [1] a much more thorough and nuanced discussion that is probably the best send-up on this topic.

"So is it crazy to do this? The answer is no, there’s nothing crazy about storing data in Kafka: it works well for this because it was designed to do it. Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn’t make it slower. There are Kafka clusters running in production with over a petabyte of stored data."

[1] https://www.confluent.io/blog/okay-store-data-apache-kafka/

georgewfraser · on Dec 8, 2020

That post explains that there are scenarios where it makes sense to store data permanently in Kafka. "Kafka is Not a Database" makes a different point, which is that Kafka doesn't solve any of the hard transaction-processing problems that database systems do, so it's not an alternative to a traditional DBMS. This is not a straw man---Confluent advocates for "turning the database inside out" all over their marketing materials and conference talks.

hitekker · on Dec 9, 2020

Indeed. The HN community has been quite negative on Confluent marketing Kafka as a database.

https://news.ycombinator.com/item?id=23206566

Apache Kafka also doesn’t refer to itself as a database in its own documentation:

https://kafka.apache.org/documentation/#introduction

cfontes · on Dec 8, 2020

It actually does with |Exactly-once Semantics| in fact I've been using as the single source of truth in a cash management system for almost 2 years without a single issue related to transactions.

jashmatthews · on Dec 8, 2020

How do you deal with side effects outside of the Kafka cluster?

theptip · on Dec 8, 2020

Short answer: you write your consumer's state into the same DB as you're writing the side-effects to, in the same transaction.

Long answer: say your consumer is a service with a SQL DB -- if you want to process Event(offset=123), you need to 1. start a transaction, 2. write a record in your DB logging that you've consumed offset=123, 3. write your data for the side-effect, 4. commit your transaction. (Reverse 2 and 3 if you prefer; it shouldn't make a difference). If your side-effect write fails (say your DB goes down) then your transaction will be broken, your side-effect won't be readable outside the transaction, and the update to the consumer offset pointer also won't get persisted. Next loop around on your consumer's event loop, you'll start at the same offset and retry the same transaction.

arafsheikh · on Dec 8, 2020

Persisting message offsets in DB has its own challenges. The apps become tightly coupled with a specific Kafka cluster and that makes it difficult to swap clusters in case of a failover event.

If you expect apps to persist offsets then it’s important to have a mechanism/process to safely reset the app state in DB when the stored offset doesn’t make sense.

wdb · on Dec 8, 2020

Interesting, do you have any resources about how to best handle side effects with message queues (e.g. GCP PubSub)? Trying to find out if its worth the effort, or good practice to allow replay-ability (like from backup) of message and get back to the same state

arafsheikh · on Dec 9, 2020

Not sure about PubSub, but Kafka doesn’t store messages on disk forever by default. So hydrating the state from a message bus may not be a good idea depending on what you’re doing. I’d say DB is the way to go here.

Don’t have any resources though. I’m only speaking from experience.

hodgesrm · on Dec 8, 2020

Thanks, this is a great article. The money quote for me is:

> I think it makes more sense to think of your datacenter as a giant database, in that database Kafka is the commit log, and these various storage systems are kinds of derived indexes or views.

My company supports analytic systems and we see this pattern constantly. It's also sort of a Pat Helland view of the world that subsumes a large fraction of data management under a relatively simple idea. [1] What's interesting is that Pat also sees it as a way to avoid sticky coordination problems as well.

[1] http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

skyde · on Dec 8, 2020

The problem is that Kafka API and Commit log API are very different.

If you wanted to literally use Kafka for your commit log the same way the Amazon aurora are using a distributed commit log. You would find that a lot of feature a commit log need are missing and impossible to add to kafka.

arduanika · on Dec 9, 2020

Greenspun's 11th law: Any sufficiently stateful program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of SQL / relational databases.

weddpros · on Dec 9, 2020

is that another definition for confirmation bias?

I can name so many use cases a relational database is incapable of...

Aqueous · on Dec 9, 2020

Yes. as the article points out there's no way to reject an event based on some criteria that keeps the events internally consistent. for instance, two people can't check out the same item of inventory = 1. you need to first validate this event on the materialized view of the commit log to make sure the item is still available. however, what happens when your validation goes out of date? For example, what if there's an event checking the last inventory in the event log but the materialized view doesn't reflect that yet? you end up checking out the same inventory twice and promising it to both customers! not good.

DBs can solve this through optimistic locking (atomically making sure my new event applies to the version number of the previous event + 1, otherwise failing) but there's no way to do this in Kafka as far as I know (however, as this is a problem i'm currently facing, please feel free to let me know if there is)

hinkley · on Dec 9, 2020

I’d really love for this to be formalized.a standard format streaming the data through several Susie,s and keeping them eventually consistent.

gopalv · on Dec 8, 2020

> Accumulating more stored data doesn’t make it slower

That is a valid theory when we talk about readers which look at recent data or when you are trying to append data to the existing system.

But in practice, the accumulation of cold data on a local disk is where this starts to hurt, particularly if that has to serve read traffic which starts from the beginning of time (i.e your queries don't start with a timestamp range).

KSQL transforms does help reduce the depth of the traversal, by building flatter versions of the data set, but you need to repartition the same data on every lookup key you want - so if you had a video game log trace, you'd need multiple materializations for (user) , (user,game), (game) etc.

And on this local storage part, EBS is expensive to just hold cold data, but then replicate it to maintain availability during a node recovery - EBS is like a 1.5x redundant store, better than a single node. I liked the Druid segment model of shoving it off to S3 and still being to read off it (i.e not just stream to S3 as a dumping ground).

When Pravega came out, I liked it a lot for the same - but it hasn't gained enough traction.

pram · on Dec 8, 2020

Log compaction definitely isn’t problem free. I’d say it was crazy when he wrote that.

We ran a cluster with lots of compacted topics, hundreds of terabytes of data. At the time it would make broker startup insanely slow. An unclean startup could literally take an hour to go through all the compacted partitions. It was awful.

halbritt · on Dec 8, 2020

I recommend his book, "I heart logs". It's a short read, but changed my perspective.

johnx123-up · on Dec 9, 2020

OT: Are there any Kafka alternatives in Rust or Go or C? (for a non-JVM stack)

meditative · on Dec 9, 2020

https://docs.nats.io/nats-streaming-concepts/intro

https://github.com/liftbridge-io/liftbridge

andrewem · on Dec 8, 2020

*Kreps, not Krebs

AndrewKemendo · on Dec 9, 2020

Yes sorry! Unfortunately too late to edit :(

netn3twebw3b · on Dec 8, 2020

Agreed. I feel like the tech community is afflicted with collective functional fixedness, or some sort of essentialism.

At it’s core it’s electron state in hardware. So long as those limits are not incidentally exceeded, and you validate outputs, who really cares what gets loaded?

While we rip rare minerals from the ground and toss all that at scale every 3-5 years later we get economical over installing software.

So long as it offers the necessary order of operations to do the work, whatever.

https://en.m.wikipedia.org/wiki/Functional_fixedness

j-pb · on Dec 8, 2020

Tbh, It's a weird blog post coming from the materialize folks, considering they know better.

The "event sourced" arch they sketched is missing pieces. Normaly you'd have single writer instances that are locked to the corresponding kafka partition, which ensure strong transactional guarantees, IF you need them.

Throwing shade for maketings sake is something that they should be above.

I mean c'mon, I'd argue that Postgres enhanced with Materialize isn't a database anymore either, but in a good sense!

It's building material. A hybrid between MQ, DB, backend logic & frontend logic.

The reduction in application logic and the increase in reliability you can get from reactive systems is insane.

SQL is declarative, reactive Materialize streams are declarative on a whole new level.

Once that tech makes it into other parts of computing like the frontend, development will be so much better, less code, less bug, a lot more fun.

Imagine that your react component could simply declare all the data it needs from a db, and the system will figure out all the caching and rerendering.

So yeah, they have awesome tech with many advantages, so I don't get why they bad-mouth other architectures.

georgewfraser · on Dec 8, 2020

We're trying to address a real problem that is happening in our industry: VPs of eng and principal engineers at startups are adopting the "Kappa Architecture" / "Turning the Database Inside Out", without realizing how much functionality from traditional database systems they are leaving behind. This has led to a barrage of consistency bugs in everything from food-delivery apps to the "unread message count" in LinkedIn. We're at the peak of the hype cycle for Kafka, and it's being used in all kinds of places it doesn't belong. For 99% of companies, a traditional DBMS is the right foundation.

fierro · on Dec 9, 2020

what is the Kappa architecture?

revertts · on Dec 9, 2020

Term coined by Jay Kreps, see https://www.oreilly.com/radar/questioning-the-lambda-archite...

flqn · on Dec 9, 2020

It's either a typo or a play on Kafka since "kappa" is a twitch meme

arjunnarayan · on Dec 8, 2020

Hi! I'm one of the two authors here. At Materialize, we're definitely of the 'we are a bunch of voices, we are people rather than corp-speak, and you get our largely unfiltered takes' flavor. This is my (and George's from Fivetran) take. In particular this is not Frank's take, as you attribute below :)

> SQL is declarative, reactive Materialize streams are declarative on a whole new level.

Thank you for the kind words about our tech, I'm flattered! That said, this dream is downstream of Kafka. Most of our quibbles with the Kafka-as-database architecture are to do with the fact that that architecture neglects the work that needs to be done _upstream_ of Kafka.

That work is best done with an OLTP database. Funnily enough, neither of us are building OLTP databases, but this piece largely is a defense of OLTP databases (if you're curious, yes, I'd recommend CockroachDB), and their virtues at that head of the data pipeline.

Kafka has its place - and when its used downstream of CDC from said OLTP database (using, e.g. Debezium), we could not be happier with it (and we say so).

The best example is in foreign key checks. It is not good if you ever need to enforce foreign key checks (which translates to checking a denormalization of your source data _transactionally_ with deciding whether to admit or deny an event). This is something that you may not need in your data pipeline on day 1, but adding that in later is a trivial schema change with an OLTP database, and exceedingly difficult with a Kafka-based event sourced architecture.

> Normally you'd have single writer instances that are locked to the corresponding Kafka partition, which ensure strong transactional guarantees, IF you need them.

This still does not deal with the use-case of needing to add a foreign key check. You'd have to:

1. Log "intents to write" rather than writes themselves in Topic A 2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A. 3. Convert those intents into commits. 4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.

If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds. And hopefully, yes, have a reactive declarative stack downstream of that as well!

vvern · on Dec 8, 2020

> 1. Log "intents to write" rather than writes themselves in Topic A 2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A. 3. Convert those intents into commits. 4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.

People do do this. I have done this. I wish I had been more principled with the error paths. It got there _eventually_.

It was a lot of code and complexity to ship a feature which in retrospect could have been nearly trivial with a transactional database. I'd say months rather than days. I won't get those years of my life back.

The products were build on top of Kafka, Cassandra, and Elasticsearch where, over time, there was a desire to maintain some amount of referential integrity. The only reason we bought into this architecture at the time was horizontal scalability (not even multi-region). Kafka, sagas, 2PC at the "application layer" can work, but you're going to spend a heck of a lot on engineering.

It was this experience that drove me to Cockroach and I've been spreading the good word ever since.

> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.

This is the next chapter in the gospel of the distributed transaction.

gunnarmorling · on Dec 8, 2020

>> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.

> This is the next chapter in the gospel of the distributed transaction.

Actually, it's the opposite. CDC helps to avoid distributed transaction; apps write to a single database only, and other resources (Kafka, other databases, etc.) based on that are updated asychronously, eventually consistent.

vvern · on Dec 8, 2020

I mean, I hear you that people do that and it does allow you to avoid needing distributed transactions. When you're stuck with an underlying NoSQL database without transactions, this is the best thing you can do. This is the whole "invert the database" idea that ends up with [1].

I'm arguing that that usage pattern was painful and that if you can have horizontally scalable transaction processing and a stream of events due to committed transactions downstream, you can use that to power correct and easy to reason about materializations as well as asynchronous task processing.

Transact to change state. React to state changes. Prosper.

[1] https://youtu.be/5ZjhNTM8XU8?t=2075

skyde · on Dec 8, 2020

> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.

100% agree this is the way to go instead of rolling your own transaction support you get the "ACID" for free from the DB and use KAFKA to archive changes and subscribe to them.

skybrian · on Dec 8, 2020

It seems like Kafka could be used to record requests and responses (upstream of the database) and approved changes (downstream of the database), but isn't good for the approval process itself?

j-pb · on Dec 9, 2020

Hey Arjun, Thanks for taking the time to reply! I didn't mean to suggest that this was Franks opinion, I merely explained to the other user, who seemed to have a more negative attitude towards Materialize, why I am extremely excited about it, namely Frank being the CTO, and him having an extremely good track record in terms of research and code.

I think the general gist of "use an OLTP database as your write model if you don't absolutely know what you're doing" is completely sane advice, however I think there are far more nuanced (and in a sense also honest) arguments that can be made.

I think the architecture you've sketched is over engineered for what you'd need for the task. So here's what I'd build for your inventory example, IFF the inventory was managed for a company with a multi Terra Items Inventory that absolutely NEEDS horizontal scaling:

One event topic in Kafka, partitioned along the ID of the different stores whose inventory is to be managed. This makes the (arguably) strong assumption that inventory can never move between stores, but for e.g. a Grocery chain that's extremely realistic.

We have one writer per store ID partition, which generates the events and enforces _serialisability_, with a hot writer failover that keeps up do date and a STONITH mechanism connecting the two. All writing REST calls / GraphQL mutations for its store-ID range, go directly to that node.

The node serves all requests from memory, out of an immutable Data-structure, e.g. Immutable.js, Clojure Maps, Datascript, or an embedded DB that supports transactions and rollbacks, like SQLite.

Whenever a write occurs, the writer generates the appropriate events, applies them to its internal state, validates that all invariants are met, and then emits the events to Kafka. Kafka acknowledging the write is potentially much quicker than acknowledging an OLTP transaction, because Kafka only needs to get the events into the memory of 3+ machines, instead of written to disk on 1+ machine (I'm ignoring OLTP validation overhead here because our writer already did that). Also your default failure resistance is much higher than what most OLTP systems provide in their default configuration (e.g. equivalent to Postgres synchronous replication).

Note that the critical section doesn't actually have to be the whole "generate event -> apply -> validate -> commit to kafka" code. You can optimistically generate events, apply them, and then invalidate and retry all other attempts once one of them commits to Kafka. However that also introduces coordination overhead that might be better served mindlessly working off requests one by one.

Once the write has been acknowledged by Kafka, you swap the variable/global/atom with the new immutable state or commit the transaction, and continue with the next incoming request.

All the other (reading) request are handled by various views on the Kafka Topic (the one causing the inconsistencies in the article). They might be lagging behind a bit, but that's totally fine as all writing operations have to go through the invariant enforcing write model anyways. So they're allowed to be slow-ish, or have varying QOS in terms of freshness.

The advantage of this architecture is that you have few moving parts, but those are nicely decomplected as Rich Hickey would say, you have use the minimal state for writing which fits into memory and caches, you get 0 lock congestion on writes (no locks), you make the boundaries for transactions explicit and gain absolutely free reign on constraints within them, and you get serialisability for your events, which is super easy to reason about mentally. Plus you don't get performance penalties for recomputing table views for view models. (If you don't use change capture and Materialize already, which one should of course ;] )

The two generals problem dictates that you can't have more than a single writer in a distributed system for a single "transactional domain" anyways. All our consensus protocols are fundamentally leader election. The same is true for OLTP databases internally (threads, table/row locks e.t.c.), so if you can't handle the load on a single 0 overhead writer that just takes care of its small transactional domain, then your database will run into the exact same issues, probably earlier.

Another advantage of this that so far has gone unmentioned is that if allows you to provide global identifiers for the state in your partitions that can be communicated in side-effect-full interactions with the outside world. If your external service allows you to store a tiny bit of metadata with each effect-full API call then you can include the offset of the current event and thus state. That way you can subsume the external state and transactional domain into the transactional domain of your partition.

Now, I think that's a much more reasonable architecture, that at least doesn't have any of the consistency issues. So let's take it apart and show why the general populace is much better served with an OLTP database:

- Kafka is an Ops nightmare. The setup of a Cluster requires A LOT of configuration. Also Zookeper urgh, they're luckily trying to get rid of it, but I think they only dropped it this year, and I'm not sure how mature it is.

- You absolutely 100% need immutable Data-structures, or something else that manages Transactions for you inside the writer. You DO NOT want to manually rollback changes in your write model. Defensive copying is a clutch, slow, and error prone (cue JS, urgh...: {...state}).

- Your write model NEEDS to fit into memory. That thing is the needles eye that all your data has to go through. If you run the single event loop variant, latency during event application WILL break you. If you do the optimistic concurrency variant performing validation checks might be as or more expensive than recomputing the events from scratch.

- Be VERY weary of communications with external services that happen in your write model. They introduce latency, and they break your transactional boundary that you set up before. To be fair OLTPs also suffer from this, because it's distributed consistency with more than one writer and arbitrary invariants which this universe simply doesn't allow for.

- As mentioned before, it's possible to optimistically generate and apply events thanks to the persistent data structures, but that is also logic you have to maintain, and which is essentially a very naive and simple embedded OLTP, also be weary of what you think improves performance vs. what actually improves it. It might be better to have 1 cool core, than 16 really hot ones that do wasted work.

- If you don't choose your transactional domains well, or the requirements change, you're potentially in deep trouble. You can't transact across domains/partitions, if you do, they're the same domain, and potentially overload a single writer.

- Transactional domains are actually not as simple as they're often portrayed. They can nest and intersect. You'll always need a single writer, but that writer can delegate responsibility, which might be a much cheaper operation than the work itself. Take bank accounts as an example. You still need a single writer/leader to decide which account ID's are currently in a transaction with each other, but if two accounts are currently free that single writer can tie them into a single transactional domain and delegate it to a different node, which will perform the transaction and write and return control to the "transaction manager". A different name for such a transaction manager is an OLTP (with Row-Level locking).

- You won't find as many tutorials, if you're not comfortable reading scientific papers, or at least academic grade books like Martin Kleppmanns "Designing Data Intensive Applications" don't go there.

- You probably won't scale beyond what a single OLTP DB can provide anyways. Choose tech that is easy to use and gives you as many guarantees as possible if you can. With change capture you can also do retroactive event analytics and views, but you don't have to code up a write-model (and associated framework, because let's be honest this stuff is still cutting edge, and really shines in bespoke custom solutions for bespoke custom problems).

Having such an architecture under your belt is a super useful tool that can be applied to a lot of really interesting and hard problems, but that doesn't mean that it should be used indiscriminately.

Afterthought; I've seen so many people use an OLTP but then perform multiple transactions inside a single request handler, just because that's what their ORM was set to. So I'm just happy about any thinking that people spend on transactions and consistency in their systems, in whatever shape or form, and I think making the concepts explicit instead of hiding them in a complex (and for many unapproachable) OLTP/RDBMS monster helps with that (if Kafka is less of a monster is another story).

I think it's also important to not underestimate the programmer convenience that working with language native (persistent) data-structures has. The writer itself in its naive implementation is something that one can understand in full, and not relying on opaque and transaction breaking ORMs is a huge win.

PS: Plz start a something collaboratively with ObservableHQ, having reactive notebook based dashboards over reactive Postgres queries would be so, so, so, so awesome!

dwelch2344 · on Dec 8, 2020

Totally agree with this. "Kafka is terrible if you use it with poor practices" should be the title of this. It's totally clickbait targeting the "Database Inside Out" articles / concept, which if you read the most common 3-page high level it explicitly states you should be materializing up into other data sources / structures / systems / etc

To date, I've never written code that reads from a Kafka topic that wasn't taking data and transforming + materializing it into domain intelligence

dkhenry · on Dec 8, 2020

I am not terribly surprised. The materialize team was previously at CockroachDB which also had a habit of putting out marketing material like this.

j-pb · on Dec 8, 2020

Maybe I'm biased because I'm such a huge Frank McSherry fanboy. The differential dataflow work he does in rust is simply awesome, and he writes great papers too!

Little known fun fact, the rust type- and borrow-checker uses a datalog engine internally to express typing rules, and that engine was written and improved by Frank McSherry. So whenever you hit compile on a rust program, you're using a tiny bit of Materialize tech.

_lqaf · on Dec 8, 2020

I ignored them for a long time because of it. I just assumed they were lightweights being annoying because they didn't have anything else.

It serves as anti-marketing, at least to me.

zaphar · on Dec 8, 2020

If it stores data it's a database. Filesystems are databases, MongoDB is a database. LevelDB is a database. Postgres and MySQL are databases. Kafka is a database. They are all very different in features and functionality though.

What the authors mean is that kafka is not a traditional database and doesn't solve the same problems that traditional databases solve. Which is a useful distinction to make but is not the distinction they make.

The reality is that database is now a very general term and for many usecases you can choose to special purpose databases for what you need.

yumaikas · on Dec 8, 2020

I think I'd differentiate between a database and a data store.

I'd argue that a filesystem is a data store, rather than a database.

Hamuko · on Dec 8, 2020

Some say that Microsoft Excel is the world's most popular database engine.

bastawhiz · on Dec 9, 2020

Excel allows you to perform queries for almost arbitrary data on a spreadsheet. I wouldn't doubt for a minute that with sufficient time and motivation you could write a full sql engine with excel functions.

tomnipotent · on Dec 9, 2020

Excel has had a robust built-in columnar database along with decent query semantics and data modeling for awhile now (Power Query/Power Pivot).

Baeocystin · on Dec 8, 2020

Honestly, I'd agree with that, too, at least in a colloquial sense.

victor106 · on Dec 9, 2020

The most popular database where the number of concurrent users = 1

CyberDildonics · on Dec 9, 2020

Depends on how many people are looking over your shoulder.

zaphar · on Dec 8, 2020

What is your distinction? Is mongodb a database? What about leveldb?

Whether we want it to be so or not the term database is much more encompassing thank it used to be. You can try to fight that change if you want to but it means you'll be speaking a different language that most of the rest of us as a result.

speed_spread · on Dec 9, 2020

I would draw the line at semantic indexing. If the data is only accessible through immediate metadata (filename, sequence number, datetime, etc) then it's just a data_store_. If the engine also maintains indexes derived from the data being stored, then it's a data_base_. In that regard, Kakfa would not be a database since it has no knowledge of what it is storing.

yumaikas · on Dec 8, 2020

Honestly, where I'd draw the distinction between a database and a datastore is the ability to do multi-statement transactions. So, a filesystem? A datastore. Some support fsync on a single file, but multi-file sync isn't usually supported.

MongoDB and LevelDB both support transactions, so I'd err on calling them databases myself.

IfOnlyYouKnew · on Dec 8, 2020

You're using the term "traditional database" two levels up. Whatever your definition of that is, it's their definition of "database".

Your initial post says "I have a different definition of 'database'", which is different than anyone else's, as you acknowledge when you refer to that common definition as a 'traditional database'.

Then, you fault them for making an argument that is invalid for your non-standard definition of a database!

You would like Kafka, the author.

zaphar · on Dec 8, 2020

No my post said that the term database has evolved beyond their definition. If you want to fight that trend then you will be swimming against the current.

edgyquant · on Dec 8, 2020

I’d say initially file systems were data stores but once they developed hierarchies they became more akin to a database. I’m not sure there’s a huge difference but it seems a database is a collection of data stores (though there or probably a more technical and correct definition.)

Spivak · on Dec 8, 2020

I feel like the inventory thing is a bit of a straw-man because the situation is set out in such a way that you need transactions for it to work. If you find yourself wishing you had a global write-lock on a topic to then of course it won't work. Modeling your data for Kafka is work just the same as it is for MySQL. Of course it might not be the best tool for the job but you should at least give it a fair shake.

You should be able to post "buy" messages to a topic without fear that it messes up your data integrity. Who cares if two people are fighting over the last item? You have a durable log. Post both "buys" and wait for the "confirm" message from a consumer that's reading the log at that point in time, validates, and confirms or rejects the buys. At the point that the buy reaches a consumer there is enough information to know for sure whether it's valid or not. Both of the buy events happened and should be recorded whether they can be fulfilled or not.

jacques_chester · on Dec 8, 2020

> Who cares if two people are fighting over the last item?

The two people, at least. Customers tend to be a bit underwhelmed by "well, the CAP theorem..." as a customer service script.

Spivak · on Dec 8, 2020

None of this shows up as user-facing any differently than a relational database. No CAP theorem at all.

Kafka:

User clicks buy and it shows “processing” which behind the scenes posts the buy message and waits for a “confirmed” message. When it’s confirmed user is directed to success! If someone else posts the buy before them they get back a “failed: sold out” message.

Relational:

User clicks buy and it shows “processing” which behind the scenes tries to get a lock on the db, looks at inventory, updates it if there’s still one available, and creates a row in the purchases table. If all this works the user is directed to success. If by the time the lock was acquired the inventory was zero the server returns “failure: sold out”.

jacques_chester · on Dec 8, 2020

The CAP theorem line was smart-arsery.

The thing here is that the database can update the cart and the inventory in one logical step, to the exclusion of others.

The Kafka approach doesn't guarantee that out of the box, leading to the creation of de facto locking protocols (write cart intent, read cart intent, write inventory intent ...). A traditional database does that for you with selectable levels of guarantees.

Spivak · on Dec 8, 2020

You're absolutely right but I think the advantage is that to the users of the database it doesn't really feel like there's any locking going on and it's this implicit thing that's constructed from the history.

Like for a hypothetical ticket selling platform you just get a log like

    00:00 RESERVE AAA FOR xxx, 5 min
    00:02 BUY     BBB FOR qqq
    00:15 RESERVE AAA FOR yyy, 5 min
    00:23 CONFIRM AAA FOR xxx
    00:25 CONFIRM BBB for qqq
    00:27 REJECT  AAA FOR yyy, "already reserved"
    05:16 RESERVE AAA FOR zzz, 5 min
    05:34 CONFIRM AAA FOR zzz

So although there's "locking" going on here enforced by the consumer sending the confirms the producer just sends intents/events and sees whether they're successful or not and both producer and consumer are stateless.

I guess it just depends on which model you think is more a PITA to work with.

AndyPa32 · on Dec 8, 2020

Yeah, if you are lucky enough that your cart and inventory is stored in the same database.

jacques_chester · on Dec 8, 2020

Of course. That's why Materialize are interesting, they are apparently able to give a greatly improved compute:result ratio than the previous state of the art.

je42 · on Dec 8, 2020

Ok. I admit using Kafka as DB is not straight forward but just stating it doesn't provide ACID functionality is not enough.

The example they give is very simplistic. With the correct design of kafka topics and events the problem of the example can be fixed.

And according to oracle https://www.oracle.com/database/what-is-database/ :

> A database is an organized collection of structured information, or data, typically stored electronically in a computer system.

So Kafka clearly fits that definition.

dathinab · on Dec 8, 2020

Didn't some newspaper use Kafka to store the newspapers they released in order or something similar? (I think it was the New York Times, maybe??).

Honestly as long as you don't use it as a general purpose database, it might very well be the best choice for your use-case.

je42 · on Dec 8, 2020

Indeed. Known the tools and apply the one that has least cons and most pros ;).

Also, a good starter to for knowing how to use a Streaming Store like Kafka as DB is the video Database inside-out. https://www.youtube.com/watch?v=fU9hR3kiOK0

barnabask · on Dec 8, 2020

2017 article: https://open.nytimes.com/publishing-with-apache-kafka-at-the...

lmm · on Dec 9, 2020

I'd actually say Kafka makes a better general-purpose database. SQL databases are only appropriate for a pretty narrow range of problems.

satisfaction · on Dec 8, 2020

I'm guilty of using it as a DB, my home weather station writes to kafka topics, sometimes the postgres instance is down for months, no problems letting the kafka topic store the data until i get around to rebooting pg and restarting the connector.

halbritt · on Dec 8, 2020

The author presumes that every use case requires a transactional database. ACID is nice, especially if it's needed, but generally not needed, especially in many streaming data applications for which Kafka is most suitable.

tacitusarc · on Dec 8, 2020

I think because software engineers tend to excel at pattern recognition, oftentimes solutions to different problems appear so similar that it seems like with a small amount of abstraction, they can be reused. But it's a trap!

Everything abstracted to the highest level is the same, but problems aren't solved at the highest level.

The devil, as they say, is in the details.

UK-Al05 · on Dec 8, 2020

This is lack of abstraction. You can certainly fix this in kafka using various hacks, but its implementation of an abstraction you can get in a standard db for free.

Funnily enough a list of events is pretty much what a transaction log is in a standard db. Although the events have more of a business meaning. In many ways event sourcing is removing a lot of abstraction databases give you.

mrkeen · on Dec 8, 2020

> a standard db

Yes, ACID works for one database. Many databases? Not so much.

> Funnily enough a list of events is pretty much what a transaction log is in a standard db.

When I "SELECT Balance WHERE user = 12345", I usually just get back a balance, I don't get back the transaction log.

If nothing else, adopting the Kafka model gets your teammates to append updates to your ledger, rather than changing values in-place.

UK-Al05 · on Dec 8, 2020

Yes that my point. You get half of a database. The transaction log. Not the up to date state.

UK-Al05 · on Dec 8, 2020

One way around this is to make sure your kafka command streams are processed in order, in serial partitioned by an id where you want the concurrency control.

Normally you only want concurrency control within certain boundaries.

By figuring out the minimum amount transaction and concurrency boundaries you can inch out quite a bit of performance.

loopz · on Dec 8, 2020

Sure, but that defeats the quest for horizontal scalability. You can build highly performant systems based on serial execution, but not sure this is an area where Kafka excels particularly.

UK-Al05 · on Dec 8, 2020

That's why you partition by some id. Say stock SKU id for stock control. Then you can handle other SKUs in parallel. It's only in serial for a single SKU. That's probably the maximum performance potential your going to get in a traditional db anyway.

edbrown23 · on Dec 8, 2020

This definitely seems like the "Kafka" way to solve this problem, but I fear there are implications to this partitioning scheme I'd love to see answered. For example, partition counts aren't infinite, and aren't easily adjusted after the fact. So if you choose, say, 10 partitions originally, for a SKU space that is nearly infinite, then in reality you can only handle 10 parallel streams of work. Any SKU that is partitioned behind a bit of slow work is then blocked by that work.

It's doable to repartition to 100 partitions or more, but you basically need to replay the work kept in the log based on 10 partitions onto the new 100 partitions, and that operation gets more expensive over time. Then of course you're basically stuck again once your traffic increases to a high enough level that the original problem returns. If the unit of horizontal scaling is the partition, but the partition count can't be easily changed, consumers eventually lose their horizontal scalability in Kafka, from my perspective.

morelisp · on Dec 8, 2020

On the other hand Kafka partitions are relatively cheap on both broker and client side; 100 partitions does not require 100 parallel consumers so over-provisioning is not so risky.

jacques_chester · on Dec 8, 2020

This strikes me as mixing the physical and logical models.

UK-Al05 · on Dec 9, 2020

There's logical and physical partitions.

Logical partitions are always handled by the same physical partition. But physical partitions can handle multiple logical partitions.

shay_ker · on Dec 8, 2020

This is maybe a silly question, but what's the difference between the timely dataflow that Materialize uses and Spark's execution engine? From my understanding they're doing very similar things - break down a sequence of functions on a stream of data, parallelize them on several machines, and then gather the results.

I understand that the feature set of timely dataflow is more flexible than Spark - I just don't understand why (I couldn't figure it out from the paper, academic papers really go over my head).

frankmcsherry · on Dec 8, 2020

There are a few differences, the main one between Spark and timely dataflow is that TD operators can be stateful, and so can respond to new rounds of input data in time proportional to the new input data, rather than that plus accumulated state.

So, streaming one new record in and seeing how this changes the results of a multi-way join with many other large relations can happen in milliseconds in TD, vs batch systems which will re-read the large inputs as well.

This isn't a fundamentally new difference; Flink had this difference from Spark as far back as 2014. There are other differences between Flink and TD that have to do with state sharing and iteration, but I'd crack open the papers and check out the obligatory "related work" sections each should have.

For example, here's the first para of the Related Work section from the Naiad paper:

> Dataflow Recent systems such as CIEL [30], Spark [42], Spark Streaming [43], and Optimus [19] extend acyclic batch dataflow [15, 18] to allow dynamic modification of the dataflow graph, and thus support iteration and incremental computation without adding cycles to the dataflow. By adopting a batch-computation model, these systems inherit powerful existing techniques including fault tolerance with parallel recovery; in exchange each requires centralized modifications to the dataflow graph, which introduce substantial overhead that Naiad avoids. For example, Spark Streaming can process incremental updates in around one second, while in Section 6 we show that Naiad can iterate and perform incremental updates in tens of milliseconds.

shay_ker · on Dec 8, 2020

That's very helpful, thanks! I think I still have to wrap my head around _why_ being stateful allows TD to respond faster, but maybe I just gotta dig deeper and see for myself.

frankmcsherry · on Dec 8, 2020

It just comes down to something as simple as: "if I have shown you 1M different things, and now show you one more thing, what do you have to do to tell me whether that one thing is new or not?"

If you can keep a hash map of the things you've seen, then it is easy to respond quickly to that one new thing. If you are not allowed to maintain any state, then you don't have a lot of options to efficiently respond to the new thing, and most likely need to re-read the 1M things.

That's the benefit of being stateful. There is a cost too, which is that you need to be able to reconstruct your state in the case of a failure, but fortunately things like differential dataflow (built on TD) are effectively deterministic.

Also, I suspect "Spark" is a moving target. The original paper described something that was very much a batch processor; they've been trying to fix that since, and perhaps they've made some progress in the intervening years.

shay_ker · on Dec 8, 2020

I see. To my small brain it sounds like TD can intelligently memoize or cache the outputs of each "step" so that it only recalculates when it needs to as the inputs change.

I think Spark does that sometimes these days, but I don't know much about the specifics of how and when Spark does it.

Does TD have to keep _everything_ in memory, or can it be strategic in what it keeps and what it evicts?

frankmcsherry · on Dec 8, 2020

TD lets you write whatever logic you want (it is fairly unopinionated on your logic and state).

Differential dataflow plugs in certain logic there, and it does indeed maintain a synopsis of what data have gone past, sufficient to respond to future updates but not necessarily the entirety of data that it has seen.

It would be tricky to implement DD over classic Spark, as DD relies on these synopses for its performance. There are some changes to Spark proposed in recent papers where it can pull in immutable LSM layers w/o reading them (e.g. just mmapping them) that might improve things, but until that happens there will be a gap.

shay_ker · on Dec 9, 2020

Gotcha. Thanks for answering all my q's!

astn-austin · on Dec 8, 2020

You should check out Apache Flink. It does a bunch of those things that Spark doesn't, though it's also missing a few things that Spark has. https://flink.apache.org/

mamon · on Dec 8, 2020

There's no difference really. All "Big Data" (tm) tools are trying to capitalize on the hype, so Kafka adds database capabilities, while Spark adds Streaming. At some point they will reach feature parity.

fredliu · on Dec 8, 2020

Kafka is essentially commit logs, which are at the core of any traditional database engines. Streaming is just turning the gut of DB inside out (mostly for scalability reasons), while DB is wrapped up commit logs that provides higher level functionalities (ACID, Transactions, etc.). It's two sides of the same coin, yin and yang of the same thing... But on the practical side of things, yes, if what you needed more are indeed what's described in this article, your life would be easier with a traditional DB.

jdmichal · on Dec 8, 2020

So, the problem really being addressed but not named is that eventing systems give eventual consistency. But sometimes that's not good enough. And it's OK to admit that and bring in another technology when you need a stronger guarantee than that.

The example I was taught with was a booking system, where the inventory management system-of-record was separate from the search system. Search does not need 100% up-to-date inventory. A delay between the last item being booked and it being removed from the search results is acceptable. In fact, it has to be acceptable, because it can happen anyway. If someone books the last item after another hit the search button... There's nothing the system can do about that.

When actually committing a booking, however, then that must be atomically done within the inventory management system.

So, to bring it home, it's OK for the search system to be eventually consistent against bookings, and read bookings off of an event stream to update its internal tracking. However, the bookings themselves cannot be eventually consistent without risking a double-booking.

jkarneges · on Dec 8, 2020

Another potential misuse of Kafka I've been wondering about is how a single Kafka instance/cluster is often shared by multiple microservices.

On one hand the ability to connect multiple microservices to a central message broker is convenient, but on the the other hand this goes against the microservice philosophy of not sharing subcomponents (databases, etc). I wonder where the lines should be drawn.

mumblemumble · on Dec 8, 2020

I would argue that, if it's being used properly, the message broker itself is a service. It runs as a separate process, you communicate with it over an API, and its subcomponents (e.g., the database) are encapsulated.

It's all about framing and perspective, of course. But that's how I'd want to try and frame it from a system architecture point of view.

wongarsu · on Dec 8, 2020

By that same reasoning Postgres is it's own micro service. It runs as a separate process, you communicate with it over a well defined API, and it's subcomponents (data store, query optimizer etc) are encapsulated.

With enough framing everything is possible, and in some contexts it will even make sense.

mumblemumble · on Dec 8, 2020

It has to do with how it functions in practice, IMO. PostgreSQL itself is arguably a service, but the database probably is not - you're probably crawling all over its implementation details and data model.

You could take a stand and say, "All access is through stored procedures. They are the API." And, if that API operates as the same semantic level as a well-crafted REST API, then you could make an argument that that particular database is a microservice that just happens to have been implemented on top of PostgreSQL. But I don't think I've ever seen such a thing happen in the wild. It's much more popular to use an ORM to get things nice and tightly coupled.

selfhoster11 · on Dec 8, 2020

Such implementations exist and have been discussed a few days ago here on HN. There are also REST adapters for Postgres: https://github.com/PostgREST/postgrest

ivalm · on Dec 8, 2020

Wait, what? Isn’t the whole point of having multiple publishers/subscribers?

mrweasel · on Dec 8, 2020

I think the point was about using a single cluster for multiple topics, for different services.

Depending on the scenario I can see the point. If the micro services are all part of the larger overall solution, having a single cluster is perfectly fine. Using the same cluster for multiple "product" is a little like having one central database server for a number of different solutions. You can do it, but it potentially become a bottleneck or a central point for your different solutions to impact performance of each other.

jkarneges · on Dec 8, 2020

I'd agree there is an arguable difference between sharing a server vs sharing data within the server.

Bottleneck issues aside, letting two microservices connect to the same Postgres cluster but access different "databases" (collection of tables) within that cluster could be considered an acceptable data separation. Certainly with multi-tenant DBaaS systems there may be some server sharing by unrelated microservices/customers. Whereas letting two microservices access the same database tables would probably be frowned upon.

Nevertheless, sharing the same Kafka topics between microservices seems to be a common thing to do.

ivalm · on Dec 8, 2020

> Whereas letting two microservices access the same database tables would probably be frowned upon.

> Nevertheless, sharing the same Kafka topics between microservices seems to be a common thing to do.

I think if it is part of one whole isn’t this fine? You have one service that generates customer facing output, you may have another service that powers analytics/dashboards you may have yet another service that ETLs data into some data mart. Why wouldn’t they touch the same table/subscribe to the same topic (since they just need read-only access to the data)? Genuinely curious what the problem is except for bottleneck/performance; and if it just bottleneck then wouldn’t scaling horizontally solve it?

jkarneges · on Dec 8, 2020

Sure. I believe microservice boundaries are more about development agility rather than scalability. By limiting each microservice to a minimal API surface and a "2 pizza" team, everyone can iterate faster. And if a particular microservice is implemented as multiple sub-services sharing database tables only known by the team in charge, that seems fine.

onefuncman · on Dec 8, 2020

this isn't any different than microservices getting a deathball dependency on a user service or logging service or security service or ...

you either don't allow microservices to consume from others' topics, or you publish event schemas so they can still iterate independently.

the move to a 2nd kafka cluster in my experience has always been driven by isolation and fault tolerance concerns, not scalability.

Cojen · on Dec 8, 2020

Jim Gray disagrees: https://arxiv.org/ftp/cs/papers/0701/0701158.pdf

georgewfraser · on Dec 8, 2020

It's funny that you use that example, we actually cited that in an earlier draft of this post. Despite the seemingly opposite title "Queues are Databases", that note actually makes many of the same arguments, that message brokers are missing much of the functionality of database management systems and this is a problem.

soumyadeb · on Dec 8, 2020

The architecture of dumping events into Kafka and creating materialized views is a perfect choice for many use cases - e.g. collecting clickstream data and building analytical reports.

If ACID is a prerequisite, then lot of things won't classify as databases - None of Mongo, Cassandra, ElasticSearch etc. Not even many data-warehouses.

Kalium · on Dec 8, 2020

As recently as last year, I worked for a company where the Chief Architect, in his infinite wisdom, had decided that a database was a silly legacy thing. The future looked like Kafka streams, with each service being a function against Kafka streams, and data retention set to infinite.

Predictably, this setup ran into an interesting assortment of issues. There were no real transactions, no ensured consistency, and no referential integrity. There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.

To say this was a vast mess would be to put it lightly. It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all. Teams routinely relied on one another's internal kafka streams. It was a GDPR nightmare.

Kafka Connect was deployed to bridge between Kafka and some real databases. This was its own mess.

Kafka, I have learned, is a very powerful tool. And like all shiny new tools, deeply prone to misuse.

JMTQp8lwXL · on Dec 8, 2020

Instead of single architects, companies need architect boards. And they can vote on these ideas before a single individual becomes a single point of failure.

Expecting 1 person to make 100% correct decisions all the time is too much expectation for one person. People go down rabbit holes and they have weird takeaways, like replace all the databases with queues.

Kalium · on Dec 8, 2020

I agree in abstract, but in practice it's quite difficult to set up a successful democratic architecture board. You need teams or departments that all have architects, and an engineering organization where both managers and engineers accept a degree of centralized technical leadership. Getting there is, in my opinion, the work of years. It's especially challenging because spinning up such a board requires a person who can run it single-handedly.

In this particular company, the Chief Architect in theory had a group around him. They did nothing to check his poor decisions, and from the outside seemed primarily interested in living in the functional programming nirvana he promised them.

sixdimensional · on Dec 8, 2020

Thinking in terms of building physical buildings.. architects are often the visionaries but engineers are the realists. You know, architects come up with these crazy incredible building designs based on their engineering understanding, but ultimately it has to be vetted, proven and implemented by engineers.

I have, however, always wondered why people should be seen as "only architects" and "only engineers". While the separation of duties is critical to ensure the overall construction is sound, people can be visionary engineers, and people can be knowledgeable in both how to do something in real terms as well as dreaming on how to go beyond.

sixdimensional · on Dec 8, 2020

It sounds like this would have made a better proof of concept than an commitment to the architecture.

The idea on the face of it is not per se a bad one, quite interesting, but the implementations are perhaps not there yet to back such an idea.

It's important to know as an architect when your vision for the architecture is outpacing reality, to know your time horizon, and match the vision with the tools that help you implement the actual use cases you have in your hand right now.

It sounds like this person might have had an interesting idea but not a working system. In another light, this could have been a good idea if all the technology was in place to support it.. but the timing and implementation doesn't sound like it was right, perhaps.

The old saying "use the right tool for the job" comes to mind, but that can be hard to see when the tools are changing so fast, and there is a risk to going too far onto the bleeding edge. Perhaps the saying should have been, "use the rightest tool you can find at the time, that gives you some room to grow, for the job"...

Kalium · on Dec 8, 2020

It was definitely an interesting proof of concept that needed some refinement. The core idea was functional services against nicely organized data streams on a flat network. Which is a really cool approach that works quite well for a lot of things.

Several of these points fell apart when credit card handling and PCI-DSS entered the picture.

EamonnMR · on Dec 8, 2020

At our company we use a ton of services that operate essentially as as functions on a Kafka stream (well, they tend to read/write in batches for efficiency) but we write event streams we want to query later into a regular database for later query. It works out very well. The idea of our poor Kafka cluster having to field queries in addition to the load of acting as a transport layer is frightening. The 'superpower' Kafka gives you is the ability to turn back time if something goes wrong and the ability to orchestrate really big pipeline. You have to build or buy a fair bit of tooling to make it work though.

Kalium · on Dec 8, 2020

This particular org did its level best to think of Kafka as the regular database for queries.

lmm · on Dec 9, 2020

> There were no real transactions, no ensured consistency

Which is the right way to do it, because transactions don't extend into the real world. If you need to wait for the consequences of a given event, wait for the consequences of that event. Otherwise, all you really care about is all events happening in a consistent order. It's a much more practical consistency model.

> and no referential integrity

The problem with enforcing referential integrity is how you handle violations of it. Usually you don't really want to outright reject something because it refers to something else that doesn't exist yet, so you end up solving the same problem either way.

> There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.

Pretty common in the database world - both MySQL and PostgreSQL use plaintext protocols by default. Properly-configured kafka uses TLS and/or SASL and has a good ACL system and is as secure as anything else.

> It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all.

My experience is just the opposite - ACID isolation sounds great until you actually use it in the real world, and then you find it doesn't address your problems and doesn't give you enough control to fix it yourself. It's like when you use one of those magical do-everything frameworks - it works great until you need to customise something slightly, then it's a nightmare. Kafka pushes more of the work onto you upfront - you have to understand your dataflow and design it explicitly - but that pays off immensely.

> It was a GDPR nightmare.

Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.

Kalium · on Dec 9, 2020

> Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.

This is absolutely wonderful! Unfortunately, this team decided to store data subject to GDPR deletion requests in Kafka, where deletion is quite difficult. It was a problem, when trying to do deletion programmatically, across many teams using the same set of topics.

The real nightmare came when this team, obsessed with the power of infinite retention periods, encountered PCI-DSS. You see, the business wanted to move away from Stripe and similar to dealing with a processor directly, in order to save on transaction fees. So obviously they could just put credit card data into Kafka...

lmm · on Dec 10, 2020

Yeah fair enough. I'd argue that this is kind of a double standard (a traditional RDBMS may well be copies of "deleted" data on dirty pages, and may well leave that data on the physical disk indefinitely, for much the same reasons as Kafka does - it just makes it a bit fiddlier for you to access it), but your legal team may decide that it's required.

I don't think your overall scorn is warranted - there are bigger problems that are endemic to RDBMS deployments, and the advantages of a stream-first architecture are very real - but there are genuine difficulties around handling data obliteration and it's something you need to design for carefully if you're using an immutable-first architecture and have that requirement.

hodgesrm · on Dec 8, 2020

Is this really a thing? Do people really try to use Kafka as the system of record for financial transactions or similar data?

mrkeen · on Dec 8, 2020

Hell yes. Best thing I ever did.

Updating balances using an RDBMS is like managing your finances with pencils and erasers. Unless you somehow ban the UPDATE statement.

Updating balances with Kafka is like working in pen. You can't[1] change the ledger lines, you can only add corrections after the fact.

[1] Yes, Kafka records can be updated/deleted depending on configuration. But when your codebase is written around append-only operations, in-place mutations are hard and corrections are easy, so your fellow programmers fall into the 'pit of success'.

jacques_chester · on Dec 8, 2020

The original sin of SQL is that it begins without memory.

The original sin of Kafka is that it begins with only memory.

To me the right middle way is relational temporal tables[0]. You get both the memoryless query/update convenience and the ability to travel through time.

[0] SQL:2011 introduced temporal data, but in a slightly janky just-so-happens-to-fit-Oracle's-preference kind of way.

sixdimensional · on Dec 8, 2020

Blockchain databases take this model to the extreme - it is a database that is append only and immutable, with cryptographic guarantees of the ledger.

I'm not selling any particular tool but Amazon's QLDB is an interesting example of a blockchain-based database. I am interested to see how things like Kafka and this might come together somehow.

In my opinion, we have evolved to a point where storage is not a concern for temporal use cases - i.e. we can now store every change in an immutable fashion. When you think about this being an append only transaction log that you never have to purge, and you make observable events on that log (which is what most CDC systems do)... yeah it works. Now you have every change, cryptographically secure, with events to trigger downstream consumers and you can really rethink the whole architecture of monolithic databases vs. data platforms.

It is an exciting time we are in, in my opinion.

hodgesrm · on Dec 8, 2020

I would just put the ledger in a database table if it's that important and maintain the current state of the account in a separate table. ACID transactions and database constraints make this kind of consistency easier to achieve than many alternatives. It's also easier to prove correctness since you can run queries that return consistent results thanks to the isolation guaranteed by ACID. (Modulo some corner cases that are not hard to work around.)

Just my $0.02.

mrkeen · on Dec 8, 2020

> ACID transactions and database constraints make this kind of consistency easier to achieve than many alternatives.

If your company only runs one database.

> It's also easier to prove correctness

RDBMSs are wonderful and I don't consider them unreliable at all. But I can't prove the correctness of my teammate's actions. I want them to show their working by putting their updates onto an append-only ledger.

jacques_chester · on Dec 8, 2020

> If your company only runs one database.

I think the same argument can be made with "only one Kafka cluster" and "only one blockchain".

lmm · on Dec 9, 2020

> I think the same argument can be made with "only one Kafka cluster"

No it can't. Kafka events for the same partition will always be processed in order. If you have a stream transformation that reads from a topic that lives in one cluster and writes to a topic that lives in another cluster, then (as long as you chose the right partition key) everything will work correctly. Even if you have a sequence of transformations that zig-zag between two different clusters, it will do the right thing. You can't achieve that with traditional databases.

jgraettinger1 · on Dec 8, 2020

This post doesn't mention the _actual_ answer, which is to:

1) Write a event recording a _desire_ to checkout. 2) Build a view of checkout decisions, which compares requests against inventory levels and produces checkout _results_. This is a stateful stream/stream join. 3) Read out the checkout decision to respond to the user, or send them an email, or whatever.

CDC is great and all, too, but there are architectures where ^ makes more sense than sticking a database in front.

Admittedly working up highly available, stateful stream-stream joins which aren't challenging to operate in production is... hard, but getting better.

fuckadtech · on Dec 8, 2020

Hard is an understatement. Particularly so if you are using Kafka Streams to attempt to run a highly available, fault tolerant, zero downtime, etc., service. The race condition and compaction bugs in that library are not fun to debug.

vladsanchez · on Dec 8, 2020

I want to Upvote this more than once. So much facts into a condensed into a small essay. Good job!

Money quote: "Event-sourced architectures like these suffer many such isolation anomalies, which constantly gaslight users with “time travel” behavior that we’re all familiar with."

rowland66 · on Dec 8, 2020

Maybe I am just old because I had to Google what gaslighting meant, but as best I can tell getting gaslighted by your system architecture is really stretching the meaning of the term gaslighting to the point of absurdity.

vladsanchez · on Dec 10, 2020

I concur, but we both got it's a metaphor. ;)

theptip · on Dec 8, 2020

This is a bit dumbed down, and ignores the domain terminology required to properly discuss the trade-offs here (which is puzzling given that it links to a post by Aphyr, where you can find incredibly thorough discussions around isolation levels and anomalies).

> The fundamental problem with using Kafka as your primary data store is it provides no isolation.

This is false. I can only assume the author doesn't know about the Kafka transactions feature?

To be specific, Kafka's transaction machinery offers read-committed isolation, and you get read-uncommitted by default if you don't opt-in to use that transaction machinery (the docs: https://kafka.apache.org/0110/javadoc/index.html?org/apache/...). Depending on your workload, read-committed might be sufficient for correctness, in which case you can absolutely use Kafka as your database.

Of course, proving that your application is sound with just read-committed isolation is can be challenging, not to mention testing that your application continues to be sound as new features are added.

Because of that, in general I think that the underlying point of this article is probably correct, in that you probably shouldn't use Kafka as your database -- but for certain applications / use-cases it's a completely valid system design choice.

More generally this is an area that many applications get wrong by using the wrong isolation levels, because most frameworks encourage incorrect implementations by their unsafe defaults; e.g. see the classic "Feral concurrency control" paper http://www.bailis.org/papers/feral-sigmod2015.pdf. So I think the general message of "don't use Kafka as your DB unless you know enough about consistency to convince yourself that read-committed isolation is and will always be sufficient for your usecase" would be more appropriate (though it's certainly a less snappy title).

georgewfraser · on Dec 8, 2020

"Read-committed isolation" is not a meaningful implementation of transactions. If you can't do read, then a write, while guaranteeing the database didn't change in between, then you don't really have transactions.

theptip · on Dec 8, 2020

Depends on your use-case; if it's meaningless, why is it implemented in all the leading SQL DBs? It's the default in Postgres...

https://www.postgresql.org/docs/9.5/transaction-iso.html

If you're arguing that in practice this isn't enough isolation, then sure, that's what I said in my post; most applications need more than the default isolation levels. I feel like you're making an absolutist point (just like the original article) where my point was that the domain is actually more nuanced, and absolutes just obscure the technical complexity.

LgWoodenBadger · on Dec 8, 2020

This sounds like "serializable" which is (in my experience) rarely useful for a meaningful system.

theptip · on Dec 9, 2020

If you read the "Feral concurrency control" paper I linked above, particularly section 7 on conclusions, they make the case that serializable is the only isolation level that's actually safe with naive coding styles on frameworks like Rails and Django which do application-level validation. If you do validation in your application and don't use serializable isolation, then you have to be careful about manually locking, OR just be sure that your usage pattern isn't vulnerable to the anomalies that you're introducing by using a weaker isolation level.

If you're building a financial ledger, you absolutely must use serializable isolation. If you're building a Twitter clone, sure, use something weaker that will gain you some performance.

I'd make the case that we should be recommending the use of serializable by default unless you have a reason why you think it's OK to use something weaker, rather than having the default be better-performing-but-unsafe. The sort of concurrency validation errors that you get if you needed Serializable and used Read-Committed instead are really, really hard to reproduce, debug, and diagnose.

isbvhodnvemrwvn · on Dec 9, 2020

It's not strictly speaking serializable, what they described is a lost update anomaly, to avoid which repeatable read or snapshot isolation is sufficient.

Serializable (or serializable snapshot isolation) is stronger, it doesn't allow anomalies such as write skew, but also a lot more expensive since you need to keep track of changes in rows matched by predicates to avoid phantom reads (as compared to just keeping track of the rows returned by the query with predicates in this particular transaction).

Also worth noting that some DBs such as Oracle actually lie about implementing serializable (in this case they only offer snapshot isolation), so it's worth keeping that in mind and use locks if necessary.

lmm · on Dec 9, 2020

> The problem we now have is called write skew. Our reads from the inventory view can be out of date by the time the checkout event is processed. If two users try to buy the same item at nearly the same time, they will both succeed, and we won’t have enough inventory for them both.

And you'll have exactly the same problem if you're using a traditional ACID database: the user saw the item as being available, clicked buy, but it was unavailable by the they went to get it. Using an ACID database doesn't gain you anything; you might as well just use Kafka for everything.

bastawhiz · on Dec 9, 2020

The A in ACID literally stands for atomicity. If you're using an ACID database that can't guarantee that an item is available for purchase at the time it's purchased, you're using a bad database.

The user having stale data in their browser and finding an item has already been purchased is very different from the database, within a transaction, allowing an item which has already been purchased to be purchased again.

lmm · on Dec 9, 2020

> The user having stale data in their browser and finding an item has already been purchased is very different from the database, within a transaction, allowing an item which has already been purchased to be purchased again.

This is like "the operation was a success, but the patient died". What matters is the user-facing behaviour of your whole system; a transaction that can't actually cover the parts the user cares about is pointless.

bastawhiz · on Dec 9, 2020

You're misunderstanding the problem. If I press "buy now" and it says "success", I don't want to receive an email later saying "oops, we didn't actually have any in stock, we'll send you a refund" because another order was sitting in a queue. That's the failure mode here. The staleness of what's in the browser is an orthogonal concern unrelated to the database or use of Kafka.

lmm · on Dec 10, 2020

It's very much related. You have one item in stock. Two users see the item as available in their browsers and click "buy now". That's the problem that you actually have to solve, and database transactions don't help you solve it: whether you're using a transactional datastore or not you have to do pretty much the same thing when the user clicks "buy now": issue an attempt to buy it, wait for that attempt to be confirmed/denied, and handle both cases.

And once you've solved that problem you don't need or want ACID transactions because they don't actually do anything for you. Order confirmation emails have the same problem as the browser: you can't (or at least shouldn't) actually hold a database-level transaction open while you connect to an email server, so you have to do something like recording a queue of email confirmations that are ready to be sent - exactly the same thing you do when using Kafka.

bastawhiz · on Dec 12, 2020

You're still missing the point. You're saying:

> issue an attempt to buy it, wait for that attempt to be confirmed/denied, and handle both cases.

Only one user should receive a success. The article describes a setup where both users receive a success because of write skew. This is the very problem that transactions avoid: locking the row storing the number of items in stock so that it cannot be decremented below zero.

You can still use Kafka. What the article is saying is that you can't just check the DB to see if items are in stock and write your order to a queue. Because there might already be an order for the same item in the queue which makes your new order invalid.

Literally the whole point of ACID databases is that multiple clients can operate on the same set of data and avoid putting it into a bad state. If I'm a bank storing an account balance of $20 and you withdraw $20 and I withdraw $20, only one of us should get $20. This is the same problem.

lmm · on Dec 15, 2020

> Only one user should receive a success. The article describes a setup where both users receive a success because of write skew.

It describes a setup where neither user bothers to check whether they had a success because the authors are affecting not to understand how to use Kafka.

> What the article is saying is that you can't just check the DB to see if items are in stock and write your order to a queue.

You can't do that with a database either. You have to actually handle the case where it fails. It's not actually any easier than doing it properly with Kafka.

> Literally the whole point of ACID databases is that multiple clients can operate on the same set of data and avoid putting it into a bad state. If I'm a bank storing an account balance of $20 and you withdraw $20 and I withdraw $20, only one of us should get $20.

Banks don't actually use ACID transactions for that, because they don't work in the real world; the bank needs to have a record of both your attempts to withdraw $20, not for one of you to see an error. What a bank ends up implementing is much the same as what you implement when using Kafka.

bastawhiz · on Dec 15, 2020

> You can't do that with a database either. You have to actually handle the case where it fails.

This is provably false, as evidenced by Postgres's FOR UPDATE locking support. Of course you have to handle the failure, that's what tells you your mutation failed. RDBMS systems can absolutely guarantee that writes are performed atomically. One only needs to look at Aphyr's extensive testing of databases with Jepsen. Claiming this is all bunk is dismissing reality.

> Banks don't actually use ACID transactions for that

That's a big ol [citation needed] from me.

ben509 · on Dec 9, 2020

The ACID system can guarantee nobody is billed for an item that you can't deliver. If you want a more user-friendly guarantee, you can reserve it when it's added to the cart.

lmm · on Dec 9, 2020

> If you want a more user-friendly guarantee, you can reserve it when it's added to the cart.

If you open an ACID transaction when the user adds something to the cart and don't close it until they check out, you'll find your database gets locked up pretty quickly. So you can't actually use the ACID transactions to implement the behaviour you want - you have to implement some kind of reserve/commit semantics in userspace, whether you're using an ACID database or not.

ben509 · on Dec 9, 2020

You don't need to hold the transaction open the entire time, you just need the inventory count to be correct.

You track the inventory and reservations. Taking a reservation checks that inventory is available. With row-level locking, only that inventory item is locked. If that fails, it can search for timed out reservations, update the inventory and try again.

If it succeeds, it decrements the inventory then adds a reservation. At that point, the transaction can close, and in the common case, you only held locks long enough to update a row in inventory and add a row to reservations.

lmm · on Dec 10, 2020

You still have to handle the case where your transaction fails though. So you actually end up writing the same thing you'd do if you didn't have transactions: you issue an attempt to reserve, wait for the response to that attempt (whether that's transaction commit succeeding/failing or a queue processor processing), and handle both possible results. The transaction support doesn't actually help you because it's at the wrong level to be useful.

fouc · on Dec 8, 2020

Any sufficiently complex software will end up implementing a database.

dgb23 · on Dec 8, 2020

The article links to this talk[0], which has a funny and interesting sounding title: "Did you accidentally build a database?"

[0] https://www.oreilly.com/library/view/strata-hadoop/978149194...

revertts · on Dec 8, 2020

That link's a 3min clip for non-subscribers, but the full talk is here https://www.youtube.com/watch?v=Bz2EXg0Fy98

quickthrower2 · on Dec 8, 2020

I’d qualify that as “complex infrastructure software”

arthurcolle · on Dec 8, 2020

I had an issue with RabbitMQ where I didn't know how my consumer was going to use the data that I was writing to a queue yet (from a producer that was listening on a SocketIO or WebSockets stream), and I was kind of just going to figure it out in an hour or something.

Eventually, my buffer ran out of memory and I couldn't write anything else to it, and it was dropping lots of messages. I was bummed. Is there a way to avoid this in Kafka?

atmosx · on Dec 8, 2020

RabbitMQ and Kafka serve different needs.

RabbitMQ is the most widely used open source implementation of the AMQP protocol. It is slower but can support complex routing scenarios internally and handle situations were at-least-once-delivery guarantees are important. RabbitMQ supports on-disk persistent queues, which you can tune if you like. Compared to Kafka, RabbitMQ is slow in terms of volume that can be managed per queue.

Kafka is fast because it is horizontally scalable and you have parallel producers and consumers per topic. You can tune the speed and move needle where you need between consistency and availability. However, if you want things like at-least-once-delivery and such, you'll have to use the building blocks kafka gives you, but ultimately you'll have to handle this on the application side.

Regarding storage, by default kafka stores data for 7 days. IIRC the NY Times stores articles from 1970 onwards on kafka clusters. The storage is horizontally scalable and durable. This is a common use case. As many have pointed out, the cluster setup depends highly on you needs. We store data for 7 days in kafka as well and it's in the order of 500GB or more per node.

Looks like you have a configuration issue. You can configure rabbitMQ to store queues on the hard disk and with a quick calculation you can make sure you have enough space for 10 or 150 hours of data. I don't see any reason to switch to kafka, a different tool with different characteristics, just because you need more storage.

amai · on Dec 8, 2020

Elasticsearch is also not a database.

charlieflowers · on Dec 8, 2020

Could you elaborate on that? Because it sure seems like a DB to me.

Not a relational one. And should not replace a typical CRUD OLTP DB.

But it sure seems like a no-sql DB to me.

amai · on Dec 9, 2020

Elasticsearch is a search engine. It is optimised to return the best fitting results to a query at the first page. But if you want to retrieve all results for a query (which is a common usecase for DBs) the performance of elasticsearch will plummet.

For example: Try to retrieve a list of all the 10000 movies you stored in an Elasticsearch index. You will get the first 100 results easily, but is you scroll through the results, you will notice that elasticsearch will become very slow.

It is not optimized for that use case. Otherwise it would be called ElasticDB.

diehunde · on Dec 8, 2020

Relevant to the discussion:

Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?) [1]

[1] https://www.youtube.com/watch?v=v2RJQELoM6Y

based2 · on Dec 8, 2020

https://www.postgresql.org/docs/10/rules-materializedviews.h...

EamonnMR · on Dec 8, 2020

Kafka is a very nice communication channel. You can dump the results into a database and query it if you need a database.

tutfbhuf · on Dec 8, 2020

Well, then you have never heard of ksqlDB. It adds SQL and DB features to Kafka. It is backed by Confluent (LinkedIn) same company that developed Kafka initially.

https://ksqldb.io

albertwang · on Dec 8, 2020

We’re very aware of ksqlDB. I would recommend this video from last week where Matthias talks about some of the strengths and weaknesses of ksql: https://youtu.be/KUQuegJ4do8

cirego · on Dec 8, 2020

My understanding is that ksqlDB is a read-only interface on top of streams and only helps people write better consumers. The problems mentioned in the blog post relate to producers.

slt2021 · on Dec 9, 2020

How does using Kafka come into play when implementing Actor model?

Anyone successfully implemented actor model framework over kafka?

interested in learning others' experience

joking · on Dec 8, 2020

neither has to be.

detay · on Dec 8, 2020

using the any tool for correct problem requires skills.

hasanic · on Dec 8, 2020

I mean, duh? Does Apache Kafka ever made the claim that it is a database?

Other things that are not a database: Apache Traffic Server, Apache Mahout, Apache Jakarta, Apache ActiveMQ... hundreds of these exist.

dang · on Dec 8, 2020

The title sounds generic, but the article makes it clear that it's responding to specific proposed uses of Kafka.