The first half is jepsen team trying to divine some actual testable guarantees f...

rystsov · on April 29, 2022

The mess is mostly the result of the mismatch between the classic database transactional model and kafka transactional model (G0 anomaly). If you read the documentation without the database background it seems ok, but when you notice the differences between the models it becomes hard to understand if it's a bug or property of the Kafka protocol.

There is a lot of research happening around this area even in the database world. The list of the isolation levels isn't final and some of the recent developments include PC-PSI and NMSI which also seem to "violate" the order. I hope one day we get the formal academic description of the Kafka model. It looks very promising.

aphyr · on May 2, 2022

I agree--rystsov has covered this well here and in other parts of the thread. I just want to add that some of the Kafka documentation did claim writes were isolated (where other Kafka documentation contradicts that claim!) so it's possible that depending on which parts of the docs users read, they might expect that G0 was prohibited. That's why this report discusses it in such detail. :-)

btown · on April 29, 2022

Are there good research groups or journals to follow to keep apprised of the state of the art here?

rystsov · on April 29, 2022

I've created this list a while ago https://github.com/redpanda-data/awesome-distributed-transac.... Maybe it's time to update it.

Usually I start with a couple of seed papers then follow the references, look at the other papers the authors wrote. When a phd student explores an area they write several paper on the topic so there is a lot material to read. But the real gem is the thesis, it has depth, context and a lot of links to other work in the area.

btown · on April 30, 2022

Thank you for putting this together!

jeffbee · on April 29, 2022

Total mess. It’s a real indictment of Kafka, more than it is anything about redpanda in the first half.

excuses_ · on April 29, 2022

I wonder if Redpanda thinks about or offers some alternative protocol that would be better defined in terms of transaction guarantees. At this point it looks like Kafka’s protocol was a nice try but it needs a major refactoring.

rystsov · on April 29, 2022

Documentation is a bit confusing: the protocol was evolved over time (new KIPs) and there is mismatch between the database model and kafka model. But we see a lot of potential in the Kafka transactional protocol.

At Redpanda we were able to push to 5k distributed transactions cross replicated shard. It's a mind-blowing for a database to achieve the same result.

Also Kafka transactional protocol works at low level it's very easy to build systems on top of it. For example, it's very easy to build a Calvin inspired system http://cs.yale.edu/homes/thomson/publications/calvin-sigmod1...