The mess is mostly the result of a mismatch between the classic database transactional model and the Kafka transactional model (the G0 anomaly). If you read the documentation without a database background it seems OK, but once you notice the differences between the models it becomes hard to tell whether something is a bug or a property of the Kafka protocol.
There is a lot of research happening in this area, even in the database world. The list of isolation levels isn't final, and some of the recent developments include PC-PSI and NMSI, which also seem to "violate" the order. I hope one day we get a formal academic description of the Kafka model. It looks very promising.
I agree--rystsov has covered this well here and in other parts of the thread. I just want to add that some of the Kafka documentation did claim writes were isolated (where other Kafka documentation contradicts that claim!) so it's possible that depending on which parts of the docs users read, they might expect that G0 was prohibited. That's why this report discusses it in such detail. :-)
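For readers without the database background: G0 is the "write cycle" anomaly from Adya's formalism, where the write-write dependencies between transactions form a cycle (T1's write lands before T2's on one object, but after it on another). A minimal sketch of a check for it follows, assuming each Kafka partition is treated as an object and the order of records in its log as the version order. The function names and the input shape are hypothetical, for illustration only, and are not taken from the report or from any Kafka tooling.

```python
from collections import defaultdict


def ww_edges(version_orders):
    """Build write-write dependency edges.

    version_orders maps an object (here: a partition name, by assumption)
    to the list of transaction ids in the order their writes were installed.
    """
    edges = defaultdict(set)
    for writers in version_orders.values():
        for earlier, later in zip(writers, writers[1:]):
            if earlier != later:  # consecutive writes by one txn add no edge
                edges[earlier].add(later)
    return edges


def has_g0(version_orders):
    """Return True if the write-write dependency graph contains a cycle."""
    edges = ww_edges(version_orders)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(txn):
        color[txn] = GREY
        for nxt in edges.get(txn, ()):
            if color[nxt] == GREY:  # back edge: cycle found
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[txn] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in list(edges))


# T1 precedes T2 on partition "x" but follows it on "y": a write cycle.
interleaved = {"x": ["T1", "T2"], "y": ["T2", "T1"]}
# T1 precedes T2 on both partitions: no cycle.
ordered = {"x": ["T1", "T2"], "y": ["T1", "T2"]}
print(has_g0(interleaved), has_g0(ordered))
```

Under a model that prohibits G0, the first history could not occur; the point of the discussion above is that users who read only some parts of the docs might have expected exactly that.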
Usually I start with a couple of seed papers, then follow the references and look at the other papers the authors wrote. When a PhD student explores an area they write several papers on the topic, so there is a lot of material to read. But the real gem is the thesis: it has depth, context, and a lot of links to other work in the area.
I wonder if Redpanda is considering, or already offers, an alternative protocol that would be better defined in terms of transaction guarantees. At this point it looks like Kafka’s protocol was a nice try but needs a major refactoring.
The documentation is a bit confusing: the protocol evolved over time (new KIPs), and there is a mismatch between the database model and the Kafka model. But we see a lot of potential in the Kafka transactional protocol.
At Redpanda we were able to push to 5k distributed transactions across replicated shards. It's mind-blowing for a database to achieve the same result.