
It just feels like two widely different scenarios we're talking about here.

https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... talks about the case of a single failure and it shows how (a) Raft without fsync() loses ACK-ed messages and (b) Kafka without fsync() handles it fine.

This post, on the other hand, talks about a case where we have (a) one node being network partitioned, (b) the leader crashing, losing data, and coming back up again, all while (c) ZooKeeper doesn't catch that the leader crashed and elects another leader.

I definitely think the title/blurb should be updated to clarify that this only happens in the "exceptional" case of more than f failures.

I mean, the following paragraph seems completely misleading:

> Even the loss of power on a single node, resulting in local data loss of unsynchronized data, can lead to silent global data loss in a replicated system that does not use fsync, regardless of the replication protocol in use.

The next section (and the Kafka example) is talking about loss of power on a single node combined with another node being isolated. That's very different from just "loss of power on a single node".
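To make the fsync distinction being debated concrete, here is a minimal sketch (plain Python, not Kafka or Raft code; the file name and record are made up) of the difference between acknowledging a write after write() versus after fsync():

```python
import os

def append_record(path: str, payload: bytes, durable: bool) -> None:
    """Append a record and, optionally, force it to stable storage.

    Without fsync, the data may still sit in the OS page cache after
    write() returns; a power loss before the kernel flushes it silently
    drops the tail of the log (a "suffix loss"), even though the caller
    already saw the write succeed.
    """
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()                 # push Python's userspace buffer to the kernel
        if durable:
            os.fsync(f.fileno())  # ask the kernel to persist to the device

# ACK-ing the write back to a client is only safe against power loss
# when durable=True (or when the protocol tolerates losing the suffix).
append_record("log.bin", b"record-1\n", durable=True)
```

The argument in the linked post is about whether the replication protocol, rather than each node's fsync, can be relied on to recover that lost suffix from peers.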




We can't ignore network partitioning or pretend it doesn't happen. When people talk about choosing two out of CAP, the real question is C or A, because P is out of our control.

When we combine network partitioning with local data suffix loss on a single node, it either leads to a consistency violation or to the system being unavailable despite the majority of the nodes being up. At the moment Kafka chooses availability over consistency.

Also, I read the Kafka source and the role of network partitioning doesn't seem to be crucial. I suspect that it's also possible to cause a similar problem with a single-node power outage and unfortunate timing: https://twitter.com/rystsov/status/1641166637356417027


For what it’s worth, this form of loss wouldn’t be possible under KRaft since they (ironically?) use Raft for the metadata and elections. Ain’t nobody starting a new cluster with Zookeeper these days.


You are right, this is a failure of both ZooKeeper and the leader node, so two independent failures at the same time.


Exactly my thoughts.




