Just going from the last one I read, for Dgraph, it did extremely well. Pretty sure etcd did well too.
They always have bugs somewhere, but there are huge differences between bugs that show up for very specific, niche cases, and normal "I wrote to the db and it dropped it".
"We found five safety issues in version 1.1.1—some known to Dgraph already—including reads observing transient null values, logical state corruption, and the loss of large windows of acknowledged inserts."
Loss of large windows of acknowledged inserts. Durability is hard.
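For intuition, the kind of check behind that finding can be sketched in a few lines (a toy sketch, not Jepsen's actual code; `FlakyStore` and everything in it are made up for illustration): record every insert the database acknowledged, read everything back afterwards, and diff.

```python
# Toy illustration of an acknowledged-insert durability check.
# FlakyStore is a fake store that acknowledges every write but
# silently drops every third one -- the failure mode being discussed.

class FlakyStore:
    """Pretend database that acks writes it never persists."""
    def __init__(self, drop_every=3):
        self.data = set()
        self.acked = 0
        self.drop_every = drop_every

    def insert(self, value):
        self.acked += 1
        # Bug under test: every Nth write is acknowledged but not stored.
        if self.acked % self.drop_every != 0:
            self.data.add(value)
        return True  # the client sees a successful ack either way

    def read_all(self):
        return set(self.data)

store = FlakyStore()
acknowledged = set()
for v in range(10):
    if store.insert(v):
        acknowledged.add(v)

# Any acknowledged-but-missing insert is a durability violation.
lost = acknowledged - store.read_all()
print(sorted(lost))  # → [2, 5, 8]
```

Real checkers are far more involved (concurrent clients, nemesis schedules, indeterminate operations), but the core invariant is exactly this set difference: acked writes must still be readable.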
As staticassertion mentions, some of the violations found occurred only around tablet moves, which happen only at certain cluster sizes and quite infrequently. Of course, Jepsen triggers those moves left, right, and center to provoke those failure conditions; but that's not how tablet moves behave under real-world conditions. This is different from other edge cases like process crashes, machine failures, network partitions, and clock skew, which can and do happen. In those cases, Jepsen didn't find any violations.
We were planning to look into those tablet move issues and get them fixed (it shouldn't be that hard), but honestly, the chances of our users encountering them are so low that we de-prioritized that work in favor of some of the other launches we're doing.
But, we'll fix those up in the next few months, once we have more bandwidth.
I don't really feel like playing the quotes game... but, sure.
"All of the issues we found had to do with tablet migrations"
"ndeed, the work Dgraph has undertaken in the last 18 months has dramatically improved safety. In 1.0.2, Jepsen tests routinely observed safety issues even in healthy clusters. In 1.1.1, tests with healthy clusters, clock skew, process kills, and network partitions all passed. Only tablet moves appeared susceptible to safety problems."
No one is here to claim that anyone is getting through any kind of rigorous testing without bugs found. But there is a huge difference between "My extremely common write path + a partition = dropped transactional writes" and "Under very specific circumstances, with worst case testing, multiple partitions, and the db in a specific state, we drop writes".
There is an ocean between, say, MongoDB's test results and Dgraph's.
"If you use Redis as a queue, it can drop enqueued items. However, it can also re-enqueue items which were removed. "
"f you use Redis as a database, be prepared for clients to disagree about the state of the system. Batch operations will still be atomic (I think), but you’ll have no inter-write linearizability, which almost all applications implicitly rely on."
"Because Redis does not have a consensus protocol for writes, it can’t be CP. Because it relies on quorums to promote secondaries, it can’t be AP. What it can be is fast, and that’s an excellent property for a weakly consistent best-effort service, like a cache."
Again, Redis is a very different type of database, so expectations should be aligned. Further, this test is quite old.
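To make the inter-write linearizability point above concrete, here's a toy checker (nothing like Jepsen's actual Knossos checker; every name here is illustrative, and it assumes writes don't overlap each other): on a single register, a read that starts after a write has completed must observe that write's value, or the value of a write concurrent with the read.

```python
# Toy stale-read detector for a single register.
# Assumes writes are sequential (non-overlapping); real linearizability
# checking over arbitrary concurrent histories is much harder.

from dataclasses import dataclass

@dataclass
class Op:
    kind: str      # "write" or "read"
    value: int
    start: float   # invocation time
    end: float     # completion time

def stale_reads(history):
    """Return reads that missed a write which completed before they began."""
    writes = sorted((op for op in history if op.kind == "write"),
                    key=lambda w: w.end)
    violations = []
    for r in (op for op in history if op.kind == "read"):
        prior = [w for w in writes if w.end < r.start]
        concurrent = [w for w in writes if w.start < r.end and w.end >= r.start]
        # A read may see any write concurrent with it...
        allowed = {w.value for w in concurrent}
        # ...but of the writes that finished before it started,
        # only the latest one is a legal result.
        if prior:
            allowed.add(prior[-1].value)
        if r.value not in allowed:
            violations.append(r)
    return violations

history = [
    Op("write", 1, 0.0, 1.0),
    Op("write", 2, 2.0, 3.0),
    Op("read",  1, 4.0, 5.0),  # stale: the write of 2 completed at t=3
]
print([op.value for op in stale_reads(history)])  # → [1]
```

This is the "clients disagree about the state of the system" failure: the read was acknowledged with a value the register provably no longer held.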
But that's a huge difference from Dgraph's results.
Basically, saying "Well no one does well on Jepsen" isn't really true. Lots of databases do well, but you have to adjust your definition of "do well".