Scaling Blockchains with Apache Kafka

galeaspablo · on July 31, 2017

> If you aren’t familiar with the database pattern known as event sourcing (don’t worry — it’s relatively new),

It's not relatively new. That “transaction file” thing in your database? Event Sourcing.

https://goodenoughsoftware.net/2012/03/02/case-studies/

> If you’re not looking at the public chain, you’re wasting your time

I disagree. Not having a single point of failure (one place that can get hacked) is valuable.

> From a trust perspective, it makes no difference if your banking cartel is writing to a Quorum, Hyperledger, or Kafka instance.

Of course it does. The protocol of blockchains makes them work with "proof of X". Appending to any event store, whether in Kafka or SQL does not require proof of anything.

> Blockchains are built for trust, databases for throughput. Event sourcing allows us to achieve a hybrid model with characteristics of both.

No, the reason blockchains can't have high throughpout / almost infinite horizontal scalability... is because there's a logic check. E.g. in bitcoin, you can't send more bitcoins than you have a balance. Event sourcing gives you the high throughpout if there's no logic checks across aggregates --- if there are, you won't have immediate consistency, and you have to be ready for compensating events.

I recommend two books, that cover event sourcing from a Domain Driven Design perspective. The consequences are similar.

https://www.amazon.co.uk/Domain-driven-Design-Tackling-Compl... https://www.amazon.co.uk/Implementing-Domain-Driven-Design-V...

-----------------

If that doesn't do it for you, please just remember the good old CAP theorem.

https://en.wikipedia.org/wiki/CAP_theorem

hudon · on July 31, 2017

>> From a trust perspective, it makes no difference if your banking cartel is writing to a Quorum, Hyperledger, or Kafka instance.

> Of course it does. The protocol of blockchains makes them work with "proof of X". Appending to any event store, whether in Kafka or SQL does not require proof of anything.

The author should have qualified that from a user's perspective, it makes no difference. If my bank decided to store its users' transactions on a proof of work database, I wouldn't even know. Which is the author's point: it makes no difference from a trust perspective, I'm still trusting the bank to store and settle my transaction either way.

It's not proof of work by itself that makes something like Bitcoin trustless (again, from the user's perspective). It's the fact that both the proof of work and blocks are public and verifiable, thus I can validate the blockchain and make sure the miners are doing the work correctly (my transactions are there and the proof of work is valid). Proof of work without making the database public and audit-able by users is pointless. But if it is public and it's shown that the miners are not settling transactions as they should, then users can fork or move to a blockchain that doesn't censor transactions.

galeaspablo · on Aug 1, 2017

May I refer you back to

>> If you’re not looking at the public chain, you’re wasting your time

> I disagree. Not having a single point of failure (one place that can get hacked) is valuable.

I.e. if your chain isn't public there are benefits to using it.

If you suddenly say the same benefits can be obtained with kafka or a relational database, you will be introducing proof of something... Which means you'll now have a blockchain / distributed system based on a relational database. Which comes with the limitations imposed by the CAP theorem.

The most popular version of event sourcing produces such a high throughput, because immediate consistency is sacrificed. I'd like to see what the author proposes in a production system. Global rules would not be enforceable (e.g. no balance under zero), unless throughput is sacrificed to allow for immediate consistency.

eternalban · on Aug 1, 2017

> good old CAP theorem.

Sat in on a candidate interview recently whereby I heard the news that the blockchain "invalidates CAP". Not sure if it will also cure cancer.

buckie · on July 31, 2017

Preface: lead for Juno & ScalableBFT

First, some additional benchmarks:

* Juno (w/ hardcoded language): 500 tx/s

* TendermintBFT w/ EVM: 1k tx/s

* ScalableBFT w/ Pact: 8k tx/s

The thing about the high-performance private blockchains is that they are limited by the sequential smart contract execution performance. Juno ran an embedded "rough draft of a langauge" do it doesn't really count (not a full language, more like a complicated hardcoded counter). From TendermintBFT's docs, if memory serves, they say that if you hardcode a counter they hit +10k/s. For ScalabeBFT, it's it's ~14k/s. This is a minor difference, by the way, that isn't due to the consensus mechanism but more to the engineering of the system.

The reason for the non-hardcoded performance difference is that ScalableBFT runs Pact for smart contracts, which was designed to be a high performance interpreted language in part because of this bottleneck. Even if/when the EVM moves to WASM, the performance bump only impacts fat contracts by making them kill performance less. As in, if your 10k-step contract takes 200ms to execute and that drops to 2ms you can get 100x perf (not quite but it's fine)... but that only takes you from 5/s to 500/s and not to 1k/s or 8k/s.

The numbers above are for a simple coin transfer contract so the performance is mostly dependant on the read/write performance for keys in the DB. There's just not much contract level work to do when you're transferring coins between accounts so the WASM move won't bump things up much if any.

More broadly, I think that the article misses the point of private blockchains when it discusses them:

> I would discourage you from blockchain consortia if your intention is to never use the public chain and if you don’t care about Ethereum. I’m going to put it bluntly: if you’re not looking at the public chain, you’re wasting your time. The benchmarking numbers paint a pretty obvious story — Quorum will never give you the speed of Kafka, especially since blockchains get less efficient as more participants join (because of that pesky “consensus” thing).

They serve a specific purpose: being a multi-administrative DB. Distributed DB systems (like Kafka, raft-based systems, etc.) can't robustly/safely serve that end.

I have a longer comment about it here: https://news.ycombinator.com/item?id=14853521

GordonS · on July 31, 2017

> If used correctly, it is tamper-proof, just like the blockchain

Is tamper proofing typically a feature of event sourcing systems? If so, how is it implemented?

galeaspablo · on July 31, 2017

As olaonde pointed out,

> Yes in the sense that event sourcing systems typically [are]... "append only"

However, I'd caution that this is not the same at all as cryptographic / blockchain style tamper proofing. I can still go into the database and change things myself, or create a new one altogether.

There are some interesting ideas in the article. I.e. making a peg to a blockchain, but that won't solve the throughput / trust dilemma.

olalonde · on July 31, 2017

Yes in the sense that event sourcing systems typically have an "append only" data store which gives a full log over all state changes. That makes event sourcing particularly attractive for finance, gambling, etc.

GordonS · on July 31, 2017

But surely the append only property is rather weak, unless it is backed by physically read-only storage? Otherwise the system itself may not be able to change existing events, but a rogue admin or other bad actor could.

tonyhb · on July 31, 2017

Yes, the idea behind a blockchain is a merkle tree of events, metadata and salt that hashes with a predetermined number of 0's (the difficulty) as a prefix.

If you wanted to make your event stream more tamper-proof you could also introduce a merkle tree-like row which hashes the current event, the previous hash and a secret key. Any tampering of the data would need this key and would need to recompute future events.

It's still not perfect because with the key this is easy to do.

olalonde · on July 31, 2017

Yes, event sourcing is just a loosely defined pattern. It does make tamper proofing a bit easier due to its log like nature. I know some systems do use physically "append-only" storage.

GordonS · on July 31, 2017

Do you have any more info on these RO storage systems? What springs to mind is... tape?!

antonvs · on Aug 1, 2017

There's some info here:

https://en.wikipedia.org/wiki/Write_once_read_many#Current_W...