If in 2020 you still choose Kafka as your messaging infrastructure, you are well behind the times.
I get it, I really do. Your manager has heard of Kafka, as have your PM and CTO. Nobody has ever been fired for buying Kafka. It’s the new IBM.
“Buying Kafka? It’s open source, so it’s free!” you say, my naïve friend who hasn’t heard of Confluent.
Yes, Kafka is free until you go to production and need things like MirrorMaker for multi-site replication. Then it’s time to pay up to the company that has taken over and monetized the project.
But it’s a pain to run, a pain to debug, and it uses Avro as a binary schema format, because everyone uses Avro, right? And partitions are great! Until you need to change their count, and then you’re in for a world of strange and potentially unnoticed bugs from the discontinuity in partitioning as the topic grows.
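The partitioning discontinuity is easy to see in miniature. Kafka’s default partitioner boils down to hashing the message key modulo the partition count, so growing the topic silently reroutes most keys. A toy sketch (using CRC32 as a stand-in for Kafka’s murmur2-based partitioner; any stable hash shows the effect):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default key partitioner: hash(key) % partition count.
    return zlib.crc32(key.encode()) % num_partitions

keys = [f"user-{i}" for i in range(100)]
before = {k: partition_for(k, 4) for k in keys}  # topic starts with 4 partitions
after = {k: partition_for(k, 6) for k in keys}   # topic grown to 6 partitions

# Roughly two thirds of keys land on a different partition after the change,
# so per-key ordering (and any state co-located by partition) quietly breaks.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

Consumers that assumed "all events for user-42 arrive on one partition, in order" now see that key’s history split across two partitions, with nothing flagging the breakage.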
Or... you could have Pulsar. Dynamic partitioning, more expressive subscription models, multi-active replication as part of the core product. No BS marketing claiming “exactly once delivery semantics”, aka more-than-once with receiver-side deduplication, aka what TCP does and has always done.
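The “exactly once, aka at-least-once plus receiver-side deduplication” claim fits in a few lines. A hypothetical consumer, assuming each message carries a unique producer-assigned id (none of this is any broker’s real API):

```python
# "Exactly-once" as at-least-once delivery plus receiver-side deduplication:
# the broker may redeliver, so the receiver drops ids it has already seen.
seen_ids: set = set()
processed: list = []

def on_message(msg_id: str, payload: str) -> None:
    if msg_id in seen_ids:
        return                # duplicate redelivery after a lost ack: drop it
    seen_ids.add(msg_id)
    processed.append(payload)  # effectively-once side effect

# A retry after a lost acknowledgment redelivers message "a":
for msg_id, payload in [("a", "hello"), ("b", "world"), ("a", "hello")]:
    on_message(msg_id, payload)

print(processed)  # ['hello', 'world']
```

Which is exactly the shape of TCP’s sequence-number handling: the network delivers at least once, and the receiver discards segments it has already accepted.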
There is no reason to be building something new on top of Kafka in 2020.
So thank you for sharing this. I wasn't explicitly recommending nor condemning Kafka (or Confluent). Your input is definitely valuable.
I was implicitly referring to the fact that Confluent is now trying to position their product as a database (something I think is not really the right use case for it), and to the irony that the database/queue discussion in this article dates from '95.
I do believe the unbundling of some database features has given us really handy tools, carved out of those monoliths, to use in our software architectures.
Regarding your comment that "nobody ever got fired for Kafka" - I've been in our industry long enough to have lived through this old saw several times, so I get it. I don't think people should go into Kafka for heavy use cases without realizing they will end up paying Confluent, and that it won't be cheap or perfect. I'm not suggesting people go with Kafka or Confluent for that or any specific reason, unless they prove it is the right tool for their job.
You seem to be proposing discarding Kafka completely in favor of Pulsar. I have not had the privilege of implementing Pulsar; I've heard about it and will need to look into it some more. I take it you have personally implemented it? Are you part of the project? What gives you the confidence that Pulsar will not become the next Kafka when the next great tool comes out? I'm genuinely interested.
Personally, we have a handful of use cases that Kafka works fine for. There are some use cases that it promises, or suggests are possible, but where it still falls short, that we would like to have in our toolbox. They are not the common use cases it is used for. We specifically are NOT using it as a database. We are also not pushing Kafka nearly as hard as companies like LinkedIn and others do.
So given this article is about how queues are databases, I'd like your opinion on the pattern this suggests. What was your take on things like RethinkDB, which had great real-time change notification (but not really messaging)? Or how about the direction Amazon QLDB seems to be going, with streams emitted from an immutable ledger used for storage? Do you see any actual database that has a great story addressing this feature?
These are all interesting tools, but personally, the pattern of an unbundled, immutable transaction log with change notifications feeding a messaging system feels like a helpful tool to have in the architecture tool chest.
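That pattern is compact enough to sketch: an append-only log where every durable write fans out as a change notification to subscribers, in log order. A toy illustration (all class and method names are hypothetical, not any product's API):

```python
from typing import Callable

class TransactionLog:
    """Toy append-only transaction log with change-notification fan-out."""

    def __init__(self) -> None:
        self.entries: list = []       # immutable history: append-only, never mutated
        self.subscribers: list = []   # downstream consumers (the "messaging" side)

    def subscribe(self, callback: Callable) -> None:
        self.subscribers.append(callback)

    def append(self, record: dict) -> int:
        offset = len(self.entries)
        self.entries.append(record)   # durable write first...
        for notify in self.subscribers:
            notify(offset, record)    # ...then notify, so consumers never lead the log
        return offset

log = TransactionLog()
received = []
log.subscribe(lambda offset, record: received.append((offset, record)))

log.append({"op": "insert", "id": 1})
log.append({"op": "update", "id": 1})
print(received)  # every change, with its offset, in log order
```

The log stays the source of truth, and the messaging layer is derived from it; a new subscriber can always replay `entries` from offset zero to catch up, which is the property the bundled-database versions of this idea usually lack.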
Wow, now Pulsar as well. As someone who isn't full-time trying to keep up with this tsunami of names, it's just impossible to keep up. Apache (or someone, anyone, please) needs to make a matrix of all their own competing technologies and what the actual differences are between them. It's just impossible!
I feel ya. Software is a world that’s constantly evolving.
Apache is great at software engineering, but sorely lacking in product design. Because open source software is almost definitionally not a product, but a tool.
With that comes increased bifurcation of the tooling when different requirements arise, and increased complexity in running it. Kafka and Pulsar both have ZooKeeper as an external dependency, for instance. Pulsar has an extra dependency even beyond that in BookKeeper, one of the few things I’ll readily fault it for. It’s a stark contrast to openly commercial products like CockroachDB, which ships as a single static binary, with symmetric nodes and a built-in management UI. It’s a product, not a tool.
It's because Apache "adopts" products that are created independently by other companies, who then want to open source their product and leave it to someone else to look after.
Kafka was created at LinkedIn and eventually donated to Apache Foundation.
Pulsar was created at Yahoo and eventually donated to Apache Foundation.
People know Kafka, it works well, AWS will sell you a managed version, using Protobuf or JSON payloads isn’t an issue, and the conceptual model is easy to understand.
Pulsar may be better, but is it better enough to displace an entrenched piece of core software at the heart of an enterprise?