Conflict of interest: I'm happily using Pulsar, I come from an extensive Kafka background, and I'd personally like to see Pulsar win this entire space.
I see some differences: instead of Pulsar Functions, Redpanda has gone the extra step of using WASM. I suspect the Pulsar community will end up going in this direction too as the whole ecosystem pushes forward.
Got rid of ZooKeeper: I've never truly understood the hatred toward ZooKeeper, aside from the viewpoint that it's one more external dependency the project requires.
Compatible Kafka API: this is a smart business choice to win over any business that is unhappy with the operational costs of Kafka and wants to move off it. Pulsar has a connector for Kafka which lets a business leave their existing work entirely untouched and stream to the new source, taking the strangler-pattern approach. The problem with a compatible API is that you still need to touch the running system to point it from Kafka over to Redpanda, and that opens a can of worms around how to abort a Redpanda rollout and switch back to Kafka without losing data. The business now needs to modify all their existing code so that wherever it produces to Kafka, it also writes to Redpanda.
The other option I see is essentially the same: Redpanda has a connector to Kafka and only streams off the existing Kafka cluster, which IMO makes the compatible API pointless aside from a marketing and sales standpoint with customers.
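The dual-write concern above can be sketched in a few lines. This is an illustrative Python sketch, not an official migration tool: the `DualWriteProducer` wrapper and the stub clients are hypothetical, standing in for real Kafka clients (which, thanks to the compatible API, would in principle differ only in their bootstrap address).

```python
class DualWriteProducer:
    """Write to both clusters during a strangler-pattern migration."""
    def __init__(self, primary, shadow):
        self.primary = primary   # e.g. the existing Kafka producer
        self.shadow = shadow     # e.g. the new Redpanda producer
    def send(self, topic, value):
        self.primary.send(topic, value)  # source of truth during rollout
        self.shadow.send(topic, value)   # shadow traffic to the new cluster

# Stub clients so the sketch is self-contained; in real code these would be
# Kafka-protocol producers pointed at the old and new bootstrap addresses.
class StubProducer:
    def __init__(self):
        self.sent = []
    def send(self, topic, value):
        self.sent.append((topic, value))

kafka, redpanda = StubProducer(), StubProducer()
p = DualWriteProducer(kafka, redpanda)
p.send("events", b"hello")
assert kafka.sent == redpanda.sent == [("events", b"hello")]
```

Rolling back then means dropping the shadow writer; the old cluster never stopped being the source of truth, which is exactly why the dual-write window is the operationally painful part.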
Kafka also has a KIP to get rid of ZooKeeper; a bunch of the issues related to that KIP are resolved, and it looks like it should be happening this year.
Doesn't this just internalize the dependency within the project itself? Isn't Redpanda taking on all the effort that Zookeeper has been doing for years, all the edge cases, all the additional support and now the coupling of it within the very project itself?
That is our essential complexity: if you are trying to replicate data to machines, you need to replicate data to machines. We chose Raft as the only way. In essence, we are much simpler than upstream w.r.t. protocols for data replication.
With Pulsar you have to run (as I understand it) not only ZooKeeper, but also Apache BookKeeper. Operationally, Pulsar sounds even more complex than Kafka.
I've never managed any of these, but I know that both ZK and Kafka have a reputation for being operationally complex. I've read comments by other people on HN about Pulsar being complex, too.
I'm optimistic about Pulsar becoming a widely deployed tool once they can get rid of the ZK dependency, in particular since Pulsar seems quite friendly to non-Java languages, while BK requires Java on the client and does not, and will never, support other languages.
>Running five node zookeeper cluster was pure overhead.
I have not experienced this issue. I run a 5-pod ZK ensemble in K8s, with each pod at memory: 256Mi and cpu: 0.1, for a couple hundred thousand messages a second with Pulsar.
I think 1.25 Gi and half a core for handling quorum and metadata locking for a stream-storage system isn't exactly what I would consider overhead. And deleting ZK tomorrow doesn't make that work free; Redpanda simply takes those resources on itself.
I was a bit surprised to see the announcement about making Redpanda open source (although very interesting for poking around!).
Question: are you, as a company (Vectorized) pursuing the same business model as the Confluent one?
OSS Kafka, but paid official support, cloud, connectors and registry?
We had success with very large enterprise customers and also some success with finance companies (looking for zero data loss). The idea is that we can give away the Kafka-API-compatible system with 10x lower tail latencies for free and still monetize the high end of the market.
I think there is a big shift when you have a single binary, i.e. no one really complains about running nginx, etc., because it's easy to get up and running.
So the gist is: we wanted to let everyone use it, while reserving the right to be the only hosted Redpanda provider.
Is there a blog post or docs showing using WASM? I searched your site but couldn't find details.
I've failed to grasp if it's an "alternative plugin-engine" kind of system to extend Redpanda, or you're storing data as WASM and therefore it's executable (to take your example: GDPR compliant auto-expire if it's past a certain date).
Feel free to embed it, ingest customer data, or run it inside a SaaS application. The only restriction is hosting Redpanda as a service for other customers (think AWS MSK).
If you have any questions or think your use may be confusing please reach out to us.
If I am reading the fields of this implementation of BSL correctly, it will be open-source software in 10 years, when it converts to Apache 2.0, but not today. Until then, the license does not comply with the open source definition due to field of use restrictions.
" Our intention is to deter cloud providers from offering our work as a service. For 99.999% of you, restrictions will not apply - welcome to our community!"
This is simply not true. The restriction applies to 100% of everyone using the software under the license.
If an alternative service provider can't legally host the service for me, I am restricted from selecting an alternative vendor if my needs converge from the available vendors offerings.
Yes, all of us are bound by the terms alike, but I think everyone understands it was meant as a commercial vs. non-commercial distinction: for 99.999% of you, use will never trigger the respective clauses in the license. And the post title literally says "Redpanda is now Free & Source Available".
Hi Alex, I'm an amateur programming language enthusiast & designer, so I've a question related to this.
Looks like the project is mostly written in C++ and Go. What was the reason for this choice? Have you considered other languages, like Rust or Zig, instead of C++? TBH I'm not sure what an alternative to Go would be; maybe the JVM AOT-compiled with NativeImage (but AFAIK that's still experimental).
Did Go's GC and/or C++'s lack of GC help/impede the project? IMO memory management is one of the main differences between languages... the other is concurrency / memory model / undefined behavior, where JVM is significantly ahead of the rest (no undefined behavior), I'm not sure exactly where Go stands (there seems to be a memory model, but no mention of undefined behavior or lack thereof).
Hi Alex - This looks a lot like what ScyllaDB did to unleash the potential of the Cassandra space, with a fully optimized C++ rewrite of an Apache project. Did you draw some inspiration from their efforts?
As a satisfied Scylla convert, I'm looking forward to trying Redpanda.
I'm a fan of Scylla too, but if I could go back in time I'd have recommended waiting until mid 2019 to migrate. "Fully optimized C++ rewrites" tend to take years to become battle-tested.
There are two levels here: 1) Raft has a proof (and a great PhD dissertation from Diego), but what matters is whether we actually implemented it correctly, so 2) we need to continuously test it. Denis did a lot of similar work on CosmosDB (Microsoft) and has spent his career working on consensus.
Totally. Today, if folks need txns, they wouldn't be a good fit for us. What we found is that about 90% of use cases are covered by the base API. For reference, with all of the versioning there are something like 144 API calls you can make to Kafka; most people use a small subset of those via high-level clients (Java, Python, librdkafka, etc.).
Hi, I need to decide in the near future on a scalable messaging solution with at-least-once or better guarantees for our own SaaS platform. I'm looking for something that can be deployed on-prem, that's not targeted only for the public cloud. Do you have any documentation on how best to deploy redpanda with Docker and/or Ansible? What are the best practices on rolling out the cluster in-house? I've seen some docs on tuning that referred to AWS types of instances, is there something that you could refer me to that is more generic and not tied to a specific cloud vendor?
I'm asking partly because we must be able to offer closed on-prem installations as well as SaaS on a cloud. I'm looking for a low-ops component that will not fail me (as often as alternatives would) :)
Hey, congrats on publishing your source. It's a very interesting project indeed. I took a look around, and I think with a bit of expanded documentation, especially examples using the WASM transformations and maybe some emphasis on durability and other guarantees, it could grow into a great project.
I'd be interested in the write amplification, since you went pretty low-level in your IO layer. How do you guarantee atomic writes when virtually no disk provides guarantees beyond the page level? A failed write to a page could, at least in theory, destroy already-written data on that page, so one has to resort to writing data multiple times.
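One common answer to the torn-write problem, used by many log-structured systems (I'm not claiming this is Redpanda's exact on-disk format), is to checksum every record: a write cut off mid-page fails the checksum on recovery and is discarded, so the log tail stays consistent without double-writing. A minimal sketch:

```python
import struct
import zlib

def encode_record(payload: bytes) -> bytes:
    # Length-prefixed record with a CRC32 over the payload.
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def decode_record(buf: bytes):
    # Returns the payload, or None if the record is torn/corrupt.
    if len(buf) < 8:
        return None
    length, crc = struct.unpack("<II", buf[:8])
    payload = buf[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload

rec = encode_record(b"some event")
assert decode_record(rec) == b"some event"
# A torn write (record cut off mid-page) fails the check and is dropped
# on recovery instead of being treated as valid data.
assert decode_record(rec[:-3]) is None
```

The write amplification question then becomes about the metadata overhead per record and how often the log tail must be rewritten, rather than about double-write journaling.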
Hi Tim. Absolutely, we expect people to test. Public benchmarks are coming in December. We'll probably just pick the Open Messaging Benchmark that has been going around the Pulsar and Kafka communities, so folks can have a frame of reference.
Though what is more interesting to me is what happens when you inject failures while the benchmarks are running.
What's end-to-end latency for a local cluster like? What's your business model?
I'm a big fan of Kafka as an abstract building block, but not so much the actual implementation, which is as painful to set up as a consultancy-based business model might make you suspect, especially if you need reliability. The other problem is that performance kind of sucks: apart from potential latency spikes due to GC pauses, I found even the average latencies for reliable end-to-end delivery (in a fast local network and on decent-sized hardware) not in the right order-of-magnitude ballpark.
-Redpanda (free) - comparable features to Kafka, no limits
-Redpanda Enterprise (paid) - Additional features (security, WASM, tiered storage, support etc)
-Vectorized cloud (Free and paid tiers) - Hosted in AWS+ GCP
Hey Alex, this looks very interesting! From browsing the docs it appears rpk tries to mess with cgroups and some kernel settings... is there compatibility mode for running unprivileged in a container environment such as kubernetes? Thanks!
It sounds like Vectorized is a streaming platform built on top of Redpanda. Is that the right way to think about it? If yes, then why not build it directly on top of Kafka instead?
We kind of started on a different note. We wanted to solve some fundamental problems (no Zookeeper - similar to KIP-500) and most of all we wanted a single binary we can ship around, something that is easy to run.
I think people LOVE the kafka _api_ but they have a hard time operating clusters at scale. So we decided to keep the same API but solve the problem of operational complexity.
> I think people LOVE the kafka _api_ but they have a hard time operating clusters at scale.
That is very true, and this is what you should emphasise. Wish you the best of luck!
The actual speed of Kafka has rarely been a concern (but huge numbers of partitions are, which makes rebalancing a pain) in my experience, in fact it was mostly overkill. But operational complexity was definitely an issue!
Landing page isn't bad; when you scroll, the important points are there, but you have to scroll. I'm not a "landing page optimisation guru", so take this with a grain of salt, but I would change it as follows.
Without scrolling, all the text that's displayed is this:
"Redpanda
A Kafka® API compatible streaming platform for mission-critical workloads.
Try Redpanda today"
It tells me what it is, which is good. It does not tell me why it is better, for that I have to scroll. Kafka is already suited for mission-critical workloads, so that is not a unique value proposition. Maybe:
"Redpanda - 100% Kafka API compatible, but without the headaches. Forget Zookeeper, forget rebalancing issues. Instead, enjoy reliable message delivery, 10x faster speed and ultra-low latencies due to our thread-per-core architecture."
Something like that, plus a visible "call to action" button, maybe "try it out" or "download".
Could also think about a pretty graph comparing latencies or something. People love pretty graphs.
@MrBuddyCasino - I tried incorporating the design on the site we launched 5 seconds ago. Let me know what you think (alex@vectorized.io is my email) in case you feel inclined :D Thank you though.
I'm not Alex, but I would assume he means scanning cold data will not evict all the hot data from the cache. This is a common problem with cache eviction algorithms.
Hi I like that there is a competitor to Kafka in this space and also the build in capability to do transformations. I got a few questions though which I could not find in your docs:
(1) Over at the Apache Arrow FAQ I read that the overhead of serialization in analytical frameworks can be around 80 to 90% of total compute costs (r_1). While I have no concrete numbers of my own, from using Kafka together with Kafka Streams I can at least confirm that the overhead of serialization is (very) significant. My question therefore is: does your WASM engine avoid (de)serialization between your storage/stream layer and the engine, and if not, are there plans for this?
(2) Are the supported WASM transformations stateless (i.e., single-message) only, or can they be stateful (i.e., windowing and stream-stream/table join functionality)?
(3) I could not find any reference to the WASM inline lambdas at all in the docs actually, am I missing something?
(1) Arrow is great! Currently it does not, but yes, it will when we move out of the Node.js implementation into our own V8 isolates inside an alien thread (a Seastar concept).
(2) Stateful, but only within a single partition.
(3) It will be released in the next week or so. If you look in the GitHub repo, you can look into `coproc`.
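The "stateful, but only within a single partition" semantics can be sketched in plain Python (the real engine runs WASM in V8; the `on_record` hook name and the enrichment logic here are hypothetical illustrations of the semantics, not the actual API):

```python
from collections import defaultdict

# Per-partition state: each partition keeps its own counter, and there is
# no cross-partition state, mirroring the answer above.
state = defaultdict(int)

def on_record(partition: int, value: bytes) -> bytes:
    # Hypothetical transform hook: enrich each record with a per-partition
    # sequence number (a one-shot transformation, not a cross-stream join).
    state[partition] += 1
    return value + b"|seq=" + str(state[partition]).encode()

assert on_record(0, b"a") == b"a|seq=1"
assert on_record(0, b"b") == b"b|seq=2"
assert on_record(1, b"c") == b"c|seq=1"   # partition 1 has independent state
```

Anything requiring a stream-stream join across partitions would fall outside this model, which matches the one-shot transformation use cases (GDPR scrubbing, simple enrichment) mentioned elsewhere in the thread.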
Though the explanation went a bit over my head, just one (or two) follow-up question(s): how did you end up with WASM for the inline lambdas? Did you have some discussion of alternatives like Lua? I am curious about insights on choosing scripting engines/implementations, hence the question.
A different version of this question: were alternatives to V8 considered? I believe there are quite a few pure-WASM engine implementations out there (unless I'm missing some feature that requires V8 and rules out pure-WASM engines).
1) Multi-language support. People love Go and Rust, so we wanted some way to open it up to those communities.
2) I like Lua a lot. Such a good tool. It just didn't have WASM's breadth of compilation targets or WASM's security guarantees.
AFAIK, there are just a couple of VMs with superb performance: V8 and another one targeting x86 only. We want to support the AWS Graviton instances, so there was only one choice.
It'd be great to know where the improvements come from.
Is it from a different architectural design, is it from better exploitation of new hardware, is it from JVM limitations? For example, when Valhalla and Loom land in the JDK, how much of this improvement will stay?
I can't find the benchmarks so it's not possible to know what they measured, how they measured it, and if it's a relevant measurement.
Skimming the repository it doesn't seem to be an unreasonable claim. C++ with Seastar compared to Scala on JVM. It's in line with the Scylla/Cassandra improvements.
According to this Data Engineering interview (beginning around 8m in), the improvements begin with moving off the JVM, which has a latency impact. But the real improvement is that fsync on a file handle puts a barrier in the FS. Redpanda does some work around batch coalescing, skips the page cache to sidestep a bunch of kernel locks, and uses adaptive allocation (preallocating FS space):
Not sure how you got there though; maybe I mis-explained something. This is talking about the filesystem API, not the Kafka API. The kernel does something similar; these are standard filesystem things, just purpose-built for our use case.
AFAICS a write-behind strategy means you report a write as complete before it has actually been written. This can result in terrific performance, because your batching window is larger and you can amortize more writes over fewer IOPS. But it also means that in the face of failure, clients could believe something was written which wasn't.
On the other hand, the storage device can fail so what does it even mean to have written the data? :)
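The trade-off described above can be made concrete with a toy write-behind log that batches records and amortizes one fsync over each batch. Only flushed data is actually durable; acknowledging before `flush()` is exactly the risk being discussed. This is a sketch of the technique, not Redpanda's implementation:

```python
import os
import tempfile

class WriteBehindLog:
    """Toy write-behind log: buffer records, fsync once per full batch."""
    def __init__(self, path, batch_size=4):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        self.batch, self.batch_size, self.fsyncs = [], batch_size, 0
    def append(self, record: bytes):
        self.batch.append(record)
        if len(self.batch) >= self.batch_size:
            self.flush()
    def flush(self):
        if self.batch:
            os.write(self.fd, b"".join(self.batch))
            os.fsync(self.fd)   # one fsync amortized over the whole batch
            self.fsyncs += 1
            self.batch = []

path = os.path.join(tempfile.mkdtemp(), "log")
log = WriteBehindLog(path, batch_size=4)
for i in range(10):
    log.append(b"rec%d\n" % i)
log.flush()
assert log.fsyncs == 3   # 10 records in batches of 4 -> 4 + 4 + 2
```

Ten records cost three fsyncs instead of ten; the wider the batching window, the better the amortization, and the more unflushed data is at risk if the node dies mid-window.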
Our acks are similar to Kafka in that regard: acks=0,1 is leader acknowledgement. With acks=-1 we go further, as Denis mentioned above, by flushing to disk before responding, which is stronger (the log completeness guarantee from Raft) than what is available upstream.
This is fantastic news. What would you say were the hardest aspects of the design and implementation, in terms of effort and thinking that went into those accomplishments ?
I would say building a scalable Raft implementation has been by far the hardest thing, mostly because we started from scratch and bypassed the page cache with our read-ahead and write-behind strategies. Building tiers of caches and having a working Raft system on top of them was hard.
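Since the page cache is bypassed, the application has to provide its own read-ahead. A minimal sketch, assuming a simple prefetch-N-chunks policy (an illustration of the idea, not Redpanda's actual caching tiers):

```python
class ReadAhead:
    """Toy read-ahead cache: on a miss, fetch the requested chunk plus the
    next `depth` chunks, so sequential log reads hit the application-level
    cache instead of going back to storage on every read."""
    def __init__(self, storage: bytes, chunk=4, depth=2):
        self.storage, self.chunk, self.depth = storage, chunk, depth
        self.cache, self.fetches = {}, 0
    def _fetch(self, idx):
        self.fetches += 1
        start = idx * self.chunk
        return self.storage[start:start + self.chunk]
    def read(self, idx):
        if idx not in self.cache:
            # miss: prefetch this chunk and the next `depth` chunks
            for i in range(idx, idx + 1 + self.depth):
                self.cache[i] = self._fetch(i)
        return self.cache[idx]

data = bytes(range(32))
r = ReadAhead(data, chunk=4, depth=2)
r.read(0); r.read(1); r.read(2)   # a sequential scan
assert r.fetches == 3             # one miss prefetched chunks 0, 1, 2
assert r.read(1) == data[4:8]
```

The hard part in a real system is sizing this against memory pressure and making sure a cold scan doesn't evict the hot working set, which is the cache-eviction concern raised elsewhere in the thread.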
I've been waiting for a long time for someone to think outside the JVM, and I really hope this is a growing trend. The "big data" industry has seemingly been joined at the hip with Java ever since Hadoop came onto the scene, and the Apache community in particular has a lot of apps that are deeply unfriendly to non-Java apps. For example, you can't use Apache BookKeeper from a non-Java app.
Would you say Redpanda is ready for production use?
Is Redpanda compatible with Faust (Python stream processing from Robinhood)? I really don't want to use Kafka, but when I must, Faust makes it straightforward. In fact, I wonder if opinionated client libraries/modules are in your interest to develop as well; they could lower the "time to implement" story for your offering.
Indeed! Faust uses the regular Python client :) We try to work with the full ecosystem; if something doesn't work, it's a bug. So give it a shot and let me know. Feel free to jump on Slack if you want real-time help too.
One question: I assume Redpanda uses Raft to replicate the topic content, not just metadata. Is that correct? If so, how does it perform better than Kafka's ISR? Raft might be slow for this kind of workload. If I remember correctly, Liftbridge was using Raft for log replication and switched away from it because of performance problems.
To quote my colleague Denis -
The results fit the theory. Raft’s (and Redpanda’s) performance is proportional to the best of the majority of nodes, while sync replication (Kafka) works only as well as its worst-performing node.
AFAIK we can push the limits of hardware on throughput.
The gist is that I'd need more details on exactly what you mean by "Raft is slow".
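Denis's point can be illustrated with a toy latency model: a Raft quorum commits at the pace of the majority, while waiting on every replica pays for the slowest one. (Real Kafka behavior depends on `min.insync.replicas`; this models the claim as quoted, not every Kafka configuration.)

```python
def raft_commit_latency(node_latencies_ms):
    # Raft commits once a majority has the entry, so commit latency
    # tracks the majority'th-fastest node.
    n = len(node_latencies_ms)
    majority = n // 2 + 1
    return sorted(node_latencies_ms)[majority - 1]

def sync_all_latency(node_latencies_ms):
    # Waiting on the full replica set pays for the worst node.
    return max(node_latencies_ms)

# One straggler replica (a GC pause or a bad disk) at 50 ms:
lat = [2, 3, 50]
assert raft_commit_latency(lat) == 3   # majority unaffected by the straggler
assert sync_all_latency(lat) == 50     # full-set ack waits for the worst node
```

This is why quorum replication tends to have flatter tail latencies: a single slow node moves the full-set ack but not the majority ack.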
We support the same levels of acks as Kafka, except that acks=-1 gives much stronger guarantees due to the log completeness guarantee of Raft.
For 0 and 1 we short-circuit the Raft acknowledgement and return to the client, to match the expectations of acknowledgements as users have come to know them.
There should be no perf penalty vis-a-vis Kafka in any setting I can think of. If there is, it's probably a bug on our side.
I second this request, I'd be very interested in any articles that talk about PostgreSQL integration with Kafka. Last I heard on this topic was debezium [1] but I quite liked the simplicity of Bottled Water [2].
If messages are batched, would there be any performance advantage over Kafka/Pulsar/etc. from a thread-per-core architecture? Wouldn't the context-switching cost be amortized?
Message batching and locking can be thought of as orthogonal. In practice, yes, there are still advantages to thread-per-core, but perhaps not for the reasons you think. It has to do with core-local metadata materialization for a subset of the requests. There is effectively essential complexity (i.e., to replicate data, one must replicate data), but TpC is strictly an optimization for flatter tail latencies.
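A toy sketch of the core-local metadata idea: if each partition is owned by exactly one core, handling its requests touches only that core's state and needs no cross-core locks. The placement policy and names here are hypothetical illustrations, not Redpanda's actual sharding scheme:

```python
NUM_CORES = 4
# One independent metadata map per core; no shared mutable state.
core_local_state = [dict() for _ in range(NUM_CORES)]

def owner_core(partition: int) -> int:
    # Hypothetical placement policy: a partition is pinned to one core.
    return partition % NUM_CORES

def handle_produce(partition: int, offset_delta: int) -> int:
    core = owner_core(partition)
    state = core_local_state[core]   # core-local access, no lock needed
    state[partition] = state.get(partition, 0) + offset_delta
    return state[partition]

assert handle_produce(5, 10) == 10
assert handle_produce(5, 5) == 15
assert owner_core(5) == owner_core(9)   # same core owns both partitions
```

In a real TpC runtime (e.g. Seastar), requests for a partition are routed to its owner core via message passing, so the hot path never takes a mutex, which is what keeps the tail latencies flat.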
Depends on the use case. We are working with a couple of security companies doing intrusion detection, and for them, writing to disk on an anomaly plus notification seems to matter. Financial services wanting to leverage open-source tooling like Spark or TensorFlow also still care about latency.
Wouldn’t it be valid to consider Kafka/RedPanda a bottleneck and another point of failure that may delay data getting to a destination?
In some cases, the performance, efficiency, and reliability gains from caching and consolidation make sense.
But, I’ve seen enough poor architectural decisions and lack of architectural oversight result in use of various log streaming, cloud messenging, app monitoring, object DBs, etc., all discounting the request overhead in time and traffic, points of failure, and overall complexity for some false sense of scalability enough to where things that seemed cool ten years ago make me physically sick now.
What are some questions to use to help determine whether Kafka/RedPanda actually make sense to use, without having to first baseline, then implement, then compare request time, reliability, and data freshness to gauge whether it was worth it?
BTW- I think there are valid cases for using it and appreciate all of the work!
If you use transactions, we don't support it yet. We are in active development here.
Note that folks love the speed, but it's the operational simplicity of having one binary/fault domain that makes a lot of our enterprise users adopt the tech.
Last is whether you like the product direction, which is to my knowledge fundamentally different from the other engines out there. WASM in particular solves around 60% of all the streaming work we see in the wild. It is effectively good at one-shot transformations (GDPR, simple enrichment, simple connections to downstream systems like Elastic, etc.) as well as tiered storage.
I think the idea was to build something as easy as nginx: apt-get install redpanda, et voila.
I hope to continue to focus on the developer experience.