Hacker News
Python Kafka Client Benchmarking (activisiongamescience.github.io)
62 points by boredandroid on June 16, 2016 | hide | past | favorite | 18 comments



kafka-python maintainer here. Our library is designed to be correct first, easy to use second, and fast third. It should not be surprising to anyone that using C extensions improves python performance. I have avoided requiring C compilation in kafka-python primarily because I've found that very few python users care about processing >10K messages per second per core (remember that in python w/o C extensions you are generally bound to a single CPU, so spinning up multiple processes usually improves performance; see multiprocessing). I've also found the python infrastructure for distributing C extensions to be not easy (see goal #2 above). But that is changing! I would definitely consider leveraging C extensions for wire protocol decoding given the recent improvements to wheel distribution on linux.

I'm not sure whether I would go so far as to delegate the entire client to a C extension. Part of the fun of python is that you can play with all of the guts at runtime. I've found users are very willing to hack up kafka-python internals to help debug issues. I don't think I could expect the same community involvement if it was all distributed as a compiled C extension. But I could be wrong.
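The multiprocessing suggestion can be sketched roughly like this. `decode_batch` is a stand-in for the CPU-bound message-decoding work a real kafka-python consumer would do (no live broker is assumed here); the point is that separate processes get separate CPUs, unlike threads under the GIL:

```python
from multiprocessing import Pool

def decode_batch(batch):
    # Stand-in for CPU-bound wire-protocol decoding; a real worker
    # would create its own KafkaConsumer inside the child process.
    return [len(msg) for msg in batch]

if __name__ == "__main__":
    # Eight batches of 1000 raw messages each.
    batches = [[b"msg-%d" % i] * 1000 for i in range(8)]
    # A pool of worker processes sidesteps the single-core GIL limit.
    with Pool(processes=4) as pool:
        decoded = pool.map(decode_batch, batches)
    print(sum(len(b) for b in decoded))  # prints 8000
```

In practice each worker would subscribe as a member of the same consumer group so the brokers balance partitions across the processes.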

Anyways, always fun to read benchmarks. I hope kafka-python makes someone out there smile. That's the best benchmark in my book.


Distributing Python + C extensions is easy with Conda.

https://conda-forge.github.io/


My team at Parse.ly also did a benchmark comparing pykafka (pure Python) to pykafka with the librdkafka C extension enabled. That C module is clearly a huge win for Kafka consumer/producer performance on Python and other dynamic languages.

http://blog.parsely.com/post/3886/pykafka-now/

Unfortunately, as the OP illustrates, there are now 2 widely-used Python + Kafka drivers (pykafka and kafka-python), and as of recently, a third, confluent-kafka-python, which is a thin wrapper over librdkafka.

The reason there's all this fragmentation is because Kafka was quite the moving target for non-JVM languages for the past three years. We have used it in production since Kafka 0.7, so we've had to live through it all blow-by-blow. I'm hoping that with Kafka 0.10 recently released, we can finally unify the community around a single driver (somehow).


I enjoyed your blog post, but I don't think this is a fair characterization: kafka-python is not "mostly 0.9+ focused..." kafka-python is the only driver that is both forwards and backwards compatible w/ kafka 0.8 through 0.10. As I'm sure you remember, kafka-python was the original 0.8 driver, written to support the 0.8 protocol b/c Samsa (pykafka's previous incarnation) only supported 0.7 and did not have any plans to upgrade.


There is obvious value in having a pure python implementation of a kafka client. Many deployments don't want C extensions or want to use pypy. However, as python's scipy stack has shown, the right python api wrapping C code can have a vibrant community and the speed to boot.


@dkfp Apologies for that, I did not mean to mis-characterize. PyKafka also goes from 0.8 => 0.10. I had assumed kafka-python recently switched to be 0.9-only due to all the changes related to consumer groups.


No apology required. Though note that pykafka requires >=0.8.2, and is only forwards compatible w/ newer brokers. This means that pykafka implements the 0.8.2 feature set. Newer brokers support that feature set, but you are not taking advantage of 0.9 or 0.10 features if you connect to them. kafka-python, on the other hand, is both forwards and backwards compatible. It supports all feature sets: from no offsets in 0.8, to zk offsets in 0.8.1, to kafka offsets in 0.8.2, to group management in 0.9, to message timestamps and relative-offset compressed messages in 0.10. The feature set to use is chosen based on the broker version we're connected to. As far as I know, no other client supports this approach -- not python, not java, etc. [Though KIP-35 should open this up to other clients for backwards compatibility starting at 0.10]
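The version-based selection described above amounts to picking the newest feature set the connected broker supports. A minimal sketch, using illustrative names only (this is not kafka-python's actual internals):

```python
# Feature sets in ascending broker-version order (illustrative labels,
# following the progression described in the comment above).
FEATURES_BY_VERSION = [
    ((0, 8, 0), "no-offsets"),
    ((0, 8, 1), "zookeeper-offsets"),
    ((0, 8, 2), "kafka-offsets"),
    ((0, 9, 0), "group-management"),
    ((0, 10, 0), "message-timestamps"),
]

def feature_set_for(broker_version):
    """Pick the newest feature set the connected broker supports."""
    chosen = None
    for min_version, feature in FEATURES_BY_VERSION:
        if broker_version >= min_version:
            chosen = feature
    return chosen

print(feature_set_for((0, 9, 0)))  # group-management
print(feature_set_for((0, 8, 1)))  # zookeeper-offsets
```

The client probes the broker version once at connect time and then routes every request through the matching protocol version.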


Pykafka does currently have support for 0.9 group management, and we intend to add support for message timestamps and the other new 0.10 features. We're not, however, detecting the broker version and turning on features on that basis. Instead we prefer to let the user explicitly enable the features they're interested in using.


I ran a couple of Kafka client benchmarks using Python, Jython and Java and got pretty interesting results. Check them here: http://mrafayaleem.com/2016/03/31/apache-kafka-producer-benc...


Would have been interesting to add the C wrappers in there, but still cool. Thanks!


Ah this reminds me of one of the very most tricky bugs I ever tracked down: https://github.com/dsully/pykafka/pull/15


You have my condolences.


After a quick glance, the first thing that strikes me is using docker for measuring the performance of a network-bound application. Docker handles networking differently across versions, and by default it may have quite a significant impact on your results; a good example comes from the percona guys: https://www.percona.com/blog/2016/02/05/measuring-docker-cpu... I wonder what the results would be without using docker, or using docker with --net=host
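For reference, host networking is enabled per container at run time; the image and script names below are placeholders, not the benchmark's actual setup:

```shell
# --net=host skips docker's bridge/NAT layer entirely, so the container
# shares the host's network stack. Image/script names are placeholders.
docker run --net=host my-kafka-benchmark ./run_consumer_bench.sh
```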


After rerunning the tests with docker --net=host I see a small bump in the rate, ~1% across all the clients.

Msgs/s (--net=host / default bridge = ratio):

confluent_kafka_consumer : 277573.293164 / 261407.908007 = 1.061

pykafka_consumer : 33433.342585 / 33976.938217 = 0.984

pykafka_consumer_rdkafka : 164311.503412 / 172008.742201 = 0.955

python_kafka_consumer : 37667.971237 / 38622.727894 = 0.975

So yes, docker network magic adds overhead, but the bias is consistent across all clients.


I guess some performance testers just don't know what they are measuring; in this case, the overhead of docker on top of the performance of the Python code. To be fair, it is hard to understand whole-system performance. I would love to see a test without Docker though.


Original author here. The docker network point is a good one, I'll give it a try with host network.

There is still value in comparing different clients under the same network constraints. Yeah, it is a contrived setup (noted in the post), but at least it is the same contrived setup for each test.


Has anyone tried much with the aiokafka library for asyncio (https://github.com/aio-libs/aiokafka)?


>I ran these tests within Vagrant hosted on a MacBook Pro 2.2Ghz i7.

Good ole laptop benchmarks



