Running Apache Kafka at Scale (linkedin.com)
49 points by boredandroid on March 20, 2015 | 9 comments



This is very timely. We are just starting to work on a new centralized logging system using logstash->kafka->elasticsearch->kibana, and we may use kibana for some other tasks as well. Does anyone have experience with that setup? Pros/cons?

We used Graylog for a while but ran into some issues with back pressure on Elasticsearch (hence Kafka).
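
For readers unfamiliar with the back-pressure problem, here is a minimal sketch of why a buffer between the shipper and the indexer helps. It is an illustration only: an in-process `queue.Queue` stands in for Kafka, and `run_pipeline` is a hypothetical name, not part of any of the tools mentioned. A real Kafka deployment retains messages on disk and lets consumers read at their own pace, which this toy does not model.

```python
import queue
import threading


def run_pipeline(n_events, buffer_size):
    """Push n_events through a bounded buffer to a single consumer thread."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded: stands in for the broker
    indexed = []                            # stands in for the search index

    def consumer():
        while True:
            item = buf.get()
            if item is None:                # sentinel: producer is done
                break
            indexed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    for i in range(n_events):
        # put() blocks while the buffer is full, so the producer slows down
        # instead of overwhelming the consumer.
        buf.put(i)
    buf.put(None)
    worker.join()
    return indexed
```

Calling `run_pipeline(100, 8)` delivers all 100 events in order even though at most 8 are in flight at a time.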


Look into the new Graylog (v1.0). It was recently released, and they use Kafka internally now for buffering. We're sending thousands of msgs per second to Graylog without issue.


According to their documentation, the data storage for log events is Elasticsearch, and if the Elasticsearch data is lost then the logs are gone. Considering that ES is not a database and may lose data, this sounds a bit scary to me.


My understanding is that append-only workloads are fine. So if your use case is logging, Elasticsearch is totally fine. You probably can't compare it to the robustness of MySQL & PostgreSQL, but most NoSQL stores are not known to be that robust either.


You can solve this in one of two ways:

1. Whatever mechanism you're using to send data to Graylog, send that data to S3 as well. You can then reload Graylog at any time from the S3 data.

2. Back up your Elasticsearch nodes to S3.

I should've mentioned I run this in AWS. Sorry about that!
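
Option 1 can be sketched roughly as follows. This is an assumption-laden illustration, not Graylog's or AWS's API: `ListSink`, `ship`, and `replay` are hypothetical names, and the in-memory sinks stand in for a real Graylog input and a real S3 bucket.

```python
import json


class ListSink:
    """Hypothetical stand-in for a destination (Graylog, S3); keeps lines in memory."""

    def __init__(self):
        self.events = []

    def write(self, line):
        self.events.append(line)


def ship(event, primary, archive):
    """Serialize the event once and write it to both destinations."""
    line = json.dumps(event, sort_keys=True)
    archive.write(line)   # archive first, so the durable copy never lags the index
    primary.write(line)
    return line


def replay(archive, primary):
    """Reload a (possibly empty) primary from everything in the archive."""
    count = 0
    for line in archive.events:
        primary.write(line)
        count += 1
    return count
```

If the primary store is lost, a fresh `ListSink` can be refilled with `replay(archive, new_primary)`. Writing to the archive first is a design choice in this sketch: a crash between the two writes leaves an event archived but not indexed, which a replay can fix, rather than the reverse.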


read up here... https://www.elastic.co/blog/resiliency-elasticsearch/

They had a major data corruption bug last year but have taken measures to correct it. I asked at elasticon whether Elasticsearch could be a source of truth; they didn't say yes, but they indicated that you "could do it" and that it is a goal.


Having started off planning the exact same stack (but for business intelligence event tracking), we ended up dropping Kafka for Kinesis. It felt like until we reached the scale where the balance tips in favour of self-managed infrastructure (if ever), there was no value and only probable pain in managing that part ourselves.


We already manage our own infrastructure, so yeah, we are in exactly opposite positions. Thanks for the insight though!


I've heard comparisons with other technologies before, like RabbitMQ [1] [2], but I would love to see a feature/performance comparison between Kafka and other similar solutions, especially cloud-based services like the new kid on the block, Google Pub/Sub [3].

[1] http://www.quora.com/RabbitMQ-vs-Kafka-which-one-for-durable...

[2] https://youtu.be/MA_3fPBFBtg?t=35m27s

[3] https://cloud.google.com/pubsub



