This is very timely. We are just starting work on a new centralized logging system using Logstash -> Kafka -> Elasticsearch -> Kibana, and we may use Kibana for some other tasks as well. Does anyone have experience with that setup? Pros/cons?
We used Graylog for a while but ran into some issues with back pressure on Elasticsearch (hence Kafka).
Look into the new Graylog (v1.0). It was recently released, and they use Kafka internally now for buffering. We're sending thousands of msgs per second to Graylog without issue.
According to their documentation, the data storage for log events is Elasticsearch, and if the Elasticsearch data is lost then the logs are gone. Considering that ES is not a database and may lose data, this sounds a bit scary to me.
My understanding is that append-only workloads are fine. So if your use case is logging, Elasticsearch is totally fine. You probably can't compare it to the robustness of MySQL & PostgreSQL, but most NoSQL stores are not known to be that robust either.
They had a major data corruption bug last year but have taken measures to correct it. I asked at Elasticon whether Elasticsearch could be a source of truth. They didn't say yes, but they indicated that you "could do it" and that it is a goal.
Having started off planning the exact same stack (but for business intelligence event tracking), we ended up dropping Kafka for Kinesis. It felt like until we reached the scale where the balance tips in favour of self-managed infrastructure (if ever), there was no value and only probable pain in managing that part ourselves.
I've heard comparisons with other technologies before, like RabbitMQ [1] [2], but I would love to see a feature/performance comparison between Kafka and other similar solutions, especially cloud-based services like the new kid on the block, Google Pub/Sub [3].