
Would Apache Samza fold in here better, perhaps?



I don't know much about Samza, but I don't think a stream processor is what we (I work with the author) are looking for. We don't really have a lot of "stream processing" to do, and aggregate functions are usually computed on-demand. Also, we'd still have to put the results from the stream processor somewhere, right? Writing them back into Kafka is something people do, but we'd still need indexing capabilities. As the post mentions, we use MySQL for time series storage, but we also use Kafka in front as a durable log.
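To make that layout concrete, here is a rough sketch of "durable log in front, indexed store behind, aggregates on demand". It is only an illustration under loud assumptions: a plain Python list stands in for the Kafka topic, SQLite stands in for MySQL, and the `points` table, `drain`, and `average` are hypothetical names, not the schema or code from the post.

```python
import sqlite3

# Assumption: an append-only list stands in for a Kafka topic.
# Each entry is (series, timestamp, value).
log = [
    ("cpu", 100, 0.5),
    ("cpu", 160, 0.7),
    ("mem", 100, 0.3),
    ("cpu", 220, 0.9),
]

# Assumption: SQLite stands in for MySQL. The composite primary key
# doubles as the index used for time-range lookups per series.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE points ("
    " series TEXT, ts INTEGER, value REAL,"
    " PRIMARY KEY (series, ts))"
)

def drain(log, db, offset):
    """Copy log entries from `offset` onward into the store; return the new offset."""
    # INSERT OR REPLACE keeps a replay of the same entries idempotent.
    db.executemany("INSERT OR REPLACE INTO points VALUES (?, ?, ?)", log[offset:])
    db.commit()
    return len(log)

def average(db, series, start, end):
    """Compute an aggregate on demand instead of maintaining it in a stream job."""
    row = db.execute(
        "SELECT AVG(value) FROM points WHERE series = ? AND ts BETWEEN ? AND ?",
        (series, start, end),
    ).fetchone()
    return row[0]

offset = drain(log, db, 0)
cpu_avg = average(db, "cpu", 100, 200)  # averages the points at ts=100 and ts=160
```

Because the log is retained, the store can be rebuilt by draining again from offset 0, which is the main thing the log buys you in this arrangement.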


You're welcome to email me if you like - it's in my profile. Kafka and Samza are intended (generally speaking) to go hand in hand. Samza is a re-imagined datastore that Kafka can shuttle data into. I've been investigating Samza quite heavily specifically for time series data storage. I'd be happy to share thoughts.


While I'm not using Samza, Spark Streaming also works pretty nicely in this case, although it is not as focused on keeping state (it can, though, using the checkpointing system and the `updateStateByKey` transformation) and thus might not be a perfect fit if you need to handle failures without reprocessing.
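For anyone unfamiliar with `updateStateByKey`, here is a plain-Python model of what it does per micro-batch, as I understand it. This is not Spark code: `update_state_by_key` and `running_count` are hypothetical helpers written for illustration. In real Spark Streaming you would call `updateStateByKey` on a keyed DStream and set a checkpoint directory on the streaming context, which is what lets the state survive a restart.

```python
from collections import defaultdict

def update_state_by_key(state, batch, update_func):
    """Apply one micro-batch to the running state, modelling updateStateByKey.

    `state` maps key -> previous state, `batch` is a list of (key, value)
    pairs, and `update_func(new_values, previous_state)` returns the new
    state for a key, or None to drop that key.
    """
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    # Keys with no new values in this batch are still visited, with an
    # empty list of new values.
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:
            new_state[key] = updated
    return new_state

def running_count(new_values, previous):
    """Update function: keep a running sum per key."""
    return sum(new_values) + (previous or 0)

state = {}
batches = [[("cpu", 1), ("mem", 1)], [("cpu", 1)], []]
for batch in batches:
    state = update_state_by_key(state, batch, running_count)
```

The point for the failure case: the state only exists as the result of folding over every batch so far, so without a recent checkpoint it has to be recomputed from the input.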



