
I don't have experience with Samza or Apex, but as for the first three:

1. Flink - Focused on stateful stream processing.

2. Spark - Focused on batch processing. Can be used for continuous streams, but approaches them as "micro-batches".

3. Kafka - A message queue system (for all practical purposes). Has an optional stream processing add-on for basic needs.
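
To make the record-at-a-time vs. "micro-batch" distinction concrete, here is a rough sketch in plain Python. It is not Flink or Spark code: the function names `per_record` and `micro_batch` are made up for illustration, and real micro-batching is usually driven by a time interval rather than a fixed event count, so treat the count-based batching as a simplifying assumption.

```python
from typing import Iterable, Iterator, List


def per_record(events: Iterable[int]) -> Iterator[int]:
    """Record-at-a-time: update the running state and emit after every event."""
    total = 0
    for event in events:
        total += event
        yield total


def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch: buffer events, then process each small batch as one unit.

    Assumption: batches are cut by count here; a time-based trigger is more typical.
    """
    total = 0
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            total += sum(batch)
            yield total
            batch = []
    if batch:  # flush whatever is left at the end of the input
        total += sum(batch)
        yield total


events = [1, 2, 3, 4, 5]
per_record_out = list(per_record(events))        # [1, 3, 6, 10, 15]
micro_batch_out = list(micro_batch(events, 2))   # [3, 10, 15]
```

Both versions end at the same total; the difference is how often a result becomes visible, which is roughly the latency trade-off between the two models.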

Separate use cases and strengths aside, it's worth calling out that all of these products are primarily backed by completely different companies. Apache is a consortium made of many companies, and serves as common branding for "community editions" of their "enterprise edition" products. There can be quite a lot of overlap between sponsored products in this consortium.




Spark supports both microbatch and continuous stream processing.

The Apache Software Foundation is not a consortium of companies but a single non-profit that provides organizational support for open source projects. Some of those projects have contributors who are paid by other companies to work on them, and some have only volunteer contributors.


4. Apache Beam (same model as Google Cloud Dataflow)



