I don't have experience with Samza or Apex, but as for the first three:
1. Flink - Focused on stateful stream processing.
2. Spark - Focused on batch processing. Can be used for continuous streams, but approaches them as "micro-batches".
3. Kafka - A message queue system (for all practical purposes). Has an optional stream processing add-on for basic needs.
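To make the "micro-batch" versus record-at-a-time distinction concrete, here is a toy sketch in plain Python. It does not use Spark or Flink at all; the function names and the count-based `batch_size` are assumptions made for illustration (real Spark micro-batches are triggered by a time interval, not a fixed record count).

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def record_at_a_time(events: Iterable[str]) -> Iterator[Counter]:
    """Update the running state and emit a result after every single event."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield counts.copy()


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Collect events into small batches; update and emit once per batch."""
    counts: Counter = Counter()
    it = iter(events)
    while True:
        batch: List[str] = list(islice(it, batch_size))
        if not batch:
            return
        counts.update(batch)
        yield counts.copy()


events = ["click", "view", "click", "view", "click"]
per_record = list(record_at_a_time(events))           # one result per event
per_batch = list(micro_batches(events, batch_size=2))  # one result per batch
```

Both approaches end with the same final counts; the difference is how often results are emitted, which is roughly the latency trade-off between the two models.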
Separate use cases and strengths aside, it's worth calling out that all of these products are primarily backed by completely different companies. Apache is a consortium made of many companies, and serves as common branding for "community editions" of their "enterprise edition" products. There can be quite a lot of overlap between sponsored products in this consortium.
Spark supports both microbatch and continuous stream processing.
Apache Software Foundation is not a consortium made of many companies but a single non-profit that provides organizational support for open source projects, some of which have contributors employed as such by other companies and some of which have only volunteer contributors.
The only two of those I know are Kafka and Flink. For those two: Flink is much more full-featured and performant (basically the full Google Dataflow API, and several orders of magnitude faster than Kafka Streams), but Kafka Streams has a stupid-simple API that is useful if you need streaming because $reason but don't care about scaling up to infinity. If you're doing some really hacky demoware, Kafka Streams will probably be faster to spin up because you just need the Kafka Streams jar and a Kafka cluster.
Do you have any numbers to back up Flink is faster than KStreams, also under what scenario?
I am genuinely interested, as I use KStreams a lot, but the engineering discipline in the API leaves a lot to be desired, and I'm more than happy to switch if Flink is that much better.
Here's a benchmark of KStreams and Flink [1]. Note that the Flink vs Spark comparison is disputed [2], but both Flink and Spark are several orders of magnitude faster than KStreams. This is inevitable given KStreams' architecture -- it keeps all its state in Kafka rather than in a data store with data structures optimized for the use case, and it doesn't do much coordination among workers. KStreams is there if you want streaming semantics on top of a small-ish Kafka topic you own, but don't care too much about perf. Deploying and maintaining Flink is a much bigger hassle than KStreams -- you need DevOps support to get Flink running, whereas KStreams runs (albeit quite slowly) inside your application with no new state store needed.
Confluent has a good discussion of the ownership issue (DevOps for Flink, devs for KStreams) here [3], though they seriously downplay the huge gap in perf.
There's also Apache Beam, which is an API for streaming, and has Flink and Apex execution engines. Google's Cloud Dataflow is another implementation of Apache Beam.
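The idea of one pipeline API with interchangeable execution engines can be sketched in a few lines of plain Python. This is only a toy analogy, not the Apache Beam API: `build_pipeline`, `run_eager`, and `run_lazy` are hypothetical names standing in for "the pipeline definition" and "two different runners".

```python
from typing import Callable, Iterable, Iterator, List

# A transform takes a stream of ints and returns a stream of ints.
Transform = Callable[[Iterable[int]], Iterable[int]]


def build_pipeline() -> List[Transform]:
    """Describe the computation once, independent of how it is executed."""
    return [
        lambda xs: (x for x in xs if x % 2 == 0),  # keep even numbers
        lambda xs: (x * x for x in xs),            # square them
    ]


def run_eager(pipeline: List[Transform], source: Iterable[int]) -> List[int]:
    """Batch-style runner: materialize the full result of each step."""
    data = list(source)
    for step in pipeline:
        data = list(step(data))
    return data


def run_lazy(pipeline: List[Transform], source: Iterable[int]) -> Iterator[int]:
    """Streaming-style runner: chain the steps and pull one element at a time."""
    stream: Iterable[int] = iter(source)
    for step in pipeline:
        stream = step(stream)
    return iter(stream)


eager_result = run_eager(build_pipeline(), range(6))
lazy_result = list(run_lazy(build_pipeline(), range(6)))
```

The same pipeline definition produces the same output under either runner; only the execution strategy changes, which is the property the Beam model is aiming for.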
As to which one to choose, you need to evaluate them; there are no simple answers. If you have Hadoop already then Apex may be a better fit than Flink; OTOH if you do Akka stuff already, then Flink might integrate better with your stack. If you have more batch than streaming use cases, maybe you want Spark. Etc.
Include Storm in the mix too. Storm 2.0 was released recently. We have been using Storm for a long time and we really like its
a) Simple programming model
b) Support for a wide variety of sources (e.g. Kinesis, EventHub)
c) Easy troubleshooting
We did evaluate Spark Streaming (we use Spark for batch workloads and it works well), but fell back to Storm because of the above.
Flink vs Spark vs Storm vs Kafka vs Samza vs Apex
How do they compare? How would you choose which one to use?