I can't speak for their implementation, but batching is not necessary. Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems.
The internal architectures make an enormous difference in throughput. A proper high-performance stream processing engine does not look anything like the "Hadoop in RAM" style model.
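To make the difference concrete, here's a minimal Go sketch of the kind of write path I mean. It is not any particular product's implementation, and every name in it is made up for illustration: documents are appended into a big in-memory buffer and hit the disk as a few large sequential writes, so a single disk I/O is amortized over thousands of documents.

    package main

    import (
        "bufio"
        "log"
        "os"
    )

    // flushThreshold is an arbitrary choice; real engines tune this.
    const flushThreshold = 4 << 20 // ~4 MB per sequential write

    func main() {
        f, err := os.OpenFile("segment.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // bufio.Writer coalesces many small appends into large writes.
        w := bufio.NewWriterSize(f, flushThreshold)

        docs := make(chan []byte, 1024) // already-parsed JSON documents
        go produce(docs)                // stand-in for the socket-read/parse stages

        for doc := range docs {
            // Appending here costs a memcpy, not a disk I/O.
            if _, err := w.Write(doc); err != nil {
                log.Fatal(err)
            }
            if err := w.WriteByte('\n'); err != nil {
                log.Fatal(err)
            }
        }
        if err := w.Flush(); err != nil { // flush the final partial buffer
            log.Fatal(err)
        }
    }

    // produce is a placeholder document source so the sketch is self-contained.
    func produce(out chan<- []byte) {
        for i := 0; i < 100000; i++ {
            out <- []byte(`{"id":1,"payload":"..."}`)
        }
        close(out)
    }

Partitioning, fan-out, back-pressure, etc. are all elided; the point is just that the document rate and the disk-write rate are decoupled.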
> Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems
So is it per server, or only in aggregate when scaled out? I thought SSDs cap out at around 100k discrete writes per second (write IOPS; P/E cycles are a separate endurance limit).
Can you give an example? I haven't been able to get much past 10k/sec/server in practice, across a number of technologies and combinations, for a pipeline that collects from a socket, parses the JSON, and writes to a socket. That's just my specific use case, though.
Looking at the top end of Intel's SSD lineup, I see a product that advertises up to 175k IOPS of random 4K writes. Is this what you are referring to?
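For what it's worth, the two numbers aren't actually in tension if writes are coalesced. A back-of-envelope sketch, assuming ~1 KB per document and a 4 MB append buffer (neither number comes from this thread):

    package main

    import "fmt"

    func main() {
        const (
            docsPerSec = 500_000 // the claimed ingest rate
            docBytes   = 1024    // assumed average document size
            flushBytes = 4 << 20 // assumed buffered-append size per disk write
            iops4K     = 175_000 // the quoted random-4K write spec
        )
        bytesPerSec := docsPerSec * docBytes
        fmt.Printf("sequential load: ~%d MB/s, ~%d large writes/sec\n",
            bytesPerSec>>20, bytesPerSec/flushBytes)
        fmt.Printf("quoted 4K spec:  %d IOPS = ~%d MB/s\n",
            iops4K, iops4K*4096>>20)
    }

Under those assumptions, 500k docs/sec translates to roughly ~488 MB/s and a hundred-odd large appends per second, comfortably inside that drive's envelope; the random-4K IOPS number only binds if you issue one discrete write per document.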