MapReduce is actually an incredibly flexible paradigm. If you have a clean imple...

MapReduce is actually an incredibly flexible paradigm. If you have a clean implementation of MapReduce with well defined interfaces you can make behave in a number of different ways by putting it on top of different storage engines. We've built a MapReduce engine as well as a the commit based storage that lets us slide gracefully from streaming to batched.

Our storage layer is built on top of btrfs, we haven't put together comprehensive benchmarks yet but our experience with btrfs is that there isn't a meaningful penalty to reading a batch of data from a commit based system compared to more traditional storage layers. I really wish I had concrete performance numbers to give you but we haven't had the time to put them together yet. I will say that our observations match with btrfs' reported performance numbers and are what I'd expect given the algorithmic complexity of their system.