Spark is great for distributed computation, but it has about a million config switches and is generally kind of bulky. EMR makes cluster management a lot easier, but you still have to fiddle with the number of executors, executor memory, etc. Still, it has been through the wars and is generally pretty solid on some pretty large data sets. Once you get it configured, it's pretty good. The best part is just writing the Scala code to run the job. Admittedly, it would be great to use something a bit lighter for certain workloads.
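
For anyone wondering what the fiddling looks like, here is a rough sketch of setting those knobs from the Scala side. The config keys are standard Spark properties, but the values (10 executors, 4g, 2 cores), the app name, and the toy job are made-up placeholders, not recommendations. It assumes Spark is on the classpath and the job is launched through `spark-submit` on a cluster such as EMR.

```scala
import org.apache.spark.sql.SparkSession

object ExampleJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical sizing: these numbers are illustrative assumptions and
    // need tuning for the actual cluster and data volume.
    val spark = SparkSession.builder()
      .appName("example-job")                    // placeholder name
      .config("spark.executor.instances", "10")  // number of executors
      .config("spark.executor.memory", "4g")     // heap per executor
      .config("spark.executor.cores", "2")       // cores per executor
      .getOrCreate()

    // Toy workload: sum the ids 0 until 1,000,000 across the cluster.
    val total = spark.range(0L, 1000000L)
      .selectExpr("sum(id) AS total")
      .first()
      .getLong(0)

    println(s"total = $total")
    spark.stop()
  }
}
```

The same settings can usually be passed on the `spark-submit` command line instead, which keeps the sizing out of the code and makes it easier to change per run.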