
Spark is great for distributed computation, but it also has about a million config switches and is generally kind of bulky. EMR makes management a lot easier, but you still have to fiddle with the number of executors, memory, etc. That said, it has been through the wars and is generally pretty solid on some pretty large data sets. Once you get it configured, it's pretty good. The best part is just writing the Scala code to run the job. Admittedly, it would be great to use something a bit lighter for certain workloads.
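For anyone who hasn't been through it, the fiddling looks roughly like the sketch below. The three `spark.executor.*` keys are standard Spark settings, but the values, the app name, and the toy job are made up for illustration and are not a recommendation. It also assumes the job is launched via `spark-submit` (for example as an EMR step), so the master is supplied by the cluster rather than set in code.

```scala
import org.apache.spark.sql.SparkSession

object ExecutorTuningSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative values only; the right numbers depend on node size and workload.
    val spark = SparkSession.builder()
      .appName("executor-tuning-sketch")        // hypothetical app name
      .config("spark.executor.instances", "10") // number of executors
      .config("spark.executor.cores", "4")      // cores per executor
      .config("spark.executor.memory", "8g")    // heap per executor
      .getOrCreate()

    // Toy job standing in for the real work: sum the ids 0 until 1,000,000.
    val total = spark.range(0L, 1000000L)
      .selectExpr("sum(id) AS total")
      .first()
      .getLong(0)
    println(s"total = $total")

    spark.stop()
  }
}
```

The same settings can be passed on the command line instead (`--num-executors`, `--executor-cores`, `--executor-memory` on `spark-submit`), which keeps the tuning out of the job code.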


