
Spark has a much nicer API than Hadoop and can, in theory, give you significant speedups on iterative in-memory workloads. On the other hand, I've had a terrible experience with the stability and debuggability of previous versions of Spark. They do seem to be rapidly improving, so it's hard to recommend anything other than trying it out and seeing if Spark works for your needs.
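
To give a concrete sense of the API difference: the word count that takes a page of mapper/reducer boilerplate in Hadoop MapReduce is a handful of lines in Spark. Rough sketch in Scala, local mode, with a made-up input path:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // local[*] is just for illustration; on a cluster you'd point at your master
        val sc = new SparkContext(
          new SparkConf().setAppName("WordCount").setMaster("local[*]"))

        val counts = sc.textFile("input.txt")   // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // cache() keeps the RDD in memory, which is where the speedup on
        // iterative workloads comes from: repeated actions reuse it instead
        // of re-reading and re-shuffling from disk each time.
        counts.cache()
        counts.take(10).foreach(println)

        sc.stop()
      }
    }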

(Also, you probably know this already, but many people don't really have data big enough to necessitate a distributed framework. If your datasets are measured in gigabytes, you can do everything more simply on one machine and/or with a traditional database.)
