1. Cloudera left M/R for Spark, Mahout left M/R for Spark. Spark community will ...

gtrubetskoy · on Oct 10, 2014

Hadoop != M/R, FWIW. M/R support is left in Yarn for backwards compatibility mostly.

If by M/R you mean Hadoop, Cloudera has done no such thing; their largest customer base is Hadoop.

As to "paradigm shift", we're so early in this that I don't think there even is a paradigm to shift.

metronius · on Oct 10, 2014

I mean M/R by M/R ;) http://vision.cloudera.com/mapreduce-spark/

gtrubetskoy · on Oct 10, 2014

Sure, "we're 100% behind Impala", "oops, sorry, now it's Spark" - give them a few months and they'll change their mind to something else again. :)

gtrubetskoy · on Oct 10, 2014

Spark requires Hadoop to run, so this whole Spark vs Hadoop debate makes no sense whatsoever.

There is a place for arguing how effective Map/Reduce is, but it's been known for years that M/R is neither the only nor the best general-purpose algorithm for solving all problems. More and more tools these days do not use M/R, Spark included, and Spark certainly is not the first tool to provide an alternative to M/R. AFAIK Google abandoned M/R years ago.

I just don't understand this constant boasting about Spark, it seems very suspicious to me.

nchammas · on Oct 11, 2014

> Spark _requires_ Hadoop to run

This is not correct. Spark uses the Hadoop Input/Output API, but you don't need any Hadoop component installed to run Spark, not even HDFS.

You can -- and many companies do -- run Spark on Mesos or on Spark's standalone cluster manager, and use S3 as their storage layer.

> this whole Spark vs Hadoop debate makes no sense whatsoever

If we talk about Hadoop as an ecosystem of tools, then yes, it doesn't make sense to frame Spark as a competitor. Spark is part of that ecosystem.

But if we talk about Hadoop as Hadoop 1 MapReduce or as Hadoop 2 Tez, both of which are execution engines, then it very much makes sense to pit Spark against them as an alternative execution engine.

Granted, Hadoop 1 MapReduce is pretty old compared to Spark, and Tez is still under heavy development, but these are alternatives and not complements to Spark.

(Note: In Hadoop 2, MapReduce is just a framework that uses Tez as its underlying execution engine.)

> I just don't understand this constant boasting about Spark, it seems very suspicious to me.

Suspicious how?

I think Spark's elegant API, unified data processing model, and performance -- all of which are documented very well in demos and benchmarks online -- merit the excitement that you see in the "Big Data" community.

metronius · on Oct 10, 2014

Yes, I think the debate makes no sense either - for me, Spark is not a Hadoop competitor, it's rather a complement.

Spark does not need Hadoop - you can also run it with Mesos or in local mode.
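To make the "no Hadoop needed" point concrete, here is a rough sketch of how the master URL alone selects local mode, Spark's standalone manager, or Mesos. The helper names `master_url` and `word_count` are made up for illustration, the ports are Spark's and Mesos's usual defaults, and the PySpark part assumes `pyspark` is installed; treat it as a sketch rather than a recommended setup.

```python
def master_url(manager, host=None, port=None, cores="*"):
    """Build a Spark master URL for a non-YARN deployment (hypothetical helper)."""
    if manager == "local":
        # Local mode: everything runs in one JVM, no cluster manager at all.
        return f"local[{cores}]"
    # Scheme and customary default port for each cluster manager.
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if manager not in defaults:
        raise ValueError(f"unsupported manager: {manager!r}")
    if host is None:
        raise ValueError("a host is required for standalone and Mesos masters")
    scheme, default_port = defaults[manager]
    return f"{scheme}://{host}:{port or default_port}"


def word_count(master):
    """Run a tiny job against the given master (assumes PySpark is installed)."""
    # Imported lazily so master_url() works even without PySpark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName("no-hadoop-sketch")
    sc = SparkContext(conf=conf)
    try:
        pairs = sc.parallelize(["spark", "mesos", "spark"]).map(lambda w: (w, 1))
        return dict(pairs.reduceByKey(lambda a, b: a + b).collect())
    finally:
        sc.stop()
```

Something like `word_count(master_url("local"))` runs the job on a single machine; swapping in `master_url("mesos", "mesos-master.example")` (a placeholder hostname) would target a Mesos cluster without changing the job code.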