Mining of Massive Datasets (mmds.org)
109 points by markhkim on Nov 15, 2016 | hide | past | favorite | 7 comments



This Coursera course was taken down, but it is now up and running at lagunita.stanford.edu [0], which uses edX's open-source platform [1]. The same happened to other Stanford courses previously on Coursera; you can find them here [2], including Compilers, Automata Theory, and Convex Optimization.

[0] https://lagunita.stanford.edu/courses/course-v1:ComputerScie...

[1] https://open.edx.org/ https://github.com/edx/edx-platform

[2] https://lagunita.stanford.edu/courses


What amazed me is how much of this is 1990s stuff.


How do you mean? Do you know of any up-to-date books on large scale data mining?


Massive is an understatement. I have only dealt with puny GB-sized data sets. They deal with vectors that cannot fit into main memory.


Yes, in general what they refer to are things like IRS tax records (250 TB) and Yahoo ad data (900 TB). You just can't use a single machine to work with such data.
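The point about vectors not fitting in main memory comes down to streaming: you process the data in chunks and keep only a constant-size running state. A minimal sketch (not from the book; the file path, one-number-per-line format, and chunk size are all hypothetical):

```python
# Illustrative sketch: computing the mean of a dataset too large for RAM
# by streaming it in chunks, holding only a running sum and count.

def streaming_mean(path, chunk_bytes=1 << 20):
    """Mean of one float per line, without loading the whole file."""
    total = 0.0
    count = 0
    with open(path) as f:
        while True:
            lines = f.readlines(chunk_bytes)  # read roughly chunk_bytes at a time
            if not lines:
                break
            for line in lines:
                total += float(line)
                count += 1
    return total / count if count else 0.0
```

The same pattern (constant memory, one pass) underlies the streaming algorithms the book covers; only the per-chunk update changes.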


Deadlines are on the 29th of November. Best of luck.


tl;dr: parallel map reduce. ;)
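For anyone unfamiliar, the map/shuffle/reduce pattern being joked about can be sketched in a few lines. This is a toy single-machine version (pure Python, my own function names, not from the book); real systems like Hadoop run the same three phases distributed across a cluster:

```python
# Toy word count in the map/shuffle/reduce style.
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(w, 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a single result.
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["big data big", "data mining"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "mining": 1}
```

The parallelism comes for free because map calls are independent per document and reduce calls are independent per key.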



