Mining of Massive Datasets (mmds.org)
109 points by markhkim on Nov 15, 2016 | hide | past | favorite | 7 comments



This Coursera course was taken down, but it is now up and running at lagunita.stanford.edu [0], which uses edX's open-source platform [1]. The same happened to other Stanford courses previously on Coursera; you can find them here [2], including Compilers, Automata Theory, and Convex Optimization.

[0] https://lagunita.stanford.edu/courses/course-v1:ComputerScie...

[1] https://open.edx.org/ https://github.com/edx/edx-platform

[2] https://lagunita.stanford.edu/courses


What amazed me is how much of this is 1990s stuff.


How do you mean? Do you know of any up-to-date books on large scale data mining?


Massive is an understatement. I have only dealt with puny GB-sized data sets. They deal with vectors that cannot fit into main memory.


Yes, in general what they refer to are things like IRS tax records (250 TB) and Yahoo ad data (900 TB). You just can't use a single machine to work with such data.
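The point about vectors not fitting in main memory comes down to streaming: you process the data in chunks and keep only a constant-size running state. A minimal sketch (not from the book; the file path, one-number-per-line format, and chunk size are all hypothetical):

```python
# Illustrative sketch: computing the mean of a dataset too large for RAM
# by streaming it in chunks, holding only a running sum and count.

def streaming_mean(path, chunk_bytes=1 << 20):
    """Mean of one float per line, without loading the whole file."""
    total = 0.0
    count = 0
    with open(path) as f:
        while True:
            lines = f.readlines(chunk_bytes)  # read roughly chunk_bytes at a time
            if not lines:
                break
            for line in lines:
                total += float(line)
                count += 1
    return total / count if count else 0.0
```

The same pattern (constant memory, one pass) underlies the streaming algorithms the book covers; only the per-chunk update changes.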


Deadlines are on the 29th of November. Best of luck.


tl;dr: parallel map reduce. ;)
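For anyone unfamiliar, the map/shuffle/reduce pattern being joked about can be sketched in a few lines. This is a toy single-machine version (pure Python, my own function names, not from the book); real systems like Hadoop run the same three phases distributed across a cluster:

```python
# Toy word count in the map/shuffle/reduce style.
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(w, 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a single result.
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["big data big", "data mining"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "mining": 1}
```

The parallelism comes for free because map calls are independent per document and reduce calls are independent per key.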



