Hacker News
Foundations of Data Science [pdf] (cornell.edu)
254 points by kercker on May 23, 2018 | 21 comments



I'm an accountant but work consulting as a data/software "engineer" (or in those roles). For a while I got excited by all the "data science" books, until I admitted to myself 2 years ago that without the Math & Stats background, I'm wasting my time.

I went back to varsity (part time) this year, studying Applied Maths and Stats. Even with just the basics that I now know, going through this book I can feel that a glass ceiling has been broken.

I'm going to print it bit by bit, and study its contents (after my exams). Thanks!


Not to be confused with "The Foundations of Data Science", a free textbook created for and used in UC Berkeley's data science course:

https://www.inferentialthinking.com/



The previous discussion is about the version dated November 4th, 2014. The book referenced here is dated January 4th, 2018. From looking at the TOCs, most chapters contain a few additional sections and there is an entirely new chapter. This might warrant a second discussion.


Looks like this book heavily intersects with "Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis" by Mitzenmacher/Upfal [0].

[0] https://books.google.com/books?id=E9UlDwAAQBAJ&pg=PA1&source...


Ravindran Kannan, one of the authors, taught a course of the same name at CSA, IISc. The video lectures of the course are available here: http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectur...


I'd avoid that site. From our Cisco proxy:

"Based on your organization's access policies, this web site ( http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectur... ) has been blocked because it has been determined by Web Reputation Filters to be a security threat to your computer or the organization's network. This web site has been associated with malware/spyware."


Microsoft's YouTube playlist: https://www.youtube.com/watch?v=WEBUWYxaqLQ&list=PLD7HFcN7LX...

I don't get what MS is doing here.


I always thought linear regression was a “foundation” of this field, but there is no discussion of a technique by this name in this book. Is there another name it goes by?


Logistic regression is also referred to as a "supervised classification" problem, which this book only addresses in the specialized settings of document clustering and image classification. They do also address Support Vector Machines, which are a general-purpose algorithm for classification. However, there is a wide variety of specific implementations of logistic regression that require quite a bit more discussion (dummy variables, log-odds ratios, ordinal variables) and that are more directly applicable to a general stats background than to machine learning itself.

Considering that the authors are all CS professors or researchers and not statisticians, it makes sense to me that they don't view logistic regression as foundational.


I said linear regression, not logistic.


No, it's just linear regression. You'll find it discussed in a stats textbook since it's relatively old (and as the joke goes, data scientist is what a statistician calls themselves when they want a pay rise).

That being said, IMO it's one of the most over-applied techniques around, and it gives many non-technical and technical users the illusion that they understand what's going on in the model...


Of all of the modern grad books, this one has always struck me as the most mathematically rigorous. An heir to Trevor's ESL.


Quite dense! Just reading the first paragraph assumes good background knowledge. Wish I could grasp this stuff!


Not to be confused with an 'Introduction to data science' book


is there an epub?





Any idea of what material overlaps or which to read first?


My two cents is to follow courses, or start your own projects. These books should be used as references as you're learning things.



