I agree with your sentiment that math is important, but it is also true that math is not taught well. Just glancing at the first part of chapter 2, the author presents the law of large numbers out of the blue, then goes straight on to proofs of it. There is no clear discussion of why this law is important, when you can use it, and so on. Contrast how Wikipedia explains it:
> In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
This book:
> If one generates random points in d-dimensional space using a Gaussian to generate coordinates, the distance between all pairs of points will be essentially the same when d is large. The reason is that the square of the distance between two points y and z ...
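For what it's worth, the book's claim is easy to check numerically, and doing so shows how it ties back to the law of large numbers: the squared distance between two such points is a sum of d independent terms, so its average over coordinates settles near its expected value as d grows. Below is a rough sketch, assuming standard Gaussian coordinates (mean 0, variance 1); the function name `pairwise_distance_spread` and the point counts are my own choices for illustration, not anything from the book.

```python
import math
import random
import statistics


def pairwise_distance_spread(d, n_points=20, seed=0):
    """Return (mean, relative spread) of all pairwise distances between
    n_points random points whose d coordinates are i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = statistics.fmean(dists)
    # Relative spread = standard deviation of the distances divided by their mean.
    return mean, statistics.pstdev(dists) / mean


if __name__ == "__main__":
    for d in (2, 100, 2000):
        mean, rel = pairwise_distance_spread(d)
        # Each coordinate difference has variance 2, so the squared distance
        # has expectation 2d and the distance sits near sqrt(2d).
        print(f"d={d:5d}  mean={mean:8.3f}  sqrt(2d)={math.sqrt(2 * d):8.3f}  "
              f"relative spread={rel:.3f}")
```

In low dimension the distances vary a lot relative to their mean; in high dimension the relative spread shrinks and the mean approaches sqrt(2d), which is the "essentially the same" in the quote.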
I always get an eerie feeling about people who put excessive stress on formalism. It almost gets to the point where I feel they don't want to share their knowledge or intuition.
They are trying to help you get a correct intuition. If you have never approached things this way, there are almost certainly very large holes in your understanding. Not to say that everyone needs to be formal all the time, but you need to sometimes.
> If you have never approached things this way, there are almost certainly very large holes in your understanding.
When anyone tells me that X is the ONLY way to do it, I have almost always found them to be wrong, and not just in data science. Formalism is a means of communication, not the end. You can always communicate ideas without formalism.
FYI, I have a Ph.D. in Computer Science and have been a practicing Data Scientist for many years now.