This is a question I've been meaning to ask on here, but didn't think it warranted its own "Ask HN": What's the best way for someone with an otherwise decent math background, but no statistics, to get started on the topic? Any particular books, websites, etc. that people recommend?
The key is to come up with a well-formulated question, such as "I wonder if I can predict stock trends." Then in doing so, you'll search the web and come across predictive modeling, which will lead to machine learning techniques, which will lead to good resources. Within those machine learning sources, search for chapters on prediction and classification, and you'll come across regression techniques, support vector machines, relevance vector machines, etc. Then you'll wonder, "ok, how do I actually solve this problem?" so you may search for "SVM implementations" and find, for example, Steve Gunn's for Matlab. Then after much codebanging, you'll realize that in order to solve this problem, you need a good dataset, so you go to Yahoo Finance and see if you can download some data for IBM.
This is usually the process one needs to follow, albeit with some intermediate steps switched out for others here and there.
It seems like if he did that, he'd be biting off WAY more than he can chew at the present time.
If you want to learn statistics, it's probably better to start at the beginning (_Cartoon Guide to Statistics_, O'Reilly's new _Head First Statistics_, or Huff's _How to Lie With Statistics_) than to leap headlong into a huge, mostly intractable problem and pagefault in knowledge at each point you come across something you don't know how to do.
I agree, but at the same time, you only learn by doing. In my experience, even when you dive in way over your head, you tend to pick up the information rather quickly. I took a machine learning class with no statistics background and after a few weeks of floundering and learning terminology I was fine. The benefit of forming a problem first is that it gives you an ultimate goal to work towards.
Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew.
"I agree, but at the same time, you only learn by doing."
This is true, but I think in the original poster's case it could be accomplished much more easily and reliably by continually giving him problems that are within (or more ideally, just outside) his circle of competence, rather than a problem like "predict stock prices" which isn't in any human being's circle of competence. Moreover, with the latter problem, he'll get virtually no feedback as to whether his answer was correct, because a correct answer for "use statistics to predict stock prices" doesn't exist. Odds are that if he follows that path he'll quit before making any progress, or at the least he won't be able to close the feedback loop that is so vital to gaining expertise.
I think that if he's starting from ground zero and wanting to learn statistics, he'd be much better served by sitting down with The Cartoon Guide to Statistics and a deck of cards and set of dice first. He can work his way up to conquering the stock market :)
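For what it's worth, here is a rough sketch of the kind of dice exercise I have in mind, written in Python. The function names and the choice of "two dice sum to 7" are just my own illustration, not anything from the books mentioned. The point is that the exact answer is known by simple counting, so the simulation gives immediate feedback on whether you got it right.

```python
import random
from fractions import Fraction


def exact_prob_sum(target, sides=6):
    """Exact probability that two fair dice sum to `target`, by counting outcomes."""
    hits = sum(
        1
        for a in range(1, sides + 1)
        for b in range(1, sides + 1)
        if a + b == target
    )
    return Fraction(hits, sides * sides)


def simulated_prob_sum(target, trials=100_000, sides=6, seed=0):
    """Estimate the same probability by rolling two simulated dice many times."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        if rng.randint(1, sides) + rng.randint(1, sides) == target:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = exact_prob_sum(7)
    estimate = simulated_prob_sum(7)
    print(f"exact P(sum = 7)     = {exact} (about {float(exact):.4f})")
    print(f"simulated P(sum = 7) = {estimate:.4f}")
```

Changing `target` or `trials` and watching how close the estimate lands to the counted answer is exactly the sort of closed feedback loop that "predict stock prices" can't give you.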
"Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew."
I think that's apples and oranges - he's trying to learn statistics, not trying to convince potential clients or investors that he's already an expert. I don't think it would be harmful at all for him to have a lofty, far-out goal like "predict stock prices" to aim toward, but I do think that if he starts out trying to learn statistics by typing "statistical stock prediction methods" into Google, he will burn out rather quickly. Pagefaulting in knowledge when you need it is probably optimal for something where you just want to make sure your knowledge is passable, but if he wants to truly know his domain, he's gotta get out the marbles and urns. :)
"Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew."
They came up with that slogan well after they were already experts in search, and probably after Google had been written and they'd formed a company. To hear Larry tell it, Google started with the question of "What if we could download all of the web and just keep all the links?" And then Terry Winograd encouraged them, and they realized that the link structure of the web was a lot like the citation graph of academic publications. And then they realized they could build a kick-ass search engine by using that to rank page relevance. And then when nobody would license it, they formed a company.
The way to build massive world-changing systems isn't to start with massive world-changing ideas. It's to start with interesting, challenging, but tractable problems, and then see where they lead you.
No, that's how to figure out if he can predict stock trends. He wants to learn statistics, not necessarily an application of statistics. I see what you're getting at, but before one can ask such a question, one must know if the answer will involve statistical theory, and that's a big jump. Learn the basics first, then look for applications, lest you miss the forest for the trees.
The other problem with this approach is that while you may get depth, there is very little breadth. I find that when learning something for the first time, I want to go wide and shallow. Then once I have a basic understanding I can figure out where I want to learn deeper.
OTOH, maybe you just learn things differently from me!
I just gave my friend (a PhD in Operations Research) his basic Statistical Theory book back. I also wanted to get a better understanding of statistics, but that stuff was putting me to sleep too easily :-(
Right. Everyone has different learning styles. The problem with learning just basic theory is that you get caught up in minute details. That's why I always recommend that people start with a focused problem; then, in the process of trying to answer that problem, you become knowledgeable about all the pieces of the puzzle. The key is to make the discipline of statistics interesting to YOU!
I can't recommend Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith enough. It's a great resource for developing an intuitive understanding and can often answer questions by explaining the concepts in a different way than traditional texts.
http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/...
I recently got this book: http://www.amazon.com/Statistics-Gentle-Introduction-Frederi...
I found it to be perfect for what I wanted. Basically, my knowledge of statistics was VERY limited, but one of the things I'm working on now requires a good understanding, so... The book is gentle and not too math heavy. Everything it covers, it does so in detail and with examples and real-world stories. I found it to be a good way to get a basic foundation. You may want to follow up with a more advanced book afterward, though.
I was in the same boat as you. I have an undergrad math degree but took no statistics. I have found that most statistics books are inscrutable at best and complete wastes of paper and effort at worst. Based on the book recommendations I've been given by people who supposedly use statistics all the time, I don't believe that many trained statisticians have any idea what they are doing.
That said, here's what has sort of worked for me:
- Cartoon Guide to Statistics by Larry Gonick
- Fundamentals of Applied Probability Theory by Al Drake: out of print, and mostly about probability. However, the stats intro at the end is the clearest one I've ever read. Originally recommended to me by Philip Greenspun.
- Introductory Statistics with R by Peter Dalgaard. I'm assuming that if you are posting to Hacker News, you are probably coming from a programming background. This book does exactly what the title says: it shows you how to apply introductory statistics by programming. It's heavier on R than stats.
"Based on the book recommendations I've been given by people who supposedly use statistics all the time, I don't believe that many trained statisticians have any idea what they are doing."
Many people working with statistics in their employment have never grappled with the issues of adequate descriptive statistics or reasonable inference. Besides the recommendations I just posted above,
Second the idea that a lot of statisticians don't know what they're doing. The first mistake is to make things easier by glossing over, or mangling, the mathematical details. Many statistics textbooks do this, and the only real protection is to have a good math background. The second mistake is to treat statistics like a bunch of techniques to be learned and applied with little regard for the philosophical problems inherent in every attempt to model the real world.
The Cartoon Guide to Statistics is an excellent way to go from zero to a good overview of the basics with a minimum of hard math. After that, if you're mostly interested in applying basic techniques to your own stuff, you want a good undergrad textbook. I don't have any good recommendations here, unfortunately. If you have a good math background (or are motivated to get it) and you want to keep going, Statistical Models: Theory and Practice by David A. Freedman (http://www.amazon.com/review/R2XUNM92KYU7BB) has the math, the philosophy, the hands-on analysis of studies, and the exercises to put you in a better position to evaluate statistical research than some people who produce it.
Physcab suggests a great step #2: diving into some real statistics problems. The information available online today is more than enough at that point.
Step #1 is going to be getting the foundations so that you can quickly digest the meaning and purpose of things like machine learning. Fortunately, statistics can be largely summarized as "the science/math/art of explaining variance".
Study what variance is and what it means, and you'll dive through probability, distributions, modeling, inference, prediction, parametrization, simulation, and all of those fun topics while keeping an understanding of why they exist.
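To make "explaining variance" a bit more concrete, here is a minimal sketch in Python using only the standard library. It fits a least-squares line and reports what fraction of the variance in y the line accounts for (the usual R-squared). The helper names and the five data points are invented purely for illustration, so treat this as one simple reading of the idea rather than the definition.

```python
from statistics import fmean, pvariance


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def variance_explained(xs, ys):
    """Fraction of the variance in ys accounted for by the fitted line (R-squared)."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    total = pvariance(ys)               # spread in y before using x at all
    unexplained = pvariance(residuals)  # spread left over after the fit
    return 1 - unexplained / total


if __name__ == "__main__":
    # Made-up data for illustration only: roughly y = 2x plus a little noise.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(f"variance of y: {pvariance(ys):.3f}")
    print(f"fraction explained by the line: {variance_explained(xs, ys):.3f}")
```

A value near 1 means the line accounts for almost all of the spread in y, and a value near 0 means knowing x tells you essentially nothing. Much of modeling and inference comes down to arguing about that leftover part.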
Finally, I'd highly suggest taking a look at Tufte's work because once you understand variance, you've still got to explain it.
My two favorite orientations to what statistics is all about are two free articles on the Web. Both recommend books for further reading, and they are good books.
"Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks"
I highly recommend Hayashi's book on Econometrics. It's what we used in the first year of my PhD program in Economics. But it's probably worth starting with a basic probability and statistics book, preferably one that's pretty mathematical. I've used Hogg, McKean, and Craig's Introduction to Mathematical Statistics and it's really good.