My problem with books like this is that they have almost no connection to why Bayesian statistics is successful: Bayesian statistics provides a unified recipe to tackle complex data analysis problems. Arguably the only known unified recipe.
The Bayesian book I want should emphasize how Bayes is a recipe for studying complex problems and teach a broad range of model ingredients. Learning Bayesian statistics is about becoming fluent in describing scientific problems in probabilistic language. This requires knowing how to express and compose traditional models and build new ones based on first principles.
An unfortunate reality is that you still need to know computational methods too, but that should change soon enough.
Yes, that's exactly what the objective of this book is! I am not using computation out of necessity, but rather because I think it provides leverage for understanding the concepts, and learning to (as you say) compose traditional models and build new ones.
As the book comes along, I am finding that many ideas that are hard to explain and understand mathematically can be very easy to express computationally, especially using discrete approximations to continuous distributions.
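(To make that concrete, here is a minimal sketch of the kind of discrete approximation I mean; the coin-bias example and the numbers are mine, not from the book. Put the continuous parameter on a grid, and the Bayesian update becomes elementwise multiplication plus a renormalization.)

    import numpy as np

    # Discrete grid approximation to a continuous posterior:
    # estimate a coin's probability of heads after 140 heads in 250 flips.
    p = np.linspace(0, 1, 1001)          # grid over the continuous parameter
    prior = np.ones_like(p)              # uniform prior
    likelihood = p**140 * (1 - p)**110   # binomial likelihood, up to a constant
    posterior = prior * likelihood
    posterior /= posterior.sum()         # normalize over the grid

    print(p[posterior.argmax()])         # posterior mode, ~0.56
    print((p * posterior).sum())         # posterior mean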
I'd recommend using as many real examples as possible. Things like forecasting, product recommendations, topic modeling, etc. While you can conceptually explain how Bayesian statistics is a unified recipe, it's incredibly hard to have this sink in with toy problems. This is especially true since many people using traditional tools are actually using advanced methods to solve real problems, so when they start reading about urns or doors it all comes across as rather academic. That's sad because the benefit of Bayesian coherency is mostly that it leads to a highly productive mode of practical data analysis.
Definitely shoot me an email at tristan@senseplatform.com if you're interested in the computational side of this area. At Sense (http://www.senseplatform.com), we're working on making applied Bayesian analysis as amazing as it should be.
E.T. Jaynes's book, "Probability Theory: The Logic of Science", may come close to what you want. It emphasizes that there are rules of thought, and that they lead to Bayesian statistics. As such, Bayesian statistics isn't just a recipe, but the law.
Now, I can only personally vouch for the first 2 chapters, as I haven't read the rest yet.
So, I'm going to counter here and say I don't find this to be a good intro. I started reading, had not heard of the "Girl Named Florida" problem, and then went to the linked blog post http://allendowney.blogspot.com/2011/11/girl-named-florida-s...
I found the way he explains it confusing and counter-intuitive. I've taken basic stats in college and learned some of the associated problems, though not this one, and not in this particular way. I have to agree wholeheartedly with the commenter on that post, "JeffJo", who explains why it's an ineffective way to present the material. Furthermore, I found the author's dismissal of that valid criticism reason enough not to read further.
The problem with the Girl Named Florida is that the ambiguous wording is more confusing than the math.
Ambiguous: "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?"
More clear, and emphasizing the importance of precise wording when discussing probability: "Among families with two children, with at least one of the children being a girl named Florida, what portion have two girls? (Assume that all names are chosen randomly from the same distribution, independently of all other factors; and sex is determined as by a fair coin toss.)"
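(A quick simulation of that precisely-worded version, as a sanity check; the name frequency here is a made-up assumption:)

    import random

    # Simulate two-child families; a child is a girl named Florida with
    # probability 0.5 * P_FLORIDA. As the name gets rarer, the answer
    # approaches 1/2 (versus 1/3 for plain "at least one girl").
    P_FLORIDA = 0.01   # assumed frequency of the name, chosen arbitrarily
    eligible = two_girls = 0
    for _ in range(1_000_000):
        kids = [(random.choice("GB"), random.random() < P_FLORIDA)
                for _ in range(2)]
        if any(sex == "G" and florida for sex, florida in kids):
            eligible += 1
            if all(sex == "G" for sex, _ in kids):
                two_girls += 1
    print(two_girls / eligible)  # ~0.50 for a rare name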
I agree that his first example, "The Girl Named Florida", was confusing.
I feel pretty comfortable with Bayesian statistics, and I thought the other examples that I saw were pretty clear. But his very first example jumps you out to another webpage, and then he mixes it with "the red-haired problem". It was irritating.
His next example, "The Cookie Problem", is the classic intro-to-Bayes example, IMO.
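(For anyone who hasn't seen it, a sketch of the standard setup as I remember it: Bowl 1 holds 30 vanilla and 10 chocolate cookies, Bowl 2 holds 20 of each; you pick a bowl at random, draw a vanilla cookie, and ask which bowl it came from.)

    # Bayes' theorem on the cookie problem (standard setup assumed above).
    priors = {"bowl1": 0.5, "bowl2": 0.5}
    like_vanilla = {"bowl1": 30 / 40, "bowl2": 20 / 40}

    unnorm = {b: priors[b] * like_vanilla[b] for b in priors}
    total = sum(unnorm.values())
    posterior = {b: v / total for b, v in unnorm.items()}
    print(posterior["bowl1"])  # 0.6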
As someone with nearly zero knowledge of statistics or Bayes' Theorem, I agree: the cookie problem was a very clear example to follow. The "Girl named Florida" solution, while interesting, probably doesn't work as well as a textbook example, at least not in that stage of learning.
Reading the Florida problem solution, it made some sense, but was definitely of a higher level of complexity than the rest of the text.
What I found really interesting was that the answers to some of the other questions on the "Girl Named Florida" discussion required knowledge which I would not have considered general math-ish knowledge:
> If the parents have brown hair
> and one of their children has red hair,
> we know that both parents are heterozygous,
> so their chance of having a red-haired girl is 1/8.
Interesting to learn (the 1/8 follows because two heterozygous parents give each child a 1/4 chance of red hair, times 1/2 for being a girl), but this "if you also happen to know this ..." step was mildly frustrating.
(edit: Since the grandparent post linked the Girl Named Florida blog post, I guess I don't need to.)
Yes, these sorts of problems can be confusing. But the confusion is propagated by educators who refuse to recognize that what they asked is not what they intended to ask, and so they provide inconsistent answers.
Say you are on a game show, and pick Door #1. The host opens door #3 to show that it does not have the prize, and offers to let you switch to door #2. Should you? Most people will initially reason that door #3 is prize-less 2/3 of the time, evenly split between cases where the prize is behind door #1 and door #2. So it would be pointless to switch. But that is wrong. Few educators will explain why by solving the problem rigorously. They will use an analogy like pointing out how the original choice is right only 1/3 of the time, and since the host can always open a prize-less door, that can’t change.
People don't believe these educators because the naive 1/2 answer is indeed more rigorous than the analogy. It just makes a mistake: the probabilities to use are not the probabilities that the cases exist, but the probabilities that the observed result would occur in each case. The existence probabilities are the same, but the probability of the observed result (the host opening door #3) when the initial choice was correct is half of what it is when the initial choice was incorrect.
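(Here is that distinction in simulation form; the assumption that the host picks at random when he has a choice is mine, to make the problem well-posed:)

    import random

    # Monty Hall, conditioning on the specific observation "host opened #3".
    # Assumption: when the prize is behind the player's door #1, the host
    # opens #2 or #3 with equal probability.
    opened3 = wins_by_switching = 0
    for _ in range(100_000):
        prize = random.randint(1, 3)
        if prize == 1:
            host = random.choice([2, 3])
        else:
            host = 2 if prize == 3 else 3
        if host == 3:
            opened3 += 1
            if prize == 2:
                wins_by_switching += 1
    print(wins_by_switching / opened3)  # ~2/3: switching wins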
I'm really interested in knowing the prereqs I should have before picking up a book like this. Coming from a weak math background, I find these books highly appealing but mildly intimidating. Also, could someone advise me on the preferred order of tackling the following books?
1. Think Bayes
2. Think Stats
3. Programming Collective Intelligence by T.Segaran
3: Get (back) into the swing of thinking about mathematics and algorithms.
1: Bayesian statistics is a principled, coherent, consistent, intuitive, complete framework for reasoning about uncertainty. A good foundation.
2: Traditional statistics is more scattered and ad hoc, but can be more practical than Bayesian methods. (Bayesian models are well-motivated, but it can be impractical to compute exact answers, so you'll have to switch to approximation techniques, some of which are simple/universal/slow, while others get fairly complex.)
I need to write a preface to answer this question, but the most important prereq is Python programming. The premise of the series is that if you can program (in any language) you can use that skill as leverage to learn about other topics.
So this is all very well and good; I've had about five intros to Bayesian statistics. But those are a fair way from actually applying that knowledge in practice in software.
Let's say we have N different kinds of events with unknown probabilities and unknown dependence or independence between them. The naive approach to gathering data on the probability of event n occurring after an occurrence of event m would require O(N²) space. Say N ≈ 10⁹–10¹⁰. Storing that much data as a raw matrix isn't practical in most cases, so we have to find a more efficient data structure, in terms of both space and the operations we need to perform (and taking into account characteristics of the storage medium, i.e. memory, disk, or a combination). What happens if the probabilistic properties of the system change over time?
Are there any introductory books or other resources on modeling this kind of problem? Clearly this has been tackled before, but I'm having a hard time making the leap from theory to practice - and I don't mean import the data into R or SPSS or whatever and let that grind out a solution, but coming up with approximations when you have runtime and space constraints that make that approach impractical.
I think the general approach is dimensionality reduction: start measuring, and round down to 0 for the low-correlation pairs of events.
Do you actually have a stream of more than N^2 observations to process? If not, then most of your correlations are in fact 0, and sparse-matrix techniques apply.
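(In code, that observation looks roughly like this; a sketch of sparse pairwise counts, not an answer to the drift question:)

    from collections import Counter

    # Sparse co-occurrence counts: only observed (m, n) pairs take space,
    # so memory is O(number of observations), not O(N^2), even for huge N.
    pair_counts = Counter()   # times event n followed event m
    from_counts = Counter()   # times event m appeared with a successor

    def observe_stream(events):
        for m, n in zip(events, events[1:]):
            pair_counts[(m, n)] += 1
            from_counts[m] += 1

    def p_next(m, n):
        """Estimated P(next event is n | current event is m)."""
        return pair_counts[(m, n)] / from_counts[m] if from_counts[m] else 0.0

    observe_stream([7, 3, 7, 3, 3, 9])
    print(p_next(7, 3))  # 1.0
    print(p_next(3, 7))  # ~0.33

Handling drift is then a separate choice on top, e.g. decaying old counts or keeping counts per time window.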
I'm a big fan of the author's other books, Think Python and Think Complexity (haven't had the time for Think Stats); I found them more understandable than most other books that purport to teach people at the same skill level.
I'm hoping this will be as good, but all the negative comments here leave me skeptical. Perhaps this is the crowd that would enjoy K&R C more than Think Python. The former is more of a reference to me than an introductory tome. Perhaps everyone here is just better at math than I am.
Ignore the haters -- Think Bayes is going to be awesome!
Just kidding (mostly), but your point is correct: there is no book that is right for all audiences. But if you can program, and the mathematical approach to this material doesn't do it for you, this book might.
Bayesian is cool because you can make arbitrarily complex models, and when you have the parameters estimated it is really easy to calculate all the cool things you want to.
Bayesian is not cool because estimating the parameters takes bloody ages on a supercomputer, unless you spend ages being really careful to specify your model.
Frequentist statistics is cool because it is a massive big bag of tricks to estimate all sorts of stuff, and pretty much all of the tricks are already in R.
Frequentist statistics is not so cool because calculating all the specific things you want to can be a pain in the ass.
Once either quantum computers kick in or a better algorithm than MCMC is created, Bayesian will win.
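(To make the MCMC complaint concrete, a minimal Metropolis sampler on a toy target; in a real model the log_target call touches the whole dataset, which is where the bloody ages go:)

    import math, random

    # Minimal Metropolis sampler targeting a standard normal distribution.
    def log_target(x):
        return -0.5 * x * x  # log density of N(0, 1), up to a constant

    x, samples = 0.0, []
    for _ in range(50_000):
        proposal = x + random.gauss(0, 1)
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)  # on rejection the current state repeats

    print(sum(samples) / len(samples))  # ~0.0, the target's mean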
There are some philosophical arguments about the objectivity of the prior in Bayesian statistics, but these wash out in a decision theoretic framework because of the subjectivity of the utility function at the other end of the process.
Also, less than 5% of people reporting p-values really know what a p-value is.
>This HTML version is provided for convenience, but it is not the best format for the book. In particular, some of the symbols are not rendered correctly.
I would actually recommend the opposite: the HTML has ASCII versions of the symbols that e.g. Chrome might not render correctly, and all I checked looked fine. The PDF, meanwhile, renders the reference to the "girl named Florida" article as plain text, with no link.
And the sections are linked in the HTML version, where they are not in the PDF, which seems like a simple oversight (that infects the vast majority of PDFs, sadly).
It's really, really good, especially if you're interested in the why and not just the "give me the damn test I need to run in SPSS and what number to look at". Plus, because you spend a lot of time coding, it's more fun and less dry than most stats books.
Think Java is also excellent. I think it may have been the first non-R programming book I read, and it helped me get more into programming beyond just stats.
I have recommended Think Stats to many people, and it appears to have been somewhat of a success. And it's free documentation, which is wonderful.