My problem with books like this is that they have almost no connection to why Bayesian statistics is successful: Bayesian statistics provides a unified recipe to tackle complex data analysis problems. Arguably the only known unified recipe.
The Bayesian book I want should emphasize how Bayes is a recipe for studying complex problems and teach a broad range of model ingredients. Learning Bayesian statistics is about becoming fluent in describing scientific problems in probabilistic language. This requires knowing how to express and compose traditional models and build new ones based on first principles.
An unfortunate reality is that you still need to know computational methods too, but that should change soon enough.
Yes, that's exactly what the objective of this book is! I am not using computation out of necessity, but rather because I think it provides leverage for understanding the concepts, and learning to (as you say) compose traditional models and build new ones.
As the book comes along, I am finding that many ideas that are hard to explain and understand mathematically can be very easy to express computationally, especially using discrete approximations to continuous distributions.
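(To make that concrete, here is a minimal sketch of the kind of discrete approximation I mean; the coin-bias example and the numbers are mine, not from the book. Put the continuous parameter on a grid, and the Bayesian update becomes elementwise multiplication plus a renormalization.)

    import numpy as np

    # Discrete grid approximation to a continuous posterior:
    # estimate a coin's probability of heads after 140 heads in 250 flips.
    p = np.linspace(0, 1, 1001)          # grid over the continuous parameter
    prior = np.ones_like(p)              # uniform prior
    likelihood = p**140 * (1 - p)**110   # binomial likelihood, up to a constant
    posterior = prior * likelihood
    posterior /= posterior.sum()         # normalize over the grid

    print(p[posterior.argmax()])         # posterior mode, ~0.56
    print((p * posterior).sum())         # posterior mean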
I'd recommend using as many real examples as possible. Things like forecasting, product recommendations, topic modeling, etc. While you can conceptually explain how Bayesian statistics is a unified recipe, it's incredibly hard to have this sink in with toy problems. This is especially true since many people using traditional tools are actually using advanced methods to solve real problems, so when they start reading about urns or doors it all comes across as rather academic. That's sad because the benefit of Bayesian coherency is mostly that it leads to a highly productive mode of practical data analysis.
Definitely shoot me an email at tristan@senseplatform.com if you're interested in the computational side of this area. At Sense (http://www.senseplatform.com), we're working on making applied Bayesian analysis as amazing as it should be.
E.T. Jaynes's book, "Probability Theory: The Logic of Science", may come close to what you want. It emphasizes that there are rules of thought, and that they lead to Bayesian statistics. As such, Bayesian statistics isn't just a recipe, but the law.
Now, I can only personally vouch for the first 2 chapters, as I haven't read the rest yet.
So, I'm going to counter here and say I don't find this to be a good intro. I started reading, had not heard of the "Girl Named Florida" problem, and then went to the linked blog post http://allendowney.blogspot.com/2011/11/girl-named-florida-s...
I found the way he explains it confusing and counter-intuitive. I've taken basic stats in college and learned some of the associated problems, though not this one, and not in this particular way. I have to agree wholeheartedly with the commenter on that post, "JeffJo", who explains why it's an ineffective way to present the material. Furthermore, I found the author's dismissal of that valid criticism reason enough not to read further.
The problem with the Girl Named Florida is that the ambiguous wording is more confusing than the math.
Ambiguous: "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?"
More clear, and emphasizing the importance of precise wording when discussing probability: "Among families with two children, with at least one of the children being a girl named Florida, what portion have two girls? (Assume that all names are chosen randomly from the same distribution, independently of all other factors; and sex is determined as by a fair coin toss.)"
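(A quick simulation of that precisely-worded version, as a sanity check; the name frequency here is a made-up assumption:)

    import random

    # Simulate two-child families; a child is a girl named Florida with
    # probability 0.5 * P_FLORIDA. As the name gets rarer, the answer
    # approaches 1/2 (versus 1/3 for plain "at least one girl").
    P_FLORIDA = 0.01   # assumed frequency of the name, chosen arbitrarily
    eligible = two_girls = 0
    for _ in range(1_000_000):
        kids = [(random.choice("GB"), random.random() < P_FLORIDA)
                for _ in range(2)]
        if any(sex == "G" and florida for sex, florida in kids):
            eligible += 1
            if all(sex == "G" for sex, _ in kids):
                two_girls += 1
    print(two_girls / eligible)  # ~0.50 for a rare name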
I agree that his first example, "The Girl Named Florida", was confusing.
I feel pretty comfortable with Bayesian statistics, and I thought the other examples that I saw were pretty clear. But his very first example jumps you out to another webpage, and then he mixes it with "the red-haired problem". It was irritating.
His next example, "The Cookie Problem", is the classic intro-to-Bayes example, IMO.
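(For anyone who hasn't seen it, a sketch of the standard setup as I remember it: Bowl 1 holds 30 vanilla and 10 chocolate cookies, Bowl 2 holds 20 of each; you pick a bowl at random, draw a vanilla cookie, and ask which bowl it came from.)

    # Bayes' theorem on the cookie problem (standard setup assumed above).
    priors = {"bowl1": 0.5, "bowl2": 0.5}
    like_vanilla = {"bowl1": 30 / 40, "bowl2": 20 / 40}

    unnorm = {b: priors[b] * like_vanilla[b] for b in priors}
    total = sum(unnorm.values())
    posterior = {b: v / total for b, v in unnorm.items()}
    print(posterior["bowl1"])  # 0.6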
As someone with nearly zero knowledge of statistics or Bayes' Theorem, I agree: the cookie problem was a very clear example to follow. The "Girl named Florida" solution, while interesting, probably doesn't work as well as a textbook example, at least not in that stage of learning.
Reading the Florida problem solution, it made some sense, but was definitely of a higher level of complexity than the rest of the text.
What I found really interesting was that the answers to some of the other questions on the "Girl Named Florida" discussion required knowledge which I would not have considered general math-ish knowledge:
> If the parents have brown hair
> and one of their children has red hair,
> we know that both parents are heterozygous,
> so their chance of having a red-haired girl is 1/8.
Interesting to learn (the 1/8 follows because two heterozygous parents give each child a 1/4 chance of red hair, times 1/2 for being a girl), but this "if you also happen to know this ..." step was mildly frustrating.
(edit: Since the grandparent post linked the Girl Named Florida blog post, I guess I don't need to.)
Yes, these sorts of problems can be confusing. But the confusion is propagated by educators who refuse to recognize that what they asked is not what they intended to ask, and so they provide inconsistent answers.
Say you are on a game show, and pick Door #1. The host opens door #3 to show that it does not have the prize, and offers to let you switch to door #2. Should you? Most people will initially reason that door #3 is prize-less 2/3 of the time, evenly split between cases where the prize is behind door #1 and door #2. So it would be pointless to switch. But that is wrong. Few educators will explain why by solving the problem rigorously. They will use an analogy like pointing out how the original choice is right only 1/3 of the time, and since the host can always open a prize-less door, that can’t change.
People don't believe these educators because the naive 1/2 answer is indeed more rigorous than the analogy. It just makes a mistake: the probabilities to use are not the probabilities that the cases exist, but the probabilities that the observed result would occur in each case. The existence probabilities are the same, but the probability of the observed result (the host opening door #3) when the initial choice was correct is half of what it is when the initial choice was incorrect.
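(Here is that distinction in simulation form; the assumption that the host picks at random when he has a choice is mine, to make the problem well-posed:)

    import random

    # Monty Hall, conditioning on the specific observation "host opened #3".
    # Assumption: when the prize is behind the player's door #1, the host
    # opens #2 or #3 with equal probability.
    opened3 = wins_by_switching = 0
    for _ in range(100_000):
        prize = random.randint(1, 3)
        if prize == 1:
            host = random.choice([2, 3])
        else:
            host = 2 if prize == 3 else 3
        if host == 3:
            opened3 += 1
            if prize == 2:
                wins_by_switching += 1
    print(wins_by_switching / opened3)  # ~2/3: switching wins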
I'm really interested in knowing the prereqs I should have before picking up a book like this. Coming from a weak math background, I find these books highly appealing but mildly intimidating. Also, could someone advise me on the preferred order of tackling the following books?
1. Think Bayes
2. Think Stats
3. Programming Collective Intelligence by T.Segaran
3: Get (back) into the swing of thinking about mathematics and algorithms.
1: Bayesian statistics is a principled, coherent, consistent, intuitive, complete framework for reasoning about uncertainty. A good foundation.
2: Traditional statistics is more scattered and ad hoc, but can be more practical than Bayesian methods. (Bayesian models are well-motivated, but it can be impractical to compute exact answers, so you'll have to switch to approximation techniques, some of which are simple/universal/slow, while others get fairly complex.)
I need to write a preface to answer this question, but the most important prereq is Python programming. The premise of the series is that if you can program (in any language) you can use that skill as leverage to learn about other topics.
So this is all very well and good; I've had about five intros to Bayesian statistics. But those are a fair way from actually applying that knowledge in practice in software.
Let's say we have N different kinds of events with unknown probabilities and unknown dependence or independence between them. The naive approach to gathering data on the probability of event n occurring after an occurrence of event m would require O(N²) space. Say N ≈ 10⁹–10¹⁰. Storing that much data as a raw matrix isn't practical in most cases, so we have to find a more efficient data structure, in terms of both space and the operations we need to perform (and taking into account characteristics of the storage medium, i.e. memory, disk, or a combination). What happens if the probabilistic properties of the system change over time?
Are there any introductory books or other resources on modeling this kind of problem? Clearly this has been tackled before, but I'm having a hard time making the leap from theory to practice - and I don't mean import the data into R or SPSS or whatever and let that grind out a solution, but coming up with approximations when you have runtime and space constraints that make that approach impractical.
I think the general approach is dimensionality reduction: start measuring, and round down to 0 for the low-correlation pairs of events.
Do you actually have a stream of more than N^2 observations to process? If not, then most of your correlations are in fact 0, and sparse-matrix techniques apply.
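(In code, that observation looks roughly like this; a sketch of sparse pairwise counts, not an answer to the drift question:)

    from collections import Counter

    # Sparse co-occurrence counts: only observed (m, n) pairs take space,
    # so memory is O(number of observations), not O(N^2), even for huge N.
    pair_counts = Counter()   # times event n followed event m
    from_counts = Counter()   # times event m appeared with a successor

    def observe_stream(events):
        for m, n in zip(events, events[1:]):
            pair_counts[(m, n)] += 1
            from_counts[m] += 1

    def p_next(m, n):
        """Estimated P(next event is n | current event is m)."""
        return pair_counts[(m, n)] / from_counts[m] if from_counts[m] else 0.0

    observe_stream([7, 3, 7, 3, 3, 9])
    print(p_next(7, 3))  # 1.0
    print(p_next(3, 7))  # ~0.33

Handling drift is then a separate choice on top, e.g. decaying old counts or keeping counts per time window.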
I'm a big fan of the author's other books, Think Python and Think Complexity (haven't had the time for Think Stats); I found them more understandable than most other books that purport to teach people at the same skill level.
I'm hoping this will be as good, but all the negative comments here leave me skeptical. Perhaps this is the crowd that would enjoy K&R C more than Think Python. The former is more of a reference to me than an introductory tome. Perhaps everyone here is just better at math than I am.
Ignore the haters -- Think Bayes is going to be awesome!
Just kidding (mostly), but your point is correct: there is no book that is right for all audiences. But if you can program, and the mathematical approach to this material doesn't do it for you, this book might.
Bayesian is cool because you can make arbitrarily complex models, and when you have the parameters estimated it is really easy to calculate all the cool things you want to.
Bayesian is not cool because estimating the parameters takes bloody ages on a supercomputer, unless you spend ages being really careful to specify your model.
Frequentist statistics is cool because it is a massive big bag of tricks to estimate all sorts of stuff, and pretty much all of the tricks are already in R.
Frequentist statistics is not so cool because calculating all the specific things you want to can be a pain in the ass.
Once either quantum computers kick in or a better algorithm than MCMC is created, Bayesian will win.
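(To make the MCMC complaint concrete, a minimal Metropolis sampler on a toy target; in a real model the log_target call touches the whole dataset, which is where the bloody ages go:)

    import math, random

    # Minimal Metropolis sampler targeting a standard normal distribution.
    def log_target(x):
        return -0.5 * x * x  # log density of N(0, 1), up to a constant

    x, samples = 0.0, []
    for _ in range(50_000):
        proposal = x + random.gauss(0, 1)
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)  # on rejection the current state repeats

    print(sum(samples) / len(samples))  # ~0.0, the target's mean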
There are some philosophical arguments about the objectivity of the prior in Bayesian statistics, but these wash out in a decision theoretic framework because of the subjectivity of the utility function at the other end of the process.
Also, less than 5% of people reporting p-values really know what a p-value is.
>This HTML version is provided for convenience, but it is not the best format for the book. In particular, some of the symbols are not rendered correctly.
I would actually recommend the opposite: the HTML has ASCII versions of the symbols that e.g. Chrome might not render correctly, and all I checked looked fine. The PDF, meanwhile, renders the reference to the "girl named Florida" article as plain text, with no link.
And the sections are linked in the HTML version, where they are not in the PDF, which seems like a simple oversight (that infects the vast majority of PDFs, sadly).
It's really, really good, especially if you're interested in the why and not just the "give me the damn test I need to run in SPSS and what number to look at". Plus, because you spend a lot of time coding, it's more fun and less dry than most stats books.
Think Java is also excellent. I think it may have been the first non-R programming book I read, and it helped me get more into programming beyond just stats.
I have recommended Think Stats to many people, and it appears to have been somewhat of a success. And it's free documentation, which is wonderful.