Bayes’ Theorem – What is it and what is it good for? (crucialconsiderations.org)
102 points by SimplyUseless on June 25, 2015 | 34 comments



Bayes' Theorem tells us that the quest for certain knowledge, which drove a great deal of science and philosophy in the pre-Bayesian era, is much like the alchemist's quest for the secret of transmutation: it is simply the wrong goal to have, even though it generated a lot of interesting and useful results.

One of the most important consequences of this is noted by the article: "Confirmation and falsification are not fundamentally different, as Popper argued, but both just special cases of Bayes’ Theorem." There is no certainty, even in the case of falsification, because there are always alternatives. For example, superluminal neutrinos didn't prove special relativity false, although they did provide some evidence against it. But the alternative hypothesis that the researchers had made a mistake turned out to be much more plausible.

Bayesian reasoning--which is plausibly the only way of reasoning that will keep our beliefs consistent with the evidence--cannot produce certainty. A certain belief is one that has a plausibility of exactly 1 or 0, and those values are only asymptotically approachable by applying Bayes' rule. Such beliefs would be immune to any further evidence for or against them, no matter how strong it was, essentially because Bayesian updating is multiplicative and anything times zero is still zero.
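
A toy illustration of that multiplicative point in Python (numbers invented): a plausibility of exactly zero stays at zero no matter what evidence arrives.

    def update(prior, p_evidence_if_true, p_evidence_if_false):
        # Bayes' rule: P(H | E) = P(E | H) P(H) / P(E)
        evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
        return p_evidence_if_true * prior / evidence

    print(update(0.01, 0.99, 0.05))   # a small but nonzero prior moves a lot: ~0.167
    print(update(0.0, 0.99, 0.05))    # a prior of exactly 0 stays 0, whatever the evidence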

There is a name for beliefs of this kind, which to a Bayesian are the most fundamental kind of error: faith.


> Bayesian reasoning--which is plausibly the only way of reasoning that will keep our beliefs consistent with the evidence--cannot produce certainty.

To nitpick: Bayesian updating can produce certainty, in exactly the way you suggest: multiplying by zero. If the evidence you observed has zero probability under a particular hypothesis, then the posterior probability of that hypothesis will be zero. If the evidence you observe has zero probability under all hypotheses except for one, then the posterior will give probability 1 to that hypothesis (assuming it had nonzero prior probability).

This won't come up if you stick to densities like Gaussians that are supported everywhere. And it's certainly a good principle of model design to always allow your beliefs to be changed by new evidence (consistency theorems for Bayesian inference do depend on assumptions about the support of the prior and likelihood). But there's nothing formally preventing you from designing Bayesian models that rule out hypotheses with total certainty. In fact, this is what allows classical logic to be a special case of Bayesian reasoning.
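
A minimal sketch of the nitpick (hypothesis names and numbers are made up): evidence with zero likelihood under every hypothesis but one forces a posterior of complete certainty.

    priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
    likelihoods = {"H1": 0.0, "H2": 0.0, "H3": 0.4}   # P(observed evidence | H)

    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    posterior = {h: p / total for h, p in unnormalised.items()}
    print(posterior)   # {'H1': 0.0, 'H2': 0.0, 'H3': 1.0}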


For someone who disdains beliefs which are assigned a plausibility of 1, you sure do seem eager to assign that to one of your beliefs.


I awake. It is dark.

Therefore when I awake it is always dark.

Problem. Mismatch.

Turns out that if I awake between the local hours of 5am and 7pm then it is light. Otherwise it is dark. Problem. Mismatch. Turns out, it depends on the "time zone". Also turns out, depends on whether I'm sleeping inside or outside. In a hotel room or tent. Whether in a tent or in a building room with blinds. Etc. Etc. Each devil-in-the-details helps refine the case even further. But the "bet" to make is always the most "correct" bet to make, based only on the evidence observed to date, at hand. Thus Bayes.

Thus the Turing award.

It's just as perfect and reliable as that. And just as imperfect or vulnerable as that.
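
(A rough sketch of the "best bet given the evidence observed to date" idea, with made-up observations: keep a running Beta posterior over "it is dark when I wake" and bet on the posterior mean after each wake-up.)

    observations = [1, 1, 1, 0, 1, 0, 0]   # 1 = woke in the dark, 0 = woke in the light
    a, b = 1.0, 1.0                        # uniform prior over the unknown rate
    for dark in observations:
        a += dark
        b += 1 - dark
        # the "bet" after each wake-up is the current posterior mean
        print(f"P(dark next time) = {a / (a + b):.2f}")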


Bayes is perfect but also painfully bloodily sharp-edged. Best description I have.


I recently started to think about connections between Bayes' Theorem and fuzzy logic:

http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf

(Also from Wikipedia on fuzzy logic):

"Bruno de Finetti argues[citation needed] that only one kind of mathematical uncertainty, probability, is needed, and thus fuzzy logic is unnecessary. However, Bart Kosko shows in Fuzziness vs. Probability that probability theory is a subtheory of fuzzy logic, as questions of degrees of belief in mutually-exclusive set membership in probability theory can be represented as certain cases of non-mutually-exclusive graded membership in fuzzy theory. In that context, he also derives Bayes' theorem from the concept of fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in character from probability, and is not a replacement for it. He fuzzified probability to fuzzy probability and also generalized it to possibility theory. (cf.[10])"


Thank you for sharing this! It stretched my mind a bit. :)


Here are a few tangentially related things that may be of interest:

(i) MacKay's book on Information Theory, Inference, and Learning Algorithms: http://www.inference.phy.cam.ac.uk/itila/

(ii) Probability Theory As Extended Logic: http://bayes.wustl.edu/

(iii) Causal Calculus: http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl...

(iv) I recall reading a pretty good blog post a year or two ago that described how to implement some kind of Bayesian token recognition thing to parse screen captures from some database (or something roughly like that). The gist of the approach was like this:

1. define a model expressing that certain combinations of neighbouring tokens are more likely to occur than others

2. approximate the full Bayesian inference problem as MAP inference

3. the resulting combinatorial optimisation problem could be encoded as a relatively easy mixed integer program

4. easy mixed integer programs are very tractable to commercial solvers such as CPLEX, Gurobi, or sometimes even the open source COIN-OR CBC

At the time I found the idea fascinating as I was working with LPs/MIPs and had some interest in Bayesian inference, but hadn't figured out that the former could provide a way to computationally tackle certain approximations of the latter.

I cannot for the life of me find the link again for this.
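
From memory, the encoding in steps 2-4 looked roughly like the toy sketch below. This is my reconstruction, not the original post's code; it assumes the open-source PuLP library (with its default CBC solver), and all token names, labels, and numbers are invented.

    # Toy MAP-as-MIP sketch (my reconstruction, invented names/numbers).
    # pip install pulp  -- uses the open-source CBC solver by default.
    import math
    from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

    tokens = ["t0", "t1"]                # two neighbouring tokens
    labels = ["DIGIT", "LETTER"]         # candidate labels

    # Unary log-probabilities from some per-token recogniser.
    unary = {("t0", "DIGIT"): math.log(0.7), ("t0", "LETTER"): math.log(0.3),
             ("t1", "DIGIT"): math.log(0.4), ("t1", "LETTER"): math.log(0.6)}
    # Pairwise log-compatibilities: neighbours tend to share a type.
    pair = {(a, b): math.log(0.8 if a == b else 0.2)
            for a in labels for b in labels}

    prob = LpProblem("map_inference", LpMaximize)
    x = {(t, l): LpVariable(f"x_{t}_{l}", cat=LpBinary)
         for t in tokens for l in labels}
    # y[a, b] linearises the product x["t0", a] * x["t1", b].
    y = {(a, b): LpVariable(f"y_{a}_{b}", cat=LpBinary)
         for a in labels for b in labels}

    # Objective: log of the (unnormalised) joint probability.
    prob += (lpSum(unary[t, l] * x[t, l] for t in tokens for l in labels)
             + lpSum(pair[a, b] * y[a, b] for a in labels for b in labels))

    for t in tokens:                     # each token takes exactly one label
        prob += lpSum(x[t, l] for l in labels) == 1
    for a in labels:                     # standard linearisation constraints
        for b in labels:
            prob += y[a, b] <= x["t0", a]
            prob += y[a, b] <= x["t1", b]
            prob += y[a, b] >= x["t0", a] + x["t1", b] - 1

    prob.solve()
    print({t: l for (t, l) in x if x[t, l].varValue == 1})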


“Seeing the world through the lens of Bayes’ Theorem is like seeing The Matrix. Nothing is the same after you have seen Bayes.”

I'm pretty sure this is an instance of cognitive bias.


Sure. We all have biases. Almost everything I think about in engineering is passed through two filters: Bayes and nonlinear optimization[1]. It's enormously useful, and leads me to a lot of insights that others don't come up with. But it is a bias, and we always have to guard against it leading us down the wrong path.

[1] By this I mean I always ask myself: am I incorporating all information, and in a probabilistic (Bayesian) way? If not, have I analytically proven that I can discard the information (dimensionality reduction)? If I haven't proven it, my 'go to' assumption is that information, no matter how noisy, should be incorporated until I can prove analytically or empirically that it isn't needed. In more concrete terms, people endlessly hand-wave "that isn't important" when I ask a question, but then I go prove it is important. It's a cheap trick in some sense, but it sure does work. Don't throw away information.

Likewise, I view everything as a nonlinear estimation/optimization problem. I think in terms of manifolds and surfaces - what are my variables, what can I vary, can I vary them smoothly (is the surface locally smooth and continuous)? In concrete terms, maybe you are trying to figure out what features to add to a product. Lots of choices, lots of unknowns. Can I iteratively come to an answer in an agile way, do I have to make some discontinuous jumps, what step size should I use, etc.? It's all just 'mathy'. Meaning I don't have analytic equations for these decisions, but thinking about them as if I did is usually very informative.

So I 100% agree with the quote.


Like when you are holding a hammer, everything looks like a nail.


My biggest issue with Bayes' Theorem as a method of making everyday decisions is that it assumes the ability to accurately assess the underlying likelihoods of events taking place, especially on the fly.

I would even argue that it's actually providing a false sense of precision because the sig figs are oftentimes not correctly represented.


This is not a problem with Bayes' Theorem. Any alternative method of updating beliefs will suffer from exactly the same problem of noisy inputs, and have the additional problem that it cannot maintain consistency with all the evidence (which only Bayes' rule is capable of doing).

Using Bayes' rule consistently will make you aware of how uncertain the inputs are, and that is a feature, not a bug.


How will it make you aware of uncertainty? At some point you do have to guess (called "estimating" here), do you not, and that will have a compounding effect on the outcome. Misjudge a probability somewhere by only a small amount, and it multiplies its way through to the result, and you're looking at a potentially huge difference in the resulting probability, which could easily span the "will act" or "won't act" gap.
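
For instance (toy numbers in a textbook-style screening calculation), misjudging a single false-positive rate by one percentage point roughly halves the posterior:

    def posterior(prior, sensitivity, false_positive_rate):
        # P(hypothesis | positive result) by Bayes' rule
        evidence = sensitivity * prior + false_positive_rate * (1 - prior)
        return sensitivity * prior / evidence

    print(posterior(0.001, 0.99, 0.01))   # ~0.090
    print(posterior(0.001, 0.99, 0.02))   # ~0.047 -- one small misestimate, half the answer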


I think it's worth keeping the principles of Bayesian reasoning in mind: the idea that you should update hypotheses on evidence, that adding probability mass in one place takes a little from everywhere else, that you should keep track of prior probabilities, etc.

Not that you should actually do mental calculations on made up probability estimates. I mean you can do that, and if your estimates are at all decent, the result might be better. But I don't think anyone actually recommends that.


Judging from the content of lesswrong, I actually think literal math is what many people do recommend, and that's what bothers me.


In some cases you might want to actually do that math, even if you have to guesstimate numbers, since it still will beat your intuition.

http://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-...

But in general, no one has enough computing power in their heads to go and do explicit bayesian updates on everything all day long. You have to pick your battles, and use the right tools for the task at hand.


That post is like nobody there has ever heard the phrase "don't fall in love with your model".

Quite a lot of LessWrong posts are of the theme "my model gives this counterintuitive result" - the trouble is they go on to "AND THIS IS VERY IMPORTANT AND SIGNIFICANT!!" rather than "hmm, maybe my model needs work."


As you say, Bayesian statistics is not of much use when there is no prior information as to the frequency of certain events. Once any amount of such information is available, then it comes into play.


Two points:

1. Bayesian statistics allows the use of "uninformative" priors.

2. Discarding your subjective beliefs is just as suboptimal as overweighting them. You have beliefs due to your experience. Weight them lightly, but still use them. In the absence of much information your calculations will follow your gut. What else do you have in the absence of other information?

Using quantified subjective beliefs at least has the advantage of enabling you to make consistent choices based on what you know within a rigorously defined framework.
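
As a minimal sketch of point 1 (with made-up counts): a uniform Beta(1, 1) prior is about as "uninformative" as it gets, and the data quickly dominates it.

    alpha, beta = 1.0, 1.0          # uniform ("uninformative") Beta prior
    successes, failures = 7, 3      # observed data
    alpha += successes              # conjugate update
    beta += failures
    print(alpha / (alpha + beta))   # posterior mean: 8/12 = 0.667, barely pulled toward 0.5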


I don't think it's as rigorously defined as it's purported to be. I just envision the practical application of manipulating values to produce desired results, and then post-hoc rationalizing those value manipulations to obtain a higher-than-warranted level of confidence in the result, because, "I applied rigor!"


The math is well defined and rigorous; not every analysis that uses it is. I think you're conflating those two things.


He's talking about the practical effects of fallible humans using the version of Bayes advocated in the original post (citing Yudkowsky and Muehlhauser).

Pulling numbers out of your backside and running them through a process makes you more confident than just pulling them out of your backside directly, but I've yet to see evidence that it does anything more than increase your confidence.


Oh yes, I wouldn't question the math -- it's well beyond my expertise, and I've heard of way too many implementations of the theory to refute it at a theoretical level.


That is why you basically don't want to use Bayes if you have few samples (or you don't have any yet but are preparing to collect them). There are much better methods; the first that comes to mind is Expectation Maximization, which works with a series of inputs. The article itself is so confused at the end that I was not sure I was still reading about Bayes (lol?), so I guess we are all a little puzzled.


> the Standard Model of particle physics explains much, much more than thunderstorms, and its rules could be written down in a few pages of programming code.

As a programmer who doesn't know advanced math, I'd really like to see that code, in literate form.


This is a Yudkowskyism, i.e. don't expect to see the code, or if you do then expect it to have obvious defects.


Hint: changing the font to Arial improves readability a lot and actually displays italics where the author used them.


I recently had the need for a Bayes classifier[1] in a couple of projects, so I wrote a service that exposes one through an API. You can set up your prior set and then get predictions against that set.

I haven't gone through the trouble of making it suitable for public consumption yet. Would anyone be interested in consuming such a service?

[1]: https://en.wikipedia.org/wiki/Naive_Bayes_classifier


Speaking of Bayes, there's a great book by Allen B. Downey 'Think Bayes' http://www.greenteapress.com/thinkbayes/ available as free PDF or (if you wish to support the author, which I did) a paperback from Amazon.

It teaches Bayes' theorem accompanied by Python code examples, which I found really useful.


This is excellent and finally prompted me to ask how to use Bayes more in my life: https://news.ycombinator.com/item?id=9782767


I'm in the middle of designing and building a system which uses Bayesian models.

One thing that struck me early is that while Bayes itself is rock solid, like arithmetic, when you go to apply it the results live or die on the quality of the models, and the relevance/realism of the evidence used to train them. GIGO.

But once you do have a good, relevant, signal-producing model, using it is a bit like doing a multi-dimensional lookup or function call: conceptually easy to understand, and, in many cases (depending, of course, on the details), cache-friendly.
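
A loose sketch of what I mean (hypothetical, hand-fitted naive Bayes tables, not from my actual system): once the tables are trained, prediction is just a few lookups and a sum of logs.

    import math

    # Hypothetical fitted tables: log P(word | class) and log P(class).
    log_likelihood = {
        "spam": {"viagra": math.log(0.8), "meeting": math.log(0.1)},
        "ham":  {"viagra": math.log(0.05), "meeting": math.log(0.7)},
    }
    log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}

    def predict(words):
        # prediction = table lookups plus a sum (unknown words get a floor value)
        scores = {c: log_prior[c]
                     + sum(log_likelihood[c].get(w, math.log(0.01)) for w in words)
                  for c in log_prior}
        return max(scores, key=scores.get)

    print(predict(["viagra"]))   # spam
    print(predict(["meeting"]))  # ham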


"all models are wrong, but some are useful" - Box.

I think the Bayesian approach is a good place to start, and provides a coherent way to think about things.

Pragmatically, one might end up needing to introduce a few approximations into the model, to make it computationally tractable, for example, but it is good to be able to view this in the context of what the gold-plated theoretical modelling approach would be.

Instead of doing something ad-hoc that appears to work, say.


You can also augment the state to take this into account.

I have a model that says my system does F with Q amount of uncertainty, and my measurements are Z with R uncertainty. But I have to give precise numbers for R, when it is just an imprecise model or SWAG. I can add to my state a parameter for how precise R is, and let the filter estimate it over time. Not always, and it is noisy, but it can be done.

There are other approaches - use a filter bank, each with a different set of assumptions. Run 'em all, and either pick one or blend them, depending on your scenario. 'Depending' being the topic of many a PhD thesis, but again, very doable in practice for many problems.
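
A rough 1-D sketch of the filter bank idea (toy numbers, not from any real system): run one scalar Kalman filter per candidate measurement-noise value R, re-weight each by how well it predicts the measurements, then blend.

    import math, random

    random.seed(0)
    true_R = 0.5
    measurements = [1.0 + random.gauss(0, math.sqrt(true_R)) for _ in range(50)]

    candidates = [0.1, 0.5, 2.0]         # competing assumptions about R
    filters = [{"x": 0.0, "P": 1.0, "w": 1.0 / len(candidates)} for _ in candidates]
    Q = 0.01                             # assumed process noise

    for z in measurements:
        for f, R in zip(filters, candidates):
            f["P"] += Q                  # predict (static state model)
            innov, S = z - f["x"], f["P"] + R
            # weight by the Gaussian likelihood of the innovation
            f["w"] *= math.exp(-0.5 * innov**2 / S) / math.sqrt(2 * math.pi * S)
            K = f["P"] / S               # standard Kalman update
            f["x"] += K * innov
            f["P"] *= 1 - K
        total = sum(f["w"] for f in filters)
        for f in filters:
            f["w"] /= total

    blended = sum(f["w"] * f["x"] for f in filters)
    print([round(f["w"], 3) for f in filters], round(blended, 3))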



