Occam's razor, as stated by GP is not about correctness, but tractability. In fa...

philh · on July 6, 2016

> (A1 & A2 & ... & Ak) -> H

This is backwards. It should be

    H => (A1 & A2 & ... & Ak)

It's not "if these assumptions hold, the hypothesis is true". It's "for this hypothesis to be true, these assumptions must hold".

Suppose you have the hypothesis that Bruce Wayne is Superman. Then you see the two of them in the same room together. It's still possible that Bruce Wayne is Superman, but only if he has an identical twin. Your credence that Bruce Wayne is Superman should decrease accordingly.

asQuirreL · on July 6, 2016

At least in the terminology I'm used to, of mathematical proof, an assumption is a part of the context under which a thing is proven. So having more assumptions weakens the claim (and there is an associated weakening rule [1]).

In other words, the claim "Assuming Q, I prove P" does not mean (to me) that Q must hold in order for P to hold, but rather that one way to show that P is true is to show that Q is true.

[1]: https://en.wikipedia.org/wiki/Structural_rule

vog · on July 6, 2016

Umm .. no.

Assumptions are the left-hand side of an implication, by definition. (And the right-hand side is called "conclusion".)

The relevant statement here is not "for this hypothesis to be true, these assumptions must hold".

It is: "for this hypothesis to be derived this way, these assumptions must hold".

There is always the possibility that a hypothesis can be proved in a different way from different assumptions.

Unless, of course, your theory not only proves "(A1 & A2 & ... & Ak) -> H" but "(A1 & A2 & ... & Ak) <-> H". That is, if your theory shows that your hypothesis does not only follow from the assumptions, but is equivalent to its assumptions. That's quite a rare case, though.
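
The gap between "->" and "<->" can be checked by brute force. The following is only a minimal sketch: it collapses the whole conjunction (A1 & ... & Ak) into a single boolean `a`, and the function name `models_of_implication` is made up for illustration.

```python
from itertools import product

def models_of_implication():
    """Return every (a, h) truth assignment in which a -> h holds."""
    return [
        (a, h)
        for a, h in product([False, True], repeat=2)
        if (not a) or h  # material implication a -> h
    ]

models = models_of_implication()

# h can be true while the assumptions are false, so a -> h does not give h -> a.
print((False, True) in models)   # True
# The only assignment ruled out is "assumptions hold, hypothesis fails".
print((True, False) in models)   # False
```

If the theory proved the equivalence instead, the assignment `(False, True)` would be ruled out as well.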

philh · on July 6, 2016

If you see Bruce Wayne and Superman in the same room, then "Bruce Wayne is Superman" can only be true if you assume something you didn't have to assume before. It means you should be less confident that Bruce Wayne is Superman.

I'm using the word "assumption" in a natural way. (Also in the way that it's used in Occam's razor.) If you have a definition that says I'm using it wrong, then your definition is silly.

vog · on July 6, 2016

This example is totally unclear to me. Although you declared a clear hypothesis in your very first comment, it is totally unclear what exactly your assumptions are that would lead to this hypothesis.

Retric · on July 6, 2016

You can form a hypothesis without basing it on anything. You could for example randomly generate 1 billion sentences and then try to test if they are true.

vog · on July 6, 2016

This is not what is meant by "hypothesis" in Occam's razor, which is about hypotheses that are based on actual assumptions (and using these assumptions to pick a "best" hypothesis).

philh · on July 6, 2016

Okay, it sounds like what you call assumptions, I would call "data". Or "background data" or something.

If I think Bruce Wayne is Superman, I might base that on the fact that they're both physically very fit; that one would need to be very rich in order to have the kind of technology that is indistinguishable from alien powers; that Bruce Wayne's parents were murdered, and this could conceivably draw him to a life of fighting crime, which is a thing Superman does.

That sort of thing leads me to form the hypothesis: "Bruce Wayne is Superman".

But that sort of thing isn't what Occam's razor is about. It's about things that we haven't observed to be true, but which would need to be true for the hypothesis to hold. You should prefer a hypothesis that requires fewer such things.

If I see Bruce Wayne and Superman in the same room, then in order for Bruce Wayne to be Superman, he must have an identical twin. I haven't observed him to have one, but that's what the hypothesis requires. Accordingly, my confidence in the hypothesis decreases.
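
One way to make "my confidence decreases" concrete is a Bayes update. This is a rough sketch only: every number below is an invented illustration, not something from the thread, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis: returns P(H | E)."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1 - prior) * likelihood_if_false
    return numerator / evidence

# All three values are assumptions chosen purely for illustration.
prior = Fraction(1, 2)                        # credence in "Bruce Wayne is Superman"
p_together_if_same = Fraction(1, 100)         # needs an unobserved identical twin
p_together_if_different = Fraction(1, 2)      # two separate people can easily meet

updated = posterior(prior, p_together_if_same, p_together_if_different)
print(updated)  # 1/51
```

The hypothesis is not ruled out, but because it now needs the twin, the credence falls from 1/2 to 1/51 under these made-up numbers.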

Retric · on July 6, 2016

The initial hypothesis is only a starting point. When building a model where 'mice are smarter than humans' you need to account for all the evidence out there.* Compared to the model where 'humans are smarter than mice' it's vastly more complex or vastly less testable.

* I have heard this described as hypothetical baggage or implicit baggage, i.e. if CO2 is not increasing temperature, then why not?

vog · on July 6, 2016

Sorry for the mathematical nitpick here, but that seems to me like a strawman. You silently moved from the original question:

    What is the probability that the hypothesis is correct?

To the very different question:

    What is the probability that the implication "from the assumptions follows the hypothesis" is correct?

Moreover, this different question has a clear answer for every logically consistent theory: It is 1, because it is always true!

Why? Because that's exactly what the theory proves logically. The theory can't tell you whether A1, ..., Ak are all true in the real world, but it does tell you that _if_ these are true, H is also true.

So this is really a typical strawman argument (although maybe unintentionally): It is different from the original question, and it boils down to a trivial but misleading answer.

------------------

Going back to the original question, you'd have to compare the two hypotheses H1 and H2, where the set of assumptions of H1 is a strict subset of the assumptions of H2:

    A1 & A2 & ... & Ak -> H1
    A1 & A2 & ... & Ak & ... & An -> H2

It is clear that:

    P(A1 & A2 & ... & Ak) > P(A1 & A2 & ... & Ak & ... & An)

But from here it is surprisingly hard to conclude "P(H1) > P(H2)", because we have implications and not equivalences. That is, H1 may be true even though the assumptions don't hold. It may be true for different reasons and derived from a different set of assumptions that turn out to be true. Same for H2. So we need to take into account the probabilities for H1 and H2 to be "true for different reasons", which we'll name Pd1 and Pd2:

    Pd1 = P(not(A1 & A2 & ... & Ak) & H1)
    Pd2 = P(not(A1 & A2 & ... & Ak & ... & An) & H2)

To prove the probability variant of Occam's razor, we need to make the following additional meta-assumption: The probabilities that H1 and H2 are "true for different reasons" are very small, and moreover almost identical. So we have:

    Pd1 = Pd2

But with that meta-assumption, we can finally prove the probability variant of Occam's razor, as we can now express P(H1) and P(H2):

    P(H1) = P(not(A1 & A2 & ... & Ak) & H1) + P((A1 & A2 & ... & Ak) & H1)
          = Pd1 + P((A1 & A2 & ... & Ak) & H1)
          = Pd1 + P(A1 & A2 & ... & Ak)
          = Pd2 + P(A1 & A2 & ... & Ak)
          > Pd2 + P(A1 & A2 & ... & Ak & ... & An)
          = Pd2 + P((A1 & A2 & ... & Ak & ... & An) & H2)
          = P(not(A1 & A2 & ... & Ak & ... & An) & H2) + P((A1 & A2 & ... & Ak & ... & An) & H2)
          = P(H2)

In short:

    P(H1) > P(H2)
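
The derivation above can be sanity-checked on a toy model. This is only a sketch under stated assumptions: `A` stands for (A1 & ... & Ak), `B` for the extra assumptions (Ak+1 & ... & An), and the four worlds and their probabilities are invented so that both implications hold and Pd1 = Pd2.

```python
from fractions import Fraction

# Each world is (A, B, H1, H2) with an invented probability.
# Built so that A -> H1 and (A & B) -> H2 hold in every world.
WORLDS = {
    (True, True, True, True): Fraction(2, 10),      # all assumptions hold
    (True, False, True, False): Fraction(3, 10),    # only the first k hold
    (False, False, True, True): Fraction(1, 10),    # true "for different reasons"
    (False, False, False, False): Fraction(4, 10),  # nothing holds
}

def prob(event):
    """Sum the probability of every world in which `event` is true."""
    return sum((p for w, p in WORLDS.items() if event(*w)), Fraction(0))

p_a = prob(lambda a, b, h1, h2: a)                        # P(A1 & ... & Ak)
p_ab = prob(lambda a, b, h1, h2: a and b)                 # P(A1 & ... & An)
pd1 = prob(lambda a, b, h1, h2: (not a) and h1)           # Pd1
pd2 = prob(lambda a, b, h1, h2: (not (a and b)) and h2)   # Pd2
p_h1 = prob(lambda a, b, h1, h2: h1)
p_h2 = prob(lambda a, b, h1, h2: h2)

print(pd1 == pd2)   # True
print(p_h1, p_h2)   # 3/5 3/10
```

Here P(H1) = Pd1 + P(A) = 1/10 + 5/10 and P(H2) = Pd2 + P(A & B) = 1/10 + 2/10, matching the chain of equalities. Note that the strict ">" relies on the extra assumptions being genuinely uncertain, i.e. P(A & not B) > 0 (3/10 in this toy model); otherwise only ">=" follows.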

asQuirreL · on July 6, 2016

You are right, they are different questions, but the straw man was not intentional; I thought the original phrasing was ambiguous enough that it could be interpreted both ways ;)

In other words, it was unclear to me what the answer to the question "Are the assumptions part of the hypothesis?" was. If, as I did, we assume that "yes, they are", then I don't think it follows that the probabilities will both be `1`: we do not have logical proofs for the claims, so the implication might only be true in the model (the claims are not necessarily entailments).

The waters are muddied further still when the hypothesis itself is phrased as an implication.

EDIT

It also strikes me that for your line of reasoning to hold, it is not sufficient that Pd1 and Pd2 are equal and small. Instead we need `Pd1 = 0 = Pd2`, in order to justify this line:

    > = Pd1 + P((A1 & A2 & ... & Ak) & H1)
    > = Pd1 + P(A1 & A2 & ... & Ak)

Which is tantamount to saying

    (A1 & A2 & ... & Ak) <-> H1
  & (A1 & A2 & ... & Ak & ... & An) <-> H2

Is it not?

EDIT (2)

Ignore that, it is not tantamount, it is a weaker condition.

vog · on July 6, 2016

Maybe the final part of the proof I gave is a bit dense, so here are some additional notes.

First of all, if you know that

    (A1 & A2 & ... & Ak) -> H1

then the following two terms are logically equivalent:

    A1 & A2 & ... & Ak
    (A1 & A2 & ... & Ak) & H1

Also, for the proof which I gave it is sufficient that Pd1 = Pd2. It does not need them to be zero.
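
The equivalence used in that step can be confirmed with a truth table. A minimal sketch, treating the conjunction (A1 & ... & Ak) as one boolean `a`; the helper names are hypothetical.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q

def equivalent_given_implication() -> bool:
    """Check that a == (a and h) in every assignment where a -> h holds."""
    for a, h in product([False, True], repeat=2):
        if implies(a, h) and a != (a and h):
            return False
    return True

print(equivalent_given_implication())  # True
```

Since the two terms agree in every world allowed by the implication, they have the same probability, which is all the step from `Pd1 + P((A1 & ... & Ak) & H1)` to `Pd1 + P(A1 & ... & Ak)` needs.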

asQuirreL · on July 6, 2016

Ah yes, I see. I was looking for a place where the fact that `Pd1` and `Pd2` are small was used, but I guess that's not necessary.

vog · on July 6, 2016

No, the "both are small" was just meant to be a justification for assuming Pd1=Pd2.