> "association"[2] always occurs when there is causation

This is incorrect. See...

godelski · 2024-08-13T23:19:49 1723591189

I think our disagreement is coming down to the interpretation and nuance of your example.

Mutual information between random variables is zero iff the two random variables are independent.
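For concreteness, here is the discrete case in standard textbook notation (the symbols are mine, not something fixed earlier in the thread):

```latex
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       \;=\; D_{\mathrm{KL}}\!\big(p(x,y)\,\big\|\,p(x)\,p(y)\big) \;\ge\; 0,
\qquad
I(X;Y) = 0 \iff p(x,y) = p(x)\,p(y)\ \text{for all } x, y.
```

The equality condition is exactly the definition of independence, which is why I use MI as the meaning of "association" here.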

In your example, you illustrate that the MI is non-zero. Sure, it is clear that it may appear zero during sampling, but that's a different story. I fully agree that there is an opportunity to observe no association. That is unambiguously accurate. But in this scenario you presumably haven't sampled animals with damaged livers. But you can also have bad luck or improper sampling even when the likelihood of sampling is much higher! That doesn't mean that there is no association, that means there's no measured (or observed) association. The difference matters, black swans or not. Especially being experimentalists/analysts, it is critical we remember how our data and experimentation is a proxy, and of what. That they too are models. These things are fucking hard, but it's also okay if we make mistakes and I'd say the experiments are still useful even if they never capture that relationship.
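A rough sketch of the distinction between true and measured association, under made-up assumptions: the variable names, the 0.1% damaged-liver rate, and the 50/50 treatment split are all hypothetical and not taken from your example. The population MI is strictly positive, while a plug-in estimate from a small sample is (numerically) zero whenever no damaged-liver animal happens to be drawn.

```python
import math
import random
from collections import Counter

def mutual_information(joint):
    """Mutual information in bits from a dict {(x, y): probability}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical setup: X = drug given (0/1), Y = kidney effect (0/1).
# Assumption: the drug only reaches the kidney when the liver is damaged,
# and only a fraction P_DAMAGED of animals has a damaged liver.
P_DAMAGED = 0.001

def true_joint():
    """Exact population distribution of (X, Y) under the assumptions above."""
    joint = {}
    for x in (0, 1):
        p_effect = P_DAMAGED if x == 1 else 0.0
        joint[(x, 1)] = 0.5 * p_effect
        joint[(x, 0)] = 0.5 * (1 - p_effect)
    return joint

def sample_joint(n, rng):
    """Empirical joint distribution from n sampled animals."""
    counts = Counter()
    for _ in range(n):
        x = rng.randint(0, 1)
        damaged = rng.random() < P_DAMAGED
        y = 1 if (x == 1 and damaged) else 0
        counts[(x, y)] += 1
    return {k: v / n for k, v in counts.items()}

rng = random.Random(0)
print("population MI (bits):", mutual_information(true_joint()))
print("plug-in MI, n=50:   ", mutual_information(sample_joint(50, rng)))
```

The population value is small but not zero; what the small sample reports depends on whether a damaged-liver animal was drawn, which is the measured-versus-actual gap I mean.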

If we strengthen your example to the medicine always being (perfectly) filtered out by a liver (even an impaired one) and all animals must have livers, then it does not make your case either. We will be able to prune that from the DAG. The reason being that it does not describe a random variable... (lack of distribution). I think you're right to say that there is still a causal effect, but what's really needed is to extend the distribution we are sampling from to non-animals or at least incomplete ones. But the point here would be that our models (experiments) are not always sufficient to capture association, not that the association does not exist.

Maybe you are talking from a more philosophical perspective (I suspect so)? If we're going down that route, I think it is worth actually opening the can of worms: that there are many causal diagrams that can adequately and/or equally explain data. I don't think we should shy away from this fact (nor the model, which is a subset of this), especially if we're aiming for accuracy. I rather think what we need to do is embrace the chaos and fuzziness of it all. To remember that it is not about obtaining answers, but finding out how to be less wrong. You can defuzz, but you can't remove all fuzz. We need to remember the unfortunate truth of science, that there is an imbalance in the effort of proofs. That proving something is true is extremely difficult if not impossible, but that it is far easier to prove something is not true (a single counterexample!). But this does not mean we can't build evidence that is sufficient to fill the gaps (why I referenced [0]) and operate as if it is truth.

I gripe because the details matter. Not to discourage or say it is worthless, but so we remember what rocks are left unturned. Eventually we will have to come back, so it's far better to keep that record. I'm a firm believer in allowing for heavy criticism without rejection/dismissal, as it is required to be consistent with the aforementioned. If perfection cannot exist, it is also wrong to reject for lack of perfection.

mjburgess · 2024-08-14T10:39:22 1723631962

I'm not sure what you mean by association here then.

If you mean to say that there are, say, an infinite number of DAGs that adequately explain reality -- and in the simplest, for this liver-kidney case, we don't see association -- but in the "True DAG" we do... then maybe.

But my point is, at least, that we don't have access to this True model. In the context of data analysis, of computing association of any kind, the value we get -- for any reasonable choice of formulae -- is consistent with cause or no cause.

Performing analysis as-if you have the true model, and as-if the null rival is just randomness, is pseudoscience in my view. Though, more often, it's called frequentism.

godelski · 2024-08-15T07:59:33 1723708773

  > what you mean by association here then.

Mutual information

  > but in the "True DAG"

I'm unconvinced there is a "true" DAG and at best I think there's "the most reasonable DAG given our observations." For all practical purposes I think this won't be meaningfully differentiable in most cases, so I'm fine to work with that. Just want to make sure we're on the same page.

  > But my point is, at least, that we don't have access to this True model.

Then we're in agreement, but it's turtles all the way down. Everything is a model and all models are wrong, right? We definitely have more useful models, but there is always a "truer" model.

Why I was pushing against your example is because I think it is important to distinguish lack of association because the data to form the association is missing or unavailable to us (which may be impossibly unavailable; and if we go deep enough, we will always hit this point) vs a lack of association because the two things are actually independent[0]. One can be found via better sampling where the other will never be found (unfortunately indistinguishable from impossibly unavailable information).

  > as-if you have the true model

Which is exactly why I'm making the point. We never have (or even have access to!) the "true" model. Just better models. That's why I say it isn't about being right, but less wrong. Because one is something that's achievable. If you're going to point to one turtle, for this, I think you might as well point to the rest. But there's still things that aren't turtles.

[0] I'll concede to the argument that "at some point" everything is associated if you trace back far enough in time. Though I'm not entirely convinced of this argument because of meta information.