Hacker News

> I use this method often for prediction applications. First, it’s a sort of hyper parameter selection, so you should obviously use a holdout and test set to help you make a good choice.

What the article is talking about is inference, not prediction. That's a different problem domain: it's not about telling a company whether design A or B leads to more engagement, it's about identifying the true causal drivers of that difference. The distinction may seem subtle, but it's important.

The key problems outlined all concern common (frequentist) statistical tests and how variable selection invalidates them. Holdout sets don't address this: if the holdout set comes from the same distribution as the training data (as it should), the selection biases would be the same there.

Bayesian inference isn't a panacea either. The core problem is structuring the model based on the data and then drawing conclusions about the variables' relationships from that same data. Bayesian analysis gives you tools to help avoid this, but it comes with its own set of traps to fall into, such as the difficulty of finding truly non-informative priors.
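The selection bias described above is easy to demonstrate with a small simulation (a sketch, not from the article: the function name and parameters are my own). If you screen many pure-noise predictors, pick the one that looks most correlated with the outcome, and then run an ordinary significance test on that same data, the false positive rate blows up far past the nominal 5%:

```python
import numpy as np
from scipy import stats

def selection_false_positive_rate(n=100, p=20, trials=500, seed=0):
    """Pick the predictor most correlated with y (everything is pure noise),
    then naively test that same predictor on the same data."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X = rng.normal(size=(n, p))   # candidate predictors: pure noise
        y = rng.normal(size=n)        # outcome: unrelated to every predictor
        # data-driven variable selection: keep the best-looking predictor
        corrs = [abs(stats.pearsonr(X[:, j], y)[0]) for j in range(p)]
        best = int(np.argmax(corrs))
        # naive frequentist test on the selected variable, same data
        _, pval = stats.pearsonr(X[:, best], y)
        hits += pval < 0.05
    return hits / trials

# Without selection, ~5% of tests at alpha=0.05 would be false positives;
# after picking the best of 20 noise variables, the rate is far higher.
print(selection_false_positive_rate())
```

Note that a holdout set drawn from the same distribution doesn't rescue the test: rerunning the test on fresh data from the same generator fixes the p-value for the *selected* variable, but the selection step itself still guarantees you'll keep finding "winners" among noise.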





