Hacker News

The biggest mistake engineers make is in determining sample sizes. It is not trivial to determine the sample size for a trial without prior knowledge of effect sizes. Instead of waiting for a fixed sample size, I would recommend a sequential testing framework: set a stopping condition and perform a test after each new batch of sample units.
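A minimal sketch of that batch-wise loop (the function and variable names here are hypothetical, and the stopping rule plugged in must itself be one that remains valid under optional stopping):

```python
import random

def sequential_trial(sample_batches, stopping_rule):
    """Process batches of units, checking a stopping condition after each.

    `stopping_rule` receives all units seen so far and returns True to
    stop early; otherwise the trial runs until the batches are exhausted.
    """
    seen = []
    for batch in sample_batches:
        seen.extend(batch)
        if stopping_rule(seen):
            return "stopped", len(seen)
    return "exhausted", len(seen)

# Toy run with a trivial rule: stop once 100 units have been observed.
random.seed(0)
batches = ([random.random() for _ in range(25)] for _ in range(10))
decision, n = sequential_trial(batches, lambda xs: len(xs) >= 100)
```

The point of the structure is that the decision to continue or stop is made after every batch, not once at a preplanned horizon.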

This is called optional stopping, and it is not valid with a classic t-test, since its Type I and II error guarantees hold only at a predetermined sample size. However, other tests make it possible: see safe anytime-valid statistics [1, 2] or, more simply, Bayesian testing [3, 4].

[1] https://arxiv.org/abs/2210.01948

[2] https://arxiv.org/abs/2011.03567

[3] https://pubmed.ncbi.nlm.nih.gov/24659049/

[4] http://doingbayesiandataanalysis.blogspot.com/2013/11/option...
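To make this concrete, here is a toy e-value test for a Bernoulli proportion (a generic illustration, not the method of any particular paper above; `p_alt` is an assumed alternative). Under H0 the running likelihood ratio is a nonnegative martingale with mean 1, so by Ville's inequality the probability it ever crosses 1/alpha is at most alpha, which is exactly why peeking after every observation is allowed:

```python
def e_process_bernoulli(xs, p_alt=0.6, p_null=0.5):
    """Running likelihood-ratio e-value for H0: p = p_null.

    Under H0 the product below is a nonnegative martingale with mean 1,
    so rejecting when it exceeds 1/alpha is anytime-valid at level alpha.
    `p_alt` is an assumed point alternative chosen by the analyst.
    """
    e = 1.0
    for x in xs:
        num = p_alt if x else 1 - p_alt
        den = p_null if x else 1 - p_null
        e *= num / den
    return e

# Reject at alpha = 0.05 once the e-value crosses 1/alpha = 20.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # 8 successes in 10 trials
e = e_process_bernoulli(data)  # 1.2**8 * 0.8**2, still below 20
```

Here the evidence is accumulating but has not yet crossed the threshold, so the trial would simply continue with the next batch.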




People often don’t determine sample sizes at all! And doing a power calculation without an idea of the effect size isn’t just hard, it’s impossible: the effect size is one of the inputs to the formula. But at least the calculation is fast, so you can sort of guess and check.
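For intuition, the standard normal-approximation formula shows why the effect size is an indispensable input. This is a sketch of the textbook two-sample calculation, not any particular platform's implementation:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sample z-test detecting a
    standardized effect size d, via n = 2 * (z_{1-a/2} + z_{power})^2 / d^2.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)          # ~0.84 for 80% power
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size ** 2)

# The effect size dominates: halving d quadruples the required n.
n_small = n_per_group(0.10)  # -> 1570 per group
n_tiny = n_per_group(0.05)   # -> 6280 per group
```

Since n scales as 1/d^2, a bad guess at the effect size doesn't just shift the answer, it changes it by multiples, which is why guess-and-check over a range of plausible effects is the pragmatic move.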

Anytime-valid inference helps with this situation, but it doesn’t solve it. If you’re trying to detect a small effect, it’s much better to find out up front that you need a million samples than to discover it because your test, collecting 1,000 samples a day, took three years.

Still, anytime-valid is way better than fixed IMO. Truly fixed horizons almost never exist in practice: every A/B testing platform I’ve seen allows peeking.

I work with the author of the second paper you listed. The math looks advanced, but it’s very easy to implement.


The biggest mistake is engineers owning experimentation. Experiments should be owned by data scientists.

I realize, though, that this is a luxury, but I also see this trend at blue-chip companies.


Did a data scientist write this? You don't need to be a member of a priesthood to run experiments. You just need to know what you're doing.


I agree with both sides here. :) DS should own experimentation, AND engineers should be able to run a majority of experiments independently.

As a data scientist at a "blue chip company", my team owns experimentation, but that doesn't mean we run all the experiments. Our role is to create guidelines, processes, and tooling so that engineers can run their own experiments independently most of the time. Part of that is also helping engineers recognize when they're dealing with a difficult/complex/unusual case where they should bring DS in for more bespoke hands-on support. We probably only look at <10% of experiments (either in the setup or results phase or both), because engineers/PMs are able to set up, run, and draw conclusions from most of the experiments without needing us.


... and by some definition you'd be a data scientist yourself. (Regardless of your job title)



