Option 1 will NOT give you the correct answer. You CANNOT use confidence intervals as a stopping criterion. If you do, you are effectively running many tests, one each time you check the interval, and you need to apply a multiple-testing correction to account for that. Otherwise you run a VERY HIGH risk of picking the wrong result.
I emphasize this because it is a common mistake among A/B test practitioners. For a fuller discussion of the problems, see the papers by Armitage (frequentist) and Anscombe (Bayesian) on the topic, or my summary of the issue here:
http://blog.custora.com/2012/05/a-bayesian-approach-to-ab-te...
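
To see why checking repeatedly inflates the error rate, here is a rough simulation sketch. It is my own illustration, not taken from the linked post or the papers. It runs A/A tests (both arms have the same true conversion rate, so any "significant" result is a false positive) and compares stopping at the first significant peek against a single test at the end. The function names, the 10% conversion rate, the peek interval, and the plain two-proportion z-test are all assumptions chosen for illustration.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 0.0
    return (conv_a / n - conv_b / n) / se


def false_positive_rates(n_sims=1000, n_max=1000, peek_every=100,
                         rate=0.1, z_crit=1.96, seed=0):
    """Return (peeking_rate, fixed_rate) for simulated A/A tests.

    peeking_rate: fraction of runs where ANY peek crossed z_crit.
    fixed_rate:   fraction of runs significant at the final look only.
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        crossed = False
        for i in range(1, n_max + 1):
            # Both arms share the same true rate, so there is no real effect.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and abs(z_stat(conv_a, conv_b, i)) > z_crit:
                crossed = True  # a peeker would have stopped and declared a winner
        peeking_hits += crossed
        fixed_hits += abs(z_stat(conv_a, conv_b, n_max)) > z_crit
    return peeking_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positive rate, stop at first significant peek: {peeking:.3f}")
    print(f"false positive rate, single test at the end:         {fixed:.3f}")
```

The single end-of-test rate should land near the nominal 5%, while the stop-when-significant rate comes out noticeably higher, and it keeps growing as you add more peeks. That gap is what a multiple-testing correction (or a proper sequential design) has to account for.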