For those who are curious, Amazon uses a backend service to control what you see. It hashes your customer ID (which you never see) and assigns you to a bucket. It's a sophisticated A/B testing framework that lets them measure and compare pretty much everything about the two buckets of people. They can allocate people between A and B fairly dynamically; most rollouts start with 5% of customers and move up from there. Since assignment is tied to your customer ID, they can retroactively go back and analyze any variables they didn't think to look at initially, just by knowing whether you were an A or a B and when.
It's also important that the hash be "random" because hundreds or thousands of these experiments are running at the same time, and you want to rule out spurious relationships (e.g. other experiments acting as confounding factors).
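In rough terms, the scheme works like this (a minimal sketch in Python; the experiment name, customer ID format, hash choice, and 5% threshold are illustrative assumptions, not Amazon's actual implementation):

```python
import hashlib

def assign_bucket(customer_id: str, experiment_name: str, treatment_pct: float = 5.0) -> str:
    """Deterministically assign a customer to control ("A") or treatment ("B").

    Salting the hash with the experiment name keeps assignments independent
    across experiments, so one experiment is unlikely to confound another.
    """
    digest = hashlib.sha256(f"{experiment_name}:{customer_id}".encode()).hexdigest()
    # Map the hash onto [0, 100) and compare against the rollout percentage.
    bucket_value = (int(digest, 16) % 10000) / 100.0
    return "B" if bucket_value < treatment_pct else "A"

# Example: a hypothetical experiment rolled out to 5% of customers
print(assign_bucket("customer-12345", "new-checkout-flow", treatment_pct=5.0))
```

Because the assignment is a pure function of the customer ID and experiment name, the same customer always lands in the same bucket, and ramping from 5% to a larger share only requires raising the threshold.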
What metric do they use to evaluate the success of an A/B test? Generally straight revenue, though sometimes average order size or category spend will enter into the mix, too.