Multi-armed bandit isn't an algorithm, it's a model of how to view the problem. Like it or not, the problem web designers face fits the multi-armed bandit model pretty well. The algorithm called "MAB" in the article is one of many that have been developed for multi-armed bandit problems. Traditionally, the "MAB" of this article is known as "epsilon-greedy".
The point of multi-armed bandit situations is that there is a trade-off to be made between gaining new knowledge and exploiting existing knowledge. This comes up in your charts - the "MAB" runs always have better conversion rates, because they balance between the two modes. The "A/B testing" runs always gain information more quickly, because they ignore exploitation and focus only on exploration.
I should also say that multi-armed bandit algorithms aren't supposed to be run as a temporary "campaign" - they are "set it and forget it". In epsilon-greedy, you never stop exploring, even after the campaign is over. That way, you don't need to achieve "statistical significance", because you're never taking the risk of choosing one path for all time. In traditional A/B testing, there's always the risk of picking the wrong choice.
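For anyone who hasn't seen it written out, here is a rough sketch of epsilon-greedy as described above. The class name, the method names and the default epsilon of 0.1 are my own choices for illustration, not anything taken from the article or from a particular tool.

```python
import random


class EpsilonGreedy:
    """Illustrative epsilon-greedy chooser (all names here are hypothetical)."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon          # fraction of traffic spent exploring
        self.shows = [0] * n_arms       # how often each variation was shown
        self.wins = [0] * n_arms        # how often each variation converted
        self.rng = rng or random.Random()

    def rate(self, arm):
        # Observed conversion rate; 0.0 for a variation never shown.
        return self.wins[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def choose(self):
        # Explore with probability epsilon, forever; otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        return max(range(len(self.shows)), key=self.rate)

    def record(self, arm, converted):
        self.shows[arm] += 1
        self.wins[arm] += int(converted)
```

Note that there is no "end of test" anywhere in it: the exploration branch stays live for as long as the thing runs, which is the "set it and forget it" part.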
You aren't comparing A/B testing to a multi-armed bandit algorithm, because both are multi-armed bandit algorithms. You're in a bandit situation either way. The strategy you were already using for your A/B tests is a different common bandit strategy, called "epsilon-first" by Wikipedia, and there is a bit of literature on how it compares to epsilon-greedy.
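To make the comparison concrete, a minimal sketch of epsilon-first might look like the following. The function name and parameters are hypothetical, and it assumes you know the total number of visitors up front, which a real A/B test usually only approximates.

```python
import random


def epsilon_first_choose(visitor_index, total_visitors, shows, wins,
                         epsilon=0.1, rng=random):
    """Pick a variation for one visitor (illustrative, names are made up).

    shows[i] / wins[i] are the running display and conversion counts.
    """
    # Pure exploration for the first epsilon * total_visitors visitors:
    # this is the "A/B test" phase, with traffic split at random.
    explore_until = int(epsilon * total_visitors)
    if visitor_index < explore_until:
        return rng.randrange(len(shows))
    # Pure exploitation afterwards: always show the observed winner.
    rates = [w / s if s else 0.0 for w, s in zip(wins, shows)]
    return max(range(len(shows)), key=rates.__getitem__)
```

The only difference from epsilon-greedy is where the exploration goes: all of it at the start here, instead of spread over the whole run.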
This comment just sold me on MAB. You can just keep on throwing variations on a design at the system without having to make tenuous decisions. I hope all the A/B tools implement this soon.
It seems like most people who use these content optimization tools don't really understand the statistics involved. What are your thoughts on this? How do you educate your users on the merits of your approach vs A/B testing when the topic is so complex?
Also, despite this being a slightly pro-A/B-testing post, I have to say it's actually made me more interested in trying out Myna's MAB approach.
Same way every product from GWO and T&T on down does: show a pretty graph that ignores the underlying assumption that it's even possible to use statistics to conjure certainty from uncertainty, and trust that users will never know or care about the difference.
/former AB test software dev who fought my users to try to stop them from misinterpreting results, and failed.
If it gives you comfort, if there is a significant underlying difference and the calculations are done right, with high probability they will get the right answer even though they are misunderstanding the statistics.
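If you want to see that for yourself, a quick simulation along these lines will do. It is a simplification: it just picks whichever variation got more conversions rather than running a proper significance test, and the function name, the defaults and the tie handling are all my own assumptions.

```python
import random


def fraction_correct(rate_a, rate_b, visitors_per_arm=2000, trials=200, seed=0):
    """Fraction of simulated tests that pick the truly better variation.

    Simplified decision rule: pick B only if it has strictly more
    conversions than A, so a tie counts as picking A.
    """
    rng = random.Random(seed)
    b_is_better = rate_b > rate_a
    correct = 0
    for _ in range(trials):
        # Each visitor converts independently with the variation's true rate.
        wins_a = sum(rng.random() < rate_a for _ in range(visitors_per_arm))
        wins_b = sum(rng.random() < rate_b for _ in range(visitors_per_arm))
        picked_b = wins_b > wins_a
        correct += picked_b == b_is_better
    return correct / trials
```

Try something like `fraction_correct(0.05, 0.10)` and then shrink the gap between the two rates or the number of visitors to see how quickly the answer gets less reliable.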
Acceptance of this fact has avoided a lot of potential ulcers for me.
Just to be clear, we don't have an anti-MAB or pro-A/B testing stance. The point was that MAB is not "better", as an earlier article (titled 20 lines of code that beat A/B testing) had claimed. These methodologies clearly serve two different needs.
I should note here that if you use Myna, you will be using a much better multi-armed bandit approach than the epsilon-greedy which lost in this blog post.
See my longer top-level comment for some of the trade-offs.
http://en.wikipedia.org/wiki/Multi-armed_bandit#Common_bandi...