Do you know if there were common mistakes behind the incorrect implementations? Were they simple mistakes, or more cases where someone misunderstood a nuance of stats?



I don't remember many specifics, but IIRC, most of the implementation-related ones were due to an anti-pattern from the older A/B testing framework. Basically, the client would try to determine on its own whether the user was eligible for the A/B test (instead of relying on the framework), while an API handler would separately fetch the user's assignment. This meant the UI could think the user wasn't in the A/B test at all, while the API saw them as in it. In that case, the user would be experiencing the control while the framework thought they were experiencing something else.

That was a big one for a while, and it would skew results.
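
A rough sketch of what that anti-pattern looks like (the framework, experiment name, and eligibility rule here are all made up for illustration, not the actual platform's API):

    import hashlib

    class Framework:
        """Stand-in for the experiment framework: deterministic assignment."""
        def get_assignment(self, user_id, experiment):
            bucket = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16) % 2
            return "treatment" if bucket else "control"

    def client_thinks_eligible(user):
        # Anti-pattern: the client re-implements eligibility instead of
        # relying on the framework, so the two can disagree.
        return user.get("country") == "US"

    def render_ui(user, fw):
        if not client_thinks_eligible(user):
            return "control UI"  # the user sees control...
        return f"{fw.get_assignment(user['id'], 'new_checkout')} UI"

    def api_handler(user, fw):
        # ...but the API handler fetches the assignment unconditionally,
        # so the framework records the user as enrolled either way.
        return fw.get_assignment(user["id"], "new_checkout")

    fw = Framework()
    user = {"id": "u42", "country": "CA"}
    print(render_ui(user, fw))    # "control UI"
    print(api_handler(user, fw))  # may print "treatment" -> skewed results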

Hmmm, another common one was doing geographic experiments when part of the experiment couldn't be geofenced for technological reasons. Or forgetting that a user could leave the geofence, and removing access to the feature after they'd already been given it.
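
For that last one, a minimal sketch of the "sticky" behavior you'd want instead, assuming an in-memory store standing in for real persistence (the names are invented):

    granted = set()  # stand-in for persistent storage of who has the feature

    def in_geofence(user):
        return user.get("region") == "bay_area"

    def has_feature(user):
        if user["id"] in granted:
            return True               # keep access after the first grant
        if in_geofence(user):
            granted.add(user["id"])   # record the grant at first exposure
            return True
        return False

    user = {"id": "u42", "region": "bay_area"}
    print(has_feature(user))  # True, grant recorded
    user["region"] = "elsewhere"
    print(has_feature(user))  # still True; access isn't yanked away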

Almost all cases boiled down to showing the user one thing while thinking we were showing them something else.


I wonder if that falls under mistake #4 from the article, or if there's another category of mistake: "Actually test what you think you're testing." Seems simple, but with a big project I could see that being the hardest part.


I actually just read it (as best I could; the page is really janky on my device). I didn't see this mistake on there, and it was the most common one we saw by a wide margin in the beginning.

Number 2 (1 in the article) was solved by the platform. We had two activation points for UI experiments. The first was getting the user's assignment (which could be cached for offline usage). At that point they became part of the test, but there was a secondary one that fired when the component under test became visible (whether it was a page view or a button). If you turned this feature on for the test, you could analyze it using either the first or the secondary activation point.
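
Roughly, the two activation points look something like this (the event names and logging here are invented, not the actual platform's API):

    import hashlib

    def get_assignment(user_id, experiment, log):
        # First activation point: fetching the assignment. This can be
        # cached for offline use; the user counts as in the test from here.
        digest = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
        variant = "treatment" if digest % 2 else "control"
        log.append(("assigned", user_id, experiment, variant))
        return variant

    def on_component_visible(user_id, experiment, variant, log):
        # Secondary activation point: fired when the component under test
        # actually becomes visible (a page view, a button, etc.).
        log.append(("exposed", user_id, experiment, variant))

    log = []
    variant = get_assignment("u42", "new_checkout", log)
    on_component_visible("u42", "new_checkout", variant, log)  # fire for both arms
    print(log)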

One issue we saw with that (potentially specific to this implementation) was people forgetting to fire the secondary activation for the control. That was pretty common, but you usually figured it out within a few hours when you got an alert that your distribution looked biased (if you specify a 10:20 split, you should see a 10:20 ratio of activity).
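
That kind of alert is essentially a sample-ratio-mismatch check; here's a minimal sketch using a chi-square goodness-of-fit test (the counts and threshold are made up, and this isn't the platform's actual code):

    from scipy.stats import chisquare

    def srm_check(observed, weights, alpha=0.001):
        """observed: activation counts per arm; weights: configured split, e.g. (10, 20)."""
        total = sum(observed)
        expected = [total * w / sum(weights) for w in weights]
        stat, p = chisquare(observed, f_exp=expected)
        return p < alpha  # True means the split looks biased -> alert

    # Control exposures were under-fired, so the ratio drifts from 10:20.
    print(srm_check([900, 2100], (10, 20)))   # True  -> fire the alert
    print(srm_check([1000, 2000], (10, 20)))  # False -> looks healthy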



