Sidebar to Jakob Nielsen's column Putting A/B Testing in Its Place.
Say we have an e-commerce site with a conversion rate of 2%, and we want to see if that will improve with a bigger Buy button. We then create a test version of the site with a bigger button and expose it to 1.5 million visitors. As a result, we record 30,300 sales instead of the expected 30,000 sales. In other words, sales increased by 1%, corresponding to a new conversion rate of 2.02%.
Is bigger really better? Well, if the conversion rate had stayed at 2.00%, the probability of recording at least 30,300 sales would only have been 4%. Given this, it's unlikely that the extra 300 sales were simply a random fluctuation. By convention, whenever the probability of the observed outcome arising by chance is less than 5%, we reject the hypothesis that nothing changed. In other words, we conclude that the revised design is in fact better.
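For readers who want to check the arithmetic, here is a minimal sketch of that significance test, using the normal approximation to the binomial distribution. The figures are the ones from the example; the script is purely illustrative and not part of the original study:

    # Significance of 30,300 sales when 30,000 were expected,
    # under a 2% baseline conversion rate and 1.5 million visitors.
    from math import erf, sqrt

    n = 1_500_000        # visitors exposed to the bigger button
    p = 0.02             # baseline conversion rate
    observed = 30_300    # sales actually recorded

    mean = n * p                      # expected sales: 30,000
    sd = sqrt(n * p * (1 - p))        # standard deviation: about 171

    # One-sided probability of at least `observed` sales by chance alone.
    z = (observed - mean) / sd
    p_value = 0.5 * (1 - erf(z / sqrt(2)))
    print(f"z = {z:.2f}, P(sales >= {observed}) = {p_value:.1%}")  # roughly 4%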
As the example shows, if we have enough website traffic, we can get statistically significant results for very small effects, such as a 1% increase in sales. In fact, if we can increase sales by 2%, we get significant results with 340,000 users; a 10% increase in sales would show significant results with only 14,000 users.
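The exact sample sizes depend on the statistical assumptions used, but a rough single-sample calculation under the normal approximation lands in the same ballpark as the figures quoted above. The sketch below assumes a 2% baseline conversion rate and a one-sided 5% significance level:

    # Rough sample-size estimate: visitors needed before a given relative
    # lift over the baseline rate exceeds 1.645 standard errors (one-sided 5%).
    from math import sqrt

    def visitors_needed(baseline, relative_lift, z=1.645):
        """Visitors for the lift to stand out from random fluctuation."""
        delta = baseline * relative_lift
        return (z * sqrt(baseline * (1 - baseline)) / delta) ** 2

    for lift in (0.01, 0.02, 0.10):
        print(f"{lift:.0%} lift: about {visitors_needed(0.02, lift):,.0f} visitors")
    # Roughly 1.3 million, 330,000, and 13,000 visitors, respectively.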
Who cares about a measly 1% increase in sales? Well, for Amazon.com, with sales of $6.9 billion in 2004, a 1% sales increase would amount to $69 million. OK, Amazon is the biggest, but take a smaller site like eBags.com, with estimated annual sales of $40 million. Here, 1% amounts to $400,000 -- still more than enough to justify having a graphic designer spend an hour creating a bigger button and adding the bit of code required to collect data.
eBags gets three million unique visitors per month, so if it showed all users the bigger button, it would take two weeks to collect sufficient data. If eBags ran a true A/B test, giving half the users a bigger button and the other half the original button, the study would take a full month. (The split test is recommended to control for extraneous factors: a luggage site, for example, might sell more during a month in which many people take vacations.)
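The timeline is simple arithmetic on the traffic figures; a quick back-of-the-envelope version, using the example's numbers, looks like this:

    # How long must eBags run the test to gather enough data?
    monthly_visitors = 3_000_000
    needed_per_variant = 1_500_000   # visitors required to detect a 1% lift

    # Show every visitor the bigger button: all traffic goes to one design.
    print(f"All traffic: {needed_per_variant / monthly_visitors:.1f} months")        # 0.5

    # True A/B split: only half the traffic reaches each design.
    print(f"50/50 split: {needed_per_variant / (monthly_visitors / 2):.1f} months")  # 1.0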
As discussed in the main article, qualitative user studies are better at finding big effects, such as changes that can double your sales. Obviously, it's better to grow sales by 100% than 1%, but sooner or later you'll have picked all the low-hanging fruit. When the time comes to look for small improvements, the A/B test can prove what's best, even when it's only a tiny bit better.