Jakob Nielsen's Alertbox, November 21, 2011  

Accuracy vs. Insights in Quantitative Usability

Summary:
Better to accept a wider margin of error in usability metrics than to spend the entire budget learning too few things with extreme precision.

Last week, I made a slide for the new User Experience (UX) Basic Training course with the recommended number of test users for different types of studies. I like teaching foundational courses because they afford me just this kind of opportunity — to distill 25 years of usability process research into a single table. Patterns crystallize when complex topics are condensed to the essence.

For example, why do we recommend testing more users for card sorting than for usability studies? Because the usual rule, "we're testing the system, not you," doesn't apply to card sorting. When eliciting mental models, we're actually testing the individual users instead of a predefined artifact, and the variability is thus larger.

The thing that surprised me most about my own table: I recommend doing most quantitative user testing with a sample size that typically entails a 19% margin of error.

19% sounds sloppy. How come a fairly low level of accuracy usually suffices in estimating usability metrics?

Two reasons:

These mathematical points suffice to defend the idea of saving budget and limiting quantitative studies to mid-sized samples.
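As a rough illustration of where a figure like 19% can come from, the sketch below applies the standard normal-approximation formula for the margin of error of a mean. The inputs are assumptions, not values stated in this column: 20 test users, a 90% confidence level, and a standard deviation equal to 52% of the mean (`relative_sd`). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n_users, relative_sd=0.52, confidence=0.90):
    """Relative margin of error for a mean, using the normal approximation.

    relative_sd (standard deviation as a fraction of the mean) and the
    default confidence level are assumptions chosen for illustration.
    """
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * relative_sd / sqrt(n_users)


print(f"{margin_of_error(20):.1%}")  # about 19% under these assumptions
```

A small-sample analysis would normally use the t-distribution instead, which gives a slightly wider interval.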

But there are two deeper arguments that are even more important.

Focus on Big Problems

You shouldn't care about small issues in usability. At this stage, we still have bigger fish to fry. When redesigning a website for usability, the average improvement in key performance indicators (KPIs) is 83%. Clearly, most websites still contain horrible usability problems. Intranets and mobile sites/apps are often even worse.

Your focus should thus be on the really big design problems, where your user experience is failing to meet customer needs. Typically, there are only a few issues with immense bottom-line impact. Better to invest heavily in those crucial improvements than mess around with changes that'll gain you only a percent or two.

Wasting your budget on overly precise measurements can easily sidetrack you from the important issues; for sure, you'd have less budget left over to work on them.

Maybe in 20 years, user interfaces will be good enough that our only remaining goal will be to fine-tune them for the last few percent of quality gain. That's definitely not the case today.

Ask More Questions: Learn More

If you ask only one question, you'll get only one answer. That's why it's better to allocate any given budget across a wider range of user research, as opposed to spending it all on getting an ever-tighter confidence interval for a single metric.
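The cost of an ever-tighter confidence interval can be sketched by inverting the usual margin-of-error formula: the required sample grows with the inverse square of the target margin. The 90% confidence level and the 52% relative standard deviation below are assumptions for illustration, not figures from this column, and `required_users` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def required_users(target_margin, relative_sd=0.52, confidence=0.90):
    """Smallest sample size whose margin of error is at most target_margin.

    Normal approximation; relative_sd and confidence are assumed values.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve z * relative_sd / sqrt(n) <= target_margin for n, rounding up.
    return ceil((z * relative_sd / target_margin) ** 2)


for margin in (0.20, 0.10, 0.05):
    print(f"+/-{margin:.0%}: {required_users(margin)} users")
```

Under these assumptions, halving the margin of error roughly quadruples the number of users a study needs, which is budget that could otherwise fund several additional studies.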

Worse yet, if you have only that one answer, you might not know what the real question is. In that sense, quantitative usability studies are like a game of Jeopardy. Your study might tell you that the answer is 42 — but why? How should you change the design to score 50 next time?

That's why I recommend investing instead in parallel and iterative design, which exposes diverse user interface solutions to the harsh light of user testing. Of course, with more studies, each one must be smaller, but that's okay because your insights will sum across the studies. More research = more questions = more answers = better design.

All that said, there are still cases in which it pays to spend on "deluxe usability" — mainly in those rare organizations that have reached a high maturity level with respect to user experience methodologies.

From Small Studies, Big Oaks Will Grow

One final argument in favor of keeping each study at an affordable size is the value of cumulative insights across studies. Year after year, as you keep doing research on your site and your customers, you'll accumulate learnings.

For example, Nielsen Norman Group has tested 1,600 websites with 4,090 users across our various research studies and client projects. Although we haven't tested each individual site with hundreds of users, we've observed many key user behaviors thousands of times. So, when we say, for example, that users tend to scan content on websites and read even less on mobile sites, that finding doesn't arise from just one study that might conceivably include 20 participants who were all particularly reluctant to read.

When you see the same behavior on many different sites, with many different user profiles, the evidence mounts and becomes much stronger than the confidence interval for each of the contributing studies.
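One idealized way to see why the evidence mounts: if several independent studies measured the same quantity, pooling them would shrink the margin of error with the square root of the total number of users. Real studies differ in sites, tasks, and users, so this is only a sketch. The 52% relative standard deviation, the 90% confidence level, and the function name are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def pooled_margin(n_studies, users_per_study, relative_sd=0.52, confidence=0.90):
    """Margin of error if independent, equally sized studies were pooled.

    Assumes every study measures the same underlying quantity, which is
    an idealization; relative_sd and confidence are assumed values.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    total_users = n_studies * users_per_study
    return z * relative_sd / sqrt(total_users)


for k in (1, 4, 16):
    print(f"{k:>2} studies of 20 users: {pooled_margin(k, 20):.1%}")
```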

Learn More

A full-day User Experience (UX) Basic Training course and a seminar on Managing User Experience Strategy are offered at the annual Usability Week conference.



Copyright © 2011 by Jakob Nielsen. ISSN 1548-5552