Generally, I advocate qualitative user testing: a handful of users is enough to discover most design flaws. Quantitative testing does have its place, however, and we've recently been running large tests for two different reasons:
- We're testing hundreds of users to generate in-depth findings for our eyetracking research. To determine whether men and women look at Web pages differently, for example, we have to test many people using the same websites. This is because users wander all over sites, and we need to test numerous men and women to get a sufficient number of samples for each page.
- We're running large-scale usability benchmarks for several clients so they can track their design improvements over time. These studies are really expensive and not recommended for small projects. For big projects, however, they're a good long-term management tool.
With all this fresh data at my disposal, I couldn't resist analyzing 1,520 measurements of user time-on-task performance for websites and intranets.
Does Usability Follow a Normal Distribution?
Almost all statistical analyses assume that data follows a normal distribution (the famous bell curve). Most people take this on faith, because it's true for so many phenomena. But let's check.
One way of assessing a dataset's distribution is to draw a quantile-quantile (QQ) scatterplot. In a QQ plot, we plot each observation's empirical value on the x-axis and its hypothetical value on the y-axis, under the assumption that the entire dataset is normally distributed. We draw a straight line to represent the case in which empirical and hypothetical values are identical.
If our plotted datapoints are very close to the straight line, we conclude that the empirical values are very close to the hypothetical values. In other words, the observed data are the same as what the theory predicted, so the dataset follows a normal distribution.
Any datapoints that are far from the straight line represent cases in which the real and theoretical worlds differ substantially — in other words, the data doesn't follow the normal distribution.
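A minimal sketch of this construction in Python, using NumPy and SciPy with made-up task times (the studies' raw measurements aren't published, so the numbers here are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical task times in seconds for one task (illustrative values;
# the article's raw data is not published).
times = np.array([42.0, 55, 61, 63, 70, 72, 75, 80, 88, 95, 110, 240])

# x-axis of the QQ plot: the empirical values, sorted.
empirical = np.sort(times)

# y-axis: what a normal distribution fitted to the data predicts for each
# rank, via the inverse CDF evaluated at evenly spaced probabilities.
n = len(empirical)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = stats.norm.ppf(probs, loc=times.mean(), scale=times.std(ddof=1))

# Points close to the line y = x are consistent with normality; points far
# from it (like the 240-second case here) are outliers.
for e, t in zip(empirical, theoretical):
    print(f"empirical {e:6.1f} s   normal prediction {t:6.1f} s")
```

SciPy also offers `scipy.stats.probplot`, which computes the same quantile pairs and can draw them directly with a plotting library.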
I've plotted seventy QQ plots from our recent quantitative usability studies, and they all look the same, whether they come from website or intranet studies. Here are two typical examples:
QQ plots of two user studies: a test of a content-based magazine site (New York Magazine, on the left) and a test of a transaction-based e-commerce site (Kiehl's, on the right).
Each dot represents the task time of one user. The x-axis indicates measured performance and the y-axis indicates the theoretically matching normal distribution.
(Note: Because my analysis didn't include users who failed at tasks, the diagrams show only people who used the sites successfully. All seventy studies measured time on task — see earlier article on the definition of usability for other main quality attributes.)
Although the dots aren't exactly on the straight line, they're pretty close. There are a few outliers, but it seems safe to conclude that user performance does follow a normal distribution. Close enough for government work — or, more to the point, close enough for any analysis you'll need in a practical development project.
Outliers for Fast Performance
Outliers in the lower left corner of each QQ plot are shown as solid blue dots. These are users who were fast, but not as fast as the theory predicted. In fact, in the left-hand QQ plot, two dots are below the x-axis, indicating negative y values. The theory predicts that these two users would have finished their task before they started, which is obviously impossible.
In usability testing, there's a clear floor effect for measured task times: people simply can't be faster than a certain minimum, no matter how efficiently they use a site. Downloading pages and moving your hand between mouse and keyboard require a certain amount of time. Even the fastest typists still need time to type in search engine queries; the fastest readers still need time to read, regardless of how quickly they can find the salient information on a page.
All the studies I've analyzed included a few fast outliers. These fast (but not quite fast enough) users are easy to explain, however, and I don't think they should impact our thinking about Web usability.
Outliers for Slow Performance
Outliers in the upper right corner are shown as solid red dots. These are users who were dramatically slower than the slowest predicted users.
Of 1,520 cases, eighty-seven were outliers with exceedingly slow task times. In other words, 6% of task attempts ended as slow outliers. That's too many to ignore. Of course, you should first and foremost improve the user experience for the 94% of attempts that aren't outliers, but it's worth allocating some usability resources to the slow 6% as well.
The seemingly obvious explanation for these outliers is that a few people are simply incompetent at using the Web and will show up as slow outliers every time they test anything. But this hypothesis is false. When we recruit people for a study, we ask them to perform multiple tasks, so we know how the slow outliers performed on several other tasks. In general, the same users who were extremely slow on some tasks were fast on others.
Sixty different users were responsible for the eighty-seven slow outliers, for an average of 1.5 outliers each. Given that users were tested on an average of 6.7 tasks across the analyzed studies, each of these users had an average of 5.2 "normal" tasks — 3.5 times as many as their outlying tasks.
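Spelling out the arithmetic behind those figures (all numbers come from the text; this is just a sanity check, not new data):

```python
# All figures come from the article's text.
total_cases = 1520          # task-time measurements analyzed
slow_outliers = 87          # exceedingly slow task times
outlier_users = 60          # distinct users producing those outliers
avg_tasks_per_user = 6.7    # average tasks per tested user

share = slow_outliers / total_cases            # ~0.057, i.e. roughly 6%
per_user = slow_outliers / outlier_users       # 1.45, the article's "1.5"
normal_tasks = avg_tasks_per_user - per_user   # ~5.25, the article's "5.2"
ratio = normal_tasks / per_user                # ~3.6; ~3.5 with the rounded figures

print(f"{share:.1%} of cases, {per_user:.2f} outliers per user, "
      f"{normal_tasks:.2f} normal tasks per user, ratio {ratio:.1f}")
```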
This topic clearly needs more research, and would make for several good graduate theses. For now, my best conclusion is that slow outliers are caused by bad luck rather than by a persistent property of the users in question.
Good Luck in User Performance
Before turning to bad luck, let's acknowledge that good luck also happens on the Web. People are sometimes "undeservedly" lucky on a website and get exactly what they want in fewer clicks than expected. Maybe, for example, they're looking to buy something that happens to be the homepage's featured promotion that day. In other cases, some users happily skirt gross usability mistakes that cause other people grave difficulties and much frustration.
Here's an example of good luck from a test with disabled users trying to use the website of the IRS (the U.S. tax authorities). One blind user wanted to find out whether she could deduct money donated to a high school band.
Because the IRS page was long and overwhelming, the user decided to have her screen reader device read out the list of links on the page. Further, because the user was looking for tax rules about "donations" she commanded the screen reader to read links that started with a "D." As it turns out, the IRS uses the term "deduction" rather than "donation" — something the user would never discover from a simple page or site search using the word "donation." However, because both words start with "D" and the person was using a screen reader, she easily happened upon "deduction" as the correct link. A joyful outcome, but one that's purely due to good luck.
(There are a few additional usability notes here. First, by using the term "deduction" rather than "donation," the site opts for system-oriented language over a term describing the user's action, which the site is presumably supposed to support. Second, using the screen reader shortcut is an expert behavior; you shouldn't use it as an excuse for long pages, which hurt less-experienced screen reader users. Finally, the "read links" feature is one of the reasons it's a guideline to avoid links with labels such as "click here" or "more," which don't make sense out of context.)
Bad Luck in User Performance
Most website and intranet users are all too familiar with bad luck. Typical examples include:
- Clicking the wrong link and being lost forever in the wrong part of the site.
- Using the wrong words. In contrast to the "good luck" example above, users can waste significant time scouring a site for a term that the site doesn't use and doesn't cross-reference to its own preferred term.
- Companies with multiple websites often bounce users off to the wrong site, but users don't realize the error.
- The link or information the user needs is scrolled just off the screen and so the user never sees it. (I discuss additional scrolling problems, including recent findings on how many users scroll under which conditions, in my seminar on Fundamental Guidelines for Web Usability.)
- A pop-up distracts users just as they're about to get it right.
- Registration hiccups take users on detours long enough that their attempts to buy stuff are doomed, even when they've successfully found what they wanted and placed it safely in the shopping cart.
- Multiple small problems — any one of which could be easily handled in isolation — occur in a row and thus derail users.
Of course, none of these issues is really "luck" in the superstitious sense of something "unnatural" happening. In fact, they're all small but real defects in the design's usability. What qualifies these flaws as bad luck is that, under rare circumstances, they condemn users to terrible misfortune. If things had gone a tiny bit differently — say, if a user had scrolled down one line further — he or she might have had good luck and a very pleasant user experience.
Given that slow outliers account for 6% of Web usage, it's unacceptable to simply write them off. Although the data shows that most users will avoid bad luck in their next online task, you can't just say "better luck next time"; if you do, their next user experience will likely be on somebody else's website.
People leave websites that hurt them — they don't know that it's just bad luck, and that next time will be better. It's therefore incumbent on you to hunt down the root causes of bad luck and eradicate them from your site.