Summary: A study of the benefits of big monitors fails on two counts: it didn't test realistic tasks, and it didn't test realistic use. Productivity is a key argument for workplace usability, but you must measure it carefully.
In my column on how to design websites for ever-bigger screens, I mentioned that Apple had published a study of the productivity impact of big monitors. I didn't believe in Apple's methodology, so I didn't discuss the study further, but — since it has now gotten significant press coverage — I'll remedy this deficiency.
A prominent article about Apple's study reports, for example, that "cutting and pasting cells from Excel spreadsheets resulted in a 51.31% productivity gain — a task that took 20.7 seconds on the larger monitor versus 42.6 seconds on the smaller screen."
First, let me note that reducing task time from 42.6 seconds to 20.7 seconds is actually a productivity gain of 105%, not 51%. Productivity is measured by how much value a worker produces per hour. With a small-screen task time of 42.6 seconds, users can cut-paste 85 times in an hour, whereas with a large-screen task time of 20.7 seconds, they can cut-paste 174 times in an hour. In other words, the user's output increases from 85 to 174, meaning that big-screen users paste 105% more cells into their spreadsheet for each hour worked.
(As an analogy, assume that General Motors improved a factory so that 174 cars — rather than their standard 85 — rolled off assembly lines every hour. In that case, we'd say that productivity had improved by 105%, because the company would be getting more than twice as much output from the same number of workers.)
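For readers who want to check the arithmetic, here it is in a few lines of Python, using the task times reported from Apple's study:

```python
SECONDS_PER_HOUR = 3600

# Operations completed per hour, rounded as in the text.
small_screen_rate = round(SECONDS_PER_HOUR / 42.6)  # 85 cut-pastes per hour
large_screen_rate = round(SECONDS_PER_HOUR / 20.7)  # 174 cut-pastes per hour

# The press's number: percent of task time saved.
time_saved = (42.6 - 20.7) / 42.6 * 100  # about 51%

# The correct number: percent increase in hourly output.
productivity_gain = (large_screen_rate / small_screen_rate - 1) * 100  # about 105%

print(f"time saved: {time_saved:.0f}%")
print(f"productivity gain: {productivity_gain:.0f}%")
```

The two figures diverge because "time saved" is bounded at 100%, while output per hour is not: halving a task time always doubles hourly output.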
However, it doesn't matter what the exact number is, because it's irrelevant. Measurement studies are tricky to get right, and this study was wrong in so many ways that its numbers are meaningless.
Operations vs. Tasks
Apple's study focused on the wrong level of work. Pasting spreadsheet cells is not a user task; it's an operation at a low interaction level. Meaningful productivity has to be measured at a higher level, where users string together a sequence of operations to achieve their real-world goals.
With spreadsheets, for example, one of my recent tasks was to update a conference budget to reflect the option of adding another day of seminars. Such a task might well involve operations in which users would identify the cells containing an existing seminar day's expenses; copy these cells; paste them into a new day's area; and update the new cells to reflect the differences between the two days.
True worker productivity in this example would be determined by how quickly users could arrive at the new budget. Interestingly, a bigger screen would benefit many of the task's other operations. For example, it's faster to identify a big budget's relevant elements when you can see all of them at once. It's also faster to compare two potential budgets if you can see both of them together in one view. I don't question that bigger monitors are better; I'm simply pointing out that we can't trust Apple's study to estimate the magnitude of the benefits.
The distinction between operations and tasks is important in application design because the goal is to optimize the user interface for task performance, rather than sub-optimize it for individual operations. For example, Judy Olson and Erik Nilsen wrote a classic paper comparing two user interfaces for large data tables. One interface offered many more features for table manipulation and each feature decreased task-performance time in specific circumstances. The other design lacked these optimized features and was thus slower to operate under the specific conditions addressed by the first design's special features.
So, which of these two designs was faster to use? The one with fewer features. For each operation, the planning time was 2.9 seconds in the stripped-down design and 4.6 seconds in the feature-rich design. With more choices, it takes more time to decide which one to use. The extra 1.7 seconds required to consider the richer feature set consumed more time than users saved by executing faster operations.
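A small sketch makes the trade-off concrete. The planning times are from Olson and Nilsen's paper; the execution times below are hypothetical, chosen only to illustrate how a feature that genuinely speeds up execution can still lose overall:

```python
# Total time per operation = planning time + execution time.
PLAN_SIMPLE, PLAN_RICH = 2.9, 4.6  # seconds of decision time (from the paper)

# HYPOTHETICAL execution times: the rich design's special feature
# saves 1.0 second of execution per operation.
exec_simple = 5.0
exec_rich = 4.0

total_simple = PLAN_SIMPLE + exec_simple  # 7.9 s per operation
total_rich = PLAN_RICH + exec_rich        # 8.6 s per operation

# The 1.7 s of extra planning outweighs the 1.0 s of execution saved.
print(total_simple < total_rich)
```

Unless a feature's execution savings exceed the extra decision time it induces on every operation, the stripped-down design wins.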
Skilled Performance vs. Realistic Use
A second problem in Apple's study is that it looked at highly skilled users performing rote, low-level operations that they'd trained on repeatedly until they got them exactly right (i.e., had error-free performance). This, of course, is not how most people use computers in real life.
You don't sit and paste the same spreadsheet cells over and over again according to a script. Instead, you focus on achieving your goals and meander among computer features when they seem necessary for task performance. While doing this, you spend a lot of time trying to decide what to do (as discussed in the previous section). Sadly, you also spend a lot of time making errors and recovering from them.
Optimizing skilled performance does have a role in usability, but it's limited.
Skilled performance almost never happens on the Web, because users constantly encounter new pages; that is, they spend most of their time pondering options and trying to understand the content that's being presented. This is why most websites should lay off the fancy drag-and-drop features and focus on the simplest possible interaction techniques that are common to all sites. If your site works the way people are used to working, they can concentrate on your content.
A new feature might let people perform a certain operation faster on your site. Typically, however, the savings are not worth it: people only realize the savings once or twice, but waste significant time trying to understand the feature. Designing an advanced feature to expedite low-level interactions is only worthwhile when users will repeatedly perform the target operation on your site.
Even in applications, skilled performance is rare because modern office workers typically move between many different tasks and screens. The main places we've encountered skilled performance are call centers and other jobs where workers have a small number of tasks that they perform repeatedly.
In most cases, it's better to design for intermittent use and for people who have to feel their way through the user interface.
A Realistic Productivity Example
Here's how to estimate productivity improvements:
- Involve a broad spectrum of representative users (not just experts).
- Have the users perform representative tasks (not just a few low-level operations).
- Don't tell users how to do the tasks; observe their real behavior.
Frequent readers of my column will notice that these are pretty much the standard rules for basic usability work.
Unfortunately, the last good study I know of that assessed the productivity impact of monitor size was done when 640x480 was a big screen. So, I'll give you an example of productivity calculations from a different study.
In our testing of intranet usability, we measured how quickly employees can perform a wide range of everyday tasks using many different companies' intranets. One of these tasks was to find the head of a group or department. On the best 25% of intranets (i.e., those with usability in the best quartile), employees performed this task in 1 minute, 37 seconds on average. In contrast, employees of companies with the worst 25% of intranets required an average of 3 minutes, 59 seconds to perform the same task.
Note that we didn't measure how fast people could navigate to a certain page, or how fast they could type a name into a search engine. Users might employ different interaction strategies in intranets with different features. The question was: How fast could they perform a real-world task — that is, identify a particular manager? It's a classic mistake for people who are new to usability to test individual system features as opposed to higher-level tasks that users want to perform. Features exist to support tasks; it's no good if a feature is highly usable but doesn't work when people are trying to achieve a real goal.
Knowing the performance difference between good and bad intranet design, we can now compute the productivity impact. In the case of finding a manager, let's assume that employees perform this task about once per week, or 50 times per year. Given this, they'd spend 1.3 hours a year doing the task on a well-designed intranet and 3.3 hours a year doing it on a poorly designed intranet.
To convert time into money, let's assume that the average employee makes $40,000 per year and that the company's overhead amounts to an additional 50% on top of this salary. With these assumptions, an employee costs the company $60,000 per year, or $30.77 per hour across roughly 1,950 annual work hours. So, the cost of having an employee find various company managers comes to $41 per year on an intranet with good usability and $102 per year on an intranet with bad usability.
For this specific example, we could improve productivity by 146% if we redesigned an intranet from the bottom 25% to achieve the same usability we found among the 25% best intranets (going from 239 seconds per task to 97 seconds means 146% more tasks completed per hour). This sounds really good. On the other hand, the cost savings would only come to $61 per employee per year. This hardly sounds worth doing.
Of course, for a company with 10,000 employees, saving $61 per employee per year equates to $610,000 per year — or more than a million dollars over the typical 2–3-year interval between intranet redesigns (even when discounting future years' cash flow). This is more than enough to pay for the necessary usability and redesign work to improve the intranet.
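The dollar figures follow directly from the stated assumptions. Here's the calculation as a short sketch; the 1,950 annual work hours are the figure implied by the $30.77 hourly rate:

```python
SALARY = 40_000
OVERHEAD = 0.50              # overhead as a fraction of salary
WORK_HOURS_PER_YEAR = 1_950  # implied by the $30.77/hour figure
TASKS_PER_YEAR = 50          # "find a manager" about once per week

cost_per_hour = SALARY * (1 + OVERHEAD) / WORK_HOURS_PER_YEAR  # about $30.77

def annual_task_cost(task_seconds):
    """Yearly cost of one recurring task, in dollars per employee."""
    hours_per_year = task_seconds * TASKS_PER_YEAR / 3600
    return hours_per_year * cost_per_hour

good_intranet = annual_task_cost(97)    # 1:37 per task, about $41/year
bad_intranet = annual_task_cost(239)    # 3:59 per task, about $102/year
savings = bad_intranet - good_intranet  # about $61 per employee per year

print(f"${savings:.0f} per employee per year")
print(f"${savings * 10_000:,.0f} per year for 10,000 employees")
```

Swapping in your own salary, overhead, and task-frequency assumptions is straightforward; the structure of the calculation stays the same.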
Quantifying Overall Productivity
The real way to use our data is not to look at a single task, as I did here. Instead, if you want to estimate potential productivity improvements for your intranet, you should measure all of the intranet tasks and assess their productivity relative to the designs we documented in our detailed report. If you think a redesign would let you move up a certain degree in usability, you could then see what that level's user productivity would be for all of the tasks. Finally, you'd compute the difference between the target productivity and your current measurements to get an estimate of how much money you'd save by improving the intranet.
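Here's a hypothetical sketch of that whole-intranet estimate. The task list, times, and frequencies below are invented for illustration, and the hourly rate reuses the $30.77 assumption from the earlier example:

```python
COST_PER_HOUR = 30.77  # fully loaded employee cost, from the earlier example

# (task, current seconds, target seconds, times per employee per year)
# All entries except "find a manager" are made up for illustration.
TASKS = [
    ("find a manager", 239, 97, 50),
    ("look up a policy", 180, 120, 30),
    ("submit a timesheet", 300, 200, 52),
]

def annual_savings_per_employee(tasks, cost_per_hour=COST_PER_HOUR):
    """Dollars saved per employee per year if every task hits its target time."""
    total = 0.0
    for _name, current_s, target_s, per_year in tasks:
        hours_saved = (current_s - target_s) * per_year / 3600
        total += hours_saved * cost_per_hour
    return total

savings = annual_savings_per_employee(TASKS)
print(f"${savings:.0f} per employee per year")
```

Multiply the result by headcount and by the years until the next redesign to get the figure you'd weigh against the redesign's cost.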
Of course, instead of estimating, it's better to have true before/after metrics, but you won't get those until after you've launched the redesign. So, you first need to collect benchmark productivity numbers for the old design, and then run the same study on your new design. I know of several big companies that have done this, and they ultimately documented immense monetary savings.
Productivity is one of usability's most important elements, and I wish more people would measure their design's productivity impact. To be meaningful, however, they must do it correctly, which means testing real users performing real tasks.