Summary: This article provides a table with summary statistics for the thirteen usability laboratories described in the papers in this special issue. It also gives an introduction to the main uses of usability laboratories in usability engineering and surveys some of the issues related to practical use of user testing and CAUSE tools for computer-aided usability engineering.
Originally published as: Nielsen, J. (1994). Usability laboratories. Behaviour & Information Technology 13, 1&2, 3-8.
Affiliation at time of writing: Bellcore (Bell Communications Research)
Comment added December 1996: This article was written in 1994 to summarize a special issue on usability laboratories I edited for the journal Behaviour & Information Technology. The special issue itself is well worth reading if you can get hold of it (many large technical libraries subscribe to the journal and should have the issue). The special issue on usability labs was published as a double issue: Behaviour & Information Technology, vol. 13, nos. 1-2, January-April 1994.
Usability is playing a steadily more important role in software development. This can be seen in many ways, including the growing budgets for usability engineering. In 1971, Shackel estimated that a reasonable share for usability budgets for non-military systems was about 3% (Shackel 1971). Later, in 1989, a study by Wasserman (1989) of several corporations found that "many leading companies" allocated about 4-6% of their research and development staff to interface design and usability work. Finally, in January 1993, I surveyed 31 development projects and found that the median share of their budgets allocated to usability engineering was 6% (Nielsen 1993). Thus, usability budgets have been steadily growing. Other indications of the added emphasis on usability are the increasing number of personal computer trade press magazines that include usability measures in their reviews and the overwhelming response to the call for papers for this special issue of Behaviour & Information Technology on usability laboratories.
Assuming that a company has decided to improve the usability of its products, what should it do? Even though this is a special issue on usability laboratories, I am not sure that the answer is to build a usability lab right away. Even though usability laboratories are great resources for usability engineering groups, it is possible to get started with simpler usability methods that can be used immediately on current projects without having to wait for the lab to be built. See Nielsen (1993) for an overview of these methods, which are often referred to as "discount usability engineering." Almost always, the result of applying simple usability methods to current projects is a blinding insight that usability methods improve products substantially, making it hard to believe how anybody could ever develop user interfaces without usability engineering. Unfortunately, many people still do just that, and we need cheap and simple methods to enable them to overcome the initial hurdle of starting to use usability engineering.
Once a company has recognized the benefits of the systematic use of usability methods to improve its products, management often decides to make usability a permanent part of the development process and to establish resources to facilitate the use of usability methods by the various project teams. Two such typical usability resources are the usability group and the usability laboratory.
The staffing, management, and organizational placement of usability groups are important issues that must unfortunately be left unresolved here since virtually no research is available to resolve them. Suffice it to say that usability laboratories as a physical resource can be used by the usability groups almost no matter how they are organized. For example, one of the main schisms in the organizational placement of usability specialists is whether to centralize them in a single human factors department or to distribute them as specialized team members of the individual development projects. Some of the arguments in favor of centralized usability departments are that they are better at attracting talented usability staff because of their higher visibility; that they can nurture the special skills of usability specialists by providing an environment focused on usability issues where new techniques are discussed and developed; that they provide a clear management chain (with the ensuing career paths) for usability staff; that they can maintain corporate interface standards and serve as "interface police" to ensure consistency; and that they can take an objective view of the user interfaces they are evaluating since the interfaces come from outside departments. 
Some of the arguments in favor of distributed usability staff are that it is more satisfying for usability specialists to work for an extended time on a single product than to consult briefly on multiple products; that usability specialists on a product team are more likely to contribute to the design efforts throughout the lifecycle rather than just clean up the GUI; that some domains are so complicated that one needs to "live" with the product team for an extended period of time to be able to contribute; that usability specialists will only be taken seriously by developers if both groups are part of the same team; and that the communication channels will be shorter (leading to greater productivity) the less organizational distance there is between the producers and the consumers of usability knowledge. In spite of the clear distinction between the two types of organizations and the great need to know more about their relative advantages and disadvantages, no research results are currently available to assess what circumstances should lead one to prefer one organization of usability groups over the other.
Interestingly, even though a centralized usability group is the prime candidate to manage a usability lab, the existence of a corporate usability lab is not necessarily an argument in favor of a centralized usability department. It is possible to have a lab supported by a small number of dedicated support staff even while it is being used by usability specialists from a large number of distributed groups. As noted by Dayton, Tudor, and Root in their paper on Bellcore's user-centered-design support center in this issue, a shared usability lab can even serve as a gathering point to provide some of the cross-fertilization and educational benefits to distributed usability specialists that others get from a centralized department.
Usability laboratories are typically used for user testing as discussed in most papers and summarized in the next paper by Salzman and Rivers, "Smoke and mirrors: Setting the stage for a successful usability test." There is no doubt that user testing is the main justification for most usability laboratories and that user testing is one of the foundations of usability engineering. Once a usability lab is in place, however, it becomes a convenient resource for many other kinds of usability activities such as focus groups or task analysis. Palmiter, Lynch, Lewis, and Stempski discuss several such non-test forms of data collection in their paper on "Breaking away from the conventional usability lab," and Zirkler and Ballman describe ways of combining focus groups and traditional testing in their paper on "Usability testing in a competitive market." Usability laboratories are sometimes also used to record design sessions, though this is mostly done as part of research projects and not as part of practical development projects. One exception is the use of participatory design sessions using methods like PICTIVE (Muller, Wildman, and White 1993) where several users and designers sketch out interface designs using bits of colored paper that are moved around on a desk. With this design technique, much of the design information is never written down, so capturing the dynamics of the design session on video can provide a valuable record for later reference.
Usability laboratories can also be used for certain variants of heuristic evaluation (Nielsen 1994). Heuristic evaluation is based on having usability specialists or other evaluators inspect an interface to find usability problems that violate a list of established usability principles (the "heuristics") and does not normally require a special laboratory since the evaluators can work anywhere and are normally supposed to do so individually. Sometimes, however, one wants to have an observer present to log the usability problems discovered by the evaluators so that they do not need to spend time and effort on writing up a report. It may be valuable for this observer to have access to a video record of the evaluation session, and it may also sometimes be advantageous to have developers or other project representatives observe a heuristic evaluation session from behind the one-way mirror in a usability lab. In general, though, only a small minority of heuristic evaluation sessions take place in usability laboratories.
Table 1. Summary of the thirteen usability laboratories described in this special issue.

| Company Name | Main Product | Other Labs in Company? | Date of First Usability Lab in Company | Floor Space of Typical Subject Room (m²) | Floor Space of Total Lab Area (m²) | Number of Rooms (Subject Rooms, Control Rooms, etc.) | Number of Cameras in Typical Subject Room | Scan Converter Used to Directly Tape Screen Image? | One-Way Mirror? | Usability Staff Supporting vs. Utilizing Lab |
|---|---|---|---|---|---|---|---|---|---|---|
| Ameritech | Communications service | No | 1989 | 12.5 | 237 | 7 | 2 | Yes | Yes | 1 / 10 |
| Bellcore | Telco software | Yes | 1985 | 12.3 | 121 | 7 | 2 | No | Yes | 0.3 / 30 |
| BT (British Telecom) | Telephone service | No | 1988 | 40 | 96 | 3 | 3 | Yes | Yes | 0.5 / 70 |
| IBM | Computer systems | Yes | 1981 | 11.7 | 165 | 14 | 2 | No | Yes | 0.1 / 4 |
| MAYA Design Group | Design consultants | No | 1990 | 8.8 | 42.9 | 3 | 2 | No | Yes | 3 / 12 |
| Microsoft Corp. | PC software | No | 1989 | 10.8 | 181.3 | 19 | 2 | Yes | Yes | 4 / 22 |
| NCR | Computer systems | Yes | 1966 | 13.4 | 31.2 | 3 | 2 | Yes | Yes | 2 / 15 |
| Philips, Corp. Design | Consumer electronics | Yes | 1990 | 30 | 40 | 2 | 3 | No | Yes | 0.25 / 10 |
| Philips, IPO | Consumer electronics | Yes | 1990 | 9 | 35 | 3 | 2 | No | No | 1 / 25 |
| SAP | Enterprise business apps | No | 1992 | 37 | 63 | 2 | 1 | No | Yes | 2 / 12 |
| SunSoft | Workstation software | No | 1988 | 25.1 | 202.3 | 8 | 3 | Yes | Yes | 3.5 / 8 |
| Symantec | PC software | No | 1992 | 23.8 | 47.6 | 2 | 2 | Yes | Yes | 1 / 1 |
| Taligent | PC operating systems | No | 1992 | 13.4 | 26.8 | 2 | 2 | No | Yes | 1 / 2 |
| Mean | | 38% | 1987 | 19.1 | 99.2 | 5.8 | 2.2 | 46% | 92% | 1.5 / 17.0 |
| Median | | No | 1989 | 13.4 | 63.8 | 3 | 2 | No | Yes | 1 / 12 |
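The summary rows at the bottom of Table 1 follow directly from the per-company figures. As a quick check, this sketch recomputes the first-lab-year and subject-room statistics from the data as transcribed from the table:

```python
import statistics

# First-lab year and typical subject-room area per company,
# transcribed from Table 1 (same company order as the table)
first_lab_year = [1989, 1985, 1988, 1981, 1990, 1989, 1966,
                  1990, 1990, 1992, 1988, 1992, 1992]
subject_room_m2 = [12.5, 12.3, 40, 11.7, 8.8, 10.8, 13.4,
                   30, 9, 37, 25.1, 23.8, 13.4]

print(round(statistics.mean(first_lab_year)))      # 1987
print(statistics.median(first_lab_year))           # 1989
print(round(statistics.mean(subject_room_m2), 1))  # 19.1
print(statistics.median(subject_room_m2))          # 13.4
```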
The papers in this special issue describe thirteen usability laboratories that are summarized in Table 1. As noted in the table, 38% of the companies have other usability laboratories that are not represented in the table, so it should only be seen as representing a survey of usability labs and not as a complete listing of labs. It can be seen from the table that the defining characteristics of a usability laboratory seem to be video cameras (used in all the labs) and the one-way mirror (installed in 92% of the labs). The table also shows that usability laboratories are a fairly recent phenomenon, with the median year of the first usability lab in these companies being 1989. Of course, some companies have had usability laboratories for a long time. Also, we should note that companies in industries like aerospace and control room design have employed usability laboratories for many years even though they are not represented in the table. The papers in this issue are mostly from the computer and telecommunications industries, though de Vries, van Gelderen, and Brigham write about usability labs for consumer products at Philips. In general, the lessons described in the papers in this issue may apply most directly to the usability of information technology, but there is no reason to believe that most of the same methods could not be used for other types of products and services.
The table shows that the median usability laboratory has three rooms. Normally, the room distribution is a room for the test subject(s), a control room for the experimenter and other usability specialists involved in running the experiment and operating the recording and logging tools, and an "executive observation lounge" where additional staff can observe the test without interfering with either the subject or the experimenters. Sometimes additional rooms are used as waiting areas for subjects, and the larger labs also often have special video editing rooms to avoid occupying an entire test suite by using the control room facilities for editing rather than testing. The observers in the executive observation lounge may sometimes in fact be executives, but more commonly they are members of the development team who take the opportunity of the user test to get exposure to the users. Even though the video tape equipment in the labs is often used to produce very communicative highlight tapes of the most notable usability problems and colorful user utterances, there is still no substitute for observing a test live.
It can be seen from the table that some usability labs have a very large number of rooms. Often, this only means that multiple tests can be conducted in parallel, but sometimes larger facilities are used for experiments in computer-supported cooperative work, video conferencing, and other cases where multiple users have to be combined for a single test. Lund's paper on Ameritech's usability laboratory discusses this issue in further depth.
At the time of writing, slightly less than half of the laboratories surveyed in the table used scan converters to make it possible to tape the computer screen image directly without having to point a camera at it. Scan converters have been somewhat expensive, but since they are dropping in price, their use can be expected to become more widespread in the future.
One of the major advantages of having a usability laboratory is that the incremental hurdle for user testing of a new product becomes fairly small since all the equipment is already in place in a dedicated room that is available for testing. This effect is important because of the compressed development schedules that often leave little time for delay. Thus, if usability testing can be done now and with minimal overhead, it will get done. Similarly, usability may get left out of a project if there is too much delay or effort involved before results become available. Because of this phenomenon, the support staff form a very important part of a usability laboratory in terms of keeping it up and running, stocked with supplies, and taking care of the practical details of recruiting and scheduling test users. In my opinion, the ratio of one support person to twelve usability specialists running tests that is shown as the median in Table 1 is actually too small. I believe that a higher number of support staff are well worth their cost in terms of more efficient usability work (which again leads to a larger amount of usability work being done).
The building of a usability laboratory involves a myriad of decisions and trade-offs. Most of the papers in this special issue touch upon several such issues, and the papers by Sazagari; Rohn; Uyeda; and Neugebauer and Spielmann are particularly detailed. The papers by Lucas and Fisher; Dayton, Tudor, and Root; Lund; and Blatt, Jacobson, and Miller take the detailed needs analysis one step further and describe how they redesigned their second-generation usability laboratories based on experience with older labs and using trusted user-centered design principles to find out what usability specialists really need in their lab.
A usability laboratory does not necessarily need to be a fixed facility in a given set of rooms constructed for the purpose. Szczur describes how she used a regular conference room at NASA as a low-budget usability lab by using part of the room for the subjects and part of the room for the observers. Several papers describe "portable usability labs" with video kits and other data logging equipment that can be brought to the field to allow testing, and Zirkler and Ballman emphasize the need to visit customer sites when assessing the usability of systems like their specialized databases for users in the legal and financial communities. Smilowitz, Darnell, and Benson compare standard usability testing in the lab with Beta testing done by having customers test the software at their own sites and report usability problems on their own without supervision by a usability specialist. Even though the Beta tests did have some limitations (notably, that they were restricted to the very last stages of the development lifecycle), they did provide a cheap source of additional usability data that should be considered as a supplement to the lab-based sources.
In addition to the building and equipment of the usability laboratory, an important issue is obviously the actual methods used in running experiments in the lab. Many papers discuss methodology issues, and Fath, Mann, and Holzman provide detailed coverage of five main phases used in evaluation sessions in IBM's Atlanta usability laboratory: designing the evaluation, preparing for it, conducting it, analyzing the data, and reporting the results. One of their conclusions is that as much of the data reduction process as possible should be automated. Usability testing generates huge masses of raw data, and any meaningful conclusions have to be based on analyses that summarize the test events in a comprehensible manner. Hoiem and Sullivan provide detailed information about the integrated set of usability data collection and analysis tools built for Microsoft's lab. In general, it is characteristic for the state of the art that almost all CAUSE tools are homemade for the individual labs. There are probably two reasons for this: First, we still do not know enough about what computerized tools usability professionals actually need, and we know even less about how to make such tools sufficiently usable and efficient. Second, the market is still fairly small, though it is constantly growing and may one day support a wider variety of commercial offerings.
A classic form of CAUSE tool used in many usability labs is the event logger, which is used by an observer to record occurrences of events of interest as the user progresses through the experiment. Typically, events are automatically timestamped (and often linked to a videotape record of the event), and the logger also records the type of event as classified by the human observer in real time -- possibly with a supplementary natural language annotation. In addition to human-coded event logs, usability labs often produce keystroke logs of all user input and activity logs of higher-level dialogue actions (like menu selections, error messages, etc.). These logs can be subjected to many different kinds of analysis, including the sequential data analysis techniques discussed in the paper by Cuomo.
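The core of such an event logger -- timestamped, observer-coded events with optional annotations -- can be sketched in a few lines. The class names and event categories below are illustrative assumptions, not any particular lab's tool:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Event:
    timestamp: float  # seconds since the session started
    category: str     # event type coded by the observer, e.g. "error"
    note: str = ""    # optional natural-language annotation

@dataclass
class EventLog:
    """Records observer-coded events with automatic timestamps."""
    start: float = field(default_factory=time.monotonic)
    events: list = field(default_factory=list)

    def record(self, category, note=""):
        # Timestamp is computed automatically; the observer supplies
        # only the event category and an optional annotation.
        self.events.append(Event(time.monotonic() - self.start, category, note))

    def count(self, category):
        # Simple data reduction: tally events of one type
        return sum(1 for e in self.events if e.category == category)

log = EventLog()
log.record("error", "user picked Save As instead of Save")
log.record("comment", '"where did my file go?"')
print(log.count("error"))  # 1
```

A real lab tool would additionally link each timestamp to a videotape position so that logged events can be replayed later.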
Much user testing is simply aimed at generating qualitative insights that are communicated through lists of usability problems and highlights videos showing striking cases of user frustration. Such insights are sufficient for most practical usability engineering applications where the goal is the improvement of a user interface through iterative design. Often, there is no real reason to expend resources on gathering statistically significant data on a user interface that is known to contain major usability problems and has to be changed anyway. By using a probabilistic model of the finding of usability problems and an economic model of project management, Nielsen and Landauer (1993) found that one often gets the optimal cost-benefit ratio by using between three and five test users for each round of user testing.
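The probabilistic part of the Nielsen and Landauer (1993) model estimates the share of problems found with n users as 1 - (1 - λ)^n, where λ is the probability that a single test user exposes a given problem; the value λ ≈ 0.31 used below is the average they report across their case studies and is an assumption for any specific project:

```python
def proportion_found(n_users, problem_visibility=0.31):
    """Expected share of usability problems found with n test users,
    per the Nielsen & Landauer (1993) model: 1 - (1 - lambda)^n.
    The default lambda of 0.31 is their reported cross-project average."""
    return 1 - (1 - problem_visibility) ** n_users

for n in (1, 3, 5):
    print(n, round(proportion_found(n), 2))
# 1 user finds ~31% of the problems, 3 users ~67%, 5 users ~84%
```

The diminishing returns visible here are what drive the three-to-five-user recommendation: each additional user mostly re-finds problems already seen, so the budget is better spent on another iteration of testing.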
User testing with the goal of learning about the design to improve its next iteration falls in the category of formative evaluation. Sometimes, one wants to do summative evaluation that is quantitative in nature and results in numbers that can be compared across products. One case where summative evaluation is desired is the comparative product reviews produced by personal computer magazines like the ones discussed in Bawa's paper. Another application of competitive testing is the selection of recommended (or prescribed) software for a big company where the benefits in terms of reduced training and support and increased productivity are sufficiently large to warrant the investment of considerable resources on choosing the most usable product from the available offerings on the market. To my knowledge, not many user organizations currently perform such competitive usability testing before major software purchases, though there are a few that do. Also, it is becoming fairly common for software vendors to commission third-party usability consultants to perform comparative studies and then advertise the results if their own product wins. Finally, summative evaluation is sometimes used for regular software development projects when one wants to investigate whether a revised product has achieved a sufficient improvement in usability over the previous version. The paper by Bevan and Macleod describes a comprehensive set of measurement tools and principles developed as part of the ESPRIT MUSiC project on usability metrics.
Even though usability metrics can be very elaborate (for example, Bevan and Macleod describe the measurement of heart rate variability as a way of assessing the user's mental effort over time), it is also possible to have a "discount usability engineering" approach to usability metrics. As an example, Molich's paper presents a technique for keeping track of the number of "user interface disasters" (certain kinds of serious usability problems experienced by at least two users) as a simple, yet quantifiable measure of interface quality.
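Molich's disaster count is straightforward to operationalize: given a record of which test users hit which serious problems, count the problems experienced by at least two users. The data structure below is a hypothetical illustration, not Molich's own notation:

```python
# Map each observed serious usability problem to the set of users who hit it
problem_hits = {
    "cryptic error message on save": {"u1", "u3"},
    "hidden undo command":           {"u2"},
    "data loss on cancel":           {"u1", "u2", "u4"},
}

# A "disaster" in Molich's sense: a serious problem seen by >= 2 users
disasters = [p for p, users in problem_hits.items() if len(users) >= 2]
print(len(disasters))  # 2
```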
As shown by the contrast between the full-blown usability metrics and the simplistic count of interface disasters, there are many different possible approaches to usability engineering (Nielsen 1993). It is important to realize that it is possible to achieve major improvements in usability even if one does not utilize all the most advanced techniques and even if one has a fairly primitive usability lab (or sets up a temporary lab in a conference room as described by Szczur). The single most important decision in usability engineering is simply to do it! The best intentions of some day building the perfect lab will result in exactly zero improvement in current products, and if the choice is between perfection and doing nothing, nothing will win every time. Luckily, it is possible to start small and then grow a usability effort over time as management discovers the huge benefits one normally gets (Ehrlich and Rohn 1994). It is in the nature of things that this special issue mostly has papers on leading usability laboratories from companies that have a better-than-average approach to usability engineering. This does not mean that people in less fortunate companies should abandon usability, it only means that they have something to strive for as they start small and gradually expand their usability groups and usability labs.
The main review burden in selecting the papers was shouldered by the following reviewers who had been especially selected for this special issue: Tomas Berns (Nomos Management AB), Linda I. Borghesani (The MITRE Corporation), Joseph S. Dumas (American Institutes for Research), Tom G. Gough (University of Leeds), Lovie Ann Melkus (IBM Consulting Group), James R. Miller (Apple Computer, Inc.), Michele Morris (BNR Europe Ltd.), Kenneth Nemire (Interface Technologies), Markku I. Nurminen (University of Turku), Diane J. Schiano (Interval Research Corporation), Sylvia Sheppard (NASA Goddard Space Flight Center), Nancy Storch (Lawrence Livermore National Laboratory), Paulus-Hubert Vossen (Fraunhofer-Institut IAO), and David Ziedman (Philips Interactive Media of America). Additional ad-hoc reviews were provided by: Rita M. Bush (Bellcore), Tom Dayton (Bellcore), Arye R. Ephrath (Bellcore), Richard D. Herring (Bellcore), Arnold M. Lund (Ameritech), Michael J. Muller (U S WEST Advanced Technologies), and Estela M. Tice (Bellcore).
- Bias, R. G. and Mayhew, D. J. (Eds.) 1994, Cost-Justifying Usability (Academic Press, Boston, MA).
- Ehrlich, K. and Rohn, J. 1994, Cost-justification of usability engineering: A vendor's perspective, in Bias, R.G., and Mayhew, D.J. (Eds.), Cost-Justifying Usability (Academic Press, Boston, MA).
- Muller, M. J., Wildman, D. M., and White, E. A. 1993, "Equal opportunity" PD using PICTIVE. Communications of the ACM 36, 4, 64-66.
- Nielsen, J. 1993, Usability Engineering (Academic Press, Boston, MA).
- Nielsen, J. 1994, Heuristic evaluation, in Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods (John Wiley & Sons, New York, NY), 25-64.
- Nielsen, J. and Landauer, T. K. 1993, A mathematical model of the finding of usability problems, Proceedings of the ACM INTERCHI'93 Conference (Amsterdam, the Netherlands, April 24-29), 206-213.
- Shackel, B. 1971, Human factors in the P.L.A. meat handling automation scheme. A case study and some conclusions. International Journal of Production Research 9, 1, 95-121.
- Wasserman, A. S. 1989, Redesigning Xerox: A design strategy based on operability, in Klemmer, E. T. (Ed.), Ergonomics: Harness the Power of Human Factors in Your Business (Ablex, Norwood, NJ), 7-44.