Redmond, WA, 21-23 July 1993
The second annual UPA conference sponsored by the Usability Professionals Association was hosted by Microsoft on their well-groomed corporate campus outside of Seattle. Holding the conference at an actual member company site continued the tradition started by the first conference, which was hosted by WordPerfect in Utah. I greatly appreciated the opportunity to see the environment in which so much software is produced and where the everyday activities of the usability professionals take place.
The UPA conference had grown to 374 participants this year from about 140 last year. If this growth rate continues for even one or two more years, it will soon become impossible for a single company to host the conference. Anyway, those of us who attended these last two years have benefited tremendously from the hospitality of the host organizations. In addition to taking in the general ambiance and the ubiquitous fridges stocked with free soft drinks, conference participants got a tour of the usability lab. Only a few rooms were included in the tour, since Microsoft has an extensive facility spread over two buildings.
Compared with the WordPerfect facility we toured last year, the individual rooms in Microsoft's usability lab were smaller but there were more of them. In general, the facility gave the impression of being set up for large-scale testing of many products with an attempt to minimize overhead. For example, the control room could be run by a single person who combined the roles of videotape producer, experimenter, and event logger. This was possible because the videotaping mostly relied on fixed camera positions that could be determined before the test instead of being changed while the user worked with the system. Event logging was done with a home-made software tool that allowed the observer to select codes for the user's activities from dynamically expanding hierarchical menus. These codes were synchronized with the video record, allowing the retrieval of video clips of specific events even though the video was still kept on analog videotapes rather than being integrated directly with the test data in a multimedia database. It is typical of the state of the art in the CAUSE field (CAUSE = computer-aided usability engineering) that this logging tool was home-made for this specific laboratory rather than being a generally available system.
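The idea of synchronizing coded events with the video record can be sketched roughly as follows. This is a minimal illustration, not a description of Microsoft's tool, whose internals the talk did not cover. It assumes that each coded event is stored with the tape-counter offset at which it was logged. The class and function names, the example code hierarchy, and the 10-second/20-second clip padding are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    # Hypothetical: seconds since the tape counter was zeroed at session start
    tape_seconds: float
    # Hypothetical: path chosen through nested menus, e.g. ("Error", "Menu", "Wrong item")
    code: tuple


def clips_for(events, code_prefix, lead=10.0, tail=20.0):
    """Return (start, end) tape offsets for every event whose code begins with code_prefix.

    The lead/tail padding (an assumption) gives some context before and after each event.
    """
    prefix = tuple(code_prefix)
    n = len(prefix)
    return [
        (max(0.0, e.tape_seconds - lead), e.tape_seconds + tail)
        for e in events
        if e.code[:n] == prefix
    ]


def fmt(seconds):
    """Format a tape offset as H:MM:SS so it can be found with the VCR counter."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


log = [
    LoggedEvent(65.0, ("Error", "Menu", "Wrong item")),
    LoggedEvent(312.5, ("Comment", "Positive")),
    LoggedEvent(1830.0, ("Error", "Dialog", "Cancelled")),
]

for start, end in clips_for(log, ("Error",)):
    print(fmt(start), "-", fmt(end))
```

Running the sketch prints two tape windows, 0:00:55 - 0:01:25 and 0:30:20 - 0:30:50, around the two logged errors. Matching on a code prefix mirrors the hierarchical menus: asking for ("Error",) retrieves every kind of error, while ("Error", "Menu") narrows the search to menu errors only.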
The conference was very rich in terms of topics and insights, but there were two issues that struck me as being prevalent in several of the presentations: the need for improved communication between usability evaluation and system design, and the advantages of faster and cheaper usability testing as well as other "discount usability engineering" methods aimed at getting usability results fed back to the design cycle in a matter of days rather than weeks or months.
For many years, usability professionals have been complaining that developers and system designers don't pay sufficient attention to usability. Jared Spool from User Interface Engineering (a consulting company) addressed this issue in the opening plenary. He said that developers were not inherently evil, so if they did not understand us, it was probably our fault for not communicating clearly enough. In other words, he suggested that we might be wise to take some of our own medicine and consider the developers as the "users" of usability engineering results. Our standard premise is that if users don't understand something, it is not because they are stupid but because the system is wrong.
The main approach to improving communication with developers seems to be to involve them directly in usability engineering activities. For example, Michael Kahn and colleagues from Hewlett-Packard had designed a new usability inspection method that was mostly based on involving developers as the interface evaluators. One major reason for the success of usability inspections at HP was the ability to piggyback onto existing methods for code inspection and other forms of inspection that were already widely used and respected in the company. Adding one more form of inspection was seen as reasonable, and developers and managers understood the activity and were motivated to undertake it. For this very reason, HP used the term "usability inspection" together with standard terminology from code inspection, so that participants would feel on familiar ground. Within the usability engineering field, however, the term "usability inspection" is a generic term that covers methods like heuristic evaluation, cognitive walkthroughs, standards inspection, etc. I therefore prefer the term "formal usability inspections" for the HP method, to highlight how it differs from many other methods.
Formal usability inspections rely heavily on a separate moderator who plans the inspection meetings in advance and manages them. The moderator and the "owner" of the interface prepare instructions for the inspectors, along with profiles of the main user categories and descriptions of some typical task scenarios. The inspectors then work through the scenarios on their own and note usability problems at each step. To find these problems, they combine several other methods. These include a set of heuristics like those used in heuristic evaluation and a user goal-action-feedback model that can be viewed as a simplified version of cognitive walkthroughs. Finally, the inspectors meet with the moderator and combine their lists of usability problems. This last stage is a major educational experience for many of the developers who participate as inspectors, since they are exposed to a large number of usability problems that they had not found themselves. Thus, in addition to producing an enhanced list of usability problems, the coordination meeting demonstrates that different people find different usability problems and that your own design intuitions are insufficient to cover the ground. A typical formal inspection of a mid-sized project takes about 142 staff-hours. That is somewhat more than a heuristic evaluation, which may take only 40-80 hours because it is less formal. Even so, the engineers appreciated the more formal approach.
As another example of involving developers to improve communication, Mary Beth Butler (now Rettger) from Lotus (now MathWorks) reported on a team method for categorizing usability problems. During user testing, she had some members of the development team sit in the background and note the usability problems they observed on index cards. Each developer observed a few tests, which gave as many developers as possible the chance to participate. The day after the tests, the usability staff and the participating developers met for a 90-minute debriefing session, and all the index cards with usability problems were pasted onto the walls of a meeting room. Everybody then moved cards around until reasonable categories of usability problems emerged, and the problem categories were then rated for severity. After the meeting, the usability specialist wrote up a report with the categorized usability problems and any other data from the user test, for distribution the following day.
By using this method, Lotus had succeeded in cutting the turnaround time for usability reports from about two weeks to two days, with some information reaching the development team the very day after the user test. In general, a major trend at the conference was the need to speed up usability work in order to influence the design while it is happening. Kay Chalupnik from IDS Financial Services (an American Express company) mentioned that they needed instant evaluation turnaround so that they could fax home, the same night, what they had learned during a field study.
Since 1988, I have advocated cutting user testing to about 3-5 subjects per test (and then testing more iterations). One company said that they needed six subjects, but the general feeling at the conference was that studies with larger numbers of test users were becoming rare. Claus Neugebauer from SAP in Germany confirmed this approach. SAP also used 3-5 users, since additional users added only about 15% to what they had learned from the first three. Neugebauer also discussed several ways of reducing the turnaround time from the start of a usability study to the delivery of the final results. Most of the savings came from keeping the testing constant, since SAP had many products that were similar in many ways. The company always used the same questionnaires and other test instruments and kept the task outlines (if not the detailed tasks) constant. This reduced the preparation time for a study from two weeks to two days. Data analysis was cut from five days to three because templates in the statistics and business graphics programs could be reused.
A further reason to speed up the usability process is to increase the amount of actual usability data that is gathered. I have become concerned that usability specialists spend much more time writing reports, going to meetings, and the like than they spend actually creating new usability information. We might use the term "exposure hours" to denote the time spent with the users in user testing (but not setting up the test), inspecting the interface in heuristic evaluation (but not polishing the report), observing real users working in field studies (but not traveling to the customer site or talking with the users' managers), etc. Judging from personal experience and from talking with people at the conference, it may well turn out that exposure hours account for no more than 10% of most usability specialists' working day. Better communication mechanisms and more efficient usability methods are needed to increase this percentage. Unfortunately, audience questions raised one problem with smaller studies and faster data reporting. It is difficult to communicate the usability results to other projects, and to people working on later releases several months or years later. A possible answer to these concerns might be an integrated CAUSE system for computer-aided usability engineering. Such a system would handle design rationale hypertexts that link to usability test data collected automatically (or with minimal overhead). It would also handle other low-overhead reports that could make sense if they were given more context. Unfortunately, almost no research is underway on such environments.
The conference was also filled with practical advice and variations on common usability engineering issues. For example, Monty Hammontree from Sun presented a way to involve widely scattered users like system administrators in user testing without having to travel to their site. Instead of a traditional usability laboratory, Hammontree used a "distributed usability lab" running over a computer network, where an exact copy of the user's screen was slaved to the experimenter's screen and the user's comments were relayed by computerized audioconferencing. Nadine Nayak from HP mentioned that she had used a similar method to test international usability with users in Germany without leaving the U.S. (though she used a speakerphone rather than computerized voice transmission for the user's comments).
As a matter of fact, the difficulty of recruiting representative users for usability testing is so widespread that a session was dedicated to a presentation by Liz Cowan from User Interface Engineering entitled "stalking the wily test subject." Of the approximately 170 people in the room for that session, only 6 had a dedicated person helping them find and recruit test subjects. I am happy to report that I was one of the 6, and that a very competent person helps me with that job. Again, from the perspective of speeding up turnaround time for user testing, it really helps to be able to just say "I need 5 subjects of this-and-this kind" instead of having to spend time working out how to find them. Microsoft had a database of 3,000 local users with various levels of experience in its products who were willing to come in for user testing. Each user normally participated a maximum of three times, to avoid creating "professional test users." Cowan mentioned that the most powerful motivator for recruiting subjects was often the promise of exposure to cool technology in their field rather than cash payment. Still, one should normally give the users something (either money or a T-shirt) to show good faith. Cowan usually needed slightly more than a week to find a set of inexperienced users for a test. She needed as much as three weeks when specific kinds of experience were required (e.g., experience running a special kind of printing press for color jobs). Managers were often inappropriate as test subjects for software intended for their staff. Even so, Cowan found that convincing a manager that the test was valuable was often a shortcut, since it could make it easy to get several subjects from that manager's staff. (See also later results from two hundred companies' experience recruiting subjects for user tests.)
In general, the conference was highly oriented towards practitioners, with only a few researchers in evidence. Perhaps the researchers were just hiding, since the opening plenary included a remark that people without Ph.D.'s were often better at usability engineering. An interesting phenomenon was the presence of participants from at least seven major popular computer magazines (MacUser, MacWorld, PC/Computing, PC Magazine U.K., PC Week, PC World, Windows Magazine), many of which sent several people to the conference. Several of these magazines have recently started including usability test results or other usability engineering data in their review articles, and the rest are presumably thinking about doing so. PC/Computing has even instituted a "PC/Computing Usability Seal of Approval" awarded to products that reach a specified level of usability on several quality criteria. One session was dedicated to this trend, based on a talk about PC World's experience entitled "usability makes the difference to 4 million buyers." The title refers to the number of readers who base their software purchasing decisions on what they read in the reviews. Magazines conduct extensive surveys of their readers (who, after all, are their users). PC World had found that readers rated usability as an important review parameter, on a par with the more traditional issues of speed and features. Initial usability reviews were fairly informal (e.g., having a few users send a fax with fax software to get some usability anecdotes for a review). Recent tests have been more elaborate and have made use of usability labs. One test borrowed an American Airlines simulator to assess the usability of notebook computers on board airplanes. A major concern in reporting summative evaluations was that statistics like confidence intervals are incomprehensible to the average reader.
The solution so far has been to convey much of the same information in natural language to express the strength of the editors' belief in their conclusions. In the two weeks following this session, I have already seen the word "usability" used in major headlines on the covers of two leading PC magazines.
In a session on consulting jobs, John Claudy from American Institutes for Research mentioned that they were often asked to perform summative evaluations for advertising purposes when a vendor wanted to claim that a certain user interface was X% more usable than the competition and wanted a credible source to cite. Initially, they had refused such contracts, but now they did the work under the condition that they could approve all advertising using their name. Of course, a company that loses a comparative usability test will just not advertise the result. A person from the audience commented that some companies do comparative benchmarking for internal use to see whether certain features or interface elements in competing products are any good, but again such results are of course not published, depriving the field of much valuable data. In this session, a distinction was made between consulting and contracting, where contracting involves doing regular work on a project (with the client's main benefit being the added flexibility of using outside staff instead of permanent staff) and consulting involves a more advisory role. Even though nobody wanted to reveal their actual pricing schedules, typical charges mentioned for usability contracting were $75 per hour and a $25,000 fixed price for usability testing with 20 subjects, whereas consultants are known to charge anywhere between $170 and $500 per hour. Of course, as mentioned above, most user tests involve significantly fewer than 20 subjects, and the cost of a "discount user test" with 3-5 users would be much lower than $25,000.
A fun part of the conference was a design competition organized by Jared Spool and Deborah Mrazek from Hewlett-Packard. Unfortunately, I was too busy with other obligations at the conference to actively participate in the competition, but about 70 people formed teams to design a customer-operated ordering kiosk for a hypothetical taco chain. After several elimination rounds, the three winning designs competed in an open session, where each of them was subjected to live user testing by subjects who were timed as they entered a rather large and complex order. The winner was the design that allowed the user to complete the order in the shortest amount of time. Of course, since the winning times were determined by a single test user for each design, there is no guarantee that the winning design would in fact be the fastest over a range of users, but then this competition was more in the spirit of a fun conference event than a commercial comparative evaluation. The design and testing were done with a low-fidelity prototyping approach, using colored notepads for the buttons and signs on the kiosk, and using humans to simulate the computational processes. For example, one design used highlighting on the display to indicate the "current item" in the order (that could be modified by, e.g., requesting diet or light versions of the item), and this highlighting was simulated by having one of the designers hold a yellow transparency cutout over the line being "highlighted." Not only was the design competition entertaining for the audience and a motivating team-building exercise for the participants, it also taught many people the power of low-fidelity prototyping techniques compared with the elaborate computerized tools used by most usability groups.
In general, UPA'93 was a very enjoyable conference with a wealth of valuable information. Because of the increased number of participants and the growing activity in the field, this year's conference ran two parallel tracks, so it was impossible to take in the entire conference. Last year's conference had a single track and felt more like a large specialized workshop than a conference, which I personally enjoyed more. On the other hand, it is hard to turn people away when they want to come and the field is in a phase of explosive expansion.
All conference sessions were videotaped by Microsoft's extremely professional A/V crew, resulting in a set of 18 VHS video tapes that is available for $255.00 from American Production Services, 2247 15th Ave. W, Seattle, WA 98119, tel. (206) 282-1776, fax (206) 282-3535. Actually, the videos are cheaper than the conference registration fee, but then they do not include the great coffee and extensive networking opportunities offered to those physically present at the conference.
The list of videos (and thus the list of conference sessions) is as follows:
Volume 1, Opening Panel: An Exploration of the Hot Issues of Usability. Don Ballman (Mead Data Central), Mary Dieli (Microsoft), Jakob Nielsen (Bellcore), Janice Rohn (SunSoft), Jared Spool (User Interface Engineering)
Volume 2, Practical Tips for Enhancing the Power of UI Groups in Software Development Organizations. Bob Vallone (GO Corporation)
Volume 3, Discount Usability Engineering Six Years Later. Jakob Nielsen (Bellcore)
Volume 4, Stalking the Wily Test Subject (how to get the right subjects). Liz Cowan (User Interface Engineering)
Volume 5, Tailoring Data Collection Methodologies to Meet the Research and Business Objectives of Usability Practitioners. Bradford Hesse (American Institutes for Research)
Volume 6, Usability Makes the Difference to Four Million Buyers (usability in the trade press). Dean Andrews, Thomas Grubb, and Greg Smith (PC World Magazine)
Volume 7, Using Low Fidelity Prototyping as a Usability Tool. Jared Spool (User Interface Engineering)
Volume 8, Engineering Usability into the Lab Design Process. Denise C.R. Benel (National Information Technology Center), Richard Horst (Man-Made Systems Corp.), Russell Benel (The Mitre Corp.)
Volume 9, Working with Many Masters (being an interface consultant). John Claudy (American Institutes for Research), Larry Marine (IQ Cognitive Engineering), Joan Lasselle (Ramsey Inc.), Judith Ramey (University of Washington)
Volume 10, Structuring Usability within Organizations. Sue Braun (Northwestern Mutual Life), Janice Rohn (SunSoft Inc.)
Volume 11, Usability Lab Tools. Kent Sullivan (Microsoft), Nigel Bevan (National Physical Laboratory), Mary Beth Butler (Lotus), Monty Hammontree (SunSoft)
Volume 12, Methods for Investigating Usability During Early Product Design. Judith Ramey (University of Washington), Stephanie Rosenbaum (Tec-Ed)
Volume 13, What's Your Problem (how to define what aspects of usability to look for). Amy Kanerva, Jill Rojek (Microsoft)
Volume 14, Usability Inspections at Hewlett-Packard. Michael Kahn, Rose Marchetti, and Amanda Prail (Hewlett-Packard)
Volume 15, The Designer's View. Mary Beth Butler (Lotus), Ken Dye (Microsoft), Mark Gowans (WordPerfect), Andrew Kwatinetz (Microsoft), Alice Mead (Lotus), Marshall McClintock (Microsoft), Jack Young (WordPerfect)
Volume 16, Why Test Documentation? JoAnn Hackos (Comtech Services)
Volume 17, Usability Testing: It Seems to be a HIT (speeding up usability). Claus Neugebauer, Nicola Spielmann (Ergonomics Group SAP AG)
Volume 18, Closing Panel: Usability Horoscope of the 90s. Kay Chalupnik (IDS American Express), John Claudy (American Institutes for Research), Julie Humburg (Intuit Inc.), Jakob Nielsen (Bellcore), Irene Wong (Apple), Jack Young (WordPerfect)