This paper was originally presented as a keynote at the IFIP INTERACT'95 International Conference on Human-Computer Interaction (Lillehammer, Norway, June 27, 1995).
If we consider usability engineering as a system, a design, or a set of interfaces with which development managers have to interact, then it obviously becomes the usability professionals' responsibility to design that system to maximize its communication with its users. My claim is that any problems in getting usability results used more in development are more due to lack of usability of the usability methods and results than they are caused by evil development managers who deliberately want to torment their users.
In order to get usability methods used more in real development projects, we must make the usability methods easier to use and more attractive. One way of doing so is to consider the way current usability methods are being used and what causes some methods to be used and others to remain "a good idea which we might try on the next project." As an example of such studies I will report on a study of what causes usability inspection methods to be used.
Usability inspection methods were first described in formal presentations in 1990 at the CHI'90 conference where papers were published on heuristic evaluation (Nielsen and Molich, 1990) and cognitive walkthroughs (Lewis et al., 1990). Now, only four to five years later, usability inspection methods have become some of the most widely used methods in the industry. As an example, in his closing plenary address at the Usability Professionals' Association's annual meeting in 1994 (UPA'94), Ken Dye, usability manager at Microsoft, listed the four major recent changes in Microsoft's approach to usability as:
"I am working [...] with an airline client. We have performed so far, 2 iterations of usability [...], the first being a heuristic evaluation. It provided us with tremendous information, and we were able to convince the client of its utility [...]. We saved them a lot of money, and are now ready to do a full lab usability test in 2 weeks. Once we're through that, we may still do more heuristic evaluation for some of the finer points."
Work on the various usability inspection methods obviously started several years before the first formal conference presentations. Even so, current use of heuristic evaluation and other usability inspection methods is still a remarkable example of rapid technology transfer from research to practice over a period of very few years.
Of the 85 mailed questionnaires, 4 were returned by the post office as undeliverable, meaning that 81 course attendees actually received the questionnaire. 42 completed questionnaires were received, representing a response rate of 52%.
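As a quick check of this arithmetic, here is a minimal Python sketch. The counts come from the paragraph above; the variable names are illustrative only and are not part of the survey.

```python
# Counts taken from the text; variable names are illustrative assumptions.
mailed = 85          # questionnaires sent out
undeliverable = 4    # returned by the post office
completed = 42       # completed questionnaires received

delivered = mailed - undeliverable      # attendees who actually received one
response_rate = completed / delivered   # fraction of recipients who replied

print(f"{delivered} delivered, response rate {response_rate:.0%}")
# prints: 81 delivered, response rate 52%
```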
The questionnaire was mailed in mid-November 1993 (6.5 months after the tutorial) with a reminder mailed in late December 1993 (8 months after the tutorial). 21 replies were received after the first mailing, and another 21 replies were received after the second mailing. The replies thus reflect the respondents' state approximately seven or eight months after the tutorial.
With a response rate of 52%, it is impossible to know for sure what the other half of the course participants would have replied if they had returned the questionnaire. However, data from the two response rounds allows us to speculate on possible differences based on the assumption that the non-respondents would be more like the second-round respondents than the first-round respondents. Table 1 compares these two groups on some relevant parameters. The first conclusion is that none of the differences between the groups are statistically significant, meaning that it is likely that the respondents are fairly representative of the full population. Even so, there might be a slight tendency for the respondents to have been associated with larger projects than the non-respondents and to have been more experienced with usability methods than the non-respondents. Thus, the true picture with respect to the full group of tutorial participants might reflect slightly less usage of the usability inspection methods than reported here, but probably not much less.
| Question | First-round Respondents | Second-round Respondents | p |
|---|---|---|---|
| Usability effort on project in staff-years | 3.1 | 1.3 | .2 |
| Had used user testing before the course | 89% | 70% | .1 |
| Had used heuristic evaluation after the course | 65% | 59% | .7 |
| Number of different inspection methods used after course | 2.2 | 1.8 | .5 |

Table 1. Comparison of respondents from the first questionnaire round with the respondents from the second round. None of the differences between groups are statistically significant.
The median ratio between the usability effort of the respondents' latest project and the project's size in staff-years was 7%. Given the sample sizes, this is equivalent to the 6% of development budgets that was found to be devoted to usability in 31 projects with usability engineering efforts in a survey conducted in January 1993 (Nielsen, 1993). This result further adds to the speculation that our respondents are reasonably representative.
| Method | Respondents Using Method After INTERCHI | Times Respondents Had Used the Method (Whether Before or After the Course) | Mean Rating of Benefits from Using Method |
|---|---|---|---|
| User testing | 55% | 9.3 | 4.8 |
| Heuristic evaluation | 50% | 9.1 | 4.5 |
| Feature inspection | 31% | 3.8 | 4.3 |
| Heuristic estimation | 26% | 8.3 | 4.4 |
| Consistency inspection | 26% | 7.0 | 4.2 |
| Standards inspection | 26% | 6.2 | 3.9 |
| Pluralistic walkthrough | 21% | 3.9 | 4.0 |
| Cognitive walkthrough | 19% | 6.1 | 4.1 |

Table 2. Proportion of the respondents who had used each of the inspection methods and user testing in the 7-8 month period after the course, the number of times respondents had used the methods, and their mean rating of the usefulness of the methods on a 1-5 scale (5 best). Methods are sorted by frequency of use after the course.
Respondents were also asked how many times they had used the methods so far, whether before or after the course. Table 2 shows the mean number of times each method had been used by those respondents who had used it at all. This result is probably a less interesting indicator of method usefulness than is the proportion of respondents who had used the methods in the fixed time interval after the course, since it depends on the time at which the method was invented: older methods have had time to be used more than newer methods.
Finally, respondents were asked to judge the benefits of the various methods for their project(s), using the following 1-5 scale:
1 = completely useless
2 = mostly useless
3 = neutral
4 = somewhat useful
5 = very useful
The results from this question are also shown in Table 2. Respondents rated only those methods with which they had experience, so not all methods were rated by the same number of people. The immediate conclusion from this question is that all the methods were judged useful, getting ratings of at least 3.9 on a scale where 3 was neutral.
The statistics for proportion of respondents having used a method, their average usefulness rating of a method, and the average number of times they had used the method were all highly correlated. This is only to be expected, as people would presumably tend to use the most useful methods the most. Figure 1 shows the relation between usefulness and times a method was used (r = .71, p < .05) and Figure 2 shows the relation between usefulness and the proportion of respondents who had tried a method whether before or after the course (r = .85, p < .01). Two outliers were identified: Feature inspection had a usefulness rating of 4.3 which on the regression line would correspond to being used 6.7 times though in fact it had only been used 3.8 times on the average by those respondents who had used it. Also, heuristic estimation had a usefulness rating which on the regression line would correspond to having been tried by 56% even though it had in fact only been used by 38%. These two outliers can be explained by the fact that these two methods are the newest and least well documented of the inspection methods covered in the course.
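The first of these correlations can be approximated directly from the values printed in Table 2. The sketch below is only an illustration: it uses a plain Pearson product-moment correlation on the rounded table values, and the helper name `pearson_r` is an assumption introduced here. Because the table values are rounded, the result should land close to, but not necessarily exactly on, the reported r = .71.

```python
import math

# (mean times used, mean benefit rating) per method, copied from Table 2.
table2 = {
    "User testing": (9.3, 4.8),
    "Heuristic evaluation": (9.1, 4.5),
    "Feature inspection": (3.8, 4.3),
    "Heuristic estimation": (8.3, 4.4),
    "Consistency inspection": (7.0, 4.2),
    "Standards inspection": (6.2, 3.9),
    "Pluralistic walkthrough": (3.9, 4.0),
    "Cognitive walkthrough": (6.1, 4.1),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


times_used = [times for times, _ in table2.values()]
ratings = [rating for _, rating in table2.values()]

r = pearson_r(ratings, times_used)
print(f"usefulness vs. times used: r = {r:.2f}")
```

With only eight methods as data points, a single method can move the coefficient noticeably, which is one reason the outliers discussed above are worth singling out.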
The figures are drawn to suggest that usage of methods follows from their usefulness to projects. One could in fact imagine that the respondents gave the highest ratings to the methods they had personally used the most in order to avoid cognitive dissonance, meaning that causality worked in the opposite direction from that implicitly shown in the figures. However, the correlation between the individual respondents' ratings of the usefulness of a method and the number of times they had used the method themselves is very low (r=.05), indicating that the respondents judged the usefulness of the methods independently of how much they had used them personally. There is only a high correlation in the aggregate between the mean values for each method. Thus, we conclude that the reason for this high correlation is likely to be that usability methods are used more if they are judged to be of benefit to the project. This is not a surprising conclusion but it does imply that inventors of new usability methods will need to convince usability specialists that their methods will be of benefit to concrete development projects.
| Method | Respondents using the method as it was taught |
|---|---|
| Pluralistic walkthrough | 27% |
| Heuristic estimation | 25% |
| Heuristic evaluation | 24% |
| Standards inspection | 22% |
| Cognitive walkthrough | 15% |
| Feature inspection | 12% |
| Consistency inspection | 0% |

Table 3. Proportion of respondents who used the methods the way they were taught. For each method, the proportion is computed relative to those respondents who had used the method at least once.
The survey showed that only 18% of respondents used the methods the way they were taught. 68% used the methods with minor modifications, and 15% used the methods with major modifications (numbers averaged across methods). In general, as shown in Table 3, the simpler methods seemed to have the largest proportion of respondents using them as they were taught. Of course, it is perfectly acceptable for people to modify the methods according to their specific project needs and the circumstances in their organization. The high degree of method modification does raise one issue with respect to research on usability methodology, in that one cannot be sure that different projects use the "same" methods the same way, meaning that one will have to be careful when comparing reported results.
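The 18% figure is consistent with a simple average of the per-method proportions in Table 3. The sketch below assumes an unweighted mean across the seven inspection methods; the survey may have computed the average differently, so treat this as a plausibility check rather than the original calculation.

```python
# Percentage of users applying each method as taught, copied from Table 3.
as_taught = {
    "Pluralistic walkthrough": 27,
    "Heuristic estimation": 25,
    "Heuristic evaluation": 24,
    "Standards inspection": 22,
    "Cognitive walkthrough": 15,
    "Feature inspection": 12,
    "Consistency inspection": 0,
}

# Assumption: every method counts equally, regardless of how many
# respondents had used it.
mean_as_taught = sum(as_taught.values()) / len(as_taught)

print(f"mean proportion using a method as taught: {mean_as_taught:.0f}%")
# prints: mean proportion using a method as taught: 18%
```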
The normal recommendation for heuristic evaluation is to use 3-5 evaluators. Only 35% of the respondents who used heuristic evaluation did so, however. 38% used two evaluators and 15% only used a single evaluator. The histogram in Figure 3 shows the distribution of number of evaluators used for heuristic evaluation.
With respect to user testing, even though 35% did use 3-6 test participants (which would normally be referred to as discount usability testing), fully 50% of the respondents used 10 participants or more. Thus, "deluxe usability testing" is still being used to a great extent. The histogram in Figure 4 shows the distribution of number of test participants used for a test.
Figure 3. Histogram of the number of evaluators normally used by the respondents for heuristic evaluations.
Figure 4. Histogram of the number of test users normally used by the respondents for user testing.
As one might have expected, the participants' motivation for taking the course had a major impact on the degree to which they actually used the inspection methods taught in the course. People who expected to need the methods for their current project indeed did use the methods more than people who expected to need them for their next project, who again used more methods than people who did not anticipate any immediate need for the methods. Table 4 shows the number of different inspection methods used in the (7-8 month) period after the course for participants with different motivation. The table also shows the number of inspection methods planned for use during the next six months. Here, the participants with pure academic or intellectual interests have the most ambitious plans, but we still see that people who had the most immediate needs when they originally took the course plan to use more methods than people who had less immediate needs.
| Motivation for taking the course | Proportion of the respondents | Number of different inspection methods used since the course | Number of different inspection methods planned for use during the next six months |
|---|---|---|---|
| Specific need to know for current project | 31% | 3.0 | 2.2 |
| Expect to need to know for next project | 21% | 1.4 | 1.7 |
| Expect the topic to be important in future, but don't anticipate any immediate need | 14% | 1.2 | 1.3 |
| Pure academic or intellectual interest | 12% | 2.0 | 3.4 |

Table 4. Relation between the main reason people took the course and the number of different methods they have used.
In addition to the reasons listed in Table 4, 22% of the respondents indicated other reasons for taking the course. 5% of the respondents wanted to see how the instructor presented the materials in order to get material for use in their own classes and 5% wanted to validate their own experience with usability inspection and/or were developing new inspection methods. The remaining 12% of the respondents were distributed over a variety of other reasons for taking the course, each of which was only given by a single respondent.
| | Cognitive walkthrough | Consistency inspection | Feature inspection | Heuristic evaluation | Heuristic estimation | Pluralistic walkthrough | Standards inspection | User testing | Proportion of all comments |
|---|---|---|---|---|---|---|---|---|---|
| Method generates good/bad information | 9 / 1 | 5 / 0 | 5 / 0 | 3 / 1 | 4 / 2 | 5 / 0 | 6 / 0 | 20 / 0 | 33% |
| Resource and/or time requirements | 1 / 3 | 1 / 3 | 4 / 1 | 8 / 1 | 1 / 2 | 0 / 11 | 1 / 0 | 0 / 2 | 21% |
| Expertise and/or skills required | 1 / 8 | 1 / 3 | 0 / 4 | 5 / 1 | 0 / 3 | 1 / 4 | 17% | ||
| Specific characteristics of individual project | 2 / 0 | 2 / 4 | 1 / 2 | 2 / 1 | 0 / 6 | 1 / 0 | 11% | ||
| Communication, team-building, propaganda | 2 / 0 | 1 / 0 | 3 / 0 | 5 / 0 | 4 / 0 | 8% | |||
| Method mandated by management | 1 / 0 | 1 / 0 | 1 / 0 | 1 / 0 | 1 / 0 | 2 / 0 | 4% | ||
| Interaction between multiple methods | 3 / 0 | 1 / 0 | 1 / 0 | 0 / 1 | 3% | ||||
| Other reasons | 0 / 2 | 2 / 0 | 2% | ||||||
| Proportion of comments that were positive | 48% | 55% | 63% | 88% | 60% | 50% | 45% | 93% | |

Table 5. Classification of the 186 free-form comments made by respondents when asked to explain why they used (or did not use) a method. In each cell, the first number indicates reasons given for using a method and the second number (after the slash) indicates reasons given for not using a method (empty cells indicate that nobody made a comment about a method in that category).
Table 5 summarizes the free-form comments according to the following categories:
The two following criteria in the table are both related to the ease of using the methods: resources and time as well as expertise and skill needed. The respondents view heuristic evaluation as superior in this regard and express reservations with respect to cognitive walkthroughs and pluralistic walkthroughs. Remember that the survey respondents came from projects that had already decided to use usability engineering and that had invested in sending staff to an international conference. The situation in many other organizations is likely to make the cost and expertise issues even more important elsewhere.
Furthermore, methods should be flexible and able to adapt to changing circumstances and the specific needs of individual projects. The free-form comments analyzed in Table 5 show project needs as accounting for 11% of the reasons listed for use or non-use of a method, but a stronger indication of the need for adaptability is the statistic that only 18% of respondents used the methods the way they were taught, whereas 68% required minor modifications and 15% required major modifications.
A good example of flexibility is the way heuristic evaluation can be used with varying numbers of evaluators. The way the method is usually taught (Nielsen, 1994a) requires the use of 3-5 evaluators who should preferably be usability specialists. Yet, as shown in Figure 3, many projects were able to use heuristic evaluation with a smaller number of evaluators. Of course, the results will not be quite as good, but the method exhibits "graceful degradation" in the sense that small deviations from the recommended practice only result in slightly reduced benefits.
The survey very clearly showed that the way to get people to use usability methods is to get to them at the time when they have specific needs for the methods on their current project (Table 4). This finding again makes it easier to transfer methods that have wide applicability across a variety of stages of the usability lifecycle. Heuristic evaluation is a good example of such a method since it can be applied to early paper mock-ups or written specifications as well as later prototypes, ready-to-ship software, and even the clean-up of legacy mainframe screens that need to be used for a few more years without available funding for major redesign.
A final issue in technology transfer is the need for aggressive advocacy. Figure 1 shows that heuristic evaluation is used somewhat more than its rated utility would justify and that feature inspection is used much less than it should be. The most likely reason for this difference is that heuristic evaluation has been the topic of many talks, panels, seminars, books, and even satellite TV shows (Shneiderman, 1993) over the last few years, whereas feature inspection has had no vocal champions in the user interface community.