We measured five usability metrics for each version of our website:
Task time was the number of seconds it took users to find answers to specific questions about the content.
Errors was a percentage score based on the number of incorrect answers users gave to questions that had a known answer. (One question asked users to name their favorite tourist attraction; because it had no single correct answer, it was not scored for errors.)
Memory comprised two measures from an exam given to users after they had finished using the site. Recognition memory was a percentage score based on the number of correct answers minus the number of incorrect answers to five multiple-choice questions. Recall memory was a percentage score based on the number of items correctly recalled minus the number incorrectly recalled; users were asked to list as many of the tourist attractions discussed on the site as they could remember.
Time to recall site structure was the number of seconds it took users to draw a sitemap. This measures how well users had understood the information architecture: if they understood it well, they would draw it quickly; if they understood it poorly, they would need to think longer.
Subjective satisfaction was determined from participants' answers to a questionnaire. Each question used a 10-point rating scale. Four satisfaction criteria were averaged to derive the subjective satisfaction score: perceived quality (e.g., "How satisfied are you with the site's quality of language?"), perceived ease of use (e.g., "How easy is it to find specific information in this website?"), likability (e.g., "The term 'fun to use' describes the site very well"), and user affect (e.g., "How tired do you feel right now?").
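The scoring rules in the list above can be sketched in Python. This is a minimal illustration, not the study's actual procedure: the function names are hypothetical; percentages are assumed to be taken over the number of scored questions (errors, recognition) or over an assumed `total_items` count (recall, whose denominator the text does not state); and each satisfaction criterion is assumed to be the mean of its own 10-point ratings, already oriented so that higher is better, before the four criteria are averaged.

```python
from statistics import mean


def error_score(answers: dict[str, str], key: dict[str, str]) -> float:
    """Percentage of incorrect answers among answered questions with a known answer.

    Questions absent from `key` (e.g. "favorite tourist attraction") are not scored.
    """
    scored = [q for q in answers if q in key]
    if not scored:
        return 0.0
    wrong = sum(1 for q in scored if answers[q] != key[q])
    return 100.0 * wrong / len(scored)


def recognition_score(correct: int, incorrect: int, n_questions: int = 5) -> float:
    """(correct - incorrect) as a percentage of the multiple-choice questions."""
    return 100.0 * (correct - incorrect) / n_questions


def recall_score(listed: set[str], discussed: set[str], total_items: int) -> float:
    """(correctly recalled - incorrectly recalled) as a percentage of total_items.

    `total_items` is an assumed denominator; the text does not specify one.
    """
    hits = len(listed & discussed)    # attractions that really were on the site
    misses = len(listed - discussed)  # items the user listed that were not on the site
    return 100.0 * (hits - misses) / total_items


def satisfaction_score(ratings: dict[str, list[int]]) -> float:
    """Mean of the four criteria, each criterion being the mean of its 1-10 ratings.

    Assumes reverse-worded items (e.g. "How tired do you feel?") were already flipped.
    """
    return float(mean(mean(r) for r in ratings.values()))
```

For example, a user who answers four of the five multiple-choice questions correctly and one incorrectly gets a recognition score of (4 - 1) / 5 = 60%. Because incorrect answers are subtracted, these memory scores can go negative when wrong answers outnumber right ones.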
Note that the subjective metric assessed how well users thought the site worked, not how well the users actually performed. It was quite possible for a user to be very slow at answering the questions and still say that he or she thought that it was very easy to find information on the site.
Overall usability of a site was calculated as the geometric average of these five measures. Each measure was normalized relative to the performance measured for the control condition (for example, if users could remember 5 things in the control condition but 6 things in one of the other conditions, then that condition received a 120% score for memory).
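The normalization and averaging can be sketched as follows. The metric names are hypothetical, and two choices are assumptions the text does not spell out: memory is treated as a single combined score, and for metrics where a lower raw value is better (task time, errors, time to draw the sitemap) the ratio is inverted as control / condition, so that 1.2 always means 20% better than control. Zero raw values would need special handling that this sketch omits.

```python
import math

# Metrics where a smaller raw value is better (seconds or error percentage).
# Inverting them is an assumption; the text only says each measure was
# normalized relative to the control condition.
LOWER_IS_BETTER = {"task_time", "errors", "structure_time"}


def normalize(condition: dict[str, float], control: dict[str, float]) -> dict[str, float]:
    """Express each metric as a ratio to control (1.0 = same as control, >1 = better)."""
    ratios = {}
    for name, value in condition.items():
        base = control[name]
        ratios[name] = base / value if name in LOWER_IS_BETTER else value / base
    return ratios


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product, computed as exp(mean of logs) to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_usability(condition: dict[str, float], control: dict[str, float]) -> float:
    """Equal-weight geometric mean of the five normalized metrics."""
    return geometric_mean(list(normalize(condition, control).values()))


if __name__ == "__main__":
    # Hypothetical raw scores; only memory differs (5 things vs 6 things).
    control = {"task_time": 120.0, "errors": 20.0, "memory": 5.0,
               "structure_time": 90.0, "satisfaction": 6.0}
    condition = dict(control, memory=6.0)
    print(normalize(condition, control)["memory"])
    print(overall_usability(condition, control))
```

With only memory changing from 5 to 6, the memory ratio is 1.2 and overall usability is 1.2^(1/5), about 1.037. One property of the geometric mean that suits this use is that it is scale-free: doubling any single ratio multiplies the overall score by the same factor, 2^(1/5), whichever metric it is.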
In our study, we gave equal weight to each of the five usability metrics when computing overall usability. Depending on the goal of a project, it may be better to use different weights:
An educational site might give added weight to the memory measure and perhaps also some added weight to the time to recall site structure.
Since intranets are highly performance oriented and should enhance employee efficiency, an intranet project would give the highest weight to task time and errors (a site for customer-service reps might place the highest weight on avoiding errors).
A leisure site would place the highest weight on subjective satisfaction and might give zero weight to errors and very low weight to other performance metrics.
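Project-specific weights can be applied with a weighted geometric mean, exp(sum of w_i * ln r_i / sum of w_i), which reduces to the equal-weight version used in the study when every weight is 1. The profile names and weight values below are hypothetical illustrations of the three examples above, not figures from the study; a metric with zero weight, such as errors in the leisure profile, simply drops out of the product.

```python
import math

# Hypothetical weight profiles illustrating the examples above;
# the numbers are not from the study.
PROFILES = {
    "equal":       {"task_time": 1,   "errors": 1, "memory": 1,   "structure_time": 1,   "satisfaction": 1},
    "educational": {"task_time": 1,   "errors": 1, "memory": 3,   "structure_time": 2,   "satisfaction": 1},
    "intranet":    {"task_time": 3,   "errors": 3, "memory": 1,   "structure_time": 1,   "satisfaction": 1},
    "leisure":     {"task_time": 0.5, "errors": 0, "memory": 0.5, "structure_time": 0.5, "satisfaction": 4},
}


def weighted_geometric_mean(ratios: dict[str, float], weights: dict[str, float]) -> float:
    """Return exp(sum(w * ln r) / sum(w)) over the weighted metrics.

    Metrics with zero weight are skipped, so their ratios never enter the log.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    log_sum = sum(w * math.log(ratios[name]) for name, w in weights.items() if w > 0)
    return math.exp(log_sum / total)


if __name__ == "__main__":
    # Normalized ratios for one hypothetical condition (1.0 = same as control).
    ratios = {"task_time": 1.1, "errors": 0.9, "memory": 1.2,
              "structure_time": 1.0, "satisfaction": 1.3}
    for name, weights in PROFILES.items():
        print(f"{name:12s} {weighted_geometric_mean(ratios, weights):.3f}")
```

Dividing by the total weight keeps the result on the same ratio scale as the unweighted version, so scores computed under different profiles can still be read as "x% better or worse than control."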