The conference proceedings can be bought online.
"Do you want to spend the rest of your life selling sugared water or do you want a chance to change the world?" By asking this question, the then chairman of Apple Steve Jobs enticed John Sculley away from his fast track career in Pepsi. Similar considerations form a large part of my own motivation for working in the user interface field. As Ben Shneiderman said in his keynote speech at CHI'86: Our work will influence how people live in the next century.
Against this background, CHI'88 was somewhat disappointing. I did not have the same feeling of breakthrough enthusiasm that I have had at several earlier CHIs. Of course some new developments were reported and nice research results presented. But in most cases they were refinements of earlier work. My real problem is probably just that progress in the field this year has been only 10% instead of the 30-40% I am used to, and of course most fields would be overjoyed to see 10% annual progress rates.
Some new stuff was presented such as the pie menus studied by Callahan, Hopkins, Weiser, and Shneiderman from the University of Maryland. When used as pop-up menus, pies have the advantage that any menu item can be selected by equally small movements of the mouse and the study did indeed show that users performed about 15% faster using a pie menu than using a linear menu. Pie menus also have some potential disadvantages, especially when used with many menu items or in cases that call for hierarchical pop-ups.
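The geometric idea behind a pie menu can be sketched in a few lines of Python. This is only an illustration and not the implementation used in the Maryland study: the function name `pie_menu_pick`, the `dead_zone` parameter, and the convention that slice 0 points straight up are all assumptions made for the example. The point is that selection depends only on the direction of the mouse movement, so every item is equally close to the starting position.

```python
import math

def pie_menu_pick(dx, dy, n_items, dead_zone=5.0):
    """Return the index of the slice under the pointer, or None in the dead zone.

    dx, dy: pointer displacement from the menu centre in pixels, using
    screen coordinates (y grows downward). Slice 0 is centred on "up" and
    indices increase clockwise. All of these conventions are assumptions.
    """
    # Very small movements select nothing, so the user can cancel.
    if math.hypot(dx, dy) < dead_zone:
        return None
    # Angle measured clockwise from straight up, normalised to [0, 2*pi).
    angle = math.atan2(dx, -dy) % (2 * math.pi)
    slice_width = 2 * math.pi / n_items
    # Shift by half a slice so that slice 0 straddles the "up" direction.
    return int((angle + slice_width / 2) // slice_width) % n_items
```

With eight items, a movement straight to the right (`dx=20, dy=0`) falls in slice 2 and a movement straight down falls in slice 4. The sketch also hints at the disadvantage mentioned above: as `n_items` grows, each slice becomes narrower and harder to hit.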
In spite of this and some other novelty items, the main feel of CHI'88 was that of improvements of earlier stuff rather than revolutionary new discoveries. Every year, I am able to summarize the main theme of a CHI conference and this year I am not in doubt that the theme was that we are currently slowed down to steady, evolutionary progress in the user interface field.
In spite of these comments, I still enjoyed CHI'88 tremendously. As discussed below, lots of interesting things happened, and as usual "everybody" in the user interface field was there. There were plenty of opportunities to meet people and talk (which is the most important aspect of a conference anyway) and twice I even had the quite un-Danish experience of meetings during "business breakfast" at 7:30 in the morning.
One day I was hanging out in one of the courtesy suites enjoying the free Diet Pepsi when suddenly a person whom I had never met before asked me if I were the author of an electronic report on hypertext. I said yes, indeed I am, but how do you know? It turned out that he recognized me from the scanned photo of myself which I had been vain enough to include in the electronic document to make reading it more of a multi-media experience. I furthermore learned that he had gotten hold of the file by downloading it from the GEnie network service (to which it must have been uploaded by yet another person as I don't have a GEnie account). So the moral of this story is that electronic publishing of online documents may work fairly efficiently and get you a wide distribution even though it also involves a lot of difficulties and is not yet "officially" recognized.
Other next-generation concepts discussed at this conference were the use of video film (in Palenque, see report below) and more knowledge-based interactions. But these effects have not entered the mainstream of user interface design.
Several applications were shown. One of the most immediately understandable was a finger painting system where the color used was determined by the number of fingers shown. I asked Krueger why the system deposited the paint over the user's finger rather than under it which might have seemed more natural. His answer was that sometimes one would not want the hand to obscure the work being drawn.
The painting was cleared by spreading all fingers. Some of these gestures seemed very natural, including the clearing gesture. Gestures in other applications were not that obvious but still frequently very nice, such as having a straight line appear between two fingertips in a CAD-system. One problem they had in developing their gestural language was in parsing hand movements to determine when you just want to move your hand to another part of the screen and when you want to issue a command. In general, there seemed not to be much consistency in the interaction techniques used in the different parts of the system with the exception of the technique of reaching to the upper right corner of the screen to pull out the main menu.
Videodesk is really a special version of the older Videoplace system where the computer is an entire room which you enter to use your body as input device. As such, Videodesk was yet another example of the evolutionary trend at this CHI. The full Videoplace system was not available for the conference as it was installed as part of a large exhibit on Computers and Art at the IBM Building in New York. This was a very interesting exhibition which I had seen by accident before coming to Washington: I had originally jumped on the M2 bus to go uptown to the Metropolitan Museum when I looked out the window and saw a poster at the IBM Building for their special exhibition. Yet another advantage of not using a constrained "transport interface" like the subway: You can change your mind.
All these mouse movements on the video were shown coordinated with An der schönen, blauen Donau as soundtrack. If I should pick a piece of music for my own use of the mouse, it might be C Jam Blues because of the many double-clicks inherent in the overloading of functionality in my single-mouse system.
The many funny episodes and the dainty sound track in the waltzing mouse video may detract somewhat from the very real underlying problem of learning how to use a mouse: One just has to acknowledge that the mouse is not a very intuitive input device. Recently I had occasion to advise someone on the design of a simple, user-driven information system for museums and I had to convince them that they could not just use a mouse as a pointing device for people coming in off the street.
I am certainly able to remember the help provided by Jeff, but this may of course be because I know him personally and was surprised to see him in a computer system. Other aspects of the multi-media help seemed more clumsy, especially the use of synthesized voice which was slow and hard to understand.
Some demos were given in the video of how to build up some reasonable level of functionality using literal pixels: One could build a drawing of a keyboard on the screen and bind it to the real keyboard so that whenever one edited the font on the screen, one changed what would be typed. But in general I must say that the demo was not very convincing as a usable system for real users. It was a convincing exploration of an unusual research idea and it certainly made me think.
One small interesting interaction technique shown in Viewpoint was that the cursor was semi-transparent so that it did not hide what was underneath it. This seems a more natural way of getting this effect than the Videodesk approach of letting the paint go over the painting finger rather than under it.
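One common way to get a see-through cursor on a color display is alpha blending, sketched below in Python. This is an assumption for illustration only: the video did not say how Viewpoint produced its effect, and the function name `blend_pixel` and the default opacity of 0.5 are made up for the example.

```python
def blend_pixel(cursor_rgb, background_rgb, alpha=0.5):
    """Blend one cursor pixel over one background pixel.

    alpha is the cursor opacity: 0.0 leaves the background untouched,
    1.0 gives an opaque cursor that hides whatever is underneath.
    Colors are (red, green, blue) tuples with components in 0..255.
    """
    return tuple(
        round(alpha * c + (1.0 - alpha) * b)
        for c, b in zip(cursor_rgb, background_rgb)
    )
```

For example, a black cursor pixel at half opacity over the background color `(200, 100, 50)` gives `(100, 50, 25)`, so the underlying drawing stays visible through the cursor.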
Otherwise the videos were not all that new to me: Xerox Colab, NASA's head-mounted display, Myers' Peridot, Anderson's intelligent tutors: All very interesting, but I had seen the systems before, in many cases at earlier CHI conferences. Kristee Kreitman from Apple showed a video of her very nice MacWorld Information Kiosk (which I had also seen before) which used a very graphical interface and real-world metaphors to structure information about a large trade show. Kreitman mentioned on the video that the system had been developed by rapid iterative redesign (using HyperCard) based on informal empirical testing but unfortunately she did not show any examples of the usability problems discovered during these tests or how they were solved. We only saw the perfect result (which was of course impressive) but not the process (from which we would have learned more).
Take the Nielsen challenge: Dare show several versions of your system and explain to us what you learned from one iteration to the next. There is no shame in not being able to arrive at a perfect design the first time but hopefully we can see that the final version is better than the first.
    repeat 1000
        tell bird1 tick
The bird problem had been proposed by Clayton Lewis as his easy problem, and he indeed solved it very easily by defining a few spreadsheet cells. The actual birds were only stick figures however, while Randy Smith in his ARK solution animated some nicely drawn pigeons. In general, Smith showed the most flashy and immediately convincing solutions. On the other hand, his system relied very much on a large number of predefined objects which made it somewhat harder for the outsider to quickly understand what the system could and could not do.
In the concluding discussion, Borning said that one in the long run might want a more hybrid programming model but that it is interesting to try to push a single concept as far as you can and see what happens.
Brooks stressed the importance of three-dimensional graphics since the world has three dimensions. A few objects, such as the "desktop metaphor", are 2D, but most objects in the physical world are 3D. He also emphasized the need for real time interactive graphics where the 3D image would change in immediate response to user actions. Since in most cases we can only show a 2D projection on a computer screen, a 3D illusion requires that we can rotate or otherwise manipulate the simulated object in real time. Some people approach computer graphics by wanting to make images look as real as possible, but Brooks was more interested in making them move as if real, even if that meant that they would not look quite as good: 20 frames per second is required for good representation of movement. He mentioned that it was hard for users to understand simulated 3D spaces (except for those which they knew well in advance) but that user-controlled lighting and camera/viewpoint (movements) really enhanced the interface. Users should change how they view a scene to understand it rather than change the model itself.
One really needs maps and overviews for navigating the 3D virtual worlds. In Brooks's experience, users start out navigating by looking at the map, while experienced users move around directly by looking at the screen and only every now and then look at the map.
One of the problems in 3D systems is which input/output devices to use. Brooks showed a force output device (a kind of 3D joystick) with which users could manipulate the 3D structures being shown on a screen while feeling the forces acting on the objects: If one moved something in a direction where there was very little force acting on it in the simulated world, the manipulation device would offer little resistance to the user movement while it would be physically hard to move something where there would be strong forces acting on it. One example of use of this device was moving simulated molecules around while feeling the forces acting on them.
Another system shown by Brooks was Walkthrough which was used to let users (architects and/or their clients) "walk" through the rooms of an unbuilt building. Computer screens show the colored 3D scene which a person would see while walking through the building (doors, walls, floors, etc.) and change in real time. Users may move either by "flying" through the building with a joystick or by actually walking on an exercise treadmill, which is a fairly unusual input device for a computer system. It is not known which of these two actually gives users the most realistic feeling of moving through a building. I do know that I would have liked to try the treadmill interface if it had been demoed at CHI instead of just shown on a video tape. One of the practical points of this simulation is that they had to program it so as not to allow users to walk through walls without using the doors.
Brooks also discussed an interaction technique problem of more fundamental interest which he called the two-cursor problem. The problem is that one wants to have one cursor to specify the commands and one cursor to specify the operands (where one wants the commands to act). This problem is even known in text processing where one has both a pointing cursor for use in scrolling, pulling down menus, etc. and an insertion point cursor for use in placing keyboard input on the screen (my own studies of novice Macintosh users indicate that the two-cursor problem in dealing with text in graphics programs such as MacPaint is one of the harder learning difficulties to overcome). But in 3D interactive graphics the problem becomes worse and causes users to lose visual continuity and placement in the virtual world each time they have to go to the menu. One of the best solutions to the two-cursor problem according to Brooks is that of using command-keys as substitutes for picking commands with the mouse.
Today graphics is mainly used for communicating already established results to others, and especially flashy graphics is often used in reports to funding agencies. Brooks wanted to move to using graphics to generate the insights in the first place. As an example, they had produced a video tape with 40 different visualizations of a molecule and found that they were useful for different kinds of insights. So the goal should not be to find the one best visualization but rather to let scientists build many different visualizations from the same data set.
Brooks also commented on the nature of research results: Some may be useful but too generalized to be 100% true, while others could well be true but too narrow to be useful. He felt that the very essence of a user interface was the conceptual integrity of the whole, meaning that atomized research results might not be useful for designing interfaces.
Unfortunately her model was presented in terms of a quite compact notation which is no doubt very useful in writing papers but which was fairly hard to follow at first sight. Yang classified commands as primitive, compound (several commands issued together), macro (a named sequence of primitives and/or compounds) and meta (a command acting on other commands for the purpose of reuse or user recovery of an earlier state). She then proceeded to classify different kinds of undo and command reuse together with the formal criteria for when they can be used. After the presentation of a quite large number of different kinds of undo/reuse commands, I asked Yang which of these functions actually would be useful to ordinary users who would not want to have to analyze the effects of their actions according to a complex model. The answer was that studies were currently being conducted but that results were not yet available.
This conceptual study of undo/reuse is the kind of work which we need more of in the HCI field. Agreed, it is not a comprehensive model which can explain/predict an entire user-system interaction, but it is a nicely packaged piece of insight which may help in the design of a part of a system and which one can look at when one needs it.
The other paper in the reuse session was an empirical study of the history mechanism in Unix and was presented by Saul Greenberg from the University of Calgary in Canada. Greenberg started by reviewing the advantages of reuse: Cognitively, it is easier to recognize how a command looks from a history list than to have to remember yourself how it should be constructed, and motorically, it is easier to select/click on a command than to enter it. The reason history mechanisms have some hope of success is that most actions are really repetitions of previous ones. As an example, Greenberg referred to telephone use where most of the numbers one dials have been dialed before by the same user even though there are millions of potential telephone numbers to choose from.
In his empirical study of Unix, Greenberg found that between 68% and 80% of the command lines entered by users had been entered before in exactly the same form for the entire line. Of course, in some cases one will have to reach a substantial distance back in the history list to get a line for reuse, but because of locality of reference, about half of all possible reusable command lines could be made available by showing users a list of only the last five lines. Interestingly enough, the command line most often reused was not the one immediately before the line being entered but rather the second command before it. The Unix system studied did provide a history mechanism for users to reuse their commands but it was used by only 20% of the novice users. 92% of expert users did use it at least once but only about 4% of their commands were reused even though they could theoretically have reused 44% just by selecting from their last five commands.
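The kind of measurement described here can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Greenberg's actual analysis: the function name `reuse_stats`, the exact-match definition of a repeated line, and the choice to report both rates as fractions of all lines are assumptions made for the example.

```python
def reuse_stats(history, window=5):
    """Return (recurrence_rate, window_hit_rate) for a list of command lines.

    recurrence_rate: fraction of lines that exactly match some earlier line.
    window_hit_rate: fraction of lines that exactly match one of the
    `window` immediately preceding lines, i.e. lines a short on-screen
    history list could have offered for reuse.
    """
    if not history:
        return 0.0, 0.0
    seen = set()
    recurrences = 0
    window_hits = 0
    for i, line in enumerate(history):
        if line in seen:
            recurrences += 1
            # Look only at the last `window` lines before this one.
            if line in history[max(0, i - window):i]:
                window_hits += 1
        seen.add(line)
    total = len(history)
    return recurrences / total, window_hits / total
```

For the toy history `["ls", "ls", "cd", "ls"]`, half of the lines are repetitions and all of those repetitions are found within the last five lines, so the function returns `(0.5, 0.5)`. Shrinking the window to a single line drops the second figure to `0.25`, which shows the effect of how far back the history list reaches.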
Randy Trigg from Xerox PARC was working on a guided tour card for NoteCards which was intended to let authors guide the user's path through a hypertext network by "holding the user's hand a little bit". A guided tour should not be just a pass through the network, however, but should include annotation and special explanation.
Much of the discussion of hypertext in the corridors at this conference centered around Apple's new HyperCard product, so of course Steve Weyer from Apple mentioned HyperCard but said that it did not come out of the hypertext tradition but rather was a prototyping and programming environment which also had some hypertext capabilities.
Alan Kay (also from Apple) discussed the original paper by Vannevar Bush from 1945 in which he proposed a kind of hypertext system. Bush was one of the computer pioneers and knew a lot about the computer technology available in 1945 but he knew next to nothing about photography. So his dream was about a photography-based information system rather than of a desktop computer. We are now getting the kind of functionality Bush dreamed about, but implemented on computer systems rather than on microfilm. From this story, Kay concluded that the stuff which we know most about is probably where we are most inaccurate in predicting the future since this is where we see all the problems.
Kay then showed a video tape narrated by John Sculley (the president of Apple) with a scenario of how HyperCard might look in 1992. I also saw a Kay-Sculley video of a version of the Knowledge Navigator set even farther in the future and I will comment on both tapes together. Both tapes assumed screens with laserprinter-like resolution capable of showing color video which is probably a reasonable assumption. Both tapes also included the presence of a user agent on the screen in the form of a "butler" complete with bow tie. In the 1992 video the agent was a set of animated line drawings capable of only a few facial expressions while the far-future tape had a live video image of a "simulated" person. The intention was that the user communicates with the agent in more or less everyday natural language while the agent then orders the rest of the computer to do what is necessary.
It is probably extremely unlikely that this kind of natural language understanding agent will be available in 1992, even though the tape did show the agent saying "I do not understand" at some point where the user was not very clear. Of more fundamental interest for user interface scientists is whether users would like to communicate with their computer in spoken language at all. In many cases one could imagine that users would be more comfortable with using a combination of gesture-oriented direct manipulation and menu/command/form-filling dialogues to directly operate the computer rather than going through an intermediary agent. On the other hand, I must admit that "speech I/O" was the absolute top scorer in the small survey I conducted of 57 Danish computer professionals, asking them to list the most important probable changes in user interfaces in the year 2000 compared with 1986.
What makes these videos extremely interesting is not so much whether one agrees or not with some of the individual predictions about a user interface of the future but rather that somebody has done the job of making a set of extremely professionally produced scenarios. It is much easier to discuss the potentials of ideas such as agents and technologies such as high-resolution displays and high-bandwidth networks on the basis of concrete scenarios than based purely on theoretical writing. In general, I advise people to use scenarios because they form part of the method for doing human factors work cheaply, but these scenarios must have cost plenty to design and produce. Even so, I would like to recommend that other research centers try to produce scenarios of their visions of the future so that we can get some debate going about in which directions we would want to pull technology.
Negroponte started by saying that just 10 years ago user interface work was regarded as "sissy computer science". In 1977 they had developed the Spatial Data Management System with a kind of desktop metaphor to let people remember where things are based on where they are placed on the screen. Negroponte had shown slides of this system at conferences and received quite harsh remarks: "Who would want to put a calculator on a screen?". Now the user interface area has been widely recognized and there are 150 people at the Media Lab divided about equally between applications-driven work and the "technical imperative" of advancing computer science. The work is 80% funded by industry since Negroponte has found that it is much easier to get money from industry than from government grants.
One of the media they are working with is speech. Negroponte is a real speech fan and claimed that many of the people in SIGGRAPH are just wasting their time in image rendering and would serve the user interface better by working on speech instead. Many people think that speech I/O has to be natural language but he wanted to decouple these two techniques to enable use of speech now. He said that speech is really a very long arm which allows you to interact with objects at a distance and it is also often a channel that is unused while the hands and eyes are otherwise occupied. The most important challenge is to get connected rather than discrete speech recognition, while speaker-dependence (which is far easier to achieve than independence) is OK according to Negroponte because that is what one wants to do anyway with personal computers. If you are in a phone booth wanting to talk to an airline computer, Negroponte recommended that you just call your own computer and let it talk to the airline computer in ASCII. Of course there are tremendous integration and protocol problems to be solved before one can really do this.
Besides speech, the Media Lab is of course also working on vision which they think is one of the ways to get computer-human interaction to approximate human-human interaction. One of the limitations of computers currently is that the machine doesn't know the difference between when you just lift your hands off the keyboard to pause or when you go to have lunch. They built a simple vision system to let the computer know if the user was there or not. The long term goal would be recognition of facial expressions but currently the computer will just react as you leave it by using larger and larger letters on the screen so that you can still read it.
A lively discussion ensued after Negroponte had discussed these and other systems. Ben Shneiderman was mystified by the human-human interaction metaphor and claimed that mature technology does not just mimic nature: Airplanes do not look like birds and flap their wings. Instead Shneiderman wanted to empower users by going beyond the limitations of human-human communication. Negroponte's answer was that we are dealing with a totally different kind of problem than mechanistic design because we are closer to the cognitive abilities of humans and that we are not nearly a mature technology anyway. If we wanted another metaphor, we could take his dog which can recognize the tone of his voice.
Bill Buxton (Xerox EuroPARC) said that cognitive perspectives should be brought more into play and one should not just look at the sensory pattern of communication. He found the Media Lab's work to be compelling and full of wonderful ideas but criticized them for never performing an articulating analysis of the results of their work. Negroponte agreed that they had not faced up to the cognitive and AI side of interactions. He said that they got a lot of critique for just building products and then not testing them but that this method of work was intentional: It is almost enough if Buxton finds the product compelling. Negroponte wanted big changes and systems that got people to think differently. If one needs testing to prove one's point, then it is probably not worth doing. As the answer to another comment that we should not rule out cheaper pencil-and-paper exercises, Negroponte said that the user interface area is so important that expense should not be an issue. The only way we can convince people about our ideas is by showing them. As an example, he pointed to new cars which people would not buy from a paper-and-pencil description: they want to test drive them.
In Palenque, the user/child goes on a simulated voyage to an ancient Maya site and can move around it in the same way as in the Aspen system. One can also move up the stairs of the pyramids and then scan the panoramic view from the top. Users can personalize the interaction by taking "snapshots" of the screen which are then pasted into a simulated album.
In contrast to the Media Lab method, Bank Street did use testing and iterative design in the development of Palenque. One thing they found when asking children how they would like to visit an ancient Maya site in the jungle was a strong feeling that they would not like to do so alone. So they used the video medium to provide users with companionship and guidance in the form of filmed characters from the TV show The Second Voyage of the Mimi.
Palenque is implemented using the DVI CD-ROM encoding which is capable of compressing moving video images to a very great extent. This compression introduces some blur in some cases and a person who has more experience with digital video than myself later told me that he had indeed noticed a few such problems with the video clips from Palenque. I did not notice any problems myself except for the generally lower quality of the NTSC standard compared to PAL.
A more general problem mentioned was that the proliferation of journals and conferences in the user interface field risks diluting the field because there is simply not that much good work being done. That there is some truth to this could be seen even at CHI which is the most prestigious event in the field. The CHI conference would have been much better if it had included, say the three best papers from the Hypertext'87 workshop in North Carolina in November 1987 and the five best papers from the Computer-Supported Cooperative Work conference in Portland in September 1988. Instead we have two additional conferences with a substantial overlap in subject and audience with the CHI conference.