useit.com
This research was done in 1993 while the authors were with Bellcore (Bell Communications Research).
Keywords: Handwriting recognition, pen machines, information retrieval, latent semantic indexing, LSI, keyword matching, personal digital assistants, nomadic computing.
An important use of personal digital assistants will probably be the retrieval of previously entered information, allowing the notes to serve as a personal information base for the user. Unfortunately, misrecognitions in notes may make them hard to find for some types of information retrieval systems, such as those based on simple keyword matching. If a user searched for one specific keyword that happened to have been misrecognized in a note, that note would never be found.
It is likely that pen machines will not be used much for handwritten entry of large amounts of text, since keyboards are still superior for that task. Pen input of text will probably be used more for note-taking than for writing long papers or reports. Indeed, many applications of pen machines may not involve handwriting recognition to any great extent but will rely mostly on graphic sketches and on choosing predefined values from menus in form-filling applications, with very little information entered as free-form text. Even so, some significant applications do involve note-taking, and these are likely to be among those where searching and information retrieval are especially important.
The goal of the study reported here was to investigate whether information retrieval might work in spite of recognition errors. We used the latent semantic indexing method (LSI) [Deerwester et al. 1990] for retrieval. LSI uses statistical techniques to model the way in which words are used in the overall collection of documents. In the resulting semantic space, a query can be similar to a document even if they have no words in common. LSI is thus not dependent on any single word and might handle recognition errors robustly.
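To make the idea concrete, here is a minimal Python/NumPy sketch of LSI retrieval under simplifying assumptions: a tiny invented term-by-document count matrix, no term weighting, a 2-dimensional space instead of the 50 dimensions used in this study, and the common "fold-in" projection of a query into the reduced space. The toy documents and the helper names (`lsi_space`, `fold_in`, `lsi_scores`) are hypothetical; this is an illustration of the technique, not the implementation used in the experiment.

```python
import numpy as np

def lsi_space(term_doc, k):
    # Truncated SVD: A ~= U_k S_k V_k^T; each row of V_k is a document in k dimensions.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T

def fold_in(query, u_k, s_k):
    # Project a raw term vector into the same space: q_hat = q^T U_k S_k^{-1}.
    return (query @ u_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def lsi_scores(term_doc, query, k):
    u_k, s_k, docs = lsi_space(term_doc, k)
    q_hat = fold_in(query, u_k, s_k)
    return [cosine(q_hat, d) for d in docs]

# Hypothetical vocabulary (rows) and notes (columns):
# d0 = correctly recognized note, d1 = same note with "computer" misrecognized, d2 = unrelated note
terms = ["computer", "compvter", "integrated", "documentation", "cooking", "recipe"]
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
query = np.array([1, 0, 0, 0, 0, 0], dtype=float)  # the single word "computer"

keyword_overlap = query @ A          # exact word matches per document
scores = lsi_scores(A, query, k=2)   # cosine similarity in the 2-D semantic space
print("keyword matches:", keyword_overlap)
print("LSI scores:", [round(x, 3) for x in scores])
```

Keyword matching finds no overlap between the query and d1, yet in the reduced space d1 scores as high as d0, because the misrecognized "compvter" co-occurs with the same words that "computer" does. The unrelated d2 scores about zero.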
For the handwriting experiment, a test user entered 47 abstracts of papers submitted to a hypertext conference into the pen machine. This text amounted to 58,860 characters, of which about 9,000 were spaces that did not need to be explicitly written. The speed and error data reported below have been calculated relative to the total number of characters, including spaces, so that the same count could be used for handwriting and keyboard input. For comparison, a smaller number of characters (from five additional abstracts for each input mode) were entered on a personal computer keyboard (without the use of the backspace key or other corrections) and written by free-form handwriting on paper.
Pen machine input was performed with discrete characters written in boxes and thus represents the slowest and most accurate pen input mode. Table 1 shows the speed and error results for the three input media. Pen machine input was much slower than both typed and free-form handwritten input, and it had about three times the error rate of keyboard input, both per character and per word. Of course, substantial individual variability can be expected for input speeds, especially for keyboard input, so the table should be seen only as a general indication of relative user performance with the three media.
Table 1: Speed and errors for the three input media.

| Measure | Pen Machine | Keyboard | Handwriting on Paper |
|---|---|---|---|
| Words per minute | 10 | 47 | 21 |
| Characters per second | 1.1 | 5.3 | 2.4 |
| Errors per minute | 1.1 | 1.7 | N/A |
| Error rate per character | 1.6% | 0.5% | N/A |
| Error rate per word | 8.8% | 3.0% | N/A |
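The per-character and per-word error rates in Table 1 can be related with a simple back-of-the-envelope model. The sketch below assumes, purely for illustration, that every character is misrecognized independently, and it derives the average number of characters per word (including the following space) from the speed rows of the table. `predicted_word_error` is a hypothetical helper, not part of the original analysis.

```python
# Rows of Table 1 for the two media that have error data.
TABLE_1 = {
    "pen machine": {"wpm": 10, "cps": 1.1, "char_err": 0.016, "word_err": 0.088},
    "keyboard": {"wpm": 47, "cps": 5.3, "char_err": 0.005, "word_err": 0.030},
}

def predicted_word_error(char_err, chars_per_word):
    # Probability that at least one character of a word is wrong,
    # IF character errors were independent (an assumption, not a finding).
    return 1.0 - (1.0 - char_err) ** chars_per_word

predictions = {}
for medium, row in TABLE_1.items():
    chars_per_word = row["cps"] * 60 / row["wpm"]  # characters per word, spaces included
    predictions[medium] = predicted_word_error(row["char_err"], chars_per_word)
    print(f"{medium}: {chars_per_word:.1f} chars/word, "
          f"predicted {predictions[medium]:.1%}, measured {row['word_err']:.1%}")
```

With these inputs the model predicts roughly 10% word errors for the pen machine and 3.3% for the keyboard, somewhat above the measured 8.8% and 3.0%. That gap is consistent with errors tending to cluster within the same words, although rounding in the table may also contribute.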
Table 1 shows that the error rate experienced in terms of errors per minute was actually worse on the keyboard than with the pen machine: because typing was almost five times as fast, its lower per-character error rate still produced more errors per minute. Based on our personal observations while using these two input devices, however, handwriting recognition errors seem to be much more distracting to the user. One major reason is the delayed feedback inherent in the recognition process. With current pen machines, pen input is not recognized immediately but only after a delay of a few seconds. Even though users can write ahead and thus need not be slowed down while the computer processes the pen input, the fact that the recognized characters appear with a delay means that users have to scan back over previously entered text to check for recognition errors. The response time literature shows that delaying users by more than a second or two interrupts their flow of thought [Miller 1968], and the need to scan previous text is a further distraction. Contrast this interaction technique with the correction of a typing error on a keyboard: the error is usually noticed immediately after hitting the wrong key, and it is corrected by pressing the backspace key in the context of the error rather than by interrupting a later flow of thought.
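The apparent paradox of the keyboard producing more errors per minute is simple arithmetic: errors per minute equal the input speed times the per-character error rate. A minimal sketch using the rounded values from Table 1 (the helper name `errors_per_minute` is ours, for illustration only):

```python
def errors_per_minute(chars_per_second, char_error_rate):
    # Characters entered per minute multiplied by the fraction of them that are wrong.
    return chars_per_second * 60 * char_error_rate

pen = errors_per_minute(1.1, 0.016)       # about 1.06 errors per minute
keyboard = errors_per_minute(5.3, 0.005)  # about 1.59 errors per minute
speed_ratio = 5.3 / 1.1                   # keyboard is roughly 4.8 times as fast

print(f"pen: {pen:.2f}/min, keyboard: {keyboard:.2f}/min, speed ratio: {speed_ratio:.1f}")
```

The keyboard figure comes out at about 1.6 rather than the 1.7 in the table, presumably because the 0.5% per-character rate shown there is rounded.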
The handwriting recognition performance achieved in this experiment (1.6% recognition errors) is somewhat better than that reported in a recent study by Santos et al. [1992] (2.7% recognition errors under our definition). We believe this is because we used a newer recognizer. In general, recognition rates can be expected to go up as better recognition software becomes available, but at the same time, faster and less constrained writing styles (such as connected rather than discrete writing) will prevent perfect recognition from being achieved in the foreseeable future.
Figure 1 shows how recognition error rates changed over time. The error rate was initially very high (more than 5%) and did not reach the steady-state band of 1-2% until about 5,000 characters had been entered. Apparently, users need to change their handwriting style somewhat to accommodate current recognition software and write in ways that the machine finds easier to understand. This result shows that even handwriting cannot be considered an interaction technique requiring zero learning time. On the other hand, reaching the "expert" level of 1-2% recognition errors required only 65 minutes of practice with the pen machine, indicating that handwriting input is easier to learn than many other input techniques, such as using a mouse. Figure 1 also shows that recognition error rates remained fairly variable even after mean performance had reached the 1.6% level.
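A learning curve like the one in Figure 1 can be produced by computing the error rate over consecutive blocks of entered characters. The sketch below assumes a hypothetical per-character log of recognition outcomes (1 = misrecognized) and reports how many characters were entered before the first block whose rate drops to the 2% upper edge of the band or below. The block size, helper names, and synthetic log are illustrative assumptions, not the study's data.

```python
def block_error_rates(errors, block_size):
    # errors: one 0/1 flag per entered character (1 = misrecognized).
    return [sum(errors[i:i + block_size]) / block_size
            for i in range(0, len(errors) - block_size + 1, block_size)]

def chars_before_band(rates, block_size, upper=0.02):
    # Characters entered before the first block at or below the upper edge of the band.
    for index, rate in enumerate(rates):
        if rate <= upper:
            return index * block_size
    return None  # the band was never reached

# Synthetic log: about 6% errors for 5,000 characters, then about 1.6%.
log = [1 if i % 16 == 0 else 0 for i in range(5000)]
log += [1 if i % 64 == 0 else 0 for i in range(5000)]

rates = block_error_rates(log, block_size=1000)
print([f"{r:.1%}" for r in rates])
print("characters before reaching the band:", chars_before_band(rates, 1000))
```

A single block dipping below 2% is a crude criterion; with data as variable as that in Figure 1, one might prefer to require several consecutive blocks inside the band.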
Figure 2: A sample abstract as recognized by the pen machine.

HTs/-24 Boy: Indexing hypcrtext documents in context
To glnlrata intelligent indexing that allows context-sensitive information retrieval, a system must be abld to acavire knowledge directly through interaction with users. In this paper, we present the architecture for LID (Compvter Integrated Documentation,? ? system that enasles integration of various technical documents in a hypertext framework and includes an intelligent browsing system that incorporates indexing in context. LID's knowledge-based indexins mechanism allows ease-based knowledge acquisition by experimentation. It utilizes on-line user information requirements and sussestions either to reinforce current indexing in case of success or to generate new knowledge in case of failure. This allows LID's intelligent interface system to providd helpful responses, even when no a prior; user model is ava:lable. Our system in fact learns how to exploit a user model based on experience cArom user feedback). We describe (ID's current capabilities and provide an overview of our plans for extending the system. Keywords: Contextual indexing, information retrieval, tailorable system, context acquisition, hypertext, paradigms for informat:on access (Structuring hypertext documents for reading and retrieval)
Figure 2 shows a sample abstract as it was recognized by the pen machine. Notice how every occurrence of the important term "CID" was misrecognized as "LID" or "(ID", so a keyword search for that term would not find this document. Similarly, a search for the full term "Computer Integrated Documentation" would not find the document either, because "Computer" was misrecognized as "Compvter". In many other abstracts, however, important terms were repeated several times, and at least one occurrence was recognized correctly.
Retrieval of 47 handwritten conference paper abstracts is certainly not a typical use of a personal digital assistant. We used this set of documents because exhaustive relevance ratings were available for it. Also, as mentioned in the introduction, pen machine users are not expected to handwrite long documents but will probably write only shorter notes. The abstracts used for this experiment had a mean length of 185 words and might thus be more representative of the length of notes written with a pen machine than, say, the full text of the papers would be. Finally, the set of abstracts can be taken to approximate, in scope and vocabulary, a set of notes taken by a pen machine user attending a hypertext conference. Thus, even though no user would write these exact documents in real use of a pen machine, they were still a good test set for our information retrieval experiment.
Figure 3 shows the mean rated relevance of the top one to ten abstracts retrieved, both from the handwritten text as recognized by the pen machine and from the original files without any errors. Relevance was rated on a 0-9 scale, with 9 indicating perfect relevance and 0 indicating complete irrelevance. For retrieval of a single abstract, the measure is the mean rated relevance of the top abstract found for each of the fifteen reviewers. In general, the measure for retrieval of n abstracts is the mean rated relevance of the top n abstracts returned for each of the fifteen reviewers' interest queries. Figure 3 shows relevance curves for both LSI retrieval and keyword retrieval. LSI retrieval was performed using a 50-dimensional semantic space.
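The relevance measure just described can be written as a short function. In this sketch the data structures are assumptions: `rankings[r]` holds the abstract ids returned for reviewer r's query in rank order, and `ratings[r]` maps each abstract id to that reviewer's 0-9 relevance rating. The function name and the toy numbers are invented for illustration.

```python
from statistics import mean

def mean_relevance_at_n(rankings, ratings, n):
    # Average, over all reviewers, of the mean rating of the top n abstracts
    # retrieved for that reviewer's query.
    return mean(
        mean(ratings[r][doc] for doc in ranking[:n])
        for r, ranking in enumerate(rankings)
    )

# Two hypothetical reviewers and three abstracts (the study had fifteen reviewers).
rankings = [[2, 0, 1], [1, 2, 0]]
ratings = [{0: 5, 1: 1, 2: 9}, {0: 3, 1: 8, 2: 6}]

curve = [mean_relevance_at_n(rankings, ratings, n) for n in range(1, 4)]
print(curve)
```

Evaluating the same function for n from 1 to 10 on the rankings produced from the original and from the recognized abstracts would yield the pair of curves being compared.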
Figure 3 shows that retrieval quality was basically the same for the original abstracts and the recognized handwriting, in spite of the 8.8% word-level errors in the recognized text. For LSI retrieval, the difference in rated relevance between searching the original text and the recognized handwriting was only 0.06% when averaged over the retrieval of one through ten abstracts. For keyword retrieval, the corresponding difference was 0.85%.
Latent semantic indexing tended to be slightly better than keyword matching for retrieval of one through five abstracts: LSI found abstracts with a 4% higher relevance rating among the abstracts recognized by the pen machine and a 1% higher relevance rating among the original abstracts. The two methods were about equivalent when more than five abstracts were requested.
One reason that standard keyword search performed well was that both the queries (reviewers' profiles) and the abstracts were fairly long (abstracts 185 words, profiles 454 words). Important words tended to be repeated several times in the abstracts, increasing the probability that at least one occurrence would be recognized correctly. Furthermore, since the profiles were fairly long, they contained many words matching words in the abstracts. To check whether long profiles explained the good results, we repeated the experiment with short interest profiles (three words per reviewer) and got essentially the same results.
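The benefit of repetition can be estimated with a simple model. Assuming, as a simplification, that each occurrence of a word is misrecognized independently at the 8.8% word error rate from Table 1, the chance that at least one of k occurrences is recognized correctly is 1 - 0.088^k. The helper below is hypothetical, and the independence assumption is optimistic: Figure 2 shows a case where every occurrence of "CID" was misrecognized.

```python
def p_at_least_one_correct(word_error_rate, occurrences):
    # Chance that not every occurrence is misrecognized, assuming independent errors.
    return 1.0 - word_error_rate ** occurrences

for k in (1, 2, 3):
    print(k, f"{p_at_least_one_correct(0.088, k):.2%}")
```

Under this model, one occurrence survives about 91% of the time, while two or three occurrences push the chance above 99%.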
Our good results in finding text with errors are not as surprising as one might have thought, given that experiments with random data corruption have found that information retrieval performance does not degrade significantly until about 30% of the words contain errors [Smith and Stanfill 1990]. Of course, errors in handwriting recognition are not random, since the same characters tend to be misrecognized in the same way, but if anything, this consistency may be an advantage when using LSI, since it increases the chance that misrecognized words get scaled correctly, that is, positioned appropriately in the semantic space.
Future advances in handwriting recognition may decrease the error rates for any given pen input mode. Users are likely to take advantage of such advances to move to less restrictive and faster pen input modes than the writing in boxes used in this experiment, with free-form handwriting as the ultimate goal. Because of this pressure to move to faster and more error-prone pen input modes, some recognition errors will likely remain in pen-based interfaces for a long time. The finding that information retrieval can robustly handle current levels of recognition errors is thus encouraging for the use of pen machines as personal digital assistants. Pen machine users will be able to find their own notes later, even though the recognized text may not be of sufficient quality to show to others. This result holds only to the extent that the notes are at least as long as the abstracts in our experiment (185 words). Very short notes would be difficult to find without the use of further attributes such as time or context of writing.