Communications of the ACM

Tangible Multimodal Interfaces for Safety-Critical Applications

Multimodal interfaces that flex, adapt, and persist


Despite the success of information technology, there are important problem-solving tasks that computing has had difficulty supporting. Consider an example from the military. In Figure 1, we see officers turning their backs on computing, preferring instead to work with an 8-foot-high paper map and Post-it notes. They explain there are good reasons for their reluctance to use digital systems—paper maps are readily available, lightweight, cheap, high in resolution, very large or small in scale, supportive of collaboration, and fail-safe. Likewise, a trip to most local hospitals should convince the observer that most physicians have yet to abandon paper, despite the advantages of electronic medical records [3]. Air-traffic controllers have a similar preference for paper flight strips in performing their stressful jobs [5].

Professionals in these safety-critical domains have developed manual work practices their organizations are attempting to replace with computer systems. Though there are good reasons to automate, and the resulting systems may be built with the prevailing best practices, they nonetheless alter fundamental aspects of what users do and value. Not surprisingly, these systems encounter resistance. For example, at Cedars-Sinai hospital in Los Angeles, physicians rebelled against the installation of a physician order-entry system, causing its removal:

"I'm not opposed to change ... but it's got to be new and better," said Dudley Danoff, MD, a urologic surgeon who helped organize physician opposition. "This was new but certainly not better" than paper ... Cedars-Sinai's decision was extraordinary but not unique. David Classen, MD, of First Consulting Group, says he knows of at least six other hospitals that have removed paperless systems in the face of physician resistance and other problems. (American Medical News, Feb. 17, 2003; www.ama-assn.org/scipubs/amnews/pick_03/bil20217.htm).



These failures are predicted by Moore's analysis of technology acceptance [8]. According to Moore, a chasm exists between early adopters and a much larger group of so-called "pragmatists" and "conservatives" in how each tends to accept technology products. Whereas the former are willing to expend energy on learning and promoting revolutionary technology, the latter prefer evolution over revolution. They want technology that enhances and integrates easily with existing work practices and systems, and that has been used successfully by their peers.

One reason digital systems have fallen into this chasm is that computing hardware and its human interfaces do not fit the way these professionals work. In particular, laptop, keyboard, mouse, trackball, and similar interaction devices are not optimal for field and mobile use, or for face-to-face collaboration. Pen-based tablets and PDAs address some users' needs for mobility and ease of use, but these systems need far better resolution, much less weight, and better portability. Moreover, they are currently limited by their reliance on the desktop metaphor for graphical user interfaces (GUIs), which focuses users' attention on the computer itself and on overlapping windows, menus, files, and folders. This distraction from the task can be counterproductive when providing health care, planning battles, or guiding aircraft. Finally, because work stoppages can be costly or life-threatening, users in these safety-critical domains remain concerned about system or network crashes.

Rather than require that users change, system designers could adapt their systems to key aspects of the users' work practice. Indeed, the tangible multimodal systems (TMMs) described here enable users to employ physical objects already in their workplace (for example, paper or other physical tools), along with natural spoken language, sketch, gesture, and other input modalities to interact with information and with co-workers. In this article, we discuss tangible and multimodal interfaces separately, and then illustrate how they can be combined to produce TMMs.


Bridging the Chasm with Tangible Multimodal Interfaces

Tangible user interfaces (TUIs) incorporate physical objects as sensors and effectors that, when manipulated, modify computational behavior. To enable the construction of TUIs, systems distinguish and identify physical objects, determine their location, orientation, or other aspects of their physical state, support annotations on them, and associate them with different computational states. To do this, TUIs use technologies such as radio emitters, bar codes, or computer vision. Wellner developed the first tangible paper system, the DigitalDesk [12], incorporating paper via computer vision. The DigitalDesk could copy and paste printed text or numbers from paper into digital documents via OCR, enabling the user to manipulate the information electronically. Mackay and her collaborators have explored tangible flight-strip prototypes for the air traffic control industry since the early 1990s [5]. Their prototypes captured handwritten annotations on paper strips, and tracked the strips' relative locations on a mounting board using video capture techniques and electrical resistance. Ishii and his students in the MIT Media Laboratory have also developed various tangible prototypes, notably the Urp system [11], which used video tracking to support the use of physical tools, such as rulers and clocks, within an urban planning setting.
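
The bookkeeping such systems share can be made concrete with a small sketch. The following is a minimal illustration, assuming a sensing layer (vision, bar codes, or radio tags) that reports tag observations; the registry, class names, and callback scheme are our own illustrative assumptions, not the design of DigitalDesk, the flight-strip prototypes, or Urp.

```python
# A minimal sketch of TUI bookkeeping: sensed tags map to computational
# objects whose pose changes trigger application behavior. All names and
# structure here are illustrative assumptions, not the API of any system
# cited above.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class TangibleObject:
    tag_id: str                       # from a bar code, radio tag, or vision
    pose: Tuple[float, float, float]  # x, y, orientation on the work surface
    state: dict = field(default_factory=dict)  # computational state it stands for

class TUIRegistry:
    def __init__(self) -> None:
        self._objects: Dict[str, TangibleObject] = {}
        self._handlers: List[Callable[[TangibleObject], None]] = []

    def on_move(self, handler: Callable[[TangibleObject], None]) -> None:
        """Register application behavior to run when an object is moved."""
        self._handlers.append(handler)

    def sense(self, tag_id: str, pose: Tuple[float, float, float]) -> None:
        """Called by the sensing layer whenever a tag is (re)observed."""
        obj = self._objects.setdefault(tag_id, TangibleObject(tag_id, pose))
        obj.pose = pose
        for handler in self._handlers:
            handler(obj)

registry = TUIRegistry()
registry.on_move(lambda o: print(f"{o.tag_id} now at {o.pose}"))
registry.sense("strip-042", (12.0, 30.5, 0.0))  # e.g., a tracked flight strip
```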

With flexible multimodal interfaces, users can take advantage of more than one of their natural communication modes during human-computer interaction, selecting the mode or combination of modes that best suits their situation and task. For example, the QuickSet multimodal system enables a user to speak while sketching [1, 10]. Here, the sketch provides spatially oriented information such as shape and location, while speech provides information about identity, speed, color, and other attributes. Multimodal interfaces scale to very large or small devices, and can be used within sensor-rich environments. Another important benefit that derives from the ability of an interface to support multiple modalities is mutual disambiguation, in which information provided by one or more sources can be used to resolve ambiguities in another, thereby reducing errors [9]. Thus, multimodal systems are better equipped to manage the inherent uncertainty of sensors and recognizers than systems that rely on a single uncertain information source. Furthermore, they enable more efficient performance of various tasks—research has found multimodal interfaces to be four- to nine-fold faster than GUIs for map-based tasks, with no increase in the number of errors [2].
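
To show the mechanism of mutual disambiguation, here is a minimal sketch assuming toy n-best recognizer outputs and a deliberately simple compatibility rule; it illustrates the idea, not QuickSet's actual integration architecture [1].

```python
# A minimal sketch of mutual disambiguation: each recognizer produces an
# n-best list of (interpretation, confidence) hypotheses, and the integrator
# keeps only cross-modal pairings that are semantically compatible, so each
# modality can rule out the other's recognition errors. The data and the
# compatibility rule are illustrative assumptions.

from itertools import product

# The sketch recognizer is unsure which unit symbol was drawn; the speech
# recognizer has misheard "platoon" as "battalion" in its top hypothesis.
sketch_nbest = [("armored_platoon", 0.55), ("mechanized_company", 0.45)]
speech_nbest = [("first battalion charlie company", 0.60),
                ("first platoon charlie company", 0.40)]

def compatible(symbol: str, utterance: str) -> bool:
    """Toy semantic constraint: the echelon drawn must match the echelon
    named (the second word of the utterance in this simplified grammar)."""
    return symbol.split("_")[-1] == utterance.split()[1]

def integrate(sketch, speech):
    """Score every cross-modal pairing and keep the best compatible one."""
    joint = [((sym, utt), s_conf * u_conf)
             for (sym, s_conf), (utt, u_conf) in product(sketch, speech)
             if compatible(sym, utt)]
    return max(joint, key=lambda h: h[1], default=None)

# Only "armored_platoon" + "first platoon charlie company" survives: the
# sketch ruled out the misrecognized speech, and speech ruled out the
# second-best sketch hypothesis.
print(integrate(sketch_nbest, speech_nbest))
```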


Combining TUIs and MMUIs

Although tangible systems allow users to manipulate physical objects, they typically do not interpret any annotations that accompany those objects. On the other hand, multimodal user interfaces (MMUIs) can analyze users' verbal and written information, but they typically do not acquire information from objects situated in the real world. Thus, neither interface style on its own is as well equipped to satisfy the workplace needs of the applications discussed earlier as is the combination of the two. It is important to notice that the work-practice routines employed by these professionals typically impart meaning to the physical objects in their environments using a shared, workplace-dependent language and/or symbology. If a system could support these existing workplace routines and languages, both physically and computationally, it would permit users to create, change, and understand those physically based meanings in a familiar fashion, while simultaneously updating a digital version of the information. This hybrid system would also support more robust operation, since physical objects and computer systems have different failure modes. Tangible multimodal systems accomplish this at a minimum by:1

  • Processing the relevant state changes of physical work objects;
  • Understanding users' task-related multimodal communication;
  • Fusing information from physical, linguistic, gestural, and other information sources, thereby managing uncertainty and recognition errors;
  • Delivering relevant information and confirmation to users in a manner that is appropriately integrated with the physical work environment (a minimal sketch of this processing loop follows the list).
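
To tie the four responsibilities together, the following minimal sketch wires them into a single processing step. Every class and function name here is an illustrative assumption rather than the architecture of the systems described below.

```python
# A minimal sketch of the four responsibilities above wired into one loop:
# (1) a physical state change, (2) the user's co-temporal communication,
# (3) fusion into a single record, (4) feedback into the workspace. The
# classes and the projection call are illustrative assumptions.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class PhysicalEvent:               # (1) a work object changed state
    object_id: str                 # e.g., a tracked Post-it note
    location: Tuple[float, float]  # where it now sits on the map

@dataclass
class MultimodalInput:             # (2) what the user drew and said
    symbol: str                    # recognized unit symbol from ink
    utterance: str                 # recognized co-temporal speech

def fuse(event: PhysicalEvent, mm: MultimodalInput) -> dict:
    """(3) Merge physical, sketch, and speech sources into one unit
    record, letting the physical placement supply the location."""
    return {"unit": mm.symbol, "name": mm.utterance,
            "position": event.location, "token": event.object_id}

def confirm(record: dict) -> None:
    """(4) Deliver feedback integrated with the physical workspace,
    e.g., by projecting the recognized symbol back onto the paper map."""
    print(f"project {record['unit']} at {record['position']}")

record = fuse(PhysicalEvent("postit-17", (120.5, 88.2)),
              MultimodalInput("armored_platoon",
                              "first platoon, charlie company"))
confirm(record)   # here one would also update the shared database
```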

We have developed three systems—Rasa, NISMap, and NISChart—that demonstrate the feasibility of TMMs.

Rasa enables a military officer to use paper maps and Post-it notes in support of command and control tasks [7]. During battle tracking, officers plot unit locations, activities, and other elements on a paper map by drawing unit symbols on Post-it notes, positioning them on the map, and then moving them in response to incoming reports of their new locations. With Rasa (Figure 2), each of the pieces of paper is mounted on a digitizing tablet—the map is registered to a large touch-sensitive tablet, and the Post-its initially rest upon a tablet that supports both digital and physical ink. The user writes a military unit symbol (for example, for an armored platoon) on a Post-it note. The user can also speak identifying information about that unit (for example, "First platoon, Charlie Company") while drawing the symbol. The computer recognizes both the symbol and the utterance, fusing their meanings using multimodal integration techniques [1]. Then, the user places the Post-it onto the paper map, which causes the unit drawn (and sometimes spoken about) to be recorded in the system's database, with its location specified by the place it was mounted on the map. The system projects a unit symbol onto the relevant location on the paper map, and distributes the result to collaborating systems. If a report arrives indicating the unit has moved, the user only needs to pick up the note and put it at the new location on the map.

Within these organizations, work must somehow continue during a system or communications failure. What happens if the computer supporting Rasa goes down? In order to investigate this question, we undertook an experiment [6] in which officers were studied as they tracked a battle. During each session, Rasa was deliberately crashed, but reports kept arriving. In response, officers simply continued to create and move the Post-it notes on the paper map. When the computer came back online, it digitally projected the old unit locations onto the paper map, whereas the notes indicated the units' current locations. It was then a simple process to reconcile the computer system with the paper version. Thus, because the physical objects constituted the user interface, no additional ongoing backup process was needed.



Whereas Rasa enables users to employ paper maps in performing their tasks, the maps still require large, relatively immobile digitizers. Moreover, because of limitations in current digitizing tablets, only one person can write on the map at a time. Thus, Rasa is still limited in its mobility and support for collaboration. We realized early on that Rasa was a prototype of a much larger set of techniques for TMMs, wherein paper is the principal component of the work practice. We call these techniques Multimodal Interaction with Paper (MIP).

Our new MIP applications employ Sweden-based Anoto AB's digital pen and paper, along with Rasa's multimodal processing capabilities. In these applications, the paper has a dot pattern printed on it, and content (for example, a map) is printed over the dots. The Anoto pen produces ink like any other pen, but it also has a StrongARM CPU, memory, a Bluetooth radio, and a camera that sees only the dot pattern (Figure 3). The pen decodes the locations it has observed, saving them in memory and/or transmitting them wirelessly via Bluetooth or via a USB-linked "ink well." With the Bluetooth version of MIP, a user can be 30 ft. away from the receiving computer and can draw on an ordinary piece of paper covered with the Anoto dot pattern.
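
The pen's output can be consumed as a simple stream. Below is a minimal sketch of grouping position samples into strokes for the recognizers; the sample format and field names are assumptions for illustration, not Anoto's actual wire protocol.

```python
# A minimal sketch of turning a digital pen's position samples into strokes
# ready for recognition. We assume the pen delivers (page_id, x, y, pen_down)
# samples, with coordinates decoded from the printed dot pattern; the field
# names and transport are assumptions.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Sample:
    page_id: str   # identifies which printed page the pen saw
    x: float       # absolute page coordinates from the dot pattern
    y: float
    pen_down: bool

@dataclass
class Stroke:
    page_id: str
    points: List[Tuple[float, float]] = field(default_factory=list)

def samples_to_strokes(samples: List[Sample]) -> List[Stroke]:
    """Group contiguous pen-down samples into strokes; a pen-up sample
    closes the current stroke."""
    strokes: List[Stroke] = []
    current: Optional[Stroke] = None
    for s in samples:
        if s.pen_down:
            if current is None:
                current = Stroke(s.page_id)
            current.points.append((s.x, s.y))
        elif current is not None:
            strokes.append(current)
            current = None
    if current is not None:   # the pen was still down at end of stream
        strokes.append(current)
    return strokes
```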

NISMap. Like Rasa, the NISMap user can speak and/or sketch on a paper map (Figure 3). In response, the system collects the user's strokes, recognizes writing and/or symbols, correlates and fuses co-temporal speech, and updates a central database serving other systems and colleagues. Multiple users can write on the same map at the same time, so this system provides unique support for face-to-face collaboration. Furthermore, because this TMM has the portability, high resolution, scalability, and physical properties of pen and paper, it meets the needs of officers in the field, in particular their need for robustness to computer failure. Finally, NISMap addresses officers' concerns that a computer map with a hole in it is a "rock," while a paper map with a hole in it is still a paper map—NISMap continues to work even if the paper has been crumpled, punctured, torn, or taped up.

NISChart. Physicians are accustomed to paper forms. However, if there is only a paper record, opportunities are missed for improving both individual care and institutional efficiency. The Institute of Medicine identifies the computer-based patient record as a requirement for better medical care, yet cautions: "Perhaps the single greatest challenge that has consistently confronted every clinical system developer is to engage clinicians in direct data entry" [4]. It claims that "To make it simple for the practitioner to interact with the record, data entry must be almost as easy as writing." In accord with this suggestion, we have built NISChart, a digital paper-based charting application in which writing on paper and speaking are the primary input modalities. The system allows a physician to enter values, text, check marks, and so on into the hospital's standard forms, printed on Anoto paper. Digital ink is transmitted to the application, which applies contextual and semantic knowledge in conjunction with handwriting, symbol, and speech recognition to populate a relational database. The information is stored in its digital form, either as traditional database entries (for example, text and symbols) or as digital ink. NISChart provides graphical and/or verbal feedback at the point of data entry or to other workers. Finally, the physical paper with real ink can serve as the definitive primary record, which is important both for recovering from failure and for legal reasons. NISChart is currently being alpha tested at a major urban medical center.
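
To suggest how contextual knowledge can constrain recognition during form filling, here is a minimal sketch: a pen coordinate selects the form field, and the field's semantic constraint filters the recognizer's n-best hypotheses. The field regions, validators, and values are invented for illustration and are not the NISChart implementation.

```python
# A minimal sketch of routing recognized ink into database fields on a
# printed form. All regions, validators, and values are illustrative
# assumptions.

FORM_FIELDS = {
    # field name: (bounding box on the printed page, semantic validator)
    "patient_name": ((10, 10, 200, 30),
                     lambda v: v.replace(" ", "").isalpha()),
    "heart_rate":   ((10, 40, 80, 60),
                     lambda v: v.isdigit() and 20 <= int(v) <= 250),
}

def locate_field(x: float, y: float):
    """Map a pen coordinate to the form field whose box contains it."""
    for name, ((x0, y0, x1, y1), _) in FORM_FIELDS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def commit(field_name: str, nbest, db: dict) -> None:
    """Store the best recognition hypothesis that satisfies the field's
    constraint; keep the raw digital ink when none does."""
    _, validator = FORM_FIELDS[field_name]
    for text, _conf in nbest:
        if validator(text):
            db[field_name] = text
            return
    db[field_name] = "<digital ink>"   # fall back to storing the ink itself

db: dict = {}
fld = locate_field(20, 50)                   # a stroke inside the vitals box
commit(fld, [("7l", 0.5), ("71", 0.4)], db)  # field context rejects "7l"
print(db)                                    # {'heart_rate': '71'}
```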


Conclusion

We have argued that computing too often requires professionals to alter their work to fit current technology. One critical area in which this mismatch occurs is the user interface. Because most professionals value their skills, time, and ability to interact with clients or colleagues, they resist the introduction of computer systems that do not treat those values as paramount. In order to reach these skilled professionals, we suggest that safety-critical systems let users continue to employ the physical objects, language, and symbology of their workplace through the use of tangible multimodal systems. TMM users are as capable of updating digital systems and of collaborating digitally with colleagues as are users of more traditional systems. At the same time, TMM-based digital systems benefit from many features of the physical world, such as those of paper. Ultimately, rather than being stigmatized as late adopters, users of tangible multimodal interfaces could be included among the power users of next-generation technologies, while at the same time reaping benefits that users of more traditional digital systems lack.


References

1. Cohen, P.R., Johnston, M., McGee, D.R., Oviatt, S.L., Pittman, J.A., Smith, I., et al. QuickSet: Multimodal interaction for distributed applications. In Proceedings of the Fifth ACM International Multimedia Conference (Seattle, WA, Nov. 1997). ACM Press, NY, 31–40.

2. Cohen, P.R., McGee, D.R., and Clow, J. The efficiency of multimodal interaction for a map-based task. In Proceedings of the Applied Natural Language Processing Conference (Seattle, WA, Apr. 2000), 331–338.

3. Gorman, P., Ash, J., Lavelle, M., Lyman, J., Delcambre, L., and Maier, D. Bundles in the wild: Managing information to solve problems and maintain situation awareness. Library Trends 49, 2 (2000), 266–289.

4. Institute of Medicine. The Computer-based Patient Record: An Essential Technology for Health Care, 2nd edition. National Academy Press, Washington, D.C., 1997.

5. Mackay, W.E. Is paper safer? The role of flight strips in air traffic control. ACM Trans. on Computer-Human Interaction 6, 4 (1999), 311–340.

6. McGee, D.R., Cohen, P.R., Wesson, R.M., and Horman, S. Comparing paper and tangible multimodal tools. In Proceedings of the Conference on Human Factors in Computing Systems (Minneapolis, MN, Apr. 20–25, 2002). ACM Press, NY, 407–414.

7. McGee, D.R., Cohen, P.R., and Wu, L. Something from nothing: Augmenting a paper-based work practice with multimodal interaction. In Proceedings of the Conference on Designing Augmented Reality Environments, (Helsingor, Denmark, Apr. 12–14, 2000), 71–80.

8. Moore, G.A. Crossing the Chasm: Marketing and Selling High-Tech Goods to Mainstream Customers. Harper Business, NY, 1991.

9. Oviatt, S.L. Mutual disambiguation of recognition errors in a multimodal architecture. In Proceedings of the Conference on Human Factors in Computing Systems (The Hague, The Netherlands, Apr. 1999). ACM Press, NY, 576–583.

10. Oviatt, S.L. and Cohen, P.R. Multimodal interfaces that process what comes naturally. Commun. ACM 43, 3 (Mar. 2000), 45–53.

11. Underkoffler, J. and Ishii, H. Urp: A luminous tangible workbench for urban planning and design. In Proceedings of the ACM Conference on Human Factors in Computing Systems (The Hague, The Netherlands, Apr. 1999). ACM Press, NY, 386–393.

12. Wellner, P. Interacting with paper on the DigitalDesk. Commun. ACM 36, 7 (July 1993), 87–96.


Authors

Philip R. Cohen ([email protected]) is president of Natural Interaction Systems LLC, Portland, OR.

David R. McGee ([email protected]) is vice president of engineering at Natural Interaction Systems LLC, Portland, OR.


Footnotes

Rasa, NISMap, and NISChart are registered trademarks of Natural Interaction Systems LLC. Post-it is a registered trademark of 3M Company.

The research discussed here has been funded partly by DARPA contract N66001-99-D-8503 to the Oregon Graduate Institute of Science and Technology, part of the Oregon Health and Science University, where Cohen is a faculty member and McGee received his Ph.D. The research was also partly supported by DARPA SBIR contract DAAH0102CR051 to Natural Interaction Systems, LLC.

The views and conclusions contained here are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.

1See [7] for a detailed discussion.


Figures

Figure 1. Officers tracking a battle. Photo courtesy of William Scherlis.

Figure 2. Information flow within Rasa.

Figure 3. (left) Digital pen enabling Anoto functionality; (right) NISMap application using Anoto pen and paper.



©2004 ACM  0002-0782/04/0100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc.