New multimodal interfaces are being built for strength. As the articles in this section attest, strength in a multimodal interface derives from a number of factors, including its compatibility with users' abilities and existing work practices, and the flexibility such hybrid interfaces permit. The robustness of a multimodal interface also increases substantially as the number and heterogeneity of its modalities expand [2, 3]. Performance improves further when adaptive processing tailors the interface to important user and environmental characteristics. Finally, new tangible multimodal interfaces for safety-critical applications can persist and function following physical damage, power outages, and other sources of system failure.
State-of-the-art multimodal interfaces can process two or more combined input modes using recognition-based technologies in order to accurately identify users or to interpret their communicative intent [1–3]. Several classes of multimodal system are relatively mature, including those that process speech and manual input (speech with pen or touch), audiovisual speech input (speech with lip movements), and multibiometric input. Computer speech and vision processing are two component technologies that are fundamental to and pervasively used in developing these new systems, as discussed in the articles by Deng and Huang and by Turk. Some very recent multibiometric multimodal interfaces process three heterogeneous information sources together: behavioral input based on speech and vision processing (voice, face) along with physiological input (fingerprint), as described by Jain et al. All of these multimodal interfaces are hybrids, and they are beginning to inject a new level of hybrid vigor into next-generation interface design.
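To make the idea of multibiometric fusion concrete, the following minimal sketch combines normalized match scores from three hypothetical matchers at the score level. The modality weights, score range, and acceptance threshold are illustrative assumptions, not the method of any system described in this section.

```python
# Hypothetical score-level fusion for a multibiometric verifier.
# Assumes each matcher already reports a similarity score in [0, 1];
# weights and threshold are illustrative, not taken from any cited system.

def fuse(voice, face, fingerprint, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of normalized match scores from three modalities."""
    w_v, w_f, w_p = weights
    return w_v * voice + w_f * face + w_p * fingerprint

def verify(voice, face, fingerprint, threshold=0.6):
    """Accept the identity claim if the fused score clears the threshold."""
    return fuse(voice, face, fingerprint) >= threshold

# A strong fingerprint match can compensate for a noisy voice sample.
print(verify(voice=0.41, face=0.72, fingerprint=0.93))  # True
```

The point of such fusion is that no single modality has to be decisive; weak evidence from one channel can be offset by strong evidence from another.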
Biological foundations of new interface design: Building for strength. In many important ways, these new interface designs emulate basic biological principles and systems, including the coordinated multimodal communication patterns that have evolved so impressively in humans. During multimodal communication, people's multisensory perception can achieve remarkable accuracy through fusion of different information sources. In parallel, their multimodal production is expressively powerful largely because different channels (speech, writing, gestures) each provide complementary advantages in different situations. During interpersonal dialogue, people routinely and flexibly select different modes of expression to avoid errors and maximize a listener's understanding. As the authors expand upon here, multimodal interface designs based on this human communication model are able to achieve a new hybrid vigor that increases their uniqueness, robustness, and resistance to damage, while also extending their utility to more challenging mobile environments and larger groups of diverse users. The article by Reeves et al. examines related principle-driven multimodal interface guidelines.
Bend, don't break: Flexible multimodal interfaces. Natural structures tend to be more flexible than engineered objects, and their survival during turbulence depends on it. Plants adapt to the flow of wind and waves around them; daffodils, for example, both twist and bend without snapping in gusty winds. One reason for the success of multimodal interfaces is their ability to "flex" when users need them to: when communicative maneuvering is required to convey meaning in the most effective manner, and when circumstances change during mobile use.
Flexible multimodal interfaces that combine speech with touch or stylus input for selection are now being commercialized for in-vehicle, smart phone, and other applications, as illustrated by the multimodal interface created by SpeechWorks and Ford at the 2003 North American International Auto Show and described by Pieraccini et al. This interface gives drivers more flexibility when controlling in-vehicle applications such as navigation, telephony, climate control, and an MP3 player. In addition, context-aware processing capabilities, such as dynamic semantic modeling and new unsupervised audiovisual common-cause techniques, are being developed to enhance multimodal system flexibility and robustness during dynamic real-world situations.
Adapting to the inevitable: Real users and their environments. With eight tentacles and the ability to shift colors rapidly, the intelligent octopus is a master at learning, adapting to, and controlling its environment. To improve their coverage, reliability, and usability, multimodal interfaces likewise are being designed that can automatically learn and adapt to important user, task, and environmental parameters. Both Nock et al. and Jain et al. emphasize the challenges involved in collecting the large amounts of data needed to support successful adaptive processing, as well as the development of scalable systems that can handle large, diverse user groups and challenging field settings. New adaptive multimodal techniques range from the adoption of user-specific combination weights to improve user identification/verification in biometric applications [1], to stream-weighting of input modes (audio, visual) to improve speech recognition in noisy environments [3], to unsupervised audiovisual common-cause techniques for event discovery and improved performance on a variety of tasks.
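As a minimal illustration of stream weighting in the spirit of [3], the sketch below combines per-candidate log-likelihoods from an audio model and a visual (lip-reading) model, shifting weight toward the visual stream as the estimated signal-to-noise ratio drops. The SNR-to-weight mapping and all numeric values are assumptions chosen for illustration, not a published recipe.

```python
# Illustrative stream weighting for audio-visual speech recognition.
# Audio and visual log-likelihoods for each candidate word are combined
# with a weight lambda that grows with estimated SNR; the linear mapping
# below is an assumption, not the adaptive scheme of any cited system.

def audio_weight(snr_db, lo=0.0, hi=30.0):
    """Map an estimated signal-to-noise ratio (dB) to a weight in [0, 1]."""
    return min(1.0, max(0.0, (snr_db - lo) / (hi - lo)))

def fuse_streams(audio_ll, visual_ll, snr_db):
    """Per-candidate fused score: lambda * audio + (1 - lambda) * visual."""
    lam = audio_weight(snr_db)
    return {c: lam * audio_ll[c] + (1.0 - lam) * visual_ll[c] for c in audio_ll}

def recognize(audio_ll, visual_ll, snr_db):
    """Return the candidate word with the highest fused score."""
    fused = fuse_streams(audio_ll, visual_ll, snr_db)
    return max(fused, key=fused.get)

# In heavy noise (5 dB), the lip-reading stream dominates the decision.
audio_ll = {"yes": -4.0, "no": -3.5}   # degraded audio slightly favors "no"
visual_ll = {"yes": -2.0, "no": -6.0}  # visual evidence clearly favors "yes"
print(recognize(audio_ll, visual_ll, snr_db=5.0))  # yes
```

The same weighting idea underlies user-specific combination weights in biometric fusion [1]: the system learns which streams to trust, and by how much, for a given user or environment.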
Beyond flexing and adapting to handle natural environmental fluctuations, multimodal interfaces for safety-critical applications must also continue to function in spite of physical damage, intermittent power failures, and other unanticipated but disruptive events. The Cohen and McGee article describes new tangible multimodal interfaces designed to minimize or avoid loss of valuable functionality in medical, air traffic control, military, and similar applications. By embedding multimodal processing techniques into a familiar and portable paper-based interface, these authors also outline how Multimodal Interaction with Paper (MIP) systems can model users' existing work practices and help overcome resistance to computer adoption.
Future vision: Integrated M3 systems. Our goal in orchestrating this special section was to illustrate how multimodal interfaces are rapidly beginning to incorporate new strategies to improve their performance while also providing new forms of computational functionality, especially for field and mobile applications. As fusion and other techniques improve, one long-term direction will be the development of integrated M3 systems (multibiometric-multimodal-multisensor) that can interpret natural language and user behavior in a range of challenging real-world contexts. At the center of these pursuits will be the design of new interfaces that can flex, adapt, and persist when people and circumstances require it.
1. Jain, A. and Ross, A. Learning user-specific parameters in a multibiometric system. In Proceedings of the International Conference on Image Processing (Rochester, NY, Sept. 22–25, 2002), 57–60.
2. Oviatt, S.L. Breaking the robustness barrier: Recent progress on the design of robust multimodal systems. Advances in Computers 56, M. Zelkowitz, Ed. Academic Press, 2002, 305–341.
3. Potamianos, G., Neti, C., Gravier, G., Garg, A., and Senior, A. Recent advances in the automatic recognition of audio-visual speech. Proceedings of the IEEE 91, 9 (Sept. 2003).