In today's pursuit of more transparent, flexible, and efficient human-computer interaction, a growing interest in multimodal interface design has emerged [5]. The goals are twofold: to achieve an interaction closer to natural human-human communication, and to increase the robustness of the interaction by using redundant or complementary information. New interaction paradigms and guidelines are necessary to facilitate the design of multimodal systems from the ground up (see articles by Cohen and McGee and Pieraccini et al. in this section). This article discusses six main categories of guidelines and represents a preliminary effort to establish principles for multimodal interaction design. A more detailed discussion of these guidelines will be available in [6].
Requirements Specification. Critical to the design of any application are the user requirements and system capabilities for the given domain. Here, we provide some general considerations for multimodal system requirements specification.
Design for the broadest range of users and contexts of use. Designers should become familiar with users' psychological characteristics (for example, cognitive abilities, motivation), level of experience, domain and task characteristics, and cultural background, as well as their physical attributes (for example, age, vision, hearing). An application will be valued and accepted if it can be used by a wide population and in more than one manner. Multimodal designs can thus extend the range of potential users and uses, such as when redundant speech and keypad input enables an application to be used in dark and/or noisy environments. Designers should support the best modality or combination of modalities anticipated in changing environments (for example, a private office vs. driving a car).
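To make the "changing environments" point concrete, the following sketch ranks redundant input modalities from a few ambient-context flags. It is a minimal illustration only; the type and function names are hypothetical and not drawn from the article.

```python
from dataclasses import dataclass

@dataclass
class UsageContext:
    """Hypothetical ambient conditions a device might sense or be told about."""
    noisy: bool    # e.g., driving a car; noise degrades speech recognition
    dark: bool     # poor lighting makes keypad or touch input harder
    private: bool  # user is alone and comfortable speaking aloud

def ranked_input_modalities(ctx: UsageContext) -> list[str]:
    """Order redundant input modalities by expected suitability for the context."""
    if ctx.noisy or not ctx.private:
        return ["keypad", "speech"]   # avoid speech where it is unreliable or overheard
    if ctx.dark:
        return ["speech", "keypad"]   # speech works when the keypad is hard to see
    return ["speech", "keypad"]       # illustrative default ordering

# Example: in a noisy car, the keypad is offered first but speech stays available.
print(ranked_input_modalities(UsageContext(noisy=True, dark=False, private=True)))
```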
Address privacy and security issues. Users should be recognized by an interface only according to their explicit preference and should not be remembered by default. Multimodal interfaces that use speech should also provide a non-speech mode so users can maintain privacy and prevent others from overhearing private conversations. Non-speech alternatives are especially important when users enter personal identification numbers or passwords (for example, at an automatic bank teller), or when they might be uncomfortable if certain private information were overheard by others. For example, to reduce the likelihood of others noticing a user's mistakes, it may be preferable to present error messages visually rather than as audible speech.
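The short sketch below makes the last point concrete by routing sensitive messages, such as error feedback and password prompts, to a visual channel. The function, the `ui` object, and the message kinds are hypothetical assumptions, not part of the article.

```python
# Message kinds that should never be spoken aloud where others might overhear.
SENSITIVE_KINDS = {"error", "pin_prompt", "password_prompt"}

def present_message(kind: str, text: str, ui, non_speech_mode: bool) -> None:
    """Route a system message to a visual or spoken channel.

    Sensitive content (errors, PIN/password prompts) is always displayed rather
    than spoken; other messages follow the user's current mode preference.
    """
    if non_speech_mode or kind in SENSITIVE_KINDS:
        ui.show_text(text)   # private, visual presentation
    else:
        ui.speak(text)       # audible presentation for hands-busy or eyes-busy use
```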
Designing Multimodal Input and Output. The cognitive science literature on intersensory perception and intermodal coordination has provided a foundation for determining multimodal design principles [2, 5, 7]. To optimize human performance in multimodal systems, such principles can be used to direct the design of information presented to users, specifically regarding how to integrate multiple modalities or how to support multiple user inputs (for example, voice and gesture). Here, we provide a brief summary of some general guiding principles essential to the design of effective multimodal interaction.
Maximize human cognitive and physical abilities. Designers need to determine how to support intuitive, streamlined interactions based on users' information-processing abilities, including attention, working memory, and decision making.
Integrate modalities in a manner compatible with user preferences, context, and system functionality. Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context.
Adaptivity. Multimodal interfaces should adapt to the needs and abilities of different users, as well as different contexts of use. Dynamic adaptivity enables the interface to degrade gracefully by leveraging complementary and supplementary modalities according to changes in task and context. Individual differences (for example, age, preferences, skill, sensory or motor impairment) can be captured in a user profile and used to determine appropriate interface settings.
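One way to picture such a profile is sketched below: a small record of individual differences from which interface settings are derived at the start of a session. The field names and the threshold values are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    """Hypothetical record of individual differences kept for each user."""
    age: int
    prefers_speech: bool
    low_vision: bool
    motor_impairment: bool

@dataclass
class InterfaceSettings:
    """Interface settings derived from the profile when a session starts."""
    primary_input: str            # "speech" or "touch"
    font_scale: float             # enlarge text for low-vision users
    confirm_risky_actions: bool   # ask before destructive operations

def settings_for(profile: UserProfile) -> InterfaceSettings:
    speech_first = profile.prefers_speech or profile.motor_impairment
    return InterfaceSettings(
        primary_input="speech" if speech_first else "touch",
        font_scale=1.5 if profile.low_vision else 1.0,
        confirm_risky_actions=profile.age >= 70,  # illustrative threshold only
    )
```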
Consistency. Presentation and prompts should share common features as much as possible and should refer to a common task, including using the same terminology across modalities.
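A simple way to keep terminology from drifting between modalities is to draw every prompt from a single catalog, as in the hypothetical sketch below; the catalog entries and the `ui` object are assumptions for illustration.

```python
# Single source of truth for prompt wording; both the speech synthesizer and the
# GUI render from the same entry, so terminology stays consistent across modalities.
PROMPTS = {
    "ask_destination": "Where would you like to go?",
    "confirm_transfer": "Transfer {amount} to {payee}?",
}

def prompt(task: str, modality: str, ui, **slots) -> None:
    text = PROMPTS[task].format(**slots)
    if modality == "speech":
        ui.speak(text)      # spoken prompt uses exactly the same wording
    else:
        ui.show_text(text)  # displayed prompt uses exactly the same wording
```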
Feedback. Users should be aware of their current connectivity and know which modalities are available to them. They should be made aware of alternative interaction options without being overloaded by lengthy instructions that distract from the task. Specific examples include using descriptive icons (for example, microphone and speech bubbles to denote click-to-talk buttons), and notifying users to begin speaking if speech recognition starts automatically. Also, confirm system interpretations of whole user input after fusion has taken place [4], rather than for each modality in isolation.
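As a rough sketch of confirmation after fusion, the fragment below confirms the single fused interpretation of parallel inputs (for example, speech plus a pen gesture) rather than confirming each modality on its own. The data types, threshold, and `ui` call are hypothetical; the systems behind this guideline are described in [4].

```python
from dataclasses import dataclass

@dataclass
class FusedCommand:
    """One interpretation produced by fusing parallel inputs (e.g., speech + pen)."""
    action: str                     # e.g., "create"
    target: str                     # e.g., "hospital"
    location: tuple[float, float]   # taken from the pen or touch gesture
    confidence: float               # combined recognizer confidence

def confirm_if_uncertain(cmd: FusedCommand, ui, threshold: float = 0.6) -> bool:
    """Confirm the whole fused command once, not each modality separately."""
    if cmd.confidence < threshold:
        return ui.ask_yes_no(
            f"Did you mean: {cmd.action} a {cmd.target} at {cmd.location}?"
        )
    return True
```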
Error Prevention/Handling. User errors can be minimized and error handling improved by providing clearly marked exits from a task, modality, or the entire system, and by making it easy for users to undo a previous action or command. To further prevent users from guessing at functionality and making mistakes, designers should provide concise and effective help in the form of task-relevant and easily accessible assistance. Further specific examples are given in [5].
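A minimal way to support undoing a previous action and a clearly marked exit is sketched below. The command objects (with `apply()` and `revert()` methods) and the escape phrases are hypothetical, intended only to illustrate the guideline, not the article's own implementation.

```python
class CommandHistory:
    """Stack of executed commands so the most recent one can be reversed."""

    def __init__(self):
        self._done = []

    def execute(self, command) -> None:
        command.apply()              # command objects expose apply() and revert()
        self._done.append(command)

    def undo_last(self) -> bool:
        """Reverse the most recent action; returns False if there is nothing to undo."""
        if not self._done:
            return False
        self._done.pop().revert()
        return True

# The same escape words (or an on-screen button) map to one handler in every
# modality, giving users a clearly marked exit from a task or the whole system.
EXIT_COMMANDS = {"cancel", "go back", "main menu", "quit"}
```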
The guiding principles presented here represent initial strategies to aid in the development of principle-driven multimodal interface guidelines. To develop both innovative and optimal future multimodal interfaces, additional empirical studies are needed to determine the most intuitive and effective combinations of input and output modalities for different users, applications, and usage contexts, as well as how and when to best integrate those modalities. To fully capitalize on the robustness and flexibility of multimodal interfaces, further work also needs to explore new techniques for error handling and adaptive processing, and then to translate these findings into viable and increasingly specific multimodal interface guidelines for the broader community.
1. Cooper, G. Research into Cognitive Load Theory and Instructional Design at UNSW (1997); www.arts.unsw.edu.au/education/CLT_NET_Aug_97.HTML.
2. European Telecommunications Standards Institute. Human factors: Guidelines on the multimodality of icons, symbols, and pictograms (Report No. ETSI EG 202 048 v1.1.1, 2002-08). ETSI, Sophia Antipolis, France, 2002.
3. Kalyuga, S., Chandler, P., and Sweller, J. Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology 13 (1999), 351–371.
4. McGee, D.R., Cohen, P.R., and Oviatt, S. Confirmation in multimodal systems. In Proceedings of the International Joint Conference of the Association for Computational Linguistics and the International Committee on Computational Linguistics (Montreal, Quebec, Canada, 1998).
5. Oviatt, S.L. Multimodal interfaces. The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications. J. Jacko and A. Sears, Eds. Lawrence Erlbaum, Mahwah, NJ, 2003, 286–304.
6. Reeves, L.M., Lai, J. et al. Multimodal interaction design: From a lessons-learned approach to developing principle-driven guidelines. International J. of HCI. Forthcoming.
7. Stanney, K.M., Samman, S., Reeves, L.M., Hale, K., Buff, W., Bowers, C., Goldiez, B., Lyons-Nicholson, D., and Lackey, S. A paradigm shift in interactive computing: Deriving multimodal design principles from behavioral and neurological foundations. In review.
8. Wickens, C.D. Engineering Psychology and Human Performance (2nd ed.). HarperCollins, New York, 1992.
This article was originally drafted among the co-authors as part of a CHI'03 workshop on multimodal interaction and interface design principles organized by James Larson and Sharon Oviatt.
©2004 ACM 0002-0782/04/0100 $5.00