
Communications of the ACM


Trust in Videoconferencing


Videoconferencing, the combination of real-time sound and real-time images of conversation partners in different locations, has long captured our imagination. Most prospective users associate videoconferencing with computers, but the first prototypes were in fact developed long before computers became the mainstream devices they are today. The video signal for the PicturePhone, introduced in 1964 at the New York World's Fair, was transmitted over ordinary analog telephone lines. Despite the poor quality of its images, the PicturePhone caught the public's attention, and videoconferencing began to seem like a realistic option.

Currently, video interactions are used to some extent in business and non-business settings, but not as frequently as was once expected. The telephone remains the main channel of real-time communication, and its use continues to grow with the increased availability of cellular technology. A more recent alternative is Instant Messaging (IM). Some IM clients support video, but IM is still used primarily to exchange text messages. When video is the primary communication channel, a distinction can be made between systems for individual participants and those geared toward larger groups.

Individual videoconferencing systems. In recent years, a wide variety of hardware and software for one-on-one videoconferencing has become available. Inexpensive Web cameras attached to desktop computers can deliver high-quality video at 30 frames per second, providing smooth motion and good image detail. Small cameras can even be found in notebook computers, Personal Digital Assistants (PDAs), and cell phones. Such opportunities for individual-to-individual (i2i) [8] videoconferencing enable richer conversations with distant relatives and support uses such as customer assistance, legal depositions, and employment interviews.


Group videoconferencing systems. Systems designed for more than two participants are found primarily in business and educational settings. Face-to-face business meetings can be replaced with videoconferencing sessions, reducing travel time and expenses, as well as addressing a more recent societal concern by reducing participants' exposure to the risk of terrorist attacks while in transit. The most important benefit of videoconferencing, however, remains the increased flexibility it offers for scheduling meetings. It is easier to find a time slot for the duration of the meeting alone than for a major part of a day, or even several days if significant travel is involved. As a result, geographically dispersed organizations can respond and coordinate more quickly. North Mississippi Medical Center, a large health care system, frequently schedules V-TEL (www.vtel.com) conferences with its affiliates located as far as 67 miles from the main hospital. Videoconferencing has helped the system coordinate preparations for accreditation inspections and communicate organizational changes.

In education, the use of group videoconferencing systems allows schools to offer classes to multiple locations simultaneously. One instructor can teach a class to a mix of local students and students on other campuses. Smaller campuses can now offer courses that would otherwise be impossible to staff, and make high-quality instructors available to students unable to attend classes at the main campus. Mississippi State University, for example, uses group videoconferencing to schedule joint classes between the main campus and special classrooms in Vicksburg and Columbus.

Group systems generally use one high-quality camera to capture all local participants as a group. Videoconferencing systems using individual cameras for many participants are more difficult to implement. Multiple participants must be displayed on the screen of each participant, and the total bandwidth required can be extremely high due to the number of video feeds.
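The bandwidth problem can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that every video feed has the same hypothetical bitrate and that each endpoint sends its feed directly to every other endpoint (a full mesh). A central server that mixes streams, such as a multipoint control unit, would change the numbers.

```python
"""Rough bandwidth comparison: one group camera per room vs. one camera per person.

Illustrative assumptions (not from the article): every video feed has the same
bitrate, and every endpoint sends its feed directly to every other endpoint
(a full mesh, with no central server mixing the streams).
"""


def mesh_streams(endpoints: int) -> int:
    """Number of one-way video streams when each endpoint sends to all others."""
    return endpoints * (endpoints - 1)


def total_bandwidth_kbps(endpoints: int, kbps_per_stream: int) -> int:
    """Aggregate bandwidth of all streams in a full mesh, in kbit/s."""
    return mesh_streams(endpoints) * kbps_per_stream


if __name__ == "__main__":
    KBPS_PER_STREAM = 384  # hypothetical bitrate per video feed
    # Two rooms with five people each, one group camera per room.
    print("group cameras:", total_bandwidth_kbps(2, KBPS_PER_STREAM), "kbit/s")
    # The same ten people, each with an individual camera.
    print("individual cameras:", total_bandwidth_kbps(10, KBPS_PER_STREAM), "kbit/s")
```

Under these assumptions, ten people with individual cameras need 90 streams, compared with 2 streams for two rooms with one group camera each: a fivefold increase in endpoints produces a 45-fold increase in aggregate bandwidth.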


Videoconferencing Pros and Cons

The addition of images to audio has long been assumed to provide a richer experience to real-time interactive distance communication. According to the original Media Richness Theory (MRT) [4], communication channels differ in the amount and variety of information they can carry. The richness of a medium depends on the availability of instant feedback, the use of multiple cues (such as facial expressions, voice inflections, and gestures), the use of natural language for conveying a broad set of concepts and ideas, and the personal focus of the medium. Face-to-face communication is high in richness, while a typed note with numerical content is low.

MRT presents the ideal choice of medium as a conscious match between the richness of the medium and the equivocality of the task. Equivocality represents the possibility of multiple interpretations in a given situation. For equivocal tasks, richer communication media are preferred; for more clearly defined tasks, leaner media are preferred.
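The matching rule can be expressed as a toy function. This is a sketch under stated assumptions, not part of MRT itself: the ranking of media is illustrative (it follows the article's examples, with face-to-face at the rich end and numeric notes at the lean end), and the numeric equivocality score is a hypothetical device, since MRT defines no such scale.

```python
"""Toy version of the Media Richness Theory matching rule.

Assumptions: the ranking below is illustrative, and task equivocality is
expressed as a hypothetical score from 0.0 (clear-cut) to 1.0 (highly
ambiguous). MRT itself does not define such a numeric scale.
"""

# Media ordered from leanest to richest (illustrative ranking).
MEDIA_BY_RICHNESS = [
    "numeric document",
    "impersonal written document",
    "personal note or letter",
    "telephone",
    "face-to-face",
]


def recommended_medium(equivocality: float) -> str:
    """Map an equivocality score in [0, 1] onto the richness ranking."""
    if not 0.0 <= equivocality <= 1.0:
        raise ValueError("equivocality must be between 0 and 1")
    # Higher equivocality selects a richer medium.
    index = round(equivocality * (len(MEDIA_BY_RICHNESS) - 1))
    return MEDIA_BY_RICHNESS[index]


if __name__ == "__main__":
    for score in (0.0, 0.5, 1.0):
        print(score, "->", recommended_medium(score))
```

The one-dimensional simplicity of this mapping is precisely what the critiques in the next paragraph take issue with.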

But some researchers have concluded that choosing a communication channel based only on media richness and task equivocality is a rather limited approach that does not do justice to the complexity of social interactions. Whittaker and O'Conaill [12] concluded that video was superior to audio for social tasks, but found little difference in subjective ratings or task outcomes in tasks where social aspects were less important. Dennis and Kinney [5] found no support for the central proposition of MRT that performance improves when media richness is matched to task equivocality.

Richer communication does allow participants to learn more from each exchange, which could contribute to faster and stronger development of trust. However, as Olson and Olson [8] note, previous research has not established whether adding video actually influences trust. Because managing a video conversation requires increased attention, and because poor-quality video creates artificial cues associated with lying, they recommend only guarded optimism about video's potential to promote trust.

Technology improvements should, however, help create more seamless videoconferencing. In the past, limited bandwidth and network delays could cause the moving images and the audio to fall out of sync. This can still be observed in live broadcasts from foreign correspondents who use videophones with relatively low transmission rates. Besides improving the synchronization between audio and image, technology improvements can also reduce the time it takes for the video signal to travel over the network. When the signal is slow, the speaker may appear to hesitate, and hesitation in answering is generally considered a sign of dishonesty.
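The link between network delay and apparent hesitation is simple arithmetic: any one-way delay is paid twice before the questioner hears the answer, once on the way out and once on the way back. The sketch below uses hypothetical pause and delay values in milliseconds and ignores audio/video synchronization.

```python
"""Why network delay looks like hesitation.

Speaker A finishes a question; the sound reaches B after a one-way delay d.
B answers after a natural pause g, and the answer takes another d to reach A.
From A's point of view, the pause before the answer is therefore g + 2d.
All values used below are hypothetical.
"""


def perceived_pause_ms(natural_pause_ms: int, one_way_delay_ms: int) -> int:
    """Pause that A perceives between the end of the question and the answer."""
    return natural_pause_ms + 2 * one_way_delay_ms


if __name__ == "__main__":
    natural = 200  # hypothetical natural response pause, in milliseconds
    for delay in (50, 150, 400):  # hypothetical one-way network delays
        print(f"one-way delay {delay} ms -> perceived pause "
              f"{perceived_pause_ms(natural, delay)} ms")
```

A one-way delay of 150 ms thus adds 300 ms to every pause the other party perceives, even though the respondent answered as promptly as in person.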

A video sleight of hand that can be used to one's advantage involves the position of the camera relative to the speaker. With the camera above, speakers appear artificially tall, and can have more influence over the discussion. With the camera below, the opposite is true [7].

Other videoconferencing distortions, such as those involving turn-taking cues, continue to cause problems because they disrupt the normal flow of conversation. In face-to-face conversations, participants alternate speaking and listening through an intricate mechanism of verbal and nonverbal cues. Some cues are overt, such as directly addressing the next speaker ("What do you think, John?"). Others are more subtle, such as a slight change in tone of voice signaling the end of a sentence. A major nonverbal cue for relinquishing one's turn to speak is eye contact [1]: shortly before speakers give up the floor, their gaze tends to shift to the next speaker. In videoconferencing, this cue is distorted by the separation of camera and screen. If the speaker happens to look at the camera, all other participants receive a cue to take the next turn. If the speaker does not look at the camera, as is usually the case, the other participants are left unsure whether they can take the next turn.

Various solutions have been proposed to restore these turn-taking cues. For instance, Vertegaal [10] developed the Gaze-2 system, in which an eye tracker selects the camera closest to the user's gaze position. The current speaker is shown in full frontal view, and images of the listeners are shown rotated toward the speaker's image (see Figure 1). Another proposed solution is the Virtual Camera System (VCS) [9], which uses two cameras and software interpolation of their images (see Figure 2).
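The camera-selection step described for Gaze-2 amounts to a nearest-neighbor choice. This sketch assumes the eye tracker reports a gaze point in screen pixels and that camera positions are known in the same coordinates; the camera names and positions are hypothetical, and the real Gaze-2 system may work differently.

```python
"""Pick the camera closest to the user's gaze, in the spirit of Gaze-2.

Assumptions: the eye tracker reports the gaze point in screen pixels, and each
camera's position is known in the same coordinates. Camera names and positions
are hypothetical; the actual Gaze-2 implementation may differ.
"""
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def closest_camera(gaze: Point, cameras: Dict[str, Point]) -> str:
    """Return the name of the camera nearest (Euclidean distance) to the gaze point."""
    if not cameras:
        raise ValueError("at least one camera is required")
    return min(cameras, key=lambda name: math.dist(gaze, cameras[name]))


if __name__ == "__main__":
    cameras = {
        "left": (200.0, 300.0),
        "center": (640.0, 300.0),
        "right": (1080.0, 300.0),
    }
    # The user is looking toward the right-hand participant's image.
    print(closest_camera((900.0, 350.0), cameras))
```

Streaming video from the selected camera means the person being looked at receives a roughly frontal view, which is what restores the gaze cue.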


Measuring Trust

Difficulties in coordinating speaking and listening are not the only problems caused by the spatial separation of camera and screen. When a participant looks at the screen rather than at the camera, it appears to everyone else that he or she never looks at them. In face-to-face communication, failure to maintain eye contact is widely regarded as a sign of deception and leads to feelings of mistrust.

Since trust is a subjective concept, it cannot be measured directly; like satisfaction or attraction, it must be inferred. First, certain behaviors, such as lending money or delegating tasks to an employee, indicate a willingness to trust, and higher levels of trust are demonstrated by lending larger sums or delegating more important tasks. Second, social dilemma games can be used, in which higher rewards are achieved if all participants decide to trust each other and cooperate for the common good. Finally, trust can be measured by having study participants report their perceptions on a standardized questionnaire. Bekkering [2] used this last method to compare trust perceptions of email messages, voice messages, and video messages recorded at three different angles. To minimize influences other than the communication channel and the recording angle, all participants saw and heard the same messages, and the three videos were recorded simultaneously with three different cameras. Screenshots of the videos recorded from straight ahead, from above, and from the side are shown in Figure 3.
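The social dilemma games mentioned above can be illustrated with a public goods game, one common type of social dilemma. This sketch is not taken from the study; the endowment, the multiplier r (which must satisfy 1 < r < number of players), and the all-or-nothing contributions are illustrative assumptions.

```python
"""A one-round public goods game, a common form of social dilemma.

Illustrative assumptions (not from the study): each player either contributes
the whole endowment to a shared pot or keeps it; the pot is multiplied by a
factor r, with 1 < r < number of players, and split equally among all players.
"""
from typing import List


def payoffs(contributes: List[bool], endowment: float = 10.0, r: float = 2.0) -> List[float]:
    """Return each player's payoff for one round."""
    n = len(contributes)
    if not 1.0 < r < n:
        raise ValueError("a social dilemma requires 1 < r < number of players")
    pot = endowment * sum(contributes) * r
    share = pot / n
    # Everyone receives an equal share; defectors also keep their own endowment.
    return [share + (0.0 if gave else endowment) for gave in contributes]


if __name__ == "__main__":
    print(payoffs([True, True, True, True]))      # everyone cooperates
    print(payoffs([False, False, False, False]))  # nobody cooperates
    print(payoffs([False, True, True, True]))     # one player defects
```

With four players, r = 2, and an endowment of 10, everyone earns 20 if all contribute and 10 if nobody does, yet a lone defector earns 25 while the three contributors earn 15. Contributing is therefore a bet that the others can be trusted to do the same.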


The center video was recorded with the subject looking straight into the camera, and the side and top videos were recorded at angles of approximately 20°. These angles correspond to the use of a 17-inch monitor viewed from a distance of two feet, as suggested by the American Optometric Association. The audio track of one of the videos was used for the voice message, and the text of the email message matched the text of the video and voice messages. All messages concerned a scenario in which the sender had been asked to act as a reference for a job application, an important issue in the current tight labor market. Study participants were asked to imagine that the job had been offered to someone else because of problems with one or more references, and that they wanted to find out whether the references had been sent. After each message, the 34 participants in the study (mean age = 21.5 years, sd = 2.6; 72% male) reported their impressions on the Individualized Trust Scale developed by Wheeless and Grotz [11]. Participants also indicated after each message whether they would ask the sender to act as an employment reference again. Thus the study measured not only differences in perceived trustworthiness, but also whether participants' behavior would differ as a result of higher or lower perceived trust.
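Questionnaire-based trust measures such as the Individualized Trust Scale can be scored by summing item responses after reversing negatively worded items. The sketch below shows that generic scoring step under stated assumptions: 7-point responses, and a hypothetical item list with hypothetical reverse-keyed items. It does not reproduce the published scale's items or keys.

```python
"""Scoring a 7-point semantic-differential questionnaire.

Assumptions: responses are integers from 1 to 7, a higher number means more
trust unless the item is reverse-keyed, and the item names below are
hypothetical placeholders rather than the published Individualized Trust Scale.
"""
from typing import Dict, Set

SCALE_MAX = 7
REVERSE_KEYED: Set[str] = {"item_2", "item_4"}  # hypothetical reverse-keyed items


def trust_score(responses: Dict[str, int]) -> int:
    """Sum the item responses, flipping reverse-keyed items first."""
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= SCALE_MAX:
            raise ValueError(f"{item}: response {value} is outside 1..{SCALE_MAX}")
        # A reverse-keyed 7 becomes 1, a 6 becomes 2, and so on.
        total += (SCALE_MAX + 1 - value) if item in REVERSE_KEYED else value
    return total


if __name__ == "__main__":
    answers = {"item_1": 6, "item_2": 2, "item_3": 5, "item_4": 1}
    print(trust_score(answers))
```

Computing one such score per participant and message yields the per-message comparisons reported in the results below.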

The results of the study show that trust perceptions can be influenced by the richness of the communication channel. Scores for the different messages are given in the accompanying table. Based on these results, the increase in trust enabled by audio appears to matter more than the additional contribution of the visual channel.

Trust perceptions for the voicemail and for the video message in which the sender addressed the camera directly were significantly higher than for the email message, but the difference between video and voicemail was negligible. Apparently, the visual image adds little information beyond what audio alone provides. This finding is consistent with previous studies of lie detection in videoconferencing, in which the frame rate and resolution of the video were manipulated. As the visual quality of the video deteriorated, participants in those studies relied more on what they heard than on what they saw [6]. Of course, video can assist communication in ways other than improving trust or detecting lies, for example by allowing participants to demonstrate an action or object. For merely discussing issues, however, the telephone appears to be just as effective as the more complicated activity of videoconferencing.

More important with respect to trust is the difference between the three camera positions. Although the center video in this study was recorded with the subject looking straight into the camera, this position is artificial in practice, since the camera is generally placed on top of the monitor, occasionally to the side, and never in front of it. Subjects recorded from above or from the side were not only trusted less than those looking straight into the camera; they were trusted even less than the leanest communication channel in the study, the email message. Does this mean that the lack of eye contact could cause participants to actively distrust the person in the video, rather than just needing more time to begin to trust them? This seems likely, since trust in the traditional camera positions was lower than for email, and the recording angle was the only difference among the three video messages.

This conclusion was supported by showing the videos recorded at the three angles to 14 other participants (mean age = 29.1 years, sd = 8.7; 71% male) and asking when they perceived the person in the video to be looking at them. Most participants reported that the person in the center video was looking at them, and none felt this was the case for the top and side videos. These results are consistent with previous studies showing that the ability to perceive eye contact is highly developed. For example, Chen [3] determined that a gaze deviation of 5° to 10° is sufficient to notice a loss of eye contact.
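The angles involved follow from basic trigonometry: a camera offset s from the point being looked at, viewed from distance d, produces a gaze deviation of arctan(s/d). The sketch below uses the study's two-foot (24-inch) viewing distance; the offsets are hypothetical, and the simple model ignores eye height and head movement.

```python
"""Gaze deviation caused by the offset between camera and screen.

Assumptions: the viewer looks at a point on the screen, the camera sits a
given distance from that point in the plane of the screen, and both are at
the same viewing distance. The offsets below are hypothetical.
"""
import math


def gaze_deviation_deg(offset: float, viewing_distance: float) -> float:
    """Angle between looking at the screen point and looking at the camera."""
    return math.degrees(math.atan2(offset, viewing_distance))


def max_offset(threshold_deg: float, viewing_distance: float) -> float:
    """Largest camera offset that keeps the deviation at or below a given angle."""
    return viewing_distance * math.tan(math.radians(threshold_deg))


if __name__ == "__main__":
    distance = 24.0  # inches: the two-foot viewing distance used in the study
    for offset in (2.0, 5.0, 9.0):  # hypothetical camera offsets in inches
        angle = gaze_deviation_deg(offset, distance)
        print(f"{offset:.0f} in offset -> {angle:.1f} degrees")
    print(f"5-degree limit at {distance:.0f} in: {max_offset(5.0, distance):.1f} in")
```

At 24 inches, staying under the 5° lower bound reported by Chen [3] would require the camera to be within about 2.1 inches of the point the viewer is looking at, a condition that a camera mounted above a desktop monitor is unlikely to meet.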

In the study, participants also indicated whether they would seek the sender as a reference again. Not unexpectedly, as trust decreased, so did the probability that the sender would be asked again. The relationship was strong enough to predict the decision with 92.9% accuracy. In short, humans are highly skilled at perceiving eye contact, and the negative psychological effects of failing to maintain eye contact are significant. Thus the traditional setup of videoconferencing equipment may have hurt widespread adoption of the technology.


A Future For Videoconferencing?

Fortunately, the future of videoconferencing can still be bright, as long as its strengths are emphasized. One helpful recent development is the appearance of mobile devices equipped with small cameras. Because their screens are so small, the angle between camera and screen is smaller, even though these devices tend to be held at closer distances. Almost by accident, manufacturers may have removed one of the obstacles that discouraged prospective users from adopting the technology. Further enhancements might be made through Picture-in-Picture (PIP) technology, in which a small window showing the local participant reminds users of the negative impression created when they look too far away from the camera.

As long as videoconferencing cameras are not built into the center of the screen, some artificial loss of eye contact is unavoidable. In the past, this may have contributed to dissatisfaction with the technology and to its limited adoption. Manufacturers and users can both contribute to solving this problem: manufacturers can design equipment with minimal angles between camera and screen, and users can position the equipment to minimize the loss of eye contact or make a point of looking at the camera. Together, they can help realize the significant benefits that videoconferencing offers.


References

1. Argyle, M. and Cook, M. Gaze and Mutual Gaze. Cambridge University Press, Cambridge, England, 1976.

2. Bekkering, T.J.E. Visual angle in videoconferencing: The issue of trust. Unpublished doctoral dissertation, Mississippi State University, 2004.

3. Chen, M. Leveraging the asymmetric sensitivity of eye contact for videoconference. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2002). ACM Press, 49–56.

4. Daft, R.L. and Lengel, R.H. Organizational information requirements, media richness and structural design. Management Science 32, 5 (1986), 554–571.

5. Dennis, A.R. and Kinney, S.T. Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information Systems Research 9, 3 (1998), 256–274.

6. Horn, D.B. Seeing is believing: Video quality and lie detection. Unpublished doctoral dissertation, The University of Michigan, 2001.

7. Huang, W. Social dynamics can be distorted in video-mediated communication. Unpublished doctoral dissertation, The University of Michigan, 2005.

8. Olson, J.S. and Olson, G.M. i2i trust in e-commerce. Commun. ACM 43, 12 (2000), 41–44.

9. Ott, M., Lewis, J.P., and Cox, I. Teleconferencing eye contact using a virtual camera. In INTERACT '93 and Proceedings of CHI '93 Conference Companion on Human Factors in Computing Systems. ACM Press, 1993, 109–110.

10. Vertegaal, R., Weevers, I., and Sohn, C. GAZE-2: An attentive video conferencing system. In Extended Abstracts of the CHI '02 Conference on Human Factors in Computing Systems (Minneapolis, MN, Apr. 2002). ACM Press, 736–737.

11. Wheeless, L.R. and Grotz, J. The measurement of trust and its relationship to self-disclosure. Human Communication Research 3, 3 (1977), 250–257.

12. Whittaker, S. and O'Conaill, B. The role of vision in face-to-face and mediated communication. In Video-Mediated Communication, K.E. Finn, A. Sellen, and S.B. Wilbur, Eds. Lawrence Erlbaum Associates, Mahwah, NJ, 1997, 23–49.


Authors

Ernst Bekkering is an assistant professor of Management Information Systems at Northeastern State University in Oklahoma.

J.P. Shim is a professor of Management Information Systems and the director of the International Business Strategy Program at Mississippi State University.


Figures

Figure 1. Gaze-2 System.

Figure 2. Virtual Camera System.

Figure 3. The three viewing angles.


Tables

Table. Trust scores for the different message types.



©2006 ACM  0001-0782/06/0700  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.


 
