ACM

Communications of the ACM


Contributed articles

wisePad Services For Vision-, Hearing-, and Speech-Impaired Users

By Dawn N. Jutla, Dimitri Kanevsky
Communications of the ACM, January 2009, Vol. 52 No. 1, Pages 64-69
10.1145/1435417.1435434

Our vision for the wisePad system, named for the iconic three wise monkeys that see no evil, hear no evil, speak no evil, is a full-service computing platform designed to deliver personalized image-, audio-, and text-transformation services to people with impaired vision, speech captioning for people with hearing problems, and language-translation, text-summarization, and pictographic-illustration services for people needing help with both spoken and text-based language.

wisePad services employ a generic head-mounted display content-transformation device of the type described by Kanevsky and Sorenson.⁷ The device captures image(s) on an output device (such as iPod screen, PC monitor, and projection screen) and receives multimedia signals input from other channels. Related wisePad services then use the service's content-transformer functions to translate captured or received images and display them on the screen of the device.

wisePad transforms an image in several ways: For the vision-impaired user, it magnifies parts of the original image to facilitate visual perception and thus comprehension. For the hearing-impaired, it displays closed-captioned video, pictures, and text in the lenses of the user's eyeglasses. And for speech aid, it might transform text into simplified text in the same language, speech into another language, speech into summary form, and/or speech by adding pictographic icons intended to help in language understanding in the same way pictures and icons help young children learn to read.
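
The magnification step described above can be sketched in a few lines. This is a minimal illustration only, assuming the captured image is a plain 2D grid of pixel values and that enlargement is done by simple pixel replication; the function name `magnify_region` and its parameters are hypothetical and are not taken from the wisePad design.

```python
def magnify_region(image, top, left, height, width, factor):
    """Crop a rectangle from a 2D pixel grid and enlarge it by pixel replication.

    `image` is a list of rows, each row a list of pixel values.
    `factor` is an integer zoom level (2 doubles width and height).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Select the part of the original image the user wants enlarged.
    region = [row[left:left + width] for row in image[top:top + height]]
    magnified = []
    for row in region:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        magnified.extend(list(wide_row) for _ in range(factor))
    return magnified


# Usage: enlarge the top row of a 2x2 image by a factor of 2.
image = [[1, 2], [3, 4]]
zoomed = magnify_region(image, top=0, left=0, height=1, width=2, factor=2)
```

A real device would more likely use interpolation rather than replication; replication is chosen here only because it keeps the sketch short.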

We expect that, within the next 10 years, wisePad will deliver an unprecedented level of personalization for its users, including artifact selection, page ordering, language, color, magnification, and image-feature transformation. wisePad will be able to store these user choices as preferences in a customized software "skin" in its user interface.

wisePad will be available in several implementation versions, including ordinary-looking glasses with a frame large enough to hold a tiny USB port, miniature transformer, and display chips, as well as goggle-style eyewear with wireless microphones supporting audio output and input. Users select the version that best suits their needs.

Figure 1 shows the kind of eyewear adapted for wisePad services as an output device. Lumus Ltd., Rehovot, Israel (www.lumusvision.com), designs, manufactures, and markets a lightweight designer frame with twin microdisplays at full VGA-type resolution. Tiny projectors embedded in the eyewear's stems project images with a 27.5-degree field of view, as if the wearer were "watching a 70-inch TV two-to-three meters away" (www.lumusvision.com); users see the images reflected on the lenses. Alternatively, an IBM belt-wearable portable computer (www.research.ibm.com/Wearable-Computing/) with built-in GB-capacity hard drive, lightweight eyewear, and tactile controller⁷ delivers images captured through an output device, transforming them and projecting revised images onto the lenses. Microvision, Inc., Redmond, WA (www.microvision.com/wearable.html), designs, manufactures, and markets generic services for its transparent wearable displays, including the ability for users to view their email and caller ID information without first retrieving their mobile devices and to "see" GPS directions in the lenses of their glasses.

For the Visually Impaired

A 2003 Forrester Research report⁸ concluded that 60% of working-age adults in the U.S. (approximately 101.4 million people) would likely benefit from accessible technology following a disability or impairment. In 2003, 27% of working-age adults in the U.S. (approximately 45.9 million people) had some sort of visual impairment. Meanwhile, by 2020, 20% of working-age adults in the U.S. (approximately 64 million people) will be 55 or older⁹; aging and its attendant susceptibility to sight-impairing diseases (such as diabetes) will increase demand for accessible technology to help them stay independent and productive.

They'll likely need text and image magnification, color-modification, and usable navigation and information-location services. An example of how wisePad might deliver such a service is when vision-impaired Web users seek out the search window on a Web page when browsing a particular site. Even locating a search window can take such a person much longer than it takes a fully sighted person. Thus wisePad promises the ability to display a search window in the user's focal area, magnify the retrieved information, and/or transform text by summarizing or translating the search results before displaying it in the user's predetermined area of focus. The idea is for users to look through their glasses to the page and for the search window to pop up automatically. For an automated service to support quick recognition of an artifact (such as a search window) on a Web page, the service's search-window component would have to be able to identify itself programmatically through a semantic language tag.
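
One way a search window could identify itself programmatically is through semantic markup that a service scans for. The sketch below assumes the page marks its search component with the WAI-ARIA `role="search"` landmark or an HTML `<input type="search">` element; the article does not name a specific tag, so this choice of markers, along with the names `SearchLocator` and `find_search_elements`, is an assumption made for illustration.

```python
from html.parser import HTMLParser


class SearchLocator(HTMLParser):
    """Collects elements that declare themselves as a search component."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumed markers: an ARIA search landmark or an HTML search input.
        is_landmark = attributes.get("role") == "search"
        is_search_input = tag == "input" and attributes.get("type") == "search"
        if is_landmark or is_search_input:
            self.matches.append((tag, attributes.get("id")))


def find_search_elements(html_text):
    """Return (tag, id) pairs for every self-identified search element."""
    locator = SearchLocator()
    locator.feed(html_text)
    locator.close()
    return locator.matches


# Usage: the form announces itself, so no visual scanning is needed.
page = '<div><form role="search" id="site-search"><input type="text" name="q"></form></div>'
found = find_search_elements(page)
```

Once the element is located this way, a display service could bring it into the user's predetermined area of focus instead of leaving the user to find it by eye.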

Transforming text into speech would also greatly benefit people with a visual impairment. For example, IBM's ViaScribe¹⁰ is bundled with such a service (www.liberatedlearning.com/technology/index.html). wisePad will, as its constituent technologies and markets mature, similarly offer such a service for visually impaired readers of business reports, consumer books, and other media.

Ubiquitous Participation

The most recent (2006) demographic data available from the U.S. National Center for Health Statistics (www.cdc.gov/nchs/) concluded that 17% of the adult population in the U.S. (approximately 36.5 million people) have some kind of hearing problem. Gallaudet University Research Institute's 2006–2007 survey (gri.gallaudet.edu/Demographics/2006_National_Summary.pdf) sampled 37,352 U.S. students (most 6 to 17 years of age) with some sort of hearing impairment. In the range of hearing loss, from mild to profound (from 27dB to 91dB and above), the profound-loss category characterizes more than one-third of these students. Moreover, of the surveyed students 8.1% reported having no access to support services, 57.5% had access to speech training and therapy services, 8.8% had access to itinerant teacher services, and only 0.9% had access to real-time captioning services.

Captioning is one of the most useful technology-based services for people with diagnosed hearing disabilities. Experiments with university students (2000–2008) in Australia, Canada, Japan, New Zealand, and the U.S. using ViaScribe showed positive learning results in terms of enhanced test-taking performance for the disabled (as well as for the nondisabled, as they also appreciate the value of captioned notes) due to increased student participation in class (better note taking) and understanding of spoken and assigned material.

Many hearing-impaired people would like to be able to watch in-flight movies, as well as home videos, and generally participate more fully in classroom, workplace, and social environments. wisePad closed-captioning services on wearable-display eyewear would help them by displaying the words they are unable to hear. In 2002, in order to implement the wisePad closed-caption service, Basson and Kanevsky¹ proposed a system to process a signal containing closed captions for audio content associated with multimedia (such as podcasts, videos, and text). In a variety of situations (such as an e-commerce transaction) where some amount of time delay is acceptable, a human-based transcription service could generate the captions. When time delay is less tolerated (such as in a meeting), caption accuracy might be sacrificed for real-time automated transcription (such as through ViaScribe and Nuance's Dragon NaturallySpeaking, www.nuance.com/naturallyspeaking/). How accurate the text from these tools is depends on language domain (such as health and education), acoustic environment, and voice-model quality.
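
The trade-off between delay and accuracy described above amounts to a simple routing decision. The sketch below is illustrative only: the function name, the two mode labels, and the idea of comparing a tolerated delay against a human service's turnaround time are assumptions, and the numbers in the usage lines are hypothetical rather than measured.

```python
def choose_transcription_mode(max_delay_seconds, human_turnaround_seconds):
    """Pick a captioning path from the delay a situation can tolerate.

    Returns "human" when the tolerated delay covers the human service's
    turnaround time, and "automated" otherwise (faster, possibly less accurate).
    """
    if max_delay_seconds < 0 or human_turnaround_seconds < 0:
        raise ValueError("delays must be non-negative")
    if max_delay_seconds >= human_turnaround_seconds:
        return "human"
    return "automated"


# Usage with hypothetical figures: a transaction that can wait versus a live meeting.
transaction_mode = choose_transcription_mode(max_delay_seconds=30, human_turnaround_seconds=10)
meeting_mode = choose_transcription_mode(max_delay_seconds=2, human_turnaround_seconds=10)
```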

Multiple Languages

The Gallaudet Research Institute's 2006 survey (gri.gallaudet.edu/Demographics/2006_National_Summary.pdf) also reported that over 50% of the 37,352 U.S. students (ages 5 to 18) with hearing disabilities (or almost 19,000 children) suffer from additional impairments possibly involving vision, learning, and attention-deficit disorder. wisePad aims to provide services for users with multiple disabilities, as well as for students (both children and adults) whose first language is not English, young readers, and students with learning disabilities and low literacy levels.

Helen Keller (1880–1968), a famous inspirational scholar born in Alabama in the U.S., was deaf and blind as a result of a brief illness with a high fever (likely scarlet fever or meningitis) when she was two years old. As a young child she learned first that she could communicate through touch and gesture. It was later that she understood that objects had names and that words could be used to describe them. Position detectors, movement tracers,⁵ and other technologies allow consumers to program gestures into their everyday devices (such as wristwatches, automobiles, and MP3 players) to quickly produce output (such as display a timer, scroll down a window, and select a particular song) without having to touch a button. A video posted on YouTube called "Captioned Version of Artificial Passenger" (www.youtube.com/watch?v=APLEwmPBeoA&feature=channel) demonstrates the technologies described by Kanevsky and Zlatsin⁵ now incorporated into wisePad by the author Kanevsky and his colleagues at IBM T.J. Watson Research Laboratory. These gesture-based technologies allow users to "train" the behavior of electronic devices (such as a mobile phone camera) to respond to user-taught gestures. wisePad combines voice sounds and text through these gestures. The various wisePad user interfaces for obtaining services can be as innovative as a user's imagination can make them. For example, a user can program a gesture (such as touching a hand to a face) in a wisePad-enabled device to add pictures to words, as well as dictionary meanings and synonyms.

With them, a user would be able to transform a page of text and display a summary translation with pictographs on the lens. Users would also be able to train wisePad-enabled glasses to support their own idiosyncratic gestures; for example, tapping the glasses frame twice might connect the user to the Web to view email on the lens display; a user-trained rotation of the frame might magnify visual content; and composite geometric gestures (touch and/or touchless) might adjust the colors in the display.
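
The idea of user-trained gestures can be pictured as a lookup from a recognized gesture to an action. The sketch below covers only that mapping and assumes gesture recognition has already produced a label; the class `GestureRegistry`, the gesture labels, and the action names are all hypothetical and do not describe the patented technology cited above.

```python
class GestureRegistry:
    """Maps user-taught gesture labels to the actions they should trigger."""

    def __init__(self):
        self._actions = {}

    def train(self, gesture, action):
        """Associate a gesture label with a zero-argument callable."""
        self._actions[gesture] = action

    def perform(self, gesture):
        """Run the action for a gesture, or return None if it is untrained."""
        action = self._actions.get(gesture)
        if action is None:
            return None
        return action()


# Usage: each user teaches the device a personal set of gestures.
registry = GestureRegistry()
registry.train("double_tap_frame", lambda: "open_email")
registry.train("rotate_frame", lambda: "magnify_content")
result = registry.perform("double_tap_frame")
```

Keeping the mapping per user is what allows idiosyncratic gestures: two users can bind the same physical motion to different services.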

The wisePad system could also employ Infoscope-type services³ to, say, copy an image of a menu written in Chinese, communicate it through the Internet for translation, and then display a translated menu in English on a user-display device (such as the user's glasses). Language translation in wisePad also acts on audio input. Airplane-based or other mobile transport servers might thus be able to host automatic translation software⁴,⁶ for train, air, and automobile travelers using wisePad. Alternatively, translation services might be delivered through the Internet for those applications that are able to tolerate several seconds of delay.

Others who might find wisePad services useful are newly arrived immigrants or visitors to North America who read and write English better than they speak and understand it. For people with some kind of hearing impairment, closed captioning is a proven way to enhance communications. Children and other users who need literacy aids would also benefit from wisePad's integration of pictographic illustration within the captioning it generates in the user's display. Moreover, document and partial-text summarization would be available through wisePad implemented (pending negotiation with other patent holders) with the help of methods like those in Fein et al.,² including word- and phrase-frequency-based statistical analysis and cue-phrase analysis.
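
To make the summarization idea concrete, the sketch below scores sentences by word frequency and adds a bonus when a cue phrase appears. It is a generic illustration of those two signals and not the method of Fein et al.; the stop-word list, the cue phrases, the bonus weight, and the function name `summarize` are assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumed, deliberately tiny lists; a real system would use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "for", "on", "with"}
CUE_PHRASES = ("in conclusion", "in summary", "importantly")


def content_words(sentence):
    """Lowercase the sentence and keep only words that carry content."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1, cue_bonus=2.0):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        # Average frequency favors sentences built from the document's common words.
        base = sum(frequencies[w] for w in words) / len(words)
        # Cue phrases signal that the author is stating a main point.
        has_cue = any(phrase in sentence.lower() for phrase in CUE_PHRASES)
        return base + (cue_bonus if has_cue else 0.0)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


# Usage: the cue phrase lifts the third sentence above the first.
text = ("Captions help students. The weather was mild. "
        "In summary, captions help students take notes.")
summary = summarize(text, max_sentences=1)
```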

Blueprint

In 2007, we designed the wisePad service-oriented architecture (SOA)-based blueprint (see Figure 2) to express business logic in terms of services that use XML to describe input and output, facilitating integration of IT-application components, including transcribing speech and translating text. Integration is achieved by representing the business logic of the various application components as Web services; the table here lists the composite Web services wisePad provides to its users.

These services consist of multiple layers. For example, at the portal level, users receive Web-enabled services (such as CaptionIT, generating captions from aural input) from a composite Web service process layer. Other services (such as CaptionandPictureIT, generating captioned text with symbols and pictures) consist of "horizontally" integrated component services available from the integration layer below. CaptionandPictureIT combines selected dedicated logic from the underlying CaptionMeNow, TransformMeNow, and PictureMeNow service suites.
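
The layering described above, in which a composite service is assembled from component services, can be sketched as plain function composition. The three component functions below are trivial stand-ins named after the CaptionMeNow, TransformMeNow, and PictureMeNow suites; their signatures and behavior are hypothetical, since the article does not specify the service interfaces.

```python
def caption_me_now(spoken_words):
    """Stand-in for captioning: join already-recognized words into a caption."""
    return " ".join(spoken_words)


def transform_me_now(caption, simpler_words):
    """Stand-in for text simplification: swap words using a lookup table."""
    return " ".join(simpler_words.get(word, word) for word in caption.split())


def picture_me_now(caption, pictographs):
    """Stand-in for illustration: append a pictograph label after known words."""
    return " ".join(
        f"{word} [{pictographs[word]}]" if word in pictographs else word
        for word in caption.split()
    )


def caption_and_picture_it(spoken_words, simpler_words, pictographs):
    """Composite service: caption, then simplify, then illustrate."""
    caption = caption_me_now(spoken_words)
    simplified = transform_me_now(caption, simpler_words)
    return picture_me_now(simplified, pictographs)


# Usage with hypothetical lookup tables.
output = caption_and_picture_it(
    ["purchase", "the", "apple"],
    simpler_words={"purchase": "buy"},
    pictographs={"apple": "APPLE_ICON"},
)
```

In an SOA deployment each stand-in would be a separate Web service call exchanging XML, but the composition order would follow the same shape.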

Today, the integration layer in Figure 2 consists of IBM Web services that recognize and incorporate IBM's dedicated business logic design for captioning multimedia, translating languages, and simplifying text. IBM code-named the integration layer service groupings "X"MeNow in several IBM first-of-a-kind projects (such as CaptionMeNow) that are funded in part by user organizations. However, another future option may be that existing services by other vendors (such as Verizon's and AOL's AIM and IP-Relay Services, www.ip-relay.com/sp/) could be integrated into this layer. The value of high-level integrated services would be a convenience to users with disabilities, as well as to their communicating partners.

The Business Processes Layer 1 in the wisePad SOA includes specialized software modules for image processing and speech processing and other such functions. They provide the technical engine (novel scientific methods and algorithms developed and patented by IBM) for the Web services identified in Figure 2. The Business Process Layer 2 includes typical wisePad applications (located anywhere on the Web) that may be used as input to the wisePad service suite.

Business Model

Organizations in the entertainment, travel, retail, finance, and government sectors are all likely to be among the first paying subscribers, enabling their employees and customers to use wisePad services. Their early adoption would follow from strong inclusive mission statements and/or U.S. federal legislation (such as the Americans with Disabilities Act and Section 508 of the Rehabilitation Act of 1973) mandating inclusion of all demographics, physical handicaps, and employee and customer categories. Profit and productivity would be important considerations for subscribing organizations expecting to benefit through increased Web use by users and customers, especially in terms of e-commerce and related revenue.

Today, the most basic infrastructure components for the wisePad architecture and services are in place, including Web services standards, popular end-user mobile Internet devices, Internet content delivery, telecom carriers, Web companies (such as Akamai.com and Amazon.com), and the wisePad architecture and its bundling of end-user services. Companies designing network architectures, including those related to wisePad (such as Aircell, www.aircell.com/), deliver high-speed wireless broadband to aircraft, allowing users to send and receive email and engage in e-commerce while in flight. It would thus be possible to provide wisePad services on planes, trains, ferries, and other moving vehicles. All travelers would be able to enjoy multimedia transformed by wisePad to include captions in multiple languages (such as Arabic, English, French, Italian, German, Japanese, and Spanish) and/or pictographs, summaries, color-coded figures, and modified Web-page components. Mobile vision-, hearing-, and/or speech-impaired financial customers would be able to more fully participate in financial markets, shop for insurance, and do their banking on the go. Mobile retail and government online channels would also be included, as more and more wisePad users browse and buy online.

Future wisePad service providers, including IBM, pending negotiated agreement, must still come up with a viable wisePad-related revenue model. Although revenue models differ by industry and organization, a model based on a flat subscriber fee per user per month might be an option for a technology-service vendor. Initially, on-demand models are not likely to be a good choice for providers of early-stage innovative services like wisePad, as the number of subscriptions does not increase as quickly as in conventional business services (such as sales force automation) and collaboration services (such as Microsoft Office Live). However, when volume subscriptions of wisePad-type services do occur (a reasonable expectation over the next 10 years in light of today's worldwide impaired-user demographics), less expensive on-demand revenue models will be possible for wisePad services.

Conclusion

The vision-, hearing-, and/or speech-impaired user is the foremost stakeholder in this wisePad vision. Even many of the most technology-savvy users are still waiting for critical enabling technologies (such as wisePad) to inexpensively and transparently assist their enjoyment and ubiquitous participation in family, personal, and work activities.

Since IBM inventors, engineers, and developers unveiled their first prototype of an eyewear-attached portable computer (www.isrl.uiuc.edu/~chip/projects/timeline/1999jones.html), many talented people both in and outside of IBM have participated in creating the technologies needed to make the wisePad services system a reality. They have since championed and advanced the standards and technologies for Web 2.0 services, thus providing a foundation for designers of future Web system access, applications, and service, including those working on wisePad (see Figure 3).

Many component technologies for wisePad services, most notably captioning, are market-ready today; if combined, they would be close to being a breakthrough for improving the quality of life for many mobile vision-, hearing-, and/or speech-impaired users. Easy-to-use applications for all wearable computer users, not only the handicapped, are emerging. Heartwarming and worthy of celebration, ubiquitous services, including wisePad, would contribute to personal empowerment, contribution, and inclusion for all.

References

1. Basson, S.H. and Kanevsky, D. Universal Closed-Caption Portable Receiver, U.S. Patent Application 7,221,405, May 22, 2007. IBM, Armonk, NY.

2. Fein, R.A., Dolan, W.B., Messerly, J., Fries, E.J., Thorpe, C.A., and Cokus, S.J. Document Summarizer for Word Processors, U.S. Patent 5,924,108, July 13, 1999. Microsoft Corp., Redmond, WA.

3. Haritaoglu, I. InfoScope: Link from real world to digital information space. In Proceedings of the International Conference on Ubiquitous Computing (Atlanta, GA, Sept. 30). Lecture Notes in Computer Science Series 2201, Springer, Berlin, 2001, 247–255.

4. Jutla, D.N. and Kanevsky, D. What's Next Charles Schwab? User Space in Personal Investing. IBM Research Report. IBM Research Division, Nov. 6, 2006; domino.research.ibm.com/library/cyberdig.nsf/papers/F3568D6DB7557C0685257222006F47A5/$File/rc24099.pdf.

5. Kanevsky, D. and Zlatsin, A. Controlling Devices' Behaviors Via Changes in their Relative Locations and Positions, U.S. Patent 20080174547, July 24, 2008. IBM, Armonk, NY; appft1.uspto.gov/netacgi/nph-Parser?Sect1=PT02&Sect2=HITOFF&p=l&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=12&f=G&l=50&col=AND&d=PG01&s1=kanevsky.IN.&OS=IN/kanevsky&RS=IN/kanevsky.

6. Kanevsky, D. and Zadrozny, W.W. Virtual Shadow Briefcase in Servers Supporting Moving Embedded Clients, U.S. Patent No 6,912,580 B1, June 28, 2005. IBM, Armonk, NY; www.freepatentsonline.com/6912580.html.

7. Kanevsky, D. and Sorenson, J.S. Head-Mounted Display Content Transformer. U.S. Patent 6,738,535, May 18, 2004. IBM, Armonk, NY; www.freepatentsonline.com/6738535.html.

8. Microsoft and Forrester Research. The Market for Accessible Technology: The Wide Range of Abilities and Its Impact on Computer Technology. Research Report. Forrester Research, Cambridge, MA, 2003; www.microsoft.com/enable/research/phase1.aspx.

9. Toossi, M. A century of change: U.S. labor force 1950–2050. Monthly Labor Review 125, 5 (May 2002), 15–28.

Authors

Dawn N. Jutla is a full professor and chair of the Department of Finance, Information Systems, and Management Science in the Sobey School of Business at Saint Mary's University, Nova Scotia, Canada.

Dimitri Kanevsky is a researcher, master inventor, and project manager in the Speech and Language Algorithms Department in the IBM T.J. Watson Research Center, Yorktown Heights, NY.

Footnotes

DOI: http://doi.acm.org/10.1145/1435417.1435434

Figures

Figure 1. wisePad services use designer eyewear.

Figure 2. wisePad services blueprint.

Figure 3. The author Kanevsky, a deaf research scientist, testing IBM's prototype head-mounted display device with IBM's ViaVoice transcription services combined with a human-in-the-loop live transcription service from CaptionFirst, Inc. (www.captionfirst.com/), the first time transcription services were coupled with eyewear and a belt-worn portable computing device.

Figure. Future transparent wearable display (www.microvision.com).

Tables

Table. wisePad user services.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.