Sorting sound into intelligible speech is a seemingly effortless feat. Healthy human ear-brain auditory systems perform it heroically, even in highly confusing soundscapes with a riotous mix of acoustic signals.
And while scientists have successfully constructed automatic speech recognition (ASR) engines to help perform routine speech transactions — coaching you through an automatic payment over the telephone — ASR does have problems. Too often garble prevails. When that happens, pushing "0" to summon a pair of human ears hooked up to a neuroprocessing unit — a brain — seems the only way around a late fee.
Now scientists are turning to that human-brain hookup for help, too. A team from the International Computer Science Institute at the University of California, Berkeley, and from the University of Oldenburg, Germany, is learning from errors in human speech recognition (HSR) and applying that knowledge to design new signal processing strategies and models for ASR. They will present their study at the 162nd Meeting of the Acoustical Society of America, in San Diego, California.
One key finding: ASR does not handle time cues in language as well as HSR does. Improving computer models so that ASR processes time variation in speech more like the human auditory system does could benefit many human-machine interfaces, including improved hearing aids, smart homes, and new assistive hearing apps on smartphones.
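The article does not describe the researchers' specific signal-processing strategies, but the general idea of emphasizing temporal cues can be sketched with a standard technique: band-pass filtering the temporal envelope of each spectrogram channel to retain the slow, syllable-rate modulations (roughly 2 to 8 Hz) that human listeners rely on. The spectrogram dimensions, frame rate, and filter band below are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hypothetical log-mel spectrogram: 40 frequency bands x 300 frames at a
# 100 Hz frame rate. Placeholder data; a real front end would compute this
# from recorded audio.
frame_rate = 100.0
spectrogram = np.random.rand(40, 300)

# Band-pass each band's envelope to keep syllable-rate modulations (~2-8 Hz),
# the slow temporal cues that human listeners exploit in noisy speech.
low_hz, high_hz = 2.0, 8.0
b, a = butter(2, [low_hz / (frame_rate / 2), high_hz / (frame_rate / 2)], btype="band")
modulation_features = filtfilt(b, a, spectrogram, axis=1)

# 'modulation_features' could then augment or replace standard ASR features.
```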
Lead researcher Bernd T. Meyer explains: "Automatic speech recognition has its flaws. In comparison, human speech recognition works a lot better than ASR, so we thought, 'Why don't we try to learn from the auditory system and integrate those principles into ASR?'"
The aim is to improve ASR so it, like HSR, can render correct messages from a highly variable mix of acoustic signals. The work is an extension of Meyer's previous research in medical physics with a specialty in mechanisms of hearing. "Since I had been working in a group with much knowledge about hearing, and we know that hearing and speaking are interrelated due to co-evolution, we thought it's a good idea to use the models developed there not only for hearing aids, but also for improving speech recognizers," Meyer says.
The team compared ASR to HSR in a series of experiments in which the automatic system and human listeners had to identify consonant and vowel sounds in a noisy database of utterances. The utterances varied intrinsically in speaking rate, style, and vocal effort.
Results showed that a speech recognition gap exists between ASR and HSR. According to the data, a standard ASR system reaches human performance levels only when the signal-to-noise ratio is increased by 15 decibels. In high-noise conditions, human listeners achieved 75 percent accuracy while ASR scored only slightly above 30 percent.
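A gap like this is often expressed as the extra SNR a machine needs to match human accuracy. The sketch below illustrates that calculation with made-up accuracy-versus-SNR points; the numbers are not the study's actual data.

```python
import numpy as np

# Illustrative (made-up) accuracy-vs-SNR curves for human listeners and an
# ASR system; they stand in for measured psychometric functions.
snr_db    = np.array([-10, -5, 0, 5, 10, 15, 20])
human_acc = np.array([0.35, 0.55, 0.75, 0.88, 0.94, 0.97, 0.98])
asr_acc   = np.array([0.05, 0.12, 0.30, 0.52, 0.72, 0.86, 0.93])

# The "speech recognition gap": how much extra SNR the machine needs to reach
# the same accuracy as human listeners (evaluated here at 75% correct).
target = 0.75
human_snr = np.interp(target, human_acc, snr_db)  # accuracies increase monotonically
asr_snr   = np.interp(target, asr_acc, snr_db)
print(f"SNR gap at {target:.0%} accuracy: {asr_snr - human_snr:.1f} dB")
```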
So if high noise conditions prevail when you happen to be talking to an automated bank teller, you might be courting a late fee — unless you push "0" to summon the HSR unit: a pair of ears, a brain, and a perfunctory wish for a nice day.
The paper, "Improving Automatic Speech Recognition by Learning from Human Errors," will be presented Tuesday (Nov. 1).