
Communications of the ACM

ACM TechNews

Baidu Explains How It's Mastering Mandarin With Deep Learning


[Image: How Deep Speech works. An engineer at Baidu explains how the search service handles voice queries in Mandarin. Credit: Awni Hannun/Baidu]

In an interview, Baidu engineer Awni Hannun discusses a new model for handling Mandarin voice queries that proved accurate 94 percent of the time in tests.

He says the model employs Deep Speech, a deep-learning system that differs from other deep learning-based systems such as Microsoft's Skype Translate. In systems like Skype Translate, Hannun says, the pipeline usually consists of three modules: a speech-transcription module, a machine-translation module, and a speech-synthesis module.
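To make the contrast concrete, the modular design Hannun describes can be pictured as three separately engineered stages chained together. The sketch below uses hypothetical Python function names purely for illustration; it is not the actual Skype Translate API.

```python
# Hypothetical sketch of the modular speech-to-speech pipeline Hannun contrasts
# with Deep Speech. All function names are illustrative placeholders only.

def transcribe_speech(audio_samples):
    """Stage 1: an acoustic model plus language model produce source-language text."""
    ...

def translate_text(source_text):
    """Stage 2: a machine-translation model maps the text to the target language."""
    ...

def synthesize_speech(target_text):
    """Stage 3: a text-to-speech model renders the translated text as audio."""
    ...

def speech_translation_pipeline(audio_samples):
    # Each stage is developed and tuned separately, and errors in one stage
    # propagate to the next -- the coupling an end-to-end model avoids.
    return synthesize_speech(translate_text(transcribe_speech(audio_samples)))
```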

"Our system is different than that system in that it's more what we call end-to-end," he notes. "Rather than having a lot of human-engineered components that have been developed over decades of speech research--by looking at the system and saying what features are important or which phonemes the model should predict--we just have some input data, which is an audio .WAV file on which we do very little pre-processing. And then we have a big, deep neural network that outputs directly to characters."

Hannun says the network is fed enough data so it can learn what is relevant from the input to correctly transcribe the output, with a minimum of human intervention. He says Baidu plans to build a speech system that can interface with any smart device, and compressing existing models may be of help in this regard.
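Hannun does not spell out the compression method. One generic, widely used option is post-training quantization, which stores weights as 8-bit integers instead of 32-bit floats. The sketch below applies PyTorch's dynamic quantization to a stand-in model purely to illustrate the size reduction; it is not presented as Baidu's technique.

```python
# Illustration of one model-compression option (dynamic quantization), applied
# to a stand-in model; not Baidu's actual approach.
import os
import torch
import torch.nn as nn

# Stand-in acoustic model: a few large linear layers, used only to compare sizes.
model = nn.Sequential(nn.Linear(161, 2048), nn.ReLU(),
                      nn.Linear(2048, 2048), nn.ReLU(),
                      nn.Linear(2048, 29))

# Rewrite supported layers to use int8 weights at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="model.pt"):
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"float32 weights: {size_mb(model):.1f} MB -> int8 weights: {size_mb(quantized):.1f} MB")
```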

From Medium

 

Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA


 
