
Communications of the ACM

ACM TechNews

Baidu's New AI Can Mimic Your Voice After Listening to It for Just One Minute



Baidu researchers say they have developed an artificial intelligence that can learn to precisely mimic a person's voice based on less than 60 seconds' worth of listening to it.

Credit: Digital Trends

Researchers at Chinese search giant Baidu say they have developed an artificial intelligence that can learn to precisely mimic a person's voice based on less than 60 seconds' worth of listening to it.

They note that the advance builds on Deep Voice, Baidu's text-to-speech synthesis system, which was trained on more than 800 hours of audio from 2,400 speakers.

The team says Deep Voice needs only 100 five-second segments of vocal training data (about eight minutes of audio) to sound its best, but a version trained on just 10 five-second samples (50 seconds of audio) was able to deceive a voice-recognition system more than 95% of the time.

"We see many great use cases or applications for this technology," says Baidu's Leo Zou. "For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces."

Zou also thinks the technique could advance the creation of original digital content.

From Digital Trends

 

Abstracts Copyright © 2018 Information Inc., Bethesda, Maryland, USA


 
