Researchers at Chinese search giant Baidu say they have developed an artificial intelligence that can learn to precisely mimic a person's voice after listening to less than 60 seconds of it.
They note that the milestone builds on Deep Voice, Baidu's text-to-speech synthesis system, which was trained on more than 800 hours of audio from 2,400 speakers.
The team says Deep Voice needs only 100 five-second segments of vocal training data (about 500 seconds in total) to sound its best, but a version trained on just 10 five-second samples (50 seconds in total) deceived a voice-recognition system more than 95% of the time.
"We see many great use cases or applications for this technology," says Baidu's Leo Zou. "For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces."
Zou also thinks the technique could advance the creation of original digital content.
From Digital Trends
Abstracts Copyright © 2018 Information Inc., Bethesda, Maryland, USA