A team of computer scientists and electrical engineers from the University of California San Diego has found a way to make AI-generated voices, such as digital personal assistants, more expressive, with minimal training. The method, which translates text to speech, can also be applied to voices that were never part of the system's training set.
The researchers describe their work in "Expressive Neural Voice Cloning," presented at the virtually held 13th Asian Conference on Machine Learning.
"We wanted to look at the challenge of not just synthesizing speech but of adding expressive meaning to that speech," says Shehzeen Hussain, a Ph.D. student at the UC San Diego Jacobs School of Engineering.
The researchers flagged the pitch and rhythm of the speech in training samples as a proxy for emotion. This allowed their cloning system to generate expressive speech with minimal training, even for voices it had never encountered before.
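The article does not include the researchers' code, but the idea of using pitch and rhythm as simple, automatically extractable proxies for expressiveness can be illustrated with a minimal sketch. The snippet below (an assumption for illustration, not the authors' implementation) estimates the pitch of an audio frame by autocorrelation and uses the fraction of high-energy frames as a crude rhythm proxy:

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one audio frame
    by finding the strongest autocorrelation peak within the
    plausible lag range for human pitch."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag bounds for fmax..fmin
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

def voiced_fraction(x, frame_len=400, hop=160, thresh=0.1):
    """Crude rhythm proxy: fraction of frames whose RMS energy
    exceeds a threshold (i.e., how much of the clip is 'voiced')."""
    n = 1 + (len(x) - frame_len) // hop
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame_len] ** 2))
                    for i in range(n)])
    return float(np.mean(rms > thresh))

sr = 16000
t = np.arange(sr) / sr                       # one second of audio
tone = np.sin(2 * np.pi * 220.0 * t)         # 220 Hz tone standing in for voiced speech
signal = np.concatenate([tone[:sr // 2], np.zeros(sr // 2)])  # half speech, half silence

f0 = estimate_pitch(signal[:1024], sr)       # ~220 Hz
vf = voiced_fraction(signal)                 # ~0.5
```

A real system would compute such pitch and energy contours frame by frame and feed them to the synthesis model as conditioning signals; dedicated trackers (e.g., pYIN-style algorithms) are far more robust on actual speech than this toy autocorrelation.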
"We demonstrate that our proposed model can make a new voice express, emote, sing, or copy the style of a given reference speech," the researchers write.
From University of California San Diego