The key to improving speech recognition accuracy is simply mixing all available speech datasets together to train one large AI model, according to a recent study by a team of researchers affiliated with Google Research and Google Brain. They claim an AI model named SpeechStew that was trained on a range of speech corpora achieves state-of-the-art or near-state-of-the-art results on a variety of speech recognition benchmarks.
They describe their work in "SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network."
To test this idea, the Google researchers combined all available labeled and unlabeled speech recognition data curated by the community over the years. They tested a general-purpose SpeechStew model on a number of benchmarks and found that it not only outperformed previously developed models but also demonstrated an ability to adapt to challenging new tasks.
From VentureBeat