Communications of the ACM

ACM TechNews

Google's DeepMind AI Can Lip-Read TV Shows Better Than a Pro

By New Scientist
November 28, 2016

Reading lips on television. Researchers are using deep-learning techniques to create an enhanced lip-reading system.

Credit: Getty Images

Researchers at Google's DeepMind and the University of Oxford are applying deep-learning techniques to a massive dataset of BBC TV programs to create a lip-reading system that can perform better than professional lip readers.

The artificial intelligence (AI) system was trained on 5,000 hours of footage from six TV programs that aired between January 2010 and December 2015. Because the clips' audio and video streams were sometimes out of sync, a computer system was first taught the correct links between sounds and mouth shapes. Using this information, the system determined how far out of sync the streams were and realigned them, preparing the dataset for the study.
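To make the realignment step concrete, here is a minimal sketch of how an offset between two streams could be estimated and corrected. It is not the researchers' method: their system learned the correspondence between sounds and mouth shapes, whereas this example swaps in plain cross-correlation over two hypothetical per-frame signals (an audio loudness value and a mouth-opening value). The function names, the signals, and the `max_lag` parameter are all assumptions made for illustration.

```python
def estimate_offset(audio, video, max_lag):
    """Return the lag (in frames) at which audio best matches video.

    A positive result means audio[i + lag] lines up with video[i],
    i.e. the audio track is running late by `lag` frames.
    Both inputs are hypothetical per-frame feature lists.
    """
    def mean_centered(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    a = mean_centered(audio)
    v = mean_centered(video)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair video[i] with audio[i + lag] wherever both indices are valid.
        pairs = [(a[i + lag], v[i])
                 for i in range(len(v)) if 0 <= i + lag < len(a)]
        if not pairs:
            continue
        # Average product of the centered signals over the overlap.
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def realign(audio, lag, pad=0.0):
    """Shift audio by `lag` frames so it lines up with the video track."""
    if lag >= 0:
        return audio[lag:] + [pad] * lag
    return [pad] * (-lag) + audio[:lag]


# Toy data: the audio feature is the video feature delayed by two frames.
video = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
audio = [0.0, 0.0] + video[:-2]

lag = estimate_offset(audio, video, max_lag=4)
aligned_audio = realign(audio, lag)
```

A learned model, as described in the article, would replace the hand-picked features and the correlation score, but the search over candidate offsets followed by a shift has the same overall shape.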

The AI was then tested on TV programs broadcast between March and September 2016, where it deciphered 46.8% of all words without any errors. In comparison, a professional lip reader deciphered just 12.4% of words correctly in a dataset of 200 clips. Many of the AI's errors were small, such as missing an "s" at the end of a word.

Researchers believe automatic lip readers could have significant practical potential, with applications ranging from improved hearing aids to speech recognition in loud environments.

From New Scientist