Researchers at the University of Oxford in the U.K. and Google DeepMind have developed an algorithm that outperforms professional human lip readers, a breakthrough they say could lead to surveillance video systems that reveal the content of speech in addition to the actions of an individual.
The researchers developed the algorithm by training a deep neural network on thousands of hours of subtitled BBC TV video showing a wide range of people speaking across a variety of poses, activities, and lighting conditions.
The neural network, dubbed Watch, Listen, Attend and Spell (WLAS), learned to transcribe video of mouth movements into sequences of characters, training on more than 100,000 sentences from the footage. By translating mouth movements into individual characters one at a time, WLAS was able to spell out entire words.
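The character-by-character decoding WLAS performs can be illustrated with a small attention-based sequence-to-sequence model. The sketch below is a minimal illustration in PyTorch, not the paper's actual architecture: the feature dimension, hidden size, character vocabulary, and module layout are all assumptions made for the example.

```python
# A minimal attend-and-spell style sketch, in the spirit of (but not
# identical to) the WLAS model described above. All dimensions and the
# character vocabulary are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = "abcdefghijklmnopqrstuvwxyz '"  # hypothetical character set
FEAT_DIM, HID_DIM = 512, 256            # assumed per-frame feature size

class AttendAndSpell(nn.Module):
    def __init__(self):
        super().__init__()
        # "Watch": encode a sequence of per-frame visual features.
        self.encoder = nn.LSTM(FEAT_DIM, HID_DIM, batch_first=True)
        # "Spell": decode one character at a time.
        self.embed = nn.Embedding(len(VOCAB), HID_DIM)
        self.decoder = nn.LSTMCell(HID_DIM * 2, HID_DIM)
        self.attn = nn.Linear(HID_DIM, HID_DIM)
        self.out = nn.Linear(HID_DIM, len(VOCAB))

    def forward(self, frames, chars):
        # frames: (batch, time, FEAT_DIM) mouth-region features per frame
        # chars:  (batch, length) previous characters (teacher forcing)
        enc, _ = self.encoder(frames)                 # (B, T, H)
        h = enc.new_zeros(frames.size(0), HID_DIM)
        c = torch.zeros_like(h)
        logits = []
        for t in range(chars.size(1)):
            # "Attend": score each encoded frame against the decoder state.
            scores = torch.bmm(enc, self.attn(h).unsqueeze(2))  # (B, T, 1)
            context = (enc * scores.softmax(dim=1)).sum(dim=1)  # (B, H)
            h, c = self.decoder(
                torch.cat([self.embed(chars[:, t]), context], dim=1), (h, c))
            logits.append(self.out(h))                # next-character scores
        return torch.stack(logits, dim=1)             # (B, length, |VOCAB|)

# Usage on dummy data: 75 video frames, spelling a 20-character target.
model = AttendAndSpell()
frames = torch.randn(2, 75, FEAT_DIM)
chars = torch.randint(len(VOCAB), (2, 20))
print(model(frames, chars).shape)  # torch.Size([2, 20, 28])
```

Because the decoder emits characters rather than whole words, a model of this shape can in principle spell out words it never saw during training, which is one appeal of the character-level approach the researchers describe.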
The Oxford researchers found that a professional lip reader could correctly decipher less than 25% of the spoken words, while the neural network deciphered 50% of them.
From ZDNet
Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA