
Communications of the ACM

ACM TechNews

Machine Learning System Tackles Speech and Object Recognition, All at Once


The audio cues associated with objects within this image.

Computer scientists at the Massachusetts Institute of Technology have developed a system that learns to identify objects within an image, based on a spoken description of the image.

Credit: Christine Daniloff/MIT

Researchers at the Massachusetts Institute of Technology (MIT) have developed a system that can learn to identify objects within an image, based on a spoken description of the image.

When provided with an image and an audio caption, the system can highlight in real time the relevant regions of the image being described.

The system learns words directly from recorded speech clips and objects in raw images, and associates them with one another.
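The article does not describe the model's internals, so the following is only a toy sketch of one way such an association could be scored, assuming each image region and each speech segment has already been mapped to a vector in a shared space. The function names and the example vectors are hypothetical and are not taken from the MIT system.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_region_per_segment(region_vecs, segment_vecs):
    """For each speech segment, return the index of the image region
    whose vector has the highest dot-product similarity with it."""
    matches = []
    for seg in segment_vecs:
        scores = [dot(seg, reg) for reg in region_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Made-up embeddings for illustration only (assumption, not real data).
region_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # three image regions
segment_vecs = [[0.9, 0.1], [0.1, 0.9]]              # two speech segments

matches = best_region_per_segment(region_vecs, segment_vecs)
```

Highlighting a region for a spoken word then amounts to marking the region index returned for that segment; a real system would learn the vectors from data rather than fix them by hand.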

The researchers trained the model on a total of 400,000 image-caption pairs, and held out 1,000 random pairs for testing.
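A random hold-out of this kind can be sketched as follows. This is a minimal illustration, assuming the test pairs are drawn from the same pool as the training pairs; the article does not say how the split was made, and `hold_out_split`, the integer IDs, and the seed are hypothetical.

```python
import random


def hold_out_split(pair_ids, n_test, seed=0):
    """Randomly hold out n_test items; return (train_ids, test_ids)."""
    if not 0 <= n_test <= len(pair_ids):
        raise ValueError("n_test must be between 0 and len(pair_ids)")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_positions = set(rng.sample(range(len(pair_ids)), n_test))
    train_ids = [p for i, p in enumerate(pair_ids) if i not in test_positions]
    test_ids = [p for i, p in enumerate(pair_ids) if i in test_positions]
    return train_ids, test_ids


# Integer IDs stand in for image-caption pairs (assumption for illustration).
pair_ids = list(range(400_000))
train_ids, test_ids = hold_out_split(pair_ids, n_test=1_000, seed=42)
```

Keeping the held-out pairs out of training means the test measures how the model handles images and captions it has not seen.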

Said researcher David Harwath, “We wanted to do speech recognition in a way that’s more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don’t typically have access to. We got the idea of training a model in a manner similar to walking a child through the world and narrating what you’re seeing.”

From MIT News

 

Abstracts Copyright © 2018 Information Inc., Bethesda, Maryland, USA


 
