Massachusetts Institute of Technology (MIT) researchers have developed a method to train semantic parsers (which convert a spoken phrase to a machine-understandable representation of its meaning) by mimicking the way a child learns language.
The system observes captioned videos and connects the words to recorded actions and objects.
The team integrated a semantic parser and computer-vision component trained in object, human, and activity recognition in video, then trained the system on a crowdsourced dataset of captioned videos depicting human actions. The parser associated the words with actions and objects in a video, learning sentence structure in order to predict the meaning of a sentence without relying on video.
MIT's Andrei Barbu said the work aims to support the development of home robots that can adjust to homeowners' unique speaking patterns "and still figure out what they mean."
From ZDNet
View Full Article
Abstracts Copyright © 2018 Information Inc., Bethesda, Maryland, USA
No entries found