
Communications of the ACM

ACM Careers

NSF Grant Will Help Create Machines That Think Like Toddlers



Infants and toddlers will wear head-mounted mini-cameras to gather first-person image data for the study.

Credit: Sven Bambach / Indiana University

Linda Smith, an internationally recognized expert in human cognition at Indiana University, and IU professor Chen Yu, in collaboration with computer vision researchers from Georgia Tech, have received $700,000 from the U.S. National Science Foundation to lead new research that could strengthen understanding of how children learn to recognize discrete categories of objects.

Members of the IU Bloomington College of Arts and Sciences' Department of Psychological and Brain Sciences, Smith and Yu say the work is expected to help in the creation of machines that can learn how to visually recognize objects with the same ease as children. It could also lead to new, more sophisticated digital object-recognition technology.

The IU team will recruit over 100 families to gather first-person image data from infants and toddlers in their own homes using eye-tracking technology and extremely lightweight, GoPro-like, head-mounted mini-cameras. The team from Georgia Tech will then use the images to design machine learning models that mimic the toddlers' ability to recognize objects.

"The study addresses a critical need to better understand the visual side of object name learning," says Smith, a Distinguished Professor and Chancellor's Professor of Psychological and Brain Sciences. "Emerging evidence from labs across the country suggests that children who are slow word learners also are slower, or weaker, in their visual object recognition skills. It could be that learning object names teaches visual object recognition or that poor or slowly developing visual object recognition limits early word learning.

"Either way, visual object recognition is intricately connected to the early language-learning process," Smith says.

Participating families will hail from Monroe County, Ind. Caregivers will place head cameras on their children for six hours in a day or multiple times over a week to capture moment-to-moment, "high-density" eye movement information as they interact and play — a total of 54 million images and 500 hours of head camera video.

"The visual data and footage from these devices will undergo a rigorous data mining and quantitative analysis using computer vision and machine learning techniques, which could ultimately advance how researchers study learning in young infants and toddlers," Yu says.

"Our easy-to-use system was designed to fit parent and toddler needs," Smith adds. "Our goal will be to gather everyday toddler-perspective scenes without influencing — by our own expectations or parent expectations — the recorded scenes."

IU researchers will follow up with the participating families after one year to record language progression in the children, allowing the team to connect the new data back to the visual information from the initial portion of the study.

The formation of object recognition skills has remained a notoriously "unsolved puzzle" in psychology, says Smith, noting that most object recognition research and technology rests on the assumption that humans acquire these skills by accumulating exposure to many separate instances of an object.

Smith and Yu's preliminary work, however, suggests a very different scenario: that numerous views, including partial and limited ones, of a single object lead to the development of visual object recognition skills. So in addition to tallying instances of exposure to different categories of objects — cars, cups, chairs, or ducks, for instance — IU researchers will record toddlers as they encounter similar objects in different forms — a toy duck, a soap dish shaped like a duck, a duck-shaped candy dispenser — as well as extended interactions with a single object.
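The contrast between the two accounts — learning from many separate instances versus learning from many partial views of one object — can be illustrated with a toy sketch. The model below is entirely hypothetical (it is not the study's actual pipeline, and the feature vectors, noise levels, and occlusion rates are invented for illustration): each category has a "true" feature vector, a "view" is a noisy observation with some features occluded, and a prototype is learned by averaging whatever features the partial views happen to reveal.

```python
import random

random.seed(0)

# Hypothetical toy categories (not from the study): each object is a
# "true" feature vector the learner never sees directly.
CATEGORIES = {"duck": [1.0, 0.0, 0.0, 1.0],
              "cup":  [0.0, 1.0, 1.0, 0.0],
              "car":  [1.0, 1.0, 0.0, 0.0]}

def sample_view(truth, noise=0.2, occlusion=0.25):
    """One noisy view; occluded features come back as None (unobserved)."""
    return [None if random.random() < occlusion
            else f + random.gauss(0, noise) for f in truth]

def learn_prototype(views):
    """Average the observed features across many partial views."""
    proto = []
    for d in range(len(views[0])):
        vals = [v[d] for v in views if v[d] is not None]
        proto.append(sum(vals) / len(vals) if vals else 0.0)
    return proto

def classify(view, prototypes):
    """Nearest prototype, measured only over features actually observed."""
    def dist(name):
        p = prototypes[name]
        return sum((view[d] - p[d]) ** 2
                   for d in range(len(p)) if view[d] is not None)
    return min(prototypes, key=dist)

# Learn each category from 50 noisy, partially occluded views of one object.
prototypes = {name: learn_prototype([sample_view(t) for _ in range(50)])
              for name, t in CATEGORIES.items()}

print(classify(sample_view(CATEGORIES["duck"]), prototypes))
```

Even though no single view reveals the whole object, the averaged prototype recovers enough structure to classify fresh partial views — a (deliberately simplified) analogue of recognition emerging from varied glimpses of one object rather than from counting distinct instances.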

"The key to this study is capturing egocentric, first-person views of the natural visual environment from the perspective of infants," says Smith, who says the camera and eye-tracking techniques are unique to her studies. "Objects are not recognized based upon the number of instances of the object in their environment, but rather by the limits of time and place, and by the young child's body, activities, and needs."

Smith also serves as the director of the Cognitive Development Lab in the Department of Psychological and Brain Sciences. Yu is director of the Computational Cognition and Learning Lab in the department.

The receipt of the NSF grant would not have been possible without support from the IU Bloomington Office of the Provost's Faculty Research Support Program, Smith says.
