MIT researchers created a new annotated synthetic dataset of images that depict a wide range of scenarios, which can be used to help machine-learning models understand the concepts in a scene.
Credit: Khaled Shehada et al.
Massachusetts Institute of Technology researchers were part of a team that developed a technique that uses computer-generated data to help vision and language models better understand concepts.
The researchers used an annotated synthetic dataset to fine-tune popular vision and language models, increasing their accuracy in concept understanding by up to 10%.
They produced close to 800,000 photorealistic images drawn from computer-generated videos of diverse three-dimensional environments and objects, with human avatars added to interact with them.
A detailed caption was added to each image, covering object attributes, positional relationships, and human-object interactions.
Synthetic data allowed the researchers to create more diverse images at a lower cost than generating real data while preserving privacy through the use of avatars.
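The captioning step described above can be pictured as turning a structured annotation into text. The sketch below is a minimal, hypothetical illustration of that idea; the field names (`objects`, `relations`, `interactions`) and phrasing are assumptions for demonstration, not the researchers' actual schema.

```python
# Hypothetical sketch: compose a detailed caption from a structured
# annotation covering object attributes, positional relationships, and
# human-object interactions. All field names are illustrative assumptions.

def caption_from_annotation(ann):
    parts = []
    # Object attributes, e.g. "a red ceramic mug"
    for obj in ann.get("objects", []):
        parts.append(f"a {' '.join(obj['attributes'])} {obj['name']}")
    # Positional relationships, e.g. "the mug is on the table"
    for rel in ann.get("relations", []):
        parts.append(f"the {rel['subject']} is {rel['predicate']} the {rel['object']}")
    # Human-object interactions performed by an avatar
    for act in ann.get("interactions", []):
        parts.append(f"a person is {act['verb']} the {act['object']}")
    return "; ".join(parts) + "."

example = {
    "objects": [
        {"name": "mug", "attributes": ["red", "ceramic"]},
        {"name": "table", "attributes": ["wooden"]},
    ],
    "relations": [{"subject": "mug", "predicate": "on", "object": "table"}],
    "interactions": [{"verb": "holding", "object": "mug"}],
}

print(caption_from_annotation(example))
```

Pairing each rendered image with a caption like this is what lets a vision-language model be fine-tuned on attribute, relation, and interaction concepts rather than on object labels alone.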
From MIT News
Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA