MIT researchers created a new annotated synthetic dataset of images that depict a wide range of scenarios, which can be used to help machine-learning models understand the concepts in a scene.
Credit: Khaled Shehada et al.
Massachusetts Institute of Technology researchers were part of a team that developed a technique that uses computer-generated data to help vision and language models better understand concepts.
The researchers used an annotated synthetic dataset to fine-tune popular vision and language models, increasing their accuracy in concept understanding by up to 10%.
They produced close to 800,000 photorealistic images drawn from computer-generated videos of diverse three-dimensional environments and objects, with human avatars added to interact with them.
A detailed caption was added to each image, covering object attributes, positional relationships, and human-object interactions.
Synthetic data allowed the researchers to create more diverse images at a lower cost than generating real data while preserving privacy through the use of avatars.
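The captioning step described above can be pictured as turning a structured annotation into text. The sketch below is a minimal, hypothetical illustration of that idea; the field names (`objects`, `relations`, `interactions`) and phrasing are assumptions for demonstration, not the researchers' actual schema.

```python
# Hypothetical sketch: compose a detailed caption from a structured
# annotation covering object attributes, positional relationships, and
# human-object interactions. All field names are illustrative assumptions.

def caption_from_annotation(ann):
    parts = []
    # Object attributes, e.g. "a red ceramic mug"
    for obj in ann.get("objects", []):
        parts.append(f"a {' '.join(obj['attributes'])} {obj['name']}")
    # Positional relationships, e.g. "the mug is on the table"
    for rel in ann.get("relations", []):
        parts.append(f"the {rel['subject']} is {rel['predicate']} the {rel['object']}")
    # Human-object interactions performed by an avatar
    for act in ann.get("interactions", []):
        parts.append(f"a person is {act['verb']} the {act['object']}")
    return "; ".join(parts) + "."

example = {
    "objects": [
        {"name": "mug", "attributes": ["red", "ceramic"]},
        {"name": "table", "attributes": ["wooden"]},
    ],
    "relations": [{"subject": "mug", "predicate": "on", "object": "table"}],
    "interactions": [{"verb": "holding", "object": "mug"}],
}

print(caption_from_annotation(example))
```

Pairing each rendered image with a caption like this is what lets a vision-language model be fine-tuned on attribute, relation, and interaction concepts rather than on object labels alone.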
From MIT News
Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA