The Feature Fields for Robotic Manipulation (F3RM) method designed by Massachusetts Institute of Technology researchers helps robots identify and grasp nearby objects by combining two-dimensional (2D) images with vision foundation models to form three-dimensional (3D) scene representations.
F3RM can be applied to real-world settings containing thousands of objects by interpreting open-ended natural-language prompts from humans.
A camera mounted on a selfie stick captures 50 2D images from different poses, which are used to build a neural radiance field that renders a 360-degree "digital twin" of the environment.
F3RM uses the Contrastive Language-Image Pre-training (CLIP) vision foundation model to enrich the geometry with semantics, distilling the 2D CLIP features of the captured images into a 3D feature field.
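The lifting of 2D image features into 3D can be sketched with a toy multi-view average: for each 3D point, project it into every camera, look up the 2D feature at that pixel, and average over the views that see it. (This is a simplified stand-in; F3RM itself distills the features into a neural field optimized alongside the radiance field. The function name and argument layout below are illustrative, not the authors' API.)

```python
import numpy as np

def lift_features_to_3d(points, world_to_cam, K, feature_maps):
    """Average 2D per-pixel features over all views that see each 3D point.

    points:        (N, 3) 3D sample points in the scene
    world_to_cam:  list of (4, 4) extrinsic matrices, one per view
    K:             (3, 3) shared pinhole intrinsics
    feature_maps:  list of (H, W, D) per-pixel 2D feature maps (e.g. CLIP features)
    Returns an (N, D) array of averaged 3D features.
    """
    n, d = len(points), feature_maps[0].shape[-1]
    accum = np.zeros((n, d))
    counts = np.zeros((n, 1))
    homog = np.hstack([points, np.ones((n, 1))])          # homogeneous coords
    for T, fmap in zip(world_to_cam, feature_maps):
        cam = (T @ homog.T).T[:, :3]                      # points in camera frame
        in_front = cam[:, 2] > 1e-6                       # only points before the camera
        uv = (K @ cam.T).T
        uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-6)     # perspective divide
        h, w, _ = fmap.shape
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        accum[valid] += fmap[v[valid], u[valid]]          # gather 2D features
        counts[valid] += 1
    return accum / np.maximum(counts, 1)                  # per-point mean
```

Each 3D point thus inherits a semantic descriptor from whichever images observed it, which is the core idea behind turning posed 2D feature maps into a 3D feature field.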
Following a few demonstrations, the robot, when prompted, grasps previously unseen objects by applying its geometric and semantic knowledge and choosing the highest-scoring grasp candidate.
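The "highest-scoring option" step can be illustrated as ranking grasp candidates by cosine similarity between the text prompt's embedding and each candidate's pooled 3D feature. (A toy stand-in under assumed inputs; F3RM's actual scoring also incorporates the demonstrated grasp poses.)

```python
import numpy as np

def pick_best_grasp(grasp_features, text_embedding):
    """Rank candidate grasps by cosine similarity between a language query's
    embedding and each grasp's pooled 3D feature; return the winning index.

    grasp_features: (G, D) one pooled feature vector per candidate grasp
    text_embedding: (D,) embedding of the natural-language prompt
    """
    g = grasp_features / np.linalg.norm(grasp_features, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = g @ t                      # cosine similarity per candidate
    return int(np.argmax(scores)), scores
```

Because CLIP embeds images and text in a shared space, the candidate whose local features best match the prompt (e.g., "the blue mug") scores highest and is executed.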
From MIT News
Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA