
Communications of the ACM

ACM TechNews

Google's PaLM-E Generalist Robot Brain Takes Commands


A robotic arm controlled by PaLM-E reaches for a bag of chips.

The PaLM-E multimodal embodied visual-language model can generate a plan of action for a mobile robot platform with an arm and execute the actions by itself.

Credit: Google Research

Researchers at Google and Germany's Technical University of Berlin debuted PaLM-E, described as the largest visual-language model (VLM) ever created.

The multimodal embodied VLM contains 562 billion parameters and combines vision and language for robotic control; Google claimed it can formulate and carry out a plan of action for high-level commands using a mobile robot platform equipped with an arm.

PaLM-E analyzes data from the robot's camera without requiring pre-processed scene representations, eliminating the need for a human to pre-process or annotate the data.

Integrating the VLM into the control loop also makes the robot resistant to interruptions during tasks.

PaLM-E encodes continuous observations into a sequence of vectors identical in size to language tokens, so it can "understand" sensor data in the same way it processes language.
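
The short Python sketch below illustrates that idea only; it is not Google's code. The vision-feature shape (16 x 1024), the embedding size of 512, the example prompt, and the random projection are all assumptions made for illustration; in PaLM-E the projection is learned and the resulting sequence is consumed by a pretrained PaLM language model.

    import numpy as np

    EMBED_DIM = 512                                    # assumed LM token-embedding size
    rng = np.random.default_rng(0)

    # Stand-in for a vision encoder's output: 16 image-patch features of dimension 1024.
    image_features = rng.normal(size=(16, 1024))

    # Stand-in for a learned projection that maps each continuous observation
    # to a vector the same size as a language-token embedding.
    projection = rng.normal(size=(1024, EMBED_DIM)) * 0.02
    observation_vectors = image_features @ projection  # shape (16, 512)

    # Stand-in for the embeddings of a text prompt such as "bring me the chips".
    text_embeddings = rng.normal(size=(5, EMBED_DIM))

    # The multimodal input is a single sequence of same-sized vectors, which the
    # language model processes exactly as it would a purely textual prompt.
    multimodal_sequence = np.concatenate([observation_vectors, text_embeddings], axis=0)
    print(multimodal_sequence.shape)                   # (21, 512)

Because every element of the sequence has the same dimensionality, the language model needs no special machinery to mix sensor readings with words.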

From Ars Technica
View Full Article

 

Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA


 
