Researchers at Google and Germany's Technical University of Berlin debuted PaLM-E, described as the largest visual-language model (VLM) ever created.
The multimodal, embodied VLM contains 562 billion parameters and combines vision and language for robotic control; Google claims it can formulate a plan of action to carry out high-level commands using a mobile robot platform equipped with an arm.
PaLM-E analyzes data from the robot's camera without requiring pre-processed scene representations, eliminating the need for a human to pre-process or annotate the data.
Integrating the VLM into the control loop also makes the robot resistant to interruptions during tasks.
PaLM-E encodes continuous observations, such as images and sensor readings, into sequences of vectors the same size as its language-token embeddings, so it can "understand" sensor data in the same way it processes language.
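As a rough illustration of that idea (a minimal sketch, not Google's implementation; the dimensions, module names, and encoder are hypothetical), the snippet below projects continuous observation features into the same embedding space as text tokens so a single language model can process one mixed sequence.

```python
import torch
import torch.nn as nn

class MultimodalPrefix(nn.Module):
    """Hypothetical sketch: map continuous observations (e.g. image-patch
    features) into a language model's token-embedding space so they can be
    interleaved with ordinary text tokens."""

    def __init__(self, obs_dim: int = 256, embed_dim: int = 1024, vocab_size: int = 32000):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, embed_dim)  # standard token embeddings
        self.obs_proj = nn.Linear(obs_dim, embed_dim)           # projects observations to token size

    def forward(self, obs_vectors: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # obs_vectors: (batch, num_obs_tokens, obs_dim), e.g. features from a vision encoder
        # text_ids:    (batch, num_text_tokens), ordinary token IDs
        obs_tokens = self.obs_proj(obs_vectors)    # now the same size as text embeddings
        text_tokens = self.text_embed(text_ids)
        # Concatenate so the language model sees one mixed sequence of "tokens".
        return torch.cat([obs_tokens, text_tokens], dim=1)

# Example: 4 observation "tokens" followed by a 6-token text prompt.
model = MultimodalPrefix()
obs = torch.randn(1, 4, 256)
prompt = torch.randint(0, 32000, (1, 6))
print(model(obs, prompt).shape)  # torch.Size([1, 10, 1024])
```

Because the projected observations share the token-embedding size, the downstream model needs no architectural changes to consume them.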
From Ars Technica
Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA