Photo source: MIT News
MIT is developing an innovative approach for training robots, diverging from traditional methods that rely on limited, task-specific datasets. Instead, the new technique leverages large, diverse datasets akin to those used to train large language models (LLMs).
The research team highlighted the limitations of imitation learning, where robots learn by observing humans. This method often struggles when faced with minor variations in the environment, such as changes in lighting or the introduction of new obstacles. In these situations, robots lack sufficient data to adapt effectively.
To address this challenge, the researchers drew inspiration from models like GPT-4 and adopted a more comprehensive data-driven strategy for problem-solving.
“In the language domain, the data are all just sentences,” noted Lirui Wang, the lead author of the study. “In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture.”
The team introduced a novel architecture called Heterogeneous Pretrained Transformers (HPT), which integrates data from varied sensors and environments. A transformer model unifies this heterogeneous data into a shared representation for training, and larger transformers yield better results.
Users specify the robot’s design, its setup, and the task they want it to accomplish.
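The core idea of unifying heterogeneous inputs into one shared representation can be illustrated with a toy sketch. This is not the researchers’ actual HPT implementation; the dimensions, the modality-specific projections, and the single attention layer standing in for the transformer trunk are all simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared token dimension (hypothetical choice)

# Modality-specific "tokenizers": each input type is projected into the
# same shared token space, regardless of its native dimensionality.
W_vision = rng.standard_normal((64, D)) * 0.1   # e.g. image features, dim 64
W_proprio = rng.standard_normal((7, D)) * 0.1   # e.g. joint angles, dim 7

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# A single self-attention layer stands in for the shared transformer trunk.
Wq = rng.standard_normal((D, D)) * 0.1
Wk = rng.standard_normal((D, D)) * 0.1
Wv = rng.standard_normal((D, D)) * 0.1

def trunk(tokens):
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))
    return attn @ v

# Robot-specific head: maps the shared representation to this robot's
# action space (6 degrees of freedom here, an assumed example).
W_head = rng.standard_normal((D, 6)) * 0.1

def policy(vision_feat, proprio):
    # Tokenize each modality, process jointly, pool, and predict an action.
    tokens = np.stack([vision_feat @ W_vision, proprio @ W_proprio])
    shared = trunk(tokens)
    return shared.mean(axis=0) @ W_head

action = policy(rng.standard_normal(64), rng.standard_normal(7))
print(action.shape)  # (6,)
```

The sketch shows why the architecture helps with heterogeneity: only the thin tokenizers and the output head are specific to a sensor suite or robot body, so the trunk in the middle can be pretrained on pooled data from many different robots.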
“Our dream is to have a universal robot brain that you could download and use for your robot without any training at all. While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models,” said David Held, an associate professor at Carnegie Mellon University.
The research initiative is partially funded by the Toyota Research Institute (TRI). Last year, TRI unveiled a method for training robots overnight, and it recently formed a partnership to combine its robot learning research with Boston Dynamics’ hardware.