RoboCat: A self-improving robotic agent

The new foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.

Robots are quickly becoming part of our daily lives, but they're often only programmed to perform specific tasks well. While harnessing recent advances in AI could lead to robots that help in many more ways, progress in building general-purpose robots is slower, in part because of the time needed to collect real-world training data.

Our latest paper introduces RoboCat, a self-improving AI agent for robotics that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.

Previous research has explored how to develop robots that can learn to multi-task at scale and combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks, and to do so across different, real robots.

RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws on a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.

How RoboCat improves itself

RoboCat is based on our multimodal model Gato (Spanish for "cat"), which can process language, images, and actions in both simulated and physical environments. We combined Gato's architecture with a large training dataset of sequences of images and actions from various robot arms solving hundreds of different tasks.

After this first round of training, we launched RoboCat into a "self-improvement" training cycle with a set of previously unseen tasks. Learning each new task followed five steps:

  1. Collect 100–1,000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
  2. Fine-tune RoboCat on this new task/arm, creating a specialised spin-off agent.
  3. The spin-off agent practises on this new task/arm an average of 10,000 times, generating more training data.
  4. Incorporate the demonstration data and self-generated data into RoboCat's existing training dataset.
  5. Train a new version of RoboCat on the new training dataset.

RoboCat's training cycle, boosted by its ability to autonomously generate additional training data.
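The five-step cycle above can be sketched in code. This is a minimal, hypothetical Python sketch of the loop's structure only: the helper functions (`collect_demonstrations`, `fine_tune`, `practice`, `train`) are toy stand-ins for illustration, not part of any published RoboCat codebase.

```python
# Toy stand-ins for the real components (NOT a real RoboCat API).
def collect_demonstrations(task, n):
    # Step 1: each demonstration is represented as a tagged trajectory.
    return [(task, "demo", i) for i in range(n)]

def fine_tune(agent, demos):
    # Step 2: the "spin-off" is modeled as the agent specialised to one task.
    return f"{agent}-finetuned-{demos[0][0]}"

def practice(spinoff, task, n):
    # Step 3: self-generated trajectories from practice episodes.
    return [(task, "self", i) for i in range(n)]

def train(agent, dataset):
    # Step 5: a new generalist version, tagged with its dataset size.
    return f"{agent}-v{len(dataset)}"

def self_improvement_cycle(agent, task, dataset,
                           n_demos=500, n_practice=10_000):
    """One round of the five-step self-improvement cycle."""
    demos = collect_demonstrations(task, n_demos)          # step 1
    spinoff = fine_tune(agent, demos)                      # step 2
    self_generated = practice(spinoff, task, n_practice)   # step 3
    dataset.extend(demos)                                  # step 4
    dataset.extend(self_generated)
    return train(agent, dataset)                           # step 5

# Example usage: one cycle on a hypothetical "gear-insertion" task.
dataset = []
new_agent = self_improvement_cycle("robocat", "gear-insertion", dataset)
# -> "robocat-v10500", with 10,500 trajectories now in the dataset
```

The key structural point the sketch captures is step 4: both the human demonstrations and the spin-off's self-generated data flow back into the shared dataset, so each retrained generalist starts from a richer base than the last.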

The combination of all this training means the latest RoboCat is trained on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.

RoboCat learns from diverse types of training data and tasks: videos of a real robotic arm picking up gears, a simulated arm stacking blocks, and RoboCat using a robotic arm to pick up a cucumber.

Learning to operate new robotic arms and solve more complex tasks

With its diverse training, RoboCat learned to operate different robotic arms within a few hours. While it had been trained on arms with two-pronged grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.

Left: The new robotic arm RoboCat learned to control
Right: Video of RoboCat using the arm to pick up gears

After observing 1,000 human-controlled demonstrations, collected in just hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time. With the same number of demonstrations, it could adapt to solve tasks that combine precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, which are necessary for more complex control.

Examples of tasks RoboCat can adapt to solving after 500–1,000 demonstrations.

The self-improving generalist

RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. The initial version of RoboCat succeeded only 36% of the time on previously unseen tasks, after learning from 500 demonstrations per task. But the latest RoboCat, which had trained on a greater diversity of tasks, more than doubled this success rate on the same tasks.

The large difference in performance between the initial RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement), after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.

These improvements were due to RoboCat's growing breadth of experience, similar to how people develop a more diverse range of skills as they deepen their learning in a given domain. RoboCat's ability to independently learn skills and rapidly self-improve, especially when applied to different robotic devices, will help pave the way towards a new generation of more helpful, general-purpose robotic agents.
