Generalizing from simulation


October 19, 2017



Our latest robotics techniques allow robot controllers, trained entirely in simulation and deployed on physical robots, to react to unplanned changes in the environment as they solve simple tasks. That is, we’ve used these techniques to build closed-loop systems rather than open-loop ones as before.

The simulator need not match the real world in appearance or dynamics; instead, we randomize relevant aspects of the environment, from friction to action delays to sensor noise. Our new results provide more evidence that general-purpose robots can be built by training entirely in simulation, followed by a small amount of self-calibration in the real world.

Dynamics randomization

We developed dynamics randomization to train a robot to adapt to unknown real-world dynamics. During training, we randomize a large set of ninety-five properties that determine the dynamics of the environment, such as the mass of each link in the robot’s body; the friction and damping of the object it is being trained on; the height of the table the object is on; the latency between actions; the noise in its observations; and so on.
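As a rough illustration of the idea, the sketch below draws a fresh set of dynamics parameters for each training episode. The parameter names, the ranges, and the choice to resample once per episode are all assumptions made for this example; they are not the ninety-five properties or the values used in the original work.

```python
import random
from dataclasses import dataclass


@dataclass
class DynamicsParams:
    """Hypothetical subset of randomized properties (names are illustrative)."""
    link_mass_scale: float        # multiplier on each robot link's nominal mass
    object_friction: float        # friction coefficient of the pushed object
    object_damping: float         # damping of the pushed object
    table_height_m: float         # table height in metres
    action_latency_s: float       # delay between consecutive actions
    observation_noise_std: float  # std of Gaussian noise added to observations


# Illustrative ranges only; these are NOT the values from the original work.
RANGES = {
    "link_mass_scale": (0.5, 1.5),
    "object_friction": (0.1, 1.0),
    "object_damping": (0.0, 0.5),
    "table_height_m": (0.70, 0.80),
    "action_latency_s": (0.0, 0.05),
    "observation_noise_std": (0.0, 0.02),
}


def sample_dynamics(rng: random.Random) -> DynamicsParams:
    """Draw one parameter set, uniformly within each range."""
    return DynamicsParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


def noisy_observation(true_obs, params: DynamicsParams, rng: random.Random):
    """Corrupt a simulator observation with the sampled noise level."""
    return [x + rng.gauss(0.0, params.observation_noise_std) for x in true_obs]


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        # Assumption: the dynamics are resampled once at the start of each episode.
        params = sample_dynamics(rng)
        print(episode, params)
```

In a real training loop, the sampled values would be written into the simulator before the episode starts, so the policy never sees the same dynamics twice.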

We used this approach to train an LSTM-based policy to push a hockey puck around a table. Our feed-forward networks fail at this task, whereas LSTMs can use their past observations to analyze the dynamics of the world and adjust their behavior accordingly.
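To see why memory matters here, consider a toy example that is not an LSTM and is not taken from the original work: a sliding puck slowed by Coulomb friction. A single velocity reading says nothing about the friction coefficient, but two consecutive readings pin it down. The physics model, the time step, and the function names are assumptions chosen for illustration.

```python
G = 9.81   # gravitational acceleration, m/s^2
DT = 0.01  # assumed simulation time step, seconds


def step_velocity(v: float, friction: float, dt: float = DT) -> float:
    """Advance a sliding puck one step; friction decelerates it until it stops."""
    return max(0.0, v - friction * G * dt)


def estimate_friction(v_prev: float, v_now: float, dt: float = DT) -> float:
    """Recover the friction coefficient from two consecutive velocity readings."""
    return (v_prev - v_now) / (G * dt)


if __name__ == "__main__":
    v0 = 1.0  # the same current observation in both worlds
    for true_friction in (0.1, 0.4):
        v1 = step_velocity(v0, true_friction)
        # A memoryless policy sees only v0 and cannot tell the worlds apart;
        # a policy with access to (v0, v1) can.
        print(true_friction, estimate_friction(v0, v1))
```

A recurrent policy is not given such an estimator explicitly; the point is only that the information needed to infer hidden dynamics lives in the observation history, which a feed-forward policy acting on the current observation alone cannot use.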

From vision to action

We also trained a robot end-to-end in simulation using reinforcement learning (RL), and deployed the resulting policy on a physical robot. The resulting system maps vision directly to action without special sensors, and can adapt to visual feedback.

The abundance of RL results with simulated robots can make it seem like RL easily solves most robotics tasks. But common RL algorithms work well only on tasks where small perturbations to your action can provide an incremental change to the reward. Some robotics tasks have simple rewards, like walking, where you can be scored on distance traveled. But most tasks do not: to define a dense reward for block stacking, you’d need to encode that the arm is close to the block, that the arm approaches the block in the correct orientation, that the block is lifted off the ground, the distance of the block to the desired position, etc.
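The contrast can be made concrete with a small sketch. The binary reward below only checks whether the block is at the goal, while the dense reward hand-encodes each of the terms listed above. All function names, weights, and thresholds are hypothetical and chosen only for illustration.

```python
import math


def sparse_reward(block_pos, goal_pos, tolerance=0.05):
    """Binary reward: 1 if the block is within `tolerance` of the goal, else 0."""
    return 1.0 if math.dist(block_pos, goal_pos) <= tolerance else 0.0


def dense_reward(gripper_pos, gripper_angle_error, block_pos, goal_pos,
                 table_height=0.0):
    """Hand-shaped reward; every term and weight is an illustrative assumption."""
    reach_term = -math.dist(gripper_pos, block_pos)   # arm is close to the block
    orient_term = -abs(gripper_angle_error)           # approach orientation
    lift_term = 1.0 if block_pos[2] > table_height + 0.02 else 0.0  # block lifted
    place_term = -math.dist(block_pos, goal_pos)      # block near desired position
    return 1.0 * reach_term + 0.5 * orient_term + 1.0 * lift_term + 2.0 * place_term


if __name__ == "__main__":
    block, goal = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    for gripper in ((0.50, 0.0, 0.0), (0.45, 0.0, 0.0)):
        # A small change in gripper position moves the dense reward,
        # but leaves the sparse reward unchanged.
        print(gripper,
              sparse_reward(block, goal),
              dense_reward(gripper, 0.0, block, goal))
```

The sparse reward is trivial to specify but gives no gradient of progress, which is the difficulty the following paragraph addresses.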

We spent a number of months unsuccessfully trying to get conventional RL algorithms working on pick-and-place tasks before ultimately developing a new reinforcement learning algorithm, Hindsight Experience Replay (HER), which allows agents to learn from a binary reward by pretending that a failure was what they wanted to do all along and learning from it accordingly. (By analogy, imagine looking for a gas station but ending up at a pizza shop. You still don’t know where to get gas, but you’ve now learned where to get pizza.) We also used domain randomization on the visual shapes to learn a vision system robust enough for the physical world.
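The relabeling step at the heart of HER can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a 2D point environment, a 0/1 reward, and the simplest relabeling choice, in which the state reached at the end of the episode is treated as the goal. The `Transition` type and helper names are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    state: Point
    action: Point
    next_state: Point
    goal: Point
    reward: float


def binary_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """1 if the achieved position is within `tol` of the goal, else 0."""
    return 1.0 if math.dist(achieved, goal) <= tol else 0.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Pretend the final achieved state was the goal and recompute rewards."""
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=binary_reward(t.next_state, new_goal))
        for t in episode
    ]


def make_failed_episode() -> List[Transition]:
    """A short trajectory that never reaches the intended goal (1, 1)."""
    goal = (1.0, 1.0)
    positions = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.1), (0.6, 0.2)]
    episode = []
    for s, s_next in zip(positions, positions[1:]):
        action = (s_next[0] - s[0], s_next[1] - s[1])
        episode.append(
            Transition(s, action, s_next, goal, binary_reward(s_next, goal))
        )
    return episode


if __name__ == "__main__":
    original = make_failed_episode()
    relabeled = relabel_with_final_goal(original)
    print([t.reward for t in original])   # no reward signal for the real goal
    print([t.reward for t in relabeled])  # the last step now counts as a success
```

In an off-policy setup, both the original and the relabeled transitions would be stored in the replay buffer, so the agent still trains on the real goal while getting useful signal from the goals it reached by accident.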

Our HER implementation uses the actor-critic technique with asymmetric information. (The actor is the policy, and the critic is a network which receives action/state pairs and estimates their Q-value, or sum of future rewards, providing training signal to the actor.) While the critic has access to the full state of the simulator, the actor only has access to RGB and depth data. Thus the critic can provide fully accurate feedback, while the actor uses only data present in the real world.
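The information split can be shown with a toy sketch in which each network is replaced by a single random linear layer. The class names, dimensions, and the linear stand-ins are assumptions for illustration only; the point is which inputs each component is allowed to read.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SimStep:
    full_state: List[float]  # exact simulator state; unavailable on a real robot
    rgbd: List[float]        # flattened RGB and depth values; available everywhere


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class Actor:
    """Policy stand-in: reads only the RGB-D observation."""

    def __init__(self, obs_dim: int, action_dim: int, rng: random.Random):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(action_dim)]

    def act(self, step: SimStep) -> List[float]:
        return linear(self.w, step.rgbd)


class Critic:
    """Q-value stand-in: reads the full simulator state plus the action."""

    def __init__(self, state_dim: int, action_dim: int, rng: random.Random):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim + action_dim)]]

    def q_value(self, step: SimStep, action: List[float]) -> float:
        return linear(self.w, step.full_state + action)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    actor = Actor(obs_dim=8, action_dim=2, rng=rng)
    critic = Critic(state_dim=3, action_dim=2, rng=rng)
    step = SimStep(full_state=[0.1, 0.2, 0.3], rgbd=[0.5] * 8)
    action = actor.act(step)
    print(action, critic.q_value(step, action))
```

Because the critic is only needed during training, it can be discarded at deployment, leaving an actor that depends on nothing the physical robot cannot sense.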

Costs

Both techniques increase the computational requirements: dynamics randomization slows training down by a factor of 3, while learning from images rather than states is about 5–10x slower.

We see three approaches to building general-purpose robots: training on huge fleets of physical robots, making simulators increasingly match the real world, and randomizing the simulator to allow the model to generalize to the real world. We increasingly believe that the third will be the most important part of the solution.

If you’re interested in helping us push towards general-purpose robots, consider joining our team at OpenAI.

Source

OpenAI News - openai.com
