
RL²: Fast reinforcement learning via slow reinforcement learning

OpenAI · Publication · November 9, 2016



Abstract

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL², the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL² experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-armed bandit problems and finite MDPs. After RL² is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL² on a vision-based navigation task and show that it scales up to high-dimensional problems.
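The key architectural detail in the abstract is that the RNN consumes an augmented input (observation, previous action, previous reward, termination flag) and keeps its hidden state across episode boundaries within the same MDP, resetting only when a new MDP is sampled. The following is a minimal sketch of that interaction interface, under loud assumptions: the class name, dimensions, and plain tanh RNN cell are all illustrative (the paper's agents use a GRU trained with an outer-loop policy-gradient method, which is not reproduced here).

```python
import numpy as np

class RL2Agent:
    """Illustrative sketch of the RL2 input/state conventions, not the
    paper's implementation: a plain numpy RNN cell stands in for the GRU,
    and the weights here are random rather than trained by the slow RL loop."""

    def __init__(self, obs_dim, n_actions, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        # Input = observation + one-hot previous action + reward + done flag.
        in_dim = obs_dim + n_actions + 2
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden_dim))
        self.n_actions = n_actions
        # Hidden state = the "fast" RL algorithm's state on the current MDP.
        self.h = np.zeros(hidden_dim)

    def reset_trial(self):
        # Reset ONLY when a new MDP is sampled, never between episodes:
        # carrying h across episodes is what lets the agent exploit what
        # earlier episodes revealed about the current MDP.
        self.h = np.zeros_like(self.h)

    def act(self, obs, prev_action, prev_reward, done):
        a_onehot = np.zeros(self.n_actions)
        a_onehot[prev_action] = 1.0
        x = np.concatenate([obs, a_onehot, [prev_reward, float(done)]])
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        logits = self.W_out @ self.h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Greedy choice for determinism in this sketch; a trained agent
        # would sample from the policy distribution.
        return int(np.argmax(probs))
```

Because the reward and termination flag are part of the input, the recurrent dynamics can implement an exploration/exploitation strategy internally, which is why trained RL² policies approach the performance of hand-designed bandit algorithms on new tasks.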

  • Learning Paradigms




Source

OpenAI News - openai.com

View the original publication