PyTorch implementation of Decoupling Value and Policy for Generalization in Reinforcement Learning

Last update: Dec 08, 2022


Overview

IDAAC: Invariant Decoupled Advantage Actor-Critic

This is a PyTorch implementation of the methods proposed in "Decoupling Value and Policy for Generalization in Reinforcement Learning" by Roberta Raileanu and Rob Fergus.

Citation

If you use this code in your own work, please cite our paper:

@article{Raileanu2021DecouplingVA,
  title={Decoupling Value and Policy for Generalization in Reinforcement Learning},
  author={Roberta Raileanu and R. Fergus},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.10330}
}

Requirements

To install all the required dependencies:

conda create -n idaac python=3.7
conda activate idaac

cd idaac
pip install -r requirements.txt

pip install procgen

git clone https://github.com/openai/baselines.git
cd baselines 
python setup.py install
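After installation, a quick way to confirm that the main packages are importable from the idaac environment is a small check like the one below. This is a sketch, not part of the repo; the module names torch, procgen, and baselines are assumptions about the import names of the packages installed above, and requirements.txt may pull in others you also want to check.

```python
import importlib.util

# Assumed import names for the dependencies installed above; extend as needed.
REQUIRED_MODULES = ("torch", "procgen", "baselines")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```

Run it with the idaac environment activated; any module reported as MISSING points to a failed install step.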

Instructions

This repo provides instructions for training IDAAC, DAAC, and PPO on the Procgen benchmark.

Train IDAAC on CoinRun

python train.py --env_name coinrun --algo idaac

Train DAAC on CoinRun

python train.py --env_name coinrun --algo daac

Train PPO on CoinRun

python train.py --env_name coinrun --algo ppo --ppo_epoch 3
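The three commands above differ only in the --algo value and, for PPO, the extra --ppo_epoch 3 flag. If you want to queue all three runs for one environment, a small launcher like the following may help. It is a hypothetical helper, not part of the repo, and it only uses the flags shown in this README.

```python
import shlex
import subprocess

# Per-algorithm extra flags, mirroring the commands above:
# PPO is trained with --ppo_epoch 3; IDAAC and DAAC use the defaults.
ALGO_FLAGS = {
    "idaac": [],
    "daac": [],
    "ppo": ["--ppo_epoch", "3"],
}


def build_command(env_name, algo, use_best_hps=False):
    """Return the argv list for a single train.py run."""
    cmd = ["python", "train.py", "--env_name", env_name, "--algo", algo]
    cmd += ALGO_FLAGS[algo]
    if use_best_hps:
        cmd.append("--use_best_hps")
    return cmd


def run_all(env_name="coinrun", use_best_hps=False, dry_run=True):
    """Print every command and, unless dry_run is set, launch them one after another."""
    commands = [build_command(env_name, algo, use_best_hps) for algo in ALGO_FLAGS]
    for cmd in commands:
        # shlex.join needs Python 3.8+, so quote manually for the Python 3.7 environment.
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all("coinrun")  # dry run: prints the three commands without launching them
```

Runs are launched sequentially with check=True, so a failing run stops the queue instead of silently continuing.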

Note: By default, the code uses the same set of hyperparameters (HPs) for all environments, namely the ones that performed best overall. In our studies, we found that some games benefit from slightly different HPs, so we provide those as well. To use the best hyperparameters for each environment, pass the flag --use_best_hps.
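One common way to implement a flag like --use_best_hps is to keep a table of per-environment overrides and apply it on top of the parsed defaults. The sketch below illustrates that pattern under stated assumptions: the parser is a hypothetical subset of train.py's arguments, the ppo_epoch default of 1 is assumed, and EXAMPLE_BEST_HPS holds placeholder values, not the tuned HPs shipped with the repo.

```python
import argparse

# Placeholder overrides for illustration only: these are NOT the repo's tuned values.
EXAMPLE_BEST_HPS = {
    "bigfish": {"ppo_epoch": 2},
}


def build_parser():
    """Hypothetical subset of train.py's arguments; the ppo_epoch default is assumed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env_name", default="coinrun")
    parser.add_argument("--algo", default="idaac", choices=["idaac", "daac", "ppo"])
    parser.add_argument("--ppo_epoch", type=int, default=1)
    parser.add_argument("--use_best_hps", action="store_true")
    return parser


def resolve_hps(args, best_hps):
    """Return a copy of args with per-environment overrides applied if requested."""
    resolved = argparse.Namespace(**vars(args))
    if not args.use_best_hps:
        return resolved
    for key, value in best_hps.get(args.env_name, {}).items():
        if not hasattr(resolved, key):
            raise KeyError(f"unknown hyperparameter in overrides: {key}")
        setattr(resolved, key, value)
    return resolved


if __name__ == "__main__":
    example = build_parser().parse_args(["--env_name", "bigfish", "--use_best_hps"])
    print(vars(resolve_hps(example, EXAMPLE_BEST_HPS)))
```

In this sketch the per-environment table wins over values given on the command line; rejecting unknown keys catches typos in the table early.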

Overview of DAAC and IDAAC
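DAAC and IDAAC train the policy and the value function with two separate networks. As described in the paper, the policy network also has an advantage head that is regressed onto generalized advantage estimates (GAE), which are computed from the separate value network's predictions. The sketch below shows only that target computation and the regression loss. It is written in plain Python rather than PyTorch so it runs standalone, and the defaults gamma=0.999 and lam=0.95 are assumptions, not values taken from this repo.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.999, lam=0.95):
    """Generalized advantage estimates for a single rollout of length T.

    rewards[t], dones[t]: reward and episode-end flag observed after the action at step t.
    values[t]: value-network estimate V(s_t); last_value: V(s_T), used to bootstrap.
    Defaults for gamma and lam are assumed, not taken from the repo.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])  # stop bootstrapping across episode boundaries
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def advantage_loss(predicted, targets):
    """Mean squared error between the advantage head's outputs and the GAE targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, targets)) / len(targets)


if __name__ == "__main__":
    targets = compute_gae([1.0, 1.0], [0.0, 0.0], [False, True], last_value=5.0, gamma=1.0, lam=1.0)
    print(targets)  # [2.0, 1.0]: the episode ends after step 1, so last_value is ignored
```

Because the value estimates only enter through these targets, the policy network never has to fit the value function itself, which is the decoupling the method's name refers to.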

Procgen Results

IDAAC achieves state-of-the-art performance on the Procgen benchmark (easy mode), significantly improving the agent's generalization ability over standard RL methods such as PPO.

Test Results on Procgen

Acknowledgements

This code is based on an open-source PyTorch implementation of PPO.
