Code for the paper: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Last update: May 13, 2022

Non-Parametric Prior Actor-Critic (N-PPAC)

This repository contains the code for

On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations, Tim G. J. Rudner*, Cong Lu*, Michael A. Osborne, Yarin Gal, Yee Whye Teh. Conference on Neural Information Processing Systems (NeurIPS), 2021.

Abstract: KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral policies derived from expert demonstrations suffers from hitherto unrecognized pathological behavior that can lead to slow, unstable, and suboptimal online training. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by specifying non-parametric behavioral policies and that doing so allows KL-regularized RL to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.

View on OpenReview

In particular, the code implements:

Scripts for estimating behavioral reference policies for a range of model classes, including non-parametric Gaussian processes, Bayesian neural networks trained via MC Dropout, deep ensembles, and Gaussian neural density models;
Scripts for KL-regularized online training that uses different behavioral expert policies.
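
To make the role of the behavioral reference policy concrete, here is a minimal sketch of the KL term between two scalar Gaussian action distributions. It is an illustration only, not code from this repository: the function name `gaussian_kl` and all numeric values are assumptions chosen for the example. It shows how the penalty on the same online policy grows as the predictive standard deviation of the behavioral policy shrinks.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL( N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2) ) for scalar Gaussians.

    Hypothetical helper for illustration; not part of the repository.
    """
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


# Online policy at some state: mean 0.0, standard deviation 0.5 (assumed values).
online_mu, online_sigma = 0.0, 0.5
# Behavioral policy mean is close to the online mean (assumed value).
behavioral_mu = 0.2

# Shrink the behavioral policy's predictive standard deviation and
# observe how the KL penalty on the unchanged online policy grows.
for behavioral_sigma in (1.0, 0.1, 0.01):
    kl = gaussian_kl(online_mu, online_sigma, behavioral_mu, behavioral_sigma)
    print(f"behavioral sigma = {behavioral_sigma:<5} KL = {kl:.3f}")
```

In this toy setting the KL term is dominated by the `1 / sigma_q ** 2` factor, so a behavioral policy with very small predictive variance penalizes even small deviations heavily. How predictive variance behaves away from the demonstrations depends on the model class used for the behavioral policy.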

How to use this package

We provide a Docker setup which may be built as follows:

docker build -t torch-nppac .

To train the GP policies offline:

bash exp_scripts/paper_clone_gp.sh

To run online training (N-PPAC):

bash exp_scripts/paper_configs.sh

Pre-trained GP policies, produced with final_clone_gp.sh, are provided in the folder nppac/trained_gps/.

By default, all data will be stored in data/.

Reference

If you found this repository useful, please cite our paper as follows:

@inproceedings{
    rudner2021pathologies,
    title={On Pathologies in {KL}-Regularized Reinforcement Learning from Expert Demonstrations},
    author={Tim G. J. Rudner and Cong Lu and Michael A. Osborne and Yarin Gal and Yee Whye Teh},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021},
    url={https://openreview.net/forum?id=sS8rRmgAatA}
}

License

The repository is based on RLkit, which may contain further useful scripts. The RLkit license is included in the rlkit/ folder.

Owner

Cong Lu
