Reinforcement Learning via Supervised Learning

Last update: Nov 28, 2022

Installation

Run

pip install -e .

in an environment with Python >= 3.7.0, <3.9.
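
As a quick sanity check before installing, the version constraint above can be tested from Python. This is a minimal sketch and not part of the RvS codebase; the helper name python_version_supported is hypothetical.

```python
import sys


def python_version_supported(version_info=None):
    """Return True if the interpreter satisfies Python >= 3.7.0, < 3.9.

    Hypothetical helper for illustration; it is not shipped with RvS.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor): any 3.7.x or 3.8.x release qualifies.
    major, minor = version_info[0], version_info[1]
    return (3, 7) <= (major, minor) < (3, 9)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "not supported"
    print(f"Python {current} is {status} by the stated constraint.")
```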

The code depends on MuJoCo 2.1.0 (for mujoco-py) and MuJoCo 2.1.1 (for dm-control). Install both versions by following the installation instructions for each.

If you use the provided Dockerfile, it will automatically handle the MuJoCo dependencies for you. For example:

docker build -t rvs:latest .
docker run -it --rm -v "$(pwd)":/rvs rvs:latest bash
cd rvs
pip install -e .

Reproducing Experiments

The experiments directory contains a launch script for each environment suite. For example, to reproduce the RvS-R results in D4RL Gym locomotion, run

bash experiments/launch_gym_rvs_r.sh

Each launch script corresponds to a configuration file in experiments/config which serves as a reference for the hyperparameters associated with each experiment.

Adding New Environments

To run RvS on an environment of your own, you need to create a suitable dataset class. For example, in src/rvs/dataset.py, we have a dataset class for the GCSL environments, a dataset class for RvS-R in D4RL, and a dataset class for RvS-G in D4RL. In particular, the D4RLRvSGDataModule allows for conditioning on arbitrary dimensions of the goal state using the goal_columns attribute; for AntMaze, we set goal_columns to (0, 1) to condition only on the x and y coordinates of the goal state.
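
To illustrate the idea behind goal_columns, here is a minimal NumPy sketch of conditioning on a subset of goal-state dimensions. It is not the D4RLRvSGDataModule implementation: the function names select_goal_columns and build_policy_inputs are hypothetical, and concatenating the observation with the selected goal dimensions is an assumption about how the conditioning is fed to the policy.

```python
import numpy as np


def select_goal_columns(goal_states, goal_columns):
    """Keep only the chosen dimensions of each goal state.

    goal_states: array of shape (batch, state_dim).
    goal_columns: iterable of column indices, e.g. (0, 1) for x and y.
    """
    goal_states = np.asarray(goal_states)
    return goal_states[..., list(goal_columns)]


def build_policy_inputs(observations, goal_states, goal_columns):
    """Concatenate observations with the selected goal dimensions.

    Assumption for illustration: the policy input is [observation, goal].
    """
    goals = select_goal_columns(goal_states, goal_columns)
    return np.concatenate([np.asarray(observations), goals], axis=-1)


# Toy batch: 2 transitions with 4-dimensional states.
observations = np.array([[0.0, 0.0, 0.1, 0.2],
                         [1.0, 2.0, 0.3, 0.4]])
goal_states = np.array([[5.0, 6.0, 9.9, 9.9],
                        [7.0, 8.0, 9.9, 9.9]])

# Condition only on the first two goal dimensions (x and y, as for AntMaze).
policy_inputs = build_policy_inputs(observations, goal_states, goal_columns=(0, 1))
```

With goal_columns set to (0, 1), the remaining goal dimensions are dropped, so the policy is conditioned only on where the agent should end up and not on the rest of the goal state.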

Baseline Numbers

We replicated CQL using a codebase recommended to us by the CQL authors. All hyperparameters and logs from our replication runs are available in our CQL-R Weights & Biases project.

We replicated Decision Transformer using our fork of the authors' codebase, which we extended to support AntMaze. All hyperparameters and logs from our replication runs are available in our DT Weights & Biases project.

Citing RvS

To cite RvS, you can use the following BibTeX entry:

@misc{emmons2021rvs,
      title={RvS: What is Essential for Offline RL via Supervised Learning?}, 
      author={Scott Emmons and Benjamin Eysenbach and Ilya Kostrikov and Sergey Levine},
      year={2021},
      eprint={2112.10751},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Owner

Scott Emmons
