Simple (but Strong) Baselines for POMDPs

Overview

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

Welcome to the POMDP world! This repo provides some simple baselines for POMDPs, specifically the recurrent model-free RL, for the following paper

Paper: arXiv Numeric Results: google drive

by Tianwei Ni, Benjamin Eysenbach and Ruslan Salakhutdinov.

Installation

First download this repo into your local directory (preferably on a cluster or a server) <local_path>. Then we recommend to use a virtual env to install all the dependencies. For example, we install using miniconda:

conda env create -f install.yml
conda activate pomdp

The yaml file includes all the dependencies (e.g. PyTorch, PyBullet) used in our experiments (including compared methods), but there are two exceptions:

  • To run Cheetah-Vel in meta RL, you have to install MuJoCo with a license
  • To run robust RL and generalization in RL experiments, you have to install roboschool.
    • We found it hard to install roboschool from scratch, therefore we provide a docker file roboschool.sif in google drive that contains roboschool and the other necessary libraries, adapted from SunBlaze repo.
    • To download and activate the docker file by singularity on a cluster (on a single server should be similar):
    # download roboschool.sif from the google drive to envs/rl-generalization/roboschool.sif
    # then run singularity shell
    singularity shell --nv -H <local_path>:/home envs/rl-generalization/roboschool.sif
    • Then you can test it by import roboschool in a python3 shell.

General Form to Run Our Implementation of Recurrent Model-Free RL and Compared Methods

Basically, we use .yml file in configs/ folder for each subarea of POMDPs. To run our implementation, in <local_path> simply use

export PYTHONPATH=${PWD}:$PYTHONPATH
python3 policies/main.py configs/<subarea>/<env_name>/<algo_name>.yml

where algo_name specifies the algorithm name:

  • sac_rnn and td3_rnn correspond to our implementation of recurrent model-free RL
  • ppo_rnn and a2c_rnn correspond to (Kostrikov, 2018) implementation of recurrent model-free RL
  • vrm corresponds to VRM compared in "standard" POMDPs
  • varibad corresponds the off-policy version of original VariBAD compared in meta RL
  • MRPO correspond to MRPO compared in robust RL

We have merged the prior methods above into our repository (there is no need to install other repositories), so that future work can use this single repository to run a number of baselines besides ours: A2C-GRU, PPO-GRU, VRM, VariBAD, MRPO. Since our code is heavily drawn from those prior works, we encourage authors to cite those prior papers or implementations. For the compared methods, we use their open-sourced implementation with their default hyperparameters.

Specific Running Commands for Each Subarea

Please see run_commands.md for details on running our implementation of recurrent model-free RL and also all the compared methods.

A Minimal Example to Run Our Implementation

Here we provide a stand-alone minimal example with the least dependencies to run our implementation of recurrent model-free RL!

Only requires PyTorch and PyBullet, no need to install MuJoCo or roboschool, no external configuration file.

Simply open the Jupyter Notebook example.ipynb and it contains the training and evaluation procedure on a toy POMDP environment (Pendulum-V). It only costs < 20 min to run the whole process.

Details of Our Implementation of Recurrent Model-Free RL: Decision Factors, Best Variants, Code Features

Please see our_details.md for more information on:

  • How to tune the decision factors discussed in the paper in the configuration files
  • How to tune the other hyperparameters that are also important to training
  • Where is the core class of our recurrent model-free RL and the RAM-efficient replay buffer
  • Our best variants in subarea and numeric results on all the bar charts and learning curves

Acknowledgement

Please see acknowledge.md for details.

Citation

If you find our code useful to your work, please consider citing our paper:

@article{ni2021recurrentrl,
  title={Recurrent Model-Free RL is a Strong Baseline for Many POMDPs},
  author={Ni, Tianwei and Eysenbach, Benjamin and Salakhutdinov, Ruslan},
  year={2021}
}

Contact

If you have any questions, please create an issue in this repo or contact Tianwei Ni ([email protected])

Owner
Tianwei V. Ni
Efficient coding excites me. Good research surprises me.
Tianwei V. Ni
Robust fine-tuning of zero-shot models

Robust fine-tuning of zero-shot models This repository contains code for the paper Robust fine-tuning of zero-shot models by Mitchell Wortsman*, Gabri

224 Dec 29, 2022
Background Matting: The World is Your Green Screen

Background Matting: The World is Your Green Screen By Soumyadip Sengupta, Vivek Jayaram, Brian Curless, Steve Seitz, and Ira Kemelmacher-Shlizerman Th

Soumyadip Sengupta 4.6k Jan 04, 2023
2021 credit card consuming recommendation

2021 credit card consuming recommendation

Wang, Chung-Che 7 Mar 08, 2022
This is a demo app to be used in the video streaming applications

MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks MoViDNN is an Android application that can be used to ev

ATHENA Christian Doppler (CD) Laboratory 7 Jul 21, 2022
Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"

VoCapXLM Code for EMNLP2021 paper Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training Environment DockerFile: dancingso

Bo Zheng 15 Jul 28, 2022
YOLO-v5 기반 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adaptive Cruise Control 기능 구현

자율 주행차의 영상 기반 차간거리 유지 개발 Table of Contents 프로젝트 소개 주요 기능 시스템 구조 디렉토리 구조 결과 실행 방법 참조 팀원 프로젝트 소개 YOLO-v5 기반으로 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adap

14 Jun 29, 2022
Sample Code for "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL"

Sample Code for "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL" This is the official codebase for Pessimism Meets I

3 Sep 19, 2022
Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

Adam Goodge 25 Dec 28, 2022
tensorflow implementation of 'YOLO : Real-Time Object Detection'

YOLO_tensorflow (Version 0.3, Last updated :2017.02.21) 1.Introduction This is tensorflow implementation of the YOLO:Real-Time Object Detection It can

Jinyoung Choi 1.7k Nov 21, 2022
MonoRCNN is a monocular 3D object detection method for automonous driving

MonoRCNN MonoRCNN is a monocular 3D object detection method for automonous driving, published at ICCV 2021. This project is an implementation of MonoR

87 Dec 27, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
Generalized and Efficient Blackbox Optimization System.

OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio

DAIR Lab 238 Dec 29, 2022
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
MaRS - a recursive filtering framework that allows for truly modular multi-sensor integration

The Modular and Robust State-Estimation Framework, or short, MaRS, is a recursive filtering framework that allows for truly modular multi-sensor integration

Control of Networked Systems - University of Klagenfurt 143 Dec 29, 2022
Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom

Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom Sample on-line plotting while training(avg loss)/testing(writ

Jingwei Zhang 269 Nov 15, 2022
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022
Pytorch implementation of few-shot semantic image synthesis

Few-shot Semantic Image Synthesis Using StyleGAN Prior Our method can synthesize photorealistic images from dense or sparse semantic annotations using

40 Sep 26, 2022
Experimental code for paper: Generative Adversarial Networks as Variational Training of Energy Based Models

Experimental code for paper: Generative Adversarial Networks as Variational Training of Energy Based Models, under review at ICLR 2017 requirements: T

Shuangfei Zhai 18 Mar 05, 2022
A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

Eugenio Herrera 175 Dec 29, 2022
DeLighT: Very Deep and Light-Weight Transformers

DeLighT: Very Deep and Light-weight Transformers This repository contains the source code of our work on building efficient sequence models: DeFINE (I

Sachin Mehta 440 Dec 18, 2022