[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

Related tags

Deep Learningrapid
Overview

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning

This is the Tensorflow implementation of ICLR 2021 paper Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments. We propose a simple method RAPID for exploration through scroring the previous episodes and reproducing the good exploration behaviors with imitation learning. overview

The implementation is based on OpenAI baselines. For all the experiments, add the option --disable_rapid to see the baseline result. RAPID can achieve better performance and sample efficiency than state-of-the-art exploration methods on MiniGrid environments. rendering performance

Cite This Work

@inproceedings{
zha2021rank,
title={Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments},
author={Daochen Zha and Wenye Ma and Lei Yuan and Xia Hu and Ji Liu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=MtEE0CktZht}
}

Installation

Please make sure that you have Python 3.5+ installed. First, clone the repo with

git clone https://github.com/daochenzha/rapid.git
cd rapid

Then install the dependencies with pip:

pip install -r requirements.txt
pip install -e .

To run MuJoCo experiments, you need to have the MuJoCo license. Install mujoco-py with

pip install mujoco-py==1.50.1.68

How to run the code

The entry is main.py. Some important hyperparameters are as follows.

  • --env: what environment to be used
  • --num_timesteps: the number of timesteps to be run
  • --w0: the weight of extrinsic reward score
  • --w1: the weight of local score
  • --w2: the weight of global score
  • --sl_until: do the RAPID update until which timestep
  • --disable_rapid: use it to compare with PPO baseline
  • --log_dir: the directory to save logs

Reproducing the result of MiniGrid environments

For MiniGrid-KeyCorridorS3R2, run

python main.py --env MiniGrid-KeyCorridorS3R2-v0 --sl_until 1200000

For MiniGrid-KeyCorridorS3R3, run

python main.py --env MiniGrid-KeyCorridorS3R3-v0 --sl_until 3000000

For other environments, run

python main.py --env $ENV

where $ENV is the environment name.

Run MiniWorld Maze environment

  1. Clone the latest master branch of MiniWorld and install it
git clone -b master --single-branch --depth=1 https://github.com/maximecb/gym-miniworld.git
cd gym-miniwolrd
pip install -e .
cd ..
  1. Start training with
python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

For server without screens, you may install xvfb with

apt-get install xvfb

Then start training with

xvfb-run -a -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

Run MuJoCo experiments

Run

python main.py --seed 0 --env $env --num_timesteps 5000000 --lr 5e-4 --w1 0.001 --w2 0.0 --log_dir logs/$ENV/rapid

where $ENV can be EpisodeSwimmer-v2, EpisodeHopper-v2, EpisodeWalker2d-v2, EpisodeInvertedPendulum-v2, DensityEpisodeSwimmer-v2, or ViscosityEpisodeSwimmer-v2.

Owner
Daochen Zha
PhD student in Machine Learning and Data Mining
Daochen Zha
The Python3 import playground

The Python3 import playground I have been confused about python modules and packages, this text tries to clear the topic up a bit. Sources: https://ch

Michael Moser 5 Feb 22, 2022
HyperDict - Self linked dictionary in Python

Hyper Dictionary Advanced python dictionary(hash-table), which can link it-self

8 Feb 06, 2022
Knowledge Distillation Toolbox for Semantic Segmentation

SegDistill: Toolbox for Knowledge Distillation on Semantic Segmentation Networks This repo contains the supported code and configuration files for Seg

9 Dec 12, 2022
🔥 TensorFlow Code for technical report: "YOLOv3: An Incremental Improvement"

🆕 Are you looking for a new YOLOv3 implemented by TF2.0 ? If you hate the fucking tensorflow1.x very much, no worries! I have implemented a new YOLOv

3.6k Dec 26, 2022
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

Mohamed Ali Souibgui 74 Jan 07, 2023
Code release for NeRF (Neural Radiance Fields)

NeRF: Neural Radiance Fields Project Page | Video | Paper | Data Tensorflow implementation of optimizing a neural representation for a single scene an

6.5k Jan 01, 2023
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees This repository is being continuously updated, please stay tuned! Any code con

86 Dec 28, 2022
DaReCzech is a dataset for text relevance ranking in Czech

Dataset DaReCzech is a dataset for text relevance ranking in Czech. The dataset consists of more than 1.6M annotated query-documents pairs,

Seznam.cz a.s. 8 Jul 26, 2022
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
A PyTorch implementation of Radio Transformer Networks from the paper "An Introduction to Deep Learning for the Physical Layer".

An Introduction to Deep Learning for the Physical Layer An usable PyTorch implementation of the noisy autoencoder infrastructure in the paper "An Intr

Gram.AI 120 Nov 21, 2022
Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper

Divide and Remaster Utility Tools Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper The DnR d

Darius Petermann 46 Dec 11, 2022
code for our ECCV-2020 paper: Self-supervised Video Representation Learning by Pace Prediction

Video_Pace This repository contains the code for the following paper: Jiangliu Wang, Jianbo Jiao and Yunhui Liu, "Self-Supervised Video Representation

Jiangliu Wang 95 Dec 14, 2022
pytorch implementation of trDesign

trdesign-pytorch This repository is a PyTorch implementation of the trDesign paper based on the official TensorFlow implementation. The initial port o

Learn Ventures Inc. 41 Dec 29, 2022
Decision Transformer: A brand new Offline RL Pattern

DecisionTransformer_StepbyStep Intro Decision Transformer: A brand new Offline RL Pattern. 这是关于NeurIPS 2021 热门论文Decision Transformer的复现。 👍 原文地址: Deci

Irving 14 Nov 22, 2022
Adversarially Learned Inference

Adversarially Learned Inference Code for the Adversarially Learned Inference paper. Compiling the paper locally From the repo's root directory, $ cd p

Mohamed Ishmael Belghazi 308 Sep 24, 2022
Geometric Deep Learning Extension Library for PyTorch

Documentation | Paper | Colab Notebooks | External Resources | OGB Examples PyTorch Geometric (PyG) is a geometric deep learning extension library for

Matthias Fey 16.5k Jan 08, 2023
Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

End-to-End Optimization of Scene Layout Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral) Project site, Bibtex For help conta

Andrew Luo 41 Dec 09, 2022
MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition;

MoViNet-pytorch Pytorch unofficial implementation of MoViNets: Mobile Video Networks for Efficient Video Recognition. Authors: Dan Kondratyuk, Liangzh

189 Dec 20, 2022
Lucid library adapted for PyTorch

Lucent PyTorch + Lucid = Lucent The wonderful Lucid library adapted for the wonderful PyTorch! Lucent is not affiliated with Lucid or OpenAI's Clarity

Lim Swee Kiat 520 Dec 26, 2022
A minimalist tool to display a network graph.

A tool to get a minimalist view of any architecture This tool has only be tested with the models included in this repo. Therefore, I can't guarantee t

Thibault Castells 1 Feb 11, 2022