The Unsupervised Reinforcement Learning Benchmark (URLB)

Last update: Dec 26, 2022

Overview

URLB provides a set of leading algorithms for unsupervised reinforcement learning, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is then to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, you can activate your environment with:

conda activate urlb

Implemented Agents

Agent	Command	Implementation Author(s)	Paper
ICM	`agent=icm`	Denis	paper
ProtoRL	`agent=proto`	Denis	paper
DIAYN	`agent=diayn`	Misha	paper
APT(ICM)	`agent=icm_apt`	Hao, Kimin	paper
APT(Ind)	`agent=ind_apt`	Hao, Kimin	paper
APS	`agent=aps`	Hao, Kimin	paper
SMM	`agent=smm`	Albert	paper
RND	`agent=rnd`	Kevin	paper
Disagreement	`agent=disagreement`	Catherine	paper

Available Domains

We support the following domains.

Domain	Tasks
`walker`	`stand`, `walk`, `run`, `flip`
`quadruped`	`walk`, `run`, `stand`, `jump`
`jaco`	`reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right`
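
As a rough illustration of how the table above can be used to validate task names before launching a run, the sketch below copies the domains and tasks into a dictionary. It assumes full task names follow the `<domain>_<task>` pattern seen in `walker_run`; the helper names `full_task_name` and `all_task_names` are hypothetical and not part of URLB.

```python
from typing import List

# Domain -> task names, copied from the table above.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": [
        "reach_top_left",
        "reach_top_right",
        "reach_bottom_left",
        "reach_bottom_right",
    ],
}


def full_task_name(domain: str, task: str) -> str:
    """Return '<domain>_<task>' after checking both against the table.

    Assumption: the '<domain>_<task>' pattern (e.g. 'walker_run') holds
    for every domain; the README only shows it for walker.
    """
    if domain not in DOMAIN_TASKS:
        raise ValueError(f"unknown domain: {domain!r}")
    if task not in DOMAIN_TASKS[domain]:
        raise ValueError(f"unknown task {task!r} for domain {domain!r}")
    return f"{domain}_{task}"


def all_task_names() -> List[str]:
    """List every full task name implied by the table."""
    return [
        full_task_name(domain, task)
        for domain, tasks in DOMAIN_TASKS.items()
        for task in tasks
    ]
```

Rejecting an unknown combination such as `walker` with `jump` up front avoids discovering the mistake only after a job has started.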

Domain observation mode

Each domain supports two observation modes: states and pixels.

Mode	Command
states	`obs_type=states`
pixels	`obs_type=pixels`

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

or, if you want to train a skill-based agent, like DIAYN, run:

python pretrain.py agent=diayn domain=walker

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/
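
The directory layout above can be reproduced programmatically, for example to check which snapshots exist before fine-tuning. The following is a minimal sketch: the function names are hypothetical, and the `snapshot_<frames>.pt` file name is an assumption extrapolated from the single `snapshot_1000000.pt` file mentioned in the fine-tuning section.

```python
from pathlib import Path

# Frame counts at which pretrain.py is described as saving snapshots.
SNAPSHOT_FRAMES = (100_000, 500_000, 1_000_000, 2_000_000)


def snapshot_dir(obs_type, domain, agent, root="./pretrained_models"):
    """Build <root>/<obs_type>/<domain>/<agent> as described above."""
    return Path(root) / obs_type / domain / agent


def expected_snapshots(obs_type, domain, agent, root="./pretrained_models"):
    """List the snapshot files one would expect after a full pre-training run.

    Assumption: every snapshot is named snapshot_<frames>.pt.
    """
    directory = snapshot_dir(obs_type, domain, agent, root)
    return [directory / f"snapshot_{frames}.pt" for frames in SNAPSHOT_FRAMES]


def missing_snapshots(obs_type, domain, agent, root="./pretrained_models"):
    """Return the expected snapshot files that are not present on disk."""
    return [
        path
        for path in expected_snapshots(obs_type, domain, agent, root)
        if not path.is_file()
    ]
```

Calling `missing_snapshots("states", "walker", "icm")` before fine-tuning gives an early signal that a pre-training run was interrupted.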

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
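
When launching many fine-tuning runs, it can help to assemble these commands in code. The sketch below only rebuilds the two command shapes shown above; `finetune_command` and its `uses_skills` parameter are hypothetical names, and the caller is responsible for knowing whether a given agent uses skills.

```python
def finetune_command(pretrained_agent, task, snapshot_ts,
                     obs_type="states", uses_skills=False):
    """Return the argument list for one finetune.py invocation."""
    args = [
        "python",
        "finetune.py",
        f"pretrained_agent={pretrained_agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if uses_skills:
        # Skill-based methods also pass the agent and disable reward_free,
        # mirroring the smm example above.
        args += [f"agent={pretrained_agent}", "reward_free=false"]
    return args


if __name__ == "__main__":
    # Print the two example commands from this section.
    print(" ".join(finetune_command("icm", "walker_run", 1000000)))
    print(" ".join(finetune_command("smm", "walker_run", 1000000,
                                    uses_skills=True)))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.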

Monitoring

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

The console output is also available in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
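
If you want to post-process console output (for example, to plot returns without TensorBoard), a line in the format above can be split into typed fields. This is a minimal sketch based only on the sample line shown; `parse_log_line` is a hypothetical helper, not part of URLB, and the elapsed time `T` is kept as a raw string.

```python
# Keys whose values look like integers / floats in the sample line.
INT_KEYS = {"F", "S", "E", "L"}
FLOAT_KEYS = {"R", "FPS"}


def parse_log_line(line):
    """Parse '| train | F: 6000 | ...' into (mode, {key: value})."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    mode, fields = cells[0], {}
    for cell in cells[1:]:
        # Split on the first colon only, so 'T: 0:00:42' keeps its time intact.
        key, _, value = cell.partition(":")
        key, value = key.strip(), value.strip()
        if key in INT_KEYS:
            fields[key] = int(value)
        elif key in FLOAT_KEYS:
            fields[key] = float(value)
        else:
            fields[key] = value  # e.g. T stays a string such as '0:00:42'
    return mode, fields


if __name__ == "__main__":
    sample = ("| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | "
              "R: 5.5177 | FPS: 96.7586 | T: 0:00:42")
    print(parse_log_line(sample))
```

In the sample line, the frame count is twice the step count, so comparing `F` and `S` after parsing is a quick way to read off how many environment frames each agent step covers.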
