Continual reinforcement learning baselines: experiment specifications, implementation of existing methods, and common metrics. Easily extensible to new methods.

Overview

Continual Reinforcement Learning

This repository provides a simple way to run continual reinforcement learning experiments in PyTorch, including evaluating existing baseline algorithms, writing your own agents, and specifying custom experiments.

Benchmark results can be seen in our paper (coming soon).

Quick Start (trains CLEAR on Atari)

Clone the repo, and cd into it.

pip install "torch>=1.7.1" torchvision
pip install -e .
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 PYTHONUNBUFFERED=1 python main.py --config-file configs/atari/clear_atari.json --output-dir tmp

Changelog

  • 09/15/21: Second pre-release of continual-RL codebase with Procgen, Minihack, CHORES benchmarks. Results in paper (coming soon).
  • 07/26/21: Pre-release of continual-RL codebase with Atari benchmark results in ATARI_RESULTS.md

Getting Started

Set up your environment

There are two flavors of installation: pip and conda.

Pip setup

pip install "torch>=1.7.1" torchvision
pip install -e .

Depending on your platform you may need a different torch installation command. See https://pytorch.org/

If you prefer not to install continual_rl as a pip package, you can alternatively do pip install -r requirements.txt

Conda Setup

  1. Run this command to set up a conda environment with the required packages:

    conda env create -f environment.yml -n <venv_name> 
    

    Replace <venv_name> with a virtual environment name of your choosing. If you leave off the -n argument, the default name venv_continual_rl will be used.

  2. Activate your new virtual environment: conda activate <venv_name>

Benchmark Setup

Installation instructions for each benchmark are provided in BENCHMARK_INSTALL.md

Run an experiment (Command-line Mode)

An experiment is a list of tasks, executed sequentially. Each task manages the training of a policy on a single environment. A simple experiment can be run with:

python main.py --policy ppo --experiment mini_atari_3_tasks_3_cycles

The available policies are in continual_rl/available_policies.py. The available experiments are in continual_rl/experiment_specs.py.

Additional command-line arguments

In addition to --policy and --experiment, the following command-line arguments to main.py are also permitted:

  • --output-dir [tmp/<policy>_<experiment>_<timestamp>]: Where logs and saved models are stored

Any policy configuration changes can be made simply by appending --param new_value to the arguments passed to main. The default policy configs (e.g. hyperparameters) are in the config.py file within the policy's folder, and any of them can be set in this way.

For example:

python main.py --policy ppo --experiment mini_atari_3_tasks_3_cycles --learning_rate 1e-3

will override the default learning_rate, and instead use 1e-3.
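
One way to picture this override mechanism is sketched below. This is a minimal illustration, not the repository's actual argument parser: the helper name parse_overrides and the use of ast.literal_eval for type coercion are assumptions made for the example.

```python
import argparse
import ast


def parse_overrides(argv):
    """Split argv into known arguments and a dict of `--param value` overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--policy")
    parser.add_argument("--experiment")
    known, extras = parser.parse_known_args(argv)

    overrides = {}
    # Remaining arguments are expected to come in `--name value` pairs.
    for flag, raw_value in zip(extras[0::2], extras[1::2]):
        try:
            value = ast.literal_eval(raw_value)  # e.g. "1e-3" becomes a float
        except (ValueError, SyntaxError):
            value = raw_value  # keep plain strings as-is
        overrides[flag.lstrip("-")] = value
    return known, overrides


known, overrides = parse_overrides(
    ["--policy", "ppo", "--experiment", "mini_atari_3_tasks_3_cycles",
     "--learning_rate", "1e-3"]
)
print(known.policy, overrides)
```

In this sketch, anything that is not --policy or --experiment is collected into a dictionary that could then be handed to the policy's config object.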

Run an Experiment (Configuration File)

There is another way experiments can be run: in "config-file" mode instead of "command-line".

Configuration files are an easy way to keep track of large numbers of experiments, and they enable resuming an experiment from where it left off.

A configuration file contains JSON representing a list of dictionaries, where each dictionary is a single experiment's configuration. The parameters in the dictionary are all exactly the same as those used by the command line (without --). In other words, they are the config settings found in the policy's config.py file. Example config files can be found in configs/.
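
As a rough illustration of that structure, the snippet below writes a two-experiment config file and reads it back. It assumes that policy and experiment are given as keys in the same way as the command-line flags; the temporary output path is only for the example, and the files in configs/ remain the authoritative reference.

```python
import json
import os
import tempfile

# Each dictionary is one experiment; keys mirror the command-line flags without "--".
experiments = [
    {"policy": "ppo", "experiment": "mini_atari_3_tasks_3_cycles"},
    {"policy": "ppo", "experiment": "mini_atari_3_tasks_3_cycles", "learning_rate": 1e-3},
]

# Hypothetical location; any path ending in .json would do.
config_path = os.path.join(tempfile.mkdtemp(), "some_config_file.json")
with open(config_path, "w") as f:
    json.dump(experiments, f, indent=2)

# Round-trip to confirm the file holds a list of dictionaries.
with open(config_path) as f:
    loaded = json.load(f)
print(len(loaded), loaded[1]["learning_rate"])
```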

When you run the code with:

python main.py --config-file <path_to_file/some_config_file.json> [--output-dir tmp]

a new folder named "some_config_file" will be created in output-dir (tmp if not otherwise specified).

Each experiment in some_config_file.json will be executed sequentially, creating subfolders "0", "1", "2", etc. under output_dir/some_config_file. The subfolder number corresponds to the index of the experiment in the config file's list. Each time the command above is run, it will find the first experiment not yet started by finding the first missing numbered subfolder in output-dir. Thus you can safely run the same command on multiple machines (if they share a filesystem) or multiple sessions on the same machine, and each will be executing different experiments in your queue.
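
The "first missing numbered subfolder" rule can be sketched as follows. This is an illustration only: the function name claim_next_experiment is hypothetical, and using os.mkdir to claim an index is an assumption for the example rather than a description of how the repository implements it.

```python
import os
import tempfile


def claim_next_experiment(output_dir, num_experiments):
    """Return the index of the first experiment with no subfolder yet, or None."""
    for index in range(num_experiments):
        run_dir = os.path.join(output_dir, str(index))
        try:
            # os.mkdir raises FileExistsError if the folder already exists,
            # so an index that is already taken is simply skipped.
            os.mkdir(run_dir)
        except FileExistsError:
            continue
        return index
    return None


root = tempfile.mkdtemp()
first = claim_next_experiment(root, num_experiments=3)
second = claim_next_experiment(root, num_experiments=3)
print(first, second)
```

Calling the function repeatedly against the same folder hands out indices 0, 1, 2 in order and then None, which mirrors how repeated invocations of the same command work through the queue.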

If you wish to resume an experiment from where it left off, you can add the argument:

--resume-id n

and it will resume the experiment corresponding to subfolder n. (This can also be used to start an experiment by its run id even if it hasn't been run yet, i.e. skipping forward in the config file's list.)

Environment Variables

Useful environment variables:

  1. OpenMP thread limit (required for IMPALA-based policies)

    OMP_NUM_THREADS=1
    
  2. Which CUDA devices are visible to the code. In this example GPUs 0 and 1.

    CUDA_VISIBLE_DEVICES=0,1
    
  3. Display Python log messages immediately to the terminal.

    PYTHONUNBUFFERED=1
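
If you launch runs from a Python script rather than a shell, the same variables can be passed through the child process environment. The sketch below is one possible approach; build_env is a hypothetical helper, and the child command is a stand-in that only prints one variable instead of running main.py.

```python
import os
import subprocess
import sys


def build_env(gpu_ids):
    """Copy the current environment and add the three variables described above."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = "1"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    env["PYTHONUNBUFFERED"] = "1"
    return env


env = build_env([0, 1])
# Stand-in child process; a real launcher would run main.py here instead.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```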
    

Custom Code

High Level Code Structure

Experiments

An experiment is a list of tasks, executed sequentially. Each task represents the training of an agent on a single environment. The default set of experiments can be seen in experiment_specs.py.

Conceptually, experiments and tasks contain information that should be consistent between runs of the experiment across different algorithms, to maintain a consistent setting for a baseline.

Each task has a type (i.e. subclasses TaskBase) based on what type of preprocessing the observation may require. For instance, ImageTasks will resize your image to the specified size, and permute the channels to match PyTorch's requirements. Only the most basic pre-processing happens here; everything else should be handled by the policy.

Policies

Policies are the core of how an agent operates, and have 3 key functions:

  1. During compute_action(), given an observation, a policy produces an action to take and an instance of TimestepData containing information necessary for the train step.
  2. In get_environment_runner(), Policies specify what type of EnvironmentRunner they should be run with, described further below.
  3. During policy.train() the policy updates its parameters according to the data collected and passed in.

Policy configuration files allow convenient specification of hyperparameters and feature flags, easily settable either via command line (--my_arg) or in a config file.

Conceptually, policies (and policy_configs) contain information that is specific to the algorithm currently being run, and is not expected to be held consistent for experiments using other policies (e.g. clip_eps for PPO).
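
The three responsibilities can be illustrated with a toy stand-in. This is a sketch under loud assumptions: ToyPolicy and ToyTimestepData are hypothetical classes, the method signatures are simplified, and none of this is the actual PolicyBase interface (see policy_base.py for that).

```python
import random
from dataclasses import dataclass


@dataclass
class ToyTimestepData:
    """One timestep of data; reward and done are filled in by the runner."""
    observation: float
    action: int
    reward: float = 0.0
    done: bool = False


class ToyPolicy:
    """Illustrative only: method signatures are simplified, not PolicyBase's."""

    def __init__(self, num_actions, seed=0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self.num_updates = 0

    def get_environment_runner(self):
        # A real policy would return an EnvironmentRunner instance here.
        return "sync"

    def compute_action(self, observation):
        # Random action plus the per-timestep record needed by train().
        action = self._rng.randrange(self._num_actions)
        return action, ToyTimestepData(observation=observation, action=action)

    def train(self, collected_data):
        # Placeholder update: count how many timesteps were trained on.
        self.num_updates += len(collected_data)


policy = ToyPolicy(num_actions=4)
action, timestep = policy.compute_action(observation=0.5)
timestep.reward, timestep.done = 1.0, False  # normally set by the runner
policy.train([timestep])
print(action, policy.num_updates)
```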

Environment Runners

EnvironmentRunners specify how the environment should be called; they contain the core loop (observation to action to next observation). The available EnvironmentRunners are:

  1. EnvironmentRunnerSync: Runs environments individually, synchronously.
  2. EnvironmentRunnerBatch: Passes a batch of observations to policy.compute_action, and runs the environments in parallel.
  3. EnvironmentRunnerFullParallel: Spins up a specified number of processes and runs the environment for n steps (or until the end of the episode) separately on each.

More detail about what each Runner provides to the policy is given in the collect_data method of each Runner.
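
To make the core loop concrete, here is a minimal sketch of a synchronous collection loop on a toy environment. CountdownEnv and collect_data_sync are hypothetical names invented for this example; the real runners operate on the repository's tasks and return richer data.

```python
class CountdownEnv:
    """Toy environment: the observation counts down to zero, then the episode ends."""

    def __init__(self, start=3):
        self._start = start
        self._state = start

    def reset(self):
        self._state = self._start
        return self._state

    def step(self, action):
        self._state -= 1
        reward = 1.0 if action == 0 else 0.0
        done = self._state == 0
        return self._state, reward, done


def collect_data_sync(env, compute_action, max_steps):
    """Run one environment step by step and record (obs, action, reward, done)."""
    timesteps = []
    observation = env.reset()
    for _ in range(max_steps):
        action = compute_action(observation)
        next_observation, reward, done = env.step(action)
        timesteps.append((observation, action, reward, done))
        # Start a new episode when the current one finishes.
        observation = env.reset() if done else next_observation
    return timesteps


data = collect_data_sync(CountdownEnv(start=3), lambda obs: 0, max_steps=4)
print(data)
```

A batched or fully parallel runner follows the same observation, action, next-observation cycle, but over several environments at once.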

Creating a new Policy

  1. Duplicate the prototype folder in policies/, renaming it to something distinctive to your new policy.

  2. Rename all other instances of the word "prototype" in filenames, class names, and imports in your new directory.

  3. Your X_policy_config.py file contains all configurations that will automatically be accepted as command line arguments or config file parameters, provided you follow the pattern provided (add a new instance variable, and an entry in _load_dict_internal that populates the variable from a provided dictionary, or utilize _auto_load_class_parameters).

  4. Your X_policy.py file contains the meat of your implementation. What each method requires is described fully in policy_base.py

  5. Your X_timestep_data.py file contains any data you want stored from your compute_action, to be passed to your train step. This object contains one timestep's worth of data, and its .reward and .done will be populated by the environment_runner you select in your X_policy.py file.

    Note: if not using the FullParallel runner, compute_action can instead save data off into a more efficient structure. See: ppo_policy.py

  6. Create unit tests in tests/policies/X_policy (highly encouraged)

  7. Add a new entry to available_policies.py
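
The config pattern from step 3 (an instance variable plus an entry in _load_dict_internal) might look roughly like the sketch below. ToyPolicyConfig is a hypothetical standalone class: the default values, the load_from_dict wrapper, and the error on unknown keys are assumptions for illustration, so follow the prototype folder for the real base class and method names.

```python
class ToyPolicyConfig:
    """Illustrative config; the real base class and method names may differ."""

    def __init__(self):
        # Each instance variable is a setting with its default value.
        self.learning_rate = 3e-4
        self.clip_eps = 0.2

    def _load_dict_internal(self, config_dict):
        # Pop each known key so that leftovers can be reported as unknown.
        self.learning_rate = config_dict.pop("learning_rate", self.learning_rate)
        self.clip_eps = config_dict.pop("clip_eps", self.clip_eps)
        return self

    def load_from_dict(self, config_dict):
        remaining = dict(config_dict)  # avoid mutating the caller's dict
        self._load_dict_internal(remaining)
        if remaining:
            raise ValueError(f"Unknown config keys: {sorted(remaining)}")
        return self


config = ToyPolicyConfig().load_from_dict({"learning_rate": 1e-3})
print(config.learning_rate, config.clip_eps)
```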

Create a new Environment Runner

If your policy requires a custom environment collection loop, you may consider subclassing EnvironmentRunnerBase. An example of doing this can be seen in IMPALA.

Create a new Experiment

Each entry in the experiments dictionary in experiment_specs.py contains a lambda that, when called, returns an instance of Experiment. The only current requirements for the list of tasks are that the observation space is the same for all tasks, and that the action space is discrete. How different sizes of action space are handled is up to the policy.
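
The dictionary-of-lambdas pattern can be sketched with toy classes. ToyTask, ToyExperiment, the environment names, and the experiment key are all hypothetical; the real Experiment and task constructors take different arguments, so use the existing entries in experiment_specs.py as the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTask:
    env_name: str
    num_actions: int          # action space must be discrete
    observation_shape: tuple  # must match across tasks in one experiment


@dataclass
class ToyExperiment:
    tasks: List[ToyTask]
    cycles: int = 1

    def __post_init__(self):
        shapes = {task.observation_shape for task in self.tasks}
        if len(shapes) > 1:
            raise ValueError("All tasks must share one observation space")


# Lambdas defer construction until an experiment is actually requested.
toy_experiments = {
    "toy_2_tasks_3_cycles": lambda: ToyExperiment(
        tasks=[ToyTask("EnvA", 4, (4, 84, 84)), ToyTask("EnvB", 6, (4, 84, 84))],
        cycles=3,
    ),
}

experiment = toy_experiments["toy_2_tasks_3_cycles"]()
print(len(experiment.tasks), experiment.cycles)
```

Note that the two toy tasks have different numbers of actions but the same observation shape, matching the requirements stated above.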

Create a new Task

If you want to create an experiment with tasks that cannot be handled with the existing TaskBase subclasses, implement a subclass of TaskBase according to the methods defined therein, using the existing tasks as a guide.

Create unit tests in tests/experiments/tasks/X_task.py

Contributing to the repository

Yes, please do! Pull requests encouraged. Please run pytest on the repository before submitting.

If there's anything you want to customize that does not seem possible, or seems overly challenging, feel free to file an issue in the issue tracker and I'll look into it as soon as possible.

This repository uses pytest for tests. Please write tests wherever possible, in the appropriate folder in tests/.

Comments
  • Best place to add code that will act as "callbacks"?

    I'm interested in extending continual_rl by implementing some of my group's own continual RL algorithm ideas. However, there have been some incompatibilities from what I can see so far. I was wondering if anyone had any advice on some of these issues. In particular, I'm interested in running code related to the policy once a task completes (i.e., code that would normally be put inside the policy class). As I mentioned, from what I can tell, there doesn't seem to be a great way of doing this, so I figured I'd ask a few questions before proceeding.

    First, are there any recommended ways of doing this from the policy side without editing any source code? Meaning, solely from my own custom policy class which inherits from PolicyBase.

    Second, would editing the experiment and policy base classes be a reasonable method for approaching this? Essentially, I'd add "callbacks" to the experiment loop that call methods from the policy before a cycle, after a cycle, before a task, and after a task. This is currently the approach I'm leaning towards.

    If you have any other suggestions or questions feel free to let me know.

    opened by Bpoole908 7
  • Cannot run the impala-based policy

    Hi, thanks for sharing your code.

    I am trying to run an IMPALA-based policy in configuration-file mode, with configs/procgen/impala_procgen.json

    The command that I used: python main.py --config-file configs/procgen/impala_procgen.json

    However, every mean_episode_return comes out as nan, as follows (the setup output is omitted):

    Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    [INFO:4981 monobeast:782 2021-11-05 16:09:24,182] Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    [INFO:4981 monobeast:782 2021-11-05 16:09:29,189] Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    [INFO:4981 monobeast:782 2021-11-05 16:09:34,195] Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    [INFO:4981 monobeast:782 2021-11-05 16:09:39,202] Steps 0 @ 0.0 SPS. Mean return nan. Stats:
    {'mean_episode_return': nan}
    

    I found that while interacting with the environment, the act() function stalls at this point: https://github.com/AGI-Labs/continual_rl/blob/bcf17d879e8a983340be233ff8f740c424d0f303/continual_rl/policies/impala/torchbeast/monobeast.py#L226. Checking forward() in the ImpalaNet class, I saw that it stalls at x = self._conv_net(x): https://github.com/AGI-Labs/continual_rl/blob/bcf17d879e8a983340be233ff8f740c424d0f303/continual_rl/policies/impala/nets.py#L55. When I replaced that line with x = torch.rand(x.shape[0], 512), the code ran successfully.

    I guess the problem may be caused by the shared memory of the actor's model? Can you give some hints to fix that?

    I have set up 1 actor and 1 learner. Here is some system information:

    • OS: Ubuntu 20.04
    • Python: 3.7.11 with conda
    • GPU: RTX 3080
    • RAM: 128 GB
    • torch: 1.1.1+cu11.0
    • Docker: No
    • Branch used: develop

    Update: In the ConvNet84x84 class, I manually set intermediate_dim=1024 (for Procgen) and commented out this line: https://github.com/AGI-Labs/continual_rl/blob/bcf17d879e8a983340be233ff8f740c424d0f303/continual_rl/utils/common_nets.py#L62, so the code is runnable now. Maybe forwarding the model with a dummy input before calling init() causes the problem?

    opened by tunglm2203 2
  • Clear replay buffer size

    I just wanted to check that the replay buffer size for Clear (denoted "replay_buffer_frames" in the config files) is based on the number of transitions and not the number of unrolled trajectories?

    Thanks in advance!

    opened by skezle 1
  • NameNotFound error

    Hi,

    I ran the following code: python main.py --policy ppo --experiment procgen_6_tasks_5_cycles

    and I got this error: gym.error.NameNotFound: Environment procgen:procgen-climber doesn't exist.

    Any idea?

    opened by raymondchua 1
  • Enabling common configs (in this case, save frequency) for a policy, …

    …which can be specified via config file or command line as normal. Also making it so task params can be passed in (though presently they're just left as the defaults). Also gym 0.26 is not backwards compatible, so enforcing an older version for now, since updating the codebase is less trivial. Also fixed one of the deprecation issues.

    opened by SamNPowers 0
  • Minihack benchmark with impala doesn't finish running

    I am trying to run the minihack benchmarks with impala. But after 35M steps the job dies and a lot of RAM is taken up by python processes. Impala doesn't use experience replay like clear to prevent forgetting, so I am not sure what is taking up all the RAM. I get the following errors from the log files after running:

    OMP_NUM_THREADS=1 python main.py --config-file configs/minihack/impala_minihack.json --resume-id=0.

    I get the following error in impala_logs.log:

    Screen Shot 2022-05-20 at 11 53 06 AM

    I get the following error in core_process.log:

    Screen Shot 2022-05-18 at 11 01 04 AM

    Any idea what this could be?

    opened by skezle 16
Releases(v1.0.0-alpha)
  • v1.0.0-alpha(Jul 27, 2021)

    The initial framework for developing policies and experiments for the continual reinforcement learning setting, where tasks are learned sequentially rather than concurrently. Initial policies include PPO, Impala, CLEAR, Progress & Compress, and Elastic Weight Consolidation.
