Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Overview

PyTorch RL Minimal Implementations

There are implementations of some reinforcement learning algorithms, whose characteristics are as follow:

  1. Less packages-based: Only PyTorch and Gym, for building neural networks and testing algorithms' performance respectively, are necessary to install.
  2. Independent implementation: All RL algorithms are implemented in separate files, which facilitates to understand their processes and modify them to adapt to other tasks.
  3. Various expansion configurations: It's convenient to configure various parameters and tools, such as reward normalization, advantage normalization, tensorboard, tqdm and so on.

RL Algorithms List

Name Type Estimator Paper File
Q-Learning Value-based / Off policy TD Watkins et al. Q-Learning. Machine Learning, 1992 q_learning.py
REINFORCE Policy-based On policy MC Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000. reinforce.py
DQN Value-based / Off policy TD Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015. doing
A2C Actor-Critic / On policy n-step TD Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016. a2c.py
A3C Actor-Critic / On policy n-step TD .Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016 a3c.py
ACER Actor-Critic / On policy GAE Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017. doing
ACKTR Actor-Critic / On policy GAE Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017. doing
PPO Actor-Critic / On policy GAE Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017. ppo.py

Quick Start

Requirements

pytorch
gym

tensorboard  # for summary writer
tqdm         # for process bar

Abstract Agent

Components / Parameters

Component Description
policy neural network model
gamma discount factor of cumulative reward
lr learning rate. i.e. lr_actor, lr_critic
lr_decay weight decay to schedule the learning rate
lr_scheduler scheduler for the learning rate
coef_critic_loss coefficient of critic loss
coef_entropy_loss coefficient of entropy loss
writer summary writer to record information
buffer replay buffer to store historical trajectories
use_cuda use GPU
clip_grad gradients clipping
max_grad_norm maximum norm of gradients clipped
norm_advantage advantage normalization
open_tb open summary writer
open_tqdm open process bar

Methods

Methods Description
preprocess_obs() preprocess observation before input into the neural network
select_action() use actor network to select an action based on the policy distribution.
estimate_obs() use critic network to estimate the value of observation
update() update the parameter by calculate losses and gradients
train() set the neural network to train mode
eval() set the neural network to evaluate mode
save() save the model parameters
load() load the model parameters

Update & To-do & Limitations

Update History

  • 2021-12-09 ADD TRICK:norm_critic_loss in PPO
  • 2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
  • 2021-12-07 ADD ALGO: A3C
  • 2021-12-05 ADD ALGO: PPO
  • 2021-11-28 ADD ALGO: A2C
  • 2021-11-20 ADD ALGO: Q learning, Reinforce

To-do List

  • ADD ALGO DQN, Double DQN, Dueling DQN, DDPG
  • ADD NN RNN Mode

Current Limitations

  • Unsupport Vectorized environments
  • Unsupport Continuous action space
  • Unsupport RNN-based model
  • Unsupport Imatation learning

Reference & Acknowledgements

Owner
Gemini Light
Gemini Light
Retrieval.pytorch - The code we used in [2020 DIGIX]

Retrieval.pytorch - The code we used in [2020 DIGIX]

Guo-Hua Wang 2 Feb 07, 2022
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
Classifies galaxy morphology with Bayesian CNN

Zoobot Zoobot classifies galaxy morphology with deep learning. This code will let you: Reproduce and improve the Galaxy Zoo DECaLS automated classific

Mike Walmsley 39 Dec 20, 2022
Implementation of our recent paper, WOOD: Wasserstein-based Out-of-Distribution Detection.

WOOD Implementation of our recent paper, WOOD: Wasserstein-based Out-of-Distribution Detection. Abstract The training and test data for deep-neural-ne

8 Dec 24, 2022
[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

TransFusion-Pose TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei

Haoyu Ma 29 Dec 23, 2022
[CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo Lukas Koestler1*    Nan Yang1,2*,†    Niclas Zeller2,3    Daniel Cremers1

TUM Computer Vision Group 744 Jan 04, 2023
Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Introdunction This is the official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification". Abstract This pa

Shilong Liu 274 Dec 28, 2022
A Lightweight Experiment & Resource Monitoring Tool 📺

Lightweight Experiment & Resource Monitoring 📺 "Did I already run this experiment before? How many resources are currently available on my cluster?"

170 Dec 28, 2022
DeepStruc is a Conditional Variational Autoencoder which can predict the mono-metallic nanoparticle from a Pair Distribution Function.

ChemRxiv | [Paper] XXX DeepStruc Welcome to DeepStruc, a Deep Generative Model (DGM) that learns the relation between PDF and atomic structure and the

Emil Thyge Skaaning Kjær 13 Aug 01, 2022
Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology

Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology Sharon Zhou, Eric Zelikman

Stanford Machine Learning Group 34 Nov 16, 2022
A GUI for Face Recognition, based upon Docker, Tkinter, GPU and a camera device.

Face Recognition GUI This repository is a GUI version of Face Recognition by Adam Geitgey, where e.g. Docker and Tkinter are utilized. All the materia

Kasper Henriksen 6 Dec 05, 2022
Codes and Data Processing Files for our paper.

Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE

Chaoqi Yang 18 Dec 12, 2022
Repository for tackling Kaggle Ultrasound Nerve Segmentation challenge using Torchnet.

Ultrasound Nerve Segmentation Challenge using Torchnet This repository acts as a starting point for someone who wants to start with the kaggle ultraso

Qure.ai 46 Jul 18, 2022
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

24 Dec 26, 2022
Model parallel transformers in Jax and Haiku

Mesh Transformer Jax A haiku library using the new(ly documented) xmap operator in Jax for model parallelism of transformers. See enwik8_example.py fo

Ben Wang 4.8k Jan 01, 2023
Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023
Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation

Unseen Object Clustering: Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation Introduction In this work, we propose a new method

NVIDIA Research Projects 132 Dec 13, 2022
Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Gradient Cache Gradient Cache is a simple technique for unlimitedly scaling contrastive learning batch far beyond GPU memory constraint. This means tr

Luyu Gao 198 Dec 29, 2022
AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

Multimedia Research 214 Jan 03, 2023
A project which aims to protect your privacy using inexpensive hardware and easily modifiable software

Protecting your privacy using an ESP32, an IR sensor and a python script This project, which I personally call the "never-gonna-catch-me-in-the-act-ev

8 Oct 10, 2022