An efficient framework for reinforcement learning.

Overview

rl: An efficient framework for reinforcement learning

Python

Requirements

name version
Python >=3.7
numpy >=1.19
torch >=1.7
tensorboard >=2.5
tensorboardX >=2.4
gym >=0.18.3

Make sure your Python environment is activated before installing following requirements.
pip install -U gym tensorboard tensorboardx

Introduction

Quick Start

CartPole-v0:
python demo.py
Enter the following commands in terminal to start training Pendulum-v0:
python demo.py --env_name Pendulum-v0 --target_reward -250.0
Use Recurrent Neural Network:
python demo.py --env_name Pendulum-v0 --target_reward -250.0 --use_rnn --log_dir Pendulum-v0_RNN
Open a new terminal:
tensorboard --logdir=result
Then you can access the training information by visiting http://localhost:6006/ in browser.

Structure

Proximal Policy Optimization

PPO is an on-policy and model-free reinforcement learning algorithm.

Components

  • Generalized Advantage Estimation (GAE)
  • Gate Recurrent Unit (GRU)

Hyperparameters

hyperparameter note value
env_num number of parallel processes 16
chunk_len BPTT for GRU 10
eps clipping parameter 0.2
gamma discount factor 0.99
gae_lambda trade-off between TD and MC 0.95
entropy_coef coefficient of entropy 0.05
ppo_epoch data usage 5
adv_norm normalized advantage 1 (True)
max_norm gradient clipping (L2) 20.0
weight_decay weight decay (L2) 1e-6
lr_actor learning rate of actor network 1e-3
lr_critic learning rate of critic network 1e-3

Test Environment

A simple test environment for verifying the effectiveness of this algorithm (of course, the algorithm can also be implemented by yourself).
Simple logic with less code.

Mechanism

The environment chooses one number randomly in every step, and returns the one-hot matrix.
If the action taken matches the number chosen in the last 3 steps, you will get a complete reward of 1.

>>> from env.test_env import TestEnv
>>> env = TestEnv()
>>> env.seed(0)
>>> env.reset()
array([1., 0., 0.], dtype=float32)
>>> env.step(9 * 0 + 3 * 0 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 1 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 0.0, False, {'str': 'Completely wrong.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 0., 1.], dtype=float32), 0.6666666666666666, False, {'str': 'Partially correct.'})
>>> env.step(9 * 2 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 0.3333333333333333, False, {'str': 'Partially correct.'})
>>> env.step(9 * 0 + 3 * 2 + 1 * 1)
(array([0., 0., 1.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>>

Convergence Reward

  • General RL algorithms will achieve an average reward of 55.5.
  • Because of the state memory unit, RNN based RL algorithms can reach the goal of 100.0.

2021, ICCD Lab, Dalian University of Technology. Author: Jingcheng Jiang.

Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad to your characters in Modo.

Applicator Kit for Modo Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad with a TrueDepth camera to

Andrew Buttigieg 3 Aug 24, 2021
A nutritional label for food for thought.

Lexiscore As a first effort in tackling the theme of information overload in content consumption, I've been working on the lexiscore: a nutritional la

Paul Bricman 34 Nov 08, 2022
Constraint-based geometry sketcher for blender

Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like tangent, distance,

1.7k Dec 31, 2022
Codebase for arXiv preprint "NeRF++: Analyzing and Improving Neural Radiance Fields"

NeRF++ Codebase for arXiv preprint "NeRF++: Analyzing and Improving Neural Radiance Fields" Work with 360 capture of large-scale unbounded scenes. Sup

Kai Zhang 722 Dec 28, 2022
Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech This repository is the official implementation of "Meta-TTS: Meta-Learning for Few

Sung-Feng Huang 128 Dec 25, 2022
This repo includes the supplementary of our paper "CEMENT: Incomplete Multi-View Weak-Label Learning with Long-Tailed Labels"

Supplementary Materials for CEMENT: Incomplete Multi-View Weak-Label Learning with Long-Tailed Labels This repository includes all supplementary mater

Zhiwei Li 0 Jan 05, 2022
Weakly-supervised semantic image segmentation with CNNs using point supervision

Code for our ECCV paper What's the Point: Semantic Segmentation with Point Supervision. Summary This library is a custom build of Caffe for semantic i

27 Sep 14, 2022
FcaNet: Frequency Channel Attention Networks

FcaNet: Frequency Channel Attention Networks PyTorch implementation of the paper "FcaNet: Frequency Channel Attention Networks". Simplest usage Models

327 Dec 27, 2022
The official code repository for examples in the O'Reilly book 'Generative Deep Learning'

Generative Deep Learning Teaching Machines to paint, write, compose and play The official code repository for examples in the O'Reilly book 'Generativ

David Foster 1.3k Dec 29, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI G

Robin Henry 99 Dec 12, 2022
Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). With use of Metric Learning and FAISS for fast similarity search on GPU

Cross-modal Retrieval using Transformer Encoder Reasoning Networks This project reimplements the idea from "Transformer Reasoning Network for Image-Te

Minh-Khoi Pham 5 Nov 05, 2022
This is an early in-development version of training CLIP models with hivemind.

A transformer that does not hog your GPU memory This is an early in-development codebase: if you want a stable and documented hivemind codebase, look

<a href=[email protected]"> 4 Nov 06, 2022
SeqAttack: a framework for adversarial attacks on token classification models

A framework for adversarial attacks against token classification models

Walter 23 Nov 25, 2022
Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CCT)

Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CCT) Paper, Project Page This repo contains the official implementation of CVPR

Yassine 344 Dec 29, 2022
Project for tracking occupancy in Tel-Aviv parking lots.

Ahuzat Dibuk - Tracking occupancy in Tel-Aviv parking lots main.py This module was set-up to be executed on Google Cloud Platform. I run it every 15 m

Geva Kipper 35 Nov 22, 2022
Attention Probe: Vision Transformer Distillation in the Wild

Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is

Wang jiahao 3 Oct 31, 2022
LSTM Neural Networks for Spectroscopic Studies of Type Ia Supernovae

Package Description The difficulties in acquiring spectroscopic data have been a major challenge for supernova surveys. snlstm is developed to provide

7 Oct 11, 2022
Official PyTorch Implementation of Rank & Sort Loss [ICCV2021]

Rank & Sort Loss for Object Detection and Instance Segmentation The official implementation of Rank & Sort Loss. Our implementation is based on mmdete

Kemal Oksuz 229 Dec 20, 2022
Comp445 project - Data Communications & Computer Networks

COMP-445 Data Communications & Computer Networks Change Python version in Conda

Peng Zhao 2 Oct 03, 2022