Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified with different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" as used in the paper (see OurDDPG.py).

The algorithms that TD3 is compared against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI baselines repository.

Results

The code is no longer exactly representative of the code used in the paper: minor adjustments to hyperparameters and other details have been made to improve performance. The learning curves are still the original results from the paper.

The learning curves from the paper are found under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations, with shape (201,), where each evaluation is the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is of the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
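As a rough illustration of the learning-curve format described above, the following sketch maps each of the 201 evaluations to its time step and summarizes a curve while skipping the initial random policy. The helper names (evaluation_timesteps, summarize_curve) and the commented-out file name are hypothetical and not part of the repository; synthetic data stands in for a real curve so the snippet runs on its own.

```python
import numpy as np

EVAL_FREQ = 5000   # time steps between evaluations (stated in the text above)
NUM_EVALS = 201    # evaluations per curve, including the initial random policy


def evaluation_timesteps(num_evals=NUM_EVALS, eval_freq=EVAL_FREQ):
    """Time step of each evaluation: 0, 5000, ..., 1,000,000."""
    return np.arange(num_evals) * eval_freq


def summarize_curve(curve):
    """Return (final, best) evaluation, excluding the initial random policy."""
    curve = np.asarray(curve, dtype=float)
    if curve.shape != (NUM_EVALS,):
        raise ValueError(f"expected shape ({NUM_EVALS},), got {curve.shape}")
    trained = curve[1:]  # drop the randomly initialized policy (index 0)
    return float(trained[-1]), float(trained.max())


# Hypothetical file name; substitute an actual file from /learning_curves:
# curve = np.load("learning_curves/EXAMPLE.npy")
curve = np.linspace(0.0, 1000.0, NUM_EVALS)  # synthetic stand-in data

steps = evaluation_timesteps()
final_reward, best_reward = summarize_curve(curve)
print(steps[0], steps[-1], final_reward, best_reward)
```

Plotting steps against a loaded curve should give the same x-axis as the paper's figures, assuming the files follow the layout described here.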

Numerical results can be found in the paper or derived from the learning curves. A video of the learned agent can be found here.

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}

Author's PyTorch implementation of TD3 for OpenAI gym tasks

Owner

Scott Fujimoto
