Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains.

Last update: Jan 07, 2023

This repository is no longer maintained. Please use our new Softlearning package instead.

Soft Actor-Critic

Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor presented at ICML 2018.
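For intuition, a maximum entropy objective adds the policy's entropy, weighted by a temperature, to the reward at every step. The snippet below is a minimal illustrative sketch in plain Python, not code from this repository. The function names, the temperature `alpha`, the discount factor, and the toy numbers are all assumptions made for the example.

```python
import math


def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian with standard deviation `std`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def soft_return(rewards, entropies, alpha=0.2, discount=0.99):
    """Discounted sum of (reward + alpha * entropy) along one trajectory.

    `alpha` (temperature) and `discount` are illustrative values only.
    """
    total = 0.0
    for t, (reward, entropy) in enumerate(zip(rewards, entropies)):
        total += (discount ** t) * (reward + alpha * entropy)
    return total


# Same rewards, two hypothetical policies: one with wide action noise, one narrow.
rewards = [1.0, 0.5, 0.0]
wide = [gaussian_entropy(1.0)] * len(rewards)
narrow = [gaussian_entropy(0.1)] * len(rewards)

# With equal rewards, the higher-entropy policy has the larger soft return.
print(soft_return(rewards, wide) > soft_return(rewards, narrow))
```

With equal rewards, the wider policy scores higher under this objective, which is the sense in which the framework favors stochastic policies.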

This implementation uses Tensorflow. For a PyTorch implementation of soft actor-critic, take a look at rlkit by Vitchyr Pong.

See the DIAYN documentation for using SAC for learning diverse skills.

Getting Started

Soft Actor-Critic can be run either locally or through Docker.

Prerequisites

You will need to have Docker and Docker Compose installed unless you want to run the environment locally.

Most of the models require a Mujoco license.

Docker installation

If you want to run the Mujoco environments, the docker environment needs to know where to find your Mujoco license key (mjkey.txt). You can either copy your key into /.mujoco/mjkey.txt, or you can specify the path to the key in your environment variables:

export MUJOCO_LICENSE_PATH=<...>/mjkey.txt
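Before starting the container, it can help to confirm that the variable actually points at a file. The helper below is a hypothetical sketch, not part of this repository; only the variable name MUJOCO_LICENSE_PATH comes from the text above, and the function name and messages are assumptions.

```python
import os


def find_license_key(env=None, var="MUJOCO_LICENSE_PATH"):
    """Return the expanded key path from `env[var]` if it is an existing file, else None.

    Hypothetical helper: `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    path = env.get(var)
    if not path:
        return None
    path = os.path.expanduser(path)
    return path if os.path.isfile(path) else None


key = find_license_key()
if key:
    print("license key found")
else:
    print("MUJOCO_LICENSE_PATH is unset or does not point to a file")
```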

Once that's done, you can run the Docker container with

docker-compose up

Docker compose creates a Docker container named soft-actor-critic and automatically sets the needed environment variables and volumes.

You can access the container with the usual docker exec command:

docker exec -it soft-actor-critic bash

See the Examples section for how to train and simulate the agents.

To clean up the setup:

docker-compose down

Local installation

To get the environment installed correctly, you will first need to clone rllab, and have its path added to your PYTHONPATH environment variable.

Clone rllab

cd <...>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
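To check that the export took effect, something like the following sketch can be run from the same shell. It uses only the Python standard library; the helper name `on_pythonpath` is an assumption made for illustration and is not provided by rllab or this repository.

```python
import importlib.util
import os


def on_pythonpath(directory, pythonpath):
    """Return True if `directory` is one of the entries of a PYTHONPATH-style string."""
    entries = [os.path.abspath(p) for p in pythonpath.split(os.pathsep) if p]
    return os.path.abspath(directory) in entries


# find_spec returns None when a top-level module cannot be located.
print("rllab importable:", importlib.util.find_spec("rllab") is not None)
print("cwd on PYTHONPATH:", on_pythonpath(os.getcwd(), os.environ.get("PYTHONPATH", "")))
```

Run it from inside the cloned rllab directory; both lines should report True if the steps above succeeded.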

Download the Mujoco files and copy them to the rllab path. If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip and copy the .dylib files instead of the .so files.

mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <...>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <...>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <...>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp

Copy your Mujoco license key (mjkey.txt) to rllab path:

cp <...>/mjkey.txt <...>/rllab/vendor/mujoco

Clone sac

cd <...>
git clone https://github.com/haarnoja/sac.git
cd sac

Create and activate conda environment

conda env create -f environment.yml
source activate sac

The environment should be ready to run. See the Examples section for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

source deactivate
conda remove --name sac --all

Examples

Training and simulating an agent

To train the agent

python ./examples/mujoco_all_sac.py --env=swimmer --log_dir="/root/sac/data/swimmer-experiment"

To simulate the agent (NOTE: This step currently fails with the Docker installation, due to missing display.)

python ./scripts/sim_policy.py /root/sac/data/swimmer-experiment/itr_<...>.pkl
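If you are unsure which snapshot to pass to the simulation script, a small helper can pick the most recent one. The sketch below is hypothetical and not part of this repository. It assumes that snapshots are named itr_ followed by an integer iteration number and the .pkl extension, which is an assumption based on the path shown above.

```python
import os
import re


def latest_snapshot(log_dir):
    """Return the path of the highest-numbered itr_<N>.pkl file in `log_dir`, or None.

    Assumes integer iteration numbers in the file names (illustrative assumption).
    """
    pattern = re.compile(r"^itr_(\d+)\.pkl$")
    best = None
    for name in os.listdir(log_dir):
        match = pattern.match(name)
        if match is None:
            continue
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, os.path.join(log_dir, name))
    return None if best is None else best[1]
```

For example, `latest_snapshot("/root/sac/data/swimmer-experiment")` would return the snapshot with the largest iteration number, comparing numerically rather than alphabetically.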

mujoco_all_sac.py contains several different environments, and more example scripts are available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag. For example:

python ./examples/mujoco_all_sac.py --help
usage: mujoco_all_sac.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE]
                         [--log_dir LOG_DIR]
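A usage message of this shape is what Python's standard argparse module produces. The sketch below shows a parser that would accept the same flags. It is an illustration and not the script's actual source: the default values and the `build_parser` name are assumptions, and only the flag names and environment choices are taken from the help output above.

```python
import argparse

# Environment names as listed in the --help output above.
ENVIRONMENTS = ["ant", "walker", "swimmer", "half-cheetah", "humanoid", "hopper"]


def build_parser():
    """Build a parser with the same flags as the usage message (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="mujoco_all_sac.py")
    parser.add_argument("--env", choices=ENVIRONMENTS, default="swimmer")
    parser.add_argument("--exp_name", type=str, default=None)
    parser.add_argument("--mode", type=str, default="local")
    parser.add_argument("--log_dir", type=str, default=None)
    return parser


# Parse the same arguments used in the training command above.
args = build_parser().parse_args(
    ["--env=swimmer", "--log_dir=/root/sac/data/swimmer-experiment"]
)
print(args.env, args.log_dir)
```

Because `--env` is declared with `choices`, an unsupported environment name is rejected at parse time with an error listing the valid options.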

Benchmark Results

Benchmark results for some of the OpenAI Gym v2 environments can be found here.

Credits

The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and with streamlining the installation process. The work was supported by Berkeley Deep Drive.

Reference

@article{haarnoja2017soft,
  title={Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor},
  author={Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey},
  booktitle={Deep Reinforcement Learning Symposium},
  year={2017}
}


