This is the official implementation of Multi-Agent PPO.



Chao Yu*, Akash Velu*, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu.


This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games". This repository is heavily based on

Environments supported:

1. Usage

All core code is located within the onpolicy folder. The algorithms/ subfolder contains algorithm-specific code for MAPPO.

  • The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.

  • Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.

  • Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named in the following manner: Within each file, the map name (in the case of SMAC and the MPEs) can be altered.

  • Python training scripts for each environment can be found in the scripts/train/ folder.

  • The file contains relevant hyperparameter and env settings. Most hyperparameters default to the values used in the paper; please refer to the paper's appendix for the full list of hyperparameters used.

2. Installation

Here we give an example installation for CUDA 10.1. For CPU-only installs or other CUDA versions, please refer to the PyTorch website.

# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f
# install on-policy package
cd on-policy
pip install -e .
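
After installing, a quick sanity check can confirm that the PyTorch build matches the CUDA version targeted above. The sketch below is illustrative and not part of this repository: the helper name `describe_torch_install` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib
import importlib.util


def describe_torch_install():
    """Return basic facts about the local PyTorch install, or None if absent.

    Hypothetical helper for illustration; not part of the onpolicy package.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    torch = importlib.import_module("torch")
    return {
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        # Whether a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `built_for_cuda` is set but `cuda_available` is false, the wheel and the local driver probably disagree, and the PyTorch website lists the matching install command.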

Although we provide requirement.txt, it may list redundant packages. We recommend installing the remaining dependencies incrementally: run the code and install whichever required package is reported as missing.
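
The "run it and see what is missing" approach can be shortened by checking importability up front. This is a minimal sketch, not a script shipped with the repository: `find_missing_modules` is a hypothetical helper, and the candidate list is an assumption drawn from the packages mentioned in this README. Note that an import name does not always match the pip package name.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when no importable module has this name
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed, illustrative list; adjust it to the environments you plan to run.
    candidates = ["torch", "seaborn", "cffi", "wandb"]
    for name in find_missing_modules(candidates):
        print(f"missing: {name}")
```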

2.1 Install StarCraftII 4.10

# password is iagreetotheeula
# append (>>) so the existing ~/.bashrc is not overwritten
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc
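
Before launching a SMAC run, it can help to confirm that `SC2PATH` is visible to Python and points at a real directory. The helper below is a hypothetical sketch rather than repository code; it checks only the environment variable exported above.

```python
import os


def resolve_sc2_path(environ=None):
    """Return the expanded SC2PATH directory, or raise if it is unset or missing.

    Hypothetical helper for illustration; pass a dict to check a custom environment.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SC2PATH")
    if not raw:
        raise RuntimeError("SC2PATH is not set; export it and reload your shell")
    path = os.path.expanduser(raw)  # turn a leading ~ into the home directory
    if not os.path.isdir(path):
        raise RuntimeError(f"SC2PATH points to a missing directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(resolve_sc2_path())
    except RuntimeError as err:
        print(f"check failed: {err}")
```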

2.2 Hanabi

Environment code for Hanabi is developed from the open-source environment code, but has been slightly modified to fit the algorithms used here.
To install, execute the following:

pip install cffi
cd envs/hanabi
mkdir build && cd build
cmake ..
make -j

2.3 Install MPE

# install this package first
pip install seaborn

There are 3 Cooperative scenarios in MPE:

  • simple_spread
  • simple_speaker_listener, which is the 'Comm' scenario in the paper
  • simple_reference

3. Train


Here we use as an example:

cd onpolicy/scripts
chmod +x ./

Local results are stored in the subfolder scripts/results. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official documentation. Note that the flag is inverted: adding --use_wandb on the command line or in the .sh file switches logging to TensorBoard instead of Weights & Biases.
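
The inverted meaning of `--use_wandb` is what an argparse flag produces when it is declared with `action="store_false"` and a default of `True`. The sketch below assumes that kind of declaration; the parser and the `logging_backend` helper are stand-ins for illustration, not the repository's actual configuration code.

```python
import argparse


def build_parser():
    """Stand-in parser; the flag declaration is an assumption for illustration."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use_wandb",
        action="store_false",  # passing the flag stores False
        default=True,          # omitting the flag keeps Weights & Biases on
        help="pass this flag to turn Weights and Biases logging off",
    )
    return parser


def logging_backend(argv):
    """Return which logger a run would use for the given argument list."""
    args = build_parser().parse_args(argv)
    return "wandb" if args.use_wandb else "tensorboard"


if __name__ == "__main__":
    print(logging_backend([]))
    print(logging_backend(["--use_wandb"]))
```

With this declaration, leaving the flag out keeps the default platform, and passing it selects the fallback.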

We additionally provide ./ for evaluating the Hanabi score over 100k trials.

4. Publication

If you find this repository useful, please cite our paper:

      title={The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games}, 
      author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre Bayen and Yi Wu},