Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Last update: Dec 21, 2022

Overview

This is the code for implementing the MADDPG algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with the local copy of the Multi-Agent Particle Environments (MPE) (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). Our GIF results are shown here (https://sites.google.com/view/epciclr2020/). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.

Installation

Install tensorflow 1.13.1

pip install tensorflow==1.13.1

Install OpenAI gym

pip install gym==0.13.0

Install other dependencies

pip install joblib imageio

Case study: Multi-Agent Particle Environments

We demonstrate here how the code can be used in conjunction with the local copy of the Multi-Agent Particle Environments (MPE) (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). It is based on OpenAI's multiagent-particle-envs (https://github.com/openai/multiagent-particle-envs).

Quick start

See train_grassland_epc.sh, train_adversarial_epc.sh and train_food_collect_epc.sh for running the EPC algorithm on the grassland, adversarial and food_collect scenarios, respectively, in the example settings presented in our paper.

Command-line options

Environment options

--scenario: defines which environment in the MPE is to be used (default: "grassland")
--map-size: size of the environment; the scale is 1 if "normal" and 2 otherwise (default: "normal")
--sight: the agent's visibility radius (default: 100)
--alpha: reward-sharing weight (default: 0.0)
--max-episode-len: maximum length of each episode for the environment (default: 25)
--num-episodes: total number of training episodes (default: 200000)
--num-good: number of good agents in the scenario (default: 2)
--num-adversaries: number of adversaries in the environment (default: 2)
--num-food: number of food items (resources) in the scenario (default: 4)
--good-policy: algorithm used for the 'good' (non-adversary) policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})
--adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})
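
The environment options above can be mirrored in a small argparse parser. This is only a minimal sketch for illustration: the function name build_env_parser and the argument types are assumptions, and the repository's actual parser may be organised differently. Flag names and defaults are taken from the list above.

```python
import argparse


def build_env_parser():
    """Hypothetical parser mirroring the documented environment options."""
    policies = ["att-maddpg", "maddpg", "PC", "mean-field"]
    p = argparse.ArgumentParser(description="Environment options (illustrative sketch)")
    # Argument types below are assumptions; only names and defaults come from the README.
    p.add_argument("--scenario", type=str, default="grassland")
    p.add_argument("--map-size", type=str, default="normal")
    p.add_argument("--sight", type=float, default=100)
    p.add_argument("--alpha", type=float, default=0.0)
    p.add_argument("--max-episode-len", type=int, default=25)
    p.add_argument("--num-episodes", type=int, default=200000)
    p.add_argument("--num-good", type=int, default=2)
    p.add_argument("--num-adversaries", type=int, default=2)
    p.add_argument("--num-food", type=int, default=4)
    p.add_argument("--good-policy", type=str, default="maddpg", choices=policies)
    p.add_argument("--adv-policy", type=str, default="maddpg", choices=policies)
    return p


# Override two options and leave the rest at their documented defaults.
args = build_env_parser().parse_args(["--scenario", "adversarial", "--num-good", "4"])
print(args.scenario, args.num_good, args.good_policy, args.max_episode_len)
```

argparse converts dashes in flag names to underscores, so --max-episode-len is read back as args.max_episode_len.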

Core training parameters

--lr: learning rate (default: 1e-2)
--gamma: discount factor (default: 0.95)
--batch-size: batch size (default: 1024)
--num-units: number of units in the MLP (default: 64)
--good-num-units: number of units in the MLP of good agents; defaults to num-units if not provided
--adv-num-units: number of units in the MLP of adversarial agents; defaults to num-units if not provided
--n_cpu_per_agent: CPU usage per agent (default: 1)
--good-share-weights: good agents share the weights of the agents encoder within the model
--adv-share-weights: adversarial agents share the weights of the agents encoder within the model
--use-gpu: use GPU for training (default: False)
--n-envs: number of environment instances run in parallel
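
The fallback described for --good-num-units and --adv-num-units can be sketched as follows. The helper resolve_num_units is hypothetical and not part of the repository; it only illustrates the documented rule that each role falls back to --num-units when its own value is not given.

```python
from typing import Optional, Tuple


def resolve_num_units(num_units: int = 64,
                      good_num_units: Optional[int] = None,
                      adv_num_units: Optional[int] = None) -> Tuple[int, int]:
    """Return (good, adversary) MLP widths, falling back to num_units."""
    good = num_units if good_num_units is None else good_num_units
    adv = num_units if adv_num_units is None else adv_num_units
    return good, adv


# Only the good agents get a wider MLP; adversaries use the shared default.
print(resolve_num_units(num_units=64, good_num_units=128))
```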

Checkpointing

--save-dir: directory where intermediate training results and model will be saved (default: "/test/")
--save-rate: model is saved every time this number of episodes has been completed (default: 1000)
--load-dir: directory where training state and model are loaded from (default: "test")

Evaluation

--restore: restores previous training state stored in load-dir (or in save-dir if no load-dir has been provided), and continues training (default: False)
--display: displays to the screen the trained policy stored in load-dir (or in save-dir if no load-dir has been provided), but does not continue training (default: False)
--save-gif-data: save the GIF examples to save-dir (default: False)
--render-gif: render the GIF stored in load-dir (default: False)
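
The checkpointing and restore rules above can be summarised in two small helpers. Both resolve_load_dir and should_save are hypothetical names used for illustration, and the exact moment at which the repository saves a model is an assumption; the sketch only encodes what the option descriptions state.

```python
from typing import Optional


def resolve_load_dir(save_dir: str, load_dir: Optional[str] = None) -> str:
    """Use load_dir when given, otherwise fall back to save_dir."""
    return load_dir if load_dir else save_dir


def should_save(episodes_completed: int, save_rate: int = 1000) -> bool:
    """True each time another save_rate episodes have been completed."""
    return episodes_completed > 0 and episodes_completed % save_rate == 0


print(resolve_load_dir("/test/"))            # falls back to the save directory
print(resolve_load_dir("/test/", "run_a"))   # explicit load directory wins
print(should_save(3000), should_save(3001))
```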

EPC options

--initial-population: initial population size in the first stage
--num-selection: size of the population selected for reproduction
--num-stages: number of stages
--stage-num-episodes: number of training episodes in each stage
--stage-n-envs: number of environment instances run in parallel in each stage
--test-num-episodes: number of episodes used for the competition between models
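
To show how --initial-population, --num-selection and --num-stages could interact, here is a toy evolutionary selection loop. It is a rough sketch and not the repository's training code: each candidate is reduced to a single float score, the pairing rule, the noise term and the function name epc_sketch are all assumptions made for illustration, and no reinforcement learning takes place.

```python
import random
from itertools import product
from typing import List


def epc_sketch(initial_population: int, num_selection: int,
               num_stages: int, seed: int = 0) -> List[float]:
    """Toy stage loop: combine parents, perturb, keep the best num_selection."""
    rng = random.Random(seed)
    # Stage 1: independently "trained" candidates (toy: a random score each).
    population = [rng.random() for _ in range(initial_population)]
    for _ in range(num_stages - 1):
        # Mix and match: every ordered pair of parents forms one child (toy rule).
        children = [(a + b) / 2.0 for a, b in product(population, repeat=2)]
        # Stand-in for further training of each child: small Gaussian noise.
        children = [c + rng.gauss(0.0, 0.05) for c in children]
        # Selection: keep the num_selection highest-scoring children.
        population = sorted(children, reverse=True)[:num_selection]
    return population


survivors = epc_sketch(initial_population=3, num_selection=2, num_stages=3)
print(len(survivors))
```

In a real run, the scores would presumably come from evaluation episodes (see --test-num-episodes) rather than from the toy arithmetic used here.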

Example scripts

.maddpg_o/experiments/train_normal.py: applies train_helpers.py for MADDPG, Att-MADDPG and mean-field training
.maddpg_o/experiments/train_x2.py: applies a single doubling training step
.maddpg_o/experiments/train_mix_match.py: mixes and matches the good agents in --sheep-init-load-dirs with the adversarial agents in --wolf-init-load-dirs for model evaluation
.maddpg_o/experiments/train_epc.py: trains with the scheduled EPC algorithm
.maddpg_o/experiments/compete.py: evaluates different models by competition
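
When launching these scripts from Python, the command line can be assembled from a dictionary of options. The helper build_command below is hypothetical, the option values are placeholders, and treating boolean options as bare switches is an assumption; which flags each script accepts should be checked against the script itself or the provided .sh files.

```python
import shlex
from typing import Dict, List, Union


def build_command(script: str,
                  options: Dict[str, Union[str, int, float, bool]]) -> List[str]:
    """Turn {"name": value} pairs into ["python", script, "--name", "value", ...]."""
    argv = ["python", script]
    for name, value in options.items():
        flag = "--" + name
        if isinstance(value, bool):
            # Assumption: boolean options are switches that are present or absent.
            if value:
                argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv


cmd = build_command(
    ".maddpg_o/experiments/train_epc.py",
    {"scenario": "grassland", "num-stages": 3, "use-gpu": False},
)
# Print a shell-safe rendering; pass `cmd` to subprocess.run to actually launch it.
print(" ".join(shlex.quote(part) for part in cmd))
```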

Paper citation

@inproceedings{epciclr2020,
  author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
  title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
  booktitle = {International Conference on Learning Representations},
  year = {2020}
}
