Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Overview

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

This is the code for implementing the MADDPG algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with environments from the (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). We show our gif results here (https://sites.google.com/view/epciclr2020/). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.

Installation

  • Install tensorflow 1.13.1
pip install tensorflow==1.13.1
  • Install OpenAI gym
pip install gym==0.13.0
  • Install other dependencies
pip install joblib imageio

Case study: Multi-Agent Particle Environments

We demonstrate here how the code can be used in conjunction with the(https://github.com/qian18long/epciclr2020/tree/master/mpe_local). It is based on(https://github.com/openai/multiagent-particle-envs)

Quick start

  • See train_grassland_epc.sh, train_adversarial_epc.sh and train_food_collect_epc.sh for the EPC algorithm for scenario grassland, adversarial and food_collect in the example setting presented in our paper.

Command-line options

Environment options

  • --scenario: defines which environment in the MPE is to be used (default: "grassland")

  • --map-size: The size of the environment. 1 if normal and 2 otherwise. (default: "normal")

  • --sight: The agent's visibility radius. (default: 100)

  • --alpha: Reward shared weight. (default: 0.0)

  • --max-episode-len maximum length of each episode for the environment (default: 25)

  • --num-episodes total number of training episodes (default: 200000)

  • --num-good: number of good agents in the scenario (default: 2)

  • --num-adversaries: number of adversaries in the environment (default: 2)

  • --num-food: number of food(resources) in the scenario (default: 4)

  • --good-policy: algorithm used for the 'good' (non adversary) policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

  • --adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

Core training parameters

  • --lr: learning rate (default: 1e-2)

  • --gamma: discount factor (default: 0.95)

  • --batch-size: batch size (default: 1024)

  • --num-units: number of units in the MLP (default: 64)

  • --good-num-units: number of units in the MLP of good agents, if not providing it will be num-units.

  • --adv-num-units: number of units in the MLP of adversarial agents, if not providing it will be num-units.

  • --n_cpu_per_agent: cpu usage per agent (default: 1)

  • --good-share-weights: good agents share weights of the agents encoder within the model.

  • --adv-share-weights: adversarial agents share weights of the agents encoder within the model.

  • --use-gpu: Use GPU for training (default: False)

  • --n-envs: number of environments instances in parallelization

Checkpointing

  • --save-dir: directory where intermediate training results and model will be saved (default: "/test/")

  • --save-rate: model is saved every time this number of episodes has been completed (default: 1000)

  • --load-dir: directory where training state and model are loaded from (default: "test")

Evaluation

  • --restore: restores previous training state stored in load-dir (or in save-dir if no load-dir has been provided), and continues training (default: False)

  • --display: displays to the screen the trained policy stored in load-dir (or in save-dir if no load-dir has been provided), but does not continue training (default: False)

  • --save-gif-data: Save the gif examples to the save-dir (default: False)

  • --render-gif: Render the gif in the load-dir (default: False)

EPC options

  • --initial-population: initial population size in the first stage

  • --num-selection: size of the population selected for reproduction

  • --num-stages: number of stages

  • --stage-num-episodes: number of training episodes in each stage

  • --stage-n-envs: number of environments instances in parallelization in each stage

  • --test-num-episodes: number of episodes for the competing

Example scripts

  • .maddpg_o/experiments/train_normal.py: apply the train_helpers.py for MADDPG, Att-MADDPG and mean-field training
  • .maddpg_o/experiments/train_x2.py: apply a single step doubling training

  • .maddpg_o/experiments/train_mix_match.py: mix match of the good agents in --sheep-init-load-dirs and adversarial agents in '--wolf-init-load-dirs' for model agents evaluation.

  • .maddpg_o/experiments/train_epc.py: train the scheduled EPC algorithm.

  • .maddpg_o/experiments/compete.py: evaluate different models by competition

Paper citation

@inproceedings{epciclr2020,
  author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
  title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
  booktitle = {International Conference on Learning Representations},
  year = {2020}
}
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Emblaze - Interactive Embedding Comparison

Emblaze - Interactive Embedding Comparison Emblaze is a Jupyter notebook widget for visually comparing embeddings using animated scatter plots. It bun

CMU Data Interaction Group 77 Nov 24, 2022
Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomaly Detection

Why, hello there! This is the supporting notebook for the research paper — Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomal

2 Dec 14, 2021
:boar: :bear: Deep Learning based Python Library for Stock Market Prediction and Modelling

bulbea "Deep Learning based Python Library for Stock Market Prediction and Modelling." Table of Contents Installation Usage Documentation Dependencies

Achilles Rasquinha 1.8k Jan 05, 2023
The repository for our EMNLP 2021 paper "Finnish Dialect Identification: The Effect of Audio and Text"

Finnish Dialect Identification The repository for our EMNLP 2021 paper "Finnish Dialect Identification: The Effect of Audio and Text". We present a te

Rootroo Ltd 2 Dec 25, 2021
Kaggle competition: Springleaf Marketing Response

PruebaEnel Prueba Kaggle-Springleaf-master Prueba Kaggle-Springleaf Kaggle competition: Springleaf Marketing Response Competencia de Kaggle: Marketing

1 Feb 09, 2022
This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA)

Description This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA), described in the publication [1]. Directory

MAMMASMIAS Consortium 6 Nov 14, 2022
The Multi-Mission Maximum Likelihood framework (3ML)

PyPi Conda The Multi-Mission Maximum Likelihood framework (3ML) A framework for multi-wavelength/multi-messenger analysis for astronomy/astrophysics.

The Multi-Mission Maximum Likelihood (3ML) 62 Dec 30, 2022
Torch implementation of SegNet and deconvolutional network

Torch implementation of SegNet and deconvolutional network

Fedor Chervinskii 5 Jul 17, 2020
[ACM MM 2021] TSA-Net: Tube Self-Attention Network for Action Quality Assessment

Tube Self-Attention Network (TSA-Net) This repository contains the PyTorch implementation for paper TSA-Net: Tube Self-Attention Network for Action Qu

ShunliWang 18 Dec 23, 2022
Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition" Pre-trained Deep Convo

Ankush Malaker 5 Nov 11, 2022
NPBG++: Accelerating Neural Point-Based Graphics

[CVPR 2022] NPBG++: Accelerating Neural Point-Based Graphics Project Page | Paper This repository contains the official Python implementation of the p

Ruslan Rakhimov 57 Dec 03, 2022
Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer Paper on arXiv Public PyTorch implementation of two-stage peer-reg

NNAISENSE 38 Oct 14, 2022
Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

USAF - MIT Artificial Intelligence Accelerator 46 Dec 15, 2022
An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and Machine Learning.

ALgorithmic_Trading_with_ML An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and

1 Mar 14, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 05, 2023
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 04, 2023
ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

ManimML ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

259 Jan 04, 2023
This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

Spherical Gaussian Optimization This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization. This code has b

41 Dec 14, 2022
Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

Kevin Arvai 38 Jan 02, 2023