Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Overview

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

This is the code for implementing the MADDPG algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with environments from the (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). We show our gif results here (https://sites.google.com/view/epciclr2020/). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.

Installation

  • Install tensorflow 1.13.1
pip install tensorflow==1.13.1
  • Install OpenAI gym
pip install gym==0.13.0
  • Install other dependencies
pip install joblib imageio

Case study: Multi-Agent Particle Environments

We demonstrate here how the code can be used in conjunction with the(https://github.com/qian18long/epciclr2020/tree/master/mpe_local). It is based on(https://github.com/openai/multiagent-particle-envs)

Quick start

  • See train_grassland_epc.sh, train_adversarial_epc.sh and train_food_collect_epc.sh for the EPC algorithm for scenario grassland, adversarial and food_collect in the example setting presented in our paper.

Command-line options

Environment options

  • --scenario: defines which environment in the MPE is to be used (default: "grassland")

  • --map-size: The size of the environment. 1 if normal and 2 otherwise. (default: "normal")

  • --sight: The agent's visibility radius. (default: 100)

  • --alpha: Reward shared weight. (default: 0.0)

  • --max-episode-len maximum length of each episode for the environment (default: 25)

  • --num-episodes total number of training episodes (default: 200000)

  • --num-good: number of good agents in the scenario (default: 2)

  • --num-adversaries: number of adversaries in the environment (default: 2)

  • --num-food: number of food(resources) in the scenario (default: 4)

  • --good-policy: algorithm used for the 'good' (non adversary) policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

  • --adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

Core training parameters

  • --lr: learning rate (default: 1e-2)

  • --gamma: discount factor (default: 0.95)

  • --batch-size: batch size (default: 1024)

  • --num-units: number of units in the MLP (default: 64)

  • --good-num-units: number of units in the MLP of good agents, if not providing it will be num-units.

  • --adv-num-units: number of units in the MLP of adversarial agents, if not providing it will be num-units.

  • --n_cpu_per_agent: cpu usage per agent (default: 1)

  • --good-share-weights: good agents share weights of the agents encoder within the model.

  • --adv-share-weights: adversarial agents share weights of the agents encoder within the model.

  • --use-gpu: Use GPU for training (default: False)

  • --n-envs: number of environments instances in parallelization

Checkpointing

  • --save-dir: directory where intermediate training results and model will be saved (default: "/test/")

  • --save-rate: model is saved every time this number of episodes has been completed (default: 1000)

  • --load-dir: directory where training state and model are loaded from (default: "test")

Evaluation

  • --restore: restores previous training state stored in load-dir (or in save-dir if no load-dir has been provided), and continues training (default: False)

  • --display: displays to the screen the trained policy stored in load-dir (or in save-dir if no load-dir has been provided), but does not continue training (default: False)

  • --save-gif-data: Save the gif examples to the save-dir (default: False)

  • --render-gif: Render the gif in the load-dir (default: False)

EPC options

  • --initial-population: initial population size in the first stage

  • --num-selection: size of the population selected for reproduction

  • --num-stages: number of stages

  • --stage-num-episodes: number of training episodes in each stage

  • --stage-n-envs: number of environments instances in parallelization in each stage

  • --test-num-episodes: number of episodes for the competing

Example scripts

  • .maddpg_o/experiments/train_normal.py: apply the train_helpers.py for MADDPG, Att-MADDPG and mean-field training
  • .maddpg_o/experiments/train_x2.py: apply a single step doubling training

  • .maddpg_o/experiments/train_mix_match.py: mix match of the good agents in --sheep-init-load-dirs and adversarial agents in '--wolf-init-load-dirs' for model agents evaluation.

  • .maddpg_o/experiments/train_epc.py: train the scheduled EPC algorithm.

  • .maddpg_o/experiments/compete.py: evaluate different models by competition

Paper citation

@inproceedings{epciclr2020,
  author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
  title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
  booktitle = {International Conference on Learning Representations},
  year = {2020}
}
A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

A Higher Performance Pytorch Implementation of DeepLab V3 Plus Introduction This repo is an (re-)implementation of Encoder-Decoder with Atrous Separab

linhua 326 Nov 22, 2022
A hyperparameter optimization framework

Optuna: A hyperparameter optimization framework Website | Docs | Install Guide | Tutorial Optuna is an automatic hyperparameter optimization software

7.4k Jan 04, 2023
This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA)

Description This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA), described in the publication [1]. Directory

MAMMASMIAS Consortium 6 Nov 14, 2022
DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs Abstract: Image-to-image translation has recently achieved re

yaxingwang 23 Apr 14, 2022
A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for ONNX.

sam4onnx A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for

Katsuya Hyodo 6 May 15, 2022
Tracking code for the winner of track 1 in the MMP-Tracking Challenge at ICCV 2021 Workshop.

Tracking Code for the winner of track1 in MMP-Trakcing challenge This repository contains our tracking code for the Multi-camera Multiple People Track

DamoCV 29 Nov 13, 2022
Official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels".

WarPI The official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels". Run python main.py --corruption_type

Haoliang Sun 3 Sep 03, 2022
A library for graph deep learning research

Documentation | Paper [JMLR] | Tutorials | Benchmarks | Examples DIG: Dive into Graphs is a turnkey library for graph deep learning research. Why DIG?

DIVE Lab, Texas A&M University 1.3k Jan 01, 2023
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

Alejandro de Nova Guerrero 9 Nov 24, 2022
Model Zoo for MindSpore

Welcome to the Model Zoo for MindSpore In order to facilitate developers to enjoy the benefits of MindSpore framework, we will continue to add typical

MindSpore 226 Jan 07, 2023
Code for project: "Learning to Minimize Remainder in Supervised Learning".

Learning to Minimize Remainder in Supervised Learning Code for project: "Learning to Minimize Remainder in Supervised Learning". Requirements and Envi

Yan Luo 0 Jul 18, 2021
pybaum provides tools to work with pytrees which is a concept burrowed from JAX.

pybaum provides tools to work with pytrees which is a concept burrowed from JAX.

Open Source Economics 9 May 11, 2022
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces (ICCV 2021)

3DIAS_Pytorch This repository contains the official code to reproduce the results from the paper: 3DIAS: 3D Shape Reconstruction with Implicit Algebra

Mohsen Yavartanoo 21 Dec 12, 2022
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE

Krishna Pillutla 182 Jan 02, 2023
Improving adversarial robustness by a coupling rejection strategy

Adversarial Training with Rectified Rejection The code for the paper Adversarial Training with Rectified Rejection. Environment settings and libraries

Tianyu Pang 29 Jan 06, 2023
Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms

scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo

Intellia ICT 5 Oct 30, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
Distributed Arcface Training in Pytorch

Distributed Arcface Training in Pytorch

3 Nov 23, 2021
CSE-519---Project - Job Title Analysis (Project for CSE 519 - Data Science Fundamentals)

A Multifaceted Approach to Job Title Analysis CSE 519 - Data Science Fundamentals Project Description Project consists of three parts: Salary Predicti

Jimit Dholakia 1 Jan 04, 2022
SSD-based Object Detection in PyTorch

SSD-based Object Detection in PyTorch 서강대학교 현대모비스 SW 프로그램에서 진행한 인공지능 프로젝트입니다. Jetson nano를 이용해 pre-trained network를 fine tuning시켜 차량 및 신호등 인식을 구현하였습니다

Haneul Kim 1 Nov 16, 2021