A parallel framework for population-based multi-agent reinforcement learning.

Last update: Jan 08, 2023

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

MALib is a parallel framework for population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracles (PSRO), Self-Play, and Neural Fictitious Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enable efficient code reuse and flexible deployment across different distributed computing paradigms. Its design also aims to support research on other forms of multi-agent learning, including multi-agent imitation learning and model-based MARL.
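To make "population-based learning" concrete, here is a toy sketch of fictitious self-play on rock-paper-scissors. It is not MALib code and uses none of its APIs: the population is simply a count of past pure policies, and each iteration adds a best response to the uniform mixture over that population. All names (`PAYOFF`, `best_response`, `fictitious_self_play`) are illustrative assumptions.

```python
# Toy illustration only; not part of MALib.
ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF[a][b]: payoff to action a against action b (1 win, -1 loss, 0 draw).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(counts):
    """Return the action index that scores best against the population mixture."""
    values = [
        sum(PAYOFF[a][b] * counts[b] for b in range(3))
        for a in range(3)
    ]
    # Ties are broken in favor of the lowest action index.
    return max(range(3), key=lambda a: values[a])


def fictitious_self_play(iterations):
    """Grow the population one best response at a time; return action frequencies."""
    counts = [1, 0, 0]  # the population starts with a single "rock" policy
    for _ in range(iterations):
        counts[best_response(counts)] += 1
    total = sum(counts)
    return [c / total for c in counts]


frequencies = fictitious_self_play(3000)
print(dict(zip(ACTIONS, frequencies)))
```

In this zero-sum game the empirical mixture drifts toward the uniform equilibrium as the population grows. Frameworks such as MALib replace the exact best response with an RL learner and run the rollouts in parallel.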

Installation

MALib has been tested on Python 3.6 and 3.7. This guide assumes Ubuntu 18.04 or later. We strongly recommend using conda to manage dependencies and avoid version conflicts. The example below builds a conda environment based on Python 3.7.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

MALib integrates external environments such as StarCraft II and ViZDoom; install them with pip install -e .[envs]. To contribute to the repository, run pip install -e .[dev] to install the development dependencies.

Optional: to use alpha-rank to solve the meta-game, install OpenSpiel by following its installation guide.
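A quick way to confirm that the optional dependency is available is to check whether its Python module can be found. The sketch below assumes OpenSpiel's Python bindings are importable as `pyspiel`; the helper name `is_installed` is our own.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pyspiel"):
    print("OpenSpiel found: alpha-rank based meta-solvers should be usable.")
else:
    print("OpenSpiel not found: install it before using alpha-rank.")
```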

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
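The `agent_mapping_func` argument in the example is an identity function. Assuming it maps each environment agent id to the id of the training interface that learns for it (our reading of the example, not something this document confirms), the sketch below shows how different mappings would group agents. The helper `group_agents` and the agent ids are hypothetical; in the example above the real ids come from `env.possible_agents`.

```python
from collections import defaultdict


def group_agents(possible_agents, agent_mapping_func):
    """Group environment agent ids by the training id the mapping assigns them."""
    groups = defaultdict(list)
    for agent_id in possible_agents:
        groups[agent_mapping_func(agent_id)].append(agent_id)
    return dict(groups)


# Hypothetical agent ids, used only for illustration.
agents = ["player_0", "player_1"]

# Identity mapping: every agent gets its own learner.
independent = group_agents(agents, lambda agent_id: agent_id)

# Constant mapping: all agents share one learner.
shared = group_agents(agents, lambda agent_id: "shared")

print(independent)
print(shared)
```

Under this reading, the identity mapping in the Quick Start trains one policy pool per player, which fits the "independent" training interface type.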

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}

Owner

MARL @ SJTU
