Resilient projection-based consensus actor-critic (RPBCAC) algorithm

Last update: Jul 12, 2022

Overview

We implement the RPBCAC algorithm with nonlinear approximation from [1] and study the training performance of cooperative agents in the presence of adversaries. We aim to validate the analytical results presented in the paper and to prevent adversarial attacks that can arbitrarily degrade cooperative network performance, including the attack studied in [2]. The repository contains the following folders:

agents - contains resilient and adversarial agents
environments - contains a grid world environment for the cooperative navigation task
simulation_results - contains plots that show training performance
training - contains functions for training agents

To train agents, execute main.py.

Multi-agent grid world: cooperative navigation

We train five agents in a grid-world environment. The agents' original goal is to reach their desired positions without colliding with one another. We design a 6 x 6 grid world and use a reward function that penalizes each agent for its distance from the target and for collisions with other agents.
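As an illustration of the reward structure described above, here is a minimal sketch. It is not the code in the environments folder: the function name navigation_reward, the use of Manhattan distance, and the collision_penalty weight are all assumptions made for this example.

```python
from typing import List, Tuple

Position = Tuple[int, int]


def navigation_reward(
    positions: List[Position],
    targets: List[Position],
    agent: int,
    collision_penalty: float = 1.0,  # hypothetical weight, not from the repository
) -> float:
    """Hypothetical reward for one agent on a grid.

    Penalizes the Manhattan distance to the agent's own target and adds a
    fixed penalty for every other agent occupying the same cell.
    """
    row, col = positions[agent]
    goal_row, goal_col = targets[agent]
    distance = abs(row - goal_row) + abs(col - goal_col)
    collisions = sum(
        1
        for other, position in enumerate(positions)
        if other != agent and position == positions[agent]
    )
    return -(distance + collision_penalty * collisions)


# Three agents on a 6 x 6 grid; agents 1 and 2 share a cell.
positions = [(0, 0), (2, 3), (2, 3)]
targets = [(0, 0), (5, 5), (2, 3)]
rewards = [navigation_reward(positions, targets, i) for i in range(3)]
```

In this sketch an agent that sits on its target without a collision receives zero reward, so all rewards are non-positive and higher episode returns mean better navigation.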

We compare the cooperative network performance under the RPBCAC algorithm with the trimming parameter set to H=0 and H=1. The value of H corresponds to the number of adversarial agents assumed to be present in the network. We consider four scenarios:
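To give some intuition for the role of H, the sketch below shows a plain elementwise trimmed-mean consensus step: for each parameter entry, the H largest and H smallest received values are discarded before averaging. This is a generic illustration of trimming, not the projection-based update of RPBCAC from [1]; the function name trimmed_mean_consensus and the example values are assumptions.

```python
from typing import List, Sequence


def trimmed_mean_consensus(
    own: Sequence[float],
    received: Sequence[Sequence[float]],
    H: int,
) -> List[float]:
    """Generic trimmed-mean consensus step (illustrative, not RPBCAC itself).

    For every parameter entry, sort the values received from neighbors,
    drop the H smallest and the H largest, and average the remaining
    values together with the agent's own value.
    """
    if H < 0 or 2 * H > len(received):
        raise ValueError("H must satisfy 0 <= 2 * H <= number of neighbors")
    updated = []
    for k, own_value in enumerate(own):
        values = sorted(vector[k] for vector in received)
        kept = values[H:len(values) - H]
        updated.append((own_value + sum(kept)) / (1 + len(kept)))
    return updated


# One scalar parameter; the last neighbor transmits an outlier.
own = [1.0]
received = [[0.9], [1.1], [1.0], [100.0]]
no_trimming = trimmed_mean_consensus(own, received, H=0)
with_trimming = trimmed_mean_consensus(own, received, H=1)
```

With H=0 the outlier enters the average and pulls the result far from the other values, while with H=1 it is discarded. The cost is that one legitimate extreme value per entry is discarded as well.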

1) All agents are cooperative. They maximize the team-average expected returns.
2) One agent is greedy: it maximizes its own expected returns. It shares parameters with other agents but does not apply consensus updates.
3) One agent is faulty and does not have a well-defined objective. It shares fixed parameter values with other agents.
4) One agent is strategic (labeled "malicious" in the plot titles below): it maximizes its own returns and leads the cooperative agents to minimize their returns. The strategic agent has knowledge of other agents' rewards and updates two critic estimates (one critic is used to improve the adversary's policy and the other to hurt the cooperative agents' performance).
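The difference between the four agent types can be summarized by the reward signal each one builds its critic estimates on. The sketch below is only a reading of the descriptions above, not the code in the agents folder: the function name critic_targets, the role strings, and in particular the choice of the negative average reward of the other agents as the strategic agent's second signal are assumptions.

```python
from typing import Dict, List


def critic_targets(role: str, agent: int, rewards: List[float]) -> Dict[str, float]:
    """Hypothetical reward signals behind each agent type's critic(s).

    "policy" is the signal used to improve the agent's own policy.
    "shared" is the signal behind the estimate a strategic agent transmits
    to hurt the cooperative agents (an assumption made for this sketch).
    """
    others = [r for j, r in enumerate(rewards) if j != agent]
    if role == "cooperative":
        # Objective: team-average return.
        return {"policy": sum(rewards) / len(rewards)}
    if role == "greedy":
        # Objective: the agent's own return only.
        return {"policy": rewards[agent]}
    if role == "strategic":
        # Own return for the policy; negated rewards of the others for the
        # second critic (assumed form of "leading them to minimize returns").
        return {"policy": rewards[agent], "shared": -sum(others) / len(others)}
    if role == "faulty":
        # No well-defined objective: transmitted parameters stay fixed.
        return {}
    raise ValueError(f"unknown role: {role}")


rewards = [-1.0, -2.0, -3.0, -6.0]
signals = {
    role: critic_targets(role, agent=3, rewards=rewards)
    for role in ("cooperative", "greedy", "strategic", "faulty")
}
```

Keeping the two strategic signals separate mirrors the two critic estimates mentioned above: one drives the adversary's own policy and the other shapes what the cooperative agents receive.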

The simulation results below show that the RPBCAC with H=1 (right) performs well compared to the non-resilient case with H=0 (left). Performance is measured by the episode returns.

1) All cooperative

2) Three cooperative + one greedy

3) Three cooperative + one faulty

4) Three cooperative + one malicious

The folder with resilient agents contains the RPBCAC agent as well as an agent that applies the method of trimmed means in the consensus updates (RTMCAC).

References

[2] Figura, M., Kosaraju, K. C., and Gupta, V. Adversarial attacks in consensus-based multi-agent reinforcement learning. arXiv preprint arXiv:2103.06967, 2021.

Owner

Martin Figura
