A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Overview

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

This repo contains the source code to reproduce the results in the paper A Closer Look at Invalid Action Masking in Policy Gradient Algorithms.

Steps to reproduce the experiments

Our experiments use docker containers to run and Weight and Biases (https://www.wandb.com/) to record the experiments, so the first step is to register a wandb account and get an API key, which we refer to as YOUR_WANDB_KEY

# build the docker container
docker build -t invalid_action_masking:latest -f sharedmemory.Dockerfile .
# build docker run commands. replace `{YOUR_WANDB_KEY}` with your own
WANDB_KEY={YOUR_WANDB_KEY} python docker.py > docker.sh
# run experiments (96 in total)
# if you have limited computational resources, consider not running all of them at a time.
# in addition, notice the commands have --cpuset-cpus="0", --cpuset-cpus="1" for different runs
# to make sure each container is only using one core. By default I assume your machine has 40 cores,
# but feel free to modify the `cores` variable in `docker.py`
bash docker.sh

Steps to reproduce the figures

Record your wandb username, which we will refer to as YOUR_WANDB_ENTITY

cd plots
WANDB_ENTITY={YOUR_WANDB_ENTITY} python episode_reward.py
WANDB_ENTITY={YOUR_WANDB_ENTITY} python approx_kl.py

These command should reproduce the PDFs in plots that are attached to the repo.

Reproduction without WANDB

Although it would be possible, it would require a significant amount of effort to properly log metrics and redo the plotting, so at this time we would not have intructions to do reproduction without WANDB. Note that it is possible to use wandb locally by following https://docs.wandb.com/self-hosted/local.

If you have an issue reproducing the results

We have tested these scripts to reproduce but it is possible that there is a bug and maybe we are assuming something specific regarding the environment. If you couldn't reproduce our results, please file an issue and we will address it as soon as the double-blind review is over.

Owner
Costa Huang
Computer Science Ph.D student at Drexel University researching Game Artificial Intelligence
Costa Huang
AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

Multimedia Research 214 Jan 03, 2023
Fake News Detection Using Machine Learning Methods

Fake-News-Detection-Using-Machine-Learning-Methods Fake news is always a real and dangerous issue. However, with the presence and abundance of various

Achraf Safsafi 1 Jan 11, 2022
Detecting drunk people through thermal images using Deep Learning (CNN)

Drunk Detection CNN Detecting drunk people through thermal images using Deep Learning (CNN) Dataset We used thermal images provided by Electronics Lab

Giacomo Ferretti 3 Oct 27, 2022
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives

Status: Under development (expect bug fixes and huge updates) ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectiv

37 Dec 28, 2022
Pun Detection and Location

Pun Detection and Location “The Boating Store Had Its Best Sail Ever”: Pronunciation-attentive Contextualized Pun Recognition Yichao Zhou, Jyun-yu Jia

lawson 3 May 13, 2022
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

195 Dec 07, 2022
Official PyTorch implementation of the paper: Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting.

Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting Official PyTorch implementation of the paper: Improving Graph Neural Net

Giorgos Bouritsas 58 Dec 31, 2022
A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

MPItrampoline MPI wrapper library: MPI trampoline library: MPI integration tests: MPI is the de-facto standard for inter-node communication on HPC sys

Erik Schnetter 31 Dec 22, 2022
COLMAP - Structure-from-Motion and Multi-View Stereo

COLMAP About COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface.

4.7k Jan 07, 2023
codes for Self-paced Deep Regression Forests with Consideration on Ranking Fairness

Self-paced Deep Regression Forests with Consideration on Ranking Fairness This is official codes for paper Self-paced Deep Regression Forests with Con

Learning in Vision 4 Sep 11, 2022
PRTR: Pose Recognition with Cascade Transformers

PRTR: Pose Recognition with Cascade Transformers Introduction This repository is the official implementation for Pose Recognition with Cascade Transfo

mlpc-ucsd 133 Dec 30, 2022
A general framework for deep learning experiments under PyTorch based on pytorch-lightning

torchx Torchx is a general framework for deep learning experiments under PyTorch based on pytorch-lightning. TODO list gan-like training wrapper text

Yingtian Liu 6 Mar 17, 2022
CRNN With PyTorch

CRNN-PyTorch Implementation of https://arxiv.org/abs/1507.05717

Vadim 4 Sep 01, 2022
Accuracy Aligned. Concise Implementation of Swin Transformer

Accuracy Aligned. Concise Implementation of Swin Transformer This repository contains the implementation of Swin Transformer, and the training codes o

FengWang 77 Dec 16, 2022
Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

OD-Rec Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation' Paper, saved teacher models and Andro

Xin Xia 11 Nov 22, 2022
An implementation of the [Hierarchical (Sig-Wasserstein) GAN] algorithm for large dimensional Time Series Generation

Hierarchical GAN for large dimensional financial market data Implementation This repository is an implementation of the [Hierarchical (Sig-Wasserstein

11 Nov 29, 2022
A full pipeline AutoML tool for tabular data

HyperGBM Doc | 中文 We Are Hiring! Dear folks,we are offering challenging opportunities located in Beijing for both professionals and students who are k

DataCanvas 240 Jan 03, 2023
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Deep learning operations reinvented (for pytorch, tensorflow, jax and others)

This video in better quality. einops Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, and

Alex Rogozhnikov 6.2k Jan 01, 2023
Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space

extrinsic2pyramid Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space Intro A very simple and straightforward modu

JEONG HYEONJIN 106 Dec 28, 2022