Official repository for "Robust On-Policy Data Collection for Data-Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Overview

Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evaluation (NeurIPS 2021 Workshop on OfflineRL).

The code is written in Python 3, using PyTorch to implement the deep networks and OpenAI Gym for the experiment domains.

Requirements

To install the required packages, we recommend first creating a conda or virtual environment. Then run the following command:

pip install -r requirements.txt

Preparation

To conduct policy evaluation, we first need a set of pretrained policies. You can skip this part if you already have the pretrained models in policy_models/ and the corresponding policy values in experiments/policy_info.py.

Pretrained Policy

Train the policy models using REINFORCE in different domains by running:

python policy/reinfoce.py --exp_name {exp_name}

where {exp_name} can be MultiBandit, GridWorld, CartPole or CartPoleContinuous. The parameterized epsilon-greedy policies for MultiBandit and GridWorld can be obtained by running:

python policy/handmade_policy.py
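As a reminder of what an epsilon-greedy policy looks like, here is a minimal sketch of the action distribution it induces. It is an illustration only, not the contents of policy/handmade_policy.py; the function names and the toy values (4 actions, epsilon of 0.1) are assumptions.

```python
import random

def epsilon_greedy_probs(preferred_action, n_actions, epsilon):
    """With probability epsilon act uniformly at random, otherwise take the preferred action."""
    probs = [epsilon / n_actions] * n_actions
    probs[preferred_action] += 1.0 - epsilon
    return probs

def sample_action(probs, rng):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical example: 4 actions, action 2 preferred, epsilon = 0.1.
probs = epsilon_greedy_probs(preferred_action=2, n_actions=4, epsilon=0.1)
rng = random.Random(0)
action = sample_action(probs, rng)
print(probs, action)
```

Every action keeps a non-zero probability, so every action can still be sampled during data collection.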

Policy Value

Option 1: Run in sequence

For each policy model, the true policy value is estimated with $10^6$ Monte Carlo roll-outs by running:

python experiments/policy_value.py --policy_name {policy_name} --seed {seed} --n 1e6

This prints the average number of steps, the true policy value, and the variance of returns. Make sure you copy these results into the file experiments/policy_info.py.
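The idea behind a Monte Carlo estimate of the policy value can be sketched as follows. This is not the code in experiments/policy_value.py: the one-step bandit, its reward model, and all function names are hypothetical stand-ins chosen so the example runs without any dependencies.

```python
import random
import statistics

def rollout(rng):
    """Hypothetical one-step episode: returns (return of the episode, number of steps)."""
    action = rng.choices([0, 1], weights=[0.7, 0.3], k=1)[0]  # toy policy
    reward = rng.gauss(1.0 if action == 0 else 0.0, 0.5)      # toy reward model
    return reward, 1

def monte_carlo_value(n, seed):
    """Average n independent roll-outs to estimate the policy value."""
    rng = random.Random(seed)
    returns, steps = [], []
    for _ in range(n):
        episode_return, episode_steps = rollout(rng)
        returns.append(episode_return)
        steps.append(episode_steps)
    return {
        "avg_steps": statistics.fmean(steps),
        "value": statistics.fmean(returns),
        "variance": statistics.pvariance(returns),
    }

stats = monte_carlo_value(n=10_000, seed=0)
print(stats)
```

In this toy problem the exact value is 0.7, and the estimate approaches it as the number of roll-outs grows.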

Option 2: Run in parallel

If you can use qsub or sbatch, you can also run jobs/jobs_value.py with different seeds in parallel, then merge the outputs by running experiments/merge_values.py to obtain $10^6$ Monte Carlo roll-outs in total. The policy values reported in the paper were obtained in this way.
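One way to merge statistics from independent runs without keeping every return is to pool the per-run counts, means, and variances. The sketch below shows that calculation; it is an assumption about how such a merge can be done, not a description of experiments/merge_values.py, and the function name and tuple layout are hypothetical.

```python
import statistics

def merge_stats(chunks):
    """chunks: list of (n, mean, population_variance) tuples, one per seed."""
    total_n = sum(n for n, _, _ in chunks)
    mean = sum(n * m for n, m, _ in chunks) / total_n
    # Law of total variance: within-chunk variance plus spread of the chunk means.
    variance = sum(n * (v + (m - mean) ** 2) for n, m, v in chunks) / total_n
    return total_n, mean, variance

# Toy check: two "seeds" whose pooled statistics must match the full data set.
run_a = [1.0, 2.0, 3.0]
run_b = [4.0, 5.0, 6.0, 7.0]
chunks = [
    (len(r), statistics.fmean(r), statistics.pvariance(r)) for r in (run_a, run_b)
]
merged_n, merged_mean, merged_var = merge_stats(chunks)
print(merged_n, merged_mean, merged_var)
```

The pooled mean and variance agree with those computed directly on the concatenated returns, so no information is lost by merging summaries instead of raw roll-outs.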

Evaluation

Option 1: Run in sequence

The main script for policy evaluation is experiments/evaluate.py. The following command is an example: it runs Monte Carlo estimation on data collected by Robust On-policy Acting with $\rho=1.0$ for the policy model_GridWorld_5000.pt, with seeds from 0 to 199.

python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199"

To conduct policy evaluation with off-policy data, add the following arguments to the above command:

--combined_trajectories 100 --combined_ops_epsilon 0.10 
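The --eval_steps values in the example command follow a halving pattern: each budget equals 60886 floor-divided by a power of two. The sketch below reproduces that list and expands the seed string as the inclusive range described in the text. Both helper names are hypothetical, and how experiments/evaluate.py actually parses these strings is an assumption.

```python
def halving_schedule(max_steps, n_points):
    """Budgets obtained by floor-dividing max_steps by 2**k, smallest first."""
    return [max_steps // 2 ** k for k in reversed(range(n_points))]

def parse_seed_range(spec):
    """Interpret "a,b" as the inclusive range a..b (assumed from the text above)."""
    start, end = (int(x) for x in spec.split(","))
    return list(range(start, end + 1))

eval_steps = halving_schedule(60886, 14)
seeds = parse_seed_range("0,199")

# Join the budgets back into the comma-separated form used on the command line.
print(",".join(str(s) for s in eval_steps))
print(len(seeds))
```

A schedule like this spaces the evaluation budgets evenly on a log scale, which is convenient when plotting error against the amount of data.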

Option 2: Run in parallel

If you can use qsub or sbatch, you only need to run the script jobs/jobs.py, in which all experiments in the paper are arranged. Logs are saved in log/ and per-seed results are saved in results/seeds. Note that the data collection cache is saved in results/data and reused for different value estimations. To merge the results of different seeds, run experiments/merge_results.py; the merged results are saved in results/.
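As an illustration of what aggregating across seeds can look like, the sketch below computes the mean squared error of per-seed estimates against a known policy value at each evaluation budget. The choice of metric, the data layout, and the function name are assumptions for this example and are not taken from experiments/merge_results.py.

```python
import statistics

def mse_per_budget(estimates_by_seed, true_value):
    """estimates_by_seed: {seed: [estimate at each budget]}; returns one MSE per budget."""
    per_budget = zip(*estimates_by_seed.values())  # regroup estimates by budget
    return [
        statistics.fmean([(e - true_value) ** 2 for e in estimates])
        for estimates in per_budget
    ]

# Toy data: three seeds, two evaluation budgets, true policy value 1.0.
toy = {0: [1.5, 1.1], 1: [0.5, 0.9], 2: [1.0, 1.0]}
mse = mse_per_budget(toy, true_value=1.0)
print(mse)
```

In the toy data the error shrinks at the larger budget, which is the trend one would expect as more data is collected.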

Plotting

Once the experiments have finished, all the figures in the paper can be produced by running:

python drawing/draw.py

Citing

If you use this repository in your work, please consider citing the paper

@inproceedings{zhong2021robust,
    title = {Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
    author = {Rujie Zhong and Josiah P. Hanna and Lukas Schäfer and Stefano V. Albrecht},
    booktitle = {NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
    year = {2021}
}
Owner
Autonomous Agents Research Group (University of Edinburgh)