Safe Policy Optimization with Local Features

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations" which was presented in NeurIPS-21.

Installation

There is requirements.txt in this repository. Except for the common modules (e.g., numpy, scipy), our source code depends on the following modules.

We also provide Dockerfile in this repository, which can be used for reproducing our grid-world experiment.

Simulation configuration

We manage the simulation configuration using hydra. Configurations are listed in config.yaml. For example, the algorithm to run should be chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code necessary for our grid-world experiment is contained in /grid_world folder. To run the simulation, for example, use the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False

For the monte carlo simulation while comparing our proposed method with baselines, use the shell file, run.sh.

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code necessary for our safety-gym experiment is contained in /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method utilize dynamic programming algorithms to solve Bellman Equation, so we modified engine.py to discrtize the environment. We attach modified safety-gym source code in /safety_gym_discrete/engine.py. To use the modified library, please clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py using /safety_gym_discrete/engine.py in our repo. Using the following commands to install the modified library:

cd safety_gym
pip install -e .

Note that MuJoCo licence is needed for installing Safety-Gym. To run the simulation, use the folowing commands.

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation depends on safety-starter-agents. We modified run_agent.py in the repo source code.

To run the baseline, use the folowing commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environment that agent runs on is generated using generate_env.py. We provide 10 50*50 environments. If you want to generate other environments, you can change the world shape in safety_gym_discrete.py, and running the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}
Owner
Akifumi Wachi
Akifumi Wachi
Face Recognition Attendance Project

Face-Recognition-Attendance-Project In This Project You will learn how to mark attendance using face recognition, Hello Guys This is Gautam Kumar, Thi

Gautam Kumar 1 Dec 03, 2022
Improving Transferability of Representations via Augmentation-Aware Self-Supervision

Improving Transferability of Representations via Augmentation-Aware Self-Supervision Accepted to NeurIPS 2021 TL;DR: Learning augmentation-aware infor

hankook 38 Sep 16, 2022
This repository contains the source code of an efficient 1D probabilistic model for music time analysis proposed in ICASSP2022 venue.

Jump Reward Inference for 1D Music Rhythmic State Spaces An implementation of the probablistic jump reward inference model for music rhythmic informat

Mojtaba Heydari 25 Dec 16, 2022
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

hippopmonkey 4 Dec 11, 2022
PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA

Soft DTW Loss Function for PyTorch in CUDA This is a Pytorch Implementation of Soft-DTW: a Differentiable Loss Function for Time-Series which is batch

Keon Lee 76 Dec 20, 2022
The PyTorch implementation of paper REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

REST The PyTorch implementation of paper REST: Debiased Social Recommendation via Reconstructing Exposure Strategies. Usage Download dataset Download

DMIRLAB 2 Mar 13, 2022
High dimensional black-box optimizer using Latent Action Monte Carlo Tree Search algorithm

LA-MCTS The code is based of paper Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search. Component LA-MCTS has thr

Meta Research 18 Oct 24, 2022
A framework for joint super-resolution and image synthesis, without requiring real training data

SynthSR This repository contains code to train a Convolutional Neural Network (CNN) for Super-resolution (SR), or joint SR and data synthesis. The met

83 Jan 01, 2023
In this tutorial, you will perform inference across 10 well-known pre-trained object detectors and fine-tune on a custom dataset. Design and train your own object detector.

Object Detection Object detection is a computer vision task for locating instances of predefined objects in images or videos. In this tutorial, you wi

Ibrahim Sobh 62 Dec 25, 2022
New approach to benchmark VQA models

VQA Benchmarking This repository contains the web application & the python interface to evaluate VQA models. Documentation Please see the documentatio

4 Jul 25, 2022
PyTorch META-DATASET (Few-shot classification benchmark)

PyTorch META-DATASET (Few-shot classification benchmark) This repo contains a PyTorch implementation of meta-dataset and a unified implementation of s

Malik Boudiaf 39 Oct 31, 2022
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

Idiap Research Institute 40 Aug 14, 2022
Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation (ICCV2021)

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation This is a pytorch project for the paper Dynamic Divide-and-Conquer Ad

DV Lab 29 Nov 21, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
Baseline for the Spoofing-aware Speaker Verification Challenge 2022

Introduction This repository contains several materials that supplements the Spoofing-Aware Speaker Verification (SASV) Challenge 2022 including: calc

40 Dec 28, 2022
Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"

DSP Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation". Accepted by ACM Multimedia 2021. Authors

20 Oct 24, 2022
This repository provides the official code for GeNER (an automated dataset Generation framework for NER).

GeNER This repository provides the official code for GeNER (an automated dataset Generation framework for NER). Overview of GeNER GeNER allows you to

DMIS Laboratory - Korea University 50 Nov 30, 2022
An implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks in PyTorch.

Neural Attention Distillation This is an implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep

Yige-Li 84 Jan 04, 2023
Simulations for Turring patterns on an apically expanding domain. T

Turing patterns on expanding domain Simulations for Turring patterns on an apically expanding domain. The details about the models and numerical imple

Yue Liu 0 Aug 03, 2021