Safe Policy Optimization with Local Features

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations" which was presented in NeurIPS-21.

Installation

There is requirements.txt in this repository. Except for the common modules (e.g., numpy, scipy), our source code depends on the following modules.

We also provide Dockerfile in this repository, which can be used for reproducing our grid-world experiment.

Simulation configuration

We manage the simulation configuration using hydra. Configurations are listed in config.yaml. For example, the algorithm to run should be chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code necessary for our grid-world experiment is contained in /grid_world folder. To run the simulation, for example, use the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False

For the monte carlo simulation while comparing our proposed method with baselines, use the shell file, run.sh.

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code necessary for our safety-gym experiment is contained in /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method utilize dynamic programming algorithms to solve Bellman Equation, so we modified engine.py to discrtize the environment. We attach modified safety-gym source code in /safety_gym_discrete/engine.py. To use the modified library, please clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py using /safety_gym_discrete/engine.py in our repo. Using the following commands to install the modified library:

cd safety_gym
pip install -e .

Note that MuJoCo licence is needed for installing Safety-Gym. To run the simulation, use the folowing commands.

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation depends on safety-starter-agents. We modified run_agent.py in the repo source code.

To run the baseline, use the folowing commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environment that agent runs on is generated using generate_env.py. We provide 10 50*50 environments. If you want to generate other environments, you can change the world shape in safety_gym_discrete.py, and running the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}
Owner
Akifumi Wachi
Akifumi Wachi
A passive-recon tool that parses through found assets and interacts with the Hackerone API

Hackerone Passive Recon Tool A passive-recon tool that parses through found assets and interacts with the Hackerone API. Setup Simply run setup.sh to

elbee 4 Jan 13, 2022
将hw时信息收集以及简单的漏洞操作步骤简单化

Braised-vegetables 将hw时信息收集以及简单的漏洞扫描操作步骤简单化 使用subfinder(被动子域名爆破收集) subdomain(主动域名爆破) nabbu(端口扫描) httpx(探测目录浏览) crawlergo(360深度爬虫) chorme(谷歌浏览器) xray(漏

19 Nov 15, 2022
Brute Force Guess the password for Instgram accounts with python

Brute-Force-instagram Guess the password for Instgram accounts Tool features : It has two modes: 1- Combo system from you 2- Automatic (random) system

45 Dec 11, 2022
This is python script that will extract the functions call in all used DLL in an executable and then provide a mapping of those functions to the attack classes defined and curated malapi.io.

F2Amapper This is python script that will extract the functions call in all used DLL in an executable and then provide a mapping of those functions to

Ajit Kumar 3 Sep 03, 2022
Lazarus analysis tools and research report

Lazarus Research This repository publishes analysis reports and analysis tools for Operation Dream Job and Operation JTrack for Lazarus. Tools Python

JPCERT Coordination Center 50 Sep 13, 2022
Virus-Builder - This tool will generate a virus that can only destroy Windows computer

Virus-Builder - This tool will generate a virus that can only destroy Windows computer. You can also configure to auto run in usb drive

Saad 16 Dec 30, 2022
neo Tool is great one in binary exploitation topic

neo Tool is great one in binary exploitation topic. instead of doing several missions by many tools and windows, you can now automate this in one tool in one session.. Enjoy it

Hamza Elansari 4 Oct 10, 2022
Simple script to have LDAP authentication in Home Assistant Docker, using NGINX's ldap-auth container

Home Assistant LDAP Auth Simple script to have LDAP authentication in Home Assistant Docker, using NGINX's ldap-auth container. Usage Deploy NGINX's l

Erik 1 Sep 21, 2022
the metasploit script(POC) about CVE-2021-36260

CVE-2021-36260-metasploit the metasploit script(POC) about CVE-2021-36260. A command injection vulnerability in the web server of some Hikvision produ

Taroballz 14 Nov 09, 2022
An easy-to-use wrapper for NTFS-3G on macOS

ezNTFS ezNTFS is an easy-to-use wrapper for NTFS-3G on macOS. ezNTFS can be used as a menu bar app, or via the CLI in the terminal. Installation To us

Matthew Go 34 Dec 01, 2022
🐎🖥《赛马娘》(ウマ娘: Pretty Derby)辅助脚本

auto-derby 自动化养马 育成结果 Nurturing result 功能 支持客户端 DMM (前台) 实验性 安卓 ADB 连接(后台)开发基于 1080x1920 分辨率 团队赛 (Team race) 有胜利确定奖励时吃帕菲 日常赛 (Daily race) PvP 活动赛 (Cha

NateScarlet 376 Jan 01, 2023
KeyKatcher is a keylogger that records keystrokes made on a computer and sends to the E-Mail.

What is a keylogger? A keylogger is a software application or piece of hardware that monitors and records keystrokes made on a computer keyboard. The

Himank_Jain 7 Sep 19, 2022
Having a weak password is not good for a system that demands high confidentiality and security of user credentials

Having a weak password is not good for a system that demands high confidentiality and security of user credentials. It turns out that people find it difficult to make up a strong password that is str

PyLaboratory 0 Feb 07, 2022
Generate malicious files using recently published bidi-attack (CVE-2021-42574)

CVE-2021-42574 - Code generator Generate malicious files using recently published bidi-attack vulnerability, which was discovered in Unicode Specifica

js-on 7 Nov 09, 2022
A Python Scanner for log4j

log4j-Scanner scanner for log4j cat web-urls.txt | python3 log4j.py ID.burpcollaborator.net web-urls.txt http://127.0.0.1:8080 https://www.google.c

Ihebski 5 Jun 26, 2022
com_media allowed paths that are not intended for image uploads to RCE

CVE-2021-23132 com_media allowed paths that are not intended for image uploads to RCE. CVE-2020-24597 Directory traversal in com_media to RCE Two CVEs

KIEN HOANG 67 Nov 09, 2022
DNSpooq - dnsmasq cache poisoning (CVE-2020-25686, CVE-2020-25684, CVE-2020-25685)

dnspooq DNSpooq PoC - dnsmasq cache poisoning (CVE-2020-25686, CVE-2020-25684, CVE-2020-25685) For educational purposes only Requirements Docker compo

Teppei Fukuda 80 Nov 28, 2022
A Python application to predict what is cooking

ez-cuisine-classifier A Python application to predict what is cooking Environment Python 3.9 Windows 10 Install python -m venv venv .\venv\Scripts\act

Zeheng Li 1 Jun 21, 2022
Sample exploits for Zephyr CVE-2021-3625

CVE-2021-3625 This repository contains a few example exploits for CVE-2021-3625. All Zephyr-based usb devices up to (and including) version 2.5.0 suff

7 Nov 10, 2022
Dumping revelant information on compromised targets without AV detection

DonPAPI Dumping revelant information on compromised targets without AV detection DPAPI dumping Lots of credentials are protected by DPAPI (link ) We a

Login Securite 580 Jan 09, 2023