Safe Policy Optimization with Local Features

Overview

Safe Policy Optimization with Local Features (SPO-LF)

This repository contains the source code implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations", presented at NeurIPS 2021.

Installation

This repository includes a requirements.txt file. In addition to common modules (e.g., numpy, scipy), our source code depends on the following modules.

We also provide a Dockerfile, which can be used to reproduce our grid-world experiment.

Simulation configuration

We manage simulation configurations with Hydra. The configurations are listed in config.yaml. For example, the algorithm to run (sim_type) must be one of those we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}
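A minimal sketch, assuming you want to fail fast on a mistyped algorithm name before starting a long run, of checking sim_type against the list above. SIM_TYPES and check_sim_type are hypothetical names used only for illustration; they are not part of the repository's code.

```python
# Hypothetical helper: the set mirrors the sim_type choices documented above.
SIM_TYPES = frozenset({
    "safe_glm",
    "unsafe_glm",
    "random",
    "oracle",
    "safe_gp_state",
    "safe_gp_feature",
    "safe_glm_stepwise",
})


def check_sim_type(sim_type: str) -> str:
    """Return sim_type unchanged if it is a supported algorithm, else raise ValueError."""
    if sim_type not in SIM_TYPES:
        raise ValueError(
            f"unknown sim_type {sim_type!r}; expected one of {sorted(SIM_TYPES)}"
        )
    return sim_type
```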

Grid World Experiment

The source code for our grid-world experiment is in the /grid_world folder. To run a simulation, use, for example, the following commands:

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False
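The arguments after main.py are Hydra overrides: a dotted key such as env.reuse_env addresses the reuse_env field nested under env in the configuration. The sketch below is a deliberately simplified, hypothetical illustration of that mapping; apply_overrides, parse_value, and the default values are assumptions, not Hydra's own parser or this repository's config.yaml.

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret an override value as a bool, int, or float, else keep the string."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # descend, creating groups if missing
        node[leaf] = parse_value(raw)
    return config


# Hypothetical defaults standing in for config.yaml.
config = {"sim_type": "random", "env": {"reuse_env": True}}
apply_overrides(config, ["sim_type=safe_glm", "env.reuse_env=False"])
print(config)  # {'sim_type': 'safe_glm', 'env': {'reuse_env': False}}
```

Unlike Hydra, which by default rejects an override of a key that is not already in the config unless it is prefixed with +, this sketch silently creates missing keys.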

For Monte Carlo simulations comparing our proposed method with the baselines, use the shell script run.sh.
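When comparing methods over repeated Monte Carlo runs, a common summary is the mean and the standard error of a metric for each method. A minimal sketch, assuming the per-run results have already been collected into a dict mapping each sim_type to a list of numbers; this input format and the name summarize_runs are hypothetical, since the outputs written by run.sh are not described here.

```python
import math
import statistics
from typing import Dict, List, Tuple


def summarize_runs(results: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {method: (mean, standard error of the mean)} across Monte Carlo runs."""
    summary: Dict[str, Tuple[float, float]] = {}
    for method, values in results.items():
        if not values:
            raise ValueError(f"no runs recorded for {method!r}")
        mean = statistics.mean(values)
        # The sample standard deviation needs at least two runs; report 0.0 otherwise.
        sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        summary[method] = (mean, sem)
    return summary
```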

We also provide a script for visualization. To render the agent's behavior, use the following command:

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code for our Safety-Gym experiment is in the /safety_gym_discrete folder. The experiment is built on safety-gym. Because our proposed method uses dynamic programming to solve the Bellman equation, we modified engine.py to discretize the environment; the modified file is provided as /safety_gym_discrete/engine.py. To use the modified library, clone safety-gym, replace safety-gym/safety_gym/envs/engine.py with /safety_gym_discrete/engine.py from our repo, and install it with the following commands:

cd safety-gym
pip install -e .
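To illustrate why a discretized environment suits dynamic programming, here is a generic sketch: continuous positions are mapped onto an n x n grid, and value iteration then solves the Bellman optimality equation V(s) = max_a [ r(s') + gamma * V(s') ] over deterministic grid moves. This is textbook value iteration under assumed conventions; the extent parameter, the row/column mapping, and the unsafe set (which simply forbids entering certain cells) are hypothetical, and none of it is the modified engine.py or the SPO-LF algorithm.

```python
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int]
# Deterministic moves: up, down, left, right, and stay in place.
MOVES: List[Cell] = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def to_cell(x: float, y: float, extent: float, n: int) -> Cell:
    """Map a position in [-extent, extent]^2 to (row, col) on an n x n grid."""
    scale = n / (2.0 * extent)
    row = min(n - 1, max(0, int((y + extent) * scale)))
    col = min(n - 1, max(0, int((x + extent) * scale)))
    return row, col


def value_iteration(
    reward: List[List[float]],
    unsafe: FrozenSet[Cell] = frozenset(),
    gamma: float = 0.95,
    tol: float = 1e-6,
) -> List[List[float]]:
    """In-place value iteration; the reward is collected on entering a cell."""
    n = len(reward)  # assumes a square n x n grid
    values = [[0.0] * n for _ in range(n)]
    while True:
        delta = 0.0
        for r in range(n):
            for c in range(n):
                best = float("-inf")
                for dr, dc in MOVES:
                    # Moves that would leave the grid are clamped at the border.
                    nr = min(n - 1, max(0, r + dr))
                    nc = min(n - 1, max(0, c + dc))
                    # Forbid moving into unsafe cells; staying put is always allowed.
                    if (dr, dc) != (0, 0) and (nr, nc) in unsafe:
                        continue
                    best = max(best, reward[nr][nc] + gamma * values[nr][nc])
                delta = max(delta, abs(best - values[r][c]))
                values[r][c] = best
        if delta < tol:
            return values


print(to_cell(0.0, 0.0, extent=1.0, n=50))  # (25, 25)
```

Because every sweep is a gamma-contraction, the loop terminates for any gamma < 1; with a 50 x 50 grid each sweep touches 2,500 states, which is why a discrete state space keeps this kind of exact backup tractable.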

Note that a MuJoCo license is required to install Safety-Gym. To run the simulation, use the following commands:

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementations are based on safety-starter-agents, with a modified version of that repository's run_agent.py.

To run a baseline, use the following commands:

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environments the agent runs in are generated with generate_env.py. We provide ten 50x50 environments. To generate other environments, change the world shape in safety_gym_discrete.py and run the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}
Owner
Akifumi Wachi