Official codebase for "B-Pref: Benchmarking Preference-Based Reinforcement Learning". It contains scripts to reproduce the experiments.

Last update: Dec 20, 2022

B-Pref

Install

conda env create -f conda_env.yml
pip install -e .[docs,tests,extra]
cd custom_dmcontrol
pip install -e .
cd ../custom_dmc2gym  # custom_dmc2gym sits next to custom_dmcontrol at the repository root
pip install -e .
pip install git+https://github.com/rlworkgroup/[email protected]#egg=metaworld
pip install pybullet

Run experiments using ground-truth (GT) rewards

SAC & SAC + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_sac.sh 
./scripts/[env_name]/run_sac_unsuper.sh

PPO & PPO + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_ppo.sh 
./scripts/[env_name]/run_ppo_unsuper.sh

Run experiments with irrational teachers

To design more realistic models of human teachers, we consider a common stochastic model and systematically manipulate its terms and operators:

teacher_beta: rationality constant of the stochastic preference model (default: -1, which gives a perfectly rational teacher)
teacher_gamma: discount factor used to model myopic behavior (default: 1)
teacher_eps_mistake: probability of making a mistake (default: 0)
teacher_eps_skip: hyperparameter controlling the skip threshold (in [0, 1])
teacher_eps_equal: hyperparameter controlling the equal threshold (in [0, 1])
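
The way these five terms interact can be sketched for a single pair of segments. This is a minimal illustration, not the code in this repository: the function name, the return convention (0 or 1 for the preferred segment, 0.5 for "equal", None for "skip"), the myopic weighting gamma^(H-1-t), and the use of already-scaled thresholds (rather than the raw teacher_eps_skip / teacher_eps_equal values) are all assumptions.

```python
import math
import random


def teacher_label(r0, r1, beta=-1.0, gamma=1.0, eps_mistake=0.0,
                  skip_threshold=0.0, equal_threshold=0.0, rng=None):
    """Hypothetical scripted teacher for one pair of reward segments.

    r0, r1: per-step ground-truth rewards of the two segments.
    Returns 0 or 1 (index of the preferred segment), 0.5 if the
    segments are judged equal, or None if the query is skipped.
    """
    rng = rng or random.Random()
    sum0, sum1 = sum(r0), sum(r1)

    # Skip: neither segment is good enough to be worth comparing.
    if skip_threshold > 0 and max(sum0, sum1) < skip_threshold:
        return None

    # Equal: the two returns are too close to tell apart.
    if abs(sum0 - sum1) < equal_threshold:
        return 0.5

    # Myopic weighting (assumed form): later steps count more when gamma < 1.
    def weighted(r):
        h = len(r)
        return sum(gamma ** (h - 1 - t) * x for t, x in enumerate(r))

    s0, s1 = weighted(r0), weighted(r1)

    if beta > 0:
        # Stochastic (Bradley-Terry style) choice with rationality beta.
        x = beta * (s1 - s0)
        if x >= 0:
            p1 = 1.0 / (1.0 + math.exp(-x))
        else:
            p1 = math.exp(x) / (1.0 + math.exp(x))
        label = 1 if rng.random() < p1 else 0
    else:
        # beta <= 0 is treated as the perfectly rational teacher.
        label = 1 if s1 > s0 else 0

    # Mistake: flip the label with probability eps_mistake.
    if rng.random() < eps_mistake:
        label = 1 - label
    return label
```

For example, teacher_label([10, 0], [0, 5]) prefers the first segment, while the same call with gamma=0.1 prefers the second, because the early reward of 10 is discounted away.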

In B-Pref, we tried the following teachers:

Oracle teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Mistake teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0.1, teacher_eps_skip=0, teacher_eps_equal=0)

Noisy teacher: (teacher_beta=1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Skip teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0.1, teacher_eps_equal=0)

Myopic teacher: (teacher_beta=-1, teacher_gamma=0.9, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Equal teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0.1)
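
For quick reference, the six teachers listed above can be written as a lookup table. The parameter values are copied from the list; the dictionary names, the lowercase keys, and the helper function are illustrative and are not taken from the codebase.

```python
# Oracle settings, as listed above.
ORACLE = {
    "teacher_beta": -1,
    "teacher_gamma": 1,
    "teacher_eps_mistake": 0,
    "teacher_eps_skip": 0,
    "teacher_eps_equal": 0,
}

# Every other teacher changes exactly one term of the oracle.
TEACHERS = {
    "oracle": dict(ORACLE),
    "mistake": {**ORACLE, "teacher_eps_mistake": 0.1},
    "noisy": {**ORACLE, "teacher_beta": 1},
    "skip": {**ORACLE, "teacher_eps_skip": 0.1},
    "myopic": {**ORACLE, "teacher_gamma": 0.9},
    "equal": {**ORACLE, "teacher_eps_equal": 0.1},
}


def diff_from_oracle(name):
    """Return only the parameters in which a teacher differs from the oracle."""
    return {k: v for k, v in TEACHERS[name].items() if ORACLE[k] != v}
```

Calling diff_from_oracle("myopic") returns {"teacher_gamma": 0.9}, which makes it easy to see which single irrationality each teacher isolates.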

PEBBLE

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PEBBLE.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

PrefPPO

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PrefPPO.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]
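
When launching many runs, the script path and sampling-scheme argument can be assembled programmatically. The sketch below only mirrors the path pattern shown above; the helper name is hypothetical, and the example values for env_name, teacher_type, and max_budget are placeholders that may not match the directory names in the repository.

```python
from pathlib import PurePosixPath

# Sampling-scheme codes as documented in the commands above.
SAMPLING_SCHEMES = {"uniform": 0, "disagreement": 1, "entropy": 2}


def build_command(env_name, teacher_type, max_budget, algo, sampling):
    """Build the argument list for a PEBBLE or PrefPPO run script."""
    if algo not in ("PEBBLE", "PrefPPO"):
        raise ValueError(f"unknown algorithm: {algo}")
    if sampling not in SAMPLING_SCHEMES:
        raise ValueError(f"unknown sampling scheme: {sampling}")
    script = (PurePosixPath("scripts") / env_name / teacher_type
              / str(max_budget) / f"run_{algo}.sh")
    return [f"./{script}", str(SAMPLING_SCHEMES[sampling])]


# Placeholder values for illustration only.
cmd = build_command("walker_walk", "oracle", 1000, "PEBBLE", "disagreement")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run from the repository root if the corresponding script exists.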

Note: the full hyperparameters for Meta-World will be updated soon!
