Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Last update: Nov 01, 2022

Overview

Self-Supervised Policy Adaptation during Deployment

PyTorch implementation of PAD and evaluation benchmarks from

Self-Supervised Policy Adaptation during Deployment

Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

[Paper] [Website]

Citation

If you find our work useful in your research, please consider citing the paper as follows:

@article{hansen2020deployment,
  title={Self-Supervised Policy Adaptation during Deployment},
  author={Nicklas Hansen and Rishabh Jangir and Yu Sun and Guillem Alenyà and Pieter Abbeel and Alexei A. Efros and Lerrel Pinto and Xiaolong Wang},
  year={2020},
  eprint={2007.04309},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Setup

We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:

conda env create -f setup/conda.yml
conda activate pad
sh setup/install_envs.sh

Training & Evaluation

We have prepared training and evaluation scripts that can be run by sh scripts/train.sh and sh scripts/eval.sh. Alternatively, you can call the python scripts directly, e.g. for training call

CUDA_VISIBLE_DEVICES=0 python3 src/train.py \
    --domain_name cartpole \
    --task_name swingup \
    --action_repeat 8 \
    --mode train \
    --use_inv \
    --num_shared_layers 8 \
    --seed 0 \
    --work_dir logs/cartpole_swingup/inv/0 \
    --save_model

which should give you an output of the form

| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 | 
  ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000

We provide a pre-trained model that can be used for evaluation. To run Policy Adaptation during Deployment, call

CUDA_VISIBLE_DEVICES=0 python3 src/eval.py \
    --domain_name cartpole \
    --task_name swingup \
    --action_repeat 8 \
    --mode color_hard \
    --use_inv \
    --num_shared_layers 8 \
    --seed 0 \
    --work_dir logs/cartpole_swingup/inv/0 \
    --pad_checkpoint 500k

which should give you an output of the form

Evaluating logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
eval reward: 666

Policy Adaptation during Deployment of logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
pad reward: 722

Here's a few samples from the training and test environments of our benchmark:

Please refer to the project page and paper for results and experimental details.

Acknowledgements

We want to thank the numerous researchers and engineers involved in work of which this implementation is based on. Our SAC implementation is based on this repository, the original DeepMind Control suite is available here and the gym wrapper for it is available here. Go check them out!

Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Related tags

Overview

Self-Supervised Policy Adaptation during Deployment

Citation

Setup

Training & Evaluation

Acknowledgements

Owner

Nicklas Hansen

KwaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%)

Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

Clinica is a software platform for clinical research studies involving patients with neurological and psychiatric diseases and the acquisition of multimodal data

The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

Vehicles Counting using YOLOv4 + DeepSORT + Flask + Ngrok

[ICCV 2021] Deep Hough Voting for Robust Global Registration

PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Cross-platform-profile-pic-changer - Script to change profile pictures across multiple platforms

This is a custom made virus code in python, using tkinter module.

SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Boundary-aware Transformers for Skin Lesion Segmentation

This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF).

Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

High-Resolution Image Synthesis with Latent Diffusion Models

Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

a Lightweight library for sequential learning agents, including reinforcement learning