Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

Last update: Nov 16, 2022

Population-Based Bandits (PB2)

Code for the Population-Based Bandits (PB2) Algorithm, from the paper Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits.

The code builds on ray (using rllib and tune) and GPy, and is heavily inspired by the ray tune pbt_ppo example.

NOTE: PB2 is included in the ray.tune library, which is the officially supported implementation. The original README links to that code and to the accompanying blog post.

Running the Code

To run the IMPALA experiment, use the command:

python run_impala.py

To run the PPO experiment, use the command:

python run_ppo.py

Config

Each script exposes several options for varying the experiment. You can set the following:

-env_name: the environment to train on, for example BreakoutNoFrameSkip-v4.
-method: either pb2 or pbt (or asha for PPO).
-freq: the frequency of updating hyperparameters; we used 500,000 for IMPALA and 50,000 for PPO.
-seed: the random seed; we used 0 1 2 3 4 5 6... and plan to add more seeds.
-max: the maximum number of timesteps; we used 10,000,000 for IMPALA and 1,000,000 for PPO.
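
As a rough illustration of how these options fit together, here is a minimal sketch that parses them with argparse. The flag spellings (--env_name, --method, and so on), the helper names build_parser and num_update_intervals, and the defaults are all assumptions made for this example; the defaults are simply the values quoted above, not values read from run_impala.py or run_ppo.py. The sketch also assumes that freq and max are measured in the same units.

```python
import argparse

# Values quoted in the README for each algorithm (not read from the scripts).
README_SETTINGS = {
    "impala": {"freq": 500_000, "max": 10_000_000},
    "ppo": {"freq": 50_000, "max": 1_000_000},
}


def build_parser(algo: str) -> argparse.ArgumentParser:
    """Build a hypothetical parser for the options listed above."""
    defaults = README_SETTINGS[algo]
    # The README only mentions asha for the PPO experiments.
    methods = ["pb2", "pbt"] + (["asha"] if algo == "ppo" else [])
    parser = argparse.ArgumentParser(description=f"Hypothetical {algo} options")
    parser.add_argument("--env_name", type=str, default="BreakoutNoFrameSkip-v4")
    parser.add_argument("--method", type=str, choices=methods, default="pb2")
    parser.add_argument("--freq", type=int, default=defaults["freq"])
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--max", type=int, default=defaults["max"])
    return parser


def num_update_intervals(args: argparse.Namespace) -> int:
    """Whole update intervals in one run, assuming freq and max share units."""
    return args.max // args.freq


if __name__ == "__main__":
    for algo in README_SETTINGS:
        # Parse an empty argument list so only the defaults are used.
        args = build_parser(algo).parse_args([])
        print(algo, args.method, num_update_intervals(args))
```

Under that assumption, both quoted settings give the same number of update intervals per run: 10,000,000 / 500,000 and 1,000,000 / 50,000 are each 20.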

It should also be possible to adapt this code to run other ray tune schedulers. We used it for ASHA in our PPO experiments. We are also working to include a BOHB baseline.

Please get in touch with any questions: jackph [at] robots [dot] ox [dot] ac [dot] uk

Citing PB2

Finally, if you found this repo useful, please consider citing us:

@inproceedings{NEURIPS2020_c7af0926,
 author = {Parker-Holder, Jack and Nguyen, Vu and Roberts, Stephen J},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {17200--17211},
 publisher = {Curran Associates, Inc.},
 title = {Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits},
 url = {https://proceedings.neurips.cc/paper/2020/file/c7af0926b294e47e52e46cfebe173f20-Paper.pdf},
 volume = {33},
 year = {2020}
}

Owner

Jack Parker-Holder
