batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code to our paper "The Impact of Batch Learning in Stochastic Bandits" accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

The repository provides an opportunuty to run simulations or replay logged datasets in sequential batch manner - sequential interaction with the environment when responses are grouped in batches and observed by the agent only at the end of each batch. Broadly speaking, sequential batch learning is a more generalized way of learning which covers both offline and online settings as special cases bringing together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support parameter batch_size - a certain period of time when the agent interacts with the environment "blindly". Despite the batch setting is a property of the environment, this limitation is considered from a policy perspective. With this, it is assumed that it is not the online agent who works with the batch environment, but the batch policy interacts with the online environment.

The project is built upon RL-GLue framework, which provides an interface to connect agents, environments, and experiment programs. Note, that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε - greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical quarantees

Implementation of popular bandit algorithms in batch environments.

Related tags

Overview

batch-bandits

Overview

Framework

Implemented algorithms

Owner

Danil Provodin

Rl-quickstart - Reinforcement Learning Quickstart

This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

This repository contains project created during the Data Challenge module at London School of Hygiene & Tropical Medicine

Data Consistency for Magnetic Resonance Imaging

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Computational modelling of ray propagation through optical elements using the principles of geometric optics (Ray Tracer)

Planar Prior Assisted PatchMatch Multi-View Stereo

This is an official implementation for "Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation".

🏅 The Most Comprehensive List of Kaggle Solutions and Ideas 🏅

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

[CVPR 2021] Forecasting the panoptic segmentation of future video frames

IMBENS: class-imbalanced ensemble learning in Python.

HyperaPy: An automatic hyperparameter optimization framework ⚡🚀

Object Detection and Multi-Object Tracking

Like Dirt-Samples, but cleaned up