PyTorch implementation of "A Simple Baseline for Low-Budget Active Learning".

Overview

A Simple Baseline for Low-Budget Active Learning

This repository is the implementation of A Simple Baseline for Low-Budget Active Learning. In this paper, we are interested in low-budget active learning where only a small subset of unlabeled data, e.g. 0.2% of ImageNet, can be annotated. We show that although the state-of-the-art active learning methods work well given a large budget of data labeling, a simple k-means clustering algorithm can outperform them on low budgets. Our code is modified from CompRess [1].

@article{pourahmadi2021simple,
  title={A Simple Baseline for Low-Budget Active Learning},
  author={Pourahmadi, Kossar and Nooralinejad, Parsa and Pirsiavash, Hamed},
  journal={arXiv preprint arXiv:2110.12033},
  year={2021}
}

Benchmarks

We implemented the following query strategies in strategies.py on CIFAR-10, CIFAR-100, ImageNet, and ImageNet-LT datasets:

a) Single-batch k-means: At each round, it clusters the whole dataset to budget size clusters and sends nearest neighbors of centers directly to the oracle to be annotated.

b) Multi-batch k-means: Uses the difference of two consecutive budget sizes as the number of clusters and picks those nearest examples to centers that have not been labeled previously by the oracle.

c) Core-set [2]

d) Max-Entropy [3]: Treats the entropy of example probability distribution output as an uncertainty score and samples uncertain points for annotation.

e) Uniform: Selects equal number of samples randomly from all classes.

f) Random: Samples are selected randomly (uniformly) from the entire dataset.

Requirements

Usage

This implementation supports multi-gpu, DataParallel or single-gpu training.

You have the following options to run commands:

  • --arch We use pre-trained ResNet-18 with CompRess (download weights) or pre-trained ResNet-50 with MoCo-v2 (download weights). Use one of resnet18 or resnet50 as the argument accordingly.
  • --backbone compress, moco
  • --splits You can define budget sizes with comma as a seperator. For instance, --splits 10,20.
  • --name Specify the query strategy name by using one of uniform random kmeans accu_kmeans coreset.
  • --dataset Indicate the unlabeled dataset name by using one of cifar10 cifar100 imagenet imagenet_lt.

Sample selection

If the strategy needs an initial pool (accu_kmeans or coreset) then pass the file path with --resume-indices.

python sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 4 \
--workers 4 \
--splits 100 \
--load_cache \
--name kmeans \
--dataset cifar10 \
[path to dataset file]

Linear classification

python eval_lincls.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.01 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 1000 \  
--load_cache \
--name random \
--dataset imagenet \
[path to dataset file]

Nearest neighbor classification

python eval_knn.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 8 \
--splits 1000 \
--load_cache \
--name random \
--dataset cifar10 \
[path to dataset file]

Entropy sampling

To sample data using Max-Entropy, use active_sampler.py and entropy for --name. Give the initial pool indices file path with --resume-indices.

python active_sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.001 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 2000 \
--load_cache \
--name entropy \
--resume-indices [path to random initial pool file] \
--dataset imagenet \
[path to dataset file]

Fine-tuning

This file is implemented only for CompRess ResNet-18 backbone on ImageNet. --lr is the learning rate of backbone and --lr-lin is for the linear classifier.

python finetune.py \
--arch resnet18 \
--weights [path to weights] \
--batch-size 128 \
--workers 16 \
--epochs 100 \
--lr_schedule 50,75 \
--lr 0.0001 \
--lr-lin 0.01 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]

Training from scratch

Starting from a random initialized network, you can train the model on CIFAR-100 or ImageNet.

python trainer_DP.py \
--arch resnet18 \
--batch-size 128 \
--workers 4 \
--epochs 100 \
--lr 0.1 \
--lr_schedule 30,60,90 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]

References

[1] CompRess: Self-Supervised Learning by Compressing Representations, NeurIPS, 2020

[2] Active Learning for Convolutional Neural Networks: A Core-Set Approach, ICLR, 2018

[3] A new active labeling method for deep learning, IJCNN, 2014

Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Beyond the Spectrum Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis" by Yang He, Ning Yu, Margret Keu

Yang He 27 Jan 07, 2023
PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules

Dynamic Routing Between Capsules - PyTorch implementation PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules from Sara Sabour,

Adam Bielski 475 Dec 24, 2022
Custom IMDB Dataset is extracted between 2020-2021 and custom distilBERT model is trained for movie success probability prediction

IMDB Success Predictor Project involves Web Scraping custom IMDB data between 2020 and 2021 of 10000 movies and shows sorted by number of votes ,fine

Gautam Diwan 1 Jan 18, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 410 Jan 03, 2023
shufflev2-yolov5:lighter, faster and easier to deploy

shufflev2-yolov5: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size

pogg 1.5k Jan 05, 2023
Open AI's Python library

OpenAI Python Library The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It incl

Pavan Ananth Sharma 3 Jul 10, 2022
Pytorch code for "State-only Imitation with Transition Dynamics Mismatch" (ICLR 2020)

This repo contains code for our paper State-only Imitation with Transition Dynamics Mismatch published at ICLR 2020. The code heavily uses the RL mach

20 Sep 08, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Official Implementation for HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing Yuval Alaluf*, Omer Tov*, Ron Mokady, Rinon Gal, Amit H. Bermano *Denotes equ

885 Jan 06, 2023
Rl-quickstart - Reinforcement Learning Quickstart

Reinforcement Learning Quickstart To get setup with the repository, git clone ht

UCLA DataRes 3 Jun 16, 2022
Chinese Mandarin tts text-to-speech 中文 (普通话) 语音 合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

Chinese mandarin text to speech based on Fastspeech2 and Unet This is a modification and adpation of fastspeech2 to mandrin(普通话). Many modifications t

291 Jan 02, 2023
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

SHI Lab 174 Dec 19, 2022
Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM)

Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM) Introduction The average lifetime of the $D^{0}$ me

Son Gyo Jung 1 Dec 17, 2021
Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio"

Success Predictor Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio". B

Rodrigo Nazar Meier 4 Mar 17, 2022
Pytorch implementation for "Open Compound Domain Adaptation" (CVPR 2020 ORAL)

Open Compound Domain Adaptation [Project] [Paper] [Demo] [Blog] Overview Open Compound Domain Adaptation (OCDA) is the author's re-implementation of t

Zhongqi Miao 137 Dec 15, 2022
Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models Description Recent research has shown that numerous human-interpretable

Luke Melas-Kyriazi 61 Oct 17, 2022
Deep Learning tutorials in jupyter notebooks.

DeepSchool.io Sign up here for Udemy Course on Machine Learning (Use code DEEPSCHOOL-MARCH to get 85% off course). Goals Make Deep Learning easier (mi

Sachin Abeywardana 1.8k Dec 28, 2022
traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation toolbox based on PyTorch.

traiNNer traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation to

202 Jan 04, 2023
The codes and related files to reproduce the results for Image Similarity Challenge Track 2.

ISC-Track2-Submission The codes and related files to reproduce the results for Image Similarity Challenge Track 2. Required dependencies To begin with

Wenhao Wang 89 Jan 02, 2023
YOLOPのPythonでのONNX推論サンプル

YOLOP-ONNX-Video-Inference-Sample YOLOPのPythonでのONNX推論サンプルです。 ONNXモデルは、hustvl/YOLOP/weights を使用しています。 Requirement OpenCV 3.4.2 or later onnxruntime 1.

KazuhitoTakahashi 8 Sep 05, 2022