Official PyTorch code for the CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"

Overview

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

https://arxiv.org/abs/2003.00393

Abstract

Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased. We theoretically derive an optimal acquisition function for AL in this setting. It can be formulated as distribution shift minimization between unlabeled train data and weakly-labeled validation dataset. To implement such acquisition function, we propose a low-complexity method for feature density matching using Fisher kernel (FK) self-supervision as well as several novel pseudo-label estimators. Our FK-based method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of processing. The conducted experiments show at least 40% drop in labeling efforts for the biased class-imbalanced data compared to existing methods.
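Below is a minimal, hypothetical PyTorch sketch of the Fisher-kernel building block described in the abstract: each example is embedded by the gradient of its log-likelihood with respect to the last linear layer, and unlabeled points are scored by their FK similarity to validation data. This is not the paper's implementation (which matches feature densities and uses several pseudo-label estimators); the function names, the hard-prediction pseudo-label, and the diagonal Fisher normalizer are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def fisher_vectors(feats, logits):
    # Per-example gradient of log p(y|x) w.r.t. the last linear layer's weights,
    # using the model's own hard prediction as a pseudo-label.
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(probs.argmax(dim=1), num_classes=logits.size(1)).float()
    # For a linear softmax classifier: d log p(y|x) / d W = (onehot - probs) outer feats.
    g = torch.einsum('bc,bd->bcd', onehot - probs, feats)
    return g.flatten(1)  # shape: (batch, classes * feat_dim)

def acquisition_scores(g_unlab, g_val, eps=1e-8):
    # Fisher-kernel similarity of each unlabeled point to the validation set,
    # with a diagonal Fisher-information approximation as the normalizer (an assumption).
    fisher_diag = g_val.pow(2).mean(dim=0) + eps
    return ((g_unlab / fisher_diag) @ g_val.t()).mean(dim=1)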

BibTeX Citation

If you like our paper or code, please cite its CVPR 2020 preprint using the following BibTeX:

@article{gudovskiy2020al,
  title={Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision},
  author={Gudovskiy, Denis and Hodgkinson, Alec and Yamaguchi, Takuya and Tsukizawa, Sotaro},
  journal={arXiv:2003.00393},
  year={2020}
}

Installation

  • Install PyTorch v1.1+ by selecting your environment on the website and running the appropriate command.
  • Clone this repository: the code has been tested with Python 3+.
  • Install DALI (ImageNet only): tested with v0.11.0.
  • Optionally install Kornia for MC-based pseudo-label estimation metrics. However, because this library strictly requires Python 3.6+, we provide our own simple rotation function by default (a minimal sketch of this kind of rotation self-supervision follows this list). Use Kornia to experiment with other sampling strategies.
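As a conceptual reference for what the built-in rotation function does, here is a minimal sketch of 4-way rotation self-supervision; the actual code in "unsup.py" may differ in details.

import torch

def rotate_batch(x):
    # x: (B, C, H, W). Return all four rotations stacked along the batch
    # dimension, plus the 4-way rotation labels used as self-supervision targets.
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots, dim=0), labels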

Datasets

Data and temporary files such as descriptors, checkpoints, and index files are saved into the ./local_data/{dataset} folder. For example, MNIST scripts are located in ./mnist, and its data is saved into the ./local_data/MNIST folder, correspondingly. To obtain statistically significant results, we execute multiple runs of the same configuration with randomized weights and training dataset splits, and save the results to ./local_data/{dataset}/runN folders. Make sure you have enough disk space for the large-scale datasets.

MNIST, SVHN

These datasets will be automatically downloaded and converted to PyTorch format on the first AL run.

ImageNet

Due to its large size, ImageNet has to be downloaded and preprocessed manually using these scripts.

Code Organization

  • Scripts are located in ./{dataset} folder.
  • The main parts of the framework are contained in only a few files: "unsup.py", "gen_descr.py", and "main_descr.py", plus the execution script "run.py".
  • Dataset loaders are located in ./{dataset}/custom_datasets and DNN models in ./{dataset}/custom_models.
  • "unsup.py" trains the initial model by unsupervised pretraining with the rotation method, and also produces the initial model with all-random weights.
  • "gen_descr.py" generates the descriptor database files in ./local_data/{dataset}/runN/descr (see the sketch after this list).
  • "main_descr.py" performs AL feature matching, adds the newly selected data to the training dataset, and retrains the model on the augmented data. Its checkpoints are saved into ./local_data/{dataset}/runN/checkpoint.
  • "run.py" reads these checkpoint files and performs an AL iteration with retraining.
  • "run_plot.py" generates the performance curves that can be found in the paper.
  • To make confusion matrices and t-SNE plots, use the extra "visualize_tsne.py" script (MNIST only).
  • VAAL code can be found in the ./vaal folder; it is an adapted version of the official repo.
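As a rough illustration of the descriptor-generation step, here is a hypothetical sketch of dumping pool descriptors into the runN/descr folder; the actual file format, filename, and model interface used by "gen_descr.py" may differ.

import os
import torch

@torch.no_grad()
def dump_descriptors(feature_extractor, loader, out_dir='./local_data/MNIST/run0/descr'):
    # Run the unlabeled pool through the model and store the descriptors on disk;
    # the 'descr.pt' filename and single-tensor format are assumptions.
    os.makedirs(out_dir, exist_ok=True)
    feature_extractor.eval()
    descr = torch.cat([feature_extractor(x) for x, _ in loader], dim=0)
    torch.save(descr, os.path.join(out_dir, 'descr.pt'))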

Running Active Learning Experiments

  • Install the minimal required packages from requirements.txt.
  • The command interface for all methods is combined into the "run.py" script, which can run multiple algorithms and data configurations.
  • The script parameters may differ depending on the dataset; hence, it is best to consult "python3 run.py --help".
  • First, set the configuration in cfg = list() according to its format and execute the "run.py" script with the "--initial" flag to generate the initial random and unsupervised-pretrained models.
  • Second, run the same script without "--initial".
  • Third, after all AL steps have been executed, use "run_plot.py" to reproduce the performance curves.
  • All these steps require a basic understanding of AL terminology.
  • Use the default configurations to reproduce the paper's results.
  • To speed up or parallelize multiple runs, use the --run-start and --run-stop parameters to limit the number of runs saved in the ./local_data/{dataset}/runN folders (an example follows the commands below). The default setting is 10 runs for MNIST, 5 for SVHN, and 1 for ImageNet.
pip3 install -U -r requirements.txt
python3 run.py --gpu 0 --initial # generate initial models
python3 run.py --gpu 0 --unsupervised 0 # AL with the initial all-random parameters model
python3 run.py --gpu 0 --unsupervised 1 # AL with the initial model pretrained using unsupervised rotation method
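For example, runs can be split across GPUs with the --run-start and --run-stop parameters mentioned above (the exact flag semantics are an assumption; check --help):

python3 run.py --gpu 1 --unsupervised 1 --run-start 0 --run-stop 5 # e.g., first five runs on a second GPU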

Reference Results

MNIST

MNIST LeNet test accuracy: (a) no class imbalance, (b) 100x class imbalance, and (c) ablation study of pseudo-labeling and unsupervised pretraining (100x class imbalance). Our method decreases labeling efforts by 40% compared to prior work on biased data.

SVHN and ImageNet

SVHN ResNet-10 test (top) and ImageNet ResNet-18 val (bottom) accuracy: (a,c) no class imbalance and (b,d) with 100x class imbalance.

MNIST Visualizations

Confusion matrix (top) and t-SNE (bottom) of MNIST test data at AL iteration b=3 with 100x class imbalance for: (a) varR with E=1, K=128, (b) R_{z,g}, S=\hat{p}(y,z), L=80 (ours), and (c) R_{z,g}, S=y, L=80. In the t-SNE visualizations, dots and balls represent correctly and incorrectly classified images, respectively. The underrepresented classes {5,8,9} have on average 36% accuracy for prior work (a), while our method (b) increases their accuracy to 75%. The ablation configuration (c) shows the 89% theoretical limit of our method.
