Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Last update: Nov 01, 2022

Overview

CReST in TensorFlow 2

Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille and Fan Yang.

This is not an officially supported Google product.

Install dependencies

sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt

The code has been tested on Ubuntu 18.04 with CUDA 10.2.

Environment setting

. env3/bin/activate
export ML_DATA=/path/to/your/data
export ML_DIR=/path/to/your/code
export RESULT=/path/to/your/result
export PYTHONPATH=$PYTHONPATH:$ML_DIR

Datasets

Download or generate the datasets as follows:

CIFAR10 and CIFAR100: Follow the steps to download and generate the balanced CIFAR10 and CIFAR100 datasets. Put them under ${ML_DATA}/cifar, for example, ${ML_DATA}/cifar/cifar10-test.tfrecord.
Long-tailed CIFAR10 and CIFAR100: Follow the steps to download the datasets prepared by Cui et al. Put them under ${ML_DATA}/cifar-lt, for example, ${ML_DATA}/cifar-lt/cifar-10-data-im-0.1.
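
For orientation, the long-tailed splits prepared by Cui et al. shrink the per-class sample count exponentially from the most frequent class to the rarest one. The following is a minimal sketch of that profile, not the code that produced the files above; the function name long_tailed_counts and its parameters are hypothetical, and 5000 images per class is the standard CIFAR10 training-set size.

```python
def long_tailed_counts(max_per_class, num_classes, imbalance_ratio):
    """Per-class sample counts that decay exponentially across classes.

    imbalance_ratio is (rarest class size) / (most frequent class size),
    so class 0 keeps max_per_class samples and the last class keeps
    roughly max_per_class * imbalance_ratio samples.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 < imbalance_ratio <= 1.0:
        raise ValueError("imbalance_ratio must be in (0, 1]")
    counts = []
    for class_index in range(num_classes):
        # fraction runs from 0.0 (most frequent class) to 1.0 (rarest class).
        fraction = class_index / (num_classes - 1.0)
        counts.append(int(max_per_class * imbalance_ratio ** fraction))
    return counts


if __name__ == "__main__":
    # Hypothetical usage: CIFAR10 with an imbalance ratio of 0.1.
    print(long_tailed_counts(5000, 10, 0.1))
```

The counts are truncated with int(), so individual classes may differ by one sample from a rounded variant of the same formula.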

Running experiments on Long-tailed CIFAR10 and CIFAR100

Run MixMatch (paper) and FixMatch (paper):

Specify method to run via --method. It can be fixmatch or mixmatch.
Specify dataset via --dataset. It can be cifar10lt or cifar100lt.
Specify the class imbalance ratio, i.e., the number of training samples in the rarest class divided by the number in the most frequent class, via --class_im_ratio.
Specify the percentage of labeled data via --percent_labeled.
Specify the number of generations for self-training via --num_generation.
Specify whether to use distribution alignment via --do_distalign.
Specify the initial distribution alignment temperature via --dalign_t.

Specify how distribution alignment is applied via --how_dalign. It can be constant or adaptive.

python -m train_and_eval_loop \
  --model_dir=/tmp/model \
  --method=fixmatch \
  --dataset=cifar10lt \
  --input_shape=32,32,3 \
  --class_im_ratio=0.01 \
  --percent_labeled=0.1 \
  --fold=1 \
  --num_epoch=64 \
  --num_generation=6 \
  --sched_level=1 \
  --dalign_t=0.5 \
  --how_dalign=adaptive \
  --do_distalign=True
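
The distribution alignment flags are easier to read with the underlying arithmetic in mind. The sketch below follows the ReMixMatch-style alignment that the paper builds on, under the assumption that the temperature t flattens the target class distribution by raising it to the power t and renormalizing. It is an illustration in plain Python, not the repository's implementation, and all function names are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def temper_distribution(class_dist, t):
    """Raise each class probability to the power t and renormalize.

    t = 1 leaves the distribution unchanged; t = 0 makes it uniform.
    """
    return normalize([p ** t for p in class_dist])


def align_prediction(pred, target_dist, running_pred_mean):
    """Rescale one softmax prediction toward the target class distribution.

    Each class probability is multiplied by target / running mean of the
    model's predictions, then the result is renormalized.
    """
    scaled = [
        q * target / mean
        for q, target, mean in zip(pred, target_dist, running_pred_mean)
    ]
    return normalize(scaled)


if __name__ == "__main__":
    # Hypothetical three-class example with a skewed labeled distribution.
    labeled_dist = [0.7, 0.2, 0.1]
    target = temper_distribution(labeled_dist, t=0.5)
    aligned = align_prediction([0.6, 0.3, 0.1], target, labeled_dist)
    print(target)
    print(aligned)
```

With t below 1 the tempered target assigns more mass to rare classes than the labeled distribution does, so aligned predictions are pushed toward the tail classes.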

Results

The code reproduces the main results of the paper. For all settings and methods, we run experiments on 5 different folds and report the mean and standard deviation. Note that the numbers may not exactly match those in the paper because training introduces additional randomness.

Results on Long-tailed CIFAR10 with 10% labeled data (Table 1 in the paper).

Method	gamma=50	gamma=100	gamma=200
FixMatch	79.4 (0.98)	66.2 (0.83)	59.9 (0.44)
CReST	83.7 (0.40)	75.4 (1.62)	63.9 (0.67)
CReST+	84.5 (0.41)	77.7 (1.22)	67.5 (1.36)
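
Each table entry has the form mean (standard deviation) over folds. Assuming gamma in the table is the paper's imbalance ratio (largest class size over smallest), it is the reciprocal of --class_im_ratio, so --class_im_ratio=0.01 corresponds to gamma=100. A small sketch of both conversions follows. The helper names are hypothetical, the accuracies are made-up placeholders rather than results from the paper, and the use of the population standard deviation is an assumption, since the README does not say which estimator was used.

```python
import statistics


def format_fold_results(accuracies):
    """Format per-fold accuracies as 'mean (std)', as in the table above."""
    mean = statistics.mean(accuracies)
    # Assumption: population standard deviation over the folds.
    std = statistics.pstdev(accuracies)
    return f"{mean:.1f} ({std:.2f})"


def gamma_from_ratio(class_im_ratio):
    """Convert a minority/majority ratio to the majority/minority ratio."""
    if class_im_ratio <= 0:
        raise ValueError("class_im_ratio must be positive")
    return round(1.0 / class_im_ratio)


if __name__ == "__main__":
    # Placeholder accuracies for five folds; NOT numbers from the paper.
    placeholder_folds = [75.0, 76.5, 74.0, 77.0, 75.5]
    print(gamma_from_ratio(0.01), format_fold_results(placeholder_folds))
```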

Training with Multiple GPUs

Set CUDA_VISIBLE_DEVICES to the GPUs you want to use, for example CUDA_VISIBLE_DEVICES=0,1,2,3.
Make sure that batch size is divisible by the number of GPUs.
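
The divisibility requirement can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not part of the repository: it counts the entries in CUDA_VISIBLE_DEVICES and verifies that a chosen batch size splits evenly across them. The function names and the batch size of 64 are assumptions made for illustration.

```python
import os


def visible_gpu_count(cuda_visible_devices):
    """Count the device entries in a CUDA_VISIBLE_DEVICES-style string."""
    entries = [entry.strip() for entry in cuda_visible_devices.split(",")]
    return len([entry for entry in entries if entry])


def per_gpu_batch_size(batch_size, num_gpus):
    """Return the per-GPU batch size, or raise if it does not divide evenly."""
    if num_gpus < 1:
        raise ValueError("no visible GPUs")
    if batch_size % num_gpus != 0:
        raise ValueError(
            f"batch size {batch_size} is not divisible by {num_gpus} GPUs"
        )
    return batch_size // num_gpus


if __name__ == "__main__":
    # Falls back to a single device "0" when the variable is unset.
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
    print(per_gpu_batch_size(64, visible_gpu_count(devices)))
```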

Augmentation

One can concatenate different augmentation shortkeys to compose an augmentation sequence.
- d: default augmentation, resize and shift.
- h: horizontal flip.
- ra: random augment with all augmentation ops.
- rc: random augment with color augmentation ops only.
- rg: random augment with geometric augmentation ops only.
- c: cutout.

For example, dhrac applies the default augmentation (resize and shift), horizontal flip, random augment with all ops, and then cutout.
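
Because some shortkeys are two letters long, a concatenated string has to be split by trying the longer keys first. The following sketch shows one way to do that; it is a hypothetical helper written for illustration, not the parser used by the repository, and the descriptions in the table are copied from the list above.

```python
SHORTKEYS = {
    "d": "default augmentation, resize and shift",
    "h": "horizontal flip",
    "ra": "random augment with all augmentation ops",
    "rc": "random augment with color augmentation ops only",
    "rg": "random augment with geometric augmentation ops only",
    "c": "cutout",
}


def parse_shortkeys(spec):
    """Split a concatenated shortkey string into its individual keys."""
    # Try two-letter keys before one-letter keys so "ra" is not misread.
    candidates = sorted(SHORTKEYS, key=len, reverse=True)
    tokens = []
    position = 0
    while position < len(spec):
        for key in candidates:
            if spec.startswith(key, position):
                tokens.append(key)
                position += len(key)
                break
        else:
            # No known shortkey starts at this position.
            raise ValueError(f"unknown shortkey at position {position}: {spec!r}")
    return tokens


if __name__ == "__main__":
    for key in parse_shortkeys("dhrac"):
        print(key, "->", SHORTKEYS[key])
```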

Citing this work

@article{wei2021crest,
    title={CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning},
    author={Chen Wei and Kihyuk Sohn and Clayton Mellina and Alan Yuille and Fan Yang},
    journal={arXiv preprint arXiv:2102.09559},
    year={2021},
}

Owner

Google Research
