[NeurIPS 2020] Semi-Supervision (Unlabeled Data) & Self-Supervision Improve Class-Imbalanced / Long-Tailed Learning

This repository contains the implementation code for the paper:
Rethinking the Value of Labels for Improving Class-Imbalanced Learning
Yuzhe Yang and Zhi Xu
34th Conference on Neural Information Processing Systems (NeurIPS), 2020
[Website] [arXiv] [Paper] [Slides] [Video]

If you find this code or idea useful, please consider citing our work:

@inproceedings{yang2020rethinking,
  title={Rethinking the Value of Labels for Improving Class-Imbalanced Learning},
  author={Yang, Yuzhe and Xu, Zhi},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Overview

In this work, we show theoretically and empirically that both semi-supervised learning (using unlabeled data) and self-supervised pre-training (first pre-training the model with self-supervision) can substantially improve performance on imbalanced (long-tailed) datasets, regardless of the degree of imbalance in the labeled/unlabeled data and of the base training technique.

Semi-Supervised Imbalanced Learning: Using unlabeled data helps shape clearer class boundaries and yields better class separation, especially for the tail classes.

Self-Supervised Imbalanced Learning: Self-supervised pre-training (SSP) helps mitigate tail-class leakage at test time, which results in better learned boundaries and representations.

Installation

Prerequisites

Dependencies

  • PyTorch (>= 1.2, tested on 1.4)
  • PyYAML (yaml)
  • scikit-learn
  • tensorboardX

Code Overview

Main Files

Main Arguments

  • --dataset: name of the chosen long-tailed dataset
  • --imb_factor: imbalance factor (inverse of the imbalance ratio \rho in the paper)
  • --imb_factor_unlabel: imbalance factor for the unlabeled data (inverse of the unlabeled imbalance ratio \rho_U)
  • --pretrained_model: path to the self-supervised pre-trained model
  • --resume: path to a checkpoint to resume from (also used for evaluation)
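Because the flags take the inverse of the paper's ratios, \rho=50 becomes --imb_factor 0.02 and \rho=100 becomes --imb_factor 0.01, as in the commands below. The sketch shows how such a factor can translate into per-class sample counts. It assumes the exponential long-tailed profile common in LDAM-DRW-style CIFAR-LT code, which this repository acknowledges building on. The function names are hypothetical, and imbalance_cifar.py may differ in detail.

```python
def imb_factor_from_rho(rho):
    """--imb_factor is the inverse of the imbalance ratio rho (head count / tail count)."""
    return 1.0 / rho


def img_num_per_cls(img_max, cls_num, imb_factor):
    """Per-class sample counts under an (assumed) exponential long-tailed profile.

    Class i keeps img_max * imb_factor ** (i / (cls_num - 1)) samples, so the
    head class keeps img_max and the tail class keeps img_max * imb_factor.
    """
    return [int(img_max * imb_factor ** (i / (cls_num - 1.0))) for i in range(cls_num)]


# CIFAR-10 has 5000 training images per class; rho=50 -> --imb_factor 0.02
counts = img_num_per_cls(5000, 10, imb_factor_from_rho(50))
# counts[0] == 5000 (head class), counts[-1] == 100 (tail class)
```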

Getting Started

Semi-Supervised Imbalanced Learning

Unlabeled data sourcing

CIFAR-10-LT: CIFAR-10 unlabeled data is prepared following this repo, using the 80 Million Tiny Images dataset. In short, a data sourcing model is trained to classify images into the CIFAR-10 classes plus an extra "non-CIFAR" class. For each class, images are then ranked by prediction confidence, and the unlabeled (imbalanced) datasets are constructed accordingly. Use the following link to download the prepared unlabeled data, and place it in your data_path:

SVHN-LT: Since SVHN comes with an extra split of 531.1K additional (labeled) samples, these samples are used directly to simulate the unlabeled dataset.

Note that class imbalance in the unlabeled data is also taken into account; it is controlled by --imb_factor_unlabel (\rho_U in the paper). See imbalance_cifar.py and imbalance_svhn.py for details.
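The per-class ranking used to source CIFAR-10 unlabeled data, combined with a target count per class, can be sketched as follows. This is a simplified illustration under stated assumptions. Candidates arrive as hypothetical (image_id, predicted_class, confidence) triples from the data sourcing model. Any class index outside 0..num_classes-1 (e.g. the "non-CIFAR" class) is ignored. num_per_class would come from an imbalanced profile driven by --imb_factor_unlabel. The actual preparation scripts may filter differently.

```python
def build_unlabeled_set(candidates, num_per_class):
    """Select the most confident candidates for each class.

    candidates: iterable of (image_id, predicted_class, confidence) triples (hypothetical format).
    num_per_class: desired number of unlabeled images per class (e.g. long-tailed counts).
    Classes outside range(len(num_per_class)), such as a "non-CIFAR" class, are dropped.
    """
    by_class = {}
    for image_id, cls, confidence in candidates:
        by_class.setdefault(cls, []).append((confidence, image_id))

    selected = {}
    for cls, n in enumerate(num_per_class):
        # Rank this class's candidates by confidence, highest first, and keep the top n.
        ranked = sorted(by_class.get(cls, []), key=lambda pair: pair[0], reverse=True)
        selected[cls] = [image_id for _, image_id in ranked[:n]]
    return selected


candidates = [("a", 0, 0.9), ("b", 0, 0.7), ("c", 0, 0.95),
              ("d", 1, 0.6), ("e", 1, 0.8), ("f", 10, 0.99)]
unlabeled = build_unlabeled_set(candidates, num_per_class=[2, 1])
# {0: ['c', 'a'], 1: ['e']}; "f" is predicted as class 10 ("non-CIFAR") and is skipped
```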

Semi-supervised learning with pseudo-labeling

To perform pseudo-labeling (self-training), a base classifier is first trained on the original imbalanced dataset. With the trained base classifier, pseudo-labels can then be generated using

python gen_pseudolabels.py --resume <ckpt-path> --data_dir <data_path> --output_dir <output_path> --output_filename <save_name>
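The core computation of pseudo-labeling is assigning each unlabeled image the class the base classifier is most confident about. The following is a minimal sketch assuming hard labels taken as the argmax of softmax probabilities, with no confidence threshold. generate_pseudo_labels is a hypothetical helper operating on plain logit lists, not the interface of gen_pseudolabels.py.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_pseudo_labels(logits_batch):
    """Assign each unlabeled sample the base classifier's most confident class.

    Returns (labels, confidences): hard argmax labels and their softmax probabilities.
    """
    labels, confidences = [], []
    for logits in logits_batch:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        labels.append(label)
        confidences.append(probs[label])
    return labels, confidences


# Base-classifier logits for two unlabeled images over three classes
labels, confidences = generate_pseudo_labels([[2.0, 0.5, -1.0], [0.0, 3.0, 1.0]])
# labels == [0, 1]
```

Keeping the confidences alongside the labels makes it easy to add thresholding or confidence-based filtering later, if desired.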

We provide generated pseudo-label files for CIFAR-10-LT & SVHN-LT with \rho=50, using base models trained with the standard cross-entropy (CE) loss:

To train with unlabeled data, for example, on CIFAR-10-LT with \rho=50 and \rho_U=50

python train_semi.py --dataset cifar10 --imb_factor 0.02 --imb_factor_unlabel 0.02

Self-Supervised Imbalanced Learning

Self-supervised pre-training (SSP)

To perform Rotation SSP on CIFAR-10-LT with \rho=100

python pretrain_rot.py --dataset cifar10 --imb_factor 0.01
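Rotation SSP trains the network to predict which of four rotations (0, 90, 180 or 270 degrees) was applied to an input image. A minimal sketch of how such a pretext batch can be built, using nested lists as stand-in images; the helper names are hypothetical, and pretrain_rot.py may construct batches differently (e.g. directly on GPU tensors).

```python
def rot90(image):
    """Rotate a 2D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotation_batch(images):
    """Build the rotation pretext batch: 4 copies per image with labels 0..3,
    where label k means the image was rotated by k * 90 degrees."""
    inputs, labels = [], []
    for image in images:
        rotated = image
        for k in range(4):
            inputs.append(rotated)
            labels.append(k)
            rotated = rot90(rotated)
    return inputs, labels


inputs, labels = rotation_batch([[[1, 2], [3, 4]]])
# labels == [0, 1, 2, 3]; the classifier is trained to recover these labels
```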

To perform MoCo SSP on ImageNet-LT

python pretrain_moco.py --dataset imagenet --data <data_path>
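MoCo learns representations with a contrastive (InfoNCE) loss. A query embedding should score its positive key (another augmented view of the same image) higher than a queue of negative keys. Meanwhile, the key encoder tracks the query encoder through a momentum update. The sketch shows both pieces on plain Python lists, assuming L2-normalized embeddings and MoCo's default temperature 0.07 and momentum 0.999. It illustrates the technique rather than reproducing the code in pretrain_moco.py.

```python
import math


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: cross-entropy with the positive key as the target class.

    Embeddings are assumed to be L2-normalized, so dot products are cosine similarities.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Index 0 holds the positive logit; the remaining entries come from the negative queue.
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, key) / temperature for key in negative_keys]

    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Key encoder update: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# A query aligned with its positive key yields a smaller loss than a misaligned one.
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```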

Network training with SSP models

Train on CIFAR-10-LT with \rho=100

python train.py --dataset cifar10 --imb_factor 0.01 --pretrained_model <path_to_ssp_model>
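When training from an SSP checkpoint, the backbone weights are reused. The pretext head (e.g. the 4-way rotation classifier) is not, because its shape differs from the new classifier. The following sketches that filtering step on a plain state dict. The key names are hypothetical: a "module." prefix from DataParallel and head layers named "fc." or "linear.". How train.py actually loads --pretrained_model may differ.

```python
def adapt_ssp_state_dict(ssp_state, head_prefixes=("fc.", "linear.")):
    """Keep backbone weights from an SSP checkpoint and drop the pretext-task head.

    Key names here are hypothetical; inspect the real checkpoint's keys first.
    """
    adapted = {}
    for name, value in ssp_state.items():
        if name.startswith("module."):
            name = name[len("module."):]  # strip the nn.DataParallel prefix
        if name.startswith(head_prefixes):
            continue  # pretext head (e.g. 4-way rotation) does not fit the new classifier
        adapted[name] = value
    return adapted


backbone = adapt_ssp_state_dict({"module.conv1.weight": 1, "module.fc.weight": 2})
# backbone == {"conv1.weight": 1}
```

In PyTorch, the filtered dict would then typically be passed to model.load_state_dict(..., strict=False), so the freshly initialized classifier is left untouched.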

Train on ImageNet-LT / iNaturalist 2018

python -m imagenet_inat.main --cfg <path_to_ssp_config> --model_dir <path_to_ssp_model>

Results and Models

All related data and checkpoints can be found via this link. Individual results and checkpoints are detailed as follows.

Semi-Supervised Imbalanced Learning

CIFAR-10-LT

Model                                            Top-1 Error (%)   Download
CE + [email protected] (\rho=50 and \rho_U=1)    16.79             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=25)   16.88             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=50)   18.36             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=100)  19.94             ResNet-32

SVHN-LT

Model                                            Top-1 Error (%)   Download
CE + [email protected] (\rho=50 and \rho_U=1)    13.07             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=25)   13.36             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=50)   13.16             ResNet-32
CE + [email protected] (\rho=50 and \rho_U=100)  14.54             ResNet-32

Test a pretrained checkpoint

python train_semi.py --dataset cifar10 --resume <ckpt-path> -e

Self-Supervised Imbalanced Learning

CIFAR-10-LT

  • Self-supervised pre-trained models (Rotation)

    Dataset Setting   \rho=100    \rho=50     \rho=10
    Download          ResNet-32   ResNet-32   ResNet-32
  • Final models (200 epochs)

    Model                \rho  Top-1 Error (%)   Download
    CE(Uniform) + SSP    10    12.28             ResNet-32
    CE(Uniform) + SSP    50    21.80             ResNet-32
    CE(Uniform) + SSP    100   26.50             ResNet-32
    CE(Balanced) + SSP   10    11.57             ResNet-32
    CE(Balanced) + SSP   50    19.60             ResNet-32
    CE(Balanced) + SSP   100   23.47             ResNet-32

CIFAR-100-LT

  • Self-supervised pre-trained models (Rotation)

    Dataset Setting   \rho=100    \rho=50     \rho=10
    Download          ResNet-32   ResNet-32   ResNet-32
  • Final models (200 epochs)

    Model                \rho  Top-1 Error (%)   Download
    CE(Uniform) + SSP    10    42.93             ResNet-32
    CE(Uniform) + SSP    50    54.96             ResNet-32
    CE(Uniform) + SSP    100   59.60             ResNet-32
    CE(Balanced) + SSP   10    41.94             ResNet-32
    CE(Balanced) + SSP   50    52.91             ResNet-32
    CE(Balanced) + SSP   100   56.94             ResNet-32

ImageNet-LT

  • Self-supervised pre-trained models (MoCo)
    [ResNet-50]

  • Final models (90 epochs)

    Model                Top-1 Error (%)   Download
    CE(Uniform) + SSP    54.4              ResNet-50
    CE(Balanced) + SSP   52.4              ResNet-50
    cRT + SSP            48.7              ResNet-50

iNaturalist 2018

  • Self-supervised pre-trained models (MoCo)
    [ResNet-50]

  • Final models (90 epochs)

    Model                Top-1 Error (%)   Download
    CE(Uniform) + SSP    35.6              ResNet-50
    CE(Balanced) + SSP   34.1              ResNet-50
    cRT + SSP            31.9              ResNet-50

Test a pretrained checkpoint

# test on CIFAR-10 / CIFAR-100
python train.py --dataset cifar10 --resume <ckpt-path> -e

# test on ImageNet-LT / iNaturalist 2018
python -m imagenet_inat.main --cfg <path_to_ssp_config> --model_dir <path_to_model> --test

Acknowledgements

This code is partly based on the open-source implementations from the following sources: OpenLongTailRecognition, classifier-balancing, LDAM-DRW, MoCo, and semisup-adv.

Contact

If you have any questions, feel free to contact us by email ([email protected] & [email protected]) or through GitHub issues. Enjoy!

Owner
Yuzhe Yang
Ph.D. student at MIT CSAIL