PyTorch implementation of TailCalibX: Feature Generation for Long-tail Classification

Overview

TailCalibX: Feature Generation for Long-tail Classification

by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

[arXiv] [Code] [pip Package] [Video]

(Figure: TailCalibX methodology)

Table of contents

  • Easy Usage (Recommended way to use our method)
  • Installation
  • Example Code
  • Advanced Usage
  • How to use?
  • How to create the mini-ImageNet-LT dataset?
  • Arguments
  • Trained weights
  • Results on a Toy Dataset
  • Directory Tree
  • Citation
  • Contributing
  • About me
  • Extras
  • License

🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is just TailCalib employed multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. In order to mimic that, three things must be done at every epoch in the following order (a minimal per-epoch sketch follows this list):

  1. Collect all the features from your dataloader.
  2. Use the tailcalib package to make the features balanced by generating samples.
  3. Train the classifier.
  4. Repeat.
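
The sketch below shows one way the steps above could fit into a single epoch. It is an illustration only, not the training code of this repo: encoder, classifier, loader, optimizer, and device are placeholder names for your own feature extractor, classifier head, dataloader, optimizer, and device, and it assumes tailcalib is already installed (see Installation below).

# Minimal per-epoch sketch (illustrative; all model/data objects are placeholders)
import numpy as np
import torch
from tailcalib import tailcalib

balancer = tailcalib(base_engine="numpy")

def train_one_epoch(encoder, classifier, loader, optimizer, device="cpu"):
    # 1. Collect all the features from your dataloader.
    encoder.eval()
    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            feats.append(encoder(x.to(device)).cpu().numpy())
            labels.append(y.numpy())
    feats, labels = np.concatenate(feats), np.concatenate(labels)

    # 2. Balance the features by generating samples for the tail classes.
    bal_feats, bal_labels, _ = balancer.generate(X=feats, y=labels)

    # 3. Train the classifier on the balanced features
    #    (a single full-batch update for brevity; in practice, iterate over mini-batches).
    classifier.train()
    criterion = torch.nn.CrossEntropyLoss()
    inputs = torch.as_tensor(bal_feats, dtype=torch.float32, device=device)
    targets = torch.as_tensor(bal_labels, dtype=torch.long, device=device)
    optimizer.zero_grad()
    loss = criterion(classifier(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# 4. Repeat: call train_one_epoch(...) once per epoch.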

💻 Installation

Use the package manager pip to install tailcalib.

pip install tailcalib

👨‍💻 Example Code

Check the instructions here for much more detailed information about the Python package.

# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")   # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200,100)
y = np.random.randint(0,10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")

🧪 Advanced Usage

✔ Things to do before you run the code from this repo

  • Change the data_root for your dataset in main.py.
  • If you are using wandb logging (Weights & Biases), make sure to change the wandb.init in main.py accordingly.

📀 How to use?

  • For just the methods proposed in this paper (see the example below):
    • For CIFAR100-LT: run_TailCalibX_CIFAR100-LT.sh
    • For mini-ImageNet-LT: run_TailCalibX_mini-ImageNet-LT.sh
  • For all the results shown in the paper:
    • For CIFAR100-LT: run_all_CIFAR100-LT.sh
    • For mini-ImageNet-LT: run_all_mini-ImageNet-LT.sh
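
For example, to reproduce only the TailCalibX results on CIFAR100-LT (assuming a bash shell and that the setup steps above are done):

bash run_TailCalibX_CIFAR100-LT.sh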

📚 How to create the mini-ImageNet-LT dataset?

Check Notebooks/Create_mini-ImageNet-LT.ipynb for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.

⚙ Arguments

  • --seed : Random seed (fixed for reproducibility).
    • Default : 1
  • --gpu : GPUs to be used.
    • Default : "0,1,2,3"
  • --experiment : Experiment number (check 'libs/utils/experiments_maker.py').
    • Default : 0.1
  • --dataset : Dataset number.
    • Choices : 0 - CIFAR100, 1 - mini-imagenet
    • Default : 0
  • --imbalance : Imbalance factor.
    • Choices : 0 - 1 (balanced), 1 - 100, 2 - 50, 3 - 10
    • Default : 1
  • --type_of_val : Which dataset split to use as the validation set.
    • Choices : "vt" - val_from_test, "vtr" - val_from_train, "vit" - val_is_test
    • Default : "vit"
  • --cv1 to --cv9 : Custom variables used in experiments; their purpose changes according to the experiment.
    • Default : "1"
  • --train : Run the training sequence.
    • Default : False
  • --generate : Run the generation sequence.
    • Default : False
  • --retraining : Run the retraining sequence.
    • Default : False
  • --resume : Resume from 'latest_model_checkpoint.pth' and wandb, if applicable.
    • Default : False
  • --save_features : Collect feature representations.
    • Default : False
  • --save_features_phase : Dataset split from which to collect feature representations.
    • Choices : "train", "val", "test"
    • Default : "train"
  • --config : Path to a YAML file with an appropriate config; overrides 'experiments_maker'.
    • Default : None
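
Putting a few of these together, a hypothetical invocation for CIFAR100-LT with imbalance factor 100 could look like the line below. It only illustrates the argument format; the run_*.sh scripts contain the exact commands used for the paper, and switches such as --train / --generate / --retraining should be passed in whatever form main.py expects for booleans.

python main.py --seed 1 --gpu "0" --experiment 0.1 --dataset 0 --imbalance 1 --type_of_val "vit"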

🏋️‍♂️ Trained weights

Experiment           CIFAR100-LT (ResNet32, seed 1, Imb 100)   mini-ImageNet-LT (ResNeXt50)
TailCalib            Git-LFS                                    Git-LFS
TailCalibX           Git-LFS                                    Git-LFS
CBD + TailCalibX     Git-LFS                                    Git-LFS

🪀 Results on a Toy Dataset

Open In Colab

The higher the Imb ratio, the more imbalanced the dataset is. Imb ratio = maximum_sample_count / minimum_sample_count.
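
For example, a training set whose largest class has 500 images and whose smallest has 5 has an Imb ratio of 500 / 5 = 100. A quick way to compute it from a label array (a small illustration, not code from this repo):

import numpy as np

y = np.array([0] * 500 + [1] * 50 + [2] * 5)    # toy long-tailed labels
counts = np.unique(y, return_counts=True)[1]
imb_ratio = counts.max() / counts.min()          # 500 / 5 = 100.0
print(imb_ratio)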

Check this notebook to play with the toy example from which the plot in readme_assets/toy_example_output.svg was generated.

🌴 Directory Tree

TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh

The tailcalib_pip directory is omitted from the tree above, as it belongs to the tailcalib pip package.

📃 Citation

@inproceedings{rahul2021tailcalibX,
    title   = {{Feature Generation for Long-tail Classification}},
    author  = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
    booktitle = {ICVGIP},
    year = {2021}
}

👏 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

❤ About me

Rahul Vigneswaran

✨ Extras

🐝 Long-tail buzz : If you are interested in deep learning research that involves long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about the recent trending papers in this field.

📝 License

MIT
