imbalanced-DL: Deep Imbalanced Learning in Python

Overview

imbalanced-DL (imported as imbalanceddl) is a Python package designed to make deep imbalanced learning easier for researchers and real-world users. In our experience, tackling deep imbalanced learning requires a strategy rather than a single model or approach, so this package provides several strategies for it. The package not only implements several popular deep imbalanced learning strategies, but also provides benchmark results on several image classification tasks. Furthermore, it provides an interface for adding more datasets and strategies.

Strategy

We provide some baseline strategies as well as some state-of-the-art strategies in this package. The benchmark results below cover ERM, DRW, LDAM-DRW, Mixup-DRW, and Remix-DRW.

Environments

  • This package is tested on Linux.
  • We suggest using a separate virtual environment to avoid package dependency issues.
  • Pyenv & Virtualenv users can follow the steps below to create a new virtual environment; otherwise, skip this step.
Pyenv & Virtualenv (Optional)
  • For dependency isolation, it is better to create a dedicated virtual environment.
  • The following steps demonstrate how to create and manage one.
  • Install pyenv & virtualenv first.
  • pyenv virtualenv [version] [virtualenv_name]
    • For example, to use Python 3.6.8, run: pyenv virtualenv 3.6.8 TestEnv
  • mkdir [dir_name]
  • cd [dir_name]
  • pyenv local [virtualenv_name]
  • You will then have a new (clean) Python virtual environment for the package installation.

Installation

Basic Requirement

  • Python >= 3.6
git clone https://github.com/ntucllab/imbalanced-DL.git
cd imbalanced-DL
python -m pip install -r requirements.txt
python setup.py install
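
After installation, a quick sanity check can confirm that the interpreter meets the stated Python requirement and that the package is importable under the name imbalanceddl. The sketch below uses only the standard library; the helper name check_environment and its parameters are illustrative and are not part of the package.

```python
import importlib.util
import sys


def check_environment(package="imbalanceddl", min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; it is not shipped with imbalanceddl.
    """
    problems = []
    # The README lists Python >= 3.6 as the basic requirement.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        problems.append("package %r is not importable" % package)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

If the package is reported as not importable, the virtual environment used for installation is probably not the active one.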

Usage

We highlight three key features of imbalanced-DL below:

(0) Imbalanced Dataset:

  • We support 5 benchmark image datasets for deep imbalanced learning.
  • To create an ImbalancedDataset object, you need to provide a config_file as well as the name of the dataset you would like to use.
  • Specifically, inside the config_file, you need to specify three key parameters for creating an imbalanced dataset.
    • imb_type: choose either exp (long-tailed imbalance) or step imbalance.
    • imb_ratio: the degree of imbalance in your data; researchers typically choose 0.1 or 0.01.
    • dataset_name: choose one of the 5 benchmark image datasets we provide, or implement your own dataset.
    • For an example config_file, see example/config.
  • To construct your own dataset, inherit from BaseDataset; you can follow torchvision.datasets.ImageFolder to construct your dataset in PyTorch format.
from imbalanceddl.dataset.imbalance_dataset import ImbalancedDataset

# specify the dataset name
imbalance_dataset = ImbalancedDataset(config, dataset_name=config.dataset)
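
To make the two imbalance types concrete, the following sketch computes per-class training-set sizes for exp and step imbalance. It assumes the convention commonly used in long-tailed CIFAR benchmarks: class sizes either decay geometrically from the largest to the smallest class, or drop once halfway through the classes. It also assumes imb_ratio is the smallest class size divided by the largest. The function per_class_counts is illustrative and not part of imbalanceddl, whose internal implementation may differ.

```python
def per_class_counts(max_count, num_classes, imb_type, imb_ratio):
    """Return the number of training samples kept for each class.

    Hypothetical helper (not part of imbalanceddl). `imb_ratio` is assumed
    to be the smallest class size divided by the largest class size.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if imb_type == "exp":
        # Class sizes decay geometrically from max_count to max_count * imb_ratio.
        return [
            int(max_count * imb_ratio ** (i / (num_classes - 1)))
            for i in range(num_classes)
        ]
    if imb_type == "step":
        # The first half of the classes stay full; the second half is cut down.
        half = num_classes // 2
        minority = int(max_count * imb_ratio)
        return [max_count] * half + [minority] * (num_classes - half)
    raise ValueError("unknown imb_type: %r" % imb_type)


if __name__ == "__main__":
    # CIFAR-10 has 5000 training images per class before subsampling.
    print(per_class_counts(5000, 10, "exp", 0.01))
    print(per_class_counts(5000, 10, "step", 0.01))
```

Under this assumed convention, an imb_ratio of 0.01 means the largest class has 100 times as many samples as the smallest.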

(1) Strategy Trainer:

  • We support 6 different strategies for deep imbalanced learning. You can either train from scratch or evaluate with the best model after training. Evaluating with the best model gives more in-depth metrics, such as per-class accuracy, for further analysis of the selected strategy's performance. We provide one trained model in example/checkpoint_cifar10.
  • Each strategy trainer is associated with a config_file, an ImbalancedDataset object, a model, and a strategy_name.
  • Specifically, the config_file provides the training parameters; the default settings for reproducing the benchmark results can be found in example/config. You can also set these training parameters to suit your own needs.
  • For the model, we currently provide resnet32 and resnet18 for reproducing the benchmark results.
  • We provide a build_trainer() function that returns the specified trainer, as follows.
from imbalanceddl.strategy.build_trainer import build_trainer

# specify the strategy
trainer = build_trainer(config,
                        imbalance_dataset,
                        model=model,
                        strategy=config.strategy)
# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • Alternatively, you can directly pick the specific strategy trainer you would like to use:
from imbalanceddl.strategy import LDAMDRWTrainer

# pick the trainer
trainer = LDAMDRWTrainer(config,
                         imbalance_dataset,
                         model=model,
                         strategy=config.strategy)

# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • To construct your own strategy trainer, inherit from the Trainer class and implement the get_criterion() and train_one_epoch() methods in your strategy. You can then either add your strategy to the build_trainer() function or use it directly, as demonstrated above.
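
The subclassing pattern for a custom strategy can be sketched without the package installed. The Trainer class below is a deliberately simplified stand-in written for this example, not imbalanceddl's actual base class. Only the method names get_criterion(), train_one_epoch(), and do_train_val() come from this README; their real signatures, the constructor arguments, and the loop body are assumptions.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Simplified stand-in for the package's Trainer base class (assumption)."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.criterion = None
        self.history = []

    @abstractmethod
    def get_criterion(self):
        """Set self.criterion for the current stage of training."""

    @abstractmethod
    def train_one_epoch(self, epoch):
        """Run one epoch and return a value to log."""

    def do_train_val(self):
        # The base class owns the loop; subclasses only fill in the two hooks.
        for epoch in range(self.epochs):
            self.get_criterion()
            self.history.append(self.train_one_epoch(epoch))
        return self.history


class MyStrategyTrainer(Trainer):
    """Hypothetical strategy; the name and behaviour are illustrative only."""

    def get_criterion(self):
        # A real strategy would build a loss here, for example a PyTorch loss module.
        self.criterion = lambda prediction, target: abs(prediction - target)

    def train_one_epoch(self, epoch):
        # Placeholder "training step" so the sketch runs without any data.
        return self.criterion(prediction=epoch, target=0)


if __name__ == "__main__":
    print(MyStrategyTrainer(epochs=3).do_train_val())
```

Keeping the training loop in the base class means a new strategy only has to define its loss and its per-epoch update.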

(2) Benchmark research environment:

  • To support deep imbalanced learning research, we provide example code for training with different strategies, as well as benchmark results on five image datasets. To quickly start training CIFAR-10 with the ERM strategy, run:
cd example
python main.py --gpu 0 --seed 1126 --c config/config_cifar10.yaml --strategy ERM

  • Following the example code, you can obtain results for baseline training and for state-of-the-art strategies such as LDAM or Remix, and you can also use this environment to develop your own algorithm or strategy. Feel free to add your own strategy to this package.
  • For more information about the examples and usage, please see the Example README.

Benchmark Results

We provide benchmark results on 5 image datasets: CIFAR-10, CIFAR-100, CINIC-10, SVHN, and Tiny-ImageNet. We follow the standard procedure to generate imbalanced training sets for these 5 datasets and report their top-1 validation accuracy as a research benchmark. For example, the table below shows the results for long-tailed imbalanced CIFAR-10 trained with different strategies. For more detailed benchmark results, please see example/README.md.

  • Long-tailed Imbalanced CIFAR-10
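| imb_type    | imb_factor | Model    | Strategy  | Validation Top 1 |
|-------------|------------|----------|-----------|------------------|
| long-tailed | 100        | ResNet32 | ERM       | 71.23            |
| long-tailed | 100        | ResNet32 | DRW       | 75.08            |
| long-tailed | 100        | ResNet32 | LDAM-DRW  | 77.75            |
| long-tailed | 100        | ResNet32 | Mixup-DRW | 82.11            |
| long-tailed | 100        | ResNet32 | Remix-DRW | 81.82            |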
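
The table reports overall top-1 accuracy, while the Strategy Trainer section mentions per-class accuracy as a more in-depth metric. As a minimal sketch of how the two relate, the functions below compute both from plain lists of labels and predictions. They are illustrative helpers written for this README and do not reflect how eval_best_model() computes its metrics.

```python
from collections import Counter


def top1_accuracy(labels, predictions):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def per_class_accuracy(labels, predictions):
    """Map each true class to the fraction of its samples predicted correctly."""
    totals = Counter(labels)
    hits = Counter(y for y, p in zip(labels, predictions) if y == p)
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}


if __name__ == "__main__":
    # Toy data: class 0 is frequent, class 1 is rare.
    labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    predictions = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(top1_accuracy(labels, predictions))
    print(per_class_accuracy(labels, predictions))
```

In the toy data, nine of ten predictions are correct overall, yet only one of the two rare-class samples is classified correctly, which is the kind of gap a per-class breakdown exposes.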

Test

  • python -m unittest -v

Contact

If you have any questions, please don't hesitate to email [email protected]. Thanks!

Acknowledgement

The authors thank members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.

Owner
NTUCSIE CLLab
Computational Learning Lab, Dept. of Computer Science and Information Engineering, National Taiwan University