A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

Overview

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

This repository is the official PyTorch implementation of the AAAI-21 paper Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks. It provides practical and effective tricks used in long-tailed image classification.

Trick gallery: trick_gallery.md

  • The tricks will be constantly updated. If you have or need any newly proposed long-tail related trick, please open an issue or a pull request. Make sure to attach the results in the corresponding md files if you open a pull request with a new trick.
  • For any problem, such as bugs, feel free to open an issue.

Paper collection of long-tailed visual recognition

Awesome-of-Long-Tailed-Recognition

Long-Tailed-Classification-Leaderboard

Development log

Trick gallery and combinations

Brief introduction

We divide the long-tail related tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details of these four trick families, see the original paper.
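As a small illustration of what the re-weighting family does, the sketch below computes inverse-frequency class weights in plain Python. This is only one simple re-weighting scheme and is not claimed to be the implementation used in this repo; the function name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return one loss weight per class, inversely proportional to class frequency.

    Hypothetical helper for illustration; not part of this repo.
    """
    counts = Counter(labels)
    # Classes without any sample get weight 0.0 to avoid a division by zero.
    raw = [1.0 / counts[c] if counts[c] > 0 else 0.0 for c in range(num_classes)]
    total = sum(raw)
    if total == 0:
        raise ValueError("labels must contain at least one sample")
    # Rescale so that the weights sum to num_classes, i.e. the mean weight is 1.
    scale = num_classes / total
    return [w * scale for w in raw]


# Toy example: class 0 (head) has 8 samples, class 1 (tail) has 2 samples.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels, num_classes=2)
print(weights)  # the tail class receives the larger weight
```

Such weights would typically be passed to a weighted loss, so that errors on tail classes count more than errors on head classes.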

Detailed information:

  • Trick gallery:

    Tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.
  • Trick combinations:

    Combinations of different tricks, corresponding results, experimental settings, and running commands are listed in trick_combination.md.
  • The tricks and trick combinations whose results are provided in this repo have been reorganized and tested. We are working on the rest, and the results will be updated continuously.

Main requirements

torch >= 1.4.0
torchvision >= 0.5.0
tensorboardX >= 2.1
tensorflow >= 1.14.0 #convert long-tailed cifar datasets from tfrecords to jpgs
Python 3
apex
  • We provide the detailed requirements in requirements.txt. You can run pip install -r requirements.txt to create the same running environment as ours.
  • Installing apex is recommended to save GPU memory:
pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  • If apex is not installed, distributed training with DistributedDataParallel in our code cannot be used.
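Before launching distributed training, it can help to check whether apex is importable in the current environment. The snippet below is a minimal sketch using only the Python standard library; the helper name `apex_available` is hypothetical and not part of this repo.

```python
import importlib.util


def apex_available():
    """Return True if the `apex` package can be found in this environment."""
    return importlib.util.find_spec("apex") is not None


if apex_available():
    print("apex found: DistributedDataParallel training can be used.")
else:
    print("apex not found: install it first to use DistributedDataParallel training.")
```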

Preparing the datasets

We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).

The details of these datasets are as follows:

| Datasets | CIFAR-10-LT-100 | CIFAR-10-LT-50 | CIFAR-100-LT-100 | CIFAR-100-LT-50 | ImageNet-LT | iNat18 |
| --- | --- | --- | --- | --- | --- | --- |
| Training images | 12,406 | 13,996 | 10,847 | 12,608 | 115,846 | 437,513 |
| Classes | 10 | 10 | 100 | 100 | 1,000 | 8,142 |
| Max images | 5,000 | 5,000 | 500 | 500 | 1,280 | 1,000 |
| Min images | 50 | 100 | 5 | 10 | 5 | 2 |
| Imbalance factor | 100 | 50 | 100 | 50 | 256 | 500 |

-  `Max images` and `Min images` represent the number of training images in the largest and smallest classes, respectively.

-  CIFAR-10-LT-100 means the long-tailed CIFAR-10 dataset with the imbalance factor $\beta = 100$.

-  Imbalance factor is defined as $\beta = \frac{\text{Max images}}{\text{Min images}}$.

  • Data format

The annotation of a dataset is a dict consisting of two fields: annotations and num_classes. The field annotations is a list of dicts with image_id, fpath, im_height, im_width and category_id.

Here is an example.

{
    'annotations': [
                    {
                        'image_id': 1,
                        'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                        'im_height': 600,
                        'im_width': 800,
                        'category_id': 7477
                    },
                    ...
                   ],
    'num_classes': 8142
}
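To check a converted dataset against the statistics above, the per-class image counts and the imbalance factor can be computed directly from an annotation dict. The following is a minimal sketch that assumes the generated files are plain JSON with the fields shown in the example; the helper names and the toy paths are hypothetical.

```python
import json
from collections import Counter


def load_annotation(json_path):
    """Load an annotation file (assumed to be plain JSON in the format above)."""
    with open(json_path, "r") as f:
        return json.load(f)


def class_counts(annotation):
    """Count the number of images for every category_id."""
    return Counter(item["category_id"] for item in annotation["annotations"])


def imbalance_factor(annotation):
    """Max images / Min images, taken over the classes that appear in the annotations."""
    counts = class_counts(annotation)
    return max(counts.values()) / min(counts.values())


# Toy annotation with made-up paths: 4 images of class 0 and 1 image of class 1.
toy_annotation = {
    "annotations": [
        {
            "image_id": i,
            "fpath": f"/hypothetical/path/{i}.jpg",
            "im_height": 32,
            "im_width": 32,
            "category_id": 0 if i < 4 else 1,
        }
        for i in range(5)
    ],
    "num_classes": 2,
}

print(class_counts(toy_annotation))
print(imbalance_factor(toy_annotation))
```

Note that a class without any image does not appear in the counts, so comparing `len(class_counts(annotation))` with `num_classes` is a quick way to spot empty classes.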
  • CIFAR-LT

    There are two versions of CIFAR-LT.

    1. Cui et al., CVPR 2019 first proposed CIFAR-LT. They provided a download link for CIFAR-LT, as well as the code to generate the data, which is written in TensorFlow.

      You can follow the steps below to get this version of CIFAR-LT:

      1. Download Cui's CIFAR-LT from GoogleDrive or Baidu Netdisk (password: 5rsq). Suppose you download the data and unzip it at path /downloaded/data/.
      2. Run tools/convert_from_tfrecords.py, and the converted CIFAR-LT and the corresponding jsons will be generated at /downloaded/converted/.
    # Convert from the original format of CIFAR-LT
    python tools/convert_from_tfrecords.py  --input_path /downloaded/data/ --out_path /downloaded/converted/
    2. Cao et al., NeurIPS 2019 followed the method of Cui et al., CVPR 2019 to generate CIFAR-LT randomly. They modify the CIFAR datasets provided by PyTorch as this file shows.
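The per-class sizes of CIFAR-LT can be sketched as follows. This assumes the exponential profile commonly used to build CIFAR-LT, where the class size decays from Max images down to Max images divided by the imbalance factor; the function name is hypothetical, and the repo's own conversion tools remain the reference.

```python
def long_tailed_class_sizes(max_images, num_classes, imbalance_factor):
    """Number of training images kept per class under an exponential decay profile.

    Assumption: class i keeps int(max_images * (1 / imbalance_factor) ** (i / (num_classes - 1))) images.
    """
    sizes = []
    for class_index in range(num_classes):
        decay = (1.0 / imbalance_factor) ** (class_index / (num_classes - 1.0))
        sizes.append(int(max_images * decay))
    return sizes


# CIFAR-10 with an imbalance factor of 100: 5,000 images in the largest class.
cifar10_lt_100 = long_tailed_class_sizes(max_images=5000, num_classes=10, imbalance_factor=100)
print(cifar10_lt_100)       # [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
print(sum(cifar10_lt_100))  # 12406
```

Under this assumption the total for CIFAR-10-LT-100 is 12,406 training images, with 5,000 images in the largest class and 50 in the smallest.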
  • ImageNet-LT

    You can use the following steps to convert from the original images of ImageNet-LT.

    1. Download the original ILSVRC-2012. Suppose you have downloaded and reorganized the images at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
    2. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) from GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them at path /downloaded/ImageNet-LT/.
    3. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json.
    # Convert from the original format of ImageNet-LT
    python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaded/ImageNet/ --output_path ./
  • iNat18

    You can use the following steps to convert from the original format of iNaturalist 2018.

    1. First download the images and annotations from iNaturalist 2018. Suppose you have downloaded them at path /downloaded/iNat18/.
    2. Run tools/convert_from_iNat.py, and use the generated iNat18_train.json and iNat18_val.json to train.
    # Convert from the original format of iNaturalist
    # See tools/convert_from_iNat.py for more details of args 
    python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json
    
    python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json 

Usage

In this repo:

  • The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, are obtained by DataParallel training with apex.

  • The results of iNat18 (ResNet-50), which need more than one GPU to train, are obtained by DistributedDataParallel training with apex.

  • If more than one GPU is used, DistributedDataParallel training is more efficient than DataParallel training, especially when CPU computing resources are limited.

Training

Parallel training with DataParallel

1, To train
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,4`.
bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs

Distributed training with DistributedDataParallel

1, Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to [your own socket name]. 
export NCCL_SOCKET_IFNAME=[your own socket name]

2, To train
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
# `NUM_GPUs` are the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs
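Since `NUM_GPUs` must always match the number of ids in `GPUs`, the two arguments can be derived from a single string. The helper below is a hypothetical sketch (not part of this repo) that only assembles the command shown above; it does not run it.

```python
def build_distributed_command(config_path, gpus):
    """Build the argument list for distributed_data_parallel_train.sh.

    `gpus` is a comma-separated string of GPU ids such as "0,1,4";
    NUM_GPUs is derived from it so the two values cannot disagree.
    """
    gpu_ids = [g.strip() for g in gpus.split(",") if g.strip()]
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return [
        "bash",
        "distributed_data_parallel_train.sh",
        config_path,
        str(len(gpu_ids)),  # NUM_GPUs
        ",".join(gpu_ids),  # GPUs
    ]


command = build_distributed_command("configs/test/distributed_data_parallel.yaml", "0,1,4")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` from the repo root to start training.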

Validation

You can get the validation accuracy and the corresponding confusion matrix after running the following commands.

See main/valid.py for more details.

1, Change the TEST.MODEL_FILE in the yaml to your own path of the trained model firstly.
2, To do validation
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
python main/valid.py --cfg [Your yaml] --gpus GPUs
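For reference, the relation between the confusion matrix and the Top-1 error rate can be sketched in plain Python as below. This is an illustration only and is not the code of main/valid.py; the function names and the toy labels are hypothetical.

```python
def confusion_matrix(labels, predictions, num_classes):
    """matrix[t][p] counts the samples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(labels, predictions):
        matrix[true_class][predicted_class] += 1
    return matrix


def top1_error(matrix):
    """Top-1 error rate in percent: share of samples that lie off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * (1.0 - correct / total)


# Toy example with 3 classes and 6 validation samples.
labels = [0, 0, 0, 1, 1, 2]
predictions = [0, 0, 1, 1, 0, 2]
matrix = confusion_matrix(labels, predictions, num_classes=3)
print(matrix)
print(round(top1_error(matrix), 2))
```

Reading the matrix row by row shows which tail classes are most often confused with head classes, which a single error rate hides.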

The comparison between the baseline results using our codes and the references [Cui, Kang]

  • We use Top-1 error rates as our evaluation metric.
  • From the results on the two versions of CIFAR-LT, we can see that the CIFAR-LT provided by Cao yields much lower Top-1 error rates on CIFAR-10-LT than the baseline results reported in their paper. So, in our experiments, we use the CIFAR-LT of Cui for fairness.
  • For ImageNet-LT, we found that our original experiments did not include the color_jitter augmentation, which is adopted by other methods. So, in this repo, we add the color_jitter augmentation on ImageNet-LT. The old baseline without color_jitter has a Top-1 error rate of 64.89, which is 1.15 points higher than the new baseline.
  • The experimental settings and running commands of the baselines are listed with the table below.
| Datasets | Imbalance factor | Backbones | Baselines using our codes | Reference [Cui, Kang, Liu] |
| --- | --- | --- | --- | --- |
| CIFAR-10-LT (Cui et al., 2019) | 100 | ResNet-32 | 30.12 | 29.64 |
| CIFAR-10-LT (Cui et al., 2019) | 50 | ResNet-32 | 24.81 | 25.19 |
| CIFAR-100-LT (Cui et al., 2019) | 100 | ResNet-32 | 61.76 | 61.68 |
| CIFAR-100-LT (Cui et al., 2019) | 50 | ResNet-32 | 57.65 | 56.15 |
| CIFAR-10-LT (Cao et al., 2020) | 100 | ResNet-32 | 28.05 | 29.64 |
| CIFAR-10-LT (Cao et al., 2020) | 50 | ResNet-32 | 23.55 | 25.19 |
| CIFAR-100-LT (Cao et al., 2020) | 100 | ResNet-32 | 62.27 | 61.68 |
| CIFAR-100-LT (Cao et al., 2020) | 50 | ResNet-32 | 56.22 | 56.15 |
| ImageNet-LT | - | ResNet-10 | 63.74 | 64.40 |
| iNat18 | - | ResNet-50 | 40.55 | 42.86 |

Settings of the baselines using our codes:
  1. CONFIG (in table order, from top to bottom):
    • configs/cui_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • configs/ImageNet_LT/imagenetlt_baseline.yaml
    • configs/iNat18/iNat18_baseline.yaml

  2. Running commands:
    • For CIFAR-LT and ImageNet-LT: bash data_parallel_train.sh CONFIG GPU
    • For iNat18: bash distributed_data_parallel_train.sh configs/iNat18/iNat18_baseline.yaml NUM_GPUs GPUs

Citation

@inproceedings{zhang2020tricks,
  author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
  title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
  booktitle = {AAAI},
  year      = {2021},
}

Contacts

If you have any questions about our work, please do not hesitate to contact us via the emails provided in the paper.
