CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

Overview

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax

⚠️ Latest: Current repo is a complete version. But we delete many redundant codes and are still under testing now.

This repo is the official implementation for CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax. [Paper] [Supp] [Slides] [Video] [Code and models]

Note: Current code is still not very clean yet. We are still working on it, and it will be updated soon.

Framework

Requirements

1. Environment:

The requirements are exactly the same as mmdetection v1.0.rc0. We tested on on the following settings:

  • python 3.7
  • cuda 9.2
  • pytorch 1.3.1+cu92
  • torchvision 0.4.2+cu92
  • mmcv 0.2.14
HH=`pwd`
conda create -n mmdet python=3.7 -y
conda activate mmdet

pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install mmcv
pip install matplotlib
pip install terminaltables

cd lvis-api/
python setup.py develop

cd $HH
python setup.py develop

2. Data:

a. For dataset images:

# Make sure you are in dir BalancedGroupSoftmax

mkdir data
cd data
mkdir lvis
mkdir pretrained_models
  • If you already have COCO2017 dataset, it will be great. Link train2017 and val2017 folders under folder lvis.
  • If you do not have COCO2017 dataset, please download: COCO train set and COCO val set and unzip these files and mv them under folder lvis.

b. For dataset annotations:

To train HTC models, download COCO stuff annotations and change the name of folder stuffthingmaps_trainval2017 to stuffthingmaps.

c. For pretrained models:

Download the corresponding pre-trained models below.

  • To train baseline models, we need models trained on COCO to initialize. Please download the corresponding COCO models at mmdetection model zoo.
  • To train balanced group softmax models (shorted as gs models), we need corresponding baseline models trained on LVIS to initialize and fix all parameters except for the last FC layer.
  • Move these model files to ./data/pretrained_models/

d. For intermediate files (for BAGS and reweight models only):

You can either donwnload or generate them before training and testing. Put them under ./data/lvis/.

  • BAGS models: label2binlabel.pt, pred_slice_with0.pt, valsplit.pkl
  • Re-weight models: cls_weight.pt, cls_weight_bours.pt
  • RFS models: class_to_imageid_and_inscount.pt

After all these operations, the folder data should be like this:

    data
    ├── lvis
    │   ├── lvis_v0.5_train.json
    │   ├── lvis_v0.5_val.json
    │   ├── stuffthingmaps (Optional, for HTC models only)
    │   ├── label2binlabel.pt (Optional, for GAGS models only)
    │   ├── ...... (Other intermidiate files)
    │   │   ├── train2017
    │   │   │   ├── 000000004134.png
    │   │   │   ├── 000000031817.png
    │   │   │   ├── ......
    │   │   └── val2017
    │   │       ├── 000000424162.png
    │   │       ├── 000000445999.png
    │   │       ├── ......
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......
    └── pretrained_models
        ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
        ├── ......

Training

Note: Please make sure that you have prepared the pre-trained models and intermediate files and they have been put to the path specified in ${CONIFG_FILE}.

Use the following commands to train a model.

# Single GPU
python tools/train.py ${CONFIG_FILE}

# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

All config files are under ./configs/.

  • ./configs/bags: all models for Balanced Group Softmax.
  • ./configs/baselines: all baseline models.
  • ./configs/transferred: transferred models from long-tail image classification.
  • ./configs/ablations: models for ablation study.

For example, to train a BAGS model with Faster R-CNN R50-FPN:

# Single GPU
python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py

# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py 8

Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)

Testing

Note: Please make sure that you have prepared the intermediate files and they have been put to the path specified in ${CONIFG_FILE}.

Use the following commands to test a trained model.

# single gpu test
python tools/test_lvis.py \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test_lvis.sh \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
  • $RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
  • $EVAL_METRICS: Items to be evaluated on the results. bbox for bounding box evaluation only. bbox segm for bounding box and mask evaluation.

For example (assume that you have downloaded the corresponding model file to ./data/downloaded_models):

  • To evaluate the trained BAGS model with Faster R-CNN R50-FPN for object detection:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
 ./donwloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth \
  --out gs_box_result.pkl --eval bbox

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
./donwloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth 8 \
--out gs_box_result.pkl --eval bbox
  • To evaluate the trained BAGS model with Mask R-CNN R50-FPN for instance segmentation:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
 ./donwloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth \
  --out gs_mask_result.pkl --eval bbox segm

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
./donwloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth 8 \
--out gs_mask_result.pkl --eval bbox segm

The evaluation results will be shown in markdown table format:

| Type | IoU | Area | MaxDets | CatIds | Result |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  (AP)  | 0.50:0.95 |    all | 300 |          all | 25.96% |
|  (AP)  | 0.50      |    all | 300 |          all | 43.58% |
|  (AP)  | 0.75      |    all | 300 |          all | 27.15% |
|  (AP)  | 0.50:0.95 |      s | 300 |          all | 20.26% |
|  (AP)  | 0.50:0.95 |      m | 300 |          all | 32.81% |
|  (AP)  | 0.50:0.95 |      l | 300 |          all | 40.10% |
|  (AP)  | 0.50:0.95 |    all | 300 |            r | 17.66% |
|  (AP)  | 0.50:0.95 |    all | 300 |            c | 25.75% |
|  (AP)  | 0.50:0.95 |    all | 300 |            f | 29.55% |
|  (AR)  | 0.50:0.95 |    all | 300 |          all | 34.76% |
|  (AR)  | 0.50:0.95 |      s | 300 |          all | 24.77% |
|  (AR)  | 0.50:0.95 |      m | 300 |          all | 41.50% |
|  (AR)  | 0.50:0.95 |      l | 300 |          all | 51.64% |

Results and models

The main results on LVIS val set:

LVIS val results

Models:

Please refer to our paper and supp for more details.

ID Models bbox mAP / mask mAP Train Test Config file Pretrained Model Train part Model
(1) Faster R50-FPN 20.98 file COCO R50 All Google drive
(2) x2 21.93 file Model (1) All Google drive
(3) Finetune tail 22.28 × file Model (1) All Google drive
(4) RFS 23.41 file COCO R50 All Google drive
(5) RFS-finetune 22.66 file Model (1) All Google drive
(6) Re-weight 23.48 file Model (1) All Google drive
(7) Re-weight-cls 24.66 file Model (1) Cls Google drive
(8) Focal loss 11.12 × file Model (1) All Google drive
(9) Focal loss-cls 19.29 × file Model (1) Cls Google drive
(10) NCM-fc 16.02 × × Model (1)
(11) NCM-conv 12.56 × × Model (1)
(12) $\tau$-norm 11.01 × × Model (1) Cls
(13) $\tau$-norm-select 21.61 × × Model (1) Cls
(14) Ours (Faster R50-FPN) 25.96 file Model (1) Cls Google drive
(15) Faster X101-64x4d 24.63 file COCO x101 All Google drive
(16) Ours (Faster X101-64x4d) 27.83 file Model (15) Cls Google drive
(17) Cascade X101-64x4d 27.16 file COCO cascade x101 All Google drive
(18) Ours (Cascade X101-64x4d) 32.77 file Model (17) Cls Google drive
(19) Mask R50-FPN 20.78/20.68 file COCO mask r50 All Google drive
(20) Ours (Mask R50-FPN) 25.76/26.25 file Model (19) Cls Google drive
(21) HTC X101-64x4d 31.28/29.28 file COCO HTC x101 All Google drive
(22) Ours (HTC X101-64x4d) 33.68/31.20 file Model (21) Cls Google drive
(23) HTC X101-64x4d-MS-DCN 34.61/31.94 file COCO HTC x101-ms-dcn All Google drive
(24) Ours (HTC X101-64x4d-MS-DCN) 37.71/34.39 file Model (23) Cls Google drive

PS: in column Pretrained Model, the file of Model (n) is the same as the Google drive file in column Model in row (n).

Citation

@inproceedings{li2020overcoming,
  title={Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax},
  author={Li, Yu and Wang, Tao and Kang, Bingyi and Tang, Sheng and Wang, Chunfeng and Li, Jintao and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10991--11000},
  year={2020}
}

Credit

This code is largely based on mmdetection v1.0.rc0 and LVIS API.

Owner
FishYuLi
happy
FishYuLi
3rd place solution for the Weather4cast 2021 Stage 1 Challenge

weather4cast2021_Stage1 3rd place solution for the Weather4cast 2021 Stage 1 Challenge Dependencies The code can be executed from a fresh environment

5 Aug 14, 2022
Official implementation of the method ContIG, for self-supervised learning from medical imaging with genomics

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics This is the code implementation of the paper "ContIG: Self-s

Digital Health & Machine Learning 22 Dec 13, 2022
Codes and pretrained weights for winning submission of 2021 Brain Tumor Segmentation (BraTS) Challenge

Winning submission to the 2021 Brain Tumor Segmentation Challenge This repo contains the codes and pretrained weights for the winning submission to th

94 Dec 28, 2022
Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

Nader Akoury 27 Dec 20, 2022
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 06, 2022
Bayesian optimization in PyTorch

BoTorch is a library for Bayesian Optimization built on PyTorch. BoTorch is currently in beta and under active development! Why BoTorch ? BoTorch Prov

2.5k Dec 31, 2022
One Million Scenes for Autonomous Driving

ONCE Benchmark This is a reproduced benchmark for 3D object detection on the ONCE (One Million Scenes) dataset. The code is mainly based on OpenPCDet.

148 Dec 28, 2022
Attentional Focus Modulates Automatic Finger‑tapping Movements

"Attentional Focus Modulates Automatic Finger‑tapping Movements", in Scientific Reports

Xingxun Jiang 1 Dec 02, 2021
UMPNet: Universal Manipulation Policy Network for Articulated Objects

UMPNet: Universal Manipulation Policy Network for Articulated Objects Zhenjia Xu, Zhanpeng He, Shuran Song Columbia University Robotics and Automation

Columbia Artificial Intelligence and Robotics Lab 33 Dec 03, 2022
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
Another pytorch implementation of FCN (Fully Convolutional Networks)

FCN-pytorch-easiest Trying to be the easiest FCN pytorch implementation and just in a get and use fashion Here I use a handbag semantic segmentation f

Y. Dong 158 Dec 21, 2022
Measures input lag without dedicated hardware, performing motion detection on recorded or live video

What is InputLagTimer? This tool can measure input lag by analyzing a video where both the game controller and the game screen can be seen on a webcam

Bruno Gonzalez 4 Aug 18, 2022
Conversion between units used in magnetism

convmag Conversion between various units used in magnetism The conversions between base units available are: T - G : 1e4

0 Jul 15, 2021
Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation

Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation Experiment Setting: CIFAR10 (downloaded and saved in ./DATA

John Seon Keun Yi 38 Dec 27, 2022
Using machine learning to predict undergrad college admissions.

College-Prediction Project- Overview: Many have tried, many have failed. Few trailblazers are ambitious enought to chase acceptance into the top 15 un

John H Klinges 1 Jan 05, 2022
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Hong-Jia Chen 91 Dec 02, 2022
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021) This repository contains the code

149 Dec 15, 2022
Fully-automated scripts for collecting AI-related papers

AI-Paper-collector Fully-automated scripts for collecting AI-related papers List of Conferences to crawel ACL: 21-19 (including findings) EMNLP: 21-19

Gordon Lee 776 Jan 08, 2023
Bilinear attention networks for visual question answering

Bilinear Attention Networks This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entit

Jin-Hwa Kim 506 Nov 29, 2022