Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax
This repo is the official implementation of the CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax. [Paper] [Supp] [Slides] [Video] [Code and models]
Note: The current code is not fully cleaned up yet. We are still working on it and will update it soon.
Requirements
1. Environment:
The requirements are exactly the same as mmdetection v1.0.rc0. We tested with the following settings:
- python 3.7
- cuda 9.2
- pytorch 1.3.1+cu92
- torchvision 0.4.2+cu92
- mmcv 0.2.14
```shell
HH=`pwd`

conda create -n mmdet python=3.7 -y
conda activate mmdet

pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install mmcv
pip install matplotlib
pip install terminaltables

cd lvis-api/
python setup.py develop

cd $HH
python setup.py develop
```
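Optionally, verify that the installed versions match the tested settings above (a quick check we suggest; it only prints versions):

```shell
# Each command prints a version that should match the tested settings.
python -c "import torch; print('torch:', torch.__version__, '| cuda:', torch.version.cuda)"
python -c "import torchvision; print('torchvision:', torchvision.__version__)"
python -c "import mmcv; print('mmcv:', mmcv.__version__)"
```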
2. Data:
a. For dataset images:
```shell
# Make sure you are in dir BalancedGroupSoftmax
mkdir data
cd data
mkdir lvis
mkdir pretrained_models
```
- If you already have the COCO2017 dataset, great: simply link the `train2017` and `val2017` folders under the folder `lvis` (see the snippet after this list).
- If you do not have the COCO2017 dataset, please download the COCO train set and COCO val set, unzip them, and move them under the folder `lvis`.
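For example, if COCO2017 already lives elsewhere on disk, symlinking avoids copying (the source path below is a placeholder; adjust it to your setup):

```shell
# /path/to/coco2017 is a placeholder for your existing COCO2017 location.
ln -s /path/to/coco2017/train2017 data/lvis/train2017
ln -s /path/to/coco2017/val2017 data/lvis/val2017
```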
b. For dataset annotations:
- Download the LVIS annotations: lvis train ann and lvis val ann.
- Unzip all the files and put them under `lvis`.
- To train HTC models, download the COCO stuff annotations and rename the folder `stuffthingmaps_trainval2017` to `stuffthingmaps` (example below).
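A sketch of that step, assuming the stuff annotations were downloaded as `stuffthingmaps_trainval2017.zip` and unpack into a folder of the same name (both names are assumptions; adjust to what you actually downloaded):

```shell
# Assumes the archive and its top-level folder are named stuffthingmaps_trainval2017.
unzip stuffthingmaps_trainval2017.zip -d data/lvis/
mv data/lvis/stuffthingmaps_trainval2017 data/lvis/stuffthingmaps
```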
c. For pretrained models:
Download the corresponding pre-trained models below.
- To train baseline models, we need models trained on COCO to initialize. Please download the corresponding COCO models at mmdetection model zoo.
- To train balanced group softmax models (referred to as `gs` models), we need the corresponding baseline models trained on LVIS to initialize, and we fix all parameters except for the last FC layer.
- Move these model files to `./data/pretrained_models/` (see the example below).
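For instance, to stage the downloaded COCO Faster R-CNN R50-FPN checkpoint (the filename matches the one shown in the folder listing below):

```shell
# Move the downloaded checkpoint into the folder the configs expect.
mv faster_rcnn_r50_fpn_2x_20181010-443129e1.pth data/pretrained_models/
```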
d. For intermediate files (for BAGS and reweight models only):
You can either download or generate them before training and testing. Put them under `./data/lvis/` (a staging example follows the list).
- BAGS models: `label2binlabel.pt`, `pred_slice_with0.pt`, `valsplit.pkl`
- Re-weight models: `cls_weight.pt`, `cls_weight_bours.pt`
- RFS models: `class_to_imageid_and_inscount.pt`
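For example, after downloading the three BAGS files, they can be moved into place like this (the same pattern applies to the re-weight and RFS files):

```shell
# Stage the BAGS intermediate files under data/lvis/.
mv label2binlabel.pt pred_slice_with0.pt valsplit.pkl data/lvis/
```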
After all these operations, the `data` folder should look like this:
```
data
├── lvis
│   ├── lvis_v0.5_train.json
│   ├── lvis_v0.5_val.json
│   ├── label2binlabel.pt (Optional, for BAGS models only)
│   ├── ...... (other intermediate files)
│   ├── stuffthingmaps (Optional, for HTC models only)
│   │   ├── train2017
│   │   │   ├── 000000004134.png
│   │   │   ├── 000000031817.png
│   │   │   ├── ......
│   │   └── val2017
│   │       ├── 000000424162.png
│   │       ├── 000000445999.png
│   │       ├── ......
│   ├── train2017
│   │   ├── 000000100582.jpg
│   │   ├── 000000102411.jpg
│   │   ├── ......
│   └── val2017
│       ├── 000000062808.jpg
│       ├── 000000119038.jpg
│       ├── ......
└── pretrained_models
    ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
    ├── ......
```
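Before training, a quick listing can confirm the layout is in place (these paths come from the tree above):

```shell
# Both commands should succeed and list the expected files.
ls data/lvis/lvis_v0.5_train.json data/lvis/lvis_v0.5_val.json
ls data/pretrained_models/
```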
Training
Note: Please make sure that you have prepared the pre-trained models and intermediate files and placed them at the paths specified in `${CONFIG_FILE}`.
Use the following commands to train a model.
```shell
# Single GPU
python tools/train.py ${CONFIG_FILE}

# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
All config files are under `./configs/`.
- `./configs/bags`: all models for Balanced Group Softmax.
- `./configs/baselines`: all baseline models.
- `./configs/transferred`: transferred models from long-tail image classification.
- `./configs/ablations`: models for the ablation study.
For example, to train a BAGS model with Faster R-CNN R50-FPN:
```shell
# Single GPU
python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py

# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py 8
```
Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)
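For instance, with the default lr=0.02 at batch size 16, moving to 4 GPUs * 2 img/gpu (batch size 8) halves the learning rate to 0.01. A hedged one-liner for that edit, assuming the optimizer line in the config literally contains `lr=0.02`:

```shell
# Assumption: the config's optimizer line reads "lr=0.02" (the 8 GPUs * 2 img/gpu default).
# New lr = 0.02 * (4 * 2) / 16 = 0.01.
sed -i 's/lr=0.02/lr=0.01/' configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py
```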
Testing
Note: Please make sure that you have prepared the intermediate files and placed them at the paths specified in `${CONFIG_FILE}`.
Use the following commands to test a trained model.
```shell
# single-gpu testing
python tools/test_lvis.py \
    ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test_lvis.sh \
    ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```
- `$RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `$EVAL_METRICS`: Items to be evaluated on the results. `bbox` for bounding box evaluation only; `bbox segm` for bounding box and mask evaluation.
For example (assuming you have downloaded the corresponding model files to `./data/downloaded_models`):
- To evaluate the trained BAGS model with Faster R-CNN R50-FPN for object detection:
```shell
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
    ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth \
    --out gs_box_result.pkl --eval bbox

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
    ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth 8 \
    --out gs_box_result.pkl --eval bbox
```
- To evaluate the trained BAGS model with Mask R-CNN R50-FPN for instance segmentation:
```shell
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
    ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth \
    --out gs_mask_result.pkl --eval bbox segm

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
    ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth 8 \
    --out gs_mask_result.pkl --eval bbox segm
```
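If you passed `--out`, the saved pickle can be inspected directly. A minimal sketch, assuming the standard mmdetection result format (one entry per evaluated image):

```shell
# Prints the container type and the number of evaluated images.
python -c "import pickle; r = pickle.load(open('gs_mask_result.pkl', 'rb')); print(type(r), len(r))"
```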
The evaluation results will be shown in markdown table format (`s`/`m`/`l` under Area denote small/medium/large objects; `r`/`c`/`f` under CatIds denote rare/common/frequent categories):
| Type | IoU | Area | MaxDets | CatIds | Result |
| :---: | :---: | :---: | :---: | :---: | :---: |
| (AP) | 0.50:0.95 | all | 300 | all | 25.96% |
| (AP) | 0.50 | all | 300 | all | 43.58% |
| (AP) | 0.75 | all | 300 | all | 27.15% |
| (AP) | 0.50:0.95 | s | 300 | all | 20.26% |
| (AP) | 0.50:0.95 | m | 300 | all | 32.81% |
| (AP) | 0.50:0.95 | l | 300 | all | 40.10% |
| (AP) | 0.50:0.95 | all | 300 | r | 17.66% |
| (AP) | 0.50:0.95 | all | 300 | c | 25.75% |
| (AP) | 0.50:0.95 | all | 300 | f | 29.55% |
| (AR) | 0.50:0.95 | all | 300 | all | 34.76% |
| (AR) | 0.50:0.95 | s | 300 | all | 24.77% |
| (AR) | 0.50:0.95 | m | 300 | all | 41.50% |
| (AR) | 0.50:0.95 | l | 300 | all | 51.64% |
Results and models
The main results on LVIS val set:
Models:
Please refer to our paper and supp for more details.
| ID | Models | bbox mAP / mask mAP | Train | Test | Config file | Pretrained Model | Train part | Model |
|---|---|---|---|---|---|---|---|---|
| (1) | Faster R50-FPN | 20.98 | √ | √ | file | COCO R50 | All | Google drive |
| (2) | x2 | 21.93 | √ | √ | file | Model (1) | All | Google drive |
| (3) | Finetune tail | 22.28 | × | √ | file | Model (1) | All | Google drive |
| (4) | RFS | 23.41 | √ | √ | file | COCO R50 | All | Google drive |
| (5) | RFS-finetune | 22.66 | √ | √ | file | Model (1) | All | Google drive |
| (6) | Re-weight | 23.48 | √ | √ | file | Model (1) | All | Google drive |
| (7) | Re-weight-cls | 24.66 | √ | √ | file | Model (1) | Cls | Google drive |
| (8) | Focal loss | 11.12 | × | √ | file | Model (1) | All | Google drive |
| (9) | Focal loss-cls | 19.29 | × | √ | file | Model (1) | Cls | Google drive |
| (10) | NCM-fc | 16.02 | × | × | | Model (1) | | |
| (11) | NCM-conv | 12.56 | × | × | | Model (1) | | |
| (12) | $\tau$-norm | 11.01 | × | × | | Model (1) | Cls | |
| (13) | $\tau$-norm-select | 21.61 | × | × | | Model (1) | Cls | |
| (14) | Ours (Faster R50-FPN) | 25.96 | √ | √ | file | Model (1) | Cls | Google drive |
| (15) | Faster X101-64x4d | 24.63 | √ | √ | file | COCO x101 | All | Google drive |
| (16) | Ours (Faster X101-64x4d) | 27.83 | √ | √ | file | Model (15) | Cls | Google drive |
| (17) | Cascade X101-64x4d | 27.16 | √ | √ | file | COCO cascade x101 | All | Google drive |
| (18) | Ours (Cascade X101-64x4d) | 32.77 | √ | √ | file | Model (17) | Cls | Google drive |
| (19) | Mask R50-FPN | 20.78/20.68 | √ | √ | file | COCO mask r50 | All | Google drive |
| (20) | Ours (Mask R50-FPN) | 25.76/26.25 | √ | √ | file | Model (19) | Cls | Google drive |
| (21) | HTC X101-64x4d | 31.28/29.28 | √ | √ | file | COCO HTC x101 | All | Google drive |
| (22) | Ours (HTC X101-64x4d) | 33.68/31.20 | √ | √ | file | Model (21) | Cls | Google drive |
| (23) | HTC X101-64x4d-MS-DCN | 34.61/31.94 | √ | √ | file | COCO HTC x101-ms-dcn | All | Google drive |
| (24) | Ours (HTC X101-64x4d-MS-DCN) | 37.71/34.39 | √ | √ | file | Model (23) | Cls | Google drive |
PS: In the `Pretrained Model` column, the file for `Model (n)` is the same as the `Google drive` file in the `Model` column of row `(n)`.
Citation
```
@inproceedings{li2020overcoming,
  title={Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax},
  author={Li, Yu and Wang, Tao and Kang, Bingyi and Tang, Sheng and Wang, Chunfeng and Li, Jintao and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10991--11000},
  year={2020}
}
```
Credit
This code is largely based on mmdetection v1.0.rc0 and LVIS API.

