CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

Overview

⚠️ Latest: The current repo is a complete version. We removed a lot of redundant code, and the result is still being tested.

This repo is the official implementation for CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax. [Paper] [Supp] [Slides] [Video] [Code and models]

Note: The current code is not very clean yet. We are still working on it and will update it soon.

Framework

Requirements

1. Environment:

The requirements are exactly the same as mmdetection v1.0.rc0. We tested on the following settings:

  • python 3.7
  • cuda 9.2
  • pytorch 1.3.1+cu92
  • torchvision 0.4.2+cu92
  • mmcv 0.2.14
HH=`pwd`
conda create -n mmdet python=3.7 -y
conda activate mmdet

pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install mmcv
pip install matplotlib
pip install terminaltables

cd lvis-api/
python setup.py develop

cd $HH
python setup.py develop
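
The pip commands above do not pin versions, so it can help to compare what was actually installed against the tested settings. The following is a minimal sketch and not part of this repo: `find_mismatches` is a hypothetical helper, and the `installed` dictionary has to be filled in from your own environment (for example from `torch.__version__`).

```python
# Tested versions as listed in the Requirements section.
TESTED_VERSIONS = {
    "torch": "1.3.1+cu92",
    "torchvision": "0.4.2+cu92",
    "mmcv": "0.2.14",
}


def find_mismatches(installed, tested=TESTED_VERSIONS):
    """Return {package: (installed, tested)} for every package that differs.

    A package that is missing from `installed` is reported with None.
    """
    mismatches = {}
    for name, wanted in tested.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own versions.
    example = {"torch": "1.3.1+cu92", "torchvision": "0.4.2+cu92", "mmcv": "0.5.0"}
    for name, (found, wanted) in find_mismatches(example).items():
        print(f"{name}: installed {found}, tested with {wanted}")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the configuration the authors report testing.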

2. Data:

a. For dataset images:

# Make sure you are in dir BalancedGroupSoftmax

mkdir data
cd data
mkdir lvis
mkdir pretrained_models
  • If you already have the COCO2017 dataset, link the train2017 and val2017 folders under the lvis folder.
  • If you do not have the COCO2017 dataset, download the COCO train set and COCO val set, unzip them, and move them under the lvis folder.

b. For dataset annotations:

To train HTC models, download the COCO stuff annotations and rename the folder stuffthingmaps_trainval2017 to stuffthingmaps.

c. For pretrained models:

Download the corresponding pre-trained models below.

  • To train baseline models, we initialize from models trained on COCO. Please download the corresponding COCO models at mmdetection model zoo.
  • To train balanced group softmax models (abbreviated as gs models), we initialize from the corresponding baseline models trained on LVIS and fix all parameters except for the last FC layer.
  • Move these model files to ./data/pretrained_models/
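
The gs models fix all parameters except the last FC layer. The sketch below illustrates that selection by parameter name. It is not this repo's code: `trainable_flags` is a hypothetical helper, and the prefix `bbox_head.fc_cls` is an assumed name for the classification layer, so check the parameter names of your own model before relying on it.

```python
def trainable_flags(param_names, trainable_prefixes=("bbox_head.fc_cls",)):
    """Map each parameter name to True if it should stay trainable.

    Only parameters whose name starts with one of `trainable_prefixes`
    are marked trainable; everything else is marked frozen.
    """
    prefixes = tuple(trainable_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    names = [
        "backbone.conv1.weight",
        "bbox_head.fc_reg.weight",
        "bbox_head.fc_cls.weight",
        "bbox_head.fc_cls.bias",
    ]
    for name, trainable in trainable_flags(names).items():
        print(f"{name}: {'train' if trainable else 'frozen'}")

# With a PyTorch model, the flags could be applied like this:
#     flags = trainable_flags(n for n, _ in model.named_parameters())
#     for n, p in model.named_parameters():
#         p.requires_grad = flags[n]
```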

d. For intermediate files (for BAGS, re-weight, and RFS models only):

You can either download or generate them before training and testing. Put them under ./data/lvis/.

  • BAGS models: label2binlabel.pt, pred_slice_with0.pt, valsplit.pkl
  • Re-weight models: cls_weight.pt, cls_weight_bours.pt
  • RFS models: class_to_imageid_and_inscount.pt

After all these operations, the folder data should be like this:

    data
    ├── lvis
    │   ├── lvis_v0.5_train.json
    │   ├── lvis_v0.5_val.json
    │   ├── stuffthingmaps (Optional, for HTC models only)
    │   │   ├── train2017
    │   │   │   ├── 000000004134.png
    │   │   │   ├── 000000031817.png
    │   │   │   ├── ......
    │   │   └── val2017
    │   │       ├── 000000424162.png
    │   │       ├── 000000445999.png
    │   │       ├── ......
    │   ├── label2binlabel.pt (Optional, for BAGS models only)
    │   ├── ...... (Other intermediate files)
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......
    └── pretrained_models
        ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
        ├── ......
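
Before training, the layout above can be checked programmatically. The following is a minimal sketch, not part of this repo: `missing_paths` is a hypothetical helper, and the two path lists only cover entries named in this README (the BAGS list is needed for BAGS models only).

```python
import os

# Files and folders named in the layout above, relative to the data root.
REQUIRED = [
    "lvis/lvis_v0.5_train.json",
    "lvis/lvis_v0.5_val.json",
    "lvis/train2017",
    "lvis/val2017",
    "pretrained_models",
]

# Intermediate files listed for BAGS models.
BAGS_FILES = [
    "lvis/label2binlabel.pt",
    "lvis/pred_slice_with0.pt",
    "lvis/valsplit.pkl",
]


def missing_paths(data_root, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `data_root`."""
    return [
        rel
        for rel in relative_paths
        if not os.path.exists(os.path.join(data_root, *rel.split("/")))
    ]


if __name__ == "__main__":
    # Run from the repository root, where the data folder was created.
    for rel in missing_paths("data", REQUIRED + BAGS_FILES):
        print("missing:", rel)
```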

Training

Note: Please make sure that you have prepared the pre-trained models and intermediate files and put them at the paths specified in ${CONFIG_FILE}.

Use the following commands to train a model.

# Single GPU
python tools/train.py ${CONFIG_FILE}

# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

All config files are under ./configs/.

  • ./configs/bags: all models for Balanced Group Softmax.
  • ./configs/baselines: all baseline models.
  • ./configs/transferred: transferred models from long-tail image classification.
  • ./configs/ablations: models for ablation study.

For example, to train a BAGS model with Faster R-CNN R50-FPN:

# Single GPU
python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py

# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py 8

Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)
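
The Linear Scaling Rule above can be written as a one-line computation. This is a minimal sketch: `scaled_lr` is a hypothetical helper, and the base learning rate of 0.02 for batch size 16 is inferred from the two examples in the note (it is the value that yields lr=0.01 for 4 GPUs * 2 img/gpu), so check the optimizer setting in the config file you actually use.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate proportionally to the total batch size.

    base_lr and base_batch_size describe the reference setting
    (assumed here: lr=0.02 for 8 GPUs * 2 img/gpu = 16 images).
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    for gpus, imgs in [(8, 2), (4, 2), (16, 4), (1, 2)]:
        print(f"{gpus} GPUs * {imgs} img/gpu -> lr={scaled_lr(gpus, imgs):g}")
```

If the config you train with uses a different default learning rate, pass it as `base_lr` instead of relying on the assumed value.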

Testing

Note: Please make sure that you have prepared the intermediate files and put them at the paths specified in ${CONFIG_FILE}.

Use the following commands to test a trained model.

# single gpu test
python tools/test_lvis.py \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test_lvis.sh \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
  • $RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
  • $EVAL_METRICS: Items to be evaluated on the results. bbox for bounding box evaluation only. bbox segm for bounding box and mask evaluation.
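
Since $RESULT_FILE is written in pickle format, it can be inspected with the standard library. The sketch below is hedged: `load_results` and `summarize` are hypothetical helpers, and the internal structure of the saved object is not described in this README, so the summary only reports the top-level type and length. If the file contains NumPy arrays, NumPy must be installed for loading to succeed.

```python
import pickle


def load_results(path):
    """Load a result file that was written in pickle format."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(results):
    """Report the top-level type and number of entries of a loaded result."""
    return {"type": type(results).__name__, "num_entries": len(results)}


if __name__ == "__main__":
    # File name taken from the testing example in this README.
    print(summarize(load_results("gs_box_result.pkl")))
```

Only unpickle files that you produced yourself or that come from a source you trust, because loading a pickle can execute arbitrary code.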

For example (assume that you have downloaded the corresponding model file to ./data/downloaded_models):

  • To evaluate the trained BAGS model with Faster R-CNN R50-FPN for object detection:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
 ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth \
 --out gs_box_result.pkl --eval bbox

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
 ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth 8 \
 --out gs_box_result.pkl --eval bbox
  • To evaluate the trained BAGS model with Mask R-CNN R50-FPN for instance segmentation:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
 ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth \
 --out gs_mask_result.pkl --eval bbox segm

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
 ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth 8 \
 --out gs_mask_result.pkl --eval bbox segm

The evaluation results will be shown in markdown table format:

| Type | IoU | Area | MaxDets | CatIds | Result |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  (AP)  | 0.50:0.95 |    all | 300 |          all | 25.96% |
|  (AP)  | 0.50      |    all | 300 |          all | 43.58% |
|  (AP)  | 0.75      |    all | 300 |          all | 27.15% |
|  (AP)  | 0.50:0.95 |      s | 300 |          all | 20.26% |
|  (AP)  | 0.50:0.95 |      m | 300 |          all | 32.81% |
|  (AP)  | 0.50:0.95 |      l | 300 |          all | 40.10% |
|  (AP)  | 0.50:0.95 |    all | 300 |            r | 17.66% |
|  (AP)  | 0.50:0.95 |    all | 300 |            c | 25.75% |
|  (AP)  | 0.50:0.95 |    all | 300 |            f | 29.55% |
|  (AR)  | 0.50:0.95 |    all | 300 |          all | 34.76% |
|  (AR)  | 0.50:0.95 |      s | 300 |          all | 24.77% |
|  (AR)  | 0.50:0.95 |      m | 300 |          all | 41.50% |
|  (AR)  | 0.50:0.95 |      l | 300 |          all | 51.64% |

Results and models

The main results on LVIS val set:

LVIS val results

Models:

Please refer to our paper and supp for more details.

| ID | Models | bbox mAP / mask mAP | Train / Test | Config file | Pretrained Model | Train part | Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| (1) | Faster R50-FPN | 20.98 | | file | COCO R50 | All | Google drive |
| (2) | x2 | 21.93 | | file | Model (1) | All | Google drive |
| (3) | Finetune tail | 22.28 | × | file | Model (1) | All | Google drive |
| (4) | RFS | 23.41 | | file | COCO R50 | All | Google drive |
| (5) | RFS-finetune | 22.66 | | file | Model (1) | All | Google drive |
| (6) | Re-weight | 23.48 | | file | Model (1) | All | Google drive |
| (7) | Re-weight-cls | 24.66 | | file | Model (1) | Cls | Google drive |
| (8) | Focal loss | 11.12 | × | file | Model (1) | All | Google drive |
| (9) | Focal loss-cls | 19.29 | × | file | Model (1) | Cls | Google drive |
| (10) | NCM-fc | 16.02 | × × | | Model (1) | | |
| (11) | NCM-conv | 12.56 | × × | | Model (1) | | |
| (12) | $\tau$-norm | 11.01 | × × | | Model (1) | Cls | |
| (13) | $\tau$-norm-select | 21.61 | × × | | Model (1) | Cls | |
| (14) | Ours (Faster R50-FPN) | 25.96 | | file | Model (1) | Cls | Google drive |
| (15) | Faster X101-64x4d | 24.63 | | file | COCO x101 | All | Google drive |
| (16) | Ours (Faster X101-64x4d) | 27.83 | | file | Model (15) | Cls | Google drive |
| (17) | Cascade X101-64x4d | 27.16 | | file | COCO cascade x101 | All | Google drive |
| (18) | Ours (Cascade X101-64x4d) | 32.77 | | file | Model (17) | Cls | Google drive |
| (19) | Mask R50-FPN | 20.78/20.68 | | file | COCO mask r50 | All | Google drive |
| (20) | Ours (Mask R50-FPN) | 25.76/26.25 | | file | Model (19) | Cls | Google drive |
| (21) | HTC X101-64x4d | 31.28/29.28 | | file | COCO HTC x101 | All | Google drive |
| (22) | Ours (HTC X101-64x4d) | 33.68/31.20 | | file | Model (21) | Cls | Google drive |
| (23) | HTC X101-64x4d-MS-DCN | 34.61/31.94 | | file | COCO HTC x101-ms-dcn | All | Google drive |
| (24) | Ours (HTC X101-64x4d-MS-DCN) | 37.71/34.39 | | file | Model (23) | Cls | Google drive |

PS: in column Pretrained Model, the file of Model (n) is the same as the Google drive file in column Model in row (n).

Citation

@inproceedings{li2020overcoming,
  title={Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax},
  author={Li, Yu and Wang, Tao and Kang, Bingyi and Tang, Sheng and Wang, Chunfeng and Li, Jintao and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10991--11000},
  year={2020}
}

Credit

This code is largely based on mmdetection v1.0.rc0 and LVIS API.
