Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

Overview

SA-AutoAug

Scale-aware Automatic Augmentation for Object Detection

Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia

[Paper] [BibTeX]


This project provides the implementation for the CVPR 2021 paper "Scale-aware Automatic Augmentation for Object Detection". Scale-aware AutoAug provides a new search space and search metric to find effective data agumentation policies for object detection. It is implemented on maskrcnn-benchmark and FCOS. Both search and training codes have been released. To facilitate more use, we re-implement the training code based on Detectron2.

Installation

For maskrcnn-benchmark code, please follow INSTALL.md for instruction.

For FCOS code, please follow INSTALL.md for instruction.

For Detectron2 code, please follow INSTALL.md for instruction.

Search

(You can skip this step and directly train on our searched policies.)

To search with 8 GPUs, run:

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/search.py --config-file configs/SA_AutoAug/retinanet_R-50-FPN_search.yaml OURPUT_DIR /path/to/searchlog_dir

Since we finetune on an existing baseline model during search, a baseline model is needed. You can download this model for search, or you can use other Retinanet baseline model trained by yourself.

Training

To train the searched policies on maskrcnn-benchmark (FCOS)

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file configs/SA_AutoAug/CONFIG_FILE  OUTPUT_DIR /path/to/traininglog_dir

For example, to train the retinanet ResNet-50 model with our searched data augmentation policies in 6x schedule:

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file configs/SA_AutoAug/retinanet_R-50-FPN_6x.yaml  OUTPUT_DIR models/retinanet_R-50-FPN_6x_SAAutoAug

To train the searched policies on detectron2

cd /path/to/SA-AutoAug/detectron2
python3 ./tools/train_net.py --num-gpus 8 --config-file ./configs/COCO-Detection/SA_AutoAug/CONFIG_FILE OUTPUT_DIR /path/to/traininglog_dir

For example, to train the retinanet ResNet-50 model with our searched data augmentation policies in 6x schedule:

cd /path/to/SA-AutoAug/detectron2
python3 ./tools/train_net.py --num-gpus 8 --config-file ./configs/COCO-Detection/SA_AutoAug/retinanet_R_50_FPN_6x.yaml OUTPUT_DIR output_retinanet_R_50_FPN_6x_SAAutoAug

Results

We provide the results on COCO val2017 set with pretrained models.

Based on maskrcnn-benchmark

Method Backbone APbbox Download
Faster R-CNN ResNet-50 41.8 Model
Faster R-CNN ResNet-101 44.2 Model
RetinaNet ResNet-50 41.4 Model
RetinaNet ResNet-101 42.8 Model
Mask R-CNN ResNet-50 42.8 Model
Mask R-CNN ResNet-101 45.3 Model

Based on FCOS

Method Backbone APbbox Download
FCOS ResNet-50 42.6 Model
FCOS ResNet-101 44.0 Model
ATSS ResNext-101-32x8d-dcnv2 48.5 Model
ATSS ResNext-101-32x8d-dcnv2 (1200 size) 49.6 Model

Based on Detectron2

Method Backbone APbbox Download
Faster R-CNN ResNet-50 41.9 Model - Metrics
Faster R-CNN ResNet-101 44.2 Model - Metrics
RetinaNet ResNet-50 40.8 Model - Metrics
RetinaNet ResNet-101 43.1 Model - Metrics
Mask R-CNN ResNet-50 42.9 Model - Metrics
Mask R-CNN ResNet-101 45.6 Model - Metrics

Citing SA-AutoAug

Consider cite SA-Autoaug in your publications if it helps your research.

@inproceedings{saautoaug,
  title={Scale-aware Automatic Augmentation for Object Detection},
  author={Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Acknowledgments

This training code of this project is built on maskrcnn-benchmark, Detectron2, FCOS, and ATSS. The search code of this project is modified from DetNAS. Some augmentation code and settings follow AutoAug-Det. We thanks a lot for the authors of these projects.

Note that:

(1) We also provides script files for search and training in maskrcnn-benchmark, FCOS, and, detectron2.

(2) Any issues or pull requests on this project are welcome. In addition, if you meet problems when applying the augmentations to other datasets or codebase, feel free to contact Yukang Chen ([email protected]).

Owner
DV Lab
Deep Vision Lab
DV Lab
MAGMA - a GPT-style multimodal model that can understand any combination of images and language

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning Authors repo (alphabetical) Constantin (CoEich), Mayukh (Mayukh

Aleph Alpha GmbH 331 Jan 03, 2023
A comprehensive list of published machine learning applications to cosmology

ml-in-cosmology This github attempts to maintain a comprehensive list of published machine learning applications to cosmology, organized by subject ma

George Stein 290 Dec 29, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Pano3D A Holistic Benchmark and a Solid Baseline for 360o Depth Estimation Pano3D is a new benchmark for depth estimation from spherical panoramas. We

Visual Computing Lab, Information Technologies Institute, Centre for Reseach and Technology Hellas 50 Dec 29, 2022
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. Training python train.py --c

Rishikesh (ऋषिकेश) 55 Dec 26, 2022
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

229 Dec 13, 2022
[ICCV'21] Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

CKDN The official implementation of the ICCV2021 paper "Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment" O

Multimedia Research 50 Dec 13, 2022
Tool cek opsi checkpoint facebook!

tool apa ini? cek_opsi_facebook adalah sebuah tool yang mengecek opsi checkpoint akun facebook yang terkena checkpoint! tujuan dibuatnya tool ini? too

Muhammad Latif Harkat 2 Jul 17, 2022
My implementation of DeepMind's Perceiver

DeepMind Perceiver (in PyTorch) Disclaimer: This is not official and I'm not affiliated with DeepMind. My implementation of the Perceiver: General Per

Louis Arge 55 Dec 12, 2022
LaBERT - A length-controllable and non-autoregressive image captioning model.

Length-Controllable Image Captioning (ECCV2020) This repo provides the implemetation of the paper Length-Controllable Image Captioning. Install conda

bearcatt 53 Nov 13, 2022
Using deep learning model to detect breast cancer.

Breast-Cancer-Detection Breast cancer is the most frequent cancer among women, with around one in every 19 women at risk. The number of cases of breas

1 Feb 13, 2022
hySLAM is a hybrid SLAM/SfM system designed for mapping

HySLAM Overview hySLAM is a hybrid SLAM/SfM system designed for mapping. The system is based on ORB-SLAM2 with some modifications and refactoring. Raú

Brian Hopkinson 15 Oct 10, 2022
Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020 BibTeX @INPROCEEDINGS{punnappurath2020modeling, author={Abhi

Abhijith Punnappurath 22 Oct 01, 2022
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.5k Jan 04, 2023
Contrastive Learning for Metagenomic Binning

CLMB A simple framework for CLMB - a novel deep Contrastive Learningfor Metagenomic Binning Created by Pengfei Zhang, senior of Department of Computer

1 Sep 14, 2022
Code for the Active Speakers in Context Paper (CVPR2020)

Active Speakers in Context This repo contains the official code and models for the "Active Speakers in Context" CVPR 2020 paper. Before Training The c

43 Oct 14, 2022
A pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction"

ssnt-loss ℹ️ This is a WIP project. the implementation is still being tested. A pure PyTorch implementation of the loss described in "Online Segment t

張致強 1 Feb 09, 2022
TriMap: Large-scale Dimensionality Reduction Using Triplets

TriMap TriMap is a dimensionality reduction method that uses triplet constraints to form a low-dimensional embedding of a set of points. The triplet c

Ehsan Amid 235 Dec 24, 2022
[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.

mmTransformer Introduction This repo is official implementation for mmTransformer in pytorch. Currently, the core code of mmTransformer is implemented

DeciForce: Crossroads of Machine Perception and Autonomy 232 Dec 31, 2022
IsoGCN code for ICLR2021

IsoGCN The official implementation of IsoGCN, presented in the ICLR2021 paper Isometric Transformation Invariant and Equivariant Graph Convolutional N

horiem 39 Nov 25, 2022