[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning

Overview

SoCo

[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning

By Fangyun Wei*, Yue Gao*, Zhirong Wu, Han Hu, Stephen Lin.

* Equal contribution.

Introduction

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. Such generality for transfer learning, however, sacrifices specificity if we are interested in a certain downstream task. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. In this paper, we follow this principle with a pretraining method specifically designed for the task of object detection. We attain alignment in the following three aspects:

  1. object-level representations are introduced via selective search bounding boxes as object proposals;
  2. the pretraining network architecture incorporates the same dedicated modules used in the detection pipeline (e.g. FPN);
  3. the pretraining is equipped with object detection properties such as object-level translation invariance and scale invariance. Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection using a Mask R-CNN framework.

Architecture

Main results

The pretrained models will be available soon.

SoCo pre-trained models

Model Arch Epochs Scripts Download
SoCo ResNet50-C4 100 SoCo_C4_100ep
SoCo ResNet50-C4 400 SoCo_C4_400ep
SoCo ResNet50-FPN 100 SoCo_FPN_100ep
SoCo ResNet50-FPN 400 SoCo_FPN_400ep
SoCo* ResNet50-FPN 400 SoCo_FPN_Star_400ep

Results on COCO with MaskRCNN R50-FPN

Methods Epoch APbb APbb50 APbb75 APmk APmk50 APmk75 Detectron2 trained
Scratch - 31.0 49.5 33.2 28.5 46.8 30.4 --
Supervised 90 38.9 59.6 42.7 35.4 56.5 38.1 --
SoCo 100 42.3 62.5 46.5 37.6 59.1 40.5
SoCo 400 43.0 63.3 47.1 38.2 60.2 41.0
SoCo* 400 43.2 63.5 47.4 38.4 60.2 41.4

Results on COCO with MaskRCNN R50-C4

Methods Epoch APbb APbb50 APbb75 APmk APmk50 APmk75 Detectron2 trained
Scratch - 26.4 44.0 27.8 29.3 46.9 30.8 --
Supervised 90 38.2 58.2 41.2 33.3 54.7 35.2 --
SoCo 100 40.4 60.4 43.7 34.9 56.8 37.0
SoCo 400 40.9 60.9 44.3 35.3 57.5 37.3

Get started

Requirements

The Dockerfile is included, please refer to it.

Prepare data with Selective Search

  1. Generate Selective Search proposals
    python selective_search/generate_imagenet_ss_proposals.py
  2. Filter out not valid proposals with filter strategy
    python selective_search/filter_ss_proposals_json.py
  3. Post preprocessing for no proposals images
    python selective_search/filter_ss_proposals_json_post_no_prop.py

Pretrain with SoCo

Use SoCo FPN 100 epoch as example.

bash ./tools/SoCo_FPN_100ep.sh

Finetune detector

  1. Copy the folder detectron2_configs to the root folder of Detectron2
  2. Train the detectors with Detectron2

Citation

@article{wei2021aligning,
  title={Aligning Pretraining for Detection via Object-Level Contrastive Learning},
  author={Wei, Fangyun and Gao, Yue and Wu, Zhirong and Hu, Han and Lin, Stephen},
  journal={arXiv preprint arXiv:2106.02637},
  year={2021}
}
Owner
Yue Gao
Researcher at Microsoft Research Asia
Yue Gao
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Pytorch implementation of PCT: Point Cloud Transformer

PCT: Point Cloud Transformer This is a Pytorch implementation of PCT: Point Cloud Transformer.

Yi_Zhang 265 Dec 22, 2022
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Fair-SSL Source Code for ICSE 2022 Paper - Can We Achieve Fairness Using Semi-Supervised Learning? Ethical bias in machine learning models has become

1 Dec 18, 2021
Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Introduction This repository is about paper SpeakerGAN , and is unofficially implemented by Mingming Huang ( 7 Jan 03, 2023

[CVPR 2022] Structured Sparse R-CNN for Direct Scene Graph Generation

Structured Sparse R-CNN for Direct Scene Graph Generation Our paper Structured Sparse R-CNN for Direct Scene Graph Generation has been accepted by CVP

Multimedia Computing Group, Nanjing University 44 Dec 23, 2022
Implementation of Squeezenet in pytorch, pretrained models on Cifar 10 data to come

Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque

gaurav pathak 86 Oct 28, 2022
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).

What is judgyprophet? judgyprophet is a Bayesian forecasting algorithm based on Prophet, that enables forecasting while using information known by the

AstraZeneca 56 Oct 26, 2022
a morph transfer UGATIT for image translation.

Morph-UGATIT a morph transfer UGATIT for image translation. Introduction 中文技术文档 This is Pytorch implementation of UGATIT, paper "U-GAT-IT: Unsupervise

55 Nov 14, 2022
[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Fudan Zhang Vision Group 897 Jan 05, 2023
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Auto-exposure fusion for single-image shadow removal We propose a new method for effective shadow removal by regarding it as an exposure fusion proble

Qing Guo 146 Dec 31, 2022
AI Based Smart Exam Proctoring Package

AI Based Smart Exam Proctoring Package It takes image (base64) as input: Provide Output as: Detection of Mobile phone. Detection of More than 1 person

NARENDER KESWANI 3 Sep 09, 2022
Image morphing without reference points by applying warp maps and optimizing over them.

Differentiable Morphing Image morphing without reference points by applying warp maps and optimizing over them. Differentiable Morphing is machine lea

Alex K 380 Dec 19, 2022
Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

How The New York Times can increase Engagement on Facebook Using machine learning to understand characteristics of news content that garners "high" Fa

Jessica Miles 0 Sep 16, 2021
Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

Multimedia Research 484 Dec 29, 2022
Code and Resources for the Transformer Encoder Reasoning Network (TERN)

Transformer Encoder Reasoning Network Code for the cross-modal visual-linguistic retrieval method from "Transformer Reasoning Network for Image-Text M

Nicola Messina 53 Dec 30, 2022
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Medical-Transformer Pytorch Code for the paper "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" About this repo: This repo

Jeya Maria Jose 615 Dec 25, 2022
Stochastic Scene-Aware Motion Prediction

Stochastic Scene-Aware Motion Prediction [Project Page] [Paper] Description This repository contains the training code for MotionNet and GoalNet of SA

Mohamed Hassan 31 Dec 09, 2022
Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022