[CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Related tags

Deep Learningup-detr
Overview

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

This is the official PyTorch implementation and models for UP-DETR paper:

@article{dai2020up-detr,
  author  = {Zhigang Dai and Bolun Cai and Yugeng Lin and Junying Chen},
  title   = {UP-DETR: Unsupervised Pre-training for Object Detection with Transformers},
  journal = {arXiv preprint arXiv:2011.09094},
  year    = {2020},
}

In UP-DETR, we introduce a novel pretext named random query patch detection to pre-train transformers for object detection. UP-DETR inherits from DETR with the same ResNet-50 backbone, same Transformer encoder, decoder and same codebase. With unsupervised pre-training CNN, the whole UP-DETR model doesn't require any human annotations. UP-DETR achieves 43.1 AP on COCO with 300 epochs fine-tuning. The AP of open-source version is a little higher than paper report.

UP-DETR

Model Zoo

We provide pre-training UP-DETR and fine-tuning UP-DETR models on COCO, and plan to include more in future. The evaluation metric is same to DETR.

Here is the UP-DETR model pre-trained on ImageNet without labels. The CNN weight is initialized from SwAV, which is fixed during the transformer pre-training:

name backbone epochs url size md5
UP-DETR R50 (SwAV) 60 model | logs 164Mb 49f01f8b

Comparision with DETR:

name backbone (pre-train) epochs box AP url size
DETR R50 (Supervised) 500 42.0 - 159Mb
DETR R50 (SwAV) 300 42.1 - 159Mb
UP-DETR R50 (SwAV) 300 43.1 model | logs 159Mb

COCO val5k evaluation results of UP-DETR can be found in this gist.

Usage - Object Detection

There are no extra compiled components in UP-DETR and package dependencies are same to DETR. We provide instructions how to install dependencies via conda:

git clone tbd
conda install -c pytorch pytorch torchvision
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

UP-DETR follows two steps: pre-training and fine-tuning. We present the model pre-trained on ImageNet and then fine-tuned on COCO.

Unsupervised Pre-training

Data Preparation

Download and extract ILSVRC2012 train dataset.

We expect the directory structure to be the following:

path/to/imagenet/
  n06785654/  # caterogey directory
    n06785654_16140.JPEG # images
  n04584207/  # caterogey directory
    n04584207_14322.JPEG # images

Images can be organized disorderly because our pre-training is unsupervised.

Pre-training

To pr-train UP-DETR on a single node with 8 gpus for 60 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --lr_drop 40 \
    --epochs 60 \
    --pre_norm \
    --num_patches 10 \
    --batch_size 32 \
    --feature_recon \
    --fre_cnn \
    --imagenet_path path/to/imagenet \
    --output_dir path/to/save_model

As the size of pre-training images is relative small, so we can set a large batch size.

It takes about 2 hours for a epoch, so 60 epochs pre-training takes about 5 days with 8 V100 gpus.

In our further ablation experiment, we found that object query shuffle is not helpful. So, we remove it in the open-source version.

Fine-tuning

Data Preparation

Download and extract COCO 2017 dataset train and val dataset.

The directory structure is expected as follows:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images

Fine-tuning

To fine-tune UP-DETR with 8 gpus for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env detr_main.py \
    --lr_drop 200 \
    --epochs 300 \
    --lr_backbone 5e-4 \
    --pre_norm \
    --coco_path path/to/coco \
    --pretrain path/to/save_model/checkpoint.pth

The fine-tuning cost is exactly same to DETR, which takes 28 minutes with 8 V100 gpus. So, 300 epochs training takes about 6 days.

The model can also extended to panoptic segmentation, checking more details on DETR.

Notebook

We provide a notebook in colab to get the visualization result in the paper:

  • Visualization Notebook: This notebook shows how to perform query patch detection with the pre-training model (without any annotations fine-tuning).

vis

License

UP-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Owner
dddzg
MSc student at SCUT
dddzg
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Gra

Yuren Cong 119 Jan 01, 2023
SMCA replication There are no extra compiled components in SMCA DETR and package dependencies are minimal

Usage There are no extra compiled components in SMCA DETR and package dependencies are minimal, so the code is very simple to use. We provide instruct

22 May 06, 2022
Joint parameterization and fitting of stroke clusters

StrokeStrip: Joint Parameterization and Fitting of Stroke Clusters Dave Pagurek van Mossel1, Chenxi Liu1, Nicholas Vining1,2, Mikhail Bessmeltsev3, Al

Dave Pagurek 44 Dec 01, 2022
🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020) This is the official implementation of RandLA-Net (CVPR2020, Oral

Qingyong 1k Dec 30, 2022
mPose3D, a mmWave-based 3D human pose estimation model.

mPose3D, a mmWave-based 3D human pose estimation model.

KylinChen 35 Nov 08, 2022
A Real-Time-Strategy game for Deep Learning research

Description DeepRTS is a high-performance Real-TIme strategy game for Reinforcement Learning research. It is written in C++ for performance, but provi

Centre for Artificial Intelligence Research (CAIR) 156 Dec 19, 2022
Small utility to demangle Nim symbols in callgrind files

nim_callgrind A small utility to demangle Nim symbols from callgrind files. Usage Run your (Nim) program with something like this: valgrind --tool=cal

kraptor 3 Feb 15, 2022
Lightweight tool to perform MITM attack on local network

ARPSpy - A lightweight tool to perform MITM attack Using many library to perform ARP Spoof and auto-sniffing HTTP packet containing credential. (Never

MinhItachi 8 Aug 28, 2022
A strongly-typed genetic programming framework for Python

monkeys "If an army of monkeys were strumming on typewriters they might write all the books in the British Museum." monkeys is a framework designed to

H. Chase Stevens 115 Nov 27, 2022
CAST: Character labeling in Animation using Self-supervision by Tracking

CAST: Character labeling in Animation using Self-supervision by Tracking (Published as a conference paper at EuroGraphics 2022) Note: The CAST paper c

15 Nov 18, 2022
MLP-Numpy - A simple modular implementation of Multi Layer Perceptron in pure Numpy.

MLP-Numpy A simple modular implementation of Multi Layer Perceptron in pure Numpy. I used the Iris dataset from scikit-learn library for the experimen

Soroush Omranpour 1 Jan 01, 2022
Implementation of Ag-Grid component for Streamlit

streamlit-aggrid AgGrid is an awsome grid for web frontend. More information in https://www.ag-grid.com/. Consider purchasing a license from Ag-Grid i

Pablo Fonseca 556 Dec 31, 2022
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

AI Secure 57 Dec 15, 2022
PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Neural Scene Flow Fields PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 20

Zhengqi Li 585 Jan 04, 2023
DeepLab2: A TensorFlow Library for Deep Labeling

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.

Google Research 845 Jan 04, 2023
labelpix is a graphical image labeling interface for drawing bounding boxes

Welcome to labelpix 👋 labelpix is a graphical image labeling interface for drawing bounding boxes. 🏠 Homepage Install pip install -r requirements.tx

schissmantics 26 May 24, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transforme

Changlin Li 127 Dec 26, 2022
This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

NeurIPS 2021 (Spotlight): Task-Adaptive Neural Network Search with Meta-Contrastive Learning This is an official PyTorch implementation of Task-Adapti

Wonyong Jeong 15 Nov 21, 2022
Official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks".

GN-Transformer AST This is the official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks". Data Prep

Cheng Jun-Yan 10 Nov 26, 2022