DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Overview

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

DETReg

This repository contains the implementation of DETReg; see the project page for details.

Release

  • COCO training code and eval - DONE
  • Pretrained models - DONE
  • Pascal VOC training code and eval - TODO

Introduction

DETReg is an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection, localization and categorization, we combine two complementary signals for self-supervision. For the object localization signal, we use pseudo ground-truth bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which requires no training data and detects objects with high recall but very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves performance over competitive baselines and previous self-supervised methods on standard benchmarks such as MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baselines in the low-data regime, when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO.
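The object embedding loss described above can be pictured as a distance between predicted and target embeddings. The sketch below assumes an L1 (mean absolute error) form, which is consistent with the `l1_loss` call visible in a traceback quoted in the Comments section. The function name `l1_embedding_loss` and the list-of-lists representation are illustrative assumptions, not the repository's code.

```python
from typing import Sequence


def l1_embedding_loss(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Mean absolute difference between two equally shaped sets of embeddings.

    Hypothetical helper for illustration: each row is one object embedding.
    """
    if len(pred) != len(target):
        raise ValueError(f"row count mismatch: {len(pred)} vs {len(target)}")

    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        # Refuse mismatched embedding sizes instead of silently broadcasting.
        if len(pred_row) != len(target_row):
            raise ValueError(
                f"embedding size mismatch: {len(pred_row)} vs {len(target_row)}"
            )
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1

    if count == 0:
        raise ValueError("cannot compute a mean over empty embeddings")
    return total / count
```

Checking shapes explicitly, rather than relying on broadcasting, makes a mismatch between predicted and target embedding sizes fail early with a readable message.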

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n detreg python=3.7 pip

    Then, activate the environment:

    conda activate detreg

    Install PyTorch (change cudatoolkit to match your CUDA version; see the official PyTorch installation instructions for details):

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download the COCO 2017 dataset and ImageNet and organize them as follows:

code_root/
└── data/
    ├── ilsvrc/
    │   ├── train/
    │   └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json

Note that in this work we used the ImageNet100 dataset, which is 10 times smaller than ImageNet. To create ImageNet100, run the following commands:

mkdir -p data/ilsvrc100/train
mkdir -p data/ilsvrc100/val
while read line; do ln -s <code_root>/data/ilsvrc/train/$line <code_root>/data/ilsvrc100/train/$line; done < <code_root>/datasets/category.txt
while read line; do ln -s <code_root>/data/ilsvrc/val/$line <code_root>/data/ilsvrc100/val/$line; done < <code_root>/datasets/category.txt

This should result in the following structure:

code_root/
└── data/
    ├── ilsvrc/
    │   ├── train/
    │   └── val/
    ├── ilsvrc100/
    │   ├── train/
    │   └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json
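
The ImageNet100 symlink step above can also be done from Python, which may be easier to rerun safely. This is a minimal sketch, assuming one class directory name per line in the category file; `link_subset` is a hypothetical helper and not part of the repository.

```python
import os
from pathlib import Path


def link_subset(src_root: Path, dst_root: Path, category_file: Path) -> int:
    """Symlink each class directory listed in category_file into dst_root.

    Returns the number of links created. Existing entries are left untouched,
    so the function can be rerun without errors.
    """
    dst_root.mkdir(parents=True, exist_ok=True)
    created = 0
    for line in category_file.read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        link = dst_root / name
        if link.is_symlink() or link.exists():
            continue  # already present from an earlier run
        os.symlink((src_root / name).resolve(), link, target_is_directory=True)
        created += 1
    return created


# Example usage, with paths taken from the layout above:
# for split in ("train", "val"):
#     link_subset(Path("data/ilsvrc") / split,
#                 Path("data/ilsvrc100") / split,
#                 Path("datasets/category.txt"))
```

Unlike a bare `ln -s` loop, the sketch skips links that already exist, so a partially completed run can simply be repeated.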

Create ImageNet Selective Search boxes:

Download the precomputed ImageNet boxes and extract in the cache folder:

mkdir -p /cache/ilsvrc && cd /cache/ilsvrc 
wget https://github.com/amirbar/DETReg/releases/download/1.0.0/ss_box_cache.tar.gz
tar -xf ss_box_cache.tar.gz
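
If you prefer to unpack the archive from Python, it is worth validating member paths before extraction, since `tarfile.extractall()` on unchecked input is the concern behind CVE-2007-4559 (raised in the Comments section). The following is a minimal sketch under that assumption; `safe_extract` is a hypothetical helper, and it only checks member paths, not symlink targets inside the archive.

```python
import os
import tarfile


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive only if every member stays inside dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            member_path = os.path.realpath(os.path.join(dest_root, member.name))
            # A member such as "../x" or "/etc/x" resolves outside dest_root.
            if os.path.commonpath([dest_root, member_path]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_root)


# Example usage, with names taken from the commands above:
# safe_extract("ss_box_cache.tar.gz", "/cache/ilsvrc")
```

The check runs over all members before anything is written, so a single unsafe entry rejects the whole archive.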

Alternatively, you can compute Selective Search boxes yourself:

To create selective search boxes for ImageNet100 on a single machine, run the following command (set num_processes):

python -m datasets.cache_ss --dataset imagenet100 --part 0 --num_m 1 --num_p <num_processes_to_use> 

To speed up the creation of boxes, change the arguments accordingly and run the following command on each different machine:

python -m datasets.cache_ss --dataset imagenet100 --part <machine_number> --num_m <num_machines> --num_p <num_processes_to_use> 

The cached boxes are saved in the following structure:

code_root/
└── cache/
    └── ilsvrc/
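
The `--part` and `--num_m` arguments above suggest that each machine processes one slice of the image list. How `datasets.cache_ss` actually divides the work is not shown here; the sketch below assumes a simple strided split, and `shard` is a hypothetical helper used only to illustrate the idea.

```python
from typing import List, Sequence


def shard(items: Sequence[str], part: int, num_machines: int) -> List[str]:
    """Return the items assigned to machine `part` out of `num_machines`.

    Assumed scheme: machine k takes items k, k + num_machines, k + 2 * num_machines, ...
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if not 0 <= part < num_machines:
        raise ValueError("part must satisfy 0 <= part < num_machines")
    return list(items[part::num_machines])
```

Under this scheme every image is assigned to exactly one machine, and shard sizes differ by at most one, which keeps the per-machine workload roughly even.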

Training

The command for pretraining DETReg on 8 GPUs on ImageNet100 is as follows:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_top30_in100.sh --batch_size 24 --num_workers 8

Training takes around 1.5 days with 8 NVIDIA V100 GPUs. If you want to skip this step, you can download a pretrained model (see below).

After pretraining, a checkpoint is saved in exps/DETReg_top30_in100/checkpoint.pth. To fine-tune it on different COCO settings, use the following commands.

Fine-tuning on full COCO (should take 2 days with 8 NVIDIA V100 GPUs):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_fine_tune_full_coco.sh

For smaller subsets, which train faster, you can use a smaller number of GPUs (e.g., 4 with batch size 2).

Fine-tuning on 1%:

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_1pct_coco.sh --batch_size 2

Fine-tuning on 2%:

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_2pct_coco.sh --batch_size 2

Fine-tuning on 5%:

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_5pct_coco.sh --batch_size 2

Fine-tuning on 10%:

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_10pct_coco.sh --batch_size 2
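
The four subset commands above differ only in the percentage embedded in the config name, so they can be generated rather than typed. This is a sketch for convenience only; `fine_tune_command` and `fine_tune_env` are hypothetical helpers that reproduce the command lines shown above.

```python
from typing import Dict, List

SUPPORTED_PCTS = (1, 2, 5, 10)  # subset configs listed in this README


def fine_tune_command(pct: int, gpus: int = 4, batch_size: int = 2) -> List[str]:
    """Build the argument list for fine-tuning on a COCO subset."""
    if pct not in SUPPORTED_PCTS:
        raise ValueError(f"no config listed for {pct}% of COCO")
    return [
        "./tools/run_dist_launch.sh",
        str(gpus),
        f"./configs/DETReg_fine_tune_{pct}pct_coco.sh",
        "--batch_size",
        str(batch_size),
    ]


def fine_tune_env(gpus: int = 4) -> Dict[str, str]:
    """Environment variable that the launch commands above set inline."""
    return {"GPUS_PER_NODE": str(gpus)}


# Example usage (runs the real training script, so it is left commented out):
# import os, subprocess
# for pct in SUPPORTED_PCTS:
#     subprocess.run(fine_tune_command(pct),
#                    env={**os.environ, **fine_tune_env()}, check=True)
```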

Evaluation

To evaluate a fine-tuned model, use the following command from the project base directory:

./configs/<config file>.sh --resume exps/<config file>/checkpoint.pth --eval

Pretrained Models

Cite

If you found this code helpful, feel free to cite our work:

@misc{bar2021detreg,
      title={DETReg: Unsupervised Pretraining with Region Priors for Object Detection},
      author={Amir Bar and Xin Wang and Vadim Kantorov and Colorado J Reed and Roei Herzig and Gal Chechik and Anna Rohrbach and Trevor Darrell and Amir Globerson},
      year={2021},
      eprint={2106.04550},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Related Works

If you found DETReg useful, consider checking out these related works as well: ReSim, SwAV, DETR, UP-DETR, and Deformable DETR.

Acknowledgments

DETReg builds on the codebases of previous works such as Deformable DETR and UP-DETR. If you found DETReg useful, please consider citing these works as well.

Comments
  • Question about reproducing the Semi-supervised Learning experiment

    When I use this checkpoint for pretraining (image omitted) and these scripts to reproduce the semi-supervised learning experiment (image omitted), the results differ hugely (image omitted).

    Please help me: did I miss anything in reproducing it?

    By the way, I can reproduce the full COCO result of 45.5 AP, so the conda environment is probably right.

    opened by 4-0-4-notfound 5
  • Question about selective search cached boxes in training and validation

    Why are there some '.npy' files for the ImageNet validation set in 'ss_box_cache.tar.gz', for example ILSVRC2012_val_00000006.npy? Are both the training and validation sets used for pretraining?

    opened by CQIITLAB 3
  • RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

    Start training
    /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
      return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
      return torch.floor_divide(self, other)
    /home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
    Traceback (most recent call last):
      File "main.py", line 403, in <module>
        main(args)
      File "main.py", line 314, in main
        model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch
        loss_dict = criterion(outputs, targets)
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward
        losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs))
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss
        return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss
        return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors
        return _VF.broadcast_tensors(tensors)  # type: ignore[attr-defined]
    RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3
    Traceback (most recent call last):
      File "./tools/launch.py", line 192, in <module>
        main()
      File "./tools/launch.py", line 188, in main
        cmd=process.args)

    Using:
    cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64
    pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64
    torchaudio 0.9.0 py37 pytorch/linux-64
    torchvision 0.10.0 py37_cu111 pytorch/linux-64

    Thanks and great work!

    opened by jossalgon 3
  • Error occurred when following the Compiling CUDA operators step

    Hello, when I try to run sh ./make.sh by following the Compiling CUDA operators step, it always shows this error:

    Traceback (most recent call last):
      File "setup.py", line 69, in <module>
        ext_modules=get_extensions(),
      File "setup.py", line 47, in get_extensions
        raise NotImplementedError('Cuda is not availabel')
    NotImplementedError: Cuda is not availabel

    Any idea why this happens? Thanks!

    opened by ruizhaoz 2
  • Can't install MultiScaleDeformableAttention

    Hi, I have seen that others have resolved this, but they left no clues behind. I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing about it in the README.

    Please assist. Thank you!

    opened by jshtok 2
  • Results between IN100 and IN1k setting

    In the arXiv v1 version, the fine-tuning result on COCO is 45.5 with IN100 pretraining. In the arXiv v2 version, the fine-tuning result on COCO still seems to be 45.5, but the pretraining dataset is IN1k. So, as I understand it, the fine-tuning result does not improve with more pretraining data?

    opened by 4-0-4-notfound 2
  • Fine Tuning the Model on a fraction of VOC

    Hi @amirbar,

    Thank you for the great work. It looks like the parameter --filter_pct is never used in the code. This means the code is effectively running fine-tuning on the whole VOC/COCO datasets. Please correct me if I am wrong.

    Thanks

    opened by mmaaz60 2
  • It seems in the pretrain stage the network output 90 categories instead of 2

    Hello, it seems the network outputs 90 categories instead of 2 in the pretraining stage. According to the paper, it is supposed to output 2 categories (either background or foreground), which is not what the code does. I'm confused; am I missing something?

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/configs/DETReg_top30_in100.sh#L8

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/main.py#L120

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/models/deformable_detr.py#L497-L503

    opened by 4-0-4-notfound 2
  • Pretrained model on ImageNet-1K

    Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

    Thank you

    opened by hanoonaR 2
  • Bug: Target["area"] incorrect when using selective_search (and possibly others)

    The selective_search function changes the boxes to xyxy coordinates:

    boxes[..., 2] = boxes[..., 0] + boxes[..., 2]
    boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

    In get_item (https://github.com/amirbar/DETReg/blob/36ae5844183499f6bc1a6d8922427b0f473e06d9/datasets/selfdet.py#L67) we have:

    boxes = selective_search(img, h, w, res_size=128)
    ...
    target['boxes'] = torch.tensor(boxes)
    ...
    target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

    But the boxes at this point are in xyxy, not cxcywh, so the "area" is incorrect. I do not know whether this affects anything down the line; it may not.

    opened by AZaitzeff 1
  • What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177

    It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

    Thanks.

    opened by Cohesion97 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in our blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Thanks for your work. I noticed that you open-sourced DETReg for the DETR architecture, so I tried to use the ImageNet-pretrained model you provided to fine-tune on my custom dataset. But all the metrics are still 0 after more than fifty batches of pre-training. I have followed the tips in the related DETR issues (https://github.com/facebookresearch/detr/issues?page=1&q=zero) and modified num_classes. Many people mentioned that DETR requires a large amount of training data, or fine-tuning. But I am already fine-tuning, and my fine-tuning dataset has about one thousand samples. The results are still very poor; may I ask why? Everything works normally when I use the Deformable DETR architecture.

    opened by Flyooofly 0
  • checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    1. checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] raises KeyError: 'args'. I get this error in the evaluation stage and don't know how to deal with it. I hope the owner can help me solve it, thank you very much.
    2. Does testing on a single GPU give reduced results?

    opened by 873552584 0
  • About few-shot object detection

    I found the result of few-shot object detection is better than others, could you release the few-shot object detection code? or hyperparameters? or how to import novel and base datasets? thanks :)

    opened by YAOSL98 0
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    Downloading: "https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar" to C:\Users\pc/.cache\torch\hub\checkpoints\swav_800ep_pretrain.pth.tar
    urllib.error.HTTPError: HTTP Error 403: Forbidden

    I hope this can be solved, thanks.

    opened by GDzhu01 0
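
The Target["area"] report in the comments above turns on the difference between box formats: multiplying elements 2 and 3 gives the area only when they hold width and height. The sketch below illustrates this with plain tuples; `xywh_to_xyxy` and `area_xyxy` are hypothetical helpers, and whether the repository needs such a change is not established by the issue.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x, y, width, height) to (x0, y0, x1, y1)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def area_xyxy(box: Box) -> float:
    """Area of a box given as corner coordinates (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    # Clamp at zero so degenerate boxes do not yield negative areas.
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)
```

For a box at (10, 20) with width 30 and height 40, the corner form is (10, 20, 40, 60); `area_xyxy` gives 1200, whereas multiplying the last two corner coordinates directly would give 2400.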