Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Overview

Robust Object Detection via Instance-Level Temporal Cycle Confusion

This repo contains the implementation of the ICCV 2021 paper, Robust Object Detection via Instance-Level Temporal Cycle Confusion.

Screenshot

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motion, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on various domain adaptation benchmarks with static images (Cityscapes, Foggy Cityscapes, Sim10K) and large-scale video datasets (BDD100K and Waymo open data).

Installation

Environment

  • CUDA 10.2
  • Python >= 3.7
  • Pytorch >= 1.6
  • THe Detectron2 version matches Pytorch and CUDA versions.

Dependencies

  1. Create a virtual env.
  • python3 -m pip install --user virtualenv
  • python3 -m venv cyc-conf
  • source cyc-conf/bin/activate
  1. Install dependencies.
  • pip install -r requirements.txt

  • Install Pytorch 1.9

pip3 install torch torchvision

Check out the previous Pytorch versions here.

  • Install Detectron2 Build Detectron2 from Source (gcc & g++ >= 5.4) python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Or, you can install Pre-built detectron2 (example for CUDA 10.2, Pytorch 1.9)

python -m pip install detectron2 -f \ https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

More details can be found here.

Data Preparation

BDD100K

  1. Download the BDD100K MOT 2020 dataset (MOT 2020 Images and MOT 2020 Labels) and the detection labels (Detection 2020 Labels) here and the detailed description is available here. Put the BDD100K data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   │   ├── images
│   │   │    └── track
│   │   │        ├── train
│   │   │        ├── val
│   │   │        └── test
│   │   └── labels
│   │        ├── box_track_20
│   │        │   ├── train
│   │        │   └── val
│   │        └── det_20
│   │            ├── det_train.json
│   │            └── det_val.json
│   ├── waymo

Convert the labels of the MOT 2020 data (train & val sets) into COCO format by running:

python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/val/ -o datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json -m track
python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/train/ -o datasets/bdd100k/labels/track/bdd100k_mot_train_coco.json -m track
  1. Split the original videos into different domains (time of day). Run the following command:
python3 -m datasets.domain_splits_bdd100k

This script will first extract the domain attributes from the BDD100K detection set and then map them to the tracking set sequences. After the processing steps, you would see two additional folders domain_splits and per_seq under the datasets/bdd100k/labels/box_track_20. The domain splits of all attributes in BDD100K detection set can be found at datasets/bdd100k/labels/domain_splits.

Waymo

  1. Download the Waymo dataset here. Put the Waymo raw data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   ├── waymo
│   │   └── raw

Convert the raw TFRecord data files into COCO format by running:

python3 -m datasets.waymo2coco

Note that this script takes a long time to run, be prepared to keep it running for over a day.

  1. Convert the BDD100K dataset labels into 3 classes (originally 8). This needs to be done in order to match the 3 classes of the Waymo dataset. Run the following command:
python3 -m datasets.convert_bdd_3cls

Get Started

For joint training,

python3 -m tools.train_net --config-file [config_file] --num-gpus 8

For evaluation,

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only

This command will load the latest checkpoint in the folder. If you want to specify a different checkpoint or evaluate the pretrained checkpoints, you can run

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only MODEL.WEIGHTS [PATH_TO_CHECKPOINT]

Benchmark Results

Dataset Statistics

Dataset Split Seq frames/seq. boxes classes
BDD100K Daytime train 757 204 1.82M 8
val 108 204 287K 8
BDD100K Night train 564 204 895K 8
val 71 204 137K 8
Waymo Open Data train 798 199 3.64M 3
val 202 199 886K 3

Out of Domain Evaluation

BDD100K Daytime to Night. The base detector is Faster R-CNN with ResNet-50.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 17.84 31.35 17.68 4.92 16.15 35.56 link link
+ Rotation 18.58 32.95 18.15 5.16 16.93 36.00 link link
+ Jigsaw 17.47 31.22 16.81 5.08 15.80 33.84 link link
+ Cycle Consistency 18.35 32.44 18.07 5.04 17.07 34.85 link link
+ Cycle Confusion 19.09 33.58 19.14 5.70 17.68 35.86 link link

BDD100K Night to Daytime.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 19.14 33.04 19.16 5.38 21.42 40.34 link link
+ Rotation 19.07 33.25 18.83 5.53 21.32 40.06 link link
+ Jigsaw 19.22 33.87 18.71 5.67 22.35 38.57 link link
+ Cycle Consistency 18.89 33.50 18.31 5.82 21.01 39.13 link link
+ Cycle Confusion 19.57 34.34 19.26 6.06 22.55 38.95 link link

Waymo Front Left to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 10.07 19.62 9.05 2.67 10.81 18.62 link link
+ Rotation 11.34 23.12 9.65 3.53 11.73 21.60 link link
+ Jigsaw 9.86 19.93 8.40 2.77 10.53 18.82 link link
+ Cycle Consistency 11.55 23.44 10.00 2.96 12.19 21.99 link link
+ Cycle Confusion 12.27 26.01 10.24 3.44 12.22 23.56 link link

Waymo Front Right to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 8.65 17.26 7.49 1.76 8.29 19.99 link link
+ Rotation 9.25 18.48 8.08 1.85 8.71 21.08 link link
+ Jigsaw 8.34 16.58 7.26 1.61 8.01 18.09 link link
+ Cycle Consistency 9.11 17.92 7.98 1.78 9.36 19.18 link link
+ Cycle Confusion 9.99 20.58 8.30 2.18 10.25 20.54 link link

Citation

If you find this repository useful for your publications, please consider citing our paper.

@article{wang2021robust,
  title={Robust Object Detection via Instance-Level Temporal Cycle Confusion},
  author={Wang, Xin and Huang, Thomas E and Liu, Benlin and Yu, Fisher and Wang, Xiaolong and Gonzalez, Joseph E and Darrell, Trevor},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Xin Wang
Researcher from Microsoft Research. Prev. Ph.D. student at UC Berkeley.
Xin Wang
A simple implementation of Kalman filter in Multi Object Tracking

kalman Filter in Multi-object Tracking A simple implementation of Kalman filter in Multi Object Tracking 本实现是在https://github.com/liuchangji/kalman-fil

124 Dec 29, 2022
Real-time object detection on Android using the YOLO network with TensorFlow

TensorFlow YOLO object detection on Android Source project android-yolo is the first implementation of YOLO for TensorFlow on an Android device. It is

Nataniel Ruiz 624 Jan 03, 2023
Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Unified-EPT Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation. Installation Linux, CUDA=10.0,

29 Aug 23, 2022
PyTorch implementation of paper "StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement" (ICCV 2021 Oral)

StarEnhancer StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement (ICCV 2021 Oral) Abstract: Image enhancement is a subjective process w

IDKiro 133 Dec 28, 2022
You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors In this paper, we propose a novel local descriptor-based fra

Haiping Wang 80 Dec 15, 2022
Unofficial PyTorch implementation of SimCLR by Google Brain

Unofficial PyTorch implementation of SimCLR by Google Brain

Rishabh Anand 2 Oct 13, 2021
An end-to-end framework for mixed-integer optimization with data-driven learned constraints.

OptiCL OptiCL is an end-to-end framework for mixed-integer optimization (MIO) with data-driven learned constraints. We address a problem setting in wh

Holly Wiberg 57 Dec 26, 2022
Official implementation for "Image Quality Assessment using Contrastive Learning"

Image Quality Assessment using Contrastive Learning Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli and Alan C. Bovik This is the offi

Pavan Chennagiri 67 Dec 30, 2022
Code for KHGT model, AAAI2021

KHGT Code for KHGT accepted by AAAI2021 Please unzip the data files in Datasets/ first. To run KHGT on Yelp data, use python labcode_yelp.py For Movi

32 Nov 29, 2022
Research into Forex price prediction from price history using Deep Sequence Modeling with Stacked LSTMs.

Forex Data Prediction via Recurrent Neural Network Deep Sequence Modeling Research Paper Our research paper can be viewed here Installation Clone the

Alex Taradachuk 2 Aug 07, 2022
A Japanese Medical Information Extraction Toolkit

JaMIE: a Japanese Medical Information Extraction toolkit Joint Japanese Medical Problem, Modality and Relation Recognition The Train/Test phrases requ

7 Dec 12, 2022
Animation of solving the traveling salesman problem to optimality using mixed-integer programming and iteratively eliminating sub tours

tsp-streamlit Animation of solving the traveling salesman problem to optimality using mixed-integer programming and iteratively eliminating sub tours.

4 Nov 05, 2022
A Number Recognition algorithm

Paddle-VisualAttention Results_Compared SVHN Dataset Methods Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Ac

1 Nov 12, 2021
Weight initialization schemes for PyTorch nn.Modules

nninit Weight initialization schemes for PyTorch nn.Modules. This is a port of the popular nninit for Torch7 by @kaixhin. ##Update This repo has been

Alykhan Tejani 69 Jan 26, 2021
Progressive Growing of GANs for Improved Quality, Stability, and Variation

Progressive Growing of GANs for Improved Quality, Stability, and Variation — Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NV

Tero Karras 5.9k Jan 05, 2023
Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

LSF-SAC Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy G

Hanhan 2 Aug 14, 2022
Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

mmc installation git clone https://github.com/dmarx/Multi-Modal-Comparators cd 'Multi-Modal-Comparators' pip install poetry poetry build pip install d

David Marx 37 Nov 25, 2022
a reimplementation of LiteFlowNet in PyTorch that matches the official Caffe version

pytorch-liteflownet This is a personal reimplementation of LiteFlowNet [1] using PyTorch. Should you be making use of this work, please cite the paper

Simon Niklaus 365 Dec 31, 2022
Learning Visual Words for Weakly-Supervised Semantic Segmentation

[IJCAI 2021] Learning Visual Words for Weakly-Supervised Semantic Segmentation Implementation of IJCAI 2021 paper Learning Visual Words for Weakly-Sup

Lixiang Ru 24 Oct 05, 2022
A PyTorch Implementation of Single Shot Scale-invariant Face Detector.

S³FD: Single Shot Scale-invariant Face Detector A PyTorch Implementation of Single Shot Scale-invariant Face Detector. Eval python wider_eval_pytorch.

carwin 235 Jan 07, 2023