Real-time Object Detection for Streaming Perception, CVPR 2022

Overview

StreamYOLO

Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
Real-time Object Detection for Streaming Perception, CVPR 2022 (Oral)
Paper

Benchmark

Model         size     velocity  sAP 0.5:0.95  sAP50  sAP75  weights  COCO pretrained weights
StreamYOLO-s  600×960  1x        29.8          50.3   29.8   github   github
StreamYOLO-m  600×960  1x        33.7          54.5   34.0   github   github
StreamYOLO-l  600×960  1x        36.9          58.1   37.5   github   github
StreamYOLO-l  600×960  2x        34.6          56.3   34.7   github   github
StreamYOLO-l  600×960  still     39.4          60.0   40.2   github   github

Quick Start

Dataset preparation

You can download the full Argoverse-1.1 dataset and the annotations from HERE and unzip them.

The folder structure should be organized as follows before our processing.

StreamYOLO
├── exps
├── tools
├── yolox
├── data
│   ├── Argoverse-1.1
│   │   ├── annotations
│   │       ├── tracking
│   │           ├── train
│   │           ├── val
│   │           ├── test
│   ├── Argoverse-HD
│   │   ├── annotations
│   │       ├── test-meta.json
│   │       ├── train.json
│   │       ├── val.json

Inside each split, the hash-named directories represent different video sequences in Argoverse, and ring_front_center is one of the sensors for each sequence. Argoverse-HD annotations correspond to images from this sensor. Information from other sensors (other ring cameras or LiDAR) is not used, but our framework can also be extended to these modalities or to a multi-modality setting.
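
Before training, it can help to confirm that the data folder matches the expected layout. The sketch below is not part of StreamYOLO; `find_layout` and its return format are hypothetical. It checks the three Argoverse-HD annotation files listed in the tree above and looks for the `tracking` folder in two candidate places (under `Argoverse-1.1/annotations`, as in the tree above, and directly under `Argoverse-1.1`, which some users report for the unzipped dataset), so it only reports what is actually on disk.

```python
from pathlib import Path

# File names taken from the tree above.
HD_ANNOTATIONS = ("test-meta.json", "train.json", "val.json")
# Candidate locations for the tracking folder; which one your copy uses
# depends on how the archive was unzipped (an assumption to verify).
TRACKING_CANDIDATES = (
    Path("Argoverse-1.1") / "annotations" / "tracking",
    Path("Argoverse-1.1") / "tracking",
)
SPLITS = ("train", "val", "test")


def find_layout(data_root):
    """Return (tracking_dir_or_None, list_of_missing_paths) for data_root."""
    root = Path(data_root)
    missing = []
    for name in HD_ANNOTATIONS:
        path = root / "Argoverse-HD" / "annotations" / name
        if not path.is_file():
            missing.append(path)
    tracking = None
    for candidate in TRACKING_CANDIDATES:
        if (root / candidate).is_dir():
            tracking = root / candidate
            break
    if tracking is None:
        missing.append(root / TRACKING_CANDIDATES[0])
    else:
        for split in SPLITS:
            if not (tracking / split).is_dir():
                missing.append(tracking / split)
    return tracking, missing


if __name__ == "__main__":
    tracking_dir, missing_paths = find_layout("data")
    print("tracking folder:", tracking_dir)
    for p in missing_paths:
        print("missing:", p)
```

Run it from the StreamYOLO directory; an empty "missing" list means every path the sketch knows about was found.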

Installation
# basic python libraries
conda create --name streamyolo python=3.7
conda activate streamyolo

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

pip3 install yolox==0.3
git clone git@github.com:yancie-yjr/StreamYOLO.git

cd StreamYOLO/

# add StreamYOLO to PYTHONPATH and add this line to ~/.bashrc or ~/.zshrc (change the file accordingly)
ADDPATH=$(pwd)
echo export PYTHONPATH=$PYTHONPATH:$ADDPATH >> ~/.bashrc
source ~/.bashrc

# Installing `mmcv` for the official sAP evaluation:
# Please replace `{cu_version}` and `{torch_version}` with the versions you are currently using.
# You will get import or runtime errors if the versions are incorrect.
pip install mmcv-full==1.1.5 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
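
The PYTHONPATH step matters because the experiment files import from the repository's top-level `exps` package. Where `~/.bashrc` is not sourced (for example, in a notebook session), a rough equivalent is to register the repository root from inside Python. This is a minimal sketch under that assumption; the helper name `add_repo_to_path` is hypothetical and not part of StreamYOLO.

```python
import os
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Make repo_root importable in this process and in child processes."""
    root = str(Path(repo_root).resolve())
    # Current interpreter: this is what `import exps` searches.
    if root not in sys.path:
        sys.path.insert(0, root)
    # Child processes (e.g. a later `python tools/train.py`) read PYTHONPATH.
    parts = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if root not in parts:
        parts.append(root)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)
    return root


if __name__ == "__main__":
    # Assumes the current directory is the StreamYOLO checkout.
    print("added:", add_repo_to_path("."))
```

Calling the helper twice is harmless, since it skips entries that are already present.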
Reproduce our results on Argoverse-HD

Step1. Prepare the Argoverse dataset

cd <StreamYOLO_HOME>
ln -s /path/to/your/Argoverse-1.1 ./data/Argoverse-1.1
ln -s /path/to/your/Argoverse-HD ./data/Argoverse-HD

Step2. Reproduce our results on Argoverse:

python tools/train.py -f cfgs/m_s50_onex_dfp_tal_flip.py -d 8 -b 32 -c [/path/to/your/coco_pretrained_path] -o --fp16
  • -f: experiment config file.
  • -d: number of GPU devices.
  • -b: total batch size, the recommended number for -b is num-gpu * 8.
  • --fp16: mixed precision training.
  • -c: model checkpoint path.
  • -o: occupy GPU memory first for training.
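
The batch-size recommendation can be turned into a small helper that assembles the training command for a given GPU count. This is a sketch using only the flags shown above; `build_train_command` and its `per_gpu_batch` parameter are hypothetical. Note that the example command above uses -b 32 with 8 GPUs (4 images per GPU), so the per-GPU value is left adjustable rather than fixed at 8.

```python
import shlex


def build_train_command(exp_file, num_gpus, checkpoint, per_gpu_batch=8, fp16=True):
    """Assemble the tools/train.py argument list using only flags shown above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_batch = num_gpus * per_gpu_batch  # README recommendation: num-gpu * 8
    cmd = [
        "python", "tools/train.py",
        "-f", exp_file,
        "-d", str(num_gpus),
        "-b", str(total_batch),
        "-c", checkpoint,
        "-o",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # The checkpoint path is a placeholder, as in the command above.
    args = build_train_command(
        "cfgs/m_s50_onex_dfp_tal_flip.py", 8, "/path/to/your/coco_pretrained_path"
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues with unusual checkpoint paths.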
Offline Evaluation

We support batch testing for fast evaluation:

python tools/eval.py -f cfgs/l_s50_onex_dfp_tal_flip.py -c [/path/to/your/model_path] -b 64 -d 8 --conf 0.01 [--fp16] [--fuse]
  • --fuse: fuse conv and bn.
  • -d: number of GPUs used for evaluation. DEFAULT: All GPUs available will be used.
  • -b: total batch size across all GPUs.
  • -c: model checkpoint path.
  • --conf: confidence threshold for detections. Using 0.001 further improves performance by 0.2~0.3 sAP.
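
To see why a lower --conf value can raise sAP: a score threshold discards every detection below it before evaluation, while average precision is computed over the whole ranked list, so low-scoring true positives that survive can still add recall. The toy filter below illustrates only the thresholding step. It is not StreamYOLO's post-processing code, and the detection tuples are made up.

```python
def filter_by_confidence(detections, conf_threshold):
    """Keep detections whose score is at least conf_threshold.

    Each detection is a (label, score) pair; real detections also carry boxes.
    """
    return [det for det in detections if det[1] >= conf_threshold]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    dets = [("car", 0.92), ("person", 0.30), ("car", 0.008), ("bus", 0.002)]
    for thr in (0.01, 0.001):
        kept = filter_by_confidence(dets, thr)
        print(f"conf={thr}: kept {len(kept)} of {len(dets)} detections")
```

The trade-off is that a lower threshold passes more boxes to later stages, so evaluation may take somewhat longer.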
Online Evaluation

Our online evaluation is adapted from sAP.

Please use a single V100 GPU to test the performance, since GPUs with less computing power will produce non-real-time results!

cd sAP/streamyolo
bash streamyolo.sh
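
The hardware note follows from how streaming evaluation works: each arriving frame is scored against the most recent prediction that has already finished, so a detector slower than the frame interval is evaluated on stale outputs. The simulation below is a simplified sketch of that pairing, not the sAP toolkit's code. It assumes constant latency, a detector that always starts on the newest frame once it is idle, and integer time units; `simulate_streaming` is a hypothetical name.

```python
def simulate_streaming(num_frames, frame_interval, latency):
    """Pair each arriving frame with the newest prediction already finished.

    Returns a list where entry i is the index of the frame whose prediction
    is used when frame i arrives, or None if no prediction is ready yet.
    Use integer time units (e.g. milliseconds) to avoid float rounding.
    """
    finished = []          # (finish_time, source_frame), in time order
    clock = 0              # time at which the detector becomes idle
    last_started = -1      # newest frame the detector has already processed
    while True:
        newest = min(int(clock // frame_interval), num_frames - 1)
        if newest == last_started:
            # Nothing new yet: wait for the next frame, or stop at stream end.
            if last_started + 1 >= num_frames:
                break
            clock = (last_started + 1) * frame_interval
            continue
        clock += latency
        finished.append((clock, newest))
        last_started = newest

    pairing = []
    for i in range(num_frames):
        arrival = i * frame_interval
        ready = [src for done, src in finished if done <= arrival]
        pairing.append(ready[-1] if ready else None)
    return pairing


if __name__ == "__main__":
    print("latency below one interval:", simulate_streaming(6, 10, 8))
    print("latency of 1.5 intervals:  ", simulate_streaming(6, 10, 15))
```

In the first case every frame is scored against the prediction from the frame just before it. In the second case frame 2 is never processed, and frame 3 is scored against the prediction made from frame 1.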

Citation

Please cite the following paper if this repo helps your research:

@InProceedings{streamyolo,
    author    = {Yang, Jinrong and Liu, Songtao and Li, Zeming and Li, Xiaoping and Sun, Jian},
    title     = {Real-time Object Detection for Streaming Perception},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year      = {2022}
}

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Comments
  • when will the readme document be completed

    Hi, @GOATmessi7 @yancie-yjr, great work. Could you enrich the README with details on dataset preparation, training, validation, and so on? I hope it can be finished soon. Thanks.

    opened by SmallMunich 1
  • ModuleNotFoundError: No module named 'exps'

    Hi everyone, I got this issue:

    ...File "cfgs/m_s50_onex_dfp_tal_flip.py", line 189, in get_trainer
        from exps.train_utils.double_trainer import Trainer
    ModuleNotFoundError: No module named 'exps'

    When I ran the code locally I got this error, but after I tried "echo export PYTHONPATH=$PYTHONPATH:$ADDPATH >> " it worked. However, my local GPU is not powerful enough for training, so I set everything up on Colab, and this time "echo export..." did not help.

    opened by Tezcan98 3
  • A small bug in README about Dataset Prep.

    For Developers

    Hi! When reproducing your results on Argoverse-HD, I found that the directory structure given in the Quick Start - Dataset preparation section matches neither the original directory structure of the Argoverse-HD dataset nor what your code requires. The directory structure in the Quick Start - Dataset preparation section:

    StreamYOLO
    ├── exps
    ├── tools
    ├── yolox
    ├── data
    │   ├── Argoverse-1.1
    │   │   ├── annotations
    │   │       ├── tracking
    │   │           ├── train
    │   │           ├── val
    │   │           ├── test
    │   ├── Argoverse-HD
    │   │   ├── annotations
    │   │       ├── test-meta.json
    │   │       ├── train.json
    │   │       ├── val.json
    

    should be edited as:

    StreamYOLO
    ├── exps
    ├── tools
    ├── yolox
    ├── data
    │   ├── Argoverse-1.1
    │   │   ├── tracking
    │   │       ├── train
    │   │       ├── val
    │   │       ├── test
    │   ├── Argoverse-HD
    │   │   ├── annotations
    │   │       ├── test-meta.json
    │   │       ├── train.json
    │   │       ├── val.json
    

    which matches the directory structure of the Argoverse-HD dataset.

    For Stargazers

    BTW, if anyone manually modifies the directory structure to fit the one provided in the README, an AssertionError will occur (some parts of the file paths were edited):

    AssertionError: Caught AssertionError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\yolox\data\datasets\datasets_wrapper.py", line 110, in wrapper
        ret_val = getitem_fn(self, index)
      File "%WORKSPACE%\StreamYOLO\exps\data\tal_flip_mosaicdetection.py", line 255, in __getitem__
        img, support_img, label, support_label, img_info, id_ = self._dataset.pull_item(idx)
      File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 227, in pull_item
        img = self.load_resized_img(index)
      File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 180, in load_resized_img
        img = self.load_image(index)
      File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 196, in load_image
        assert img is not None
    AssertionError
    

    If anyone gets a similar error message, the content in For Developers may be helpful.

    opened by jingwenchong 6
  • Figure 2 in the paper

    Hi, I have read your paper.

    I have a question about Figure 2.

    On page 3 of the paper, you wrote "the output y1 of the frame F1 is matched and evaluated with the ground truth of F3 and the result of F2 is missed" about Figure 2.

    I understood that expression to mean that y1 is the output of the non-real-time detector for frame F1.

    But frame F2 is received before frame F3.

    So I can't understand that point, and I would also like to ask when the output for frame F0 comes out.

    opened by wpdlatm1452 1
  • How can i save the detection result?

    Hi, thank you for sharing your nice code.

    I trained the model on the Argoverse dataset following your README.

    I want to run a demo and save the detection results (image or video). How can I do that?

    Thank you.

    opened by daminlee1 0
Owner
Jinrong Yang
Research: Object detection, Deep learning