[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Last update: Dec 13, 2022

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]

@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}

Overview

We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its performance gains come from a transformer architecture, updated loss terms, and view-coherent data augmentation. MVDeTr is also efficient and can be trained on a single RTX 2080 Ti. This repo also includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.

MVDeTr Code

This repo is dedicated to the code for MVDeTr.

Dependencies

This code uses the following libraries:

python
pytorch & torchvision
numpy
matplotlib
pillow
opencv-python
kornia
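
Before training, it can help to confirm that the libraries above are importable. The following is a minimal sketch, not part of the repo; the helper name missing_modules is hypothetical, and the import names assume the usual mapping (pytorch imports as torch, pillow as PIL, opencv-python as cv2).

```python
import importlib.util

# Import names for the libraries listed above. Note that some packages
# install under a different import name (pillow -> PIL, opencv-python -> cv2).
REQUIRED_MODULES = ["torch", "torchvision", "numpy", "matplotlib", "PIL", "cv2", "kornia"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the modules can be located; it does not check versions or CUDA support.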

Data Preparation

By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.

Your ~/Data/ folder should look like this:

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/
    └── ...
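
The layout above can be checked with a short script before launching training. This is a sketch under the assumption that only the two top-level folder names matter; the helper missing_datasets is hypothetical and does not inspect the contents of each dataset.

```python
from pathlib import Path

# Top-level dataset folders expected under the data root (from the tree above).
EXPECTED_DATASETS = ("MultiviewX", "Wildtrack")


def missing_datasets(data_root="~/Data"):
    """Return the expected dataset folder names that are absent under data_root."""
    root = Path(data_root).expanduser()
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


print("missing dataset folders:", missing_datasets() or "none")
```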

Code Preparation

Before running the code, go to multiview_detector/models/ops and run bash make.sh to build the deformable transformer ops (forked from Deformable DETR).

Training

To train the detectors, run the following:

python main.py -d wildtrack
python main.py -d multiviewx

This should automatically report evaluation results close to the published 91.5% MODA on the Wildtrack dataset and 93.7% MODA on the MultiviewX dataset.
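
For readers unfamiliar with the metric, MODA (Multiple Object Detection Accuracy) is commonly defined as 1 - (FN + FP) / N_gt, where FN is the number of misses, FP the number of false positives, and N_gt the number of ground-truth objects. The sketch below computes it from raw counts; it is not the repo's evaluation code, and the counts in the example are made up for illustration, not taken from the paper.

```python
def moda(false_negatives, false_positives, num_ground_truth):
    """Compute MODA = 1 - (FN + FP) / N_gt from raw detection counts."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return 1.0 - (false_negatives + false_positives) / num_ground_truth


# Illustrative counts only: 1000 ground-truth pedestrians, 50 misses, 35 false positives.
example_score = moda(false_negatives=50, false_positives=35, num_ground_truth=1000)
print(f"MODA = {example_score:.1%}")
```

Because false positives are penalized as well as misses, MODA can drop below zero when a detector produces many spurious detections.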

Architectures

This repo supports multiple architecture variants. For MVDeTr, please specify --world_feat deform_trans; for a similar fully convolutional architecture like MVDet, please specify --world_feat conv.
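
When scripting several runs, the dataset and architecture choices can be assembled programmatically. The sketch below assumes the -d and --world_feat flags can be combined on one command line, and it only accepts the values named in this README; the repo may accept others. The helper build_train_command is hypothetical.

```python
import shlex

# Values named in this README; the repo may support additional options.
DATASETS = ("wildtrack", "multiviewx")
WORLD_FEATS = ("deform_trans", "conv")


def build_train_command(dataset, world_feat):
    """Build the argument list for one training run of main.py."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if world_feat not in WORLD_FEATS:
        raise ValueError(f"unknown world_feat: {world_feat}")
    return ["python", "main.py", "-d", dataset, "--world_feat", world_feat]


# Print one command per dataset/architecture combination.
for dataset in DATASETS:
    for world_feat in WORLD_FEATS:
        print(shlex.join(build_train_command(dataset, world_feat)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.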

Loss terms

This repo supports multiple loss terms. For the focal loss variant as in MVDeTr, please specify --use_mse 0; for the MSE loss as in MVDet, please specify --use_mse 1.
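
To make the difference between the two options concrete, the sketch below compares an MSE loss with a focal loss on a flattened occupancy heatmap. It assumes a CenterNet-style penalty-reduced focal loss with alpha=2 and beta=4; the exact form and hyperparameters used in the repo may differ, and the function names are hypothetical. Plain Python lists are used instead of tensors to keep the example self-contained.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between predicted and target heatmap values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss (assumed CenterNet-style form).

    Cells with target == 1 are positives; all other cells are negatives whose
    penalty is reduced by (1 - target) ** beta near a ground-truth peak.
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        if t == 1.0:
            pos_loss += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            neg_loss += (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of positives (at least 1 to avoid division by zero).
    return -(pos_loss + neg_loss) / max(num_pos, 1)


target = [0.0, 1.0, 0.0]          # one pedestrian in the middle cell
confident = [0.1, 0.9, 0.1]       # good prediction
uncertain = [0.1, 0.5, 0.1]       # weak response at the true location
print("MSE  :", mse_loss(confident, target), mse_loss(uncertain, target))
print("focal:", focal_loss(confident, target), focal_loss(uncertain, target))
```

The focal form down-weights cells that are already well classified, which matters for heatmaps where almost every cell is background.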

Augmentations

This repo supports view-coherent data augmentation, which applies affine transformations to the per-view inputs and then applies the inverse transformations to the per-view feature maps to maintain multiview coherency.
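
The geometric idea can be illustrated on a single point: an affine transform moves a pixel in the augmented input, and the inverse transform brings the corresponding location back to the original view coordinates, so the camera calibration used for ground-plane projection stays valid. The sketch below is a simplified illustration with hypothetical helper names, not the repo's implementation; it works on coordinates rather than feature tensors and ignores the feature-map stride, which would require rescaling the translation in practice.

```python
import math


def affine_matrix(scale, angle_deg, tx, ty):
    """Build a 3x3 homogeneous matrix for scale + rotation + translation."""
    c = scale * math.cos(math.radians(angle_deg))
    s = scale * math.sin(math.radians(angle_deg))
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def invert_affine(m):
    """Invert a 3x3 homogeneous affine matrix in closed form."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine transform is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # Inverse translation is -(L^-1 @ t), where L is the 2x2 linear part.
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def apply_affine(m, point):
    """Apply a 3x3 homogeneous affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Arbitrary example augmentation for one camera view.
augment = affine_matrix(scale=1.1, angle_deg=15.0, tx=20.0, ty=-10.0)
original = (120.0, 80.0)
augmented = apply_affine(augment, original)
restored = apply_affine(invert_affine(augment), augmented)
print("original :", original)
print("augmented:", augmented)
print("restored :", restored)
```

Each view can draw its own random transform, since every view is mapped back to its own original coordinates before the views are fused.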

Pre-trained models

You can download the checkpoints at this link.
