MutualGuide is a compact object detector specially designed for embedded devices

Overview

Introduction

MutualGuide is a compact object detector specially designed for embedded devices. Comparing to existing detectors, this repo contains two key features.

Firstly, the Mutual Guidance mecanism assigns labels to the classification task based on the prediction on the localization task, and vice versa, alleviating the misalignment problem between both tasks; Secondly, the teacher-student prediction disagreements guides the knowledge transfer in a feature-based detection distillation framework, thereby reducing the performance gap between both models.

For more details, please refer to our ACCV paper and BMVC paper.

Planning

  • Add RepVGG backbone.
  • Add ShuffleNetV2 backbone.
  • Add TensorRT transform code for inference acceleration.
  • Add draw function to plot detection results.
  • Add custom dataset training (annotations in XML format).
  • Add Transformer backbone.
  • Add BiFPN neck.

Benchmark

  • Without knowledge distillation:
Backbone Resolution APval
0.5:0.95
APval
0.5
APval
0.75
APval
small
APval
medium
APval
large
Speed V100
(ms)
Weights
ShuffleNet-1.0 512x512 35.8 52.9 38.6 19.8 40.1 48.3 8.3 Google
ResNet-34 512x512 44.1 62.3 47.6 26.5 50.2 58.3 6.9 Google
ResNet-18 512x512 42.0 60.0 45.3 25.4 47.1 56.0 4.4 Google
RepVGG-A2 512x512 44.2 62.5 47.5 27.2 50.3 57.2 5.3 Google
RepVGG-A1 512x512 43.1 61.3 46.6 26.6 49.3 55.9 4.4 Google
  • With knowledge distillation:
Backbone Resolution APval
0.5:0.95
APval
0.5
APval
0.75
APval
small
APval
medium
APval
large
Speed V100
(ms)
Weights
ResNet-18 512x512 42.9 60.7 46.2 25.4 48.8 57.2 4.4 Google
RepVGG-A1 512x512 44.0 62.1 47.3 27.6 49.9 57.9 4.4 Google

Remarks:

  • The precision is measured on the COCO2017 Val dataset.
  • The inference runtime is measured by Pytorch framework (without TensorRT acceleration) on a Tesla V100 GPU, and the post-processing time (e.g., NMS) is not included (i.e., we measure the model inference time).
  • To dowload from Baidu cloud, go to this link (password: dvz7).

Datasets

First download the VOC and COCO dataset, you may find the sripts in data/scripts/ helpful. Then create a folder named datasets and link the downloaded datasets inside:

$ mkdir datasets
$ ln -s /path_to_your_voc_dataset datasets/VOCdevkit
$ ln -s /path_to_your_coco_dataset datasets/coco2017

Remarks:

  • For training on custom dataset, first modify the dataset path XMLroot and categories XML_CLASSES in data/xml_dataset.py. Then apply --dataset XML.

Training

For training with Mutual Guide:

$ python3 train.py --neck ssd --backbone vgg16    --dataset VOC --size 320 --multi_level --multi_anchor --mutual_guide --pretrained
                          fpn            resnet34           COCO       512
                          pafpn          repvgg-A2          XML
                                         shufflenet-1.0

For knowledge distillation using PDF-Distil:

$ python3 distil.py --neck ssd --backbone vgg11    --dataset VOC --size 320 --multi_level --multi_anchor --mutual_guide --pretrained --kd pdf
                           fpn            resnet18           COCO       512
                           pafpn          repvgg-A1          XML
                                          shufflenet-0.5

Remarks:

  • For training without MutualGuide, just remove the --mutual_guide;
  • For training on custom dataset, convert your annotations into XML format and use the parameter --dataset XML. An example is given in datasets/XML/;
  • For knowledge distillation with traditional MSE loss, just use parameter --kd mse;
  • The default folder to save trained model is weights/.

Evaluation

Every time you want to evaluate a trained network:

$ python3 test.py --neck ssd --backbone vgg11    --dataset VOC --size 320 --trained_model path_to_saved_weights --multi_level --multi_anchor --pretrained --draw
                         fpn            resnet18           COCO       512
                         pafpn          repvgg-A1          XML
                                        shufflenet-0.5

Remarks:

  • It will directly print the mAP, AP50 and AP50 results on VOC2007 Test or COCO2017 Val;
  • Add parameter --draw to draw detection results. They will be saved in draw/VOC/ or draw/COCO/ or draw/XML/;
  • Add --trt to activate TensorRT acceleration.

Citing us

Please cite our papers in your publications if they help your research:

@InProceedings{Zhang_2020_ACCV,
    author    = {Zhang, Heng and Fromont, Elisa and Lefevre, Sebastien and Avignon, Bruno},
    title     = {Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}

@InProceedings{Zhang_2021_BMVC,
    author    = {Zhang, Heng and Fromont, Elisa and Lefevre, Sebastien and Avignon, Bruno},
    title     = {PDF-Distil: including Prediction Disagreements in Feature-based Distillation for object detection},
    booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
    month     = {November},
    year      = {2021}
}

Acknowledgement

This project contains pieces of code from the following projects: mmdetection, ssd.pytorch, rfbnet and yolox.

Multimodal Temporal Context Network (MTCN)

Multimodal Temporal Context Network (MTCN) This repository implements the model proposed in the paper: Evangelos Kazakos, Jaesung Huh, Arsha Nagrani,

Evangelos Kazakos 13 Nov 24, 2022
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)

D-VQA We provide the PyTorch implementation for Debiased Visual Question Answering from Feature and Sample Perspectives (NeurIPS 2021). Dependencies P

Zhiquan Wen 19 Dec 22, 2022
Tutorial for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop

Workshop Advantech Jetson Nano This tutorial has been designed for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop in collaboration with Adva

Edge Impulse 18 Nov 22, 2022
InterfaceGAN++: Exploring the limits of InterfaceGAN

InterfaceGAN++: Exploring the limits of InterfaceGAN Authors: Apavou Clément & Belkada Younes From left to right - Images generated using styleGAN and

Younes Belkada 42 Dec 23, 2022
Demystifying How Self-Supervised Features Improve Training from Noisy Labels

Demystifying How Self-Supervised Features Improve Training from Noisy Labels This code is a PyTorch implementation of the paper "[Demystifying How Sel

<a href=[email protected]"> 4 Oct 14, 2022
Repository for MeshTalk supplemental material and code once the (already approved) 16 GHS captures our lab will make publicly available are released.

meshtalk This repository contains code to run MeshTalk for face animation from audio. If you use MeshTalk, please cite @inproceedings{richard2021mesht

Meta Research 221 Jan 06, 2023
Retrieve and analysis data from SDSS (Sloan Digital Sky Survey)

Author: Behrouz Safari License: MIT sdss A python package for retrieving and analysing data from SDSS (Sloan Digital Sky Survey) Installation Install

Behrouz 3 Oct 28, 2022
Official implementation of Few-Shot and Continual Learning with Attentive Independent Mechanisms

Few-Shot and Continual Learning with Attentive Independent Mechanisms This repository is the official implementation of Few-Shot and Continual Learnin

Chikan_Huang 25 Dec 08, 2022
This is the official implementation for the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" in NeurIPS 2021.

Observe then Incentivize Experiments This is the code used for the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents",

Cong Shen Research Group 0 Mar 08, 2022
PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision.

PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{CV2018, author = {Donny You ( Donny You 40 Sep 14, 2022

Flexible Option Learning - NeurIPS 2021

Flexible Option Learning This repository contains code for the paper Flexible Option Learning presented as a Spotlight at NeurIPS 2021. The implementa

Martin Klissarov 7 Nov 09, 2022
[v1 (ISBI'21) + v2] MedMNIST: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification

MedMNIST Project (Website) | Dataset (Zenodo) | Paper (arXiv) | MedMNIST v1 (ISBI'21) Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bili

683 Dec 28, 2022
Multi-Content GAN for Few-Shot Font Style Transfer at CVPR 2018

MC-GAN in PyTorch This is the implementation of the Multi-Content GAN for Few-Shot Font Style Transfer. The code was written by Samaneh Azadi. If you

Samaneh Azadi 422 Dec 04, 2022
Qlib is an AI-oriented quantitative investment platform

Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.

Microsoft 10.1k Dec 30, 2022
Official code repository for the EMNLP 2021 paper

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization PyTorch code for the EMNLP 2021 paper "Integrating Visuospatia

Adyasha Maharana 23 Dec 19, 2022
pytorch implementation of GPV-Pose

GPV-Pose Pytorch implementation of GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting. (link) UPDATE A new version

40 Dec 01, 2022
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

이준혁(Junhyeok Lee) 64 Dec 22, 2022
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

512x512 flowers after 12 hours of training, 1 gpu 256x256 flowers after 12 hours of training, 1 gpu Pizza 'Lightweight' GAN Implementation of 'lightwe

Phil Wang 1.5k Jan 02, 2023
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Ren Yurui 261 Jan 09, 2023
[NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Animashree Anandkumar, and Zhangyang Wang.

AugMax: Adversarial Composition of Random Augmentations for Robust Training Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, an

VITA 112 Nov 07, 2022