SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation, CVPR 2022

Overview

SparseInst 🚀

A simple framework for real-time instance segmentation, CVPR 2022
by
Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu
(: corresponding author)

Highlights



PWC

  • SparseInst presents a new object representation method, i.e., Instance Activation Maps (IAM), to adaptively highlight informative regions of objects for recognition.
  • SparseInst is a simple, efficient, and fully convolutional framework without non-maximum suppression (NMS) or sorting, and easy to deploy!
  • SparseInst achieves good trade-off between speed and accuracy, e.g., 37.9 AP and 40 FPS with 608x input.

Updates

This project is under active development, please stay tuned!

  • [2022-4-29]: We fix the common issue about the visualization demo.py, e.g., ValueError: GenericMask cannot handle ....

  • [2022-4-7]: We provide the demo code for visualization and inference on images. Besides, we have added more backbones for SparseInst, including ResNet-101, CSPDarkNet, and PvTv2. We are still supporting more backbones.

  • [2022-3-25]: We have released the code and models for SparseInst!

Overview

SparseInst is a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. In contrast to region boxes or anchors (centers), SparseInst adopts a sparse set of instance activation maps as object representation, to highlight informative regions for each foreground objects. Then it obtains the instance-level features by aggregating features according to the highlighted regions for recognition and segmentation. The bipartite matching compels the instance activation maps to predict objects in a one-to-one style, thus avoiding non-maximum suppression (NMS) in post-processing. Owing to the simple yet effective designs with instance activation maps, SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on COCO (NVIDIA 2080Ti), significantly outperforms the counter parts in terms of speed and accuracy.

Models

We provide two versions of SparseInst, i.e., the basic IAM (3x3 convolution) and the Group IAM (G-IAM for short), with different backbones. All models are trained on MS-COCO train2017.

Fast models

model backbone input aug APval AP FPS weights
SparseInst R-50 640 32.8 33.2 44.3 model
SparseInst R-50-vd 640 34.1 34.5 42.6 model
SparseInst (G-IAM) R-50 608 33.4 34.0 44.6 model
SparseInst (G-IAM) R-50 608 34.2 34.7 44.6 model
SparseInst (G-IAM) R-50-DCN 608 36.4 36.8 41.6 model
SparseInst (G-IAM) R-50-vd 608 35.6 36.1 42.8 model
SparseInst (G-IAM) R-50-vd-DCN 608 37.4 37.9 40.0 model
SparseInst (G-IAM) R-50-vd-DCN 640 37.7 38.1 39.3 model

Larger models

model backbone input aug APval AP FPS weights
SparseInst (G-IAM) R-101 640 34.9 35.5 - model
SparseInst (G-IAM) R-101-DCN 640 36.4 36.9 - model

SparseInst with Vision Transformers

model backbone input aug APval AP FPS weights
SparseInst (G-IAM) PVTv2-B1 640 35.3 36.0 33.5 (48.9) model
SparseInst (G-IAM) PVTv2-B2-li 640 37.2 38.2 26.5 model

: measured on RTX 3090.

Note:

  • We will continue adding more models including more efficient convolutional networks, vision transformers, and larger models for high performance and high speed, please stay tuned 😁 !
  • Inference speeds are measured on one NVIDIA 2080Ti unless specified.
  • We haven't adopt TensorRT or other tools to accelerate the inference of SparseInst. However, we are working on it now and will provide support for ONNX, TensorRT, MindSpore, Blade, and other frameworks as soon as possible!
  • AP denotes AP evaluated on MS-COCO test-dev2017
  • input denotes the shorter side of the input, e.g., 512x864 and 608x864, we keep the aspect ratio of the input and the longer side is no more than 864.
  • The inference speed might slightly change on different machines (2080 Ti) and different versions of detectron (we mainly use v0.3). If the change is sharp, e.g., > 5ms, please feel free to contact us.
  • For aug (augmentation), we only adopt the simple random crop (crop size: [384, 600]) provided by detectron2.
  • We adopt weight decay=5e-2 as default setting, which is slightly different from the original paper.
  • [Weights on BaiduPan]: we also provide trained models on BaiduPan: ShareLink (password: lkdo).

Installation and Prerequisites

This project is built upon the excellent framework detectron2, and you should install detectron2 first, please check official installation guide for more details.

Note: we mainly use v0.3 of detectron2 for experiments and evaluations. Besides, we also test our code on the newest version v0.6. If you find some bugs or incompatibility problems of higher version of detectron2, please feel free to raise a issue!

Install the detectron2:

git clone https://github.com/facebookresearch/detectron2.git
# if you swith to a specific version, e.g., v0.3 (recommended)
git checkout tags/v0.3
# build detectron2
python setup.py build develop

Getting Start

Testing SparseInst

Before testing, you should specify the config file <CONFIG> and the model weights <MODEL-PATH>. In addition, you can change the input size by setting the INPUT.MIN_SIZE_TEST in both config file or commandline.

  • [Performance Evaluation] To obtain the evaluation results, e.g., mask AP on COCO, you can run:
python train_net.py --config-file <CONFIG> --num-gpus <GPUS> --eval MODEL.WEIGHTS <MODEL-PATH>
# example:
python train_net.py --config-file configs/sparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth
  • [Inference Speed] To obtain the inference speed (FPS) on one GPU device, you can run:
python test_net.py --config-file <CONFIG> MODEL.WEIGHTS <MODEL-PATH> INPUT.MIN_SIZE_TEST 512
# example:
python test_net.py --config-file configs/sparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512

Note:

  • The test_net.py only supports 1 GPU and 1 image per batch for measuring inference speed.
  • The inference time consists of the pure forward time and the post-processing time. While the evaluation processing, data loading, and pre-processing for wrappers (e.g., ImageList) are not included.
  • COCOMaskEvaluator is modified from COCOEvaluator for evaluating mask-only results.

Visualizing Images with SparseInst

To inference or visualize the segmentation results on your images, you can run:

python demo.py --config-file <CONFIG> --input <IMAGE-PATH> --output results --opts MODEL.WEIGHTS <MODEL-PATH>
# example
python demo.py --config-file configs/sparse_inst_r50_giam.yaml --input datasets/coco/val2017/* --output results --opt MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
  • Besides, the demo.py also supports inference on video (--video-input), camera (--webcam). For inference on video, you might refer to issue #9 to avoid someerrors.
  • --opts supports modifications to the config-file, e.g., INPUT.MIN_SIZE_TEST 512.
  • --input can be single image or a folder of images, e.g., xxx/*.
  • If --output is not specified, a popup window will show the visualization results for each image.
  • Lowering the confidence-threshold will show more instances but with more false positives.

Visualization results (SparseInst-R50-GIAM)

Training SparseInst

To train the SparseInst model on COCO dataset with 8 GPUs. 8 GPUs are required for the training. If you only have 4 GPUs or GPU memory is limited, it doesn't matter and you can reduce the batch size through SOLVER.IMS_PER_BATCH or reduce the input size. If you adjust the batch size, learning schedule should be adjusted according to the linear scaling rule.

python train_net.py --config-file <CONFIG> --num-gpus 8 
# example
python train_net.py --config-file configs/sparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8

Acknowledgements

SparseInst is based on detectron2, OneNet, DETR, and timm, and we sincerely thanks for their code and contribution to the community!

Citing SparseInst

If you find SparseInst is useful in your research or applications, please consider giving us a star 🌟 and citing SparseInst by the following BibTeX entry.

@inproceedings{Cheng2022SparseInst,
  title     =   {Sparse Instance Activation for Real-Time Instance Segmentation},
  author    =   {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},
  booktitle =   {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      =   {2022}
}

License

SparseInst is released under the MIT Licence.

Owner
Hust Visual Learning Team
Hust Visual Learning Team belongs to the Artificial Intelligence Research Institute in the School of EIC in HUST, Lead by @xinggangw
Hust Visual Learning Team
A rule-based log analyzer & filter

Flog 一个根据规则集来处理文本日志的工具。 前言 在日常开发过程中,由于缺乏必要的日志规范,导致很多人乱打一通,一个日志文件夹解压缩后往往有几十万行。 日志泛滥会导致信息密度骤减,给排查问题带来了不小的麻烦。 以前都是用grep之类的工具先挑选出有用的,再逐条进行排查,费时费力。在忍无可忍之后决

上山打老虎 9 Jun 23, 2022
Fast RFC3339 compliant Python date-time library

udatetime: Fast RFC3339 compliant date-time library Handling date-times is a painful act because of the sheer endless amount of formats used by people

Simon Pirschel 235 Oct 25, 2022
Character Grounding and Re-Identification in Story of Videos and Text Descriptions

Character in Story Identification Network (CiSIN) This project hosts the code for our paper. Youngjae Yu, Jongseok Kim, Heeseung Yun, Jiwan Chung and

8 Dec 09, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Fast, general, and tested differentiable structured prediction in PyTorch

HNLP 1.1k Dec 16, 2022
A PyTorch-based R-YOLOv4 implementation which combines YOLOv4 model and loss function from R3Det for arbitrary oriented object detection.

R-YOLOv4 This is a PyTorch-based R-YOLOv4 implementation which combines YOLOv4 model and loss function from R3Det for arbitrary oriented object detect

94 Dec 03, 2022
No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

This repository contains the implementation for the paper: No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consiste

Alireza Golestaneh 75 Dec 30, 2022
Text Extraction Formulation + Feedback Loop for state-of-the-art WSD (EMNLP 2021)

ConSeC is a novel approach to Word Sense Disambiguation (WSD), accepted at EMNLP 2021. It frames WSD as a text extraction task and features a feedback loop strategy that allows the disambiguation of

Sapienza NLP group 36 Dec 13, 2022
Measuring Coding Challenge Competence With APPS

Measuring Coding Challenge Competence With APPS This is the repository for Measuring Coding Challenge Competence With APPS by Dan Hendrycks*, Steven B

Dan Hendrycks 218 Dec 27, 2022
PyJokes - Joking around with Python library pyjokes

Hi, it's Muhaimin again 👋 This is something unorthodox but cool. Don't forget t

Muhaimin A. Salay Kanton 1 Feb 02, 2022
Realtime micro-expression recognition using OpenCV and PyTorch

Micro-expression Recognition Realtime micro-expression recognition from scratch using OpenCV and PyTorch Try it out with a webcam or video using the e

Irfan 35 Dec 05, 2022
Implementation of Shape and Electrostatic similarity metric in deepFMPO.

DeepFMPO v3D Code accompanying the paper "On the value of using 3D-shape and electrostatic similarities in deep generative methods". The paper can be

34 Nov 28, 2022
Transport Mode detection - can detect the mode of transport with the help of features such as acceeration,jerk etc

title emoji colorFrom colorTo sdk app_file pinned Transport_Mode_Detector 🚀 purple yellow gradio app.py false Configuration title: string Display tit

Nishant Rajadhyaksha 3 Jan 16, 2022
Message Passing on Cell Complexes

CW Networks This repository contains the code used for the papers Weisfeiler and Lehman Go Cellular: CW Networks (Under review) and Weisfeiler and Leh

Twitter Research 108 Jan 05, 2023
official code for dynamic convolution decomposition

Revisiting Dynamic Convolution via Matrix Decomposition (ICLR 2021) A pytorch implementation of DCD. If you use this code in your research please cons

Yunsheng Li 110 Nov 23, 2022
One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users

seq-to-mind 18 Dec 11, 2022
[AAAI-2021] Visual Boundary Knowledge Translation for Foreground Segmentation

Trans-Net Code for (Visual Boundary Knowledge Translation for Foreground Segmentation, AAAI2021). [https://ojs.aaai.org/index.php/AAAI/article/view/16

ZJU-VIPA 2 Mar 04, 2022
In this project, we create and implement a deep learning library from scratch.

ARA In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The

22 Aug 23, 2022
Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Gyeongjae Choi 17 Sep 23, 2021
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Cross Transformers - Pytorch (wip) Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch Install $ pip install cross-t

Phil Wang 40 Dec 22, 2022
Python-experiments - A Repository which contains python scripts to automate things and make your life easier with python

Python Experiments A Repository which contains python scripts to automate things

Vivek Kumar Singh 11 Sep 25, 2022