DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Related tags

Deep LearningDCT-Mask
Overview

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

This project hosts the code for implementing the DCT-MASK algorithms for instance segmentation.

[DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation] Xing Shen*, Jirui Yang*, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua Xiaoliang Cheng, Kewei Liang

In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition(CVPR 2021)

arXiv preprint(arXiv:2011.09876)

Contributions

  • We propose a high-quality and low-complexity mask representation for instance segmentation, which encodes the high-resolution binary mask into a compact vector with discrete cosine transform.
  • With slight modifications, DCT-Mask could be integrated into most pixel-based frameworks, and achieve significant and consistent improvement on different datasets, backbones, and training schedules. Specifically, it obtains more improvements for more complex backbones and higher-quality annotations.
  • DCT-Mask does not require extra pre-processing or pre-training. It achieves high-resolution mask prediction at a speed similar to low-resolution.

Installation

Requirements

  • PyTorch ≥ 1.5 and fvcore == 0.1.1.post20200716

This implementation is based on detectron2. Please refer to INSTALL.md. for installation and dataset preparation.

Usage

The codes of this project is on projects/DCT_Mask/

Train with multiple GPUs

cd ./projects/DCT_Mask/
./train1.sh

Testing

cd ./projects/DCT_Mask/
./test1.sh

Model ZOO

Trained models on COCO

Model Backbone Schedule Multi-scale training Inference time (s/im) AP (minival) Link
DCT-Mask R-CNN R50 1x Yes 0.0465 36.5 download(Fetch code: xpdm)
DCT-Mask R-CNN R101 3x Yes 0.0595 39.9 download(Fetch code: 7q6x)
DCT-Mask R-CNN RX101 3x Yes 0.1049 41.2 download(Fetch code: ufw2)
Casecade DCT-Mask R-CNN R50 1x Yes 0.0630 37.5 download(Fetch code: yqxp)
Casecade DCT-Mask R-CNN R101 3x Yes 0.0750 40.8 download(Fetch code: r8xv)
Casecade DCT-Mask R-CNN RX101 3x Yes 0.1195 42.0 download(Fetch code: pdej)

Trained models on Cityscapes

Model Data Backbone Schedule Multi-scale training AP (val) Link
DCT-Mask R-CNN Fine-Only R50 1x Yes 37.0 download(Fetch code: dn7i)
DCT-Mask R-CNN CoCo-Pretrain +Fine R50 1x Yes 39.6 download(Fetch code: ntqf)

Notes

  • We observe about 0.2 AP noise in COCO.
  • High variance observed in CityScapes when trained on fine annotations only. We report the median of 5 runs AP in the article (i.e. 35.6), while in this repo we report the best results (37.0).
  • Initialized from COCO pre-training will reduce the variance on CityScapes as well as increasing mask AP.
  • The inference time is measured on single GPU with batchsize 1. All GPUs are NVIDIA V100.
  • Lvis 0.5 is used for evaluation.

Contributing to the project

Any pull requests or issues are welcome.

If there is any problem with this project, please contact Xing Shen.

Citations

Please consider citing our papers in your publications if the project helps your research.

License

  • MIT License.
Owner
Alibaba Cloud
More Than Just Cloud
Alibaba Cloud
Fashion Recommender System With Python

Fashion-Recommender-System Thr growing e-commerce industry presents us with a la

Omkar Gawade 2 Feb 02, 2022
Transformer part of 12th place solution in Riiid! Answer Correctness Prediction

kaggle_riiid Transformer part of 12th place solution in Riiid! Answer Correctness Prediction. Please see here for more information. Execution You need

Sakami Kosuke 2 Apr 23, 2022
Gif-caption - A straightforward GIF Captioner written in Python

Broksy's GIF Captioner Have you ever wanted to easily caption a GIF without havi

3 Apr 09, 2022
Deep Learning Models for Causal Inference

Extensive tutorials for learning how to build deep learning models for causal inference using selection on observables in Tensorflow 2.

Bernard J Koch 151 Dec 31, 2022
Code for paper Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

Decoupled Spatial-Temporal Graph Neural Networks Code for our paper: Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting.

S22 43 Jan 04, 2023
Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices

Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices Abstract For practical deep neural network design on mobile devices, it is e

11 Dec 30, 2022
Language-Driven Semantic Segmentation

Language-driven Semantic Segmentation (LSeg) The repo contains official PyTorch Implementation of paper Language-driven Semantic Segmentation. Authors

Intelligent Systems Lab Org 416 Jan 03, 2023
PyTorch Implementation of the paper Learning to Reweight Examples for Robust Deep Learning

Learning to Reweight Examples for Robust Deep Learning Unofficial PyTorch implementation of Learning to Reweight Examples for Robust Deep Learning. Th

Daniel Stanley Tan 325 Dec 28, 2022
TrackTech: Real-time tracking of subjects and objects on multiple cameras

TrackTech: Real-time tracking of subjects and objects on multiple cameras This project is part of the 2021 spring bachelor final project of the Bachel

5 Jun 17, 2022
People log into different sites every day to get information and browse through these sites one by one

HyperLink People log into different sites every day to get information and browse through these sites one by one. And they are exposed to advertisemen

0 Feb 17, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

DRSAN A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution Karam Park, Jae Woong Soh, and Nam Ik Cho Environments U

4 May 10, 2022
Low-code/No-code approach for deep learning inference on devices

EzEdgeAI A concept project that uses a low-code/no-code approach to implement deep learning inference on devices. It provides a componentized framewor

On-Device AI Co., Ltd. 7 Apr 05, 2022
A neuroanatomy-based augmented reality experience powered by computer vision. Features 3D visuals of the Atlas Brain Map slices.

Brain Augmented Reality (AR) A neuroanatomy-based augmented reality experience powered by computer vision that features 3D visuals of the Atlas Brain

Yasmeen Brain 10 Oct 06, 2022
Shuffle Attention for MobileNetV3

SA-MobileNetV3 Shuffle Attention for MobileNetV3 Train Run the following command for train model on your own dataset: python train.py --dataset mnist

Sajjad Aemmi 36 Dec 28, 2022
Symbolic Music Generation with Diffusion Models

Symbolic Music Generation with Diffusion Models Supplementary code release for our work Symbolic Music Generation with Diffusion Models. Installation

Magenta 119 Jan 07, 2023
Dataset and Code for the paper "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021), and "Depth-only Object Tracking" (BMVC2021)

DeT and DOT Code and datasets for "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021) "Depth-only Object Tracking" (BMVC2021) @InProceedings

Yan Song 55 Dec 15, 2022
Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 04, 2023
“Robust Lightweight Facial Expression Recognition Network with Label Distribution Training”, AAAI 2021.

EfficientFace Zengqun Zhao, Qingshan Liu, Feng Zhou. "Robust Lightweight Facial Expression Recognition Network with Label Distribution Training". AAAI

Zengqun Zhao 119 Jan 08, 2023
CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors

CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors   In order to facilitate the res

yujmo 11 Dec 12, 2022