Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Overview

Propose-Reduce VIS

This repo contains the official implementation for the paper:

Video Instance Segmentation with a Propose-Reduce Paradigm

Huaijia Lin*, Ruizheng Wu*, Shu Liu, Jiangbo Lu, Jiaya Jia

ICCV 2021 | Paper

TeaserImage

Installation

Please refer to INSTALL.md.

Demo

You can compute the VIS results for your own videos.

  1. Download pretrained weight.
  2. Put example videos in 'demo/inputs'. We support two types of inputs, frames directories or .mp4 files (see example for details).
  3. Run the following script and obtain the results in demo/outputs.
sh demo.sh

Data Preparation

(1) Download the videos and jsons of val set from YouTube-VIS 2019

(2) Download the videos and jsons of val set from YouTube-VIS 2021

(3) Symlink the corresponding dataset and json files to the data folder

mkdir data
data
├── valset_ytv19 --> /path/to/ytv2019/vos/valid/JPEGImages/ 
├── valid_ytv19.json --> /path/to/ytv2019/vis/valid.json
├── valset_ytv21 --> /path/to/ytv2021/vis/valid/JPEGImages/ 
├── valid_ytv21.json --> /path/to/ytv2021/vis/valid/instances.json

Results

We provide the results of several pretrained models and corresponding scripts on different backbones. The results have slight differences from the paper because we make minor modifications to the inference codes.

Download the pretrained models and put them in pretrained folder.

mkdir pretrained
Dataset Method Backbone CA Reduce AP [email protected] download
YouTube-VIS 2019 Seq Mask R-CNN ResNet-50 40.8 49.9 model | scripts
YouTube-VIS 2019 Seq Mask R-CNN ResNet-50 42.5 56.8 scripts
YouTube-VIS 2019 Seq Mask R-CNN ResNet-101 43.8 52.7 model | scripts
YouTube-VIS 2019 Seq Mask R-CNN ResNet-101 45.2 59.0 scripts
YouTube-VIS 2019 Seq Mask R-CNN ResNeXt-101 47.6 56.7 model | scripts
YouTube-VIS 2019 Seq Mask R-CNN ResNeXt-101 48.8 62.2 scripts
YouTube-VIS 2021 Seq Mask R-CNN ResNet-50 39.6 47.5 model | scripts
YouTube-VIS 2021 Seq Mask R-CNN ResNet-50 41.7 54.9 scripts
YouTube-VIS 2021 Seq Mask R-CNN ResNeXt-101 45.6 52.9 model | scripts
YouTube-VIS 2021 Seq Mask R-CNN ResNeXt-101 47.2 57.6 scripts

Evaluation

YouTube-VIS 2019: A json file will be saved in `../Results_ytv19' folder. Please zip and upload to the codalab server.

YouTube-VIS 2021: A json file will be saved in `../Results_ytv21' folder. Please zip and upload to the codalab server.

TODOs

Citation

If you find this work useful in your research, please cite:

@article{lin2021video,
  title={Video Instance Segmentation with a Propose-Reduce Paradigm},
  author={Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions regarding the repo, please feel free to contact me ([email protected]) or create an issue.

Acknowledgments

This repo is based on MMDetection, MaskTrackRCNN, STM, MMCV and COCOAPI.

Owner
DV Lab
Deep Vision Lab
DV Lab
Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

USAF - MIT Artificial Intelligence Accelerator 46 Dec 15, 2022
Mahadi-Now - This Is Pakistani Just Now Login Tools

PAKISTANI JUST NOW LOGIN TOOLS Install apt update apt upgrade apt install python

MAHADI HASAN AFRIDI 19 Apr 06, 2022
CoaT: Co-Scale Conv-Attentional Image Transformers

CoaT: Co-Scale Conv-Attentional Image Transformers Introduction This repository contains the official code and pretrained models for CoaT: Co-Scale Co

mlpc-ucsd 191 Dec 03, 2022
Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

YIFAN WANG 1.4k Dec 30, 2022
Optimizing Value-at-Risk and Conditional Value-at-Risk of Black Box Functions with Lacing Values (LV)

BayesOpt-LV Optimizing Value-at-Risk and Conditional Value-at-Risk of Black Box Functions with Lacing Values (LV) About This repository contains the s

1 Nov 11, 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng, Alexander G. Schwing, Alexander Kirillov [arXiv] [Proj

Facebook Research 1k Jan 08, 2023
USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

116 Jan 04, 2023
A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding his way.

GuidEye A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding h

Munal Jain 0 Aug 09, 2022
This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

Davide Coccomini 9 Dec 18, 2022
Official Repsoitory for "Activate or Not: Learning Customized Activation." [CVPR 2021]

CVPR 2021 | Activate or Not: Learning Customized Activation. This repository contains the official Pytorch implementation of the paper Activate or Not

184 Dec 27, 2022
Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"

PL-Marker Source code for Pack Together: Entity and Relation Extraction with Levitated Marker. Quick links Overview Setup Install Dependencies Data Pr

THUNLP 173 Dec 30, 2022
My implementation of Fully Convolutional Neural Networks in Keras

Keras-FCN This repository contains my implementation of Fully Convolutional Networks in Keras (Tensorflow backend). Currently, semantic segmentation c

The Duy Nguyen 15 Jan 13, 2020
the official implementation of the paper "Isometric Multi-Shape Matching" (CVPR 2021)

Isometric Multi-Shape Matching (IsoMuSh) Paper-CVF | Paper-arXiv | Video | Code Citation If you find our work useful in your research, please consider

Maolin Gao 9 Jul 17, 2022
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 195 Dec 29, 2022
Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

One-Shot Free-View Neural Talking Head Synthesis Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Vide

ZLH 406 Dec 23, 2022
TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning

TransZero++ This repository contains the testing code for the paper "TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning" submitted

Shiming Chen 6 Aug 16, 2022
FLVIS: Feedback Loop Based Visual Initial SLAM

FLVIS Feedback Loop Based Visual Inertial SLAM 1-Video EuRoC DataSet MH_05 Handheld Test in Lab FlVIS on UAV Platform 2-Relevent Publication: Under Re

UAV Lab - HKPolyU 182 Dec 04, 2022
Implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN This is an unofficial implementation of SinGAN from someone who's been sitting right next to SinGAN's creator for almost five years. Please ref

35 Nov 10, 2022
The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue.

The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue. How do I cite D-REX? For now, cite

Alon Albalak 6 Mar 31, 2022