MaskTrackRCNN for video instance segmentation based on mmdetection

Last update: Jan 05, 2023

Introduction

This repo serves as the official code release of the MaskTrackRCNN model for video instance segmentation described in the tech report:

@article{Yang2019vis,
  author = {Linjie Yang and Yuchen Fan and Ning Xu},
  title = {Video instance segmentation},
  journal = {CoRR},
  volume = {abs/1905.04804},
  year = {2019},
  url = {https://arxiv.org/abs/1905.04804}
}

In this work, we present a new task, video instance segmentation, which extends instance segmentation from the image domain to the video domain. The task aims at the simultaneous detection, segmentation and tracking of object instances in videos. For this task we collected YouTubeVIS, a new dataset built on YouTubeVOS, which was the largest video object segmentation dataset at the time. Sample annotations of a video clip can be seen below. We also propose MaskTrackRCNN, an algorithm that jointly detects, segments and tracks object instances in a video: a tracking head is added to the original MaskRCNN model to match objects across frames. An overview of the algorithm is shown below.
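As a rough illustration of how a tracking head can match objects across frames, the sketch below scores a new detection's embedding f against the embeddings f_1..f_N of identities seen in earlier frames, plus an extra "new object" option whose logit is fixed at 0, giving p(n) = exp(f·f_n) / (1 + Σ_j exp(f·f_j)) and p(new) = 1 / (1 + Σ_j exp(f·f_j)). This is our reading of the assignment formulation in the tech report, written in plain Python rather than taken from the repo; the function names are hypothetical, and the sketch omits any additional cues the model may combine with this score at inference time.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def assignment_probs(query: Sequence[float], memory: List[Sequence[float]]) -> List[float]:
    """Index 0 = probability of a new instance; index n >= 1 = probability of identity n.

    The "new instance" option uses a fixed logit of 0, i.e. an unnormalized weight
    of exp(0) = 1, which yields the 1 + sum(...) denominator.
    """
    logits = [0.0] + [dot(query, m) for m in memory]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def assign_identity(query: Sequence[float], memory: List[Sequence[float]]) -> int:
    """Return 0 for a new instance, otherwise the 1-based index of the best-matching identity."""
    probs = assignment_probs(query, memory)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Two identities stored from previous frames (hypothetical 2-D embeddings).
    memory = [[1.0, 0.0], [0.0, 1.0]]
    print(assign_identity([2.0, 0.1], memory))    # matches identity 1
    print(assign_identity([-1.0, -1.0], memory))  # no good match: new instance (0)
```

Whenever a detection is assigned 0, its embedding would be appended to the memory as a new identity; that bookkeeping is left out here.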

Installation

This repo is built on mmdetection at commit f3a939f. Please refer to INSTALL.md to install the library. You also need to install a customized COCO API for the YouTubeVIS dataset. You can use the following commands to create a conda environment with all dependencies.

conda create -n MaskTrackRCNN -y
conda activate MaskTrackRCNN
conda install -c pytorch pytorch=0.4.1 torchvision cuda92 -y
conda install -c conda-forge cudatoolkit-dev=9.2 opencv -y
conda install cython -y
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"
bash compile.sh
pip install .
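After running the commands above, a quick sanity check can confirm that the main packages are visible to Python. This is a minimal sketch; the module list (torch, torchvision, cv2 for OpenCV, pycocotools for the customized COCO API, mmdet for this repo) is our assumption about what the steps install. It only locates modules without importing them, so it will not catch a failed compile.sh build.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed top-level modules provided by the installation steps above.
EXPECTED_MODULES = ("torch", "torchvision", "cv2", "pycocotools", "mmdet")


def check_modules(names: Iterable[str] = EXPECTED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```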

You may also need to follow issue #1 to load MSCOCO-pretrained models.

Model training and evaluation

Our model is based on MaskRCNN-resnet50-FPN. It is trained end-to-end on YouTubeVIS, starting from an MSCOCO-pretrained checkpoint (link).

Training

Download YouTubeVIS from here.
Symlink the train and validation datasets to the $MMDETECTION/data folder, and put the COCO-style annotations under $MMDETECTION/data/annotations:

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── train
│   ├── val
│   ├── annotations
│   │   ├── instances_train_sub.json
│   │   ├── instances_val_sub.json
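Before training, it can help to confirm that the data matches the layout above. The sketch below checks only the entries shown in the tree, relative to the mmdetection root; the function name is hypothetical. Path.exists() follows symlinks, so a dangling symlink is reported as missing.

```python
from pathlib import Path
from typing import Iterable, List

# Entries from the directory tree above, relative to the mmdetection root.
REQUIRED = (
    "data/train",
    "data/val",
    "data/annotations/instances_train_sub.json",
    "data/annotations/instances_val_sub.json",
)


def missing_paths(root: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required entries that do not exist (or are broken symlinks) under `root`."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```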

Run python3 tools/train.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py to train the model. For arguments such as the learning rate and model parameters, refer to configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py.
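To inspect settings such as the learning rate without starting a run, the config can be executed as ordinary Python and its top-level variables listed. This sketch assumes the config only defines plain variables (dicts, numbers, strings), as mmdetection v0.x-style configs do; it is not the repo's own config loader, and the key names optimizer and total_epochs are typical mmdetection names that may differ in this file.

```python
import runpy
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Execute a plain-Python config file and return its public top-level names."""
    namespace = runpy.run_path(path)
    # Drop dunder entries such as __builtins__ and any private helpers.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


if __name__ == "__main__":
    cfg = load_config("configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py")
    print("optimizer:", cfg.get("optimizer"))
    print("total_epochs:", cfg.get("total_epochs"))
```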

Evaluation

Our pretrained model is available for download at Google Drive. Run the following command to evaluate the model on YouTubeVIS.

python3 tools/test_video.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py [MODEL_PATH] --out [OUTPUT_PATH] --eval segm

A JSON file containing the predicted results will be written to OUTPUT_PATH.json. YouTubeVIS currently allows evaluation only on the CodaLab server, so upload the generated results there to obtain actual performance numbers.
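Before uploading, a light check of the prediction file can catch obvious problems. This is a minimal sketch under two assumptions to verify against the benchmark's submission instructions: that each predicted instance carries video_id, category_id, score and segmentations fields, and that the server expects a zip archive containing a file named results.json. The function name is hypothetical.

```python
import json
import zipfile

# Assumed per-instance keys of YouTubeVIS-style results.
REQUIRED_KEYS = {"video_id", "category_id", "score", "segmentations"}


def package_results(json_path: str, zip_path: str) -> int:
    """Validate the prediction file, zip it as results.json, and return the instance count."""
    with open(json_path) as f:
        preds = json.load(f)
    if not isinstance(preds, list):
        raise ValueError("expected a JSON list of predicted instances")
    for i, pred in enumerate(preds):
        if not isinstance(pred, dict):
            raise ValueError(f"instance {i} is not a JSON object")
        missing = REQUIRED_KEYS - pred.keys()
        if missing:
            raise ValueError(f"instance {i} is missing keys: {sorted(missing)}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The member name "results.json" is an assumed server requirement.
        zf.write(json_path, arcname="results.json")
    return len(preds)
```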

License

This project is released under the Apache 2.0 license.

Contact

If you have any questions regarding the repo, please contact Linjie Yang or open an issue.