Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

By Qiang Zhou*, Zilong Huang*, Lichao Huang, Han Shen, Yongchao Gong, Chang Huang, Wenyu Liu, Xinggang Wang. (* means equal contribution)

This code is our implementation, mainly targeting the DAVIS 2017 dataset. For more details, please refer to our paper.

Architecture


Overview of our proposed PTSNet for video object segmentation. OPN generates proposals for the objects of interest, and OTN determines which of the proposals is the best. Finally, DRSN performs the final pixel-level tracking (segmentation) task. Note that in our implementation we couple OPN and OTN into a single network and separate DRSN out for engineering reasons.
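
The cascade described above can be sketched as follows. This is only an illustration of the data flow, not the repository's code: `run_cascade`, `Proposal`, the box format, and the three callables `propose`, `score` and `segment` are hypothetical stand-ins for OPN, OTN and DRSN.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); a hypothetical box format


@dataclass
class Proposal:
    box: Box
    score: float = 0.0


def run_cascade(
    frame: object,
    propose: Callable[[object], Sequence[Box]],  # stand-in for OPN
    score: Callable[[object, Box], float],       # stand-in for OTN
    segment: Callable[[object, Box], object],    # stand-in for DRSN
) -> Tuple[Optional[Proposal], Optional[object]]:
    """Run one frame through a proposal -> tracking -> segmentation cascade."""
    # Stage 1 (OPN): candidate boxes for the object of interest.
    proposals = [Proposal(box) for box in propose(frame)]
    if not proposals:
        return None, None
    # Stage 2 (OTN): score every proposal and keep the best one.
    for p in proposals:
        p.score = score(frame, p.box)
    best = max(proposals, key=lambda p: p.score)
    # Stage 3 (DRSN): pixel-level segmentation for the chosen proposal.
    return best, segment(frame, best.box)


# Toy usage with stub stages (all values invented for illustration).
best, mask = run_cascade(
    frame="frame-0",
    propose=lambda f: [(0, 0, 10, 10), (5, 5, 20, 20)],
    score=lambda f, b: float(b[2] - b[0]),  # pretend wider boxes score higher
    segment=lambda f, b: {"box": b},
)
print(best.box, mask)
```

Keeping the three stages behind plain callables mirrors the note above: the proposal and tracking stages can live in one network while the segmentation stage is swapped or run separately.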

Usage

Preparation

  1. Install PyTorch 1.0 and the necessary libraries, such as opencv and PIL.

  2. Two components, InPlace-ABN and the MaskRCNN operators, are native CUDA implementations and must be compiled before anything else.

    # Before you compile, you need to figure out several things:
    # - The CUDA architectures supported by your GPU. Here we use `sm_52`, `sm_61` and `sm_70`, with an NVIDIA Titan V.
    # - The `cuda` and `nvcc` paths on your system, which are usually `/usr/local/cuda` and `/usr/local/cuda/bin/nvcc` respectively.
    # InPlace-ABN_0.4   (PyTorch 0.4)
    cd model/inplace_ABN_0.4
    bash build.sh
    # Alternatively, you can use the 1.0 version of InPlace-ABN.
    # InPlace-ABN_1.0   (PyTorch 1.0)
    cd model/inplace_ABN    # Compiled dynamically at run time (requires gcc > 4.9)
    
    # MaskRCNN Operators (PyTorch 0.4)
    cd coupled_otn_opn/tracking/maskrcnn/lib
    bash make.sh
  3. You can train PTSNet from scratch or just evaluate our pretrained model.

    • To train from scratch, you need to download:

       # DRSN: wget "https://download.pytorch.org/models/resnet50-19c8e357.pth" -O drsn/init_models/resnet50-19c8e357.pth
       # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
       # If you want to use our pretrained OTN:
       #   wget https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf and save it as `coupled_otn_opn/models/mdnet_davis_50cycle.pth`
       # Otherwise, adapt py-MDNet (https://github.com/HyeonseobNam/py-MDNet) to train OTN on DAVIS yourself.
    • To evaluate with our pretrained model, you need to download:

       # DRSN: https://drive.google.com/open?id=116yXnqX43BZ7kEgdzUhIeTSn1dbvcE2F, save it as `drsn/snapshots/drsn_yvos_10w_davis_3p5w.pth`
       # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
       # OTN: https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf, save it as `coupled_otn_opn/models/mdnet_davis_50cycle.pth`
  4. Dataset

    • YouTube-VOS: Download from YouTube-VOS. Only the training part (train_all_frames.zip) is needed, about 41 GB in total. Unzip it, then move and rename it to drsn/dataset/yvos.
    • DAVIS: Download from DAVIS. Only the 480p version (DAVIS-2017-trainval-480p.zip) is needed. Unzip it, then move and rename it to drsn/dataset/DAVIS/trainval and coupled_otn_opn/DAVIS/trainval. Note that you need to create the trainval subdirectory to hold the dataset.

    Make sure the files are arranged in the following structure:

    .
    ├── drsn
    │   ├── dataset
    │   │   ├── DAVIS
    │   │   │   └── trainval
    │   │   │       ├── Annotations
    │   │   │       ├── ImageSets
    │   │   │       └── JPEGImages
    │   │   └── yvos
    │   │       └── train_all_frames
    │   ├── init_model
    │   │   └── resnet50-19c8e357.pth
    │   └── snapshots
    │       └── drsn_yvos_10w_davis_3p5w.pth
    └── coupled_otn_opn
        ├── DAVIS
        │   └── trainval
        ├── models
        │   └── mdnet_davis_50cycle.pth
        └── tracking
            └── maskrcnn
                └── data
                    └── X-152-32x8d-FPN-IN5k.pkl
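
A small script can confirm that the layout matches the tree before training or evaluating. This is a sketch, not part of the repository: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the paths are copied from the tree as written. The download commands in step 3 spell two of them slightly differently (`init_models`, `X-152-32x8d-IN5k.pkl`), so adjust the list to whatever your checkout expects. Not every entry is needed for both modes, for example the ResNet-50 weights are only listed for training from scratch.

```python
from pathlib import Path
from typing import List

# Paths copied from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "drsn/dataset/DAVIS/trainval/Annotations",
    "drsn/dataset/DAVIS/trainval/ImageSets",
    "drsn/dataset/DAVIS/trainval/JPEGImages",
    "drsn/dataset/yvos/train_all_frames",
    "drsn/init_model/resnet50-19c8e357.pth",
    "drsn/snapshots/drsn_yvos_10w_davis_3p5w.pth",
    "coupled_otn_opn/DAVIS/trainval",
    "coupled_otn_opn/models/mdnet_davis_50cycle.pth",
    "coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-FPN-IN5k.pkl",
]


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing files or directories:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected files and directories are present.")
```

Run it from the repository root; an empty result means every path in the tree was found.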
    

Train and Evaluate

  • First, go to the coupled_otn_opn directory and follow the README.md inside to generate the proposals. You can skip this step, because we provide generated proposals in the drsn/dataset/result_davis directory.
  • Second, enter drsn and see do_train_eval.sh to train and evaluate.
  • Finally, we also provide the result masks produced by our PTSNet in result-masks-GoogleDrive. The quantitative results below are measured with the official DAVIS MATLAB toolbox.

           J Mean   F Mean   G Mean
    Avg    71.6     77.7     74.7
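
For readers unfamiliar with the DAVIS metrics: J is the region similarity (the Jaccard index, or intersection over union, between a predicted mask and the ground truth), F is the contour accuracy, and G is the mean of J and F. The sketch below illustrates J on a toy pair of masks and checks that the reported numbers are consistent with G = (J + F) / 2. It is not the official toolbox; the function names and the toy masks are invented for illustration.

```python
from typing import Sequence


def jaccard(pred: Sequence[Sequence[int]], gt: Sequence[Sequence[int]]) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks are treated as a perfect match (a convention assumed here).
    return 1.0 if union == 0 else inter / union


def global_mean(j_mean: float, f_mean: float) -> float:
    """G Mean: the average of J Mean and F Mean."""
    return (j_mean + f_mean) / 2.0


# Toy 3x3 masks, invented for illustration only.
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[1, 1, 1],
      [1, 1, 1],
      [0, 0, 0]]
print(jaccard(pred, gt))  # 4 overlapping pixels out of 6 in the union

# The reported scores agree with G = (J + F) / 2 up to rounding.
print(global_mean(71.6, 77.7))
```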

Acknowledgment

The work was mainly done during an internship at Horizon Robotics.

Citing PTSNet

If you find PTSNet useful in your research, please consider citing:

@article{ptsnet2019,
        title={Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation},
        author={Zhou, Qiang and Huang, Zilong and Huang, Lichao and Shen, Han and Gong, Yongchao and Huang, Chang and Liu, Wenyu and Wang, Xinggang},
        journal={arXiv preprint arXiv:1907.01203v2},
        year={2019}
}

Thanks to the Third Party Libs
