Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

Related tags

Deep LearningPTSNet
Overview

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

By Qiang Zhou*, Zilong Huang*, Lichao Huang, Han Shen, Yongchao Gong, Chang Huang, Wenyu Liu, Xinggang Wang.(* means equal contribution)

This code is the implementation mainly for DAVIS 2017 dataset. For more detail, please refer to our paper.

Architecture


Overview of our proposed PTSNet for video object segmentation. OPN is designed for generating proposals of the interested objects and OTN aims to distinguish which one of the proposals is the best. Finally, DRSN does the final pixel level tracking(segmentation) task. Note in our implementation we couple OPN and OTN as a whole network, and spearate DRSN out under engineering consideration.

Usage

Preparation

  1. Install PyTorch 1.0 and necessary libraries like opencv, PIL etc.

  2. There are some native CUDA implementations, InPlace-ABN and MaskRCNN Operators, which must be compiled at the very start.

    # Before you compile, you need to figure out several things:
    # - The CUDA kernels supported by your GPU, here we use `sm_52`, `sm_61` and `sm_70` for NVIDIA Titan V.
    # - `cuda` and `nvcc` paths in your operating system, which exist usually in `/usr/local/cuda` and `/usr/local/cuda/bin/nvcc` respectively.
    # InPlace-ABN_0.4   (PyTorch 0.4)
    cd model/inplace_ABN_0.4
    bash build.sh
    # OR you could choose the 1.0 version of inplace ABN.
    # InPlace-ABN_1.0   (PyTorch 1.0)
    cd model/inplace_ABN    # It is dynamically compiled when running (gcc > 4.9)
    
    # MaskRCNN Operators (PyTorch 0.4)
    cd coupled_otn_opn/tracking/maskrcnn/lib
    bash make.sh
  3. You can train PTSNet from scratch or just evaluate our pretrained model.

    • Train it from scratch, you need to download:

       # DRSN: wget "https://download.pytorch.org/models/resnet50-19c8e357.pth" -O drsn/init_models/resnet50-19c8e357.pth
       # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
       # If you want to use our pretrained OTN:
       #   wget https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf, put it into `coupled_otn_opn/models/mdnet_davis_50cyche.pth`
       # Else please modify from py-MDNet(https://github.com/HyeonseobNam/py-MDNet) to train OTN on DAVIS by yourself.
    • If you want to use our pretrained model to do the evaluation, you need to download:

       # DRSN: https://drive.google.com/open?id=116yXnqX43BZ7kEgdzUhIeTSn1dbvcE2F, put it into `drsn/snapshots/drsn_yvos_10w_davis_3p5w.pth`
       # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
       # OTN: https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf, put it into `coupled_otn_opn/models/mdnet_davis_50cycle.pth`
  4. Dataset

    • YouTube-VOS: Download from YouTube-VOS, note we only need the training part(train_all_frames.zip), totally about 41G. Unzip, move and rename it to drsn/dataset/yvos.
    • DAVIS: Download from DAVIS, note we only need the 480p version(DAVIS-2017-trainval-480p.zip). Unzip, move and rename it to drsn/dataset/DAVIS/trainval and coupled_otn_opn/DAVIS/trainval. Here you need to make a subdirectory of trainval directory to store the dataset.

    And make sure to put the files as the following structure:

    .
    ├── drsn
    │   ├── dataset
    │   │   ├── DAVIS
    │   │   │   └── trainval
    │   │   │       ├── Annotations
    │   │   │       ├── ImageSets
    │   │   │       └── JPEGImages
    │   │   └── yvos
    │   │       └── train_all_frames
    │   ├── init_model
    │   │   └── resnet50-19c8e357.pth
    │   └── snapshots
    │       └── drsn_yvos_10w_davis_3p5w.pth
    └── coupled_otn_opn
        ├── DAVIS
        │   └── trainval
        ├── models
        │   └── mdnet_davis_50cycle.pth
        └── tracking
            └── maskrcnn
                └── data
                    └── X-152-32x8d-FPN-IN5k.pkl
    

Train and Evaluate

  • Firstly, check the directory of coupled_otn_opn and follow the README.md inside to generate our proposals. You can also skip this step for we have provided generated proposals in drsn/dataset/result_davis directory.
  • Secondly, enter drsn and check do_train_eval.sh to train and evaluate.
  • Finally, we also provide result masks by our PTSNet in result-masks-GoogleDrive. The quantitative results are measured by DAVIS official matlab toolbox.
J Mean F Mean G Mean
Avg 71.6 77.7 74.7

Acknowledgment

The work was mainly done during an internship at Horizon Robotics.

Citing PTSNet

If you find PTSNet useful in your research, please consider citing:

@article{ptsnet2019,
        title={Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation},
        author={Zhou, Qiang and Huang, Zilong and Huang, Lichao and Han, Shen and Gong, Yongchao and Huang, Chang and Liu, Wenyu and Wang, Xinggang},
        journal = {arXiv preprint arXiv:1907.01203v2},
        year={2019}
        }

Thanks to the Third Party Libs

Owner
Forest
If a bullet's going to get you, it has already been fired.
Forest
A Number Recognition algorithm

Paddle-VisualAttention Results_Compared SVHN Dataset Methods Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Ac

1 Nov 12, 2021
Clinica is a software platform for clinical research studies involving patients with neurological and psychiatric diseases and the acquisition of multimodal data

Clinica Software platform for clinical neuroimaging studies Homepage | Documentation | Paper | Forum | See also: AD-ML, AD-DL ClinicaDL About The Proj

ARAMIS Lab 165 Dec 29, 2022
Luminous is a framework for testing the performance of Embodied AI (EAI) models in indoor tasks.

Luminous is a framework for testing the performance of Embodied AI (EAI) models in indoor tasks. Generally, we intergrete different kind of functional

28 Jan 08, 2023
code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation (CVPR 2021) Introduction PBR is a conceptually simple yet effective

H.Chen 143 Jan 05, 2023
[PNAS2021] The neural architecture of language: Integrative modeling converges on predictive processing

The neural architecture of language: Integrative modeling converges on predictive processing Code accompanying the paper The neural architecture of la

Martin Schrimpf 36 Dec 01, 2022
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Ren Yurui 261 Jan 09, 2023
Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification"

Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification" This is an end-to-end framework for accurate and robust left ventr

2 Jul 09, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras

Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras which will then be used to generate residuals

Federico Lopez 2 Jan 14, 2022
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.

16 Dec 22, 2022
Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning

Harmonious Textual Layout Generation over Natural Images via Deep Aesthetics Learning Code for the paper Harmonious Textual Layout Generation over Nat

7 Aug 09, 2022
Sample Prior Guided Robust Model Learning to Suppress Noisy Labels

PGDF This repo is the official implementation of our paper "Sample Prior Guided Robust Model Learning to Suppress Noisy Labels ". Citation If you use

CVSM Group - email: <a href=[email protected]"> 22 Dec 23, 2022
CSKG is a commonsense knowledge graph that combines seven popular sources into a consolidated representation

CSKG: The CommonSense Knowledge Graph CSKG is a commonsense knowledge graph that combines seven popular sources into a consolidated representation: AT

USC ISI I2 85 Dec 12, 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentati

Hust Visual Learning Team 253 Dec 21, 2022
A unified framework to jointly model images, text, and human attention traces.

connect-caption-and-trace This repository contains the reference code for our paper Connecting What to Say With Where to Look by Modeling Human Attent

Meta Research 73 Oct 24, 2022
A NSFW content filter.

Project_Nfilter A NSFW content filter. With a motive of minimizing the spreads and leakage of NSFW contents on internet and access to others devices ,

1 Jan 20, 2022
Vision Transformer for 3D medical image registration (Pytorch).

ViT-V-Net: Vision Transformer for Volumetric Medical Image Registration keywords: vision transformer, convolutional neural networks, image registratio

Junyu Chen 192 Dec 20, 2022
Code for the paper "Curriculum Dropout", ICCV 2017

Curriculum Dropout Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability dis

Pietro Morerio 21 Jan 02, 2022
Fuzzer for Linux Kernel Drivers

difuze: Fuzzer for Linux Kernel Drivers This repo contains all the sources (including setup scripts), you need to get difuze up and running. Tested on

seclab 344 Dec 27, 2022
Can we visualize a large scientific data set with a surrogate model? We're building a GAN for the Earth's Mantle Convection data set to see if we can!

EarthGAN - Earth Mantle Surrogate Modeling Can a surrogate model of the Earth’s Mantle Convection data set be built such that it can be readily run in

Tim 0 Dec 09, 2021