[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

Overview

Full-Duplex Strategy for Video Object Segmentation (ICCV, 2021)

Authors: Ge-Peng Ji, Keren Fu, Zhe Wu, Deng-Ping Fan*, Jianbing Shen, & Ling Shao

  • This repository provides code for the paper "Full-Duplex Strategy for Video Object Segmentation" accepted by the ICCV-2021 conference (arXiv version / Chinese translation).

  • This project is under construction. If you have any questions about our paper or find bugs in this repository, feel free to contact me.

  • If you find our FSNet useful for your research, please cite this paper (BibTeX).

1. News

  • [2021/08/24] Upload the training script for video object segmentation.
  • [2021/08/22] Upload the pre-trained snapshot and the pre-computed results on U-VOS and V-SOD tasks.
  • [2021/08/20] Release the inference code and the evaluation code (V-SOD).
  • [2021/07/20] Create Github page.

2. Introduction

Why?

Appearance and motion are two important sources of information in video object segmentation (VOS). Previous methods mainly focus on simplex (one-way) solutions, which lowers the upper bound of feature collaboration between and across these two cues.


Figure 1: Visual comparison between the simplex (i.e., (a) appearance-refined motion and (b) motion-refined appearance) and our full-duplex strategy. In contrast, our FSNet offers a collaborative way to leverage the appearance and motion cues under the mutual restraint of the full-duplex strategy, thus providing more accurate structural details and alleviating the short-term feature-drifting issue.

What?

In this paper, we study a novel framework, termed FSNet (Full-duplex Strategy Network), which introduces a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. Furthermore, a bidirectional purification module (BPM) is introduced to update inconsistent features between the spatial-temporal embeddings, effectively improving the model's robustness.


Figure 2: The pipeline of our FSNet. The Relational Cross-Attention Module (RCAM) abstracts more discriminative representations between the motion and appearance cues using the full-duplex strategy. Then four Bidirectional Purification Modules (BPM) are stacked to further re-calibrate inconsistencies between the motion and appearance features. Finally, we utilize the decoder to generate our prediction.
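
For intuition, below is a minimal PyTorch sketch of the full-duplex (bidirectional) cross-attention idea behind RCAM. The module and tensor names are illustrative assumptions, not the authors' implementation, which operates on multi-level backbone features.

import torch
import torch.nn as nn

# A minimal sketch of bidirectional (full-duplex) cross-attention between the
# appearance and motion streams; names are illustrative, not the official code.
class BidirectionalCrossAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj_app = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_mot = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, appearance, motion):
        # appearance, motion: (B, C, H, W) features from the two streams.
        b, c, h, w = appearance.shape
        a = self.proj_app(appearance).flatten(2)  # (B, C, HW)
        m = self.proj_mot(motion).flatten(2)      # (B, C, HW)
        # Channel-wise affinity between the two modalities.
        affinity = torch.bmm(a, m.transpose(1, 2)) / (h * w) ** 0.5  # (B, C, C)
        # Full duplex: each stream simultaneously receives features from the
        # other stream, weighted by the (transposed) affinity.
        app_out = appearance + torch.bmm(affinity.softmax(-1), m).view(b, c, h, w)
        mot_out = motion + torch.bmm(affinity.transpose(1, 2).softmax(-1), a).view(b, c, h, w)
        return app_out, mot_out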

How?

By considering the mutual restraint within the full-duplex strategy, our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage, making it robust to various challenging scenarios (e.g., motion blur, occlusion) in VOS. Extensive experiments on five popular benchmarks (i.e., DAVIS16, FBMS, MCL, SegTrack-V2, and DAVSOD19) show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.


Figure 3: Qualitative results on five datasets, including DAVIS16, MCL, FBMS, SegTrack-V2, and DAVSOD19.

3. Usage

How to Inference?

  • Download the test dataset from Baidu Drive (PSW: aaw8) or Google Drive and save it at ./dataset/*.

  • Install necessary libraries: PyTorch 1.1+, scipy 1.2.2, PIL

  • Download the pre-trained weights from Baidu Drive (psw: 36lm) or Google Drive. Save the pre-trained weights at ./snapshot/FSNet/2021-ICCV-FSNet-20epoch-new.pth.

  • Run python inference.py to generate the segmentation results.

  • For the DenseCRF post-processing technique used in the original paper, see DSS-CRF; a sketch of this step follows the list.
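
A hedged sketch of such DenseCRF refinement with the pydensecrf package (the pairwise parameters below are common defaults, not necessarily those used in the paper):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob, iters=5):
    # image: (H, W, 3) uint8 RGB frame; prob: (H, W) foreground probability.
    h, w = prob.shape
    softmax = np.stack([1.0 - prob, prob])  # (2, H, W): background/foreground
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(softmax))
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=5, rgbim=np.ascontiguousarray(image), compat=5)
    q = np.array(d.inference(iters)).reshape(2, h, w)
    return q[1]  # refined foreground probability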

How to train our model from scratch?

Download the training dataset from Baidu Drive (PSW: u01t) or Google Drive Set1/Google Drive Set2 and save it at ./dataset/*. Our training pipeline consists of three steps, which can be chained as sketched after the list:

  • First, train the model using the combination of the static SOD dataset (i.e., DUTS, 12,926 samples) and the U-VOS datasets (i.e., DAVIS16 & FBMS, 2,373 samples).

    • Set --train_type='pretrain_rgb' and run python train.py in the terminal.
  • Second, train the model using the optical-flow maps of the U-VOS datasets (i.e., DAVIS16 & FBMS).

    • Set --train_type='pretrain_flow' and run python train.py in the terminal.
  • Third, fine-tune the model using pairs of frames and optical-flow maps from the U-VOS datasets (i.e., DAVIS16 & FBMS).

    • Set --train_type='finetune' and run python train.py in the terminal.
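
A minimal driver for the three-stage schedule above; the --train_type values are the ones documented in this README, everything else is illustrative:

import subprocess

# Run the three training stages in order, as described above.
for stage in ("pretrain_rgb", "pretrain_flow", "finetune"):
    subprocess.run(["python", "train.py", f"--train_type={stage}"], check=True)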

4. Benchmark

Unsupervised/Zero-shot Video Object Segmentation (U/Z-VOS) task

NOTE: In the U-VOS task, all the prediction results are strictly binary. We only adopt naive binarization (i.e., threshold = 0.5) in our experiments; see the sketch below.
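
A minimal sketch of this binarization:

import numpy as np

def binarize(pred):
    # pred: float array in [0, 1]; returns a strictly binary {0, 1} mask.
    return (pred >= 0.5).astype(np.uint8)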

  • Quantitative results (NOTE: the following results are slightly improved over those reported in our conference paper):

    Method             mean-J  recall-J  decay-J  mean-F  recall-F  decay-F  T
    FSNet (w/ CRF)     0.834   0.945     0.032    0.831   0.902     0.026    0.213
    FSNet (w/o CRF)    0.823   0.943     0.033    0.833   0.919     0.028    0.213
  • Pre-Computed Results: Please download the prediction results of FSNet from Baidu Drive (psw: ojsl) or Google Drive.

  • Evaluation Toolbox: We use the standard evaluation toolbox from DAVIS16 (note that all the pre-computed segmentations are downloaded from this link); a minimal sketch of the region metric J follows.
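
For orientation, the region metric J is the Jaccard index (intersection-over-union) between a binary prediction and the ground truth; a minimal sketch, not a replacement for the DAVIS16 toolbox:

import numpy as np

def jaccard(pred, gt):
    # pred, gt: binary masks of the same shape.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    # Convention: an empty prediction on an empty ground truth is a perfect match.
    return np.logical_and(pred, gt).sum() / union if union > 0 else 1.0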

Video Salient Object Detection (V-SOD) task

NOTE: In the V-SOD task, all the prediction results are non-binary.

5. Citation

@inproceedings{ji2021FSNet,
  title={Full-Duplex Strategy for Video Object Segmentation},
  author={Ji, Ge-Peng and Fu, Keren and Wu, Zhe and Fan, Deng-Ping and Shen, Jianbing and Shao, Ling},
  booktitle={IEEE ICCV},
  year={2021}
}

6. Acknowledgements

Many thanks to my collaborator Dr. Zhe Wu, who provided the excellent work SCRN and design inspiration.
