[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

Overview

Full-Duplex Strategy for Video Object Segmentation (ICCV, 2021)

Authors: Ge-Peng Ji, Keren Fu, Zhe Wu, Deng-Ping Fan*, Jianbing Shen, & Ling Shao

  • This repository provides code for the paper "Full-Duplex Strategy for Video Object Segmentation", accepted at ICCV 2021 (arXiv version / Chinese translation).

  • This project is under construction. If you have any questions about our paper or find bugs in our Git project, feel free to contact me.

  • If you find FSNet useful in your research, please cite this paper (BibTeX).

1. News

  • [2021/08/24] Uploaded the training script for video object segmentation.
  • [2021/08/22] Uploaded the pre-trained snapshot and the pre-computed results on the U-VOS and V-SOD tasks.
  • [2021/08/20] Released the inference code and evaluation code (V-SOD).
  • [2021/07/20] Created the GitHub page.

2. Introduction

Why?

Appearance and motion are two important sources of information in video object segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering the upper bound of feature collaboration among and across these two cues.


Figure 1: Visual comparison between the simplex strategies (i.e., (a) appearance-refined motion and (b) motion-refined appearance) and our full-duplex strategy. In contrast, our FSNet offers a collaborative way to leverage the appearance and motion cues under the mutual restraint of the full-duplex strategy, thus providing more accurate structural details and alleviating the short-term feature drifting issue.

What?

In this paper, we present a novel framework, termed FSNet (Full-duplex Strategy Network), which uses a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. Furthermore, a bidirectional purification module (BPM) is introduced to update inconsistent features between the spatial-temporal embeddings, effectively improving the model's robustness.
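To make the simplex vs. full-duplex distinction concrete, here is a toy sketch in plain Python. It is not the authors' RCAM: features are reduced to per-channel lists of floats, and the cross-modal "attention" is a hypothetical sigmoid channel gate (`cross_gate`, `simplex_step` and `full_duplex_step` are illustrative names). The only point it shows is that in the full-duplex step both directions read the same, pre-update inputs, so transmission and receiving happen at once.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_gate(target, source):
    """Hypothetical cross-modal gate: re-weight each channel of `target` with
    sigmoid(source channel). The residual form target * (1 + gate) never zeroes
    a feature. This is a toy stand-in, not the paper's RCAM."""
    return [t * (1.0 + sigmoid(s)) for t, s in zip(target, source)]


def simplex_step(appearance, motion):
    """Simplex (one-way): motion refines appearance; motion is left unchanged."""
    return cross_gate(appearance, motion), list(motion)


def full_duplex_step(appearance, motion):
    """Full-duplex (two-way): both updates are computed from the SAME inputs.

    Updating motion with new_appearance instead would just be two chained
    simplex steps, not a simultaneous exchange.
    """
    new_appearance = cross_gate(appearance, motion)
    new_motion = cross_gate(motion, appearance)
    return new_appearance, new_motion
```

With `appearance = [1.0, 2.0]` and `motion = [0.0, 0.5]`, both strategies produce the same refined appearance, but only the full-duplex step also updates the motion stream.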


Figure 2: The pipeline of our FSNet. The Relational Cross-Attention Module (RCAM) abstracts more discriminative representations between the motion and appearance cues using the full-duplex strategy. Then four Bidirectional Purification Modules (BPM) are stacked to further re-calibrate inconsistencies between the motion and appearance features. Finally, we utilize the decoder to generate our prediction.
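The ordering in Figure 2 (exchange, then four stacked purification modules, then fusion and decoding) can be sketched as plain function composition. Everything below is hypothetical: `identity_exchange` is a placeholder for an RCAM-style exchange, `purify` is an invented consistency rule rather than the paper's BPM, fusion is a toy element-wise sum, and the decoder is omitted.

```python
def identity_exchange(appearance, motion):
    """Placeholder for the RCAM-style exchange (hypothetical; swap in a real one)."""
    return list(appearance), list(motion)


def purify(appearance, motion, rate=0.5):
    """Hypothetical purification step (not the paper's BPM).

    The per-channel inconsistency is gap = appearance - motion. Both streams
    move toward each other by rate/2 of the gap at the same time, so the gap
    shrinks to (1 - rate) * gap while the channel-wise sum is preserved.
    """
    new_a, new_m = [], []
    for a, m in zip(appearance, motion):
        gap = a - m
        new_a.append(a - 0.5 * rate * gap)
        new_m.append(m + 0.5 * rate * gap)
    return new_a, new_m


def toy_forward(appearance, motion, exchange=identity_exchange, num_bpm=4):
    """Exchange -> num_bpm stacked purification steps -> fusion (toy sum).

    Returns the fused features plus the two purified streams. A real decoder
    would follow the fusion; it is left out of this sketch.
    """
    a, m = exchange(appearance, motion)
    for _ in range(num_bpm):
        a, m = purify(a, m)
    fused = [x + y for x, y in zip(a, m)]
    return fused, a, m
```

For a single channel with appearance 2.0 and motion 0.0, four stacked steps at `rate=0.5` shrink the gap from 2.0 to 0.125 (1.0625 vs. 0.9375), while the fused value stays 2.0.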

How?

By considering the mutual restraint within the full-duplex strategy, our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage, making it robust to various challenging scenarios (e.g., motion blur, occlusion) in VOS. Extensive experiments on five popular benchmarks (i.e., DAVIS16, FBMS, MCL, SegTrack-V2, and DAVSOD19) show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.


Figure 3: Qualitative results on five datasets, including DAVIS16, MCL, FBMS, SegTrack-V2, and DAVSOD19.

3. Usage

How to Inference?

  • Download the test dataset from Baidu Drive (PSW: aaw8) or Google Drive and save it at ./dataset/*.

  • Install the required libraries: PyTorch 1.1+, scipy 1.2.2, and PIL.

  • Download the pre-trained weights from Baidu Drive (PSW: 36lm) or Google Drive, and save them at ./snapshot/FSNet/2021-ICCV-FSNet-20epoch-new.pth.

  • Run python inference.py to generate the segmentation results.

  • The DenseCRF post-processing used in the original paper is available here: DSS-CRF.
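Before running inference, it can help to confirm that the dataset folder and the snapshot sit where the steps above expect them. This pre-flight check is not part of the repository; it is a small sketch that only tests for the two paths named above (`missing_inputs` is a hypothetical helper name).

```python
from pathlib import Path

# Locations taken from the steps above (relative to the repository root).
DATASET_DIR = Path("dataset")
SNAPSHOT_FILE = Path("snapshot/FSNet/2021-ICCV-FSNet-20epoch-new.pth")


def missing_inputs(repo_root="."):
    """Return the required paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    missing = []
    if not (root / DATASET_DIR).is_dir():
        missing.append(str(root / DATASET_DIR))
    if not (root / SNAPSHOT_FILE).is_file():
        missing.append(str(root / SNAPSHOT_FILE))
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    if problems:
        print("Missing before running inference.py:")
        for p in problems:
            print("  -", p)
    else:
        print("Inputs found; run: python inference.py")
```

The check does not look inside ./dataset, so it cannot tell whether the right test sequences were extracted there.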

How to train our model from scratch?

Download the training dataset from Baidu Drive (PSW: u01t) or Google Drive Set1/Google Drive Set2 and save it at ./dataset/*. Our training pipeline consists of three steps:

  • First, train the model on a combination of a static SOD dataset (i.e., DUTS, 12,926 samples) and U-VOS datasets (i.e., DAVIS16 & FBMS, 2,373 samples).

    • Set --train_type='pretrain_rgb' and run python train.py in a terminal.
  • Second, train the model on the optical-flow maps of the U-VOS datasets (i.e., DAVIS16 & FBMS).

    • Set --train_type='pretrain_flow' and run python train.py in a terminal.
  • Third, train the model on pairs of frames and optical flow from the U-VOS datasets (i.e., DAVIS16 & FBMS).

    • Set --train_type='finetune' and run python train.py in a terminal.
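The three stages can be chained from a small driver script. This is a sketch under one loud assumption: that train.py reads --train_type from the command line (if it is only an argparse default inside the script, edit that default per stage instead). Whether each stage loads the previous stage's checkpoint is decided inside train.py, which this sketch does not inspect. `stage_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys

# Stage 1 sample count from the text: DUTS (12,926) + DAVIS16 & FBMS (2,373).
STAGE1_SAMPLES = 12_926 + 2_373

# --train_type values for the three stages, in the order they must run.
STAGES = ("pretrain_rgb", "pretrain_flow", "finetune")


def stage_commands(script="train.py"):
    """One command per stage.

    ASSUMPTION: train.py accepts --train_type on the command line.
    """
    return [[sys.executable, script, f"--train_type={stage}"] for stage in STAGES]


def run_pipeline(dry_run=True):
    """Print (and, if dry_run is False, run) the stages in order.

    check=True makes subprocess.run raise on a non-zero exit code, so a
    failed stage stops the pipeline before the next one starts.
    """
    for cmd in stage_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Running it with the default `dry_run=True` only prints the three commands, which is a cheap way to confirm the order before launching real training.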

4. Benchmark

Unsupervised/Zero-shot Video Object Segmentation (U/Z-VOS) task

NOTE: In the U-VOS task, all prediction results are strictly binary. We only adopt a naive binarization algorithm (i.e., threshold = 0.5) in our experiments.
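Naive binarization at 0.5 amounts to a per-pixel comparison. The sketch below is an illustration, not the repository's code: `binarize` is a hypothetical name, maps are plain lists of rows, and treating a value exactly equal to the threshold as foreground is an assumption. The `max_value` parameter covers predictions saved as 8-bit images.

```python
def binarize(prob_map, threshold=0.5, max_value=1.0):
    """Naive binarization of one prediction map (a list of rows).

    Values are first scaled to [0, 1] by `max_value` (use 255 for 8-bit
    images); pixels at or above `threshold` become 1, the rest 0.
    Treating p == threshold as foreground is an assumption.
    """
    return [[1 if p / max_value >= threshold else 0 for p in row] for row in prob_map]
```

For an 8-bit map, 0.5 falls between 127 and 128, so 127 maps to background and 128 to foreground.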

  • Quantitative results (NOTE: the following results are slightly better than those reported in our conference paper):

    Method            mean-J  recall-J  decay-J  mean-F  recall-F  decay-F  T
    FSNet (w/ CRF)    0.834   0.945     0.032    0.831   0.902     0.026    0.213
    FSNet (w/o CRF)   0.823   0.943     0.033    0.833   0.919     0.028    0.213

  • Pre-computed results: download the FSNet prediction results from Baidu Drive (PSW: ojsl) or Google Drive.

  • Evaluation toolbox: we use the standard evaluation toolbox from DAVIS16. (Note that all the pre-computed segmentations are downloaded from this link.)
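For readers unfamiliar with the J columns, the sketch below shows what mean, recall and decay summarize for the region similarity J (intersection over union). It is an approximation written for illustration, not the DAVIS16 toolbox: the names are hypothetical, J is taken as 1.0 when both masks are empty (an assumed convention), recall counts frames with J above 0.5, and decay is the mean of the first temporal quarter minus the mean of the last. The official toolbox may differ in details (how frames are binned, which frames are scored), and the F and T columns need boundary matching that is not covered here; use the toolbox for reported numbers.

```python
def jaccard(pred, gt):
    """Region similarity J = |pred AND gt| / |pred OR gt| for binary masks
    given as lists of 0/1 rows. Assumed convention: two empty masks give 1.0."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p & g
            union += p | g
    return 1.0 if union == 0 else inter / union


def sequence_stats(per_frame, threshold=0.5, n_bins=4):
    """(mean, recall, decay) of one sequence's per-frame scores.

    mean   = average score
    recall = fraction of frames scoring above `threshold`
    decay  = mean of the first temporal bin minus mean of the last bin
    Requires at least `n_bins` frames so every bin is non-empty.
    """
    n = len(per_frame)
    mean = sum(per_frame) / n
    recall = sum(1 for s in per_frame if s > threshold) / n
    bounds = [i * n // n_bins for i in range(n_bins + 1)]
    first = per_frame[bounds[0]:bounds[1]]
    last = per_frame[bounds[-2]:bounds[-1]]
    decay = sum(first) / len(first) - sum(last) / len(last)
    return mean, recall, decay


def dataset_scores(sequences):
    """Average each statistic over sequences (dict: name -> per-frame scores)."""
    stats = [sequence_stats(v) for v in sequences.values()]
    return tuple(sum(s[i] for s in stats) / len(stats) for i in range(3))
```

A sequence whose J drops steadily from 1.0 to 0.4 has a mean around 0.7 and a positive decay (0.6 here), which is the drifting behavior the decay column is meant to expose.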

Video Salient Object Detection (V-SOD) task

NOTE: In the V-SOD task, all prediction results are non-binary.
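Because V-SOD maps are kept non-binary, a prediction is stored as a grayscale image instead of being thresholded. A minimal sketch of that conversion, assuming the network output is already a probability map in [0, 1] (`to_grayscale` is a hypothetical name; out-of-range values are clamped):

```python
def to_grayscale(prob_map):
    """Map probabilities in [0, 1] to 8-bit gray levels (0-255) with no
    thresholding, so intermediate confidences survive in the saved map.
    Values outside [0, 1] are clamped first."""
    return [[round(min(max(p, 0.0), 1.0) * 255) for p in row] for row in prob_map]
```

Python's `round` uses round-half-to-even, so a probability of exactly 0.5 (127.5) becomes 128.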

5. Citation

@inproceedings{ji2021FSNet,
  title={Full-Duplex Strategy for Video Object Segmentation},
  author={Ji, Ge-Peng and Fu, Keren and Wu, Zhe and Fan, Deng-Ping and Shen, Jianbing and Shao, Ling},
  booktitle={IEEE ICCV},
  year={2021}
}

6. Acknowledgements

Many thanks to my collaborator Zhe Wu (Ph.D.), who provided the excellent SCRN work and design inspiration.
