[CVPR 2021] MiVOS - Scribble to Mask module

Last update: Dec 22, 2022

Overview

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

A simple network that turns scribbles into masks. It supports multi-object segmentation using soft aggregation. Don't expect state-of-the-art (SOTA) results from this model!
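
To illustrate what soft aggregation means for multi-object output, here is a minimal single-pixel sketch. It assumes the formulation commonly used in STM-style video object segmentation (background probability as the product of the complements, then renormalization through the odds); it is not copied from this repository, and the name soft_aggregate is hypothetical.

```python
import math


def soft_aggregate(object_probs, eps=1e-7):
    """Merge independent per-object probabilities for one pixel.

    Returns [background, obj_1, ..., obj_n]; the values sum to 1.
    """
    # Background probability: the pixel belongs to none of the objects.
    background = math.prod(1.0 - p for p in object_probs)
    probs = [background, *object_probs]
    # Clamp to avoid division by zero, then convert to odds p / (1 - p).
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    odds = [p / (1.0 - p) for p in clamped]
    # Normalizing the odds equals a softmax over the logits log(p / (1 - p)).
    total = sum(odds)
    return [o / total for o in odds]


# Two objects both claim this pixel; the more confident one wins.
merged = soft_aggregate([0.9, 0.6])
label = max(range(len(merged)), key=merged.__getitem__)  # 0 means background
```

In a real model the same operation would run over whole probability maps (one per object) rather than a single pixel.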

Overall structure and capabilities

Capability	MiVOS	Mask-Propagation	Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation	❌	✔️	❌
DAVIS interactive evaluation	✔️	❌	❌
User interaction GUI tool	✔️	❌	❌
Dense Correspondences	❌	✔️	❌
Train propagation module	❌	✔️	❌
Train S2M (interaction) module	❌	❌	✔️
Train fusion module	✔️	❌	❌
Generate more synthetic data	✔️	❌	❌

Requirements

The package versions shown here are the ones that I used. You might not need the exact versions.

PyTorch 1.6.0
torchvision 0.7.0
opencv-contrib 4.2.0
davis-interactive (https://github.com/albertomontesg/davis-interactive)
gitpython for training
gdown for downloading pretrained models

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install opencv-contrib-python gitpython gdown
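
To compare an environment against the versions listed above, a small check along these lines may help. This is a sketch, not part of the repository: the helper names are hypothetical, the distribution names are the usual PyPI ones, and davis-interactive is left out because its pip distribution name is not given here.

```python
from importlib import metadata

# Versions the author reports using; None means no version was listed.
TESTED_VERSIONS = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "opencv-contrib-python": "4.2.0",
    "gitpython": None,
    "gdown": None,
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def format_report(installed, tested):
    """Build one human-readable line per package."""
    lines = []
    for name, version in installed.items():
        if version is None:
            lines.append(f"{name}: not installed")
        elif tested.get(name):
            lines.append(f"{name}: {version} (tested with {tested[name]})")
        else:
            lines.append(f"{name}: {version}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_report(installed_versions(TESTED_VERSIONS), TESTED_VERSIONS)))
```

A mismatch is not necessarily a problem, since the exact versions may not be required.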

Pretrained model

Download the model and put it in ./saves/. Alternatively, use the provided download_model.py.

[OneDrive Mirror]

Interactive GUI

python interactive.py --image <image>

Controls:

Left mouse button - Draw scribbles
Middle mouse button - Switch between positive/negative scribbles
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)

Known issues

The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.

Training

Datasets

Download and extract the LVIS training set.
Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used the download_datasets.py in Mask-Propagation.

├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...
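
Before starting a training run, it can be worth confirming that the layout above is in place. The following is a minimal sketch with a hypothetical helper name; it checks only the paths that appear in the tree and assumes nothing about what the static folder contains beyond BIG_small.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "lvis/lvis_v1_train.json",
    "lvis/train2017",
    "static/BIG_small",
    "Scribble-to-Mask",
]


def missing_dataset_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # When run from inside Scribble-to-Mask, the datasets live one level up.
    for rel in missing_dataset_paths(".."):
        print(f"missing: {rel}")
```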

Commands

Use the deeplabv3plus_resnet50 pretrained model (see Credit for the source repository).

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>
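
With --nproc_per_node=2, torch.distributed.launch starts one process per GPU and passes each a --local_rank argument alongside the script's own flags. The sketch below shows how a training script might parse these arguments. It is an illustration only, not the repository's train.py: the defaults and help strings are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the training command (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Sketch of S2M training arguments")
    parser.add_argument("--id", default="s2m", help="experiment name")
    parser.add_argument("--load_deeplab", default=None, help="path to the DeepLab weights")
    # torch.distributed.launch passes --local_rank=<n> to each spawned process.
    parser.add_argument("--local_rank", type=int, default=0)
    return parser.parse_args(argv)


# What the process for the second GPU would receive from the command above.
args = parse_args(["--local_rank=1", "--id", "s2m", "--load_deeplab", "deeplab.pth"])
```

Each process would then typically bind to the GPU given by args.local_rank before building the model.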

Credit

Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments

AttributeError: Caught AttributeError in DataLoader worker process 0

Hello! I followed the instructions for the training command, but it threw an AttributeError. I put the static folder outside this repository as you mentioned. This is confusing because I can use the same datasets to pretrain the propagation module; train.py in Mask-Propagation works fine.

opened by xwhkkk 2

git.exc.InvalidGitRepositoryError when running train.py

Hello! I followed the instructions for the training command, but it threw an InvalidGitRepositoryError. I used the command: CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 1842 --nproc_per_node=2 train.py --id s2m --load_deeplab ./deeplab_resnet50/best_deeplabv3plus_resnet50_voc_os16.pth, and I have 2 GPUs. Could you give me some suggestions?

opened by xwhkkk 2

About evaluation of the model

Hi,

thank you for the nice work.

I have a concern about the evaluation of the model. Because there is no validation set for picking the best model, there may be an overfitting problem. (Or what should a validation set for interactive segmentation look like? A unified standard would make it easier for everyone to compare their methods.)

Is this setting common in interactive object segmentation? I am new to interactive segmentation. I hope you can address my concern, thank you.

opened by Limingxing00 2

Question about Local Control Strategy

A simple but practical segmentation tool! I've read your paper, and it says that a local control strategy is used in S2M. However, I can't find the local control step in this code. Why isn't it provided in this tool? Will local control make a significant difference to the performance?

opened by distillation-dcf 1

DeepLabv3 pre-trained models

Hello,

I wanted to mention that training S2M from scratch with the deeplabv3_resnet50 pre-trained model provided in this repo raises the following error: KeyError: 'classifier.classifier.0.convs.0.0.weight'. This means that the weights for this layer are not present in deeplabv3_resnet50. Using deeplabv3plus_resnet50 from the same repo runs without errors.

Best!

opened by UndecidedBoy 1

saving error

Hello! Thanks for sharing your code. When I ran python interactive.py and tried to save the masks, the following error appeared.

Could you give me some suggestions?

opened by xwhkkk 3

Fix simple issues and allow for cpu only use

I had to make some changes to use the code on a CPU-only system. I also had trouble saving the mask from the interactive GUI and fixed it. Thanks for the great work.

opened by rami-alloush 3

Releases(1.0)

1.0(Mar 14, 2021)

Pretrained model
s2m.pth(152.04 MB)

Owner

Rex Cheng

@ HKUST.

GitHub Repository https://hkchengrex.github.io/MiVOS/
