Referring Video Object Segmentation

Overview

Awesome-Referring-Video-Object-Segmentation

Stars ⭐, comments 💹, and shares 😀 are welcome!

- 2021.12.12: Recent papers (from 2021)
- Additions are welcome if any information is missing. 😎

Introduction

Referring video object segmentation (RVOS) aims to segment an object in a video given a language expression that refers to it.

Unlike conventional video object segmentation, the task relies on a different type of supervision, language expressions, to identify and segment the referred object in a video. A detailed explanation of the task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, “URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark”, European Conference on Computer Vision (ECCV), 2020: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Cross-modal progressive comprehension for referring segmentation: https://arxiv.org/abs/2105.07175

Benchmark

The 3rd Large-scale Video Object Segmentation - Track 3: Referring Video Object Segmentation

Datasets

Refer-YouTube-VOS-datasets

  • YouTube-VOS:
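# Fetch the raw script; the /blob/ URL returns the GitHub HTML page, not the Python file.
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_YTVOS_w_refer.py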
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.jpg (frame image files)
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files)
    ├── valid/
    │   └── JPEGImages/
    │       └── */ (video folders)
    │           └── *.jpg (frame image files)
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
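
After the download finishes, it can help to confirm that the extracted data matches the layout shown above. The following is a minimal sketch, not part of the repository: the function name `summarize_split` is hypothetical, and it relies only on the folder names in the tree (`JPEGImages/`, `Annotations/`, `*.jpg`, `*.png`). It counts frames and masks per video folder and reports zero masks when a split, such as `valid/`, has no `Annotations/` folder.

```python
from pathlib import Path


def summarize_split(root, split="train"):
    """Return {video_id: (num_frames, num_masks)} for one split.

    Hypothetical helper: it assumes only the folder layout listed in the
    README. num_masks is 0 when no matching Annotations/<video> folder exists.
    """
    split_dir = Path(root) / split
    frames_dir = split_dir / "JPEGImages"
    masks_dir = split_dir / "Annotations"

    summary = {}
    for video_dir in sorted(p for p in frames_dir.iterdir() if p.is_dir()):
        num_frames = len(list(video_dir.glob("*.jpg")))
        mask_dir = masks_dir / video_dir.name
        num_masks = len(list(mask_dir.glob("*.png"))) if mask_dir.is_dir() else 0
        summary[video_dir.name] = (num_frames, num_masks)
    return summary
```

Calling `summarize_split("refer_youtube_vos", "train")` from `${current_path}` returns one entry per video folder. A video with frames but zero masks in the `train` split would suggest an incomplete download.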
  • A2D-Sentences:

Repo: https://web.eecs.umich.edu/~jjcorso/r/a2d/

Paper: https://arxiv.org/abs/1803.07485

Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

Downloads: README; Dataset and Annotation (version 1.0, 1.9 GB, tar.bz); Evaluation Toolkit (version 1.0, tar.bz)

mkdir a2d_sentences
cd a2d_sentences
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
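# Fetch the raw script; the /blob/ URL returns the GitHub HTML page, not the Python file.
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_a2d_annotation_with_instances.py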
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/
    ├── Release/
    │   ├── videoset.csv  (videos metadata file)
    │   └── CLIPS320/
    │       └── *.mp4     (video files)
    └── text_annotations/
        ├── a2d_annotation.txt  (actual text annotations)
        ├── a2d_missed_videos.txt
        └── a2d_annotation_with_instances/
            └── */ (video folders)
                └── *.h5 (annotations files)
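
Because the clips and the instance annotations come from separate downloads, a quick cross-check of the two folders may be useful. The sketch below is illustrative only. The function name `find_unannotated_clips` is hypothetical, and it assumes (the README does not state this) that each folder under `a2d_annotation_with_instances/` is named after the file stem of the corresponding `.mp4` clip.

```python
from pathlib import Path


def find_unannotated_clips(root):
    """Return sorted clip IDs in Release/CLIPS320 that lack .h5 annotations.

    Assumption (not confirmed by the README): annotation folders are named
    after the clip's file stem, e.g. CLIPS320/<id>.mp4 pairs with
    a2d_annotation_with_instances/<id>/.
    """
    root = Path(root)
    clips_dir = root / "Release" / "CLIPS320"
    ann_dir = root / "text_annotations" / "a2d_annotation_with_instances"

    clip_ids = {p.stem for p in clips_dir.glob("*.mp4")}
    # A folder counts as annotated only if it holds at least one .h5 file.
    annotated = {
        p.name for p in ann_dir.iterdir() if p.is_dir() and any(p.glob("*.h5"))
    }
    return sorted(clip_ids - annotated)
```

Calling `find_unannotated_clips("a2d_sentences")` from `${current_path}` returns the clips without instance annotations. Some of them may be expected to appear in `a2d_missed_videos.txt`, whose format is not described here.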

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}

  • JHMDB-Sentences:

Download script:

mkdir jhmdb_sentences
cd jhmdb_sentences
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf  Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/
    ├── Rename_Images/  (frame images)
    │   └── */ (action dirs)
    ├── puppet_mask/  (mask annotations)
    │   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)

Citation:

@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
