AOT (Associating Objects with Transformers) in PyTorch

Overview

AOT (Associating Objects with Transformers) in PyTorch

A modular reference PyTorch implementation of Associating Objects with Transformers for Video Object Segmentation (NIPS 2021). [paper]

alt text

alt text

Highlights

  • High performance: up to 85.5% (R50-AOTL) on YouTube-VOS 2018 and 82.1% (SwinB-AOTL) on DAVIS-2017 Test-dev under standard settings.
  • High efficiency: up to 51fps (AOTT) on DAVIS-2017 (480p) even with 10 objects and 41fps on YouTube-VOS (1.3x480p). AOT can process multiple objects (less than a pre-defined number, 10 in default) as efficiently as processing a single object. This project also supports inferring any number of objects together within a video by automatic separation and aggregation.
  • Multi-GPU training and inference
  • Mixed precision training and inference
  • Test-time augmentation: multi-scale and flipping augmentations are supported.

TODO

  • Code documentation
  • Demo tool
  • Adding your own dataset

Requirements

  • Python3
  • pytorch >= 1.7.0 and torchvision
  • opencv-python
  • Pillow

Optional (for better efficiency):

  • Pytorch Correlation (recommend to install from source instead of using pip)

Demo

Coming

Model Zoo and Results

Pre-trained models and corresponding results reproduced by this project can be found in MODEL_ZOO.md.

Getting Started

  1. Prepare datasets:

    Please follow the below instruction to prepare datasets in each correspondding folder.

    • Static

      datasets/Static: pre-training dataset with static images. A guidance can be found in AFB-URR.

    • YouTube-VOS

      A commonly-used large-scale VOS dataset.

      datasets/YTB/2019: version 2019, download link. train is required for training. valid (6fps) and valid_all_frames (30fps, optional) are used for evaluation.

      datasets/YTB/2018: version 2018, download link. Only valid (6fps) and valid_all_frames (30fps, optional) are required for this project and used for evaluation.

    • DAVIS

      A commonly-used small-scale VOS dataset.

      datasets/DAVIS: TrainVal (480p) contains both the training and validation split. Test-Dev (480p) contains the Test-dev split. The full-resolution version is also supported for training and evluation but not required.

  2. Prepare ImageNet pre-trained encoders

    Select and download below checkpoints into pretrain_models:

    The current default training configs are not optimized for encoders larger than ResNet-50. If you want to use larger encoders, we recommond to early stop the main-training stage at 80,000 iteration (100,000 in default) to avoid over-fitting on the seen classes of YouTube-VOS.

  3. Training and Evaluation

    The example script will train AOTT with 2 stages using 4 GPUs and auto-mixed precision (--amp). The first stage is a pre-training stage using Static dataset, and the second stage is main-training stage, which uses both YouTube-VOS 2019 train and DAVIS-2017 train for training, resulting in a model can generalize to different domains (YouTube-VOS and DAVIS) and different frame rates (6fps, 24fps, and 30fps).

    Notably, you can use only the YouTube-VOS 2019 train split in the second stage by changing pre_ytb_dav to pre_ytb, which leads to better YouTube-VOS performance on unseen classes. Besides, if you don't want to do the first stage, you can start the training from stage ytb, but the performance will drop about 1~2% absolutely.

    After the training is finished, the example script will evaluate the model on YouTube-VOS and DAVIS, and the results will be packed into Zip files. For calculating scores, please use offical YouTube-VOS servers (2018 server and 2019 server) and offical DAVIS toolkit.

Adding your own dataset

Coming

Troubleshooting

Waiting

Citations

Please consider citing the related paper(s) in your publications if it helps your research.

@inproceedings{yang2021aot,
  title={Associating Objects with Transformers for Video Object Segmentation},
  author={Yang, Zongxin and Wei, Yunchao and Yang, Yi},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

License

This project is released under the BSD-3-Clause license. See LICENSE for additional details.

Owner
CS graduate student, Zhejiang University.
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)

Python Streaming Anomaly Detection (PySAD) PySAD is an open-source python framework for anomaly detection on streaming multivariate data. Documentatio

Selim Firat Yilmaz 181 Dec 18, 2022
TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020)

TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020) About The goal of our research problem is illustrated below: give

59 Dec 09, 2022
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
Flexible Option Learning - NeurIPS 2021

Flexible Option Learning This repository contains code for the paper Flexible Option Learning presented as a Spotlight at NeurIPS 2021. The implementa

Martin Klissarov 7 Nov 09, 2022
Code for “ACE-HGNN: Adaptive Curvature ExplorationHyperbolic Graph Neural Network”

ACE-HGNN: Adaptive Curvature Exploration Hyperbolic Graph Neural Network This repository is the implementation of ACE-HGNN in PyTorch. Environment pyt

9 Nov 28, 2022
Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification"

Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification" This is an end-to-end framework for accurate and robust left ventr

2 Jul 09, 2022
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
PixelPyramids: Exact Inference Models from Lossless Image Pyramids (ICCV 2021)

PixelPyramids: Exact Inference Models from Lossless Image Pyramids This repository contains the PyTorch implementation of the paper PixelPyramids: Exa

Visual Inference Lab @TU Darmstadt 8 Dec 11, 2022
🥈78th place in Riiid Solution🥈

Riiid Answer Correctness Prediction Introduction This repository is the code that placed 78th in Riiid Answer Correctness Prediction competition. Requ

ds wook 14 Apr 26, 2022
A annotation of yolov5-5.0

代码版本:0714 commit #4000 $ git clone https://github.com/ultralytics/yolov5 $ cd yolov5 $ git checkout 720aaa65c8873c0d87df09e3c1c14f3581d4ea61 这个代码只是注释版

Laughing 229 Dec 17, 2022
ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Sign-Agnostic Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page This repository contains the implementation

64 Jan 05, 2023
Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

mini-hmc-jax This is a simple implementation of Hamiltonian Monte Carlo in JAX t

Martin Marek 6 Mar 03, 2022
Deep Residual Networks with 1K Layers

Deep Residual Networks with 1K Layers By Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Microsoft Research Asia (MSRA). Table of Contents Introduc

Kaiming He 856 Jan 06, 2023
thundernet ncnn

MMDetection_Lite 基于mmdetection 实现一些轻量级检测模型,安装方式和mmdeteciton相同 voc0712 voc 0712训练 voc2007测试 coco预训练 thundernet_voc_shufflenetv2_1.5 input shape mAP 320

DayBreak 39 Dec 05, 2022
Scripts of Machine Learning Algorithms from Scratch. Implementations of machine learning models and algorithms using nothing but NumPy with a focus on accessibility. Aims to cover everything from basic to advance.

Algo-ScriptML Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The goal of this project is not t

Algo Phantoms 81 Nov 26, 2022
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023
An executor that loads ONNX models and embeds documents using the ONNX runtime.

ONNXEncoder An executor that loads ONNX models and embeds documents using the ONNX runtime. Usage via Docker image (recommended) from jina import Flow

Jina AI 2 Mar 15, 2022
基于YoloX目标检测+DeepSort算法实现多目标追踪Baseline

项目简介: 使用YOLOX+Deepsort实现车辆行人追踪和计数,代码封装成一个Detector类,更容易嵌入到自己的项目中。 代码地址(欢迎star): https://github.com/Sharpiless/yolox-deepsort/ 最终效果: 运行demo: python demo

114 Dec 30, 2022
Easy-to-use,Modular and Extendible package of deep-learning based CTR models .

DeepCTR DeepCTR is a Easy-to-use,Modular and Extendible package of deep-learning based CTR models along with lots of core components layers which can

浅梦 6.6k Jan 08, 2023
Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

Shakespeare translations using TensorFlow This is an example of using the new Google's TensorFlow library on monolingual translation going from modern

Motoki Wu 245 Dec 28, 2022