Learning to Track with Object Permanence

A video-based MOT approach capable of tracking through full occlusions:

Learning to Track with Object Permanence,
Pavel Tokmakov, Jie Li, Wolfram Burgard, Adrien Gaidon,
arXiv technical report (arXiv 2103.14258)

@inproceedings{tokmakov2021learning,
  title={Learning to Track with Object Permanence},
  author={Tokmakov, Pavel and Li, Jie and Burgard, Wolfram and Gaidon, Adrien},
  booktitle={ICCV},
  year={2021}
}

Abstract

Tracking by detection, the dominant approach for online multi-object tracking, alternates between localization and association steps. As a result, it strongly depends on the quality of instantaneous observations, often failing when objects are not fully visible. In contrast, tracking in humans is underlined by the notion of object permanence: once an object is recognized, we are aware of its physical existence and can approximately localize it even under full occlusions. In this work, we introduce an end-to-end trainable approach for joint object detection and tracking that is capable of such reasoning. We build on top of the recent CenterTrack architecture, which takes pairs of frames as input, and extend it to videos of arbitrary length. To this end, we augment the model with a spatio-temporal, recurrent memory module, allowing it to reason about object locations and identities in the current frame using all the previous history. It is, however, not obvious how to train such an approach. We study this question on a new, large-scale, synthetic dataset for multi-object tracking, which provides ground truth annotations for invisible objects, and propose several approaches for supervising tracking behind occlusions. Our model, trained jointly on synthetic and real data, outperforms the state of the art on KITTI and MOT17 datasets thanks to its robustness to occlusions.

Installation

Please refer to INSTALL.md for installation instructions.

Benchmark Evaluation and Training

After installation, follow the instructions in DATA.md to setup the datasets. Then check GETTING_STARTED.md to reproduce the results in the paper. We provide scripts for all the experiments in the experiments folder.

License

PermaTrack is developed upon CenterTrack. Both codebases are released under MIT License themselves. Some code of CenterTrack are from third-parties with different licenses, please check the CenterTrack repo for details. In addition, this repo uses py-motmetrics for MOT evaluation, nuscenes-devkit for nuScenes evaluation and preprocessing, and TAO codebase for computing Track AP. ConvGRU implementation is adopted from this repo. See NOTICE for detail. Please note the licenses of each dataset. Most of the datasets we used in this project are under non-commercial licenses.

Implementation for Learning to Track with Object Permanence

Related tags

Overview

Learning to Track with Object Permanence

Abstract

Installation

Benchmark Evaluation and Training

License

Owner

Toyota Research Institute - Machine Learning

Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

Source code of the paper PatchGraph: In-hand tactile tracking with learned surface normals.

Empower Sequence Labeling with Task-Aware Language Model

The object detection pipeline is based on Ultralytics YOLOv5

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.

Dense Prediction Transformers

PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency"

Cancer-and-Tumor-Detection-Using-Inception-model - In this repo i am gonna show you how i did cancer/tumor detection in lungs using deep neural networks, specifically here the Inception model by google.

Utilities to bridge Canvas-generated course rosters with GitLab's API.

A simple algorithm for extracting tree height in sparse scene from point cloud data.

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

Sequence modeling benchmarks and temporal convolutional networks

Object detection, 3D detection, and pose estimation using center point detection:

This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf

PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation

Gapmm2: gapped alignment using minimap2 (align transcripts to genome)

Official code implementation for "Personalized Federated Learning using Hypernetworks"

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

Source code for Acorn, the precision farming rover by Twisted Fields

The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'