Learning Tracking Representations via Dual-Branch Fully Transformer Networks

DualTFR

⭐ We achieved runner-up in both VOT2021ST (short-term) and VOT2021RT (real-time). Variants of DualTFR took 3rd/4th places in VOT2020RT and 4th place in VOT2020ST.

Model weight downloads for the VOT21 challenge:

We provide the models of five trackers here: SAMN, SAMN_DiMP, DualTFR, DualTFRst, and DualTFRon.

Note that the AlphaRefine (https://github.com/MasterBin-IIAU/AlphaRefine) and SuperDiMP (https://github.com/visionml/pytracking) models are identical to those released by their original authors.

Tracker	Number of files	Model files
SAMN	1	SAMN.tar
SAMN_DiMP	2	super_dimp.pth.tar, SAMN.tar
DualTFR	2	DualTFR.tar, ar.pth.tar
DualTFRst	2	DualTFRst.tar, ar.pth.tar
DualTFRon	2	DualTFRon.tar, ar.pth.tar
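Since the code has not been released yet, the snippet below is only a hypothetical convenience for checking a download: it transcribes the table above into a Python dict and reports which files a given tracker still lacks in a local folder. The names `REQUIRED_WEIGHTS` and `missing_weights`, and the `./networks` folder used in the demo, are assumptions for illustration, not part of this repository.

```python
from pathlib import Path

# Hypothetical helper: the tracker -> weight-file mapping is transcribed from the
# table above; the repository's own loading code has not been released.
REQUIRED_WEIGHTS = {
    "SAMN": ["SAMN.tar"],
    "SAMN_DiMP": ["super_dimp.pth.tar", "SAMN.tar"],
    "DualTFR": ["DualTFR.tar", "ar.pth.tar"],
    "DualTFRst": ["DualTFRst.tar", "ar.pth.tar"],
    "DualTFRon": ["DualTFRon.tar", "ar.pth.tar"],
}


def missing_weights(tracker, weights_dir):
    """Return the weight files required by `tracker` that are absent from `weights_dir`."""
    if tracker not in REQUIRED_WEIGHTS:
        raise KeyError(f"unknown tracker: {tracker!r}")
    root = Path(weights_dir)
    return [name for name in REQUIRED_WEIGHTS[tracker] if not (root / name).is_file()]


if __name__ == "__main__":
    # "./networks" is an assumed download location, not one specified by this README.
    for name in REQUIRED_WEIGHTS:
        print(name, "missing:", missing_weights(name, "./networks"))
```

Note that the three DualTFR variants share `ar.pth.tar` (AlphaRefine), so a single copy of that file serves all of them.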

Models can be downloaded from BaiduNetDisk or GoogleDrive:

BaiduNetDisk:

https://pan.baidu.com/s/1RHA7HVlXtNEzYPGIjJbQ-g (extraction code: sruh)

GoogleDrive:

https://drive.google.com/drive/folders/1Z61_mfh2vwzqDxejt5idBOgYhWOCZOr5?usp=sharing

Code will be released soon.

We present a simple Siamese-like dual-branch network built solely on Transformers to learn tracking features. Given a template and a search image, we divide each into non-overlapping image patches and extract a feature vector for each patch based on its matching results with the other patches within an attention window. Then, for each token, we estimate whether it contains the target object and the corresponding target size. The main advantage of this approach is that the features are learned from matching and, ultimately, for matching, so they are aligned with the subsequent object tracking task. The method achieves results comparable to the best-performing methods, which first use a CNN to extract features and then use a Transformer to fuse them. Without bells and whistles, it outperforms the state-of-the-art methods on the GOT-10k and VOT2020 benchmarks. In addition, it runs in real time (about 40 fps).
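To make the pipeline described above concrete, here is a minimal NumPy sketch; it is not the authors' implementation, whose code is not yet released. It splits an image into non-overlapping patches, lets each patch token "match" only the tokens inside its own attention window (single-head windowed self-attention, in the spirit of the Swin Transformer windows the authors build on), and decodes a box from per-token target scores and sizes. Simplifications and assumptions: no learned query/key/value projections, no interaction between the template and search branches, and placeholder score and size maps instead of trained heads. The function names `patchify`, `window_attention`, and `decode_box` are hypothetical.

```python
import numpy as np

# Illustrative sketch only (NOT the DualTFR code): hypothetical helpers assuming
# single-head attention without learned Q/K/V projections.


def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches.

    Returns a token grid of shape (H // p, W // p, p * p * C).
    """
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by patch size"
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H // p, W // p, p * p * C)


def window_attention(tokens, win):
    """Self-attention where each token only attends to tokens in its own win x win window.

    tokens: (gh, gw, d) grid; gh and gw must be divisible by win.
    """
    gh, gw, d = tokens.shape
    assert gh % win == 0 and gw % win == 0, "grid must be divisible by window size"
    nh, nw = gh // win, gw // win
    # Group tokens by window: (nh, nw, win * win, d).
    x = tokens.reshape(nh, win, nw, win, d).transpose(0, 2, 1, 3, 4)
    x = x.reshape(nh, nw, win * win, d)
    # Scaled dot-product "matching" scores between tokens of the same window.
    scores = x @ x.transpose(0, 1, 3, 2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over each row
    out = attn @ x
    # Undo the window grouping to get back a (gh, gw, d) grid.
    out = out.reshape(nh, nw, win, win, d).transpose(0, 2, 1, 3, 4)
    return out.reshape(gh, gw, d)


def decode_box(score_map, size_map, stride):
    """Pick the token most likely to contain the target and return (cx, cy, w, h).

    score_map: (gh, gw) per-token target-presence scores.
    size_map:  (gh, gw, 2) per-token (width, height) predictions in pixels.
    The box center is placed at the center of the winning token's patch.
    """
    i, j = np.unravel_index(np.argmax(score_map), score_map.shape)
    bw, bh = size_map[i, j]
    return float((j + 0.5) * stride), float((i + 0.5) * stride), float(bw), float(bh)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search = rng.standard_normal((64, 64, 3))   # toy "search image"
    tokens = patchify(search, 4)                # (16, 16, 48) token grid
    feats = window_attention(tokens, 4)         # features from matching within 4x4 windows
    score_map = feats.mean(axis=-1)             # placeholder for a learned classification head
    size_map = np.full((16, 16, 2), 20.0)       # placeholder for a learned size head
    print(decode_box(score_map, size_map, stride=4))
```

Restricting attention to a window keeps the cost linear in the number of windows rather than quadratic in the number of tokens, and a change to one token only affects the outputs of its own window.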

Acknowledgments

Thanks to the great PyTracking library, which helped us quickly implement our ideas.
We use the implementation of the Swin Transformer from the official repo https://github.com/microsoft/Swin-Transformer.

Contacts

Fei Xie, School of Automation, Southeast University, China, [email protected], WeChat: 372998044
