WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking [Paper Link]

Abstract

In this work, we contribute a new million-scale Unmanned Aerial Vehicle (UAV) tracking benchmark, called WebUAV-3M. First, we collect 4,485 videos with more than 3M frames from the Internet. Then, we devise an efficient and scalable Semi-Automatic Target Annotation (SATA) pipeline to label the target in every frame of WebUAV-3M. To the best of our knowledge, WebUAV-3M, with its dense bounding box annotations, is by far the largest public UAV tracking benchmark. By establishing a million-scale annotated benchmark covering a wide range of target categories, we expect to pave the way for follow-up studies in UAV tracking. Moreover, considering the close connections among visual appearance, natural language and audio, we enrich WebUAV-3M with natural language specifications and audio descriptions, encouraging the exploration of natural language features and audio cues for UAV tracking. Equipped with this benchmark, we delve into million-scale deep UAV tracking problems, aiming to provide the community with a dedicated large-scale benchmark for training deep UAV trackers and evaluating UAV tracking approaches. Extensive experiments on WebUAV-3M demonstrate that there is still substantial room for improvement in robust deep UAV tracking. The dataset, toolkits and baseline results will be available at this page.

WebUAV-3M dataset

Dataset coming here soon...

Evaluation toolkits

Toolkits coming here soon...

Baseline results

Results coming here soon...

Environment

The experiments were run with PyTorch or MATLAB on an Ubuntu 18.04 server with an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz and three NVIDIA RTX A5000 GPUs.
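To compare a local machine against the setup described above, a small script along the following lines may help. It is only a sketch and not part of the WebUAV-3M toolkits: the function name `describe_environment` is hypothetical, and PyTorch is treated as optional so the script still runs where it is not installed.

```python
import platform
import sys


def describe_environment():
    """Collect basic facts about the local Python/PyTorch setup.

    Hypothetical helper for illustration; not part of the WebUAV-3M toolkits.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,            # stays None if PyTorch is not installed
        "cuda_available": False,
        "gpus": [],
    }
    try:
        import torch  # optional dependency
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # One entry per visible CUDA device, e.g. three entries on a 3-GPU server.
        info["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the script prints one line per field; differences from the hardware listed above are worth keeping in mind when comparing tracker speeds.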

Citation

If you find the dataset and toolkits useful in your research, please consider citing:

@article{WebUAV_3M_2022,
    title = {WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking},
    author = {Chunhui Zhang and Guanjie Huang and Li Liu and Shan Huang and Yinan Yang and Yuxuan Zhang and Xiang Wan and Shiming Ge},
    journal = {arXiv preprint arXiv:2201.07425},
    year = {2022}
}

Acknowledgments

Thanks to the authors of the great [GOT-10k toolkit].
