TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Last update: Dec 08, 2022

Overview

Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

This is an implementation of TCPNet.

Introduction

In video recognition, a global representation that summarizes the whole content of a video snippet plays an important role in final performance. However, existing video architectures usually generate it with simple global average pooling (GAP), which has limited ability to capture the complex dynamics of videos. For image recognition, there is evidence that covariance pooling has stronger representation ability than GAP. Unfortunately, the plain covariance pooling used in image recognition is an orderless representation, which cannot model the spatio-temporal structure inherent in videos. Therefore, this paper proposes Temporal-attentive Covariance Pooling (TCP), inserted at the end of deep architectures, to produce powerful video representations. Specifically, TCP first develops a temporal attention module to adaptively calibrate spatio-temporal features for the succeeding covariance pooling, approximately producing attentive covariance representations. Then, a temporal covariance pooling performs temporal pooling of the attentive covariance representations to characterize both intra-frame correlations and inter-frame cross-correlations of the calibrated features. As such, the proposed TCP can capture complex temporal dynamics. Finally, a fast matrix power normalization is introduced to exploit the geometry of covariance representations. Note that TCP is model-agnostic and can be flexibly integrated into any video architecture, resulting in TCPNet for effective video recognition. Extensive experiments on six benchmarks (e.g., Kinetics, Something-Something V1 and Charades) using various video architectures show that TCPNet is clearly superior to its counterparts, while having strong generalization ability.
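To make the contrast between GAP and covariance pooling concrete, here is a minimal NumPy sketch. It is not the code in this repository: it shows only plain (orderless) covariance pooling followed by a matrix square root computed with the coupled Newton-Schulz iteration, which is one common way to make matrix power normalization fast. The temporal attention and temporal covariance pooling steps of TCP are not shown. The function names and the (T, N, C) feature layout are assumptions made for illustration.

```python
import numpy as np


def global_average_pooling(features):
    """Average a (T, N, C) feature array over frames and positions -> (C,)."""
    return features.reshape(-1, features.shape[-1]).mean(axis=0)


def covariance_pooling(features):
    """Plain covariance pooling over all T*N positions -> (C, C)."""
    x = features.reshape(-1, features.shape[-1])
    x = x - x.mean(axis=0, keepdims=True)  # center each channel
    return x.T @ x / x.shape[0]


def newton_schulz_sqrt(cov, num_iters=5, eps=1e-5):
    """Approximate the square root of an SPD matrix without an eigendecomposition."""
    identity = np.eye(cov.shape[0])
    a = cov + eps * identity          # small ridge keeps the matrix positive definite
    trace = np.trace(a)
    y = a / trace                     # pre-normalize so the iteration converges
    z = identity.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t
        z = t @ z
    return y * np.sqrt(trace)         # undo the pre-normalization


# Hypothetical input: 8 frames, 7x7 spatial positions, 4 channels.
features = np.random.default_rng(0).normal(size=(8, 49, 4))
gap_vector = global_average_pooling(features)           # first-order statistics
cov_matrix = newton_schulz_sqrt(covariance_pooling(features))  # second-order statistics
print(gap_vector.shape, cov_matrix.shape)
```

GAP keeps one number per channel, while covariance pooling keeps one number per channel pair, which is why it can describe correlations that GAP discards. Both are orderless over time, which is the limitation TCP is designed to address.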

Citation

@InProceedings{Gao_2021_TCP,
  author = {Zilin, Gao and Qilong, Wang and Bingbing, Zhang and Qinghua, Hu and Peihua, Li},
  title = {Temporal-attentive Covariance Pooling Networks for Video Recognition},
  booktitle = {arXiv preprint arXiv:2021.06xxx},
  year = {2021}
}

Model Zoo

Kinetics-400

Method	Backbone	frames	1 crop Acc (%)	30 views Acc (%)	Model	Pretrained Model	test log
TCPNet	TSN R50	8f	72.4/90.4	75.3/91.8	K400_TCP_TSN_R50_8f	Img1K_R50_GCP	log
TCPNet	TEA R50	8f	73.9/91.6	76.8/92.9	K400_TCP_TEA_R50_8f	Img1K_Res2Net50_GCP	log
TCPNet	TSN R152	8f	75.7/92.2	78.3/93.7	K400_TCP_TSN_R152_8f	Img11K_1K_R152_GCP	log
TCPNet	TSN R50	16f	73.9/91.2	75.8/92.1	K400_TCP_TSN_R50_16f	Img1K_R50_GCP	log
TCPNet	TEA R50	16f	75.3/92.2	77.2/93.1	K400_TCP_TEA_R50_16f	Img1K_Res2Net50_GCP	log
TCPNet	TSN R152	16f	77.2/93.1	79.3/94.0	K400_TCP_TSN_R152_16f	Img11K_1K_R152_GCP	TODO

Mini-Kinetics-200

Method	Backbone	frames	1 crop Acc (%)	30 views Acc (%)	Model	Pretrained Model
TCPNet	TSN R50	8f	78.7	80.7	K200_TCP_TSN_8f	K400_TCP_TSN_R50_8f
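The tables report accuracy for a single crop and for 30 views. The sketch below shows one common way such numbers are computed, assuming that "30 views" means class scores averaged over 30 sampled views per video and that paired entries such as 72.4/90.4 are top-1/top-5 accuracy. Neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import numpy as np


def aggregate_views(view_scores):
    """Average scores over views: (videos, views, classes) -> (videos, classes)."""
    return view_scores.mean(axis=1)


def topk_accuracy(scores, labels, k=1):
    """Percentage of videos whose true label is among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()


# Toy example: 2 videos, 3 views each, 3 classes.
view_scores = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]],
    [[0.5, 0.3, 0.2], [0.4, 0.2, 0.4], [0.5, 0.1, 0.4]],
])
labels = np.array([0, 2])
video_scores = aggregate_views(view_scores)
print(topk_accuracy(video_scores, labels, k=1))
print(topk_accuracy(video_scores, labels, k=2))
```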

Environments

PyTorch v1.0+ (for TCP_TSN); v1.0 to v1.4 (for TCP+TEA)

ffmpeg

graphviz: pip install graphviz

tensorboard: pip install tensorboardX

tqdm: pip install tqdm

scikit-learn: conda install scikit-learn

matplotlib: conda install -c conda-forge matplotlib

fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
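Before training, it can help to confirm that everything listed above is actually present. The following is a small sketch using only the Python standard library. It checks that each package can be found and that ffmpeg is on the PATH, but it does not check versions. The helper name and the module list are assumptions for illustration, not part of this repository.

```python
import importlib.util
import shutil

# Import names for the packages listed above. Note that scikit-learn is
# imported as "sklearn" and the tensorboard entry installs "tensorboardX".
REQUIRED_MODULES = [
    "torch", "graphviz", "tensorboardX", "tqdm", "sklearn", "matplotlib", "fvcore",
]


def missing_requirements(modules=REQUIRED_MODULES, executables=("ffmpeg",)):
    """Return the names of modules and executables that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


print(missing_requirements() or "all requirements found")
```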

Dataset Preparation

We provide a detailed dataset preparation guideline for Kinetics-400 and Mini-Kinetics-200. See Dataset preparation.

StartUp

Download the pretrained model and put it in pretrained_models/.
Run the training script, e.g.: sh script/K400/train_TCP_TSN_8f_R50.sh
Run the inference script, e.g.: sh script/K400/test_TCP_TSN_R50_8f.sh
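The steps above can also be wrapped in a short Python helper that refuses to start when no pretrained model has been downloaded yet. This is only a sketch: the helper names are hypothetical, and it assumes that any file inside pretrained_models/ counts as a downloaded model.

```python
import subprocess
from pathlib import Path


def has_pretrained_model(model_dir="pretrained_models"):
    """True if model_dir exists and contains at least one file."""
    path = Path(model_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


def run_script(script, model_dir="pretrained_models", dry_run=False):
    """Run a training or inference script with sh after checking the model directory."""
    if not has_pretrained_model(model_dir):
        raise FileNotFoundError(f"no pretrained model found in {model_dir}/")
    command = ["sh", script]
    if not dry_run:
        subprocess.run(command, check=True)  # raises if the script exits non-zero
    return command


# Example (dry_run=True only builds the command; it is not executed):
# run_script("script/K400/train_TCP_TSN_8f_R50.sh", dry_run=True)
```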

TCP Code


├── ops
|    ├── TCP
|    |   ├── TCP_module.py
|    |   ├── TCP_att_module.py
|    |   ├── TSA.py
|    |   └── TCA.py
|    ├ ...
├ ...

Acknowledgement

We thank TSM for providing a well-designed 2D action recognition toolbox.
We also refer to some functions from iSQRT, TEA and Non-local.
The Mini-K200 dataset sampling strategy follows Mini_K200.
We would like to thank Facebook for developing the PyTorch toolbox.

Thanks for their work!

Owner

Zilin Gao
