TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Overview

Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

This is an implementation of TCPNet.

arch

Introduction

For video recognition task, a global representation summarizing the whole contents of the video snippets plays an important role for the final performance. However, existing video architectures usually generate it by using a simple, global average pooling (GAP) method, which has limited ability to capture complex dynamics of videos. For image recognition task, there exist evidences showing that covariance pooling has stronger representation ability than GAP. Unfortunately, such plain covariance pooling used in image recognition is an orderless representative, which cannot model spatio-temporal structure inherent in videos. Therefore, this paper proposes a Temporal-attentive Covariance Pooling (TCP), inserted at the end of deep architectures, to produce powerful video representations. Specifi- cally, our TCP first develops a temporal attention module to adaptively calibrate spatio-temporal features for the succeeding covariance pooling, approximatively producing attentive covariance representations. Then, a temporal covariance pooling performs temporal pooling of the attentive covariance representations to char- acterize both intra-frame correlations and inter-frame cross-correlations of the calibrated features. As such, the proposed TCP can capture complex temporal dynamics. Finally, a fast matrix power normalization is introduced to exploit geometry of covariance representations. Note that our TCP is model-agnostic and can be flexibly integrated into any video architectures, resulting in TCPNet for effective video recognition. The extensive experiments on six benchmarks (e.g., Kinetics, Something-Something V1 and Charades) using various video architectures show our TCPNet is clearly superior to its counterparts, while having strong generalization ability.

Citation

@InProceedings{Gao_2021_TCP,
                author = {Zilin, Gao and Qilong, Wang and Bingbing, Zhang and Qinghua, Hu and Peihua, Li},
                title = {Temporal-attentive Covariance Pooling Networks for Video Recognition},
                booktitle = {arxiv preprint axXiv:2021.06xxx},
                year = {2021}
  }

Model Zoo

Kinetics-400

Method Backbone frames 1 crop Acc (%) 30 views Acc (%) Model Pretrained Model test log
TCPNet TSN R50 8f 72.4/90.4 75.3/91.8 K400_TCP_TSN_R50_8f Img1K_R50_GCP log
TCPNet TEA R50 8f 73.9/91.6 76.8/92.9 K400_TCP_TEA_R50_8f Img1K_Res2Net50_GCP log
TCPNet TSN R152 8f 75.7/92.2 78.3/93.7 K400_TCP_TSN_R152_8f Img11K_1K_R152_GCP log
TCPNet TSN R50 16f 73.9/91.2 75.8/92.1 K400_TCP_TSN_R50_16f Img1K_R50_GCP log
TCPNet TEA R50 16f 75.3/92.2 77.2/93.1 K400_TCP_TEA_R50_16f Img1K_Res2Net50_GCP log
TCPNet TSN R152 16f 77.2/93.1 79.3/94.0 K400_TCP_TSN_R152_16f Img11K_1K_R152_GCP TODO

Mini-Kinetics-200

Method Backbone frames 1 crop Acc (%) 30 views Acc (%) Model Pretrained Model
TCPNet TSN R50 8f 78.7 80.7 K200_TCP_TSN_8f K400_TCP_TSN_R50_8f

Environments

pytorch v1.0+(for TCP_TSN); v1.0~1.4(for TCP+TEA)

ffmpeg

graphviz pip install graphviz

tensorboard pip install tensorboardX

tqdm pip install tqdm

scikit-learn conda install scikit-learn

matplotlib conda install -c conda-forge matplotlib

fvcore pip install 'git+https://github.com/facebookresearch/fvcore'

Dataset Preparation

We provide a detailed dataset preparation guideline for Kinetics-400 and Mini-Kinetics-200. See Dataset preparation.

StartUp

  1. download the pretrained model and put it in pretrained_models/
  2. execute the training script file e.g.: sh script/K400/train_TCP_TSN_8f_R50.sh
  3. execute the inference script file e.g.: sh script/K400/test_TCP_TSN_R50_8f.sh

TCP Code


├── ops
|    ├── TCP
|    |   ├── TCP_module.py
|    |   ├── TCP_att_module.py
|    |   ├── TSA.py
|    |   └── TCA.py
|    ├ ...
├ ...

Acknowledgement

  • We thank TSM for providing well-designed 2D action recognition toolbox.
  • We also refer to some functions from iSQRT, TEA and Non-local.
  • Mini-K200 dataset samplling strategy follows Mini_K200.
  • We would like to thank Facebook for developing pytorch toolbox.

Thanks for their work!

Owner
Zilin Gao
Zilin Gao
Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

self-driving-car In this repository I will share the source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree. Hope this might

Andrea Palazzi 2.4k Dec 29, 2022
meProp: Sparsified Back Propagation for Accelerated Deep Learning

meProp The codes were used for the paper meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting (ICML 2017) [pdf]

LancoPKU 107 Nov 18, 2022
git《Beta R-CNN: Looking into Pedestrian Detection from Another Perspective》(NeurIPS 2020) GitHub:[fig3]

Beta R-CNN: Looking into Pedestrian Detection from Another Perspective This is the pytorch implementation of our paper "[Beta R-CNN: Looking into Pede

35 Sep 08, 2021
Code for testing convergence rates of Lipschitz learning on graphs

📈 LipschitzLearningRates The code in this repository reproduces the experimental results on convergence rates for k-nearest neighbor graph infinity L

2 Dec 20, 2021
Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021)

HAIS Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021) by Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang*. (*) Corresp

Hust Visual Learning Team 145 Jan 05, 2023
Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)

Duong H. Le 18 Jun 13, 2022
😮The official implementation of "CoNeRF: Controllable Neural Radiance Fields" 😮

CoNeRF: Controllable Neural Radiance Fields This is the official implementation for "CoNeRF: Controllable Neural Radiance Fields" Project Page Paper V

Kacper Kania 61 Dec 24, 2022
This is a vision-based 3d model manipulation and control UI

Manipulation of 3D Models Using Hand Gesture This program allows user to manipulation 3D models (.obj format) with their hands. The project support bo

Cortic Technology Corp. 43 Oct 23, 2022
Build upon neural radiance fields to create a scene-specific implicit 3D semantic representation, Semantic-NeRF

Semantic-NeRF: Semantic Neural Radiance Fields Project Page | Video | Paper | Data In-Place Scene Labelling and Understanding with Implicit Scene Repr

Shuaifeng Zhi 243 Jan 07, 2023
Deep Q-Learning Network in pytorch (not actively maintained)

pytoch-dqn This project is pytorch implementation of Human-level control through deep reinforcement learning and I also plan to implement the followin

Hung-Tu Chen 342 Jan 01, 2023
Galaxy images labelled by morphology (shape). Aimed at ML development and teaching

Galaxy images labelled by morphology (shape). Aimed at ML debugging and teaching.

Mike Walmsley 14 Nov 28, 2022
Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease

Heart_Disease_Classification Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease Dataset

Ashish 1 Jan 30, 2022
An implementation of IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification

IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification The repostiory consists of the code, results and data set links for

12 Dec 26, 2022
This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures

Introduction This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures. @inproceedings{Wa

Jiaqi Wang 42 Jan 07, 2023
General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usec

The Kompute Project 1k Jan 06, 2023
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper) @misc{zhang2021compress,

46 Dec 07, 2022
CONetV2: Efficient Auto-Channel Size Optimization for CNNs

CONetV2: Efficient Auto-Channel Size Optimization for CNNs Exciting News! CONetV2: Efficient Auto-Channel Size Optimization for CNNs has been accepted

Mahdi S. Hosseini 3 Dec 13, 2021
Keyword-BERT: Keyword-Attentive Deep Semantic Matching

project discription An implementation of the Keyword-BERT model mentioned in my paper Keyword-Attentive Deep Semantic Matching (Plz cite this github r

1 Nov 14, 2021
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022
adversarial_multi_armed_bandit_variable_plays

Adversarial Multi-Armed Bandit with Variable Plays This code is for paper: Adversarial Online Learning with Variable Plays in the Evasion-and-Pursuit

Yiyang Wang 1 Oct 28, 2021