TDN: Temporal Difference Networks for Efficient Action Recognition

Overview


We release the PyTorch code of TDN (Temporal Difference Networks). This code is based on the TSN and TSM codebases. The core code implementing the Temporal Difference Module is in ops/base_module.py and ops/tdn_net.py.

🔥 [NEW!] We have released the PyTorch code of TDN.

Prerequisites

The code is built with the following libraries:

Data Preparation

We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.

  • Processing Something-Something-V1 & V2 takes 3 steps:

    1. Extract frames from the videos (you can use ffmpeg to get frames from a video).
    2. Generate the annotations needed by the dataloader (one line per video: frame directory, number of frames, and label, separated by spaces). The annotations usually include train.txt and val.txt. The format of each *.txt file is:
      frames/video_1 num_frames label_1
      frames/video_2 num_frames label_2
      frames/video_3 num_frames label_3
      ...
      frames/video_N num_frames label_N
      
    3. Add the information to ops/dataset_configs.py
  • Processing Kinetics400 takes 2 steps:

    1. Generate the annotations needed by the dataloader (one line per video: video path and label). The annotations usually include train.txt and val.txt. The format of each *.txt file is:
      frames/video_1.mp4  label_1
      frames/video_2.mp4  label_2
      frames/video_3.mp4  label_3
      ...
      frames/video_N.mp4  label_N
      
    2. Add the information to ops/dataset_configs.py
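
As a rough illustration of the annotation step for frame-based datasets, the sketch below writes one `path num_frames label` line per video directory. The function names, the `labels` mapping, and the choice to count `.jpg`/`.png` files as frames are assumptions made for this example and are not part of this codebase; check ops/dataset_configs.py for how paths are actually resolved.

```python
import os


def build_annotation_lines(frames_root, labels):
    """Return 'path num_frames label' lines, one per video directory.

    frames_root: directory holding one sub-directory of extracted frames per video.
    labels: dict mapping sub-directory name -> integer class label
            (a hypothetical input; build it from the dataset's own label files).
    """
    lines = []
    for video in sorted(os.listdir(frames_root)):
        video_dir = os.path.join(frames_root, video)
        # Skip stray files and videos without a known label.
        if not os.path.isdir(video_dir) or video not in labels:
            continue
        # Assumption: every image file in the directory is one frame.
        num_frames = sum(
            1
            for name in os.listdir(video_dir)
            if name.lower().endswith((".jpg", ".jpeg", ".png"))
        )
        lines.append(f"{video_dir} {num_frames} {labels[video]}")
    return lines


def write_annotations(lines, out_path):
    """Write the annotation lines to a text file such as train.txt."""
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_annotations(build_annotation_lines("frames", labels), "train.txt")` would produce lines of the form shown above, provided `labels` covers the video directories under `frames`.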

Model Zoo

Here we provide some off-the-shelf pretrained models. Accuracy may differ slightly from the paper because the raw Kinetics videos downloaded by different users may not be identical.

Something-Something-V1

Model         Frames x Crops x Clips  Top-1  Top-5  Checkpoint
TDN-ResNet50  8x1x1                   52.3%  80.6%  link
TDN-ResNet50  16x1x1                  53.9%  82.1%  link

Something-Something-V2

Model         Frames x Crops x Clips  Top-1  Top-5  Checkpoint
TDN-ResNet50  8x1x1                   64.0%  88.8%  link
TDN-ResNet50  16x1x1                  65.3%  89.7%  link

Kinetics400

Model          Frames x Crops x Clips  Top-1 (30 view)  Top-5 (30 view)  Checkpoint
TDN-ResNet50   8x3x10                  76.6%            92.8%            link
TDN-ResNet50   16x3x10                 77.5%            93.2%            link
TDN-ResNet101  8x3x10                  77.5%            93.6%            link
TDN-ResNet101  16x3x10                 78.5%            93.9%            link

Testing

  • For center-crop, single-clip testing, there are 2 steps:
    1. Run the following testing script (replace <checkpoint_path> and <output_dir> with your own paths):
      CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
      --archs='resnet50' --weights <checkpoint_path> --test_segments=8 \
      --test_crops=1 --batch_size=16 --gpus 0 --output_dir <output_dir> -j 4 --clip_index=1
      
    2. Run the following script to compute the results from the raw scores:
      python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir <output_dir>
      
  • For testing with 3 crops and 10 clips, there are 2 steps:
    1. Run the following testing script 10 times, once for each clip_index from 0 to 9 (replace <checkpoint_path>, <output_dir> and <clip_index> with your own values):
      CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py kinetics \
      --archs='resnet50' --weights <checkpoint_path> --test_segments=8 \
      --test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir <output_dir> \
      -j 4 --clip_index <clip_index>
      
    2. Run the following script to ensemble the raw scores of the 30 views:
      python3 pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir <output_dir>
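
To make the "30 views" idea concrete, the sketch below averages per-class scores over all views (3 crops x 10 clips) and then computes top-k accuracy. It only illustrates the general technique: the function names and the nested-list score layout are assumptions, and pkl_to_results.py may store and combine scores differently.

```python
def ensemble_views(view_scores):
    """Average per-class scores over views.

    view_scores: list of views, each shaped [num_videos][num_classes]
                 (an assumed layout, chosen for this example).
    Returns averaged scores shaped [num_videos][num_classes].
    """
    num_views = len(view_scores)
    num_videos = len(view_scores[0])
    num_classes = len(view_scores[0][0])
    avg = [[0.0] * num_classes for _ in range(num_videos)]
    for view in view_scores:
        for v, scores in enumerate(view):
            for c, score in enumerate(scores):
                avg[v][c] += score / num_views
    return avg


def topk_accuracy(scores, labels, k):
    """Fraction of videos whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)
```

With 3 crops and 10 clips, `view_scores` would hold 30 entries, one per (crop, clip) pair, and `topk_accuracy(ensemble_views(view_scores), labels, 1)` would give the ensembled Top-1 accuracy.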
      

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

  • For example, to train TDN-ResNet50 on Something-Something-V1 with 8 GPUs, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
                main.py  something  RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
                --lr_scheduler step --lr_steps  30 45 55 --epochs 60 --batch-size 16 \
                --wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    
  • For example, to train TDN-ResNet50 on Kinetics400 with 8 GPUs, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
            main.py  kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
            --lr_scheduler step  --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
            --wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
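
The `--lr_scheduler step` and `--lr_steps` options above describe a step schedule, which the following sketch illustrates. The decay factor `gamma=0.1` and the function name are assumptions for this example, not values taken from this README; check main.py for the actual behavior.

```python
def step_lr(base_lr, lr_steps, epoch, gamma=0.1):
    """Learning rate at `epoch` under a step schedule.

    The rate is multiplied by `gamma` once for every milestone in
    `lr_steps` that has been reached. gamma=0.1 is an assumed value.
    """
    num_decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    # Something-Something-V1 settings from the command above.
    for epoch in (0, 29, 30, 45, 55):
        print(epoch, step_lr(0.02, [30, 45, 55], epoch))
```

Under this assumption, the Something-Something-V1 run would keep the base rate of 0.02 until epoch 30 and then reduce it at epochs 30, 45 and 55.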
    

Acknowledgements

We especially thank the contributors of the TSN and TSM codebases for providing helpful code.

License

This repository is released under the Apache-2.0 license, as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@article{wang2020tdn,
      title={TDN: Temporal Difference Networks for Efficient Action Recognition}, 
      author={Limin Wang and Zhan Tong and Bin Ji and Gangshan Wu},
      journal={arXiv preprint arXiv:2012.10071},
      year={2020}
}
Owner
Multimedia Computing Group, Nanjing University