Overview

TokShift-Transformer

This is the official implementation of the paper "Token Shift Transformer for Video Classification". We achieve SOTA performance of 80.40% top-1 accuracy on the Kinetics-400 validation set. Paper link

Updates

July 11, 2021

  • Released this V1 version (the version used in the paper) to the public.
  • We are preparing a V2 version with the following modifications, to be released within one week:
  1. Directly decode mp4 video files during training/evaluation.
  2. Switch to the standardized timm code base.
  3. Performance is further improved over the numbers reported in the paper (average +0.5%).

April 22, 2021

  • Added Train/Test guidelines and data preparation instructions.

April 16, 2021

  • Published the TokShift Transformer for video content understanding.

Model Zoo and Baselines

| architecture | backbone | pretrain | Res & Frames | GFLOPs x views | top-1 | config |
| --- | --- | --- | --- | --- | --- | --- |
| ViT (Video) | Base16 | ImgNet21k | 224 & 8 | 134.7 x 30 | 76.02 | link, k400_vit_8x32_224.yml |
| TokShift | Base16 | ImgNet21k | 224 & 8 | 134.7 x 30 | 77.28 | link, k400_tokshift_div4_8x32_base_224.yml |
| TokShift (MR) | Base16 | ImgNet21k | 256 & 8 | 175.8 x 30 | 77.68 | link, k400_tokshift_div4_8x32_base_256.yml |
| TokShift (HR) | Base16 | ImgNet21k | 384 & 8 | 394.7 x 30 | 78.14 | link, k400_tokshift_div4_8x32_base_384.yml |
| TokShift | Base16 | ImgNet21k | 224 & 16 | 268.5 x 30 | 78.18 | link, k400_tokshift_div4_16x32_base_224.yml |
| TokShift-Large (HR) | Large16 | ImgNet21k | 384 & 8 | 1397.6 x 30 | 79.83 | link, k400_tokshift_div4_8x32_large_384.yml |
| TokShift-Large (HR) | Large16 | ImgNet21k | 384 & 12 | 2096.4 x 30 | 80.40 | link, k400_tokshift_div4_12x32_large_384.yml |
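
The "GFLOPs x views" column lists the per-view cost and the number of test views, so the total test-time cost per video is roughly their product. A quick check for the 8-frame TokShift Base16 row:

# Total inference cost per video = per-view GFLOPs x number of test views
# (values taken from the TokShift / Base16 / 224 & 8 row above).
per_view_gflops = 134.7
views = 30
print("%.1f GFLOPs (~%.2f TFLOPs) per video" % (per_view_gflops * views, per_view_gflops * views / 1000))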

Below is the training log; we use 3-view evaluation (instead of 30 views) during validation to save time.
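
For reference on the multi-view protocol: per-view class scores are typically averaged to produce the final prediction. A minimal sketch (the common Kinetics split of 10 temporal clips x 3 spatial crops for 30 views is an assumption here, not taken from this repo):

import torch

def aggregate_views(view_logits):
    # view_logits: (num_views, num_classes), e.g. 30 views = 10 clips x 3 crops.
    # Average softmax scores over views, then take the top-1 class.
    scores = torch.softmax(view_logits, dim=-1).mean(dim=0)
    return scores.argmax().item()

# Example with random logits for 30 views over 400 Kinetics classes:
# pred = aggregate_views(torch.randn(30, 400))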

Installation

  • PyTorch >= 1.7, torchvision
  • tensorboardx
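
A quick sanity check that the environment matches the requirements above (a minimal sketch; tensorboardX is the usual import name for the tensorboardx package):

# Verify the installation requirements listed above.
import torch
import torchvision
import tensorboardX  # import name assumed for the tensorboardx dependency

print("torch:", torch.__version__)            # should be >= 1.7
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())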

Quick Start

Train

  1. Download ImageNet-21k pretrained weights from Base16 and Large16.
  2. Prepare the Kinetics-400 dataset organized in the following structure, with annotation lists under trainValTest (see the frame-extraction sketch after the training script below):
k400
|_ frames331_train
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |_ ...
|
|_ frames331_val
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |   |_ img_00001.jpg
|  |  |   |_ img_00002.jpg
|  |  |   |_ ...
|  |  |_ ...
|  |_ ...
|
|_ trainValTest
   |_ train.txt
   |_ val.txt
  3. Use the training script (train.sh) to train on Kinetics-400:
#!/usr/bin/env python
import os

cmd = "python -u main_ddp_shift_v3.py \
		--multiprocessing-distributed --world-size 1 --rank 0 \
		--dist-ur tcp://127.0.0.1:23677 \
		--tune_from pretrain/ViT-L_16_Img21.npz \
		--cfg config/custom/kinetics400/k400_tokshift_div4_12x32_large_384.yml"
os.system(cmd)
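
The frame folders required in step 2 can be produced with any frame extractor. Below is a minimal sketch using OpenCV (the opencv-python package is an assumption and not a stated dependency of this repo); it follows the img_00001.jpg naming shown in the directory layout above.

#!/usr/bin/env python
# Decode one mp4 into img_00001.jpg, img_00002.jpg, ... under out_dir
# (hypothetical helper, not part of this repo).
import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        cv2.imwrite(os.path.join(out_dir, "img_%05d.jpg" % idx), frame)
    cap.release()

# Example:
# extract_frames("raw_videos/abseiling/xyz.mp4", "k400/frames331_train/abseiling/xyz")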

Test

Use the test script (test.sh) to evaluate on Kinetics-400:

#!/usr/bin/env python
import os
cmd = "python -u main_ddp_shift_v3.py \
        --multiprocessing-distributed --world-size 1 --rank 0 \
        --dist-ur tcp://127.0.0.1:23677 \
        --evaluate \
        --resume model_zoo/ViT-B_16_k400_dense_cls400_segs8x32_e18_lr0.1_B21_VAL224/best_vit_B8x32x224_k400.pth \
        --cfg config/custom/kinetics400/k400_vit_8x32_224.yml"
os.system(cmd)
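
Note that despite the .sh extension, both train.sh and test.sh are Python wrappers around main_ddp_shift_v3.py (see the #!/usr/bin/env python shebang), so they can be launched with, e.g., python test.sh.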

Contributors

VideoNet is written and maintained by Dr. Hao Zhang and Dr. Yanbin Hao.

Citing

If you find TokShift-xfmr useful in your research, please use the following BibTeX entry for citation.

@inproceedings{tokshift2021,
  title={Token Shift Transformer for Video Classification},
  author={Hao Zhang and Yanbin Hao and Chong-Wah Ngo},
  booktitle={ACM Multimedia},
  year={2021},
}

Acknowledgement

Thanks to the following GitHub projects:
