This is the official implementation of the paper "ActionCLIP: A New Paradigm for Video Action Recognition".

Last update: Jan 09, 2023

Overview

This is an official PyTorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Content

Prerequisites
Data Preparation
Updates
Pretrained Models
- Kinetics-400
- HMDB51 && UCF101
Testing
Training
Contributors
Citing ActionCLIP
Acknowledgments

Prerequisites

The code is built with the following libraries:

PyTorch >= 1.8
wandb
RandAugment
pprint
tqdm
dotmap
yaml
csv

For video data pre-processing, you may need ffmpeg.

For more details about the libraries, see INSTALL.md.
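Before running anything, it can help to check that the listed libraries are importable. The following is a minimal sketch, not part of the repository. The import names are assumptions: `yaml` is normally provided by the PyYAML package, and RandAugment is left out because its module name depends on how it was installed (see INSTALL.md).

```python
import importlib.util

# Import names are assumptions based on the list above; "yaml" is usually
# provided by the PyYAML package. RandAugment is omitted on purpose because
# its module name depends on the installation method.
REQUIRED_MODULES = ["torch", "wandb", "tqdm", "dotmap", "yaml", "pprint", "csv"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only confirms that a module can be located; it does not verify versions such as PyTorch >= 1.8.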

Data Preparation

We first extract the videos into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on Kinetics, UCF101, HMDB51, and Charades.
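As a rough illustration of the frame-extraction step, here is a minimal Python sketch that calls ffmpeg to dump a video into JPEG frames. The helper names, the `img_%05d.jpg` file pattern, and the quality setting are assumptions made for this example; the TSN repo guide remains the reference for the exact layout the data loaders expect.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frame_dir, pattern="img_%05d.jpg"):
    """Build an ffmpeg command that writes the frames of a video as JPEG files."""
    return [
        "ffmpeg",
        "-i", str(video_path),            # input video
        "-q:v", "2",                      # JPEG quality (lower value = higher quality)
        str(Path(frame_dir) / pattern),   # numbered output files (assumed naming)
    ]


def extract_frames(video_path, frame_dir):
    """Create the output directory and run ffmpeg; raises if ffmpeg fails."""
    Path(frame_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, frame_dir), check=True)
```

A call such as `extract_frames("video.mp4", "frames/video")` would then write one image per decoded frame into `frames/video`.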

Updates

We now support single-crop validation (including zero-shot) on Kinetics-400, UCF101 and HMDB51. See MODEL_ZOO.md for the pretrained models.
We now support model training on Kinetics-400, UCF101 and HMDB51 with 8, 16 and 32 frames. See configs/README.md for the training configs.
We now support model training on your own datasets. See configs/README.md for details.
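To make the 8-, 16- and 32-frame settings concrete, the sketch below picks a fixed number of frame indices from a video. It assumes TSN-style segment sampling (one frame from the middle of each equal-length segment); the function name is hypothetical, and the repository's own data loader may sample differently, for example with random offsets during training.

```python
def sample_frame_indices(num_frames, num_segments):
    """Pick one 0-based frame index from the middle of each equal-length segment."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    segment_length = num_frames / num_segments
    return [
        # Clamp to the last frame so short videos never index out of range.
        min(num_frames - 1, int(segment_length * i + segment_length / 2))
        for i in range(num_segments)
    ]


print(sample_frame_indices(80, 8))  # [5, 15, 25, 35, 45, 55, 65, 75]
```

When a video has fewer frames than segments, indices repeat rather than fail, which keeps the input length fixed.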

Pretrained Models

Training video models is computationally expensive, so we provide some pretrained models here. A larger set of trained models is listed in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We experiment with ActionCLIP using different backbones and input frame configurations on Kinetics-400 (k400); we choose Transf as our final visual prompt since it obtains the best results. Below is a list of the pretrained models that we provide (see Table 6 of the paper).

model	n-frame	top1 Acc(single-crop)	top5 Acc(single-crop)	checkpoint
ViT-B/32	8	78.36%	94.25%	link pwd:8hg2
ViT-B/16	8	81.09%	95.49%	link
ViT-B/16	16	81.68%	95.87%	link
ViT-B/16	32	82.32%	96.20%	link pwd:v7nn
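For readers unfamiliar with the top-1 and top-5 columns, the following sketch shows how top-k accuracy is commonly computed from per-class scores. It is a generic illustration in plain Python, not the repository's evaluation code, and the toy scores are made up.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 3 classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = topk_accuracy(toy_scores, toy_labels, k=1)  # 1 of 3 samples correct
top2 = topk_accuracy(toy_scores, toy_labels, k=2)  # 2 of 3 samples correct
```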

HMDB51 && UCF101

On the HMDB51 and UCF101 datasets, the accuracy (k400 pretrained) is reported under the accurate setting.

HMDB51

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	76.2%	link

UCF101

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	97.1%	link

Testing

To test the downloaded pretrained models on Kinetics, HMDB51, or UCF101, you can run scripts/run_test.sh. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_ft_tem.yaml

Zero-shot

We provide several examples of zero-shot validation on Kinetics-400, UCF101 and HMDB51.

To do zero-shot validation on Kinetics from CLIP pretrained models, you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/k400/k400_ft_zero_shot.yaml

To do zero-shot validation on UCF101 and HMDB51 from Kinetics pretrained models, you first need to prepare the k400 pretrained model; then you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/hmdb51/hmdb_ft_zero_shot.yaml
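Conceptually, zero-shot classification in a CLIP-style model compares a video embedding with the text embeddings of the candidate label names and picks the closest one. The toy sketch below illustrates only that matching step with hand-written vectors; the function names and embeddings are hypothetical, and the real encoders, prompts, and similarity computation are defined by the repository code and configs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_predict(video_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the video embedding."""
    return max(
        label_embeddings,
        key=lambda name: cosine_similarity(video_embedding, label_embeddings[name]),
    )


# Hypothetical 2-D embeddings, standing in for encoder outputs.
toy_labels = {"running": [0.9, 0.1], "swimming": [0.0, 1.0]}
prediction = zero_shot_predict([1.0, 0.0], toy_labels)
```

Because the label set is just a dictionary of text embeddings, new classes can be added without retraining, which is what makes the zero-shot setting possible.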

Training

We provide several examples of training ActionCLIP with this repo:

To train on Kinetics from CLIP pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/k400/k400_ft_tem_test.yaml

To train on HMDB51 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/hmdb51/hmdb_ft.yaml

To train on UCF101 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/ucf101/ucf_ft.yaml

More training details can be found in configs/README.md.
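If you want to launch several runs in sequence, the shell commands above can be wrapped in a few lines of Python. This is only a convenience sketch: the script paths come from this README, while the helper names and the `"test"`/`"train"` mode labels are hypothetical.

```python
import subprocess

# Script paths are taken from this README; the mode names are hypothetical labels.
SCRIPTS = {"test": "scripts/run_test.sh", "train": "scripts/run_train.sh"}


def build_run_command(mode, config_path):
    """Return the shell command (as an argument list) for one run."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not config_path.endswith(".yaml"):
        raise ValueError("config_path should point to a .yaml file")
    return ["bash", SCRIPTS[mode], config_path]


def run_all(mode, config_paths):
    """Run each config in order and stop at the first failure."""
    for config_path in config_paths:
        subprocess.run(build_run_command(mode, config_path), check=True)
```

For example, `run_all("train", ["./configs/hmdb51/hmdb_ft.yaml", "./configs/ucf101/ucf_ft.yaml"])` would fine-tune on HMDB51 and then on UCF101, assuming it is started from the repository root.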

Contributors

ActionCLIP is written and maintained by Mengmeng Wang and Jiazheng Xing.

Citing ActionCLIP

If you find ActionCLIP useful in your research, please use the following BibTeX entry for citation.

@inproceedings{wang2022ActionCLIP,
  title={ActionCLIP: A New Paradigm for Video Action Recognition},
  author={Mengmeng Wang and Jiazheng Xing and Yong Liu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Acknowledgments

Our code is based on CLIP and STM.

