MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

Last update: Jan 29, 2022

Overview

We release the code of MVFNet (Multi-View Fusion Network). The core implementation of the Multi-View Fusion Module is in codes/models/modules/MVF.py.

[Mar 24, 2021] We have released the code of MVFNet.

[Dec 20, 2020] MVFNet has been accepted by AAAI 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments run on Python 3.7 and PyTorch 1.5. Other versions should work but are not tested.
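A quick way to compare the local environment with the tested versions is sketched below. The helper name version_tuple and the two constants are illustrative assumptions and are not part of the repository; the torch import is optional, so the check also runs before PyTorch is installed.

```python
import re
import sys

# Versions this README reports as tested (constants assumed for illustration).
TESTED_PYTHON = (3, 7)
TESTED_TORCH = (1, 5)


def version_tuple(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.5.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def report() -> None:
    """Print the local Python and PyTorch versions next to the tested ones."""
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} (tested: {TESTED_PYTHON[0]}.{TESTED_PYTHON[1]})")
    try:
        import torch  # optional: only needed for the PyTorch check
    except ImportError:
        print("PyTorch is not installed")
        return
    torch_major, torch_minor = version_tuple(torch.__version__)
    print(f"PyTorch {torch_major}.{torch_minor} "
          f"(tested: {TESTED_TORCH[0]}.{TESTED_TORCH[1]})")


if __name__ == "__main__":
    report()
```

A mismatch is not necessarily a problem, since other versions should work but are untested.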

Download Pretrained Models

Download ImageNet pre-trained models

cd pretrained
sh download_imgnet.sh

Download K400 pre-trained models

Please refer to Model Zoo.

Data Preparation

Please refer to DATASETS.md for data preparation.

Model Zoo

Architecture	Dataset	T x interval	Top-1 Acc.	Pre-trained model	Train log	Test log
MVFNet-ResNet50	Kinetics-400	4x16	74.2%	Download link	Log link	Log link
MVFNet-ResNet50	Kinetics-400	8x8	76.0%	Download link	Missing	Log link
MVFNet-ResNet50	Kinetics-400	16x4	77.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	4x16	76.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	8x8	77.4%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	16x4	78.4%	Download link	Log link	Log link
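The "T x interval" column can be read as follows: each clip holds T frames taken every "interval" frames, so all three settings cover the same 64-frame window (4x16 = 8x8 = 16x4 = 64). The sketch below illustrates this reading only. The function names clip_frame_indices and temporal_span and the start parameter are hypothetical, and the repository's actual sampler may differ, for example in how it picks the start frame or handles short videos.

```python
def clip_frame_indices(num_frames: int, interval: int, start: int = 0) -> list:
    """Return the frame indices of one clip: num_frames frames, `interval` apart."""
    return [start + i * interval for i in range(num_frames)]


def temporal_span(num_frames: int, interval: int) -> int:
    """Number of raw video frames covered by one clip window."""
    return num_frames * interval


# The three settings listed in the Model Zoo table.
for num_frames, interval in [(4, 16), (8, 8), (16, 4)]:
    indices = clip_frame_indices(num_frames, interval)
    print(f"{num_frames}x{interval}: span={temporal_span(num_frames, interval)}, "
          f"first indices={indices[:4]}")
```

Under this reading, a larger T samples the same window more densely, which is consistent with the accuracy increasing from 4x16 to 16x4 in the table.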

Testing

To test with 3 crops and 10 clips, run:

# Dataset: Kinetics-400
# Architecture: R50_8x8, Top-1 Acc. = 76.0%
bash scripts/dist_test_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py ckpt_path 8 --fcn_testing
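With 3 crops and 10 clips, each video is evaluated on 3 x 10 = 30 views. A common way to obtain a video-level prediction is to average the per-view class scores; the minimal sketch below shows that aggregation on toy data. It reflects the general multi-view protocol as an assumption and is not a copy of the repository's test code; the names average_views and predict are hypothetical.

```python
NUM_CROPS = 3
NUM_CLIPS = 10


def average_views(view_scores):
    """Average per-view class scores (a list of equal-length lists) per class."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(view[c] for view in view_scores) / num_views
            for c in range(num_classes)]


def predict(view_scores):
    """Return the index of the class with the highest averaged score."""
    averaged = average_views(view_scores)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Toy data: 30 views over 3 classes; 20 views favor class 1, 10 favor class 0.
views = [[0.2, 0.7, 0.1]] * 20 + [[0.6, 0.3, 0.1]] * 10
assert len(views) == NUM_CROPS * NUM_CLIPS
print("predicted class:", predict(views))
```

Averaging lets the majority of views outvote a few views whose crop or clip missed the action.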

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train MVFNet-ResNet50 on Kinetics-400 with 8 GPUs, you can run:

bash scripts/dist_train_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

We also provide scripts to train MVFNet on Kinetics-400 with multiple machines (e.g., 2 machines and 16 GPUs).

# On the first machine: --master_addr is the IP of your first machine
bash scripts/dist_train_multinode_1.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

# On the second machine: --master_addr is still the IP of your first machine
bash scripts/dist_train_multinode_2.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8
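In the two-machine example, each machine starts 8 processes (one per GPU), which gives a world size of 16. Under the usual torch.distributed convention, the global rank of a process is node_rank * processes_per_node + local_rank. The sketch below only illustrates this arithmetic; world_size and global_rank are hypothetical helper names, and the real values are presumably configured by the launch scripts.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of training processes across all machines."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Global rank of the process driving GPU `local_rank` on machine `node_rank`."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


NUM_NODES = 2      # two machines, as in the example above
GPUS_PER_NODE = 8  # the trailing "8" passed to each script

print("world size:", world_size(NUM_NODES, GPUS_PER_NODE))
for node_rank in range(NUM_NODES):
    ranks = [global_rank(node_rank, gpu, GPUS_PER_NODE)
             for gpu in range(GPUS_PER_NODE)]
    print(f"machine {node_rank}: global ranks {ranks[0]}..{ranks[-1]}")
```

Both machines must point at the same master address so that all 16 processes join one process group.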

Acknowledgements

We especially thank the contributors of the mmaction codebase for providing helpful code.

License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2020MVFNet,
  author    = {Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
  title     = {MVFNet: Multi-View Fusion Network for Efficient Video Recognition},
  booktitle = {AAAI},
  year      = {2021}
}

Contact

For any questions, please file an issue or contact:

Wenhao Wu: [email protected]
