[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Last update: Nov 27, 2022

Overview

We release the code of MVFNet (Multi-View Fusion Network). The core implementation of the Multi-View Fusion Module is in codes/models/modules/MVF.py.

[Mar 24, 2021] We have released the code of MVFNet.

[Dec 20, 2020] MVFNet has been accepted by AAAI 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments run on Python 3.7 and PyTorch 1.5. Other versions should work but are not tested.
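
Since only Python 3.7 and PyTorch 1.5 are reported as tested, it can help to check the local environment before running anything. The following is a minimal sketch, not part of the repository; the helper names `parse_major_minor` and `describe` are hypothetical and chosen only for illustration.

```python
import sys

# Versions reported in the README; other versions are untested.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = (1, 5)


def parse_major_minor(version):
    """Turn a version string such as '1.5.0+cu101' into (1, 5)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def describe(name, found, tested):
    """Return a one-line status message for a dependency."""
    status = "matches" if found == tested else "differs from"
    return "{} {}.{} {} the tested {}.{}".format(
        name, found[0], found[1], status, tested[0], tested[1]
    )


if __name__ == "__main__":
    print(describe("Python", sys.version_info[:2], TESTED_PYTHON))
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(describe("PyTorch", parse_major_minor(torch.__version__), TESTED_TORCH))
```

A mismatch is only a hint, since other versions may well work.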

Download Pretrained Models

Download ImageNet pre-trained models

cd pretrained
sh download_imgnet.sh

Download K400 pre-trained models

Please refer to Model Zoo.

Data Preparation

Please refer to DATASETS.md for data preparation.

Model Zoo

Architecture	Dataset	T x interval	Top-1 Acc.	Pre-trained model	Train log	Test log
MVFNet-ResNet50	Kinetics-400	4x16	74.2%	Download link	Log link	Log link
MVFNet-ResNet50	Kinetics-400	8x8	76.0%	Download link	Missing	Log link
MVFNet-ResNet50	Kinetics-400	16x4	77.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	4x16	76.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	8x8	77.4%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	16x4	78.4%	Download link	Log link	Log link
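
The "T x interval" column can be read as T sampled frames spaced `interval` raw frames apart, so all three settings draw from a window of T * interval = 64 raw frames. The sketch below illustrates that arithmetic under the assumption of plain evenly spaced sampling from a chosen start frame; `dense_indices` is a hypothetical helper and not the repository's sampler.

```python
def dense_indices(num_frames, interval, start=0):
    """Indices of `num_frames` frames spaced `interval` frames apart."""
    return [start + i * interval for i in range(num_frames)]


# The three settings listed in the Model Zoo table.
for num_frames, interval in [(4, 16), (8, 8), (16, 4)]:
    indices = dense_indices(num_frames, interval)
    window = num_frames * interval  # raw frames covered by one clip
    print("{}x{}: window={} frames, indices={}".format(
        num_frames, interval, window, indices))
```

Under this reading, a larger T trades more computation per clip for finer temporal resolution over the same window.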

Testing

To run testing with 3 crops and 10 clips:

# Dataset: Kinetics-400
# Architecture: R50_8x8 Top-1 Acc.=76.0%
bash scripts/dist_test_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py ckpt_path 8 --fcn_testing
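
Testing with 3 crops and 10 clips yields 30 views per video, whose predictions must be combined into one label. A common way to do this is to average the per-view class probabilities; the sketch below shows that idea in plain Python. It is an assumption about the aggregation step, not the repository's test code, and `aggregate_views` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_logits):
    """Average per-view class probabilities and return (label, mean_probs)."""
    probs = [softmax(logits) for logits in view_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    label = max(range(num_classes), key=mean.__getitem__)
    return label, mean


# Toy example: 10 clips x 3 crops = 30 views, 4 classes, class 2 favoured.
views = [[0.1, 0.2, 1.5, 0.3] for _ in range(10 * 3)]
label, mean = aggregate_views(views)
print(label, round(sum(mean), 6))
```

Averaging probabilities rather than raw logits keeps every view on the same scale before they are combined.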

Training

This implementation supports multi-GPU DistributedDataParallel training, which is faster and simpler.

For example, to train MVFNet-ResNet50 on Kinetics-400 with 8 GPUs, you can run:

bash scripts/dist_train_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

Acknowledgements

We especially thank the contributors of the mmaction codebase for providing helpful code.

License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2020MVFNet,
  author    = {Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
  title     = {MVFNet: Multi-View Fusion Network for Efficient Video Recognition},
  booktitle = {AAAI},
  year      = {2021}
}

Contact

For any question, please file an issue or contact

Wenhao Wu: [email protected]

Comments

Is this right for the test configuration?
Hi, I noticed your great work on action recognition from AAAI 2021, and I am trying to reproduce your test results on Kinetics-400. After extracting frames from all the test videos, I found that there is no annotation processing for the Kinetics-400 test set, and none in your configuration file either. Could you share the test annotations for Kinetics-400 and explain why the validation set is used for testing? https://github.com/whwu95/MVFNet/blob/ed336228ad88821ffe407a4355017acb416e4670/configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py#L58 https://github.com/whwu95/MVFNet/blob/ed336228ad88821ffe407a4355017acb416e4670/configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py#L145

ann_file_test = 'datalist/kinetics400/val_ffmpeg_fps30.txt'
...
test=dict(
    type=dataset_type,
    ann_file=ann_file_test,
    data_root=data_root_val,
    pipeline=test_pipeline,
    test_mode=True,
    modality='RGB',
    filename_tmpl='img_{:05}.jpg'
))

Thanks a lot!
opened by DanLuoNEU 2
About online recognition

Thank you for your great work. My question is that the MVF module uses convolution along the multi-view dimensions, including the T dimension. If we want to apply the model to online recognition, it is difficult to store too many history frames. So how can it be applied to online recognition? Thank you.

opened by ohheysherry66 0

Releases(v0.2)

v0.2(Mar 26, 2021)

We release the training logs, inference logs, and the models pre-trained on the Kinetics-400 dataset. You can reproduce the results in the AAAI 2021 paper using these models and configs.
Source code(tar.gz)
Source code(zip)
R101_16x4.pth(331.43 MB)
R101_16x4_P40_Train.log(2.99 MB)
R101_16x4_Test.txt(1.51 MB)
R101_4x16.pth(331.42 MB)
R101_4x16_1080Ti_Train.log(2.93 MB)
R101_4x16_Test.txt(1.51 MB)
R101_8x8.pth(331.43 MB)
R101_8x8_P40_Train.log(2.98 MB)
R101_8x8_Test.txt(1.51 MB)
R50_16x4.pth(186.02 MB)
R50_16x4_P40_Train.log(2.97 MB)
R50_16x4_Test.txt(1.51 MB)
R50_4x16.pth(186.02 MB)
R50_4x16_1080Ti_Train.log(2.16 MB)
R50_4x16_Test.txt(1.53 MB)
R50_8x8.pth(186.02 MB)
R50_8x8_Test.txt(1.51 MB)
v0.1(Mar 24, 2021)

We provide the ImageNet pre-trained models for offline experiment environments.
Source code(tar.gz)
Source code(zip)
resnet101.pth(170.43 MB)
resnet50.pth(97.74 MB)

Owner

Wenhao Wu
