AdaFocus (ICCV 2021) Adaptive Focus for Efficient Video Recognition

Related tags

Deep LearningAdaFocus
Overview

AdaFocus (ICCV 2021)

This repo contains the official code and pre-trained models for AdaFocus.

Reference

If you find our code or paper useful for your research, please cite:

@InProceedings{Wang_2021_ICCV,
author = {Wang, Yulin and Chen, Zhaoxi and Jiang, Haojun and Song, Shiji and Han, Yizeng and Huang, Gao},
title = {Adaptive Focus for Efficient Video Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}

Introduction

In this paper, we explore the spatial redundancy in video recognition with the aim to improve the computational efficiency. It is observed that the most informative region in each frame of a video is usually a small image patch, which shifts smoothly across frames. Therefore, we model the patch localization problem as a sequential decision task, and propose a reinforcement learning based approach for efficient spatially adaptive video recognition (AdaFocus). In specific, a light-weighted ConvNet is first adopted to quickly process the full video sequence, whose features are used by a recurrent policy network to localize the most task-relevant regions. Then the selected patches are inferred by a high-capacity network for the final prediction. During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices. In addition, we demonstrate that the proposed method can be easily extended by further considering the temporal redundancy, e.g., dynamically skipping less valuable frames. Extensive experiments on five benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, demonstrate that our method is significantly more efficient than the competitive baselines.

Result

  • ActivityNet

  • Something-Something V1&V2

  • Visualization

Requirements

  • python 3.8
  • pytorch 1.7.0
  • torchvision 0.8.0
  • hydra 1.1.0

Datasets

  1. Please get train/test splits file for each dataset from Google Drive and put them in PATH_TO_DATASET.
  2. Download videos from following links, or contact the corresponding authors for the access. Save them to PATH_TO_DATASET/videos
  1. Extract frames using ops/video_jpg.py, the frames will be saved to PATH_TO_DATASET/frames. Minor modifications on file path are needed when extracting frames from different dataset.

Pre-trained Models

Please download pretrained weights and checkpoints from Google Drive.

  • globalcnn.pth.tar: pretrained weights for global CNN (MobileNet-v2).
  • localcnn.pth.tar: pretrained weights for local CNN (ResNet-50).
  • 128checkpoint.pth.tar: checkpoint of stage 1 for patch size 128x128.
  • 160checkpoint.pth.tar: checkpoint of stage 1 for patch size 160x128.
  • 192checkpoint.pth.tar: checkpoint of stage 1 for patch size 192x128.

Training

  • Here we take training model with patch size 128x128 on ActivityNet dataset for example.

  • All logs and checkpoints will be saved in the directory: ./outputs/YYYY-MM-DD/HH-MM-SS

  • Note that we store a set of default paramenter in conf/default.yaml which can override through command line. You can also use your own config files.

  • Before training, please initialize Global CNN and Local CNN by fine-tuning the ImageNet pre-trained models in Pytorch using the following command:

for Global CNN:

CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=0 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.01 epochs=15 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrain_glancer=true

for Local CNN:

CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=0 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.01 epochs=15 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrain_glancer=false
  • Training stage 1, pretrained weights for Global CNN and Local CNN are required:
CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=1 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.05 epochs=50 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrained_glancer=PATH_TO_CHECKPOINTS pretrained_focuser=PATH_TO_CHECKPOINTS
  • Training stage 2, a stage-1 checkpoint is required:
CUDA_VISIBLE_DEVICES=0 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=2 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.05 epochs=50 random_patch=false patch_size=128 glance_size=224 action_dim=49 eval_freq=5 consensus=gru hidden_dim=1024 resume=PATH_TO_CHECKPOINTS multiprocessing_distributed=false distributed=false
  • Training stage 3, a stage-2 checkpoint is required:
CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=3 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.005 epochs=10 random_patch=false patch_size=128 glance_size=224 action_dim=49 eval_freq=5 consensus=gru hidden_dim=1024 resume=PATH_TO_CHECKPOINTS multiprocessing_distributed=false distributed=false

Contact

If you have any question, feel free to contact the authors or raise an issue. Yulin Wang: [email protected].

Acknowledgement

We use implementation of MobileNet-v2 and ResNet from Pytorch source code. We also borrow some codes for dataset preparation from AR-Net and PPO from here.

Owner
Rainforest Wang
Rainforest Wang
Dataset and codebase for NeurIPS 2021 paper: Exploring Forensic Dental Identification with Deep Learning

Repository under construction. Example dataset, checkpoints, and training/testing scripts will be avaible soon! 💡 Collated best practices from most p

4 Jun 26, 2022
Network Compression via Central Filter

Network Compression via Central Filter Environments The code has been tested in the following environments: Python 3.8 PyTorch 1.8.1 cuda 10.2 torchsu

2 May 12, 2022
PyTorch implementation of the wavelet analysis from Torrence & Compo

Continuous Wavelet Transforms in PyTorch This is a PyTorch implementation for the wavelet analysis outlined in Torrence and Compo (BAMS, 1998). The co

Tom Runia 262 Dec 21, 2022
A toy compiler that can convert Python scripts to pickle bytecode 🥒

Pickora 🐰 A small compiler that can convert Python scripts to pickle bytecode. Requirements Python 3.8+ No third-party modules are required. Usage us

ꌗᖘ꒒ꀤ꓄꒒ꀤꈤꍟ 68 Jan 04, 2023
A tool to visualise the results of AlphaFold2 and inspect the quality of structural predictions

AlphaFold Analyser This program produces high quality visualisations of predicted structures produced by AlphaFold. These visualisations allow the use

Oliver Powell 3 Nov 13, 2022
A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

bbc-speech-segmenter: Voice Activity Detection & Speaker Diarization A complete speech segmentation system using Kaldi and x-vectors for voice activit

BBC 16 Oct 27, 2022
A fast python implementation of Ray Tracing in One Weekend using python and Taichi

ray-tracing-one-weekend-taichi A fast python implementation of Ray Tracing in One Weekend using python and Taichi. Taichi is a simple "Domain specific

157 Dec 26, 2022
An expansion for RDKit to read all types of files in one line

RDMolReader An expansion for RDKit to read all types of files in one line How to use? Add this single .py file to your project and import MolFromFile(

Ali Khodabandehlou 1 Dec 18, 2021
PyTorch implementation of a Real-ESRGAN model trained on custom dataset

Real-ESRGAN PyTorch implementation of a Real-ESRGAN model trained on custom dataset. This model shows better results on faces compared to the original

Sber AI 160 Jan 04, 2023
Fast and Easy Infinite Neural Networks in Python

Neural Tangents ICLR 2020 Video | Paper | Quickstart | Install guide | Reference docs | Release notes Overview Neural Tangents is a high-level neural

Google 1.9k Jan 09, 2023
Multiview 3D object detection on MultiviewC dataset through moft3d.

Multiview Orthographic Feature Transformation for 3D Object Detection Multiview 3D object detection on MultiviewC dataset through moft3d. Introduction

Jiahao Ma 20 Dec 21, 2022
A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

IconQA About IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and c

Pan Lu 24 Dec 30, 2022
Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Kushal Shingote 2 Feb 10, 2022
使用yolov5训练自己数据集(详细过程)并通过flask部署

使用yolov5训练自己的数据集(详细过程)并通过flask部署 依赖库 torch torchvision numpy opencv-python lxml tqdm flask pillow tensorboard matplotlib pycocotools Windows,请使用 pycoc

HB.com 19 Dec 28, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks This repo contains the PyTorch implementation of the ACL, 2021 pa

Rabeeh Karimi Mahabadi 98 Dec 28, 2022
Fully Automatic Page Turning on Real Scores

Fully Automatic Page Turning on Real Scores This repository contains the corresponding code for our extended abstract Henkel F., Schwaiger S. and Widm

Florian Henkel 7 Jan 02, 2022
DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

DPT This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and model

CASIA-IVA-Lab 111 Dec 21, 2022
Code for NeurIPS 2021 paper: Invariant Causal Imitation Learning for Generalizable Policies

Invariant Causal Imitation Learning for Generalizable Policies Ioana Bica, Daniel Jarrett, Mihaela van der Schaar Neural Information Processing System

Ioana Bica 17 Dec 01, 2022
Pytorch implementation of AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

AngularGrad Optimizer This repository contains the oficial implementation for AngularGrad: A New Optimization Technique for Angular Convergence of Con

mario 124 Sep 16, 2022