PyTorch implementation of the R2Plus1D convolution-based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

Last update: Dec 16, 2022

Overview

R2Plus1D-PyTorch

PyTorch implementation of the R2Plus1D convolution-based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

Link to original: paper and code

NOTE: This repository has been archived, although forks and other work that build on it remain welcome.

Requirements

R2Plus1D-PyTorch has the following requirements:

PyTorch 0.4 and dependencies
OpenCV (tested on 3.4.0.12)
tqdm (for progress bars)

About this repository

This repository consists of four Python files:

module.py - Implements the factored R2Plus1D convolution that the rest of the code is built around. It is designed as a replacement for nn.Conv3d in appropriate scenarios.
network.py - Uses module.py to build up the residual network described in the paper
dataset.py - Implements a PyTorch dataset that loads videos, with their labels, from a given directory.
trainer.py - A mildly modified version of the script from the PyTorch tutorials to train the model. Features saving and restoring capabilities.
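
As a rough illustration of what "factored" means here, the following is a minimal sketch of an R(2+1)D-style block: a t x d x d 3D convolution split into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The class name R2Plus1dConv and its arguments are hypothetical and are not taken from module.py; the intermediate channel count follows the parameter-matching formula given in the paper.

```python
import math

import torch
import torch.nn as nn


class R2Plus1dConv(nn.Module):
    """Hypothetical sketch, not the code from module.py.

    Factors a t x d x d convolution into a 1 x d x d spatial convolution
    followed by a t x 1 x 1 temporal convolution.
    """

    def __init__(self, in_channels, out_channels, kernel_size=(3, 3, 3),
                 stride=(1, 1, 1), padding=(1, 1, 1)):
        super().__init__()
        t, d, _ = kernel_size
        # Intermediate width chosen so that the factored block has roughly
        # as many parameters as the full t x d x d convolution (per the paper).
        mid_channels = math.floor(
            (t * d * d * in_channels * out_channels)
            / (d * d * in_channels + t * out_channels)
        )
        # Spatial convolution: acts on height and width only.
        self.spatial = nn.Conv3d(
            in_channels, mid_channels, kernel_size=(1, d, d),
            stride=(1, stride[1], stride[2]),
            padding=(0, padding[1], padding[2]), bias=False,
        )
        self.bn = nn.BatchNorm3d(mid_channels)
        self.relu = nn.ReLU(inplace=True)
        # Temporal convolution: acts on the frame axis only.
        self.temporal = nn.Conv3d(
            mid_channels, out_channels, kernel_size=(t, 1, 1),
            stride=(stride[0], 1, 1),
            padding=(padding[0], 0, 0), bias=False,
        )

    def forward(self, x):
        # x has shape (batch, channels, frames, height, width).
        x = self.relu(self.bn(self.spatial(x)))
        return self.temporal(x)


if __name__ == "__main__":
    clip = torch.randn(2, 3, 8, 32, 32)
    print(R2Plus1dConv(3, 64)(clip).shape)
```

The extra nonlinearity between the two convolutions is one of the motivations the paper gives for the factorization; the normalization and activation choices above are assumptions made for this sketch.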

Training on Kinetics-400/600

This repository does not include a crawler or downloader for the Kinetics-400/600 dataset; however, one can be found here. It is strongly recommended to downsample the videos before training, rather than on the fly, using a tool such as ffmpeg. If using the crawler, this can be done by adding "-vf", "scale=172:128" to the ffmpeg command list in the download clip function.
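
For videos that are already on disk, the same scaling filter can be applied offline. The helper below is a hypothetical sketch (build_downsample_command is not part of this repository or of the crawler) that only builds the ffmpeg argument list; it assumes ffmpeg is installed and on the PATH when the command is actually run.

```python
def build_downsample_command(src, dst, width=172, height=128):
    """Return an ffmpeg argument list that rescales src and writes it to dst.

    Hypothetical helper for illustration; the default size matches the
    "scale=172:128" filter mentioned above.
    """
    return [
        "ffmpeg",
        "-i", str(src),                      # input video
        "-vf", f"scale={width}:{height}",    # video filter: resize frames
        str(dst),                            # output video
    ]


if __name__ == "__main__":
    # To execute the command, pass the list to subprocess.run(cmd, check=True).
    print(build_downsample_command("input.mp4", "output.mp4"))
```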

Training in general

This repository is designed so that the ResNet can be trained on any video dataset, using the VideoDataloader class from dataset.py. It expects the videos to be arranged as: directory -> [train/val] folders -> [class_label] folders (one for each class) -> videos (the files themselves).
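
To make the expected layout concrete, here is a minimal sketch that walks such a directory and pairs each video with an integer class label. The function list_videos, the alphabetical label ordering, and the file extensions are assumptions made for illustration; they are not taken from dataset.py.

```python
from pathlib import Path


def list_videos(root, split="train", extensions=(".mp4", ".avi")):
    """Collect (video_path, class_index) pairs from root/split/class_label/.

    Hypothetical helper: class indices are assigned in alphabetical order
    of the class folder names.
    """
    split_dir = Path(root) / split
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for video in sorted((split_dir / name).iterdir()):
            # Skip anything that does not look like a video file.
            if video.suffix.lower() in extensions:
                samples.append((video, class_to_index[name]))
    return samples, class_to_index
```

Calling list_videos("data", "val") would then scan data/val/ in the same way, one subfolder per class.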

Forks and fixes of this repo are highly welcome!

Owner

Irhum Shafkat
