Implementation of TimeSformer, a pure attention-based solution for video classification

Overview

TimeSformer - Pytorch

Implementation of TimeSformer, a pure and simple attention-based solution for reaching SOTA on video classification. This repository will only house the best performing variant, 'Divided Space-Time Attention', which is nothing more than attention along the time axis before the spatial.

Install

$ pip install timesformer-pytorch

Usage

import torch
from timesformer_pytorch import TimeSformer

model = TimeSformer(
    dim = 512,
    image_size = 224,
    patch_size = 16,
    num_frames = 8,
    num_classes = 10,
    depth = 12,
    heads = 8,
    dim_head =  64,
    attn_dropout = 0.1,
    ff_dropout = 0.1
)

video = torch.randn(2, 8, 3, 224, 224) # (batch x frames x channels x height x width)
pred = model(video) # (2, 10)

Citations

@misc{bertasius2021spacetime,
    title   = {Is Space-Time Attention All You Need for Video Understanding?}, 
    author  = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
    year    = {2021},
    eprint  = {2102.05095},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Comments
  • How to deal with varying length video? Thanks

    How to deal with varying length video? Thanks

    Dear all, I am wondering if TimeSformer can handle different videos with diverse lengths? Is it possible to use mask as the original Transformer? Any ideas, thanks a lot.

    opened by junyongyou 2
  • fix runtime error in SpaceTime Attention

    fix runtime error in SpaceTime Attention

    There is a shape mismatch error in Attention. When we splice out the classification token from the first token of each sequence in q, k and v, the shape becomes (batch_size * num_heads, num_frames * num_patches - 1, head_dim). Then we try to reshape the tensor by taking out a factor of num_frames or num_patches (depending on whether it is space or time attention) from dimension 1. That doesn't work because we subtracted out the classification token.

    I found that performing the rearrange operation before splicing the token fixes the issue.

    I recreate the problem and illustrate the solution in this notebook: https://colab.research.google.com/drive/1lHFcn_vgSDJNSqxHy7rtqhMVxe0nUCMS?usp=sharing.

    By the way, thank you to @lucidrains; all of your implementations on attention-based models are helping me more than you know.

    opened by adam-mehdi 1
  • Update timesformer_pytorch.py

    Update timesformer_pytorch.py

    fixing issue for scaling

    File "/home/aarti9/.local/lib/python3.6/site-packages/timesformer_pytorch/timesformer_pytorch.py", line 82, in forward q *= self.scale

    RuntimeError: Output 0 of ViewBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views is forbidden. You should replace the inplace operation by an out-of-place one.

    opened by aarti9 0
  • Fine-tune with new datasets

    Fine-tune with new datasets

    Thank you so much for your great effort. I can predict the images using the given .py files. But, I couldn't find train.py files, so how to fine-tune the network with new datasets? where should i define the image samples of the new dataset ?

    opened by Jeba-create 0
  • problem in timesformer_pytorch.py

    problem in timesformer_pytorch.py

    start from line 182 video = rearrange(video, 'b f c (h p1) (w p2) -> b (f h w) (p1 p2 c)', p1 = p, p2 = p) i think this should be video = rearrange(video, 'b f c (hp p1) (wp p2) -> b (f hp wp) (p1 p2 c)', p1 = p, p2 = p)

    opened by Weizhongjin 2
  • Imagenet Pretrained Weights

    Imagenet Pretrained Weights

    Thanks for the work! In their paper they say For all our experiments, we adopt the “Base” ViT model architecture (Dosovitskiy et al., 2020) pretrained on ImageNet.

    I know that you said the official weights trained on kinetics and such are not officially released yet. However, I am not interested in those but am actually in need of the initial weights of the network just based on ViT Imagenet pretraining. I need to train this implementation of yours starting from those. From what it looks like, you don't have weights for this implementation that come from imagenet pretraining, do you?

    opened by RaivoKoot 5
Owner
Phil Wang
Working with Attention. It's all we need.
Phil Wang
pytorch implementation for PointNet

PointNet.pytorch This repo is implementation for PointNet in pytorch. The model is in pointnet/model.py. It is teste

Fei Xia 1.7k Dec 30, 2022
Implement face detection, and age and gender classification, and emotion classification.

YOLO Keras Face Detection Implement Face detection, and Age and Gender Classification, and Emotion Classification. (image from wider face dataset) Ove

Chloe 10 Nov 14, 2022
Implementation of the HMAX model of vision in PyTorch

PyTorch implementation of HMAX PyTorch implementation of the HMAX model that closely follows that of the MATLAB implementation of The Laboratory for C

Marijn van Vliet 52 Oct 13, 2022
UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down. UpChecker - just run file and use project easy

UpChecker UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down.

Yan 4 Apr 07, 2022
Flow is a computational framework for deep RL and control experiments for traffic microsimulation.

Flow Flow is a computational framework for deep RL and control experiments for traffic microsimulation. See our website for more information on the ap

867 Jan 02, 2023
So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer        Introduction This repository contains the source code under PyTorch framework and models trai

Jiangtao Xie 44 Nov 24, 2022
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

CorrNet This project provides the code and results for 'Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation'

Gongyang Li 13 Nov 03, 2022
Open-sourcing the Slates Dataset for recommender systems research

FINN.no Recommender Systems Slate Dataset This repository accompany the paper "Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sa

FINN.no 48 Nov 28, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
Combinatorially Hard Games where the levels are procedurally generated

puzzlegen Implementation of two procedurally simulated environments with gym interfaces. IceSlider: the agent needs to reach and stop on the pink squa

Autonomous Learning Group 3 Jun 26, 2022
Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.

Official PyTorch implementation of the paper: "Self-Supervised Relational Reasoning for Representation Learning" (2020), Patacchiola, M., and Storkey,

Massimiliano Patacchiola 135 Jan 03, 2023
ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

ROCKET + MINIROCKET ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge D

298 Dec 26, 2022
Install alphafold on the local machine, get out of docker.

AlphaFold This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP

Kui Xu 73 Dec 13, 2022
PyTorch implementation of the wavelet analysis from Torrence & Compo

Continuous Wavelet Transforms in PyTorch This is a PyTorch implementation for the wavelet analysis outlined in Torrence and Compo (BAMS, 1998). The co

Tom Runia 262 Dec 21, 2022
Revisting Open World Object Detection

Revisting Open World Object Detection Installation See INSTALL.md. Dataset Our n

58 Dec 23, 2022
This is a yolo3 implemented via tensorflow 2.7

YoloV3 - an object detection algorithm implemented via TF 2.x source code In this article I assume you've already familiar with basic computer vision

2 Jan 17, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 02, 2023
A full pipeline AutoML tool for tabular data

HyperGBM Doc | 中文 We Are Hiring! Dear folks,we are offering challenging opportunities located in Beijing for both professionals and students who are k

DataCanvas 240 Jan 03, 2023
NNR conformation conditional and global probabilities estimation and analysis in peptides or proteins fragments

NNR and global probabilities estimation and analysis in peptides or protein fragments This module calculates global and NNR conformation dependent pro

0 Jul 15, 2021
DeepRec is a recommendation engine based on TensorFlow.

DeepRec Introduction DeepRec is a recommendation engine based on TensorFlow 1.15, Intel-TensorFlow and NVIDIA-TensorFlow. Background Sparse model is a

Alibaba 676 Jan 03, 2023