Video Swin Transformer - PyTorch

Last update: Dec 20, 2022

Overview

Video-Swin-Transformer-Pytorch

This repo is a simple usage example of the official implementation of "Video Swin Transformer".

Introduction

Video Swin Transformer was first described in the paper "Video Swin Transformer", which advocates an inductive bias of locality in video Transformers. This leads to a better speed-accuracy trade-off than previous approaches, which compute self-attention globally even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models. The paper reports state-of-the-art accuracy on a broad range of video recognition benchmarks, including action recognition (84.9 top-1 accuracy on Kinetics-400 and 86.1 top-1 accuracy on Kinetics-600 with ~20x less pre-training data and ~3x smaller model size) and temporal modeling (69.6 top-1 accuracy on Something-Something v2).
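To give a rough feel for why locality helps, the sketch below counts query-key pairs in one attention layer for global versus windowed attention. It is only back-of-the-envelope arithmetic, not the official implementation: the function name `attention_pairs` is made up for illustration, the patch size (2, 4, 4) and window size (16, 7, 7) are assumptions read off the checkpoint name used later in this README, and padding and shifted windows are ignored.

```python
import math

def attention_pairs(video_shape, patch_size, window_size=None):
    """Count query-key pairs for a single self-attention layer.

    video_shape: (frames, height, width) of the input clip.
    patch_size:  (pd, ph, pw), the size of one 3D patch (one token).
    window_size: (wd, wh, ww) local window measured in tokens,
                 or None for global attention over all tokens.
    """
    # Number of tokens along each axis after patch partitioning.
    grid = [math.ceil(s / p) for s, p in zip(video_shape, patch_size)]
    num_tokens = math.prod(grid)
    if window_size is None:
        # Global attention: every token attends to every token.
        return num_tokens * num_tokens
    # Windowed attention: every token attends only within its window.
    # Assumption: the window is clamped to the token grid, padding is ignored.
    window = [min(w, g) for w, g in zip(window_size, grid)]
    return num_tokens * math.prod(window)

clip = (32, 224, 224)   # frames, height, width (same as the dummy input below)
patch = (2, 4, 4)       # assumed from "patch244" in the checkpoint name
window = (16, 7, 7)     # assumed from "window1677" in the checkpoint name

global_pairs = attention_pairs(clip, patch)
local_pairs = attention_pairs(clip, patch, window)
print(global_pairs, local_pairs, global_pairs // local_pairs)
```

Under these assumptions the windowed variant needs 64 times fewer query-key pairs at the first stage, which is the kind of saving the locality argument relies on.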

Usage

Installation

pip install -r requirements.txt

If this does not work, please refer to the official install.md for installation.

Prepare

git clone https://github.com/haofanwang/video-swin-transformer-pytorch.git

cd video-swin-transformer-pytorch
mkdir checkpoints && cd checkpoints
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth
cd ..

If you want to try a different model, refer to the official Video-Swin-Transformer repository, download the corresponding pretrained weights, and then modify the config and the pretrained weights you load to match.
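When loading a downloaded checkpoint into a bare backbone, the parameter names usually have to match. The following is a minimal sketch of one common approach, assuming the checkpoint stores backbone weights under a `backbone.` key prefix; that prefix, the helper name `extract_backbone_weights`, and the toy keys are illustrative assumptions, not something this README confirms.

```python
from collections import OrderedDict

def extract_backbone_weights(state_dict, prefix="backbone."):
    """Return only the entries whose key starts with `prefix`, prefix removed."""
    stripped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            stripped[key[len(prefix):]] = value
    return stripped

# Toy stand-in for a checkpoint's weights; real values would be tensors
# and real key names may differ (these are hypothetical).
toy_state_dict = {
    "backbone.patch_embed.proj.weight": "w0",
    "backbone.norm.weight": "w1",
    "cls_head.fc_cls.weight": "w2",
}

backbone_only = extract_backbone_weights(toy_state_dict)
print(list(backbone_only))
```

With a real checkpoint, the input would typically come from `torch.load(path, map_location="cpu")`, possibly under a `"state_dict"` entry, and the result would be passed to the model's `load_state_dict`. Inspect the keys of the file you downloaded before relying on either assumption.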

Inference

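import torch
from video_swin_transformer import SwinTransformer3D

# Build the model with its default arguments and switch to inference mode.
model = SwinTransformer3D()
model.eval()
print(model)

# Dummy clip with layout (batch, channels, frames, height, width).
dummy_x = torch.rand(1, 3, 32, 224, 224)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(dummy_x)
print(logits.shape)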

To run the example script:

python example.py
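As a sanity check on the printed shape, the expected size of the backbone's output feature map can be estimated by hand. The sketch below is an approximation under stated assumptions rather than the repository's code: it assumes 3D patch partitioning followed by patch merging that halves height and width (not time) and doubles channels between stages, and the helper name `backbone_output_shape` and its default values (patch size (2, 4, 4), embedding dimension 96, four stages) are hypothetical. Check them against the arguments your model is actually built with.

```python
import math

def backbone_output_shape(input_shape, patch_size=(2, 4, 4),
                          embed_dim=96, num_stages=4):
    """Estimate the (N, C, D, H, W) shape of the final feature map.

    input_shape: (batch, channels, frames, height, width) of the clip.
    Assumes inputs are padded up to a multiple of the patch size and that
    each of the (num_stages - 1) merging steps halves H and W only.
    """
    batch, _, frames, height, width = input_shape
    pd, ph, pw = patch_size
    # Patch partitioning: one token per (pd x ph x pw) block.
    d = math.ceil(frames / pd)
    h = math.ceil(height / ph)
    w = math.ceil(width / pw)
    channels = embed_dim
    # Patch merging between stages: 2x spatial downsampling, 2x channels.
    for _ in range(num_stages - 1):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        channels *= 2
    return (batch, channels, d, h, w)

print(backbone_output_shape((1, 3, 32, 224, 224)))
print(backbone_output_shape((1, 3, 32, 224, 224), embed_dim=128))
```

If the shape printed by the model differs from this estimate, the most likely cause is a different patch size or embedding dimension in the model's constructor defaults.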

Acknowledgement

The code is adapted from the official Video-Swin-Transformer repository. This project is inspired by swin-transformer-pytorch, which provides the simplest code to get started.

Citation

If you find our work useful in your research, please cite:

@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Owner

Haofan Wang
