A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Last update: Sep 20, 2022

Overview

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

A PyTorch implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1-2]. Unlike another unofficial PyTorch implementation [3], this version borrows heavily from the official implementation [4] and the TensorFlow implementation [5], and tries to stay consistent with them.

Usage

You can import the TokenLearner and TokenLearnerModuleV11 classes from the tokenlearner module. Either layer can be used with a Vision Transformer, MLP-Mixer, or Video Vision Transformer, as done in the paper.

import torch
from tokenlearner import TokenLearner

tklr = TokenLearner(in_channels=128, num_tokens=8, use_sum_pooling=False)

x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y1 = tklr(x)
print(y1.shape)  # [256, 8, 128]
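
To make the shape change above concrete, here is a simplified sketch of the idea behind the layer: predict one spatial attention map per output token, then pool the feature map with each of them. SketchTokenLearner is a hypothetical name used only for illustration. It assumes a single linear scoring layer and a softmax over spatial positions, and it is not the code of the TokenLearner class in this repository, which may differ in its layers and normalization.

```python
import torch
import torch.nn as nn


class SketchTokenLearner(nn.Module):
    """Simplified illustration only; not the implementation in this repository."""

    def __init__(self, in_channels: int, num_tokens: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(in_channels)
        # One score per spatial position for each of the num_tokens output tokens.
        self.score = nn.Linear(in_channels, num_tokens)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bs, h, w, c = x.shape
        feat = x.reshape(bs, h * w, c)          # [bs, h*w, c]
        maps = self.score(self.norm(feat))      # [bs, h*w, num_tokens]
        maps = maps.softmax(dim=1)              # normalize over spatial positions
        # Each output token is a weighted sum over all spatial positions.
        return torch.einsum("bns,bnc->bsc", maps, feat)  # [bs, num_tokens, c]


sketch = SketchTokenLearner(in_channels=128, num_tokens=8)
tokens = sketch(torch.randn(4, 32, 32, 128))
print(tokens.shape)  # torch.Size([4, 8, 128])
```

In this sketch the 32 x 32 = 1024 spatial positions are reduced to 8 tokens, so any transformer layers that follow operate on a much shorter sequence.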

You can also use TokenLearnerModuleV11, which aligns with the official implementation.

import torch
from tokenlearner import TokenLearnerModuleV11

tklr_v11 = TokenLearnerModuleV11(in_channels=128, num_tokens=8, num_groups=4, dropout_rate=0.)

tklr_v11.eval()  # eval mode disables dropout
x = torch.ones(256, 32, 32, 128)   # [bs, h, w, c]
y2 = tklr_v11(x)
print(y2.shape)  # [256, 8, 128]
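
As a rough illustration of using such a layer inside a Vision Transformer style encoder, the sketch below inserts a token-reducing module partway through a small stack of standard torch.nn.TransformerEncoderLayer blocks. The names SpatialTokenReducer and TinyEncoderWithReducer, the insertion depth, and the grid size are all assumptions made for this example. SpatialTokenReducer is a stand-in that follows the same [bs, h, w, c] to [bs, num_tokens, c] contract shown in the examples above; in practice one would presumably pass TokenLearner or TokenLearnerModuleV11 in its place.

```python
import torch
import torch.nn as nn


class SpatialTokenReducer(nn.Module):
    """Hypothetical stand-in: [bs, h, w, c] -> [bs, num_tokens, c]."""

    def __init__(self, in_channels: int, num_tokens: int):
        super().__init__()
        self.score = nn.Linear(in_channels, num_tokens)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bs, h, w, c = x.shape
        feat = x.reshape(bs, h * w, c)
        weights = self.score(feat).softmax(dim=1)  # softmax over positions
        return torch.einsum("bns,bnc->bsc", weights, feat)


class TinyEncoderWithReducer(nn.Module):
    """Illustrative encoder: reduces the token count before layer `reduce_at`."""

    def __init__(self, dim=128, depth=4, reduce_at=2, grid=(8, 8), num_tokens=8):
        super().__init__()
        self.grid = grid
        self.reduce_at = reduce_at
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=256, batch_first=True
            )
            for _ in range(depth)
        ])
        self.reducer = SpatialTokenReducer(dim, num_tokens)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            if i == self.reduce_at:
                bs, n, c = tokens.shape
                h, w = self.grid
                # Restore the patch grid so the reducer sees [bs, h, w, c].
                tokens = self.reducer(tokens.reshape(bs, h, w, c))
            tokens = layer(tokens)
        return tokens


model = TinyEncoderWithReducer().eval()  # eval mode disables dropout
patches = torch.randn(2, 64, 128)        # [bs, h*w, c] patch embeddings
with torch.no_grad():
    out = model(patches)
print(out.shape)  # torch.Size([2, 8, 128])
```

The first two layers here process all 64 patch tokens, and the remaining layers process only 8, which is where the compute saving would come from. The example omits a class token and position embeddings for brevity.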

References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al.; NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88

[3] Unofficial PyTorch implementation

[4] official implementation

[5] TensorFlow implementation

Owner

Caiyong Wang
