PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers.

Last update: Oct 09, 2022

Overview

Dynamic Token Normalization Improves Vision Transformers

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers. Code and models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages of LayerNorm and InstanceNorm. DTN can be plugged into various transformer models and consistently improves their performance.
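To make the two ingredients concrete, the sketch below contrasts the statistics used by LayerNorm (each token normalized over its own channels) and InstanceNorm (each channel normalized over all tokens) on a tiny token matrix. It is an illustration in plain Python only, not the DTN layer from the paper or this repository: the function names, the EPS value, and the omission of learnable scale and shift are all assumptions made for brevity.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # stability constant; the exact value is an assumption


def normalize(values):
    """Shift and scale a list of floats to zero mean and (near) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


def layer_norm(tokens):
    """LayerNorm-style: statistics are computed inside each token (row)."""
    return [normalize(token) for token in tokens]


def instance_norm(tokens):
    """InstanceNorm-style: statistics are computed across tokens, per channel (column)."""
    columns = [normalize(list(column)) for column in zip(*tokens)]
    return [list(row) for row in zip(*columns)]


# Two tokens with three channels each (hypothetical values).
tokens = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
ln_out = layer_norm(tokens)
in_out = instance_norm(tokens)
```

After layer_norm every row has zero mean, so differences in overall magnitude between tokens are removed; after instance_norm every column has zero mean, so differences between tokens within a channel are what remains.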

Comparison of top-1 and top-5 accuracy (%) on the ImageNet validation set for ViT trained with LN and with DTN.

Model	Top-1	Top-5
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1
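The gain from switching LN to DTN can be read off the table directly. The short snippet below does the subtraction for each model size; the numbers are copied from the table, and the RESULTS layout and gains helper are just names chosen for this illustration.

```python
# (top-1, top-5) accuracy in percent, copied from the table above.
RESULTS = {
    "ViT-T*": {"LN": (72.3, 91.4), "DTN": (73.2, 91.7)},
    "ViT-S*": {"LN": (80.6, 95.2), "DTN": (81.7, 95.8)},
    "ViT-B*": {"LN": (81.7, 95.8), "DTN": (82.5, 96.1)},
}


def gains(results):
    """Return the DTN minus LN difference per model as (top-1, top-5)."""
    out = {}
    for model, rows in results.items():
        ln, dtn = rows["LN"], rows["DTN"]
        # Round to one decimal to hide floating-point noise from the subtraction.
        out[model] = tuple(round(d - l, 1) for l, d in zip(ln, dtn))
    return out


for model, (top1, top5) in gains(RESULTS).items():
    print(f"{model}: +{top1:.1f} top-1, +{top5:.1f} top-5")
```

For these numbers the top-1 gain is 0.9, 1.1 and 0.8 points for ViT-T*, ViT-S* and ViT-B* respectively.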

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cudnn7 following the official installation instructions
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2
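Before training, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch, assuming Python 3.8+ and that the packages are registered under the distribution names torch, torchvision and timm; check_versions is a hypothetical helper and is not part of this repository.

```python
from importlib import metadata

# Versions pinned in the Requirements section.
REQUIRED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}


def check_versions(required=REQUIRED, lookup=metadata.version):
    """Return {package: found_version_or_None} for every mismatch."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Some builds append a local tag such as "+cu101"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    mismatches = check_versions()
    print("environment OK" if not mismatches else f"mismatches: {mismatches}")
```

An empty result means all three packages are installed at the expected versions.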

Data Preparation

Download the ImageNet dataset, which should contain train and val directories and a txt file giving the correspondence between images and labels.
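The exact format of the txt file is not specified here. As a sketch only, the helpers below assume one sample per line in the form "<relative image path> <integer label>" and check that each listed image exists under the given split directory; the line format, the function names, and the example paths are all assumptions to adapt to the actual files.

```python
from pathlib import Path


def parse_label_lines(lines):
    """Parse lines of the assumed form '<relative image path> <integer label>'."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rel_path, label = line.rsplit(maxsplit=1)
        samples.append((rel_path, int(label)))
    return samples


def missing_images(root, split, samples):
    """Return the relative paths that are not files under root/split."""
    split_dir = Path(root) / split
    return [rel for rel, _ in samples if not (split_dir / rel).is_file()]


# Hypothetical entries, only to show the call pattern.
example = parse_label_lines(["n01440764/img_0001.JPEG 0", "", "n01443537/img_0002.JPEG 1"])
```

Calling missing_images with the dataset root, "train" or "val", and the parsed samples returns an empty list when the directory layout matches the label file.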

Training a model from scratch

An example script for training DTN is given in DTN/scripts/train.sh. To train ViT-S* with DTN:

cd DTN/scripts
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

The number of GPUs and the configuration file can be changed in train.sh.

Owner

Wenqi Shao
