Dynamic Token Normalization Improves Vision Transformers

Last update: Oct 09, 2022

Related tags

Overview

Dynamic Token Normalization Improves Vision Transformers

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transfromers. Codea and Models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages from LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models, consistenly improving the performance.

Comparisons of top-1 accuracies on the validation set of ImageNet, by using ViT trained with LN and DTN.

Model	Top-1	Top-5
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cudnn7 following the official installation instructions
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2

Data Preparation

Download the ImageNet dataset which should contain train and val directionary and the txt file for correspondings between images and labels.

Training a model from scratch

An example to train our DTN is given in DTN/scripts/train.sh. To train ViT-S* with our DTN,

cd DTN/scripts   
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

Number of GPUs and configuration file to use can be modified in train.sh

Dynamic Token Normalization Improves Vision Transformers

Related tags

Overview

Dynamic Token Normalization Improves Vision Transformers

Dynamic Token Normalization

Getting Started

Requirements

Data Preparation

Training a model from scratch

Owner

Wenqi Shao

Pytorch Implementation of Various Point Transformers

CellRank's reproducibility repository.

A minimalist implementation of score-based diffusion model

Keras implementations of Generative Adversarial Networks.

Repo for parser tensorflow(.pb) and tflite(.tflite)

Supplementary code for the experiments described in the 2021 ISMIR submission: Leveraging Hierarchical Structures for Few Shot Musical Instrument Recognition.

PyTorch implementation of Super SloMo by Jiang et al.

An Open Source Machine Learning Framework for Everyone

A Unified Generative Framework for Various NER Subtasks.

Official Pytorch implementation of ICLR 2018 paper Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge.

For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

[ICCV 2021] Group-aware Contrastive Regression for Action Quality Assessment

Proto-RL: Reinforcement Learning with Prototypical Representations

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

v objective diffusion inference code for JAX.

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022)

PyTorch and Tensorflow functional model definitions

Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous Event-Based Data"

Knowledge Distillation Toolbox for Semantic Segmentation