The official PyTorch implementation of ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Last update: Nov 27, 2022

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. The ViTAE model is built from several reduction cells and normal cells, which introduce scale invariance and locality into vision transformers.

Updates

07/12/2021 The code is released!

19/10/2021 The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021 The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Usage

Install

Clone this repo:

git clone https://github.com/Annbless/ViTAE.git
cd ViTAE

Create a conda virtual environment, activate it, and install PyTorch and torchvision:

conda create -n vitae python=3.7 -y
conda activate vitae

conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch -c conda-forge

Install timm==0.3.4:

pip install timm==0.3.4

Install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
git reset --hard a651e2c24ecf97cbf367fd3f330df36760e1c597
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Install other requirements:

pip install pyyaml ipdb
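
Since the steps above pin specific versions, it can help to confirm that the active environment actually matches them. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `timm`.

```python
# Sketch: compare installed package versions against the pins in this README.
# The helper names below are illustrative and do not come from the ViTAE code.

PINS = {"torch": "1.8.1", "torchvision": "0.9.1", "timm": "0.3.4"}


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7 (ships with setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif found.split("+")[0] != wanted:  # ignore local tags such as "+cu102"
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned versions match"]:
        print(line)
```

Run it inside the activated `vitae` environment; any line it prints other than "all pinned versions match" points at a package to reinstall.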

Data Preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/. The file structure should look like:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
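
Before launching evaluation or training, the layout above can be sanity-checked with a few lines of Python. This is a rough sketch under the assumption that every class is a sub-folder of `train` or `val` and every file inside a class folder is an image; the function name `summarize_imagenet` is hypothetical and not part of the repository.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Count class folders and files under root/train and root/val.

    Raises if a split directory is missing or contains no class folders.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        classes = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not classes:
            raise ValueError(f"no class folders found in {split_dir}")
        # Every regular file inside a class folder is counted as one image.
        n_images = sum(1 for c in classes for f in c.iterdir() if f.is_file())
        summary[split] = {"classes": len(classes), "images": n_images}
    return summary
```

For example, `summarize_imagenet("data/imagenet")` returns a dictionary with the number of classes and images per split, which can be compared against the expected ImageNet-1K counts.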

Evaluation

Taking ViTAE_basic_7 as an example, to evaluate the pretrained ViTAE model on the ImageNet validation set, run:

python validate.py [ImageNetPath] --model ViTAE_basic_7 --eval_checkpoint [Checkpoint Path]

Training

Taking ViTAE_basic_7 as an example, to train the ViTAE model on ImageNet with 4 GPUs and a total batch size of 512 (128 per GPU), run:

python -m torch.distributed.launch --nproc_per_node=4 main.py [ImageNetPath] --model ViTAE_basic_7 -b 128 --lr 1e-3 --weight-decay .03 --img-size 224 --amp

The trained model file will be saved under the output folder.
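
The command above reaches a batch size of 512 by giving each of the 4 processes a batch of 128. The sketch below reproduces that arithmetic for a different GPU count, assuming `-b` is the per-process batch size, as the 4 x 128 = 512 figures in this README suggest. The helper `build_train_command` is hypothetical; it only rearranges flags that already appear in the command above, and it does not address whether the learning rate should change with the hardware setup.

```python
def build_train_command(imagenet_path, num_gpus, global_batch=512,
                        model="ViTAE_basic_7"):
    """Build the launch command as an argument list for `num_gpus` processes.

    Assumes -b is the per-process batch size, so it is set to
    global_batch / num_gpus to keep the total batch size unchanged.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch % num_gpus != 0:
        raise ValueError("global_batch must be divisible by num_gpus")
    per_gpu_batch = global_batch // num_gpus
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "main.py", imagenet_path,
        "--model", model,
        "-b", str(per_gpu_batch),
        "--lr", "1e-3",
        "--weight-decay", ".03",
        "--img-size", "224",
        "--amp",
    ]


if __name__ == "__main__":
    # Same total batch size of 512, spread over 8 GPUs instead of 4.
    print(" ".join(build_train_command("[ImageNetPath]", num_gpus=8)))
```

The returned list can be passed to `subprocess.run` or joined into a shell command, as the example at the bottom does.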

Results

Main Results on ImageNet-1K with pretrained models

name	resolution	acc@1	acc@5	acc@RealTop-1	Pretrained
ViTAE-T	224x224	75.3	92.7	82.9	Coming Soon
ViTAE-6M	224x224	77.9	94.1	84.9	Coming Soon
ViTAE-13M	224x224	81.0	95.4	86.9	Coming Soon
ViTAE-S	224x224	82.0	95.9	87.0	Coming Soon
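
The accuracy figures in the table are percentages on the ImageNet validation set. Assuming the first two accuracy columns follow the usual top-1 and top-5 convention (an assumption on our part, not something this page states explicitly), the metric can be sketched in plain Python as below. The function `topk_accuracy` is illustrative only and is not the code used by `validate.py`.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one inner list per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.6, 0.3, 0.1],
    ]
    labels = [1, 1, 0, 0]
    print(topk_accuracy(scores, labels, k=1))  # 50.0
    print(topk_accuracy(scores, labels, k=2))  # 75.0
```

Top-k accuracy can only grow as k increases, which is consistent with the second accuracy column being higher than the first in every row of the table.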

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.
