CoaT: Co-Scale Conv-Attentional Image Transformers

Last update: Dec 03, 2022

Introduction

This repository contains the official code and pretrained models for CoaT: Co-Scale Conv-Attentional Image Transformers. It introduces (1) a co-scale mechanism to realize fine-to-coarse, coarse-to-fine and cross-scale attention modeling and (2) an efficient conv-attention module to realize relative position encoding in the factorized attention.
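As a rough sketch of item (2), and based on our reading of the paper rather than on anything stated in this README, factorized attention can be written as follows. The notation is an assumption for illustration: N is the number of tokens, C the number of channels per head, the softmax is taken over the token dimension, and the depthwise convolution is applied to V reshaped into its 2D image layout.

```latex
\begin{align}
\mathrm{FactorAtt}(X) &= \frac{Q}{\sqrt{C}} \left( \mathrm{softmax}(K)^{\top} V \right),
  \qquad Q, K, V \in \mathbb{R}^{N \times C} \\
\mathrm{ConvAtt}(X) &= \mathrm{FactorAtt}(X) + Q \circ \mathrm{DepthwiseConv2D}(V)
\end{align}
```

Because the product softmax(K)^T V is a C x C matrix, the cost under this formulation grows as O(NC^2) rather than the O(N^2 C) of standard self-attention. The second term is the convolutional relative position encoding.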

For more details, please refer to CoaT: Co-Scale Conv-Attentional Image Transformers by Weijian Xu*, Yifan Xu*, Tyler Chang, and Zhuowen Tu.

Changelog

04/23/2021: Pre-trained checkpoint for CoaT-Lite Mini is released.
04/22/2021: Code and pre-trained checkpoint for CoaT-Lite Tiny are released.

Usage

Environment Preparation

Set up a new conda environment and activate it.

# Create an environment with Python 3.8.
conda create -n coat python==3.8
conda activate coat

Install required packages.

# Install PyTorch 1.7.1 w/ CUDA 11.0.
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

# Install timm 0.3.2.
pip install timm==0.3.2

# Install einops.
pip install einops
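
After installation, it can help to confirm that the pinned versions were actually picked up. The snippet below is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical and not part of this repository, and it assumes that a local build suffix such as `+cu110` should be ignored when comparing.

```python
from importlib import metadata

# Versions pinned by the install commands above. A local build suffix such as
# "+cu110" is ignored when comparing (an assumption of this sketch).
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Strip a local version label ("1.7.1+cu110" -> "1.7.1") before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

A package that is not installed is reported with `installed=None` instead of raising an error.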

Code and Dataset Preparation

Clone the repo.

git clone https://github.com/mlpc-ucsd/CoaT
cd CoaT

Download the ImageNet dataset (ILSVRC 2012) and extract it.

# Create dataset folder.
mkdir -p ./data/ImageNet

# Download the dataset (not shown here) and copy the files (assume the download path is in $DATASET_PATH).
cp $DATASET_PATH/ILSVRC2012_img_train.tar $DATASET_PATH/ILSVRC2012_img_val.tar $DATASET_PATH/ILSVRC2012_devkit_t12.tar.gz ./data/ImageNet

# Extract the dataset.
python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='train')"
python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='val')"
# After the extraction, you should observe `train` and `val` folders under ./data/ImageNet.
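
A quick way to check the extracted layout is to count the class folders in each split. This is a minimal sketch under the assumption that extraction produces one subfolder per class under both `train` and `val`, which for ILSVRC 2012 would mean 1000 folders each; the helper name `count_class_folders` is hypothetical.

```python
from pathlib import Path


def count_class_folders(root):
    """Count the immediate subdirectories of `train` and `val` under root.

    A split folder that does not exist is reported as None.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if split_dir.is_dir():
            counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
        else:
            counts[split] = None
    return counts


if __name__ == "__main__":
    print(count_class_folders("./data/ImageNet"))
```

If either count is `None` or far from the expected number of classes, the extraction step probably did not complete.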

Evaluate Pre-trained Checkpoint

We provide the CoaT checkpoints pre-trained on the ImageNet dataset.

Name	Acc@1	Acc@5	#Params	SHA-256 (first 8 chars)	URL
CoaT-Lite Tiny	77.5	93.8	5.7M	e88e96b0	model, log
CoaT-Lite Mini	79.1	94.5	11M	6b4a8ae5	model, log

The following commands provide an example (CoaT-Lite Tiny) to evaluate the pre-trained checkpoint.

# Download the pretrained checkpoint.
mkdir -p ./output/pretrained
wget http://vcl.ucsd.edu/coat/pretrained/coat_lite_tiny_e88e96b0.pth -P ./output/pretrained
sha256sum ./output/pretrained/coat_lite_tiny_e88e96b0.pth  # Make sure it matches the SHA-256 hash (first 8 characters) in the table.

# Evaluate.
# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_pretrained ./output/pretrained/coat_lite_tiny_e88e96b0.pth
# It should output results similar to "Acc@1 77.504 Acc@5 93.814" at the very end.
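
On systems without `sha256sum`, the same hash check can be done in Python. The sketch below uses only `hashlib`; the function names `sha256_prefix` and `matches_table` are hypothetical helpers, not part of this repository.

```python
import hashlib


def sha256_prefix(path, length=8, chunk_size=1 << 20):
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large checkpoint is not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def matches_table(path, expected_prefix):
    """Compare the digest prefix with the value listed in the checkpoint table."""
    return sha256_prefix(path, len(expected_prefix)) == expected_prefix.lower()
```

For the CoaT-Lite Tiny checkpoint, `matches_table("./output/pretrained/coat_lite_tiny_e88e96b0.pth", "e88e96b0")` should return `True` if the download is intact.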

Train

The following commands provide an example (CoaT-Lite Tiny, 8-GPU) to train the CoaT model.

# Usage: bash ./scripts/train.sh [model name] [output folder]
bash ./scripts/train.sh coat_lite_tiny coat_lite_tiny

Evaluate

The following commands provide an example (CoaT-Lite Tiny) to evaluate the checkpoint after training.

# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_eval ./output/coat_lite_tiny/checkpoints/checkpoint0299.pth
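
If training stops early, the final checkpoint will not be `checkpoint0299.pth`. The following sketch picks the checkpoint with the highest epoch number, assuming files are named `checkpoint` followed by a zero-padded epoch and `.pth`, as in the example path above. Both the naming assumption and the helper `latest_checkpoint` are illustrative and not part of the repository scripts.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the example path:
# "checkpoint" + zero-padded epoch number + ".pth".
CHECKPOINT_PATTERN = re.compile(r"^checkpoint(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir):
    """Return the path with the highest epoch number, or None if none match."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    best_epoch, best_path = -1, None
    for path in directory.iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("./output/coat_lite_tiny/checkpoints"))
```

The returned path can then be passed as the `[checkpoint path]` argument of `./scripts/eval.sh`.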

Citation

@misc{xu2021coscale,
      title={Co-Scale Conv-Attentional Image Transformers}, 
      author={Weijian Xu and Yifan Xu and Tyler Chang and Zhuowen Tu},
      year={2021},
      eprint={2104.06399},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is released under the Apache License 2.0. The license can be found in the LICENSE file.

Acknowledgment

Thanks to DeiT and pytorch-image-models for a clear and data-efficient implementation of ViT. Thanks to lucidrains' implementation of Lambda Networks and CPVT.
