CMT: Convolutional Neural Networks Meet Vision Transformers

Overview


[arxiv]

1. Introduction

This repo is a PyTorch implementation of the CMT model. There is no official reference source code, so this is an unofficial version.

2. Environments

  • python 3.7+
  • pytorch 1.7.1
  • pillow
  • apex
  • opencv-python

See this repo to find out how to install apex.

3. Dataset

Each line of the train/val txt files gives an image path and its integer class label, separated by a comma (a minimal loader sketch follows the examples below).

  • Training
    /data/home/imagenet/train/xxx.jpeg, 0
    /data/home/imagenet/train/xxx.jpeg, 1
    ...
    /data/home/imagenet/train/xxx.jpeg, 999
    
  • Testing
    /data/home/imagenet/test/xxx.jpeg, 0
    /data/home/imagenet/test/xxx.jpeg, 1
    ...
    /data/home/imagenet/test/xxx.jpeg, 999
    
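A minimal PyTorch loader sketch for this 'path, label' list format; the class name and parsing details below are illustrative assumptions, not taken from this repo:

    from PIL import Image
    from torch.utils.data import Dataset

    class TxtListDataset(Dataset):
        """Reads lines of the form '/path/to/img.jpeg, label'.
        Hypothetical helper for illustration; the repo's own loader may differ."""

        def __init__(self, txt_file, transform=None):
            self.samples = []
            with open(txt_file) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # split on the last comma so commas in paths do not break parsing
                    path, label = line.rsplit(",", 1)
                    self.samples.append((path.strip(), int(label)))
            self.transform = transform

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            path, label = self.samples[idx]
            img = Image.open(path).convert("RGB")
            if self.transform is not None:
                img = self.transform(img)
            return img, label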

4. Training & Inference

  1. Training

    CMT-Tiny

    #!/bin/bash
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    cd CMT-pytorch;
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train.py \
        --batch_size 512 --num_workers 48 --lr 6e-3 --optimizer_name "adamw" --tf_optimizer 1 --cosine 1 \
        --model_name cmtti --max_epochs 300 --warmup_epochs 5 --num-classes 1000 \
        --input_size 184 --crop_size 160 --weight_decay 1e-1 --grad_clip 0 --repeated-aug 0 --max_grad_norm 5.0 \
        --drop_path_rate 0.1 --FP16 0 --qkv_bias 1 \
        --ape 0 --rpe 1 --pe_nd 0 --mode O2 --amp 1 --apex 0 \
        --train_file $file_folder$/train.txt \
        --val_file $file_folder$/val.txt \
        --log-dir $save_folder$/log_dir \
        --checkpoints-path $save_folder$/checkpoints
    

    Note: a batch size of 128 per GPU (128 x 8 in total) may give slightly higher accuracy; choose the batch size that balances accuracy and training speed for you.
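
    The run above uses a cosine schedule with linear warmup (--cosine 1, --warmup_epochs 5, --lr 6e-3, --max_epochs 300). A minimal sketch of that schedule, assuming a per-epoch update; the helper name and min_lr floor are illustrative, not taken from train.py:

    import math

    def lr_at_epoch(epoch, base_lr=6e-3, warmup_epochs=5, max_epochs=300, min_lr=1e-5):
        """Cosine decay with linear warmup (hypothetical per-epoch helper)."""
        if epoch < warmup_epochs:
            # linear warmup from 0 up to base_lr
            return base_lr * (epoch + 1) / warmup_epochs
        # cosine decay from base_lr down to min_lr over the remaining epochs
        progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

    # e.g. lr_at_epoch(0) ~= 1.2e-3, lr_at_epoch(5) == 6e-3, lr_at_epoch(299) ~= 1e-5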

  2. Inference

    #!/bin/bash
    cd CMT-pytorch;
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore test.py \
        --dist-url 'tcp://127.0.0.1:9966' --dist-backend 'nccl' --multiprocessing-distributed=1 --world-size=1 --rank=0 \
        --batch-size 128 --num-workers 48 --num-classes 1000 --input_size 184 --crop_size 160 \
        --ape 0 --rpe 1 --pe_nd 0 --qkv_bias 1 --swin 0 --model_name cmtti --dropout 0.1 --emb_dropout 0.1 \
        --test_file $file_folder$/val.txt \
        --checkpoints-path $save_folder$/checkpoints/xxx.pth.tar \
        --save_folder $save_folder$/acc_logits/
  3. Calculate accuracy

    python utils/calculate_acc.py --logits_file $save_folder$/acc_logits/
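
    The script above aggregates the logits that test.py dumps into $save_folder$/acc_logits/ and reports top-1 accuracy. A rough, hypothetical equivalent, assuming each dump holds a 'logits' array and a 'labels' array; the repo's actual file format and script may differ:

    import glob
    import numpy as np

    def top1_from_logits(logits_dir):
        """Hypothetical sketch: assumes each .npy file stores a dict with
        'logits' of shape [N, 1000] and 'labels' of shape [N]."""
        correct, total = 0, 0
        for path in glob.glob(f"{logits_dir}/*.npy"):
            data = np.load(path, allow_pickle=True).item()
            preds = data["logits"].argmax(axis=1)
            correct += int((preds == data["labels"]).sum())
            total += len(data["labels"])
        return correct / total

    # e.g. print(top1_from_logits("/path/to/acc_logits"))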

5. ImageNet Results

| model name | input size | FLOPs | Params | acc@…_crop (ours) | acc (paper) | weights |
|------------|------------|-------|--------|-------------------|-------------|---------|
| CMT-T      | 160x160    | 516M  | 11.3M  | 75.124%           | 79.2%       | weights |
| CMT-T      | 224x224    | 1.01G | 11.3M  | 78.4%             | -           | weights |
| CMT-XS     | 192x192    | -     | -      | -                 | 81.8%       | -       |
| CMT-S      | 224x224    | -     | -      | -                 | 83.5%       | -       |
| CMT-L      | 256x256    | -     | -      | -                 | 84.5%       | -       |

6. TODO

  • Other results may be coming soon if someone needs them.
  • Release the CMT-XS result on ImageNet.
  • Check the differences from the paper; the authors gave the hyperparameters in an issue.
  • Tune the best hyperparameters for CMT or transformers.

Supplementary

If you want to know more, I give an explanation of CMT, as well as the tuning and training process, here.

Owner

FlyEgle (JOYY AI GROUP - Machine Learning Engineer, Computer Vision)