# Shunted Transformer
This is the official implementation of [Shunted Self-Attention via Multi-Scale Token Aggregation](https://arxiv.org/abs/2111.15193) by Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, and Xinchao Wang.
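The core idea of shunted self-attention is that, within a single layer, different attention heads see the key/value tokens at different spatial granularities: keys and values are aggregated by strided convolutions with head-specific reduction rates, so some heads attend over coarse, heavily merged tokens while others keep finer detail. The module below is only a minimal PyTorch sketch of that idea, not the repository's actual implementation; the class name `ShuntedAttentionSketch`, the one-rate-per-head simplification, and the default `sr_ratios` are assumptions made for brevity, and details such as normalization are omitted.

```python
import torch
import torch.nn as nn

class ShuntedAttentionSketch(nn.Module):
    """Illustrative sketch: each head attends to key/value tokens
    downsampled at its own rate, mixing coarse and fine scales."""
    def __init__(self, dim=64, num_heads=2, sr_ratios=(4, 2)):
        super().__init__()
        assert num_heads == len(sr_ratios) and dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        # One strided conv per head: aggregates tokens at that head's rate.
        self.srs = nn.ModuleList(
            [nn.Conv2d(dim, dim, kernel_size=r, stride=r) for r in sr_ratios]
        )
        self.kvs = nn.ModuleList(
            [nn.Linear(dim, 2 * self.head_dim) for _ in sr_ratios]
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape  # N == H * W
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        outs = []
        for i, (sr, kv) in enumerate(zip(self.srs, self.kvs)):
            # (B, N, C) -> (B, C, H, W) -> strided conv -> fewer tokens.
            x2d = x.transpose(1, 2).reshape(B, C, H, W)
            x_small = sr(x2d).reshape(B, C, -1).transpose(1, 2)
            k, v = kv(x_small).chunk(2, dim=-1)       # (B, N_i, head_dim)
            attn = (q[:, i] @ k.transpose(-2, -1)) * self.scale
            outs.append(attn.softmax(dim=-1) @ v)     # (B, N, head_dim)
        return self.proj(torch.cat(outs, dim=-1))

x = torch.randn(1, 56 * 56, 64)
y = ShuntedAttentionSketch()(x, 56, 56)
print(y.shape)  # torch.Size([1, 3136, 64])
```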
## Training from scratch

Train a model with the distributed training script:

```bash
bash dist_train.sh
```
## Model Zoo
The checkpoints are available on Google Drive and Baidu Pan (extraction code: hazr). Checkpoints for the large models are coming soon.
| Method | Input Size | Acc@1 (%) | #Params (M) |
|---|---|---|---|
| Shunted-T | 224 | 79.8 | 11.5 |
| Shunted-S | 224 | 82.9 | 22.4 |
| Shunted-B | 224 | 84.0 | 39.6 |
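For evaluation, a downloaded checkpoint can be loaded along these lines. This is a hedged sketch, not a documented API: the module `SSA`, the constructor `shunted_s`, and the file name `shunted_S.pth` are placeholders, so substitute the actual entry points from this repository and the checkpoint you downloaded from the links above.

```python
import torch

# Placeholder names: replace `SSA` / `shunted_s` with the model module and
# constructor defined in this repository, and `shunted_S.pth` with the
# checkpoint file downloaded from the links above.
from SSA import shunted_s

model = shunted_s()
state = torch.load("shunted_S.pth", map_location="cpu")
# Some checkpoints wrap the weights under a "model" or "state_dict" key.
if isinstance(state, dict):
    state = state.get("model", state.get("state_dict", state))
model.load_state_dict(state)
model.eval()
```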
## Citation
```bibtex
@misc{ren2021shunted,
      title={Shunted Self-Attention via Multi-Scale Token Aggregation},
      author={Sucheng Ren and Daquan Zhou and Shengfeng He and Jiashi Feng and Xinchao Wang},
      year={2021},
      eprint={2111.15193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```