Paddle pit - Rethinking Spatial Dimensions of Vision Transformers

Overview

基于Paddle实现PiT ——Rethinking Spatial Dimensions of Vision Transformers,arxiv

  • 官方原版代码(基于PyTorch)pit.

  • 本项目基于 PaddleViT 实现,在其基础上与原版代码实现了更进一步的对齐,并通过完整训练与测试完成对pit_ti模型的复现.

1. 简介

从CNN的成功设计原理出发,作者研究了空间尺寸转换的作用及其在基于Transformer的体系结构上的有效性。

具体来说,类似于CNN的降维原则(随着深度的增加,传统的CNN会增加通道尺寸并减小空间尺寸),作者用实验表明了这同样有利于Transformer的性能提升,并提出了基于池化的Vision Transformer,即PiT(模型示意图如下)。

drawing

PiT 模型示意图

2. 数据集和复现精度

数据集

原文使用的为ImageNet-1k 2012(ILSVRC2012),共1000类,训练集/测试集图片分布:1281167/50000,数据集大小为144GB。

本项目使用的为官方推荐的图片压缩过的更轻量的Light_ILSVRC2012,数据集大小为65GB。其在AI Studio上的地址为:Light_ILSVRC2012_part_0.tarLight_ILSVRC2012_part_1.tar

复现精度

Model 目标精度[email protected] 实现精度[email protected] Image Size batch_size Crop_pct epoch #Params
pit_ti 73.0 73.01 224 256*4GPUs 0.9 300
(+10 COOLDOWN)
4.8M

【注】上表中的实现精度在原版ILSVRC2012验证集上测试得到。 值得一提的是,本项目在Light_ILSVRC2012的验证集上的Validation [email protected]达到了73.17

本项目训练得到的最佳模型参数与训练日志log均存放于output文件夹下。

日志文件说明

本项目通过AI Studio的脚本任务运行,中途中断了4次,因此共有5个日志文件。为了方便检阅,本人手动将log命名为log_开始epoch-结束epoch.txt格式。具体来说:

  • output/log_1-76.txt:epoch1~epoch76。这一版代码定义每10个epoch保存一次模型权重,每2个epoch验证一次,同时若验证精度高于历史精度,则保存为Best_PiT.pdparams,因此在epoch76训练结束但还未验证的时候中断,下一次的训练只能从验证精度最高的epoch74继续训练。

  • output/log_75-142.txt:epoch75~epoch142。从这一版代码开始,新增了每次训练之后都保存一下模型参数为PiT-Latest.pdparams,这样无论哪个epoch训练中断都可以继续训练啦。

  • output/log_143-225.txt:epoch143~epoch225。

  • output/log_226-303.txt:epoch226~epoch303。

  • output/log_304-310.txt:epoch304~epoch310。

  • output/log_eval.txt:使用训练得到的最好模型(epoch308)在原版ILSVRC2012验证集上验证日志。

3. 准备环境

推荐环境配置:

本人环境配置:

  • 硬件:Tesla V100 * 4(由衷感谢百度飞桨平台提供高性能算力支持)

  • PaddlePaddle==2.2.1

  • Python==3.7

4. 快速开始

本项目现已通过脚本任务形式部署到AI Studio上,您可以选择fork下来直接运行sh run.sh,数据集处理等脚本均已部署好。链接:paddle_pit

或者您也可以git本repo在本地运行:

第一步:克隆本项目

git clone https://github.com/hatimwen/paddle_pit.git
cd paddle_pit

第二步:修改参数

请根据实际情况,修改scripts路径下的脚本内容(如:gpu,数据集路径data_path,batch_size等)。

第三步:验证模型

多卡请运行:

sh scripts/run_eval_multi.sh

单卡请运行:

sh scripts/run_eval.sh

第四步:训练模型

多卡请运行:

sh scripts/run_train_multi.sh

单卡请运行:

sh scripts/run_train.sh

第五步:验证预测

python predict.py \
-pretrained='output/Best_PiT' \
-img_path='images/ILSVRC2012_val_00004506.JPEG'

验证图片(类别:藏獒, id: 244)

输出结果为:

class_id: 244, prob: 9.12291145324707

对照ImageNet类别id(ImageNet数据集编号对应的类别内容),可知244为藏獒,预测结果正确。

5.代码结构

|-- paddle_pit
    |-- output              # 日志及模型文件
    |-- configs             # 参数
        |-- pit_ti.yaml
    |-- datasets
        |-- ImageNet1K      # 数据集路径
    |-- scripts             # 运行脚本
        |-- run_train.sh
        |-- run_train_multi.sh
        |-- run_eval.sh
        |-- run_eval_multi.sh
    |-- augment.py          # 数据增强
    |-- config.py           # 最底层配置文件
    |-- datasets.py         # dataset与dataloader
    |-- droppath.py         # droppath定义
    |-- losses.py           # loss定义
    |-- main_multi_gpu.py   # 多卡训练测试代码
    |-- main_single_gpu.py  # 单卡训练测试代码
    |-- mixup.py            # mixup定义
    |-- model_ema.py        # EMA定义
    |-- pit.py              # pit模型结构定义
    |-- random_erasing.py   # random_erasing定义
    |-- regnet.py           # 教师模型定义(本项目并未对此验证,仅作保留)
    |-- transforms.py       # RandomHorizontalFlip定义
    |-- utils.py            # CosineLRScheduler及AverageMeter定义
    |-- README.md
    |-- requirements.txt

6. 参考及引用

@InProceedings{Yuan_2021_ICCV,
    author    = {Yuan, Li and Chen, Yunpeng and Wang, Tao and Yu, Weihao and Shi, Yujun and Jiang, Zi-Hang and Tay, Francis E.H. and Feng, Jiashi and Yan, Shuicheng},
    title     = {Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {558-567}
}

最后,非常感谢百度举办的飞桨论文复现挑战赛(第五期)让本人对Paddle理解更加深刻。 同时也非常感谢朱欤老师团队用Paddle实现的PaddleViT,本项目大部分代码都是从中copy来的,而仅仅实现了其与原版代码训练步骤的进一步对齐与完整的训练过程,但本人也同样受益匪浅! ♥️

Contact

Owner
Hongtao Wen
Hongtao Wen
Exponential Graph is Provably Efficient for Decentralized Deep Training

Exponential Graph is Provably Efficient for Decentralized Deep Training This code repository is for the paper Exponential Graph is Provably Efficient

3 Apr 20, 2022
Best practices for segmentation of the corporate network of any company

Best-practice-for-network-segmentation What is this? This project was created to publish the best practices for segmentation of the corporate network

2k Jan 07, 2023
A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-si

Dmitri Babaev 103 Dec 17, 2022
TensorFlow 101: Introduction to Deep Learning for Python Within TensorFlow

TensorFlow 101: Introduction to Deep Learning I have worked all my life in Machine Learning, and I've never seen one algorithm knock over its benchmar

Sefik Ilkin Serengil 896 Jan 04, 2023
Algorithmic trading using machine learning.

Algorithmic Trading This machine learning algorithm was built using Python 3 and scikit-learn with a Decision Tree Classifier. The program gathers sto

Sourav Biswas 101 Nov 10, 2022
PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision.

PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{CV2018, author = {Donny You ( Donny You 40 Sep 14, 2022

A Python library for Deep Probabilistic Modeling

Abstract DeeProb-kit is a Python library that implements deep probabilistic models such as various kinds of Sum-Product Networks, Normalizing Flows an

DeeProb-org 46 Dec 26, 2022
This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

Robotics and Autonomous Systems Group 96 Dec 15, 2022
NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs.

NAS-HPO-Bench-II API Overview NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs. It helps a fair and low-

yoichi hirose 8 Nov 21, 2022
Source code for the Paper: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints}

CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints Installation Run pipenv install (at your own risk with --skip-lo

Autonomous Learning Group 65 Dec 27, 2022
A curated list of neural rendering resources.

Awesome-of-Neural-Rendering A curated list of neural rendering and related resources. Please feel free to pull requests or open an issue to add papers

Zhiwei ZHANG 43 Dec 09, 2022
Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

Haohe Liu (刘濠赫) 84 Dec 20, 2022
MVP Benchmark for Multi-View Partial Point Cloud Completion and Registration

MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration [NEWS] 2021-07-12 [NEW 🎉 ] The submission on Codalab starts! 2021-07-1

PL 93 Dec 21, 2022
Vrcwatch - Supply the local time to VRChat as Avatar Parameters through OSC

English: README-EN.md VRCWatch VRCWatch は、VRChat 内のアバター向けに現在時刻を送信するためのプログラムです。 使

Kosaki Mezumona 17 Nov 30, 2022
This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

Davide Coccomini 9 Dec 18, 2022
Chinese named entity recognization with BiLSTM using Keras

Chinese named entity recognization (Bilstm with Keras) Project Structure ./ ├── README.md ├── data │   ├── README.md │   ├── data 数据集 │   │   ├─

1 Dec 17, 2021
Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022
Hyperbolic Procrustes Analysis Using Riemannian Geometry

Hyperbolic Procrustes Analysis Using Riemannian Geometry The code in this repository creates the figures presented in this article: Please notice that

Ronen Talmon's Lab 2 Jan 08, 2023
Pytorch code for semantic segmentation using ERFNet

ERFNet (PyTorch version) This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation. For t

Edu 394 Jan 01, 2023
Contextual Attention Network: Transformer Meets U-Net

Contextual Attention Network: Transformer Meets U-Net Contexual attention network for medical image segmentation with state of the art results on skin

Reza Azad 67 Nov 28, 2022