This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.

Overview

Swin Transformer

This project aims to explore the deployment of SwinTransformer based on TensorRT, including the test results of FP16 and INT8.

Introduction(Quoted from the Original Project )

Swin Transformer original github repo (the name Swin stands for Shifted window) is initially described in arxiv, which capably serves as a general-purpose backbone for computer vision. It is basically a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.

Setup

  1. Please refer to the Install session for conda environment build.
  2. Please refer to the Data preparation session to prepare Imagenet-1K.
  3. Install the TensorRT, now we choose the TensorRT 8.2 GA(8.2.1.8) as the test version.

Code Structure

Focus on the modifications and additions.

.
├── export.py                  # Export the PyTorch model to ONNX format
├── get_started.md            
├── main.py
├── models
│   ├── build.py
│   ├── __init__.py
│   ├── swin_mlp.py
│   └── swin_transformer.py    # Build the model, modified to export the onnx and build the TensorRT engine
├── README.md
├── trt                        # Directory for TensorRT's engine evaluation and visualization.
│   ├── engine.py
│   ├── eval_trt.py            # Evaluate the tensorRT engine's accuary.
│   ├── onnxrt_eval.py         # Run the onnx model, generate the results, just for debugging
├── utils.py
└── weights

Export to ONNX and Build TensorRT Engine

You need to pay attention to the two modification below.

  1. Exporting the operator roll to ONNX opset version 9 is not supported. A: Please refer to torch/onnx/symbolic_opset9.py, add the support of exporting torch.roll.

  2. Node (Concat_264) Op (Concat) [ShapeInferenceError] All inputs to Concat must have same rank.
    A: Please refer to the modifications in models/swin_transformer.py. We use the input_resolution and window_size to compute the nW.

       if mask is not None:
         nW = int(self.input_resolution[0]*self.input_resolution[1]/self.window_size[0]/self.window_size[1])
         #nW = mask.shape[0]
         #print('nW: ', nW)
         attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
         attn = attn.view(-1, self.num_heads, N, N)
         attn = self.softmax(attn)

Accuray Test Results on ImageNet-1K Validation Dataset

  1. Download the Swin-T pretrained model from Model Zoo. Evaluate the accuracy of the Pytorch pretrained model.

    $ python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.pth --data-path ../imagenet_1k
  2. export.py exports a pytorch model to onnx format.

    $ python export.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.pth --data-path ../imagenet_1k --batch-size 16
  3. Build the TensorRT engine using trtexec.

    $ trtexec --onnx=./weights/swin_tiny_patch4_window7_224.onnx --buildOnly --verbose --saveEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine --workspace=4096

    Add the --fp16 or --best tag to build the corresponding fp16 or int8 model. Take fp16 as an example.

    $ trtexec --onnx=./weights/swin_tiny_patch4_window7_224.onnx --buildOnly --verbose --fp16 --saveEngine=./weights/swin_tiny_patch4_window7_224_batch16_fp16.engine --workspace=4096

    You can use the trtexec to test the throughput of the TensorRT engine.

    $ trtexec --loadEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine
  4. trt/eval_trt.py aims to evalute the accuracy of the TensorRT engine.

$ python trt/eval_trt.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224_batch16.engine --data-path ../imagenet_1k --batch-size 16
  1. trt/onnxrt_eval.py aims to evalute the accuracy of the Onnx model, just for debug.
    $ python trt/onnxrt_eval.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.onnx --data-path ../imagenet_1k --batch-size 16
SwinTransformer(T4) [email protected] Notes
PyTorch Pretrained Model 81.160
TensorRT Engine(FP32) 81.156
TensorRT Engine(FP16) - TensorRT 8.0.3.4: 81.156% vs TensorRT 8.2.1.8: 72.768%

Notes: Reported a nvbug for the FP16 accuracy issue, please refer to nvbug 3464358.

Speed Test of TensorRT engine(T4)

SwinTransformer(T4) FP32 FP16 INT8
batchsize=1 245.388 qps 510.072 qps 514.707 qps
batchsize=16 316.8624 qps 804.112 qps 804.1072 qps
batchsize=64 329.13984 qps 833.4208 qps 849.5168 qps
batchsize=256 331.9808 qps 844.10752 qps 840.33024 qps

Analysis: Compared with FP16, INT8 does not speed up at present. The main reason is that, for the Transformer structure, most of the calculations are processed by Myelin. Currently Myelin does not support the PTQ path, so the current test results are expected.
Attached the int8 and fp16 engine layer information with batchsize=128 on T4.

Build with int8 precision:

[12/04/2021-06:34:17] [V] [TRT] Engine Layer Information:
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to Conv_0, Tactic: 0, input_0[Float(128,3,224,224)] -> Reformatted Input Tensor 0 to Conv_0[Int8(128,3,224,224)]
Layer(CaskConvolution): Conv_0, Tactic: 1025026069226666066, Reformatted Input Tensor 0 to Conv_0[Int8(128,3,224,224)] -> 191[Int8(128,96,56,56)]
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}, Tactic: 0, 191[Int8(128,96,56,56)] -> Reformatted Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}[Half(128,96,56,56)]
Layer(Myelin): {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}, Tactic: 0, Reformatted Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}[Half(128,96,56,56)] -> (Unnamed Layer* 4178) [Shuffle]_output[Half(128,768,1,1)]
Layer(CaskConvolution): Gemm_2128, Tactic: -1838109259315759592, (Unnamed Layer* 4178) [Shuffle]_output[Half(128,768,1,1)] -> (Unnamed Layer* 4179) [Fully Connected]_output[Half(128,1000,1,1)]
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle], Tactic: 0, (Unnamed Layer* 4179) [Fully Connected]_output[Half(128,1000,1,1)] -> Reformatted Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle][Float(128,1000,1,1)]
Layer(NoOp): (Unnamed Layer* 4183) [Shuffle], Tactic: 0, Reformatted Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle][Float(128,1000,1,1)] -> output_0[Float(128,1000)]

Build with fp16 precision:

[12/04/2021-06:44:31] [V] [TRT] Engine Layer Information:
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to Conv_0, Tactic: 0, input_0[Float(128,3,224,224)] -> Reformatted Input Tensor 0 to Conv_0[Half(128,3,224,224)]
Layer(CaskConvolution): Conv_0, Tactic: 1579845938601132607, Reformatted Input Tensor 0 to Conv_0[Half(128,3,224,224)] -> 191[Half(128,96,56,56)]
Layer(Myelin): {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}, Tactic: 0, 191[Half(128,96,56,56)] -> Reformatted Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}[Half(128,1000)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}, Tactic: 0, Reformatted Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}[Half(128,1000)] -> output_0[Float(128,1000)]

Todo

After the FP16 nvbug 3464358 solved, will do the QAT optimization.

Owner
maggiez
maggiez
maggiez
Multi-objective constrained optimization for energy applications via tree ensembles

Multi-objective constrained optimization for energy applications via tree ensembles

C⚙G - Imperial College London 1 Nov 19, 2021
Using OpenAI's CLIP to upscale and enhance images

CLIP Upscaler and Enhancer Using OpenAI's CLIP to upscale and enhance images Based on nshepperd's JAX CLIP Guided Diffusion v2.4 Sample Results Viewpo

Tripp Lyons 5 Jun 14, 2022
“袋鼯麻麻——智能购物平台”能够精准地定位识别每一个商品

“袋鼯麻麻——智能购物平台”能够精准地定位识别每一个商品,并且能够返回完整地购物清单及顾客应付的实际商品总价格,极大地降低零售行业实际运营过程中巨大的人力成本,提升零售行业无人化、自动化、智能化水平。

thomas-yanxin 192 Jan 05, 2023
Multi-Objective Loss Balancing for Physics-Informed Deep Learning

Multi-Objective Loss Balancing for Physics-Informed Deep Learning Code for ReLoBRaLo. Abstract Physics Informed Neural Networks (PINN) are algorithms

Rafael Bischof 16 Dec 12, 2022
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper)

CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation (ACMMM'21 Oral Paper) (Accepted for oral presentation at ACM

Minha Kim 1 Nov 12, 2021
DeLag: Detecting Latency Degradation Patterns in Service-based Systems

DeLag: Detecting Latency Degradation Patterns in Service-based Systems Replication package of the work "DeLag: Detecting Latency Degradation Patterns

SEALABQualityGroup @ University of L'Aquila 2 Mar 24, 2022
Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Journey Towards Tiny Perceptual Super-Resolution Test code for our ECCV2020 paper: https://arxiv.org/abs/2007.04356 Our x4 upscaling pre-trained model

Royson 6 Mar 30, 2022
Contenido del curso Bases de datos del DCC PUC versión 2021-2

IIC2413 - Bases de Datos Tabla de contenidos Equipo Profesores Ayudantes Contenidos Calendario Evaluaciones Resumen de notas Foro Política de integrid

54 Nov 23, 2022
The source code for Adaptive Kernel Graph Neural Network at AAAI2022

AKGNN The source code for Adaptive Kernel Graph Neural Network at AAAI2022. Please cite our paper if you think our work is helpful to you: @inproceedi

11 Nov 25, 2022
Exploring Visual Engagement Signals for Representation Learning

Exploring Visual Engagement Signals for Representation Learning Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie and Ser-Nam Lim C

Menglin Jia 9 Jul 23, 2022
Image De-raining Using a Conditional Generative Adversarial Network

Image De-raining Using a Conditional Generative Adversarial Network [Paper Link] [Project Page] He Zhang, Vishwanath Sindagi, Vishal M. Patel In this

He Zhang 216 Dec 18, 2022
Differentiable rasterization applied to 3D model simplification tasks

nvdiffmodeling Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Automatic 3D Model

NVIDIA Research Projects 336 Dec 30, 2022
A simple AI that will give you si ple task and this is made with python

Crystal-AI A simple AI that will give you si ple task and this is made with python Prerequsites: Python3.6.2 pyttsx3 pip install pyttsx3 pyaudio pip i

CrystalAnd 1 Dec 25, 2021
Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

Röst Lab 13 Oct 27, 2022
An Straight Dilated Network with Wavelet for image Deblurring

SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring(offical) 1. Introduction This repo is not only used for our paper(

FlyEgle 41 Jan 04, 2023
This is a collection of all challenges in HKCERT CTF 2021

香港網絡保安新生代奪旗挑戰賽 2021 (HKCERT CTF 2021) This is a collection of all challenges (and writeups) in HKCERT CTF 2021 Challenges ID Chinese name Name Score S

10 Jan 27, 2022
Masked regression code - Masked Regression

Masked Regression MR - Python Implementation This repositery provides a python implementation of MR (Masked Regression). MR can efficiently synthesize

Arbish Akram 1 Dec 23, 2021
Learning Representations that Support Robust Transfer of Predictors

Transfer Risk Minimization (TRM) Code for Learning Representations that Support Robust Transfer of Predictors Prepare the Datasets Preprocess the Scen

Yilun Xu 15 Dec 07, 2022
Code of the paper "Multi-Task Meta-Learning Modification with Stochastic Approximation".

Multi-Task Meta-Learning Modification with Stochastic Approximation This repository contains the code for the paper "Multi-Task Meta-Learning Modifica

Andrew 3 Jan 05, 2022