This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model inference.

Overview

PyTorch Infer Utils

This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model inference.

To install

git clone https://github.com/gorodnitskiy/pytorch_infer_utils.git
pip install /path/to/pytorch_infer_utils/

Export PyTorch model to ONNX

  • Check model for denormal weights to achieve better performance. Use load_weights_rounded_model func to load model with weights rounding:
    from pytorch_infer_utils import load_weights_rounded_model
    
    model = ModelClass()
    load_weights_rounded_model(
        model,
        "/path/to/model_state_dict",
        map_location=map_location
    )
    
  • Use ONNXExporter.torch2onnx method to export pytorch model to ONNX:
    from pytorch_infer_utils import ONNXExporter
    
    model = ModelClass()
    model.load_state_dict(
        torch.load("/path/to/model_state_dict", map_location=map_location)
    )
    model.eval()
    
    exporter = ONNXExporter()
    input_shapes = [-1, 3, 224, 224] # -1 means that is dynamic shape
    exporter.torch2onnx(model, "/path/to/model.onnx", input_shapes)
    
  • Use ONNXExporter.optimize_onnx method to optimize ONNX via onnxoptimizer:
    from pytorch_infer_utils import ONNXExporter
    
    exporter = ONNXExporter()
    exporter.optimize_onnx("/path/to/model.onnx", "/path/to/optimized_model.onnx")
    
  • Use ONNXExporter.optimize_onnx_sim method to optimize ONNX via onnx-simplifier. Be careful with onnx-simplifier not to lose dynamic shapes.
    from pytorch_infer_utils import ONNXExporter
    
    exporter = ONNXExporter()
    exporter.optimize_onnx_sim("/path/to/model.onnx", "/path/to/optimized_model.onnx")
    
  • Also, a method combined the above methods is available ONNXExporter.torch2optimized_onnx:
    from pytorch_infer_utils import ONNXExporter
    
    model = ModelClass()
    model.load_state_dict(
        torch.load("/path/to/model_state_dict", map_location=map_location)
    )
    model.eval()
    
    exporter = ONNXExporter()
    input_shapes = [-1, 3, -1, -1] # -1 means that is dynamic shape
    exporter.torch2optimized_onnx(model, "/path/to/model.onnx", input_shapes)
    
  • Other params that can be used in class initialization:
    • default_shapes: default shapes if dimension is dynamic, default = [1, 3, 224, 224]
    • onnx_export_params:
      • export_params: store the trained parameter weights inside the model file, default = True
      • do_constant_folding: whether to execute constant folding for optimization, default = True
      • input_names: the model's input names, default = ["input"]
      • output_names: the model's output names, default = ["output"]
      • opset_version: the ONNX version to export the model to, default = 11
    • onnx_optimize_params:
      • fixed_point: use fixed point, default = False
      • passes: optimization passes, default = [ "eliminate_deadend", "eliminate_duplicate_initializer", "eliminate_identity", "eliminate_if_with_const_cond", "eliminate_nop_cast", "eliminate_nop_dropout", "eliminate_nop_flatten", "eliminate_nop_monotone_argmax", "eliminate_nop_pad", "eliminate_nop_transpose", "eliminate_unused_initializer", "extract_constant_to_initializer", "fuse_add_bias_into_conv", "fuse_bn_into_conv", "fuse_consecutive_concats", "fuse_consecutive_log_softmax", "fuse_consecutive_reduce_unsqueeze", "fuse_consecutive_squeezes", "fuse_consecutive_transposes", "fuse_matmul_add_bias_into_gemm", "fuse_pad_into_conv", "fuse_transpose_into_gemm", "lift_lexical_references", "nop" ]

Export ONNX to TensorRT

  • Check TensorRT health via check_tensorrt_health func
  • Use TRTEngineBuilder.build_engine method to export ONNX to TensorRT:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    # get engine by itself
    engine = exporter.build_engine("/path/to/model.onnx")
    # or save engine to /path/to/model.trt
    exporter.build_engine("/path/to/model.onnx", engine_path="/path/to/model.trt")
    
  • fp16_mode is available:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    engine = exporter.build_engine("/path/to/model.onnx", fp16_mode=True)
    
  • int8_mode is available. It requires calibration_set of images as List[Any], load_image_func - func to correctly read and process images, max_image_shape - max image size as [C, H, W] to allocate correct size of memory:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    engine = exporter.build_engine(
        "/path/to/model.onnx",
        int8_mode=True,
        calibration_set=calibration_set,
        max_image_shape=max_image_shape,
        load_image_func=load_image_func,
    )
    
  • Also, additional params for builder config builder.create_builder_config can be put to kwargs.
  • Other params that can be used in class initialization:
    • opt_shape_dict: optimal shapes, default = {'input': [[1, 3, 224, 224], [1, 3, 224, 224], [1, 3, 224, 224]]}
    • max_workspace_size: max workspace size, default = [1, 30]
    • stream_batch_size: batch size for forward network during transferring to int8, default = 100
    • cache_file: int8_mode cache filename, default = "model.trt.int8calibration"

Inference via onnxruntime on CPU and onnx_tensort on GPU

  • Base class ONNXWrapper __init__ has the structure as below:
    def __init__(
        self,
        onnx_path: str,
        gpu_device_id: Optional[int] = None,
        intra_op_num_threads: Optional[int] = 0,
        inter_op_num_threads: Optional[int] = 0,
    ) -> None:
        """
        :param onnx_path: onnx-file path, required
        :param gpu_device_id: gpu device id to use, default = 0
        :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :type onnx_path: str
        :type gpu_device_id: int
        :type intra_op_num_threads: int
        :type inter_op_num_threads: int
        """
        if gpu_device_id is None:
            import onnxruntime
    
            self.is_using_tensorrt = False
            ort_session_options = onnxruntime.SessionOptions()
            ort_session_options.intra_op_num_threads = intra_op_num_threads
            ort_session_options.inter_op_num_threads = inter_op_num_threads
            self.ort_session = onnxruntime.InferenceSession(
                onnx_path, ort_session_options
            )
    
        else:
            import onnx
            import onnx_tensorrt.backend as backend
    
            self.is_using_tensorrt = True
            model_proto = onnx.load(onnx_path)
            for gr_input in model_proto.graph.input:
                gr_input.type.tensor_type.shape.dim[0].dim_value = 1
    
            self.engine = backend.prepare(
                model_proto, device=f"CUDA:{gpu_device_id}"
            )
    
  • ONNXWrapper.run method assumes the use of such a structure:
    img = self._process_img_(img)
    if self.is_using_tensorrt:
        preds = self.engine.run(img)
    else:
        ort_inputs = {self.ort_session.get_inputs()[0].name: img}
        preds = self.ort_session.run(None, ort_inputs)
    
    preds = self._process_preds_(preds)
    

Inference via onnxruntime on CPU and TensorRT on GPU

  • Base class TRTWrapper __init__ has the structure as below:
    def __init__(
        self,
        onnx_path: Optional[str] = None,
        trt_path: Optional[str] = None,
        gpu_device_id: Optional[int] = None,
        intra_op_num_threads: Optional[int] = 0,
        inter_op_num_threads: Optional[int] = 0,
        fp16_mode: bool = False,
    ) -> None:
        """
        :param onnx_path: onnx-file path, default = None
        :param trt_path: onnx-file path, default = None
        :param gpu_device_id: gpu device id to use, default = 0
        :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param fp16_mode: use fp16_mode if class initializes only with
            onnx_path on GPU, default = False
        :type onnx_path: str
        :type trt_path: str
        :type gpu_device_id: int
        :type intra_op_num_threads: int
        :type inter_op_num_threads: int
        :type fp16_mode: bool
        """
        if gpu_device_id is None:
            import onnxruntime
    
            self.is_using_tensorrt = False
            ort_session_options = onnxruntime.SessionOptions()
            ort_session_options.intra_op_num_threads = intra_op_num_threads
            ort_session_options.inter_op_num_threads = inter_op_num_threads
            self.ort_session = onnxruntime.InferenceSession(
                onnx_path, ort_session_options
            )
    
        else:
            self.is_using_tensorrt = True
            if trt_path is None:
                builder = TRTEngineBuilder()
                trt_path = builder.build_engine(onnx_path, fp16_mode=fp16_mode)
    
            self.trt_session = TRTRunWrapper(trt_path)
    
  • TRTWrapper.run method assumes the use of such a structure:
    img = self._process_img_(img)
    if self.is_using_tensorrt:
        preds = self.trt_session.run(img)
    else:
        ort_inputs = {self.ort_session.get_inputs()[0].name: img}
        preds = self.ort_session.run(None, ort_inputs)
    
    preds = self._process_preds_(preds)
    

Environment

TensorRT

  • TensorRT installing guide is here
  • Required CUDA-Runtime, CUDA-ToolKit
  • Also, required additional python packages not included to setup.cfg (it depends upon CUDA environment version):
    • pycuda
    • nvidia-tensorrt
    • nvidia-pyindex

onnx_tensorrt

  • onnx_tensorrt requires cuda-runtime and tensorrt.
  • To install:
    git clone --depth 1 --branch 21.02 https://github.com/onnx/onnx-tensorrt.git
    cd onnx-tensorrt
    cp -r onnx_tensorrt /usr/local/lib/python3.8/dist-packages
    cd ..
    rm -rf onnx-tensorrt
    
Owner
Alex Gorodnitskiy
Computer Vision Engineer 🤖
Alex Gorodnitskiy
Custom IMDB Dataset is extracted between 2020-2021 and custom distilBERT model is trained for movie success probability prediction

IMDB Success Predictor Project involves Web Scraping custom IMDB data between 2020 and 2021 of 10000 movies and shows sorted by number of votes ,fine

Gautam Diwan 1 Jan 18, 2022
Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra

850-Safra-DS-ModuloI Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra Para aprender mais Git https://learngitbranc

Brian Nunes 7 Dec 10, 2022
NAS Benchmark in "Prioritized Architecture Sampling with Monto-Carlo Tree Search", CVPR2021

NAS-Bench-Macro This repository includes the benchmark and code for NAS-Bench-Macro in paper "Prioritized Architecture Sampling with Monto-Carlo Tree

35 Jan 03, 2023
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
🎯 A comprehensive gradient-free optimization framework written in Python

Solid is a Python framework for gradient-free optimization. It contains basic versions of many of the most common optimization algorithms that do not

Devin Soni 565 Dec 26, 2022
Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022
Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

73 Jan 01, 2023
A font family with a great monospaced variant for programmers.

Fantasque Sans Mono A programming font, designed with functionality in mind, and with some wibbly-wobbly handwriting-like fuzziness that makes it unas

Jany Belluz 6.3k Jan 08, 2023
Official source code to CVPR'20 paper, "When2com: Multi-Agent Perception via Communication Graph Grouping"

When2com: Multi-Agent Perception via Communication Graph Grouping This is the PyTorch implementation of our paper: When2com: Multi-Agent Perception vi

34 Nov 09, 2022
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023
The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

machen 11 Nov 27, 2022
Amazing-Python-Scripts - 🚀 Curated collection of Amazing Python scripts from Basics to Advance with automation task scripts.

📑 Introduction A curated collection of Amazing Python scripts from Basics to Advance with automation task scripts. This is your Personal space to fin

Avinash Ranjan 1.1k Dec 29, 2022
On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

Understanding Bayesian Classification This repository hosts the code to reproduce the results presented in the paper On Uncertainty, Tempering, and Da

Sanyam Kapoor 18 Nov 17, 2022
Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer Paper on arXiv Public PyTorch implementation of two-stage peer-reg

NNAISENSE 38 Oct 14, 2022
SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

SweiNet SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging. SweiNet takes as in

Felix Jin 3 Mar 31, 2022
This is the paddle code for SeBoW(Self-Born wiring for neural trees), a kind of neural tree born form a large search space

SeBoW: Self-Born Wiring for neural trees(PaddlePaddle version) This is the paddle code for SeBoW(Self-Born wiring for neural trees), a kind of neural

HollyLee 13 Dec 08, 2022
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting

[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting [Paper] [Project Website] [Google Colab] We propose a method for converting a

Virginia Tech Vision and Learning Lab 6.2k Jan 01, 2023
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)

Vision Transformer for Fast and Efficient Scene Text Recognition (ICDAR 2021) ViTSTR is a simple single-stage model that uses a pre-trained Vision Tra

Rowel Atienza 198 Dec 27, 2022
This repository contains the code for: RerrFact model for SciVer shared task

RerrFact This repository contains the code for: RerrFact model for SciVer shared task. Setup for Inference 1. Download SciFact database Download the S

Ashish Rana 1 May 22, 2022
Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization This repository contains the source code for the paper (link wi

Rakuten Group, Inc. 0 Nov 19, 2021