Ppq - A powerful offline neural network quantization tool with customized IR

Overview

PPL Quantization Tool

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool with customized IR, executor, dispatcher and optimization passes.

Features

  • Quantable graph, a quantization-oriented network representation.
  • Quantize with CUDA: quantization simulation is 3x ~ 50x faster than PyTorch.
  • Hardware-friendly: simulated calculations are mostly identical to those on hardware.
  • Multi-platform support.
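
As a rough illustration of what "quantization simulation" means in the feature list above, the following sketch quantizes floats to int8 levels and immediately dequantizes them, so the rounding error can be observed in floating point. It is a generic symmetric per-tensor scheme written in plain Python under our own assumptions (the `fake_quantize` name and its parameters are hypothetical); it is not PPQ's CUDA kernel.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor quantize-dequantize of a list of floats.

    Returns the dequantized values and the scale that was used.
    """
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round to the nearest integer level, then clamp to the signed range.
    levels = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in levels], scale


if __name__ == "__main__":
    data = [-1.0, -0.26, 0.0, 0.5, 1.0]
    simulated, scale = fake_quantize(data)
    for original, approx in zip(data, simulated):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.5f})")
```

Each simulated value differs from the original by at most half a quantization step, which is the kind of error a quantization simulator lets you inspect before deploying to hardware.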

Installation

To unleash the full power of this quantization tool, at least one CUDA computing device is required. Install CUDA from the CUDA Toolkit; PPL Quantization Tool uses the CUDA compiler to compile CUDA kernels at runtime.

ATTENTION: PyTorch may ship with only a minimal set of CUDA libraries, which does not satisfy the requirements of this tool. You have to install CUDA from NVIDIA manually.

ATTENTION: Make sure your Python version is >= 3.6.0. PPL Quantization Tool uses language features that are only supported by Python >= 3.6.0.

  • Install from source:
  1. Run the following commands in your terminal (Windows users: use the command line instead).
git clone https://github.com/openppl-public/ppq.git
cd ppq
python setup.py install
  2. Wait for Python to finish the installation, and pray that it is bug free.
  • Install from wheel:
  1. Download the compiled Python wheel from the following link: PPL Quantization Tool
  2. Run the following command in your terminal or command line (Windows): "pip install ppq.wheel", and pray that it is bug free.

Tutorials and Examples

  1. The user guide and system design docs can be found at /doc/pages/instructions of this repository. PPL Quantization Tool documents are written in pure HTML5.
  2. Examples can be found at /ppq/samples.
  3. Quantize your network with the following code:
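import torch
from ppq import QuantizationSettingFactory, TargetPlatform
from ppq.api import export_ppq_graph, quantize_torch_model

# Inputs expected by the call below. `model` is assumed to be your own
# torch.nn.Module, already moved to DEVICE. The random tensors are
# placeholders; use real calibration data in practice.
DEVICE = 'cuda'
calibration_dataloader = [torch.rand(1, 3, 224, 224) for _ in range(32)]
collate_fn = lambda x: x.to(DEVICE)  # moves each calibration batch to DEVICE
quant_setting = QuantizationSettingFactory.default_setting()

# quantize your model within one single line: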
quantized = quantize_torch_model(
    model=model, calib_dataloader=calibration_dataloader,
    calib_steps=32, input_shape=(1, 3, 224, 224),
    setting=quant_setting, collate_fn=collate_fn,
    platform=TargetPlatform.PPL_CUDA_INT8,
    device=DEVICE, verbose=0)

# export quantized graph with another line:
export_ppq_graph(
    graph=quantized, platform=TargetPlatform.PPL_CUDA_INT8,
    graph_save_to='Output/quantized(onnx).onnx',
    config_save_to='Output/quantized(onnx).json')

Contact Us

WeChat Official Account: OpenPPL
QQ Group: 627853444

Email: [email protected]

Other Resources

Contributions

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

Benchmark

PPQ is tested with models from mmlab-classification, mmlab-detection, mmlab-segmentation and mmlab-editing. Part of our testing results is listed here.

  • No quantization optimization procedure is applied to the following models.

| Model | Type | Calibration | Dispatcher | Metric | PPQ(sim) | PPLCUDA | FP32 |
| Resnet-18 | Classification | 512 imgs | conservative | Acc-Top-1 | 69.50% | 69.42% | 69.88% |
| ResNeXt-101 | Classification | 512 imgs | conservative | Acc-Top-1 | 78.46% | 78.37% | 78.66% |
| SE-ResNet-50 | Classification | 512 imgs | conservative | Acc-Top-1 | 77.24% | 77.26% | 77.76% |
| ShuffleNetV2 | Classification | 512 imgs | conservative | Acc-Top-1 | 69.13% | 68.85% | 69.55% |
| MobileNetV2 | Classification | 512 imgs | conservative | Acc-Top-1 | 70.99% | 71.1% | 71.88% |
| retinanet | Detection | 32 imgs | pplnn | bbox_mAP | 36.1% | 36.1% | 36.4% |
| faster_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 36.6% | 36.7% | 37.0% |
| fsaf | Detection | 32 imgs | pplnn | bbox_mAP | 36.5% | 36.6% | 37.4% |
| mask_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 37.7% | 37.6% | 37.9% |
| deeplabv3 | Segmentation | 32 imgs | conservative | aAcc / mIoU | 96.13% / 78.81% | 96.14% / 78.89% | 96.17% / 79.12% |
| deeplabv3plus | Segmentation | 32 imgs | conservative | aAcc / mIoU | 96.27% / 79.39% | 96.26% / 79.29% | 96.29% / 79.60% |
| fcn | Segmentation | 32 imgs | conservative | aAcc / mIoU | 95.75% / 74.56% | 95.62% / 73.96% | 95.68% / 72.35% |
| pspnet | Segmentation | 32 imgs | conservative | aAcc / mIoU | 95.79% / 77.40% | 95.79% / 77.41% | 95.83% / 77.74% |
| srcnn | Editing | 32 imgs | conservative | PSNR / SSIM | 27.88% / 79.70% | 27.88% / 79.07% | 28.41% / 81.06% |
| esrgan | Editing | 32 imgs | conservative | PSNR / SSIM | 27.84% / 75.20% | 27.49% / 72.90% | 27.51% / 72.84% |

  • PPQ(sim) stands for PPQ quantization simulator's result.
  • Dispatcher stands for dispatching policy of PPQ.
  • Classification models are evaluated with ImageNet, Detection and Segmentation models with the COCO dataset, and Editing models with the DIV2K dataset.
  • All calibration datasets are randomly picked from training data.
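
To read the benchmark as accuracy lost to quantization, the simulated result can be subtracted from the FP32 baseline. The short sketch below does this for the classification rows only. The numbers are copied from the table above, while the dictionary layout and the `accuracy_drop` helper are our own illustration.

```python
# Top-1 accuracy (%) copied from the classification rows of the benchmark.
results = {
    "Resnet-18":    {"ppq_sim": 69.50, "fp32": 69.88},
    "ResNeXt-101":  {"ppq_sim": 78.46, "fp32": 78.66},
    "SE-ResNet-50": {"ppq_sim": 77.24, "fp32": 77.76},
    "ShuffleNetV2": {"ppq_sim": 69.13, "fp32": 69.55},
    "MobileNetV2":  {"ppq_sim": 70.99, "fp32": 71.88},
}


def accuracy_drop(row):
    """Accuracy lost to quantization, in percentage points."""
    return round(row["fp32"] - row["ppq_sim"], 2)


drops = {name: accuracy_drop(row) for name, row in results.items()}
worst = max(drops, key=drops.get)  # model with the largest drop

if __name__ == "__main__":
    for name, drop in drops.items():
        print(f"{name:<13} {drop:.2f}")
    print("largest drop:", worst)
```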

License

This project is distributed under the Apache License, Version 2.0.

Comments
  • PPQ can not complie cuda extensions, please check your compiler and system environment, PPQ will disable CUDA KERNEL for now.

    Environment:

      RTX2080Ti
      Python 3.8.13
      ninja 1.5.1
      ppq 0.6.4
      PyTorch 1.12.0
      tensorrt 8.4.1.5
      export PATH=/usr/local/cuda-11.1/bin:$PATH
      export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH

    This message is raised when importing ppq. Could you please give some advice? @zchrissirhcz @ouonline

    opened by songkq 26
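  • Error when running on CPU

    I'm back again. I tried to run the efficientnet-lite4-11.onnx model from the official ONNX model zoo on CPU and got an error. With the kl or mse calibration policy, an assert at line 582 of quantization/optim/refine.py is triggered, saying that some operator was not quantized correctly. The problem does not occur with the minmax policy. All of this was done on CPU (I have no GPU available here).

    Can I work around this error by changing some code, or can I only use the minmax policy on CPU for now?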

    opened by Menace-Dragon 18
  • AttributeError: 'Operation' object has no attribute 'config'

      Traceback (most recent call last):
        File "ProgramEntrance.py", line 200, in <module>
          export_ppq_graph(
        File "/root/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/ppq-0.6.5.1-py3.8.egg/ppq/api/interface.py", line 628, in export_ppq_graph
          exporter.export(file_path=graph_save_to, config_path=config_save_to, graph=graph, **kwargs)
        File "/root/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/ppq-0.6.5.1-py3.8.egg/ppq/parser/trt_exporter.py", line 53, in export
          self.export_quantization_config(config_path, graph)
        File "/root/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/ppq-0.6.5.1-py3.8.egg/ppq/parser/trt_exporter.py", line 29, in export_quantization_config
          input_cfg = op.config.input_quantization_config[0]
      AttributeError: 'Operation' object has no attribute 'config'

    The error occurs when calling trt_int8 in ProgramEntrance.py, but calling PPL_CUDA_INT8 does not have this problem.

    opened by kaizhong2021 14
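  • About the bug at line 125 of scheduler/dispatcher.py

    Nice project! However, I got an error when running the efficientnet-lite4-11.onnx model from the official ONNX model zoo. The error is raised at line 125 of scheduler/dispatcher.py. My analysis of the cause is as follows:

    1. The model's graph contains the flow ···-->Conv-->BN-->Clip-->···. PPQ fuses ConvBN by default, but the fused operation is appended to the end of graph.operations.
    2. When a platform is bound to Clip, the statement at line 125 of scheduler/dispatcher.py is executed.

    Putting 1 and 2 together, dispatching_table has no information about the ConvBN operation at that point, which causes the error. It is an ordering problem, and I leave it to the author to decide how best to fix it.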

    opened by Menace-Dragon 11
  • RuntimeError: Error happens when dealing with operation ConstantOfShape_1246(TargetPlatform.SOI)

    I ran ppq/samples/Tutorial/quantize.py with a swin-transformer model and TRT_INT8 as the target platform, and got the following error:

      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/executor/torch.py", line 541, in __forward
          outputs = operation_forward_func(operation, inputs, self._executing_context)
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/executor/op/torch/default.py", line 1197, in ConstantOfShape_forward
          output = torch.Tensor().new_full(
      TypeError: new_full(): argument 'size' must be tuple of ints, not list

    The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "quantize_test.py", line 71, in <module>
          quantized = quantize_onnx_model(
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/core/defs.py", line 54, in _wrapper
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/api/interface.py", line 259, in quantize_onnx_model
          quantizer.quantize(
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/core/defs.py", line 54, in _wrapper
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/quantization/quantizer/base.py", line 61, in quantize
          executor.tracing_operation_meta(inputs=inputs)
        File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/core/defs.py", line 54, in _wrapper
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/executor/torch.py", line 603, in tracing_operation_meta
          self.__forward(
        File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.6-py3.8.egg/ppq/executor/torch.py", line 568, in __forward
          raise RuntimeError(f'Error happens when dealing with operation {str(operation)}') from _
      RuntimeError: Error happens when dealing with operation ConstantOfShape_1246(TargetPlatform.SOI) - inputs:['onnx::ConstantOfShape_4472'], outputs:['onnx::Concat_4473']
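
The TypeError in the first traceback says that `size` must be a tuple of ints. One possible workaround, sketched here under the assumption that the shape arrives as a list whose elements may not be plain Python ints, is to coerce it before the call. The helper name `to_size_tuple` is hypothetical and is not part of PPQ or PyTorch.

```python
def to_size_tuple(shape):
    """Coerce a shape (list, tuple, or other iterable of number-like
    values) into a tuple of plain Python ints."""
    return tuple(int(dim) for dim in shape)


if __name__ == "__main__":
    shape = [1, 49.0, 49]          # e.g. a shape read back from a tensor
    print(to_size_tuple(shape))    # a tuple that can be passed as `size`
    # Hypothetical use at the failing call site:
    #   output = torch.Tensor().new_full(size=to_size_tuple(shape), fill_value=value)
```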

    opened by shhn1 9
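  • Possible reason for the low performance of PPQ INT8 models exported to the onnx platform

    I mentioned this earlier under another issue, #329, but it seems better to open a new one.

    One possible reason PPQ currently performs poorly with int8 is the export format. Compared with the unquantized model, the model exported by PPQ not only introduces additional quantize and dequantize operations but also computes in fp32. The official ONNX int8 models are not exported this way. They use operators dedicated to quantized data, such as QLinearConv. Take mobilenet as an example (the download repository is here). The unquantized model and the officially quantized model are shown in these figures: [image] [image]

    The figure below shows the model PPQ produces when the export platform is set to ONNXRUNTIME: [image]

    As you can see, the official implementation uses operators designed for quantized data, such as QLinearConv, so it does not insert quantize and dequantize operators frequently. It also takes graph optimization into account (for example, it optimizes away clip, and a further possible optimization is that a QConv followed by a QConv might also be fusable). PPQ, in contrast, not only uses a great many quantize and dequantize operators but also performs the actual computation in fp32. This may be why the onnx model exported by PPQ is inefficient.

    ~~This is only a guess and may not be correct.~~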
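
The two export styles contrasted in this issue can be compared on a toy dot product. In the sketch below, the first path dequantizes the operands and accumulates in floating point, as a quantize/dequantize graph would. The second path accumulates in integers and rescales once, in the spirit of an integer operator such as QLinearConv. All names and values are illustrative assumptions, and neither path is PPQ's or onnxruntime's implementation.

```python
def quantize(values, scale):
    """Map floats to int8 levels with a symmetric per-tensor scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def compare_paths(x, w, x_scale, w_scale):
    qx, qw = quantize(x, x_scale), quantize(w, w_scale)
    # Q/DQ style: dequantize every operand, then multiply-accumulate in float.
    qdq_result = sum((a * x_scale) * (b * w_scale) for a, b in zip(qx, qw))
    # Integer style: multiply-accumulate in integers, rescale once at the end.
    int_result = sum(a * b for a, b in zip(qx, qw)) * (x_scale * w_scale)
    return qdq_result, int_result


if __name__ == "__main__":
    x = [0.5, -1.2, 0.3]
    w = [0.8, 0.1, -0.4]
    qdq, integer = compare_paths(x, w, x_scale=1.2 / 127, w_scale=0.8 / 127)
    print(qdq, integer)
```

Both paths give the same value up to floating-point rounding. They differ in where the arithmetic happens, which is the point the issue raises about runtime cost.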

    opened by Pzzzzz5142 8
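  • Export problem after PPL_DSP_INT8 quantization

    In GPU mode, when running RetinaFace (with ResNet50 as the backbone), the quantization process completes successfully, but the export fails with TypeError: Cannot convert Resize_133 to caffe op. Debugging shows that this happens because the check at line 439 of ppq/parser/caffe/caffe_export_utils.py is not satisfied.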

    opened by Menace-Dragon 8
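  • Quantized model converted to a tensorrt int8 engine: inference results do not align

    Hello, I am trying to use PPQ quantization to obtain a TensorRT int8 model. I found that when the model is relatively large, converting the QDQ ONNX model to TRT int8 seems to have a performance problem (the results cannot be aligned). Specifically, a small model such as mnist aligns (error on the order of 1e-7), while a somewhat larger model such as resnet50 shows a large error. I am not sure whether I did something wrong. So far I tend to believe that the error is introduced by the TensorRT conversion, so I opened an issue in the TensorRT repo, see https://github.com/NVIDIA/TensorRT/issues/2103. I would like to ask whether you have run into a similar problem. Thanks!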

    opened by FreemanHsu 7
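  • The Upsample operator does not seem to support quantization; ConvTranspose does not seem to support BN fold

    a. When running an onnx test model, I got the error that the Upsample operator has no backend on target platform. It seems PPQ does not yet support quantizing the Upsample operator. Is support likely to be added later?

    b. When running another onnx test model, the model contains this computation graph: ...-->ConvTranspose-->BatchNorm-->ReLU-->... and an error is raised saying that the ConvTranspose operator cannot be folded with BN.

    c. One more question, if I may: if the computation graph is ...-->BatchNorm-->Conv-->ReLU-->..., can it be folded?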
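
For readers unfamiliar with BN folding, the arithmetic behind it can be shown on a single channel. This is the standard textbook identity written as a plain-Python sketch with hypothetical helper names. It says nothing about which operator patterns PPQ itself can fold.

```python
import math


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of one value in one channel."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN statistics into the weight and bias of the preceding
    linear operator (one output channel)."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


if __name__ == "__main__":
    w, b = 0.7, -0.2                       # one channel of a linear op
    bn = dict(gamma=1.5, beta=0.1, mean=0.3, var=0.04)
    fw, fb = fold_bn(w, b, **bn)
    x = 2.0
    print(batch_norm(w * x + b, **bn), fw * x + fb)  # the two should agree
```

For a real Conv or ConvTranspose layer the same per-channel scale is applied along the output-channel axis of the weight tensor.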

    opened by Menace-Dragon 7
  • Unable to correctly obtain the output of bn

    The error is as follows:

      File "/workspace/ppq/ppq/IR/morph.py", line 275, in format_sng_bn
        bn_out_var = bn_op.outputs[0]
    IndexError: list index out of range
    

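    The model is one officially provided by onnx.

    The code is the sample code. I only modified the device part, because I do not have an NVIDIA card at the moment.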

    # ---------------------------------------------------------------
    # This script shows how to run inference on a PPQ-exported model with onnxruntime.
    # Note that Onnxruntime can run all kinds of quantization schemes, but quantization brings almost no speedup on Onnxruntime.
    # You can use Onnxruntime to verify a quantization scheme and the correctness of ppq quantization, but it is not a reasonable deployment platform.
    # Modify QUANT_PLATFROM to use a different quantization scheme.
    
    # This script exports the ppq internal graph to onnxruntime.
    # Note that onnx is designed as an Open Neural Network Exchange format.
    # It has the capability to describe most of ppq's quantization policies, including combinations of:
    #   Symmetrical, Asymmetrical, POT, Per-channel, Per-Layer
    # However, onnxruntime can not accelerate quantized models in most cases,
    # so you are supposed to use onnxruntime for verifying your network quantization result only.
    # ---------------------------------------------------------------
    
    # For this onnx inference test, all test data is randomly picked.
    # If you want to use real data, just rewrite the definition of SAMPLES
    import onnxruntime
    import torch
    from ppq import *
    from ppq.api import *
    from tqdm import tqdm
    
    QUANT_PLATFROM = TargetPlatform.TRT_INT8
    MODEL = "converted.onnx"
    INPUT_SHAPE = [1, 3, 480, 640]
    SAMPLES = [
        torch.rand(size=INPUT_SHAPE) for _ in range(256)
    ]  # rewrite this to use real data.
    DEVICE = "cpu"
    FINETUNE = True
    QS = QuantizationSettingFactory.default_setting()
    EXECUTING_DEVICE = "cpu"
    REQUIRE_ANALYSE = True
    
    # -------------------------------------------------------------------
    # Commonly used tuning options are shown below:
    # -------------------------------------------------------------------
    QS.lsq_optimization = FINETUNE  # enable network retraining to reduce quantization error
    QS.lsq_optimization_setting.steps = 500  # retraining steps; affects training time, 500 steps take a few minutes
    QS.lsq_optimization_setting.collecting_device = (
        "cpu"  # where cached data is kept; 'cuda' keeps it on the GPU, switch to 'cpu' if GPU memory runs out
    )
    
    if QUANT_PLATFROM in {
        TargetPlatform.PPL_DSP_INT8,  # these platforms use per-tensor quantization
        TargetPlatform.HEXAGON_INT8,
        TargetPlatform.SNPE_INT8,
        TargetPlatform.METAX_INT8_T,
        TargetPlatform.FPGA_INT8,
    }:
        QS.equalization = True  # per-tensor quantization platforms need equalization
    
    if QUANT_PLATFROM in {
        TargetPlatform.PPL_CUDA_INT8,  # before doing this, make sure your execution framework supports mixed-precision execution and floating-point computation
        TargetPlatform.TRT_INT8,
    }:
        QS.dispatching_table.append(operation="OP NAME", platform=TargetPlatform.FP32)
    
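    print("About to quantize your network, check the following settings:")
    print(f"TARGET PLATFORM      : {QUANT_PLATFROM.name}")
    print(f"NETWORK INPUTSHAPE   : {INPUT_SHAPE}")
    
    # ENABLE CUDA KERNEL speeds up quantization by 3x ~ 10x, but it cannot be compiled without the corresponding build environment.
    # You can try installing the build environment, or quantize without CUDA KERNEL: just remove with ENABLE_CUDA_KERNEL():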
    # with ENABLE_CUDA_KERNEL():
    with open("a", "w") as fl:
        qir = quantize_onnx_model(
            onnx_import_file=MODEL,
            calib_dataloader=SAMPLES,
            calib_steps=128,
            setting=QS,
            input_shape=INPUT_SHAPE,
            collate_fn=lambda x: x.to(EXECUTING_DEVICE),
            platform=QUANT_PLATFROM,
            do_quantize=True,
        )
    
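        # -------------------------------------------------------------------
        # When computing quantization error, PPQ uses the reciprocal of the signal-to-noise ratio as the metric, i.e. noise energy / signal energy.
        # A quantization error of 0.1 means that quantization noise accounts for about 10% of the overall signal energy.
        # Note that graphwise_error_analyse measures the accumulated error.
        # The last layer of a network usually has a large accumulated error, caused jointly by all the layers before it.
        # Use layerwise_error_analyse to analyse the source of the error layer by layer.
        # -------------------------------------------------------------------
        print("Computing network quantization error (SNR); the error of the last layer should be below 0.1 to ensure quantization accuracy:")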
        reports = graphwise_error_analyse(
            graph=qir,
            running_device=EXECUTING_DEVICE,
            steps=32,
            dataloader=SAMPLES,
            collate_fn=lambda x: x.to(EXECUTING_DEVICE),
        )
        for op, snr in reports.items():
            if snr > 0.1:
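                ppq_warning(f"The accumulated quantization error of layer {op} is significant, consider optimizing it")
    
        if REQUIRE_ANALYSE:
            print("Computing layerwise quantization error (SNR); the independent quantization error of each layer should be below 0.1 to ensure quantization accuracy:")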
            layerwise_error_analyse(
                graph=qir,
                running_device=EXECUTING_DEVICE,
                interested_outputs=None,
                dataloader=SAMPLES,
                collate_fn=lambda x: x.to(EXECUTING_DEVICE),
            )
    
        print("Network quantization finished, generating target files:")
        export_ppq_graph(
            graph=qir, platform=QUANT_PLATFROM, graph_save_to="model_int8.onnx"
        )
    
        exit(0)
    
        # -------------------------------------------------------------------
        # Record the input and output names; onnxruntime needs them at run time.
        # This version only handles a single input and a single output; adapt it yourself for multiple inputs or outputs.
        # -------------------------------------------------------------------
        int8_input_names = [name for name, _ in qir.inputs.items()]
        int8_output_names = [name for name, _ in qir.outputs.items()]
    
        # -------------------------------------------------------------------
        # Run inference with onnxruntime.
        # As of 2022.05, onnxruntime is very slow at int8, so do not expect it to be fast.
        # If you know how to make it faster, or onnxruntime has been updated, feel free to contact me.
        # -------------------------------------------------------------------
        session = onnxruntime.InferenceSession(
            "model_int8.onnx", providers=["CUDAExecutionProvider"]
        )
        onnxruntime_results = []
        for sample in tqdm(
            SAMPLES, desc="ONNXRUNTIME GENERATEING OUTPUTS", total=len(SAMPLES)
        ):
            result = session.run(None, {int8_input_names[0]: convert_any_to_numpy(sample)})
            onnxruntime_results.append(result)
    
    

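    I also converted the opset, to 12, but the output of the bn layer still cannot be read. In addition, I would like to quantize an onnx model into onnx format. How should I do that? From other issues it looks like QUANT_PLATFROM should be set to TargetPlatform.ONNXRUNTIME, but the current version does not seem to support this platform.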
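
The comments in the script above describe the error metric as noise energy divided by signal energy. A minimal sketch of such a ratio, written in plain Python under our own assumptions rather than taken from PPQ's implementation, looks like this. The function name is hypothetical.

```python
def noise_to_signal_ratio(reference, quantized):
    """Energy of the quantization noise divided by the energy of the
    reference signal (the reciprocal of a linear-scale SNR)."""
    noise = sum((q - r) ** 2 for r, q in zip(reference, quantized))
    signal = sum(r ** 2 for r in reference)
    if signal == 0:
        # No signal energy: the ratio is only defined when there is no noise.
        return 0.0 if noise == 0 else float("inf")
    return noise / signal


if __name__ == "__main__":
    fp32_output = [3.0, 4.0]
    int8_output = [3.0, 3.0]
    ratio = noise_to_signal_ratio(fp32_output, int8_output)
    print(f"noise / signal = {ratio:.2f}")  # compare against the 0.1 threshold
```

With this definition a value of 0.1 means the noise carries about 10% of the signal's energy, which matches the threshold used in the script.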

    opened by Pzzzzz5142 6
  • RuntimeError of Shape op during Calibration dataset progress and finetune progress

    [image]

    Configuration:

      TARGET_PLATFORM = TargetPlatform.NXP_INT8   # choose your target platform
      MODEL_TYPE = NetworkFramework.ONNX          # or NetworkFramework.CAFFE
      INPUT_LAYOUT = 'chw'                        # input data layout, chw or hwc
      NETWORK_INPUTSHAPE = [16, 1, 40, 61]        # input shape of your network
      CALIBRATION_BATCHSIZE = 16                  # batchsize of calibration dataset
      EXECUTING_DEVICE = 'cuda'                   # 'cuda' or 'cpu'.
      REQUIRE_ANALYSE = True
      DUMP_RESULT = False

      SETTING = UnbelievableUserFriendlyQuantizationSetting(
          platform = TARGET_PLATFORM, finetune_steps = 2500,
          finetune_lr = 1e-3, calibration = 'percentile',
          equalization = True, non_quantable_op = None)

      dataloader = DataLoader(
          dataset=calibration_dataset,
          batch_size=32, shuffle=True)

      quantized = quantize(
          working_directory=WORKING_DIRECTORY, setting=SETTING,
          model_type=MODEL_TYPE, executing_device=EXECUTING_DEVICE,
          input_shape=NETWORK_INPUTSHAPE, target_platform=TARGET_PLATFORM,
          dataloader=dataloader, calib_steps=250)

    Problem description:

    At iteration 213 the Shape operator reports the error above. I worked out that the batch size in this iteration was 19. I printed a log inside the dataloader iterator and confirmed that this finetune batch really only delivered 19 samples. I later found that the dataset happens to be fully traversed once at iteration 213. After I changed both the finetune steps and calib_steps to 100 and adjusted the calibration dataset to 32*100 samples, everything ran normally. The model file is attached: model.zip
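
The failure described here comes from a final batch that is smaller than the others. A common way to avoid such a batch is to drop it. With torch's DataLoader this is the `drop_last=True` argument, and the plain-Python sketch below shows the same idea with a hypothetical helper, without depending on torch.

```python
def iter_full_batches(samples, batch_size):
    """Yield consecutive batches of exactly `batch_size` samples and
    silently drop a trailing batch that would be smaller."""
    for start in range(0, len(samples) - batch_size + 1, batch_size):
        yield samples[start:start + batch_size]


if __name__ == "__main__":
    dataset = list(range(83))              # 83 = 2 * 32 + 19 samples
    batches = list(iter_full_batches(dataset, batch_size=32))
    print([len(batch) for batch in batches])   # the 19 leftover samples are skipped
```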

    opened by lycfly 6
  • parse onnx model failed

    problem:

    [email protected]:~/workspace$ ~/libraries/TensorRT-8.4.1.5/bin/trtexec --onnx=./unet-q.onnx --saveEngine=unet-int8.engine
    &&&& RUNNING TensorRT.trtexec [TensorRT v8401] # /home/kls/libraries/TensorRT-8.4.1.5/bin/trtexec --onnx=./unet-q.onnx --saveEngine=unet-int8.engine
    [01/06/2023-17:32:20] [I] === Model Options ===
    [01/06/2023-17:32:20] [I] Format: ONNX
    [01/06/2023-17:32:20] [I] Model: ./unet-q.onnx
    [01/06/2023-17:32:20] [I] Output:
    [01/06/2023-17:32:20] [I] === Build Options ===
    [01/06/2023-17:32:20] [I] Max batch: explicit batch
    [01/06/2023-17:32:20] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
    [01/06/2023-17:32:20] [I] minTiming: 1
    [01/06/2023-17:32:20] [I] avgTiming: 8
    [01/06/2023-17:32:20] [I] Precision: FP32
    [01/06/2023-17:32:20] [I] LayerPrecisions: 
    [01/06/2023-17:32:20] [I] Calibration: 
    [01/06/2023-17:32:20] [I] Refit: Disabled
    [01/06/2023-17:32:20] [I] Sparsity: Disabled
    [01/06/2023-17:32:20] [I] Safe mode: Disabled
    [01/06/2023-17:32:20] [I] DirectIO mode: Disabled
    [01/06/2023-17:32:20] [I] Restricted mode: Disabled
    [01/06/2023-17:32:20] [I] Build only: Disabled
    [01/06/2023-17:32:20] [I] Save engine: unet-int8.engine
    [01/06/2023-17:32:20] [I] Load engine: 
    [01/06/2023-17:32:20] [I] Profiling verbosity: 0
    [01/06/2023-17:32:20] [I] Tactic sources: Using default tactic sources
    [01/06/2023-17:32:20] [I] timingCacheMode: local
    [01/06/2023-17:32:20] [I] timingCacheFile: 
    [01/06/2023-17:32:20] [I] Input(s)s format: fp32:CHW
    [01/06/2023-17:32:20] [I] Output(s)s format: fp32:CHW
    [01/06/2023-17:32:20] [I] Input build shapes: model
    [01/06/2023-17:32:20] [I] Input calibration shapes: model
    [01/06/2023-17:32:20] [I] === System Options ===
    [01/06/2023-17:32:20] [I] Device: 0
    [01/06/2023-17:32:20] [I] DLACore: 
    [01/06/2023-17:32:20] [I] Plugins:
    [01/06/2023-17:32:20] [I] === Inference Options ===
    [01/06/2023-17:32:20] [I] Batch: Explicit
    [01/06/2023-17:32:20] [I] Input inference shapes: model
    [01/06/2023-17:32:20] [I] Iterations: 10
    [01/06/2023-17:32:20] [I] Duration: 3s (+ 200ms warm up)
    [01/06/2023-17:32:20] [I] Sleep time: 0ms
    [01/06/2023-17:32:20] [I] Idle time: 0ms
    [01/06/2023-17:32:20] [I] Streams: 1
    [01/06/2023-17:32:20] [I] ExposeDMA: Disabled
    [01/06/2023-17:32:20] [I] Data transfers: Enabled
    [01/06/2023-17:32:20] [I] Spin-wait: Disabled
    [01/06/2023-17:32:20] [I] Multithreading: Disabled
    [01/06/2023-17:32:20] [I] CUDA Graph: Disabled
    [01/06/2023-17:32:20] [I] Separate profiling: Disabled
    [01/06/2023-17:32:20] [I] Time Deserialize: Disabled
    [01/06/2023-17:32:20] [I] Time Refit: Disabled
    [01/06/2023-17:32:20] [I] Inputs:
    [01/06/2023-17:32:20] [I] === Reporting Options ===
    [01/06/2023-17:32:20] [I] Verbose: Disabled
    [01/06/2023-17:32:20] [I] Averages: 10 inferences
    [01/06/2023-17:32:20] [I] Percentile: 99
    [01/06/2023-17:32:20] [I] Dump refittable layers:Disabled
    [01/06/2023-17:32:20] [I] Dump output: Disabled
    [01/06/2023-17:32:20] [I] Profile: Disabled
    [01/06/2023-17:32:20] [I] Export timing to JSON file: 
    [01/06/2023-17:32:20] [I] Export output to JSON file: 
    [01/06/2023-17:32:20] [I] Export profile to JSON file: 
    [01/06/2023-17:32:20] [I] 
    [01/06/2023-17:32:20] [I] === Device Information ===
    [01/06/2023-17:32:20] [I] Selected Device: NVIDIA A10
    [01/06/2023-17:32:20] [I] Compute Capability: 8.6
    [01/06/2023-17:32:20] [I] SMs: 72
    [01/06/2023-17:32:20] [I] Compute Clock Rate: 1.695 GHz
    [01/06/2023-17:32:20] [I] Device Global Memory: 22731 MiB
    [01/06/2023-17:32:20] [I] Shared Memory per SM: 100 KiB
    [01/06/2023-17:32:20] [I] Memory Bus Width: 384 bits (ECC enabled)
    [01/06/2023-17:32:20] [I] Memory Clock Rate: 6.251 GHz
    [01/06/2023-17:32:20] [I] 
    [01/06/2023-17:32:20] [I] TensorRT version: 8.4.1
    [01/06/2023-17:32:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +535, GPU +0, now: CPU 542, GPU 499 (MiB)
    [01/06/2023-17:32:21] [I] Start parsing network model
    [01/06/2023-17:32:21] [I] [TRT] ----------------------------------------------------------------
    [01/06/2023-17:32:21] [I] [TRT] Input filename:   ./unet-q.onnx
    [01/06/2023-17:32:21] [I] [TRT] ONNX IR version:  0.0.7
    [01/06/2023-17:32:21] [I] [TRT] Opset version:    13
    [01/06/2023-17:32:21] [I] [TRT] Producer name:    PPL Quantization Tool
    [01/06/2023-17:32:21] [I] [TRT] Producer version: 
    [01/06/2023-17:32:21] [I] [TRT] Domain:           
    [01/06/2023-17:32:21] [I] [TRT] Model version:    0
    [01/06/2023-17:32:21] [I] [TRT] Doc string:       
    [01/06/2023-17:32:21] [I] [TRT] ----------------------------------------------------------------
    [01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:720: While parsing node number 23 [QuantizeLinear -> "PPQ_Variable_297"]:
    [01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
    [01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:722: input: "outc.conv.weight"
    input: "PPQ_Variable_295"
    input: "PPQ_Variable_296"
    output: "PPQ_Variable_297"
    name: "PPQ_Operation_98"
    op_type: "QuantizeLinear"
    attribute {
      name: "axis"
      i: 0
      type: INT
    }
    
    [01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:723: --- End node ---
    [01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:1096 In function QuantDequantLinearHelper:
    [6] Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"
    [01/06/2023-17:32:21] [E] Failed to parse onnx file
    [01/06/2023-17:32:21] [I] Finish parsing network model
    [01/06/2023-17:32:21] [E] Parsing model failed
    [01/06/2023-17:32:21] [E] Failed to create engine from model or file.
    [01/06/2023-17:32:21] [E] Engine set up failed
    
    opened by nanmi 1
  • evaluation_with_imagenet.py fails

    Running the official ResNet50 example, ppq/ppq/samples/Imagenet/evaluation_with_imagenet.py, reports an error:

      Test: [700 / 781] [email protected] 75.843 (75.843) [email protected] 92.812 (92.812)
      Evaluating Model...: 100%|██████████| 781/781 [00:34<00:00, 22.68it/s]
       * [email protected] 75.804 [email protected] 92.808
      [Warning] File Output/resnet50.onnx is already existed, Exporter will overwrite it.
      /opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:53: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
        warnings.warn("Specified provider '{}' is not in available provider names."
      Traceback (most recent call last):
        File "evaluation_with_imagenet.py", line 84, in <module>
          evaluate_onnx_module_with_imagenet(
        File "/home/li.sun/github/ppq/ppq/samples/Imagenet/Utilities/Imagenet/imagenet_util.py", line 103, in evaluate_onnx_module_with_imagenet
          sess = onnxruntime.InferenceSession(path_or_bytes=onnxruntime_model_path, providers=['CUDAExecutionProvider'])
        File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
          self._create_inference_session(providers, provider_options, disabled_optimizers)
        File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
          sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
      onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from Output/resnet50.onnx failed:Type Error: Type (tensor(float)) of output arg (PPQ_Variable_2) of node (PPQ_Operation_0) does not match expected type (tensor(int8)).

    Environment:

    1. ppq git commit id:

      commit 76e03261bad580e7c52e6f0856034fa9313f69b5 (HEAD -> master, origin/master, origin/HEAD)
      Author: AwesomeCodingBoy [email protected]
      Date: Tue Dec 13 14:03:47 2022 +0800

          Update inference_with_ncnn.md (#324)

    2. onnxruntime 1.13.1 with onnxruntime-gpu 1.13.1, or onnxruntime 1.8.1 with onnxruntime-gpu 1.8.1
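
The UserWarning in this log shows that 'CUDAExecutionProvider' was requested while only 'CPUExecutionProvider' was available. The sketch below shows one way to choose providers from whatever is available. It addresses only that warning, not the Type Error that actually aborts the run. The `pick_providers` helper is hypothetical, and in practice the `available` list would come from `onnxruntime.get_available_providers()`.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred execution providers that are actually
    available, keeping the preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {list(preferred)} is available: {list(available)}")
    return chosen


if __name__ == "__main__":
    # Stand-in for onnxruntime.get_available_providers() on a CPU-only install.
    available = ["CPUExecutionProvider"]
    print(pick_providers(available))
    # Hypothetical use:
    #   sess = onnxruntime.InferenceSession(path, providers=pick_providers(available))
```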

    opened by yuyinsl 2
  • How to export the quantized onnx file and save weight in int8 format?

    Hi, thank you very much for the wonderful tutorials on bilibili. As a beginner in quantization, I have learned a lot from the series of classes. To deploy an onnx model to a specific TPU, the vendor's current solution requires the output of the ONNX file to be stored in int8 format. However, the current default storage format is float32. Could you please tell me how to change this setting?

    For more details, see this issue: https://github.com/sophgo/tpu-mlir/issues/51. They suggested that we export the onnx in the first format.

    opened by Jackycheng0808 1
  • Convert Yolov4

    Hi, I have a yolov4 model that I want to run on TensorRT INT8. I read the documentation but am having a hard time following it as an English speaker. Can you please guide me on how to convert the model and prepare the dataset for the ProgramEntrance.py script? My dataset is in Yolo format.

    Thanks

    opened by Sayyam-Jain 1
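  • How can I tell whether a model is suitable for quantization and whether it can benefit from it?

    Hello, I learned about this project from the Bilibili videos. They explain things very clearly, and I have liked, favorited and shared them. The videos mention that some networks do not benefit from quantization and may even be hurt by it, but they do not explain in detail how to judge this. I am afraid of putting in a lot of effort only to get a disappointing result. Does this depend on the platform or the inference framework? For example, for android, arm64 and ncnn, are there any good criteria? Thanks!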

    opened by zuowanbushiwo 3
Releases(v0.6.5)
  • v0.6.5(Sep 2, 2022)

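    • Analyzer
      • Added a new analysis method, statistical_analyse
      • Multi-output operators can now be analysed
      • Redesigned how cosine similarity is computed
    • API
      • Added new api: load_native_graph
      • Added new api: register_network_quantizer
      • Added new api: register_network_parser
      • Added new api: register_network_exporter
      • api functions can now be called with setting=None
    • Executor
      • Supports 1d and 3d convolution
      • Supports 1d and 3d pooling
      • Supports 1d and 3d transposed convolution
      • Supports lstm, gru
      • Supports sin, cos
      • Supports abs
      • Supports sum
      • Supports Erf, Elu, Reciprocal
      • Rewrote the resize, slice and scatterND implementations
      • Removed the restrictions on registering operators; you can now override ppq's internal operator implementations
      • Fixed padding problems in Conv, Pooling, ConvTranspose and Pad; onnx 1d, 2d and 3d padding is now supported and runs with higher performance
    • Dispatcher
      • Added new dispatchers purseus, allin
      • Added a new data type abstraction, opsocket; it will be moved into ppq.IR in the next release
      • The default subgraph partitioning method is now purseus
      • Added dispatching warning messages
    • Observer
      • Added a new calibration method, OrderPreserving; order-preserving quantization will be applied to classification networks to improve classification performance
      • Added an asymmetric implementation of mse, and a c++ implementation of mse
    • Graph
      • Supports bn fusion for 1d and 3d convolution and transposed convolution
      • Variable has a new attribute, shape; modify shape directly to set a dynamic shape
      • The graph matching engine allows pattern matching with ep_expr = None
      • Supports copying graphs
    • Optim
      • The LSQ algorithm was rewritten with greatly improved performance; Advanced optimization was merged with LSQ and is now called CuLSQ
      • The Brecq algorithm was moved to legacy and is not recommended
      • The layerSplit and BiasCorrection algorithms were rewritten, with improved performance
      • The Layerwise Equalization algorithm was rewritten; it now supports 1d, 2d and 3d convolution and transposed convolution, and supports include act
      • Fixed an alignment error in the average pooling operator
      • Fixed problems related to bias and pad quantization
      • Removed RuntimePerlayerCalibrationPass; the related parameters no longer have any effect
      • Removed ConstantBakingPass; the related parameters no longer have any effect
      • Removed InplaceQuantizationSettingPass; the related parameters no longer have any effect
      • Removed the fuse_conv_add setting; the related optimization was moved to the legacy file and must now be invoked manually
    • Doc
      • Added documentation for common optimization passes
      • Added yolo quantization examples
      • Added new getting-started tutorial sample code
    • Cuda
      • Rewrote the core quantization functions, with improved performance
      • Rewrote the quantization gradient propagation functions, with improved performance
      • The compilation lock is now removed automatically when compilation starts
    • Core
      • Added a new attribute, Visibility, to TQC; it controls the export visibility of TQC
      • Renamed some attributes and wrote them into ppq.common.py
      • The function __is_revisable in TensorQuantizationConfig is now public and has been renamed is_revisable
    • Other
      • The TensorRT import warning is now only emitted at export time
      • Added support for snpe 1.6.3
      • Added support for tengine
      • Fixed a number of bugs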
  • v0.6.4(Jun 1, 2022)

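    Reworked the computation graph manipulation interface, adding the functions remove_operation, remove_variable, insert_op_on_var, insert_op_between_var, create_link_with_var, create_link_with_op, truncate_on_var

    Reworked the onnxruntime export logic and the onnx oos export logic

    Updated the lsq algorithm for faster execution

    Updated the ssd algorithm for faster execution

    Updated core.ffi; it now reports an error if compilation fails.

    Added several api functions, including manop and quantize_native_model, which let you control the optimization logic manually.

    Added a second type of pattern matching

    Added logic for gru decomposition

    Added test classes for the graph api

    Added QNN export logic

    Added fusion logic for the swish and mish activation functions

    Added FPGAQuantizer

    Added support for the mod, softplus and gru operators

    Removed the misc folder; its code was no longer used.

    Fixed incorrect pad ordering

    Fixed a problem where variable names could collide when creating variables

    Fixed a problem in path_matching where intermediate results were not copied, which could lead to wrong results

    Fixed some subtle bugs in matex gemm split pass

    Fixed some errors in the delete_isolated function

    Fixed PPL_DSP_TI_INT8 being misnamed PPL_DSP_TI_IN8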

  • v0.6.3(Mar 30, 2022)

  • v0.6.2(Mar 18, 2022)

    • Scale and offset are now always torch.Tensor objects with dtype=fp32, for training your network.
    • PPQ now displays a network snapshot when it quantizes your network.
    • Added the brecq and lsq algorithms.
    • Cuda kernels have been refined, and more cuda kernels have been introduced into ppq.
    • Added an exporter for dumping onnx quantization models.
    • Test cases have been included since ppq 0.6.2.
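
For readers unfamiliar with the scale and offset mentioned in the first item of this release, the following is a minimal sketch of per-tensor linear quantization simulation in plain Python. It is not PPQ's implementation: the function name `fake_quantize`, its parameters, the int8 range and the round-half-up policy are all assumptions chosen for illustration, and PPQ itself holds scale and offset as torch.Tensor values rather than Python numbers.

```python
import math

def fake_quantize(values, scale, offset, quant_min=-128, quant_max=127):
    """Quantize then dequantize a list of floats (hypothetical helper, not a PPQ API)."""
    out = []
    for x in values:
        # Map the float onto the integer grid: round half up, then shift by offset.
        q = math.floor(x / scale + 0.5) + offset
        # Clamp to the representable integer range (int8 assumed here).
        q = max(quant_min, min(quant_max, q))
        # Map back to float so later layers see the quantization error.
        out.append((q - offset) * scale)
    return out

simulated = fake_quantize([0.0, 0.26, -0.74, 100.0], scale=0.5, offset=0)
print(simulated)  # → [0.0, 0.5, -0.5, 63.5]
```

The last input saturates at quant_max, which is why 100.0 comes back as 63.5; values inside the range are only snapped to the nearest multiple of scale.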