MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

Last update: Dec 29, 2022

Overview

We propose a benchmark to evaluate different quantization algorithms under various settings. MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge between the algorithm and the hardware. We conduct a comprehensive analysis and report a number of insights, some intuitive and some counter-intuitive.

Highlighted Features

Integrates with the latest tracing techniques in PyTorch 1.8.
Quantization Algorithms:
- Learned Step Size Quantization: https://arxiv.org/abs/1902.08153
- Quantization Interval Learning: https://arxiv.org/abs/1808.05779
- Differentiable Soft Quantization: https://arxiv.org/abs/1908.05033
- Parameterized Clipping AcTivation: https://arxiv.org/abs/1805.06085
- Additive Powers-of-Two Quantization: https://arxiv.org/abs/1909.13144
- DoReFa-Net: https://arxiv.org/abs/1606.06160
Network Architectures:
- ResNet-18, ResNet-50: https://arxiv.org/abs/1512.03385
- MobileNetV2: https://arxiv.org/abs/1801.04381
- EfficientNet-Lite-B0: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
- RegNetX-600MF: https://arxiv.org/abs/2003.13678

Hardware Platform:

Library	Hardware Type	Scale Form	Granularity	Symmetry	Fold BN
Academic	None	FP32	Per-tensor	Symmetric	No
TensorRT	GPU	FP32	Per-channel	Symmetric	Yes
ACL	ASIC	FP32	Per-channel	Asymmetric	Yes
TVM	ARM CPU	POT	Per-tensor	Symmetric	Yes
SNPE	DSP	FP32	Per-tensor	Asymmetric	Yes
FBGEMM	X86 CPU	FP32	Per-channel	Asymmetric	Yes

Installation

These instructions will help you get MQBench up and running.

Clone MQBench.
(Optionally) Create a Python virtual environment.
Install the MQBench-required packages

$ pip install -r requirements.txt

Notes: MQBench uses PyTorch 1.8; our quantized models are built on the new torch.fx tracing techniques.
MQBench uses PyTorch distributed data-parallel training with the NCCL backend; please make sure your machine can initialize that distributed training environment.
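
Before launching training, it can help to confirm that the NCCL backend is usable. The following is a minimal sketch and not part of MQBench; the helper name nccl_ready is hypothetical, and it relies only on torch.cuda.is_available, torch.distributed.is_available, and torch.distributed.is_nccl_available.

```python
import importlib.util


def nccl_ready() -> bool:
    """Return True if PyTorch is installed with CUDA and NCCL support.

    Hypothetical helper for illustration; not part of MQBench.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch itself is missing
    import torch
    import torch.distributed as dist

    # Short-circuit: only ask about NCCL if distributed support is compiled in
    return bool(
        torch.cuda.is_available()
        and dist.is_available()
        and dist.is_nccl_available()
    )


print("NCCL ready:", nccl_ready())
```

If this prints False, the distributed run script is unlikely to start on that machine.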

How to Reproduce MQBench

We provide the running scripts run.sh and configuration file config.yaml of all experiments in MQBench.

To reproduce LSQ on ResNet-18,

enter the directory

$ cd PATH-TO-PROJECT/qbench_zoo
$ cd lsq_experiments/resnet18_4bit_academic

run script

$ sh run.sh

Note that run.sh contains some commands that may not be available on your machine; the core command is

export PYTHONPATH=$PYTHONPATH:../../..
python -u -m prototype.solver.cls_quant_solver --config config.yaml

How to self-implement a quantization algorithm

All our quantization algorithms are implemented in prototype/quantization/

To implement a new algorithm, you need to add your quantizer to this directory.

All quantizers inherit from the QuantizeBase class. Each QuantizeBase has an observer class, which is used to estimate and update the quantization range. The observer design is inspired by the PyTorch 1.8 repo. Initializing a QuantizeBase also initializes an Observer.

The parameters of QuantizeBase and Observer include:

quant_min, quant_max, which specify the $N_{min}, N_{max}$ for rounding boundaries.
qscheme, which can be torch.per_tensor_symmetric, torch.per_channel_symmetric, torch.per_tensor_affine, or torch.per_channel_affine. This is often determined by the hardware setup.
ch_axis, which is the dimension of channel-wise quantization. -1 is for per-tensor quantization. Typically, for nn.Conv2d and nn.Linear modules, ch_axis should be 0.
ada_sign, which adaptively chooses the signedness. ada_sign should be enabled for the academic setting only.
pot_scale, which is used to determine the powers-of-two scale parameters.

Note: each quantizer may have its own parameters; see the LSQ example below.
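
To make the role of quant_min, quant_max, scale, and zero_point concrete, here is a minimal sketch of uniform fake quantization on a single float. It is a plain-Python illustration of the general technique, not MQBench's implementation; the function name fake_quantize is hypothetical.

```python
def fake_quantize(x: float, scale: float, zero_point: int,
                  quant_min: int, quant_max: int) -> float:
    """Quantize x to an integer grid and map it back to float."""
    q = round(x / scale) + zero_point          # project onto the integer grid
    q = max(quant_min, min(quant_max, q))      # clamp to [quant_min, quant_max]
    return (q - zero_point) * scale            # dequantize


# 4-bit signed range with a symmetric quantizer (zero_point = 0)
in_range = fake_quantize(0.26, scale=0.1, zero_point=0, quant_min=-8, quant_max=7)
clipped = fake_quantize(5.0, scale=0.1, zero_point=0, quant_min=-8, quant_max=7)
print(in_range, clipped)  # approximately 0.3 and 0.7
```

The first value is rounded to the nearest grid point; the second is clipped at quant_max before being mapped back.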

Example Implementation of LSQ:

For initialization, we add new parameters for storing the scale, zero_point:

self.use_grad_scaling = use_grad_scaling
self.scale = Parameter(torch.tensor([scale]))
self.zero_point = Parameter(torch.tensor([zero_point]))

The major implementation is the forward function, which should contain several cases:

In case of ada_sign=True, the quantization range should be adjusted.

if self.ada_sign and X.min() >= 0:
    self.quant_max = self.activation_post_process.quant_max = 2 ** self.bitwidth - 1
    self.quant_min = self.activation_post_process.quant_min = 0
    self.activation_post_process.adjust_sign = True
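
The range switch can be sketched in isolation as follows. The unsigned branch mirrors the snippet above; the signed fallback range is an assumption made for illustration, and adaptive_range is a hypothetical name.

```python
from typing import Sequence, Tuple


def adaptive_range(values: Sequence[float], bitwidth: int) -> Tuple[int, int]:
    """Pick (quant_min, quant_max) from the sign of the observed values."""
    if min(values) >= 0:
        # Non-negative input (e.g. after ReLU): use the unsigned range
        return 0, 2 ** bitwidth - 1
    # Assumption: otherwise fall back to the usual signed range
    return -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1


print(adaptive_range([0.0, 0.4, 1.5], bitwidth=4))   # (0, 15)
print(adaptive_range([-0.7, 0.4, 1.5], bitwidth=4))  # (-8, 7)
```

With a non-negative input, all 16 levels of a 4-bit quantizer cover the positive side instead of wasting half of them on values that never occur.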

In case of symmetric quantization, the zero point should be set to 0.
```
self.zero_point.data.zero_()
```

In case of powers-of-two scale, the scale should be quantized by:

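import torch

def pot_quantization(tensor: torch.Tensor):
    log2t = torch.log2(tensor)
    # Round the exponent in the forward pass; detach() keeps the gradient
    # of the unrounded value (straight-through estimator).
    log2t = (torch.round(log2t) - log2t).detach() + log2t
    return 2 ** log2t

# Inside the quantizer's forward:
scale = pot_quantization(self.scale)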
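
As a quick numeric check of the forward value only, the same rounding can be written in plain Python. This sketch ignores gradients, so it is not a replacement for the torch version; pot_round is a hypothetical name.

```python
import math


def pot_round(scale: float) -> float:
    """Snap a positive scale to the nearest power of two in the log2 domain."""
    return 2.0 ** round(math.log2(scale))


print(pot_round(0.3))   # 0.25
print(pot_round(0.09))  # 0.125
```

Note that "nearest" is measured on the exponent: 0.09 maps to 0.125 rather than 0.0625 because log2(0.09) is closer to -3 than to -4.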

Implement both per-channel and per-tensor quantization.
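
The difference between the two granularities can be sketched with a min-max symmetric scale on a small 2-D weight. This is a simplified illustration in plain Python under the assumption that scale = max|w| / quant_max; it is not the observer logic used by MQBench, and symmetric_scales is a hypothetical name.

```python
from typing import List


def symmetric_scales(weight: List[List[float]], quant_max: int,
                     per_channel: bool) -> List[float]:
    """Min-max symmetric scales for a 2-D weight with channels on axis 0."""
    if per_channel:
        # One scale per output channel (ch_axis = 0)
        return [max(abs(w) for w in row) / quant_max for row in weight]
    # One scale shared by the whole tensor (ch_axis = -1)
    largest = max(abs(w) for row in weight for w in row)
    return [largest / quant_max]


weight = [[0.5, -1.4], [0.1, 0.07]]
per_tensor = symmetric_scales(weight, quant_max=7, per_channel=False)
per_channel = symmetric_scales(weight, quant_max=7, per_channel=True)
print(per_tensor)   # a single scale, set by the largest weight overall
print(per_channel)  # one scale per row; the second row gets a finer step
```

Per-channel quantization gives the small-magnitude second channel its own, much smaller step size, which is why it usually loses less accuracy when the hardware supports it.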

After adding your quantizer...

The next step is to register the quantizer in prototype/quantization/qconfig.py

Import your quantizer, add it to the get_qconfig function, and parse the necessary arguments.
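
One common way to organize such a registration step is a name-to-class registry. The sketch below is hypothetical: the names QUANTIZER_REGISTRY, register_quantizer, build_quantizer, and MyQuantizer are invented for illustration, and the real get_qconfig may be structured differently.

```python
from typing import Callable, Dict

# Hypothetical registry; MQBench's real get_qconfig may be organized differently.
QUANTIZER_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_quantizer(name: str):
    """Class decorator that records a quantizer under a method name."""
    def decorator(cls):
        QUANTIZER_REGISTRY[name] = cls
        return cls
    return decorator


@register_quantizer("my_quant")
class MyQuantizer:
    def __init__(self, bit: int, **kwargs):
        self.bit = bit
        self.extra = kwargs  # quantizer-specific arguments


def build_quantizer(method: str, **kwargs):
    """Look up a method name (e.g. w_method from config.yaml) and build it."""
    if method not in QUANTIZER_REGISTRY:
        raise ValueError(f"unknown quantizer: {method}")
    return QUANTIZER_REGISTRY[method](**kwargs)


quantizer = build_quantizer("my_quant", bit=4, use_grad_scaling=True)
print(type(quantizer).__name__, quantizer.bit)  # MyQuantizer 4
```

Failing loudly on an unknown method name catches typos in the config file before training starts.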

The final step is to override a config.yaml file:

qparams:
    w_method: lsq
    a_method: lsq
    bit: 4

backend: academic
bnfold: 4

By replacing w_method and a_method, you can run your implementation.

Note: the rest of the config file should not be modified, in order to keep a unified training setting.
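
The "change only the method names" rule can be sketched with a plain dictionary standing in for the parsed config. The helper with_methods and the method name my_quant are hypothetical, and only a subset of the fields is shown.

```python
import copy

# Subset of the config above, as it might look after parsing
base_config = {
    "qparams": {"w_method": "lsq", "a_method": "lsq", "bit": 4},
    "backend": "academic",
}


def with_methods(config: dict, w_method: str, a_method: str) -> dict:
    """Return a copy of config with only the quantizer methods replaced."""
    new_config = copy.deepcopy(config)  # leave the shared baseline untouched
    new_config["qparams"]["w_method"] = w_method
    new_config["qparams"]["a_method"] = a_method
    return new_config


my_config = with_methods(base_config, "my_quant", "my_quant")
print(my_config["qparams"])
```

Everything except the two method names is carried over unchanged, so results stay comparable with the existing baselines.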

How to self-implement a hardware configuration

Adding a new hardware setting is much simpler than adding an algorithm. To do this, add another condition to the if-else selection. For example, to add a new hardware backend, TFLite Micro:

        elif backend == "tflitemicro":
            backend_params = dict(ada_sign=False, symmetry=True, per_channel=False, pot_scale=True)
        ...

    model_qconfig = get_qconfig(**self.qparams, **backend_params)
    model = quantize_fx.prepare_qat_fx(model, {"": model_qconfig}, foldbn_config)

Submitting Your Results to MQBench

You can submit your implementation to MQBench by submitting a merge request to this repo. The implementation of the new algorithm, the running scripts, and the log file are needed for evaluation.

License

This project is released under the Apache 2.0 License.

Comments

Saving the quantized pth model before deploy: torch.save fails

File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 484, in _save
    pickler.dump(obj)
AttributeError: Can't pickle local object 'ObserverBase.__init__.<locals>.PerChannelLoadHook'

opened by wangshankun 13

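Quantizing YOLOX with the latest MQBench fails when backend=tengine_u8 is selected: AttributeError: 'dict' object has no attribute 'detach'

Running QAT on YOLOX with the UP framework and the latest MQBench fails when backend=tengine_u8 is selected: AttributeError: 'dict' object has no attribute 'detach'

The QAT configuration file used is below: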

num_classes: &num_classes 13
runtime:
  aligned: true
    # async_norm: True
  special_bn_init: true
  task_names: quant_det
  runner:
    type: quant

quant:
  quant_type: qat
  deploy_backend: Tengine_u8
  cali_batch_size: 900
  prepare_args:
    extra_qconfig_dict:
      w_observer: MinMaxObserver
      a_observer: EMAMinMaxObserver
      w_fakequantize: FixedFakeQuantize
      a_fakequantize: FixedFakeQuantize
    leaf_module: [Space2Depth, FrozenBatchNorm2d]
    extra_quantizer_dict:
      additional_module_type: [ConvFreezebn2d, ConvFreezebnReLU2d]


mixup:
  type: yolox_mixup_cv2
  kwargs:
    extra_input: true
    input_size: [640, 640]
    mixup_scale: [0.8, 1.6]
    fill_color: 0

mosaic:
  type: mosaic
  kwargs:
    extra_input: true
    tar_size: 640
    fill_color: 0

random_perspective:
  type: random_perspective_yolox
  kwargs:
    degrees: 10.0 # 0.0
    translate: 0.1
    scale: [0.1, 2.0] # 0.5
    shear: 2.0 # 0.0
    perspective: 0.0
    fill_color: 0  # 0
    border: [-320, -320]

augment_hsv:
  type: augment_hsv
  kwargs:
    hgain: 0.015
    sgain: 0.7
    vgain: 0.4
    color_mode: BGR

flip:
  type: flip
  kwargs:
    flip_p: 0.5

to_tensor: &to_tensor
  type: custom_to_tensor

train_resize: &train_resize
  type: keep_ar_resize_max
  kwargs:
    max_size: 640
    random_size: [15, 25]
    scale_step: 32
    padding_type: left_top
    padding_val: 0

test_resize: &test_resize
  type: keep_ar_resize_max
  kwargs:
    max_size: 640
    padding_type: left_top
    padding_val: 0

dataset:
  train:
    dataset:
      type: coco
      kwargs:
        meta_file: train.json
        image_reader:
          type: fs_opencv
          kwargs:
            image_dir: &img_root /images/
            color_mode: BGR
        transformer: [*train_resize, *to_tensor]
    batch_sampler:
      type: base
      kwargs:
        sampler:
          type: dist
          kwargs: {}
        batch_size: 4
  test:
    dataset:
      type: coco
      kwargs:
        meta_file: &gt_file val.json
        image_reader:
          type: fs_opencv
          kwargs:
            image_dir: *img_root
            color_mode: BGR
        transformer: [*test_resize, *to_tensor]
        evaluator:
          type: COCO
          kwargs:
            gt_file: *gt_file
            iou_types: [bbox]
    batch_sampler:
      type: base
      kwargs:
        sampler:
          type: dist
          kwargs: {}
        batch_size: 4
  dataloader:
    type: base
    kwargs:
      num_workers: 4
      alignment: 32
      worker_init: true
      pad_type: batch_pad

trainer: # Required.
  max_epoch: &max_epoch 6             # total epochs for the training
  save_freq: 1
  test_freq: 1
  only_save_latest: false
  optimizer:                 # optimizer = SGD(params,lr=0.01,momentum=0.937,weight_decay=0.0005)
    register_type: yolov5
    type: SGD
    kwargs:
      lr: 0.0000003125
      momentum: 0.9
      nesterov: true
      weight_decay: 0.0      # weight_decay = 0.0005 * batch_szie / 64
  lr_scheduler:              # lr_scheduler = MultStepLR(optimizer, milestones=[9,14],gamma=0.1)
    warmup_epochs: 0        # set to be 0 to disable warmup. When warmup,  target_lr = init_lr * total_batch_size
    warmup_type: linear
    warmup_ratio: 0.001
    type: MultiStepLR
    kwargs:
      milestones: [2, 4]     # epochs to decay lr
      gamma: 0.1             # decay rate

saver:
  save_dir: checkpoints/yolox_s_ret_a1_comloc_quant_tengine
  results_dir: results_dir/yolox_s_ret_a1_comloc_quant_tengine
  resume_model: /United-Perception/train_config/pretrain/300_65_ckpt_best.pth
  auto_resume: True



ema:
  enable: false
  ema_type: exp
  kwargs:
    decay: 0.9998

net:
- name: backbone
  type: yolox_s
  kwargs:
    out_layers: [2, 3, 4]
    out_strides: [8, 16, 32]
    normalize: {type: mqbench_freeze_bn}
    act_fn: {type: Silu}
- name: neck
  prev: backbone
  type: YoloxPAFPN
  kwargs:
    depth: 0.33
    out_strides: [8, 16, 32]
    normalize: {type: mqbench_freeze_bn}
    act_fn: {type: Silu}
- name: roi_head
  prev: neck
  type: YoloXHead
  kwargs:
    num_classes: *num_classes
    width: 0.5
    num_point: &dense_points 1
    normalize: {type: mqbench_freeze_bn}
    act_fn: {type: Silu}
- name: post_process
  prev: roi_head
  type: retina_post_iou
  kwargs:
    num_classes: *num_classes
                                  # number of classes including backgroudn. for rpn, it's 2; for RetinaNet, it's 81
    cfg:
      cls_loss:
        type: quality_focal_loss
        kwargs:
          gamma: 2.0
      iou_branch_loss:
        type: sigmoid_cross_entropy
      loc_loss:
        type: compose_loc_loss
        kwargs:
          loss_cfg:
          - type: iou_loss
            kwargs:
              loss_type: giou
              loss_weight: 1.0
          - type: l1_loss
            kwargs:
              loss_weight: 1.0
      anchor_generator:
        type: hand_craft
        kwargs:
          anchor_ratios: [1]    # anchor strides are provided as feature strides by feature extractor
          anchor_scales: [4]   # scale of anchors relative to feature map
      roi_supervisor:
        type: atss
        kwargs:
          top_n: 9
          use_iou: true
      roi_predictor:
        type: base
        kwargs:
          pre_nms_score_thresh: 0.05    # to reduce computation
          pre_nms_top_n: 1000
          post_nms_top_n: 1000
          roi_min_size: 0                 # minimum scale of a valid roi
          merger:
            type: retina
            kwargs:
              top_n: 100
              nms:
                type: naive
                nms_iou_thresh: 0.65

The error message is below:

[MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
[MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
[MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/lsc/United-Perception/up/__main__.py", line 27, in <module>
    main()
  File "/data/lsc/United-Perception/up/__main__.py", line 21, in main
    args.run(args)
  File "/data/lsc/United-Perception/up/commands/train.py", line 144, in _main
    launch(main, args.num_gpus_per_machine, args.num_machines, args=args, start_method=args.fork_method)
  File "/data/lsc/United-Perception/up/utils/env/launch.py", line 52, in launch
    mp.start_processes(
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data/lsc/United-Perception/up/utils/env/launch.py", line 117, in _distributed_worker
    main_func(args)
  File "/data/lsc/United-Perception/up/commands/train.py", line 134, in main
    runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
  File "/data/lsc/United-Perception/up/tasks/quant/runner/quant_runner.py", line 17, in __init__
    super(QuantRunner, self).__init__(config, work_dir, training)
  File "/data/lsc/United-Perception/up/runner/base_runner.py", line 59, in __init__
    self.build()
  File "/data/lsc/United-Perception/up/tasks/quant/runner/quant_runner.py", line 34, in build
    self.calibrate()
  File "/data/lsc/United-Perception/up/tasks/quant/runner/quant_runner.py", line 182, in calibrate
    self.model(batch)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/lsc/United-Perception/up/tasks/quant/models/model_helper.py", line 76, in forward
    output = submodule(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/fx/graph_module.py", line 308, in wrapped_call
    return cls_call(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/fx/graph_module.py", line 308, in wrapped_call
    return cls_call(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<eval_with_key_2>", line 4, in forward
    input_1_post_act_fake_quantizer = self.input_1_post_act_fake_quantizer(input_1);  input_1 = None
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/lsc/United-Perception/MQBench/mqbench/fake_quantize/fixed.py", line 20, in forward
    self.activation_post_process(X.detach())
AttributeError: 'dict' object has no attribute 'detach'

Could you please help look into what the problem is? Does MQBench not support Tengine yet?

opened by RedHandLM 11

Hi, will export to QLinear save weights in int8?
Using tensorrt backend, will QLinear make the onnx model smaller? I got some error when trying to save to QLinear:

deploy/common.py", line 138, in optimize_model assert node_detect, "Graph is illegel, error occured!" AssertionError: Graph is illegel, error occured!
bug
opened by jinfagang 10
How to use MQBench with a model built by mmdet

When mmdet builds the model, it looks like: object { module list aaaa module list bbb }. Using prepare_by_platform to trace it then raises an error like: TypeError: 'xxxobject' object does not support indexing
Stale

opened by 791136190 10
How to run PTQ for Faster RCNN or SSD?

From the QDrop paper, I notice that the benchmark results include Faster RCNN.

Could you provide this example?

In addition, it would be best to also provide PTQ for SSD, another important object detection network.

opened by wangshankun 9
onnx inference

Hello. I finished converting the model to onnx-quant, but I cannot run inference on it with onnx-runtime. Error log: No Op registered for LearnablePerTensorAffine with domain_version of 11

opened by www516717402 9
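Multiple inputs with different scales: quantization affects the results

The model for this task has two inputs. One is the image, which passes through the backbone to produce image features. The other is a set of bbox coordinates detected by a separate detection model; the coordinates are normalized to float values between 0 and 1. The coordinates go through linear layers and convolutional upsampling, and the result is concatenated with the image features. After quantizing with MQBench, the INT8 inference results are very accurate for head1, but head2 shows a clear accuracy loss. The network is defined as follows:

How is a network with this kind of structure usually handled?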
Stale

opened by zhouyang1989 8
DDP multi-gpu training issues with Imagenet example

I am trying to use multi-gpu QAT training using Imagenet example code. It runs into issue after first iteration training update.

RuntimeError: grad.numel() == bucket_view.numel() INTERNAL ASSERT FAILED at "/pytorch/torch/lib/c10d/reducer.cpp":343, please report a bug to PyTorch.

The code works fine with multi-gpu training if I comment the wrapper code that quantize the original model i.e., model=prepare_by_platform(model, args.backend). Did anyone encounter the same issue?

opened by kartikgupta-at-anu 7
KeyError for Adaround, Qdrop

When I use MQBench to quantize the RLFN model with Qdrop or Adaround, some errors occur. Env: Ubuntu 18.04, CUDA 11.1, MQBench version: e2175203c8e62596e66500a720a6cb1d1fc1dacd. RLFN is a super-resolution model from https://github.com/ofsoundof/NTIRE2022_ESR; the model id is 4.

error:
[MQBENCH] INFO: Disable observer and Disable quantize.
[MQBENCH] INFO: Disable observer and Enable quantize.
[MQBENCH] INFO: prepare layer reconstruction for fea_conv
[MQBENCH] INFO: the node list is below!
[MQBENCH] INFO: [input_1_post_act_fake_quantizer, fea_conv, fea_conv_post_act_fake_quantizer_2]
Traceback (most recent call last):
  File "quant.py", line 158, in <module>
    main()
  File "quant.py", line 137, in main
    model = ptq_reconstruction(model, stacked_tensor, EasyDict(ptq_reconstruction_config))
  File ".../mqbench/advanced_ptq.py", line 636, in ptq_reconstruction
    fp32_module = fp32_modules[qnode2fpnode_dict[layer_node_list[-1]]]
KeyError: fea_conv_post_act_fake_quantizer_2

Here is my code tracking and analysis

(1) model.code

def forward(self, input):
    input_1 = input
    input_1_post_act_fake_quantizer = self.input_1_post_act_fake_quantizer(input_1); input_1 = None
    fea_conv = self.fea_conv(input_1_post_act_fake_quantizer); input_1_post_act_fake_quantizer = None
    fea_conv_post_act_fake_quantizer_2 = self.fea_conv_post_act_fake_quantizer(fea_conv)
    fea_conv_post_act_fake_quantizer_1 = self.fea_conv_post_act_fake_quantizer(fea_conv)
    fea_conv_post_act_fake_quantizer = self.fea_conv_post_act_fake_quantizer(fea_conv); fea_conv = None
    ...

(2) Problem: in the quantized model, several nodes share the same node.target (many-to-one), so quant_named_nodes is missing keys. See qnode2fpnode(quant_modules, fp32_modules) in mqbench/advanced_ptq.py:

def qnode2fpnode(quant_modules, fp32_modules):
    quant_named_nodes = {node.target: node for node in quant_modules}
    """
    node: fea_conv_post_act_fake_quantizer_2    node.target: fea_conv_post_act_fake_quantizer
    node: fea_conv_post_act_fake_quantizer_1    node.target: fea_conv_post_act_fake_quantizer
    """
    fp32_named_nodes = {node.target: node for node in fp32_modules}
    qnode2fpnode_dict = {quant_named_nodes[key]: fp32_named_nodes[key] for key in quant_named_nodes}
    return qnode2fpnode_dict
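
The many-to-one collapse described in this analysis can be reproduced without torch. The sketch below uses a namedtuple as a stand-in for torch.fx nodes (an assumption made for illustration) and builds the dictionary the same way quant_named_nodes is built.

```python
from collections import namedtuple

# Stand-in for torch.fx nodes: each node has a unique name and a target module.
Node = namedtuple("Node", ["name", "target"])

quant_nodes = [
    Node("fea_conv_post_act_fake_quantizer_2", "fea_conv_post_act_fake_quantizer"),
    Node("fea_conv_post_act_fake_quantizer_1", "fea_conv_post_act_fake_quantizer"),
    Node("fea_conv_post_act_fake_quantizer", "fea_conv_post_act_fake_quantizer"),
]

# Same construction as quant_named_nodes: later nodes overwrite earlier ones.
by_target = {node.target: node for node in quant_nodes}
surviving = {node.name for node in by_target.values()}

print(len(by_target))                                     # 1
print("fea_conv_post_act_fake_quantizer_2" in surviving)  # False
```

Only the last node per target survives, so any later lookup keyed by one of the overwritten nodes fails, which is consistent with the KeyError reported above.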

I am not familiar with the process of trained PTQ, so looking forward to your suggestions and Solutions.

opened by feixiang7701 7

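MQBench results are not bit-exact with SNPE DSP results

MQBench is a very interesting project.

Environment: pytorch: 1.8.1, MQBench: branch main, e2175203, SNPE: snpe-1.61.0.3358

Problem: I ran a simple test with a model that has only two convolution layers and compared the MQBench quantized result with the SNPE DSP result. The two are not bit-exact. Is this expected, or did I do something wrong?

Reproduction

MQBench quantization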

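import os
import random

import numpy as np
import torch
import torch.nn as nn
from mqbench.convert_deploy import convert_deploy
from mqbench.prepare_by_platform import prepare_by_platform, BackendType
from mqbench.utils.state import enable_calibration, enable_quantization

def seed_torchv2(seed: int = 42) -> None: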
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv2d(3, 128,1,1, bias=True)
        self.conv2 = nn.Conv2d(128, 20,1,1,bias=True)
        self.relu = nn.ReLU()
        self.flat = nn.Flatten(1)

    def forward(self, x): # (1,3,20,20)
        x = self.avg_pool(x)
        x = self.conv(x)
        x = self.conv2(x)
        x = self.flat(x)
        return x

    
SIZE = 20
backend = BackendType.SNPE

np.set_printoptions(suppress=True, precision=6)
torch.set_printoptions(6)
seed_torchv2(42)


def gen_input_data(length=100):
    data = []
    for _ in range(length):
        data.append(np.ones((1,3,SIZE,SIZE), dtype=np.float32) * 0.1 * np.random.randint(0, 10))
    return np.stack(data, axis=0)


model = Net()          # use vision pre-defined model
model.eval()

train_data = gen_input_data(100)
dummy_input = np.zeros((1,3,SIZE,SIZE), dtype=np.float32) + 0.5


print("pytorch fp32 result")
print(model(torch.from_numpy(dummy_input.copy())).float())


# quant
model = prepare_by_platform(model, backend)

enable_calibration(model)

for i, d in enumerate(train_data):
    _ = model(torch.from_numpy(d).float())

enable_quantization(model)


print("quant sim result")
print(model(torch.from_numpy(dummy_input.copy())).float())


input_shape = {"image":[1,3,SIZE,SIZE]}
convert_deploy(model, backend, input_shape)

# save dummy input and test it on DSP
image = dummy_input.copy()
assert image.shape == (1,3,SIZE,SIZE)
assert image.dtype == np.float32
image.tofile("./tmp.raw")
print("#" * 50)

pytorch fp32 result
tensor([[-0.347889, -0.289117, -0.083191, -0.222827,  0.124699,  0.235278,
          0.434433, -0.302174, -0.047763,  0.229472, -0.037784,  0.082496,
         -0.150852, -0.170281,  0.130777,  0.146441, -0.494992, -0.182881,
          0.600709, -0.063706]], grad_fn=<ViewBackward>)

quant sim result
tensor([[-0.344930, -0.290467, -0.081694, -0.222389,  0.131618,  0.231466,
          0.435701, -0.299544, -0.049924,  0.226927, -0.036308,  0.081694,
         -0.149772, -0.172465,  0.131618,  0.149772, -0.494702, -0.181542,
          0.599088, -0.063540]], grad_fn=<ViewBackward>)

DLC conversion

./snpe-onnx-to-dlc --input_network mqbench_qmodel_deploy_model.onnx --output_path tmp.dlc --quantization_overrides mqbench_qmodel_clip_ranges.json
./snpe-dlc-quantize --input_dlc tmp.dlc --input_list tmp_file.txt --output_dlc tmp_quat_mq.dlc --override_params --bias_bitwidth 32

tmp_file.txt and tmp_file_android.txt each list a single file, tmp.raw, which the Python program above saves as a 3x20x20 float file.

SNPE DSP run

./snpe-net-run --container /sdcard/tmp_quat_mq.dlc --input_list /sdcard/tmp_file_android.txt --use_dsp

##################################################
74.raw (20,)
[-0.34493 -0.285929 -0.081694 -0.222389 0.127079 0.236005 0.435701 -0.299544 -0.049924 0.226927 -0.036308 0.081694 -0.149772 -0.172465 0.131618 0.149772 -0.490163 -0.177003 0.599088 -0.068078]

Comparing the quant sim result with the DSP result, several entries differ between the two (marked in bold italics in the original post).
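
Since the highlighting does not survive in plain text, the differing entries can be recovered by comparing the two printed vectors directly. This is a small sketch using the values quoted above; the tolerance of 1e-6 is an assumption chosen to match the six printed decimals.

```python
quant_sim = [-0.344930, -0.290467, -0.081694, -0.222389, 0.131618, 0.231466,
             0.435701, -0.299544, -0.049924, 0.226927, -0.036308, 0.081694,
             -0.149772, -0.172465, 0.131618, 0.149772, -0.494702, -0.181542,
             0.599088, -0.063540]
dsp = [-0.34493, -0.285929, -0.081694, -0.222389, 0.127079, 0.236005,
       0.435701, -0.299544, -0.049924, 0.226927, -0.036308, 0.081694,
       -0.149772, -0.172465, 0.131618, 0.149772, -0.490163, -0.177003,
       0.599088, -0.068078]

tolerance = 1e-6  # assumed: values are printed with six decimals
mismatches = [i for i, (a, b) in enumerate(zip(quant_sim, dsp))
              if abs(a - b) > tolerance]
gaps = [abs(quant_sim[i] - dsp[i]) for i in mismatches]

print(mismatches)  # [1, 4, 5, 16, 17, 19]
print([round(g, 6) for g in gaps])
```

Six of the twenty outputs differ, each by roughly 0.0045. That appears to match a single output quantization step (for example, 0.036308 / 8), which would suggest an off-by-one rounding difference rather than a wrong scale.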

good first issue Stale

opened by changewOw 7

Training with PACT, but the clipping value for weights and activations, denoted `alpha`, does not seem to change.

The clipping value for weights and activations, denoted alpha, is initialized to 6.0. In my opinion this value should be updated during training, but I found that it is not. I am training with imagenet_example and only added the following config to enable PACT.

if args.quant:
        extra_params = {
            'extra_qconfig_dict': {
                'w_observer': "MinMaxObserver",
                'a_observer': "EMAMinMaxObserver",
                'w_fakequantize': "PACTFakeQuantize",
                'a_fakequantize': "PACTFakeQuantize",
                'a_fakeq_params': {},
                'w_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': False
                },
                'a_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': False
                }
            },
            'extra_quantizer_dict': {},
            'preserve_attr': {},
            'concrete_args': {},
            'extra_fuse_dict': {}
        }
        print("==> config with extra params", extra_params)
        model = prepare_by_platform(model, args.backend, extra_params)

bug

opened by jianyin2016 7

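Problem generating the deploy model with ONNX-QNN

Hello, and thank you very much for your excellent work. I am a beginner with MQBench. When quantizing a vgg19 model with MQBench's QNN scheme, I found that with the config below, the generated ONNX model cannot go through the next conversion step, that is, removing the fake-quantize blocks to generate the deploy model. How can this problem be solved?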

            extra_qconfig_dict = {
                'w_observer': 'ClipStdObserver',
                'a_observer': 'ClipStdObserver',
                'w_fakequantize': 'DSQFakeQuantize',
                'a_fakequantize': 'DSQFakeQuantize',
                'w_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': True
                },
                'a_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': True
                }
            }
            prepare_custom_config_dict = {
                'extra_qconfig_dict': extra_qconfig_dict
            }
            self.model = prepare_by_platform(self.model, BackendType.ONNX_QNN, prepare_custom_config_dict)

The error message is as follows:

  File "openpose_mqb.py", line 411, in train
    convert_deploy(self.model, BackendType.ONNX_QNN, input_shape, model_name = 'model_QNN')
  File "MQBench-0.0.6-py3.9.egg/mqbench/convert_deploy.py", line 184, in convert_deploy
    convert_function(deploy_model, **kwargs)
  File "MQBench-0.0.6-py3.9.egg/mqbench/convert_deploy.py", line 138, in deploy_qparams_tvm
    ONNXQNNPass(onnx_model_path).run(model_name)
  File "MQBench-0.0.6-py3.9.egg/mqbench/deploy/deploy_onnx_qnn.py", line 273, in run
    self.format_qlinear_dtype_pass()
  File "MQBench-0.0.6-py3.9.egg/mqbench/deploy/deploy_onnx_qnn.py", line 258, in format_qlinear_dtype_pass
    scale, zero_point, qmin, qmax = node.input[1], node.input[2], node.input[3], node.input[4]
IndexError: list index (3) out of range

opened by Zhoukai1234 1

The QAT top-1 accuracy of mobilenet_v2 a4w4 LSQ cannot be reproduced as the 70.6% shown in the paper.
Hi, thanks for providing this amazing quantization framework! I want to reproduce the top-1 accuracy of mobilenet_v2 a4w4 LSQ under the academic setting. The quantization configuration is as below:

dict(qtype='affine', w_qscheme=QuantizeScheme(symmetry=True, per_channel=True, pot_scale=False, bit=4, symmetric_range=False, p=2.4), a_qscheme=QuantizeScheme(symmetry=True, per_channel=False, pot_scale=False, bit=4, symmetric_range=False, p=2.4), default_weight_quantize=LearnableFakeQuantize, default_act_quantize=LearnableFakeQuantize, default_weight_observer=MSEObserver, default_act_observer=EMAMSEObserver),

For the training strategy, I set weight decay=0, lr=1e-3, and batch_size=128 per GPU, using 8 Nvidia A100 cards. The adjust_learning_rate strategy remains the same as in main.py. However, the highest top-1 accuracy I reproduced on the validation set was only 68.66%, which is far from the 70.6% presented in the paper.

Which part have I missed?
opened by LuletterSoul 0
TraceError when running PTQ on yolov5s
Hi everyone,

Today I tried to use mqbench to run PTQ on yolov5s.

The yolov5s model comes from: https://github.com/ultralytics/yolov5.git

When I tried to quantize yolov5s with the following code:

from mqbench.prepare_by_platform import prepare_by_platform, BackendType
backend = BackendType.ONNX_QNN
model = prepare_by_platform(model, backend)

this problem occurred:

torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow

Is there a good, simple way to deal with this problem?
opened by xiaopengaia 2
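How to fuse conv and bn layers

Hi everyone,

Today, while quantizing resnet50, I wanted to fuse the conv and bn layers and then quantize.

For this I found the file fuser_method_mappings.py

and called the function fuse_conv_freezebn.

But when fusing, I found that the conv and bn layers in the network have to be extracted individually in order to be fused.

Obviously, this seems too cumbersome.

So I tried to find a parameter in r50_8_8.yaml that handles conv and bn fusion, without success.

I would like to ask how everyone fuses bn and conv layers.

Is there a better, simpler method, or is there a parameter in r50_8_8.yaml that can handle this?

Any corrections are welcome. Thank you all!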

opened by xiaopengaia 2

Releases (v0.0.7)

v0.0.7 (Jul 22, 2022)
- Update torch version to 1.10.

v0.0.6 (Feb 24, 2022)
- Vitis backend update
- OpenVINO backend
- TRT explicit mode
- Brecq / Qdrop
- Object Detection with EOD
- Benchmark / Docs update
- CI
- Other

v0.0.5 (Feb 24, 2022)
- Branch of update
- Docs update
- Custom tracer

pre-trained (Mar 2, 2022)
You can reproduce our results easily using the following pre-trained models:
- MobileNet v2: mobilenetv2_imagenet.pth.tar (26.95 MB)
- RegNetX 600m: regnetx_600m_imagenet.pth (23.86 MB)
- RegNetX 800m: regnetx_800m_imagenet.pth (27.93 MB)
- ResNet 18: resnet18_imagenet.pth.tar (44.64 MB)
- ResNet 50: resnet50_imagenet.pth.tar (97.75 MB)

v0.0.4 (Nov 30, 2021)
- Note: This is a new branch.

v0.0.3 (Nov 4, 2021)
- Update deploy to TVM
- TQT fake quantize
- Bug fixed

v0.0.2 (Sep 8, 2021)

v0.0.1 (Sep 8, 2021)

Owner

Before Spring

GitHub Repository

