quantize aware training package for NCNN on pytorch

Last update: Nov 23, 2022

Related tags

Deep Learning ncnnqat

Overview

ncnnqat

ncnnqat is a quantize aware training package for NCNN on pytorch.

ncnnqat

Installation

Supported Platforms: Linux
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1.
Dependencies:
- python >= 3.5, < 4
- pytorch >= 1.6
- numpy >= 1.18.1
- onnx >= 1.7.0
- onnx-simplifier >= 0.3.6
Install ncnnqat via pypi:
```
$ pip install ncnnqat (to do....)
```
It is recommended to install from the source code

or Install ncnnqat via repo：

$ git clone https://github.com/ChenShisen/ncnnqat
$ cd ncnnqat
$ make install

Usage

register_quantization_hook and merge_freeze_bn

(suggest finetuning from a well-trained model, do it after a few epochs of training otherwise.)

from ncnnqat import unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    for epoch in range(epoch_train):
        model.train()
    if epoch==well_epoch:
        register_quantization_hook(model)
    if epoch>=well_epoch:
        model = merge_freeze_bn(model)  #it will change bn to eval() mode during training
...

Unquantize weight before update it

...
... 
    if epoch>=well_epoch:
        model.apply(unquant_weight)  # using original weight while updating
    optimizer.step()
...

Save weight and save ncnn quantize table after train

...
...
    onnx_path = "./xxx/model.onnx"
    table_path="./xxx/model.table"
    dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
    input_names = [ "input" ]
    output_names = [ "fc" ]
    torch.onnx.export(model, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
    save_table(model,onnx_path=onnx_path,table=table_path)

...

if use "model = nn.DataParallel(model)",pytorch unsupport torch.onnx.export,you should save state_dict first and prepare a new model with one gpu,then you will export onnx model.

...
...
    model_s = new_net() #
    model_s.cuda()
    register_quantization_hook(model_s)
    #model_s = merge_freeze_bn(model_s)
    onnx_path = "./xxx/model.onnx"
    table_path="./xxx/model.table"
    dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
    input_names = [ "input" ]
    output_names = [ "fc" ]
    model_s.load_state_dict({k.replace('module.',''):v for k,v in model.state_dict().items()}) #model_s = model     model = nn.DataParallel(model)
          
    torch.onnx.export(model_s, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
    save_table(model_s,onnx_path=onnx_path,table=table_path)
    

...

Code Examples

Cifar10 quantization aware training example.

python test/test_cifar10.py

SSD300 quantization aware training example.

   ln -s /your_coco_path/coco ./tests/ssd300/data

   python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    ./tests/ssd300/main.py \
    -d ./tests/ssd300/data/coco

    python ./tests/ssd300/main.py --onnx_save  #load model dict, export onnx and ncnn table

Results

Cifar10

result：

net fp32(onnx) ncnnqat ncnn aciq ncnn kl

mobilenet_v2 0.91 0.9066 0.9033 0.9066

resnet18 0.94 0.93333 0.9367 0.937

net	fp32(onnx)	ncnnqat	ncnn aciq	ncnn kl
mobilenet_v2	0.91	0.9066	0.9033	0.9066
resnet18	0.94	0.93333	0.9367	0.937

SSD300(resnet18|coco)

fp32:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.344
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.191
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.195
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.328
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.293
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.084
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.326
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
Current AP: 0.19269

ncnnqat:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.192
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.194
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.291
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.307
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.325
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.497
Current AP: 0.19202

Todo

....

quantize aware training package for NCNN on pytorch

Related tags

Overview

ncnnqat

Table of Contents

Installation

Usage

Code Examples

Results

Todo

Owner

Collect super-resolution related papers, data, repositories

HPRNet: Hierarchical Point Regression for Whole-Body Human Pose Estimation

OCRA (Object-Centric Recurrent Attention) source code

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

A universal framework for learning timestamp-level representations of time series

public repo for ESTER dataset and modeling (EMNLP'21)

Causal estimators for use with WhyNot

Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

PyTorch source code for Distilling Knowledge by Mimicking Features

A list of Machine Learning Art Colabs

A Python library created to assist programmers with complex mathematical functions

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Quadruped-command-tracking-controller - Quadruped command tracking controller (flat terrain)

Python3 Implementation of (Subspace Constrained) Mean Shift Algorithm in Euclidean and Directional Product Spaces

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

A tensorflow=1.13 implementation of Deconvolutional Networks on Graph Data (NeurIPS 2021)

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Release of the ConditionalQA dataset

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis