QKeras: a quantization deep learning library for Tensorflow Keras

Overview

QKeras

github.com/google/qkeras

QKeras 0.8 highlights:

  • Automatic quantization using QKeras;

  • Stochastic behavior (including stochastic rounding) is disabled during inference;

  • LeakyReLU for quantized_relu;

  • Qtools for estimating effort to perform inference;

    • Qtools will estimate the sizes and types of the operations needed to perform inference, with data sizes compatible with high-level synthesis datatypes. For example, the bits and int_bits that Qtools reports for quantized_bits and quantized_relu will match ac_fixed datatypes exactly. (If you rely on QKeras alone, the correct datatype should be ac_fixed<bits, int_bits+is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer.)
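
The datatype rule above can be written as a small helper. This is only an illustrative sketch: ac_fixed_type is a hypothetical name, not part of QKeras or Qtools, and it assumes is_negative has already been worked out from the quantizer.

```python
def ac_fixed_type(bits, int_bits, is_negative):
    """Format an ac_fixed type string from quantizer parameters.

    Hypothetical helper: it applies the rule
    ac_fixed<bits, int_bits + is_negative, is_negative>.
    """
    sign_bit = 1 if is_negative else 0           # the sign takes one integer bit
    signed = "true" if is_negative else "false"  # C++ boolean literal
    return f"ac_fixed<{bits}, {int_bits + sign_bit}, {signed}>"


print(ac_fixed_type(6, 2, True))   # ac_fixed<6, 3, true>
print(ac_fixed_type(3, 0, False))  # ac_fixed<3, 0, false>
```

The first call corresponds to a signed quantized_bits(bits=6, integer=2) quantizer.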

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacements for some of the Keras layers, especially the ones that create parameters and activation layers and perform arithmetic operations, so that we can quickly create a deep quantized version of a Keras network.

According to Tensorflow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

  • User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

  • Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

  • Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is being designed to extend the functionality of Keras using Keras' design principles, i.e. being user friendly, modular and extensible, while also being "minimally intrusive" of Keras native functionality.

In order to successfully quantize a model, users need to replace variable-creating layers (Dense, Conv2D, etc.) with their counterparts (QDense, QConv2D, etc.), and any layers that perform math operations need to be quantized afterwards.

Publications

http://arxiv.org/abs/2006.10159

Layers Implemented in QKeras

  • QDense

  • QConv1D

  • QConv2D

  • QDepthwiseConv2D

  • QSeparableConv1D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QSeparableConv2D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QMobileNetSeparableConv2D (extended from MobileNet SeparableConv2D implementation, quantizes the activation values after the depthwise step)

  • QConv2DTranspose

  • QActivation

  • QAdaptiveActivation [EXPERIMENTAL]

  • QAveragePooling2D (in fact, an AveragePooling2D stacked with a QActivation layer for quantization of the result)

  • QBatchNormalization (is still in its experimental stage, as we have not seen the need to use this yet due to the normalization and regularization effects of stochastic activation functions.)

  • QOctaveConv2D

  • QSimpleRNN, QSimpleRNNCell

  • QLSTM, QLSTMCell

  • QGRU, QGRUCell

  • QBidirectional

It is worth noting that not all functionality is safe at this time to be used with other high-level operations, such as layer wrappers. For example, Bidirectional layer wrappers are used with RNNs. If this is required, we encourage users to pass quantization functions as strings instead of the actual functions as a workaround, but we may change that implementation in the future.

A first attempt to create a safe mechanism in QKeras is the adoption of QActivation, a wrapper that encapsulates the activation functions so that we can save and restore the network architecture and duplicate it using the Keras interface, but this interface has not been fully tested yet.

Activation Layers Implemented in QKeras

  • smooth_sigmoid(x)

  • hard_sigmoid(x)

  • binary_sigmoid(x)

  • binary_tanh(x)

  • smooth_tanh(x)

  • hard_tanh(x)

  • quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)

  • bernoulli(alpha=1.0)(x)

  • stochastic_ternary(alpha=1.0, threshold=0.33)(x)

  • ternary(alpha=1.0, threshold=0.33)(x)

  • stochastic_binary(alpha=1.0)(x)

  • binary(alpha=1.0)(x)

  • quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)

  • quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)

  • quantized_tanh(bits=8, integer=0, symmetric=0)(x)

  • quantized_po2(bits=8, max_value=-1)(x)

  • quantized_relu_po2(bits=8, max_value=-1)(x)

The stochastic_* functions and bernoulli, as well as quantized_relu and quantized_tanh, rely on stochastic versions of the activation functions. They draw a random number with uniform distribution from _hard_sigmoid of the input x, and the result is based on the expected value of the activation function. Please refer to the papers if you want to understand the underlying theory, or the documentation in qkeras/qlayers.py.
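
The sampling idea can be sketched in plain Python for a single scalar. This is a simplified illustration, not the QKeras implementation: the name stochastic_binary_sketch is hypothetical, the hard sigmoid is assumed to be clip(0.5 * x + 0.5, 0, 1), and gradient handling and the alpha scaling are left out. The training flag mirrors the 0.8 highlight that stochastic behavior is disabled during inference.

```python
import random


def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation clipped to [0, 1] (assumed form)."""
    return min(1.0, max(0.0, 0.5 * x + 0.5))


def stochastic_binary_sketch(x, rng, training=True):
    """Return +1.0 or -1.0 for a scalar input x.

    In training mode the output is +1 with probability hard_sigmoid(x),
    so its expected value is 2 * hard_sigmoid(x) - 1.
    In inference mode the sampling is replaced by a deterministic sign.
    """
    if not training:
        return 1.0 if x >= 0 else -1.0
    p = hard_sigmoid(x)
    return 1.0 if rng.random() < p else -1.0


rng = random.Random(0)
samples = [stochastic_binary_sketch(0.5, rng) for _ in range(10_000)]
# The sample mean should be close to 2 * hard_sigmoid(0.5) - 1 = 0.5.
print(sum(samples) / len(samples))
```

Averaged over many draws the stochastic output tracks a clipped linear function of x, which is why the result is described in terms of the expected value of the activation.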

The parameter "bits" specifies the number of bits for the quantization, and "integer" specifies how many of those bits are to the left of the decimal point. Finally, in our experience training networks with QSeparableConv2D, both quantized_bits and quantized_tanh, which generate values in [-1, 1), required the symmetric version of the range in order to converge properly and eliminate the bias.

Every time we use a quantization for weights and bias that can generate numbers outside the range [-1.0, 1.0], we need to adjust the *_range to the number. For example, if we have a quantized_bits(bits=6, integer=2) in a weight of a layer, we need to set the weight range to 2**2, which is equivalent to Catapult HLS ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify a range of alpha, and for po2 type of quantizers, we need to specify the range of max_value.
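
To make the roles of "bits", "integer" and the resulting range concrete, here is a minimal scalar sketch of fixed-point quantization. It is an approximation written for this explanation, not the QKeras code: quantized_bits_sketch is a hypothetical name, and rounding uses Python's built-in round (ties go to the nearest even code).

```python
def quantized_bits_sketch(x, bits=8, integer=0, symmetric=False, keep_negative=True):
    """Round a scalar x to a fixed-point grid and clip it to the representable range."""
    value_bits = bits - 1 if keep_negative else bits  # one bit is spent on the sign
    levels = 2 ** value_bits                          # number of non-negative codes
    scale = levels / 2 ** integer                     # codes per unit of x
    if keep_negative:
        lowest = -levels + (1 if symmetric else 0)    # symmetric drops the extra negative code
    else:
        lowest = 0
    highest = levels - 1
    code = min(highest, max(lowest, round(x * scale)))
    return code / scale


print(quantized_bits_sketch(0.3, bits=4))               # 0.25
print(quantized_bits_sketch(3.3, bits=6, integer=2))    # 3.25
print(quantized_bits_sketch(10.0, bits=6, integer=2))   # 3.875 (clipped)
print(quantized_bits_sketch(-10.0, bits=6, integer=2))  # -4.0 (clipped)
```

With bits=6 and integer=2 the representable values lie in [-4, 4), which is why the weight range in the example above is set to 2**2.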

Example

An example of a very simple network is given below in Keras.

from keras.layers import *

x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

You can easily quantize this network as follows:

from keras.layers import *
from qkeras import *

x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

The last QActivation is advisable if you want to compare results later on. Please find more cases under the directory examples.

QTools

The purpose of QTools is to assist hardware implementation of the quantized model and model energy consumption estimation. QTools has two functions: data type map generation and energy consumption estimation.

  • Data Type Map Generation: QTools automatically generates the data type map for the weights, bias, multiplier, adder, etc. of each layer. The data type map includes operation type, variable size, quantizer type and bits, etc. The inputs of QTools are:
  1. a given quantized model;
  2. a list of input quantizers for the model.
  The output of QTools is a JSON file that lists the data type map of each layer (stored in qtools_instance._output_dict). Output methods include qtools_stats_to_json, which writes the data type map to a JSON file, and qtools_stats_print, which prints the data type map.
  • Energy Consumption Estimation: Another function of QTools is to estimate the model energy consumption in picojoules (pJ). It provides a tool for QKeras users to quickly estimate the energy consumed by memory accesses and MAC operations in a quantized model derived from QKeras, especially when comparing the power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a measure for model tuning which needs to consider both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function which combines energy cost and accuracy.

  • Energy Model: The work most often referenced in the literature on energy consumption is Horowitz, M.: “1.1 Computing’s energy problem (and what we can do about it)”; IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

In this work, the author estimated the energy consumption of accelerators, and the data points he presented for a 45 nm process have since been used whenever someone wants to compare accelerator performance. QTools energy consumption on a 45 nm process is based on the data published in this work.

  • Examples: An example of how to generate the data type map can be found in qkeras/qtools/examples/example_generate_json.py. An example of how to generate the energy consumption estimate can be found in qkeras/qtools/examples/example_get_energy.py.
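
As a rough illustration of the kind of estimate described above, the sketch below multiplies a MAC count by per-operation energies. It is not the QTools implementation: mac_energy_pj and ENERGY_PJ are hypothetical names, the per-operation figures are the approximate 45 nm values reported in the Horowitz paper, and memory access energy is ignored entirely.

```python
# Approximate 45 nm energy per operation in pJ, as reported by Horowitz (ISSCC 2014).
# QTools' own tables and any interpolation between bit widths may differ.
ENERGY_PJ = {
    ("int", 8): {"add": 0.03, "mul": 0.2},
    ("int", 32): {"add": 0.1, "mul": 3.1},
    ("fp", 32): {"add": 0.9, "mul": 3.7},
}


def mac_energy_pj(num_macs, kind="int", bits=8):
    """Energy in pJ for num_macs multiply-accumulate operations (arithmetic only)."""
    cost = ENERGY_PJ[(kind, bits)]
    return num_macs * (cost["add"] + cost["mul"])


num_macs = 1_000_000  # hypothetical layer size
int8_energy = mac_energy_pj(num_macs, "int", 8)
fp32_energy = mac_energy_pj(num_macs, "fp", 32)
# With these figures an fp32 MAC costs roughly 20 times as much as an int8 MAC.
print(int8_energy, fp32_energy, fp32_energy / int8_energy)
```

A ratio like this is best read as a relative comparison between two models on the same technology, not as an absolute prediction.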

AutoQKeras

AutoQKeras allows the automatic quantization and rebalancing of deep neural networks by treating quantization and rebalancing of an existing deep neural network as a hyperparameter search in Keras-Tuner using random search, hyperband or gaussian processes.

In order to contain the explosion of hyperparameters, users can group tasks by patterns and perform distributed training using available resources.

Extensive documentation is present in notebook/AutoQKeras.ipynb.

Related Work

QKeras has been implemented based on the work of "B.Moons et al. - Minimum Energy Quantized Neural Networks", Asilomar Conference on Signals, Systems and Computers, 2017 and "Zhou, S. et al. - DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation of accumulator sizes and the conversion from non-quantized to quantized networks. Finally, our main goal is ease of use, so we attempt to make QKeras layers a true drop-in replacement for Keras, so that users can easily exchange non-quantized layers for quantized ones.

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Copyright (c) 2017, Bert Moons where it applies

Comments
  • Kernel weights and activations not quantized after training

    Hi there! I was interested in implementing the QKeras example for the MNIST CNN model given in the examples section. This example quantizes the weights and activations to INT4 (4 bits) using the quantized_bits(4,0,1) method for the Conv kernels and activations. I was expecting the weights and activations to be in INT4, but they were in FP32 and there wasn't any integer part left of the decimal point. I ran some experiments with the quantized_bits() method on its own and the results were quantized! I also looked at the weights and activations of the MNIST model after model_save_quantized_weights(). I would essentially like to save the quantized model with INT8 or INT4 weights, convert it into a TRT engine and do GPU inferencing. Any pointers? Thanks, Yoga

    opened by YogaVicky 11
  • Converting regular Keras weights to Qkeras

    Hello,

    First I wanted to say: kudos for creating this library; I'm really excited to try it out on different models!

    I saw in the readme:

    QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks.

    Is there any documentation on using those tools to convert pretrained weights (e.g. ImageNet) to the quantized versions?

    Thanks!

    opened by xhluca 10
  • Transposed convolution (deconvolution)

    This PR adds QConv2DTranspose, which is useful in autoencoders. I can also add QConv1DTranspose, but the equivalent Keras layer is only available in nightly TF releases, not in the stable channel yet.

    cla: yes ready to pull 
    opened by vloncar 9
  • Support JSON save/load

    Currently, QKeras doesn't support saving model with model.to_json() and loading with load_from_json. This extends the QDense, QConv1D, QConv2D, QDepthwiseConv2D and QBatchNormalization as well as the quantizers and activations to support this functionality.

    cla: yes ready to pull 
    opened by vloncar 8
  • add non-stochastic inference mode to quantizers

    Currently, models with stochastic quantizers perform stochastic operations even in inference mode. This mechanism prevents this and uses the non-stochastic version of the quantizer when in inference mode.

    cla: no 
    opened by jecorona97 7
  • Fixing Conv1D's weight info extraction

    Hi,

    This is a PR to fix the print_qstats error I mentioned in #13 . Essentially I just replaced kernel_h and kernel_w extraction by kernel_length for Conv1D. Someone might have copied the code from Conv2D and forgot to change it. I tested and it worked fine.

    Regards,

    Duc.

    cla: yes ready to pull 
    opened by Duchstf 7
  • Using qkeras layers concurrently with Tensorflow's pruning tools.

    Hello, very cool project!!

    I'm just wondering if it would be possible to train the model using qkeras layers with the pruning tools in Tensorflow's model optimization package. For example, can we have something like this?

    tf.keras.Sequential([
        sparsity.prune_low_magnitude(
            l.QConv2D(32, 5, padding='same', activation='relu'),
            input_shape=input_shape,
            **pruning_params)])
    

    Thanks,

    Duc.

    enhancement 
    opened by Duchstf 7
  • QSeparableConv1D and 2D

    This PR adds QSeparableConv1D and QSeparableConv2D. The existing QSeparableConv2D (which expands to QDepthwiseConv2D and 1x1 QConv2D) that is based on MobileNet is retained and renamed QMobileNetSeparableConv2D

    cla: yes ready to pull 
    opened by vloncar 6
  • print_qstats(): operation type issue with Sequential() model

    When I was applying quantization to a Keras Sequential() model, I found that there could be an issue with the operation type in the print_qstats() function.

    For example, with the model in example_mnist.py but coded by the Sequential() API, I got an output as below. The operation type for the first conv2d layer is unull_4_-1, whereas it is smult_4_8 with the functional API.

    Based on my experiments with some other models, this only happens to the first layer of the Sequential() model.

    Also, for smult_4_8, I would like to know what the 8 stands for here.

    I am on: tensorflow-gpu 2.2.0 tensorflow-model-optimization 0.4.1

    Number of operations in model:
        conv2d_0_m                    : 25088 (unull_4_-1)
        conv2d_1_m                    : 663552 (smult_4_4)
        conv2d_2_m                    : 147456 (smult_4_4)
        dense                         : 5760  (smult_4_4)
    
    Number of operation types in model:
        smult_4_4                     : 816768
        unull_4_-1                    : 25088
    
    Weight profiling:
        conv2d_0_m_weights             : 128   (4-bit unit)
        conv2d_0_m_bias                : 32    (4-bit unit)
        conv2d_1_m_weights             : 18432 (4-bit unit)
        conv2d_1_m_bias                : 64    (4-bit unit)
        conv2d_2_m_weights             : 16384 (4-bit unit)
        conv2d_2_m_bias                : 64    (4-bit unit)
        dense_weights                  : 5760  (4-bit unit)
        dense_bias                     : 10    (4-bit unit)
    
    Weight sparsity:
    ... quantizing model
        conv2d_0_m                     : 0.1812
        conv2d_1_m                     : 0.1345
        conv2d_2_m                     : 0.1156
        dense                          : 0.1393
        ----------------------------------------
        Total Sparsity                 : 0.1278
    
    opened by HaoranREN 6
  • Return a function instead of calling it (in safe_eval)

    If quantizer is a function (like binary_tanh or hard_sigmoid, used as activations), safe_eval would try to make an instance of it and fail. This was due to the change introduced in #12. We should check whether quantizer is a class to be instantiated before being called or a function ready to be called.

    cla: yes ready to pull 
    opened by vloncar 6
  • QBatchNormalization with scale=False and model_save_quantized_weights

    When model_save_quantized_weights is called on a model that includes a QBatchNormalization with scale=False, it seems that the wrong quantizers are used. QBatchNormalization.get_quantizers() returns a list with gamma_quantizer as its first element even when there is no gamma, resulting in a misalignment between quantizers and weights at this point: https://github.com/google/qkeras/blob/1f2134b48548a548f22ee7b75079cb9e34eaff5b/qkeras/utils.py#L159

    opened by lattuada-st 5
  • `pyparser` vs `pyparsing`

    I see you have both pyparser and pyparsing in your requirements.txt. However, only pyparser is in the setup.py as a dependency. Moreover, I only see a use of the pyparsing library in the code.

    It seems to me that only pyparsing should be in the requirements.txt and in setup.py as a dependency. What do you all think?

    For reference:

    • pyparser: Code: https://keep.imfreedom.org/grim/pyparser, PyPI: https://pypi.org/project/pyparser/
    • pyparsing: Code: https://github.com/pyparsing/pyparsing, PyPI: https://pypi.org/project/pyparsing/
    opened by jmduarte 0
  • How do I save an AutoQKeras model that a different script can load?

    I can't figure out how to load, in one script, a model produced by an AutoQKeras search in another script. I tried to use qmodel.save('qmodel') and qmodel = load_qmodel('qmodel'), but I get these errors.

    Traceback (most recent call last):
      File "code/auto_qkeras.py", line 578, in <module>
        aqk_model = load_qmodel('qmodel')
      File "/home/berian/.local/lib/python3.8/site-packages/qkeras/utils.py", line 928, in load_qmodel
        qmodel = tf.keras.models.load_model(filepath, custom_objects=custom_objects,
      File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/load.py", line 1008, in revive_custom_object
        raise ValueError(
    ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the `custom_objects` arg when calling `load_model()` and make sure that all layers implement `get_config` and `from_config`.
    

    Following the AutoQKeras guide https://github.com/google/qkeras/blob/master/notebook/AutoQKeras.ipynb, there is an example for saving/loading weights into a QKeras model object. However, I won't have the model object from the AutoQKeras search in a different script, so using qmodel.load_weights("qmodel.h5") is not feasible. I have also noticed that when I make my own QKeras model object, qmodel.save(...) and qmodel = load_qmodel(...) work just fine.

    Maybe there are some extra options I need to add to the load_qmodel(...) function? Or is there a better way altogether to transfer the qmodel object from one script to another?

    opened by alexberian 0
  • Cannot convert 6.0 to EagerTensor of dtype int64

    Hi all,

    My setup is:

    Arch Linux 5.15.78-1-lts Python 3.10.8 Tensorflow 2.11.0 Numpy 1.23.0 qkeras 0.9.0

    I am running the following example code:

    import tensorflow as tf
    import numpy as np
    from qkeras import QActivation
    
    
    # build the model
    l_0 = tf.keras.layers.Input(shape=2)
    l_1 = QActivation("bernoulli")(l_0)
    l_2 = tf.keras.layers.Dense(units=10, activation="sigmoid")(l_1)
    l_3 = QActivation("bernoulli")(l_2)
    out = tf.keras.layers.Dense(units=1, activation="sigmoid")(l_3)
    
    # create the model
    model = tf.keras.models.Model(inputs=l_0, outputs=out)
    model.compile(loss='binary_crossentropy')
    
    # create some data
    x = np.array([[1,2],[3,4],[5,6]])
    y = np.array([[0],[1],[1]])
    
    # fit the model
    model.fit(x, y)
    
    # eval the model layers
    layer_out = None
    for layer in model.layers:
        if "input" in layer.name:
            layer_out = layer(x)
        if "input" not in layer.name:
            layer_out = layer(layer_out)
    

    Everything works well up to fitting, but in the evaluation step over my model layers I encounter the following error:

    Traceback (most recent call last):
      File "test.py", line 30, in <module>
        layer_out = layer(layer_out)
      File "keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "qkeras/qlayers.py", line 177, in call
        return self.quantizer(inputs)
      File "qkeras/quantizers.py", line 796, in __call__
        p = tf.keras.backend.sigmoid(self.temperature * x / std)
    TypeError: Exception encountered when calling layer 'q_activation' (type QActivation).
    
    Cannot convert 6.0 to EagerTensor of dtype int64
    
    Call arguments received by layer 'q_activation' (type QActivation):
      • inputs=tf.Tensor(shape=(3, 2), dtype=int64)
    

    I think the problem is caused by the variables std and temperature in quantizers.py not matching the data type of the input x. One way to fix it is to change the code from line 790 to:

        std = tf.constant(1.0, dtype=tf.float32)
    
        if self.use_real_sigmoid:
          self.temperature = tf.constant(self.temperature, dtype=std.dtype)
          x = tf.cast(x, std.dtype)
          p = tf.keras.backend.sigmoid(self.temperature * x / std)
    

    This forces the type to be tf.float32.

    Cheers, Marius

    opened by makoeppel 0
  • Only Qconv layer's output tensors are quantized

    Hello,

    I am using a quantized QKeras model, where all the Conv, BatchNormalization, and Dense parameters have been quantized to 4 bits.

    However, when I run the predict function on one image and then print the output tensors of the quantized layers, I can see that only the QConv layer's output tensors are expressed in 4 bits. In contrast, the output tensors of the QBatchNormalization and QDense layers are expressed in regular floating point.

    My question is: If I use a QKeras quantized model, does QKeras perform the quantization of the input tensors or output tensor of the quantized layers in the prediction function internally? Why is only the QConv layer's output expressed in 4 bits?

    ## Loading model
    model = qkeras_utils.load_qmodel(model_dir)
    model.summary()
    
    (train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()
    
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    
    # Converting the pixels data to float type
    train_images = train_images.astype('float32')
    test_images = test_images.astype('float32')
     
    # Scaling to [0, 1] (255 is the maximum value a pixel can have)
    train_images = train_images / 255
    test_images = test_images / 255 
    
    num_classes = 10
    train_labels = to_categorical(train_labels, num_classes)
    test_labels = to_categorical(test_labels, num_classes)
    
    iterations = 1
    for i in range(iterations):
        print("Iteration ", i)
        image = test_images[i].reshape(-1, 32, 32, 3)
        #predictions = model.predict(image)
        get_all_layer_outputs = K.function([model.layers[0].input],
                                          [l.output for l in model.layers[0:]])
    
        layer_output = get_all_layer_outputs([image]) # return the same thing
        m = 0
        for j in layer_output:
            print(model.layers[m].__class__.__name__)
            print(j)
            m = m+1
        
    

    And my output:

    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d (QConv2D)             (None, 32, 32, 32)        896       
    _________________________________________________________________
    batch_normalization (QBatchN (None, 32, 32, 32)        128       
    _________________________________________________________________
    conv2d_1 (QConv2D)           (None, 32, 32, 32)        9248      
    _________________________________________________________________
    batch_normalization_1 (QBatc (None, 32, 32, 32)        128       
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
    _________________________________________________________________
    dropout (Dropout)            (None, 16, 16, 32)        0         
    _________________________________________________________________
    conv2d_2 (QConv2D)           (None, 16, 16, 64)        18496     
    _________________________________________________________________
    batch_normalization_2 (QBatc (None, 16, 16, 64)        256       
    _________________________________________________________________
    conv2d_3 (QConv2D)           (None, 16, 16, 64)        36928     
    _________________________________________________________________
    batch_normalization_3 (QBatc (None, 16, 16, 64)        256       
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 8, 8, 64)          0         
    _________________________________________________________________
    conv2d_4 (QConv2D)           (None, 8, 8, 128)         73856     
    _________________________________________________________________
    batch_normalization_4 (QBatc (None, 8, 8, 128)         512       
    _________________________________________________________________
    conv2d_5 (QConv2D)           (None, 8, 8, 128)         147584    
    _________________________________________________________________
    batch_normalization_5 (QBatc (None, 8, 8, 128)         512       
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 4, 4, 128)         0         
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 4, 4, 128)         0         
    _________________________________________________________________
    flatten (Flatten)            (None, 2048)              0         
    _________________________________________________________________
    dense (QDense)               (None, 128)               262272    
    _________________________________________________________________
    batch_normalization_6 (QBatc (None, 128)               512       
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_1 (QDense)             (None, 10)                1290      
    =================================================================
    ...
    
    QConv2D
    [[[[0.     0.     0.25   ... 0.     0.375  0.    ]
       [0.     0.     0.     ... 0.     0.6875 0.25  ]
       [0.     0.     0.     ... 0.     0.6875 0.1875]
    
    ...
    
    QBatchNormalization
    [[[[ 0.02544868  0.16547686  1.791272   ... -0.0244638   0.58454317
        -0.66077614]
       [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
         1.0357761 ]
       [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
         0.61163807]
    ...
    
    QConv2D
    [[[[0.     0.9375 0.     ... 0.     0.     0.9375]
       [0.     0.     0.     ... 0.375  0.     0.    ]
       [0.     0.     0.     ... 0.0625 0.     0.    ]
       ...
    
    opened by laumecha 0
  • Error in energy estimation for AveragePooling2D layers

    Greetings, I am trying to quantize the network for the KWS application using DS CNN. The network is described in the linked file (lines 85 to 141).

    When running AutoQKeras, it shows an error in the energy estimation for AveragePooling2D layers:

    Traceback (most recent call last):
      File "/home/auto_qk.py", line 180, in <module>
        autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 831, in __init__
        self.hypermodel = AutoQKHyperModel(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 125, in __init__
        self.reference_size = self.target.get_reference(model)
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/forgiving_metrics/forgiving_energy.py", line 121, in get_reference
        energy_dict = q.pe(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/run_qtools.py", line 85, in pe
        energy_dict = qenergy.energy_estimate(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/qenergy/qenergy.py", line 302, in energy_estimate
        add_energy = OP[get_op_type(accumulator.output)]["add"](
    AttributeError: 'NoneType' object has no attribute 'output'

    When I remove the AveragePooling2D layer, AutoQKeras does not produce the error. I tried to set quantization parameters for AveragePooling2D, but had no luck.

    Code for AutoQKeras:

    AutoQkeras start

    # set quantization configs 
    
    quantization_config = {
        "kernel": {
            "binary": 1,
            "stochastic_binary": 1,
            "ternary": 2,
            "stochastic_ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(5,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(6,0,1,1,alpha=\"auto_po2\")": 6
        },
        "bias": {
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(5,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(6,0,1,1,alpha=\"auto_po2\")": 6
        },
        "activation": {
            "binary": 1,
            "ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(5,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(6,0,1,1,alpha=\"auto_po2\")": 6
        },
        "linear": {
            "binary": 1,
            "ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(5,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(6,0,1,1,alpha=\"auto_po2\")": 6
        }
    }
    
    # define limits 
    limit = {
        "Dense": [4, 4, 4],
        "Conv2D": [4, 4, 4],
        "DepthwiseConv2D": [4, 4, 4],
        "Activation": [4],
        "AveragePooling2D":  [4, 4, 4],
        "BatchNormalization": [],
        "Dense": [],  # duplicate key: this later entry overrides "Dense": [4, 4, 4] above
    }
    
    # define goal (delta = forgiving factor; set to 8% as in the tutorial)
    
    goal = {
        "type": "energy",
        "params": {
            "delta_p": 8.0,
            "delta_n": 8.0,
            "rate": 2.0,
            "stress": 1.0,
            "process": "horowitz",
            "parameters_on_memory": ["sram", "sram"],
            "activations_on_memory": ["sram", "sram"],
            "rd_wr_on_io": [False, False],
            "min_sram_size": [0, 0],
            "source_quantizers": ["int8"],
            "reference_internal": "int8",
            "reference_accumulator": "int32"
            }
    }
    
    # SOME RUN CONFIGS
    
    run_config = {
        "output_dir": Flags.bg_path + "auto_qk_dump",
        "goal": goal,
        "quantization_config": quantization_config,
        "learning_rate_optimizer": False,
        "transfer_weights": False,
        "mode": "random",
        "seed": 42,
        "limit": limit,
        "tune_filters": "layer",
        "tune_filters_exceptions": "^dense",
        # first layer is input, last two layers are softmax and flatten
        "layer_indexes": range(1, len(model.layers)-1),
        "max_trials": 20
        }
    
    
    # Start autoQkeras 
    
    model.summary()
    model.compile(
        #optimizer=keras.optimizers.RMSprop(learning_rate=args.learning_rate),  # Optimizer
        optimizer=keras.optimizers.Adam(learning_rate=Flags.learning_rate),  # Optimizer
        # Loss function to minimize
        loss=keras.losses.SparseCategoricalCrossentropy(),
        # List of metrics to monitor
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )   
    #model = keras.models.load_model(Flags.saved_model_path)
    
    custom_objects = {}
    autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
    autoqk.fit(ds_train, validation_data=ds_val, epochs=Flags.epochs, callbacks=callbacks)
    
    qmodel = autoqk.get_best_model()
    # save the weights of the best quantized model (not the original float model)
    qmodel.save_weights(Flags.bg_path + "auto_qk_dump/qmodel.h5")
    ### AutoQkeras stop
    
    opened by RatkoFri 0
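Hand-written option tables such as quantization_config are easy to get wrong, because a repeated key in a Python dict literal silently keeps only its last value. The following is a minimal sketch, assuming the intent is one quantized_bits entry per bit width from 2 to 6; the helper name quantized_bits_options is hypothetical and not part of QKeras.

```python
def quantized_bits_options(min_bits=2, max_bits=6):
    """Map each quantizer string to its bit width, one entry per width.

    Hypothetical helper for illustration; it only builds strings and does
    not call into QKeras.
    """
    return {
        f'quantized_bits({bits},0,1,1,alpha="auto_po2")': bits
        for bits in range(min_bits, max_bits + 1)
    }


# Kernel options: the fixed low-bit quantizers plus the generated entries.
kernel_options = {
    "binary": 1,
    "stochastic_binary": 1,
    "ternary": 2,
    "stochastic_ternary": 2,
    **quantized_bits_options(),
}

for name, bits in kernel_options.items():
    print(bits, name)
```

Generating the keys from the bit width keeps each key and its value in step, so a copy-paste slip cannot produce two entries with the same key.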
Releases (v0.9.0)
  • v0.9.0 (Feb 20, 2021)

    Major Features

    • qtools energy support for global_average_pooling layer.

    • Added layers for sequence models: LSTM, RNN, and GRU.

    • Added activation and weight compression notebook.

    • Added QSeparableConv2D class

      • Renamed previous QSeparableConv2D layer to QMobileNetSeparableConv2D
      • It is more consistent with Keras SeparableConv2D API
    • Bugfix of QDepthwiseConv2D.

    • Added an experimental QAdaptiveActivation layer to learn quantizer integer bits from activation values.

    • Added weight sparsity calculation to model qstats.

    • Enabled AutoQKeras to use custom Keras Tuners.

    • Fixed various bugs in AutoQKeras.

    Thanks to our contributors

    This release contains contributions from many people at Google and CERN.

  • v0.8.0 (Jun 19, 2020)

    Major Features

    • Automatic quantization using QKeras;

    • Stochastic behavior (including stochastic rounding) is disabled during inference;

    • LeakyReLU for quantized_relu;

    • Qtools for estimating effort to perform inference;

      • Qtools will estimate the sizes and types of operations to perform inference, with its data sizes compatible with high-level synthesis datatypes. For example, quantized_bits and quantized_relu bits and int_bits from Qtools will match exactly ac_fixed datatypes (if you rely on QKeras alone, the correct datatype should be ac_fixed<bits, int_bits+is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer).
    • Other bug fixes and enhancements.
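The ac_fixed mapping quoted in the Qtools note can be written out as a small helper. This is only a sketch of the formula ac_fixed<bits, int_bits+is_negative, is_negative>; the function name ac_fixed_type is hypothetical, and how is_negative is chosen (true for a quantizer that keeps negative values, false for a plain quantized_relu) is an assumption made for the two sample calls.

```python
def ac_fixed_type(bits, int_bits, is_negative):
    """Format an ac_fixed declaration from quantizer parameters.

    Implements ac_fixed<bits, int_bits + is_negative, is_negative> as
    stated in the release note; the sign bit is counted in the integer width.
    """
    sign_bit = 1 if is_negative else 0
    signed = "true" if is_negative else "false"
    return f"ac_fixed<{bits}, {int_bits + sign_bit}, {signed}>"


# Assumed signed case, e.g. an 8-bit quantizer with 0 integer bits.
print(ac_fixed_type(8, 0, True))   # ac_fixed<8, 1, true>
# Assumed unsigned case, e.g. a 4-bit ReLU output with 2 integer bits.
print(ac_fixed_type(4, 2, False))  # ac_fixed<4, 2, false>
```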

    Thanks to our contributors

    This release contains contributions from many people at Google and CERN.

  • v0.7.4 (Apr 11, 2020)

  • v0.7.0 (Mar 27, 2020)

    Major Features

    • Enhancement of binary and ternary quantization, as well as their stochastic counterparts, for parameters and activations.
    • Add auto scaling for low-bitwidth quantization.
    • Add Jupyter notebook.
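To give a feel for what auto scaling means for a 1-bit quantizer, here is a minimal pure-Python sketch. It assumes the scale is the mean absolute weight rounded to the nearest power of two in the log2 domain; that rule and the names po2_scale and binary_quantize are illustrative assumptions, not QKeras's actual implementation.

```python
import math


def po2_scale(weights):
    """Round the mean absolute weight to the nearest power of two (in log2).

    Assumed scaling rule for illustration only. Expects a non-empty list.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    if mean_abs == 0:
        return 1.0  # arbitrary fallback for an all-zero input in this sketch
    return 2.0 ** round(math.log2(mean_abs))


def binary_quantize(weights):
    """Map every weight to +scale or -scale according to its sign."""
    scale = po2_scale(weights)
    return [scale if w >= 0 else -scale for w in weights]


weights = [0.5, -0.25, 0.125, -0.125]
print(po2_scale(weights))        # 0.25
print(binary_quantize(weights))  # [0.25, -0.25, 0.25, -0.25]
```

A power-of-two scale is attractive in hardware because multiplying by it reduces to a bit shift.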

    Thanks to our Contributors

    This release contains contributions from many people at Google.

  • v0.6.0 (Mar 11, 2020)

    Major Features

    • Use Tensorflow 2.1+ and tf.keras.
      • QKeras does not support the standalone Keras anymore.
      • Use Python 3.
    • Support APIs of pruning and PrunableLayer from tensorflow_model_optimization for model sparsity.
    • Add QBatchNormalization layer.

    Thanks to our Contributors

    This release contains contributions from many people at Google and CERN.

  • v0.5.0 (Jan 12, 2020)

    QKeras 0.5.0 uses Tensorflow version < 2 and standalone Keras as backend.

    Major Features

    This is the first release of QKeras.

    Notes

    In the next release, we will support TensorFlow 2+ and tf.keras.

    Thanks to our Contributors

    This release contains contributions from many people at Google.

Owner
Google