Distiller is an open-source Python package for neural network compression research.

Overview

Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.

Highlighted features

  • Automatic Compression
  • Weight pruning
    • Element-wise pruning using magnitude thresholding, sensitivity thresholding, target sparsity level, and activation statistics
  • Structured pruning
    • Convolution: 2D (kernel-wise), 3D (filter-wise), 4D (layer-wise), and channel-wise structured pruning.
    • Fully-connected: column-wise and row-wise structured pruning.
    • Structure groups (e.g. structures of 4 filters).
    • Structure ranking using weight or activation criteria (Lp-norm, APoZ, gradients, random, etc.).
    • Support for new structures (e.g. block pruning)
  • Control
    • Soft (mask on forward-pass only) and hard pruning (permanently disconnect neurons)
    • Dual weight copies (compute loss on masked weights, but update unmasked weights)
    • Model thinning (AKA "network garbage removal") to permanently remove pruned neurons and connections.
  • Schedule
    • Flexible scheduling of pruning, regularization, and learning rate decay (compression scheduling)
    • One-shot and iterative pruning (and fine-tuning) are supported.
    • Easily control what is performed each training step (e.g. greedy layer by layer pruning to full model pruning).
    • Automatic gradual schedule (AGP) for pruning individual connections and complete structures.
    • The compression schedule is expressed in a YAML file so that a single file captures the details of experiments. This dependency injection design decouples the Distiller scheduler and library from future extensions of algorithms.
  • Element-wise and filter-wise pruning sensitivity analysis (using L1-norm thresholding). Examine the data from some of the networks we analyzed, using this notebook.
  • Regularization
    • L1-norm element-wise regularization
    • Group Lasso and group variance regularization
  • Quantization
    • Automatic mechanism to transform existing models to quantized versions, with customizable bit-width configuration for different layers. No need to re-write the model for different quantization methods.
    • Post-training quantization of trained full-precision models, dynamic and static (statistics-based)
    • Support for quantization-aware training in the loop
  • Knowledge distillation
    • Training with knowledge distillation, in conjunction with the other available pruning / regularization / quantization methods.
  • Conditional computation
    • Sample implementation of Early Exit
  • Low rank decomposition
  • Lottery Ticket Hypothesis training
  • Export statistics summaries using Pandas dataframes, which makes it easy to slice, query, display and graph the data.
  • A set of Jupyter notebooks to plan experiments and analyze compression results. The graphs and visualizations you see on this page originate from the included Jupyter notebooks.
    • Take a look at this notebook, which compares visual aspects of dense and sparse Alexnet models.
    • This notebook creates performance indicator graphs from model data.
  • Sample implementations of published research papers, using library-provided building blocks. See the research papers discussions in our model-zoo.
  • Logging to the console, text file and TensorBoard-formatted file.
  • Export to ONNX (export of quantized models pending ONNX standardization)
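
As a rough illustration of the gradual pruning schedule (AGP) mentioned in the list above, the sketch below computes a per-step sparsity target with the cubic schedule from Zhu and Gupta's "To prune, or not to prune" paper, which AGP follows. The function name and its parameters are assumptions made for this example and are not Distiller's API.

```python
def agp_target_sparsity(step, initial_sparsity, final_sparsity, start_step, end_step):
    """Target sparsity at `step` under a cubic gradual-pruning schedule."""
    if end_step <= start_step:
        raise ValueError("end_step must be greater than start_step")
    # Clamp progress to [0, 1] so the target stays fixed outside the pruning window.
    progress = (step - start_step) / (end_step - start_step)
    progress = min(max(progress, 0.0), 1.0)
    # Cubic decay of the remaining gap between the initial and final sparsity.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


if __name__ == "__main__":
    for epoch in (0, 10, 20, 30):
        target = agp_target_sparsity(epoch, 0.05, 0.60, start_step=0, end_step=30)
        print("epoch {:2d}: target sparsity {:.4f}".format(epoch, target))
```

With this shape the target rises quickly while the network still has many redundant weights and levels off as it approaches the final sparsity.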

Installation

These instructions will help get Distiller up and running on your local machine.

1. Clone Distiller

Clone the Distiller code repository from GitHub:

$ git clone https://github.com/IntelLabs/distiller.git

The rest of this documentation assumes that you have cloned the repository to a directory called distiller.

2. Create a Python virtual environment

We recommend using a Python virtual environment, but that, of course, is up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions for completeness.
Before creating the virtual environment, make sure you are located in the distiller directory. After creating the environment, you should see a directory called distiller/env.

Using virtualenv

If you don't have virtualenv installed, you can find the installation instructions here.

To create the environment, execute:

$ python3 -m virtualenv env

This creates a subdirectory named env where the Python virtual environment is stored. The current shell starts using it as the default Python environment once the environment is activated, as described under "Activate the environment".

Using venv

If you prefer to use venv, then begin by installing it:

$ sudo apt-get install python3-venv

Then create the environment:

$ python3 -m venv env

As with virtualenv, this creates a directory called distiller/env.

Activate the environment

The environment activation and deactivation commands for venv and virtualenv are the same.
NOTE: Make sure to activate the environment before proceeding with the installation of the dependency packages:

$ source env/bin/activate
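
If it is unclear whether the activation took effect, a small check like the following can help. It is a generic Python sketch, not part of Distiller; the function name is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # With venv (and recent virtualenv), sys.prefix points at the environment
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end in the env directory created earlier.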

3. Install the Distiller package

Finally, install the Distiller package and its dependencies using pip3:

$ cd distiller
$ pip3 install -e .

This installs Distiller in "development mode", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository).

Notes:

  • Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.
  • If you are not using a GPU, you might need to make small adjustments to the code.

Required PyTorch Version

Distiller is tested using the default installation of PyTorch 1.3.1, which uses CUDA 10.1. We use TorchVision version 0.4.2. These are included in Distiller's requirements.txt and will be automatically installed when installing the Distiller package as listed above.

If you do not use CUDA 10.1 in your environment, please refer to the PyTorch website to install the compatible build of PyTorch 1.3.1 and torchvision 0.4.2.
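
To confirm which versions actually ended up in the environment, a check along these lines may be useful. It is a sketch that assumes Python 3.8 or later (for importlib.metadata), which is newer than the Python 3.5 setup mentioned in the notes above; the helper names are made up for this example.

```python
from importlib import metadata  # available in Python 3.8 and later

# Versions quoted in the section above.
EXPECTED = {"torch": "1.3.1", "torchvision": "0.4.2"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def matches(installed, expected):
    """Compare versions while ignoring a local build tag such as '+cu92'."""
    return installed is not None and installed.split("+")[0] == expected


if __name__ == "__main__":
    found = installed_versions(EXPECTED)
    for name, wanted in EXPECTED.items():
        status = "ok" if matches(found[name], wanted) else "check"
        print("{}: installed={} expected={} [{}]".format(name, found[name], wanted, status))
```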

Getting Started

Distiller comes with sample applications and tutorials covering a range of model types:

Model types:

  • Image classification
  • Word-level language model
  • Translation (GNMT)
  • Recommendation System (NCF)
  • Object Detection

Compression methods covered across these samples:

  • Sparsity
  • Post-training quantization
  • Quantization-aware training
  • Auto Compression (AMC)
  • Knowledge Distillation

Head to the examples directory for more details.

Basic Usage Examples

The following are simple examples using Distiller's image classification sample, showing some of Distiller's capabilities.

Example: Simple training-only session (no compression)

The following invokes training only (no compression) of a network named 'simplenet' on the CIFAR10 dataset. This is roughly based on TorchVision's sample ImageNet training application, so it should look familiar if you've used that application. In this example we don't invoke any compression mechanisms: we just train, because training is an essential part of fine-tuning after pruning.

Note that the first time you execute this command, the CIFAR10 dataset will be downloaded to your machine, which may take a bit of time. Please let the download process proceed to completion.

The path to the CIFAR10 dataset is arbitrary, but in our examples we place the datasets in the same directory level as distiller (i.e. ../../../data.cifar10).

First, change to the sample directory, then invoke the application:

$ cd distiller/examples/classifier_compression
$ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01

You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.

Example: Getting parameter statistics of a sparsified model

We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.
With the following command-line arguments, the sample application loads the model (--resume) and prints statistics about the model weights (--summary=sparsity). This is useful if you want to load a previously pruned model, to examine the weights sparsity statistics, for example. Note that when you resume a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (-a=resnet20_cifar):

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=sparsity

You should see a text table detailing the various sparsities of the parameter tensors. The first column is the parameter name, followed by its shape, the number of non-zero elements (NNZ) in the dense model, and in the sparse model. The next set of columns show the column-wise, row-wise, channel-wise, kernel-wise, filter-wise and element-wise sparsities.
Wrapping it up are the standard-deviation, mean, and mean of absolute values of the elements.
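
To make the sparsity columns more concrete, here is a small pure-Python sketch of how element-wise, filter-wise and channel-wise sparsity could be computed for a convolution weight stored as nested lists of shape (filters, channels, height, width). The helper names are invented for this example, and the definitions are a simplified reading of the table, not Distiller's implementation.

```python
def flatten(nested):
    """Yield every scalar in an arbitrarily nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def element_sparsity(weights):
    """Fraction of individual elements that are exactly zero."""
    values = list(flatten(weights))
    return sum(1 for v in values if v == 0) / len(values)


def filter_sparsity(weights):
    """Fraction of filters (first dimension) whose elements are all zero."""
    zero_filters = sum(1 for f in weights if all(v == 0 for v in flatten(f)))
    return zero_filters / len(weights)


def channel_sparsity(weights):
    """Fraction of input channels (second dimension) that are zero in every filter."""
    num_channels = len(weights[0])
    zero_channels = sum(
        1
        for c in range(num_channels)
        if all(v == 0 for f in weights for v in flatten(f[c]))
    )
    return zero_channels / num_channels


# Toy weight: 2 filters x 2 channels x 1x2 kernels; channel 0 is zero everywhere.
WEIGHTS = [
    [[[0.0, 0.0]], [[0.5, 0.0]]],
    [[[0.0, 0.0]], [[0.0, -0.3]]],
]

if __name__ == "__main__":
    print("element-wise:", element_sparsity(WEIGHTS))
    print("filter-wise: ", filter_sparsity(WEIGHTS))
    print("channel-wise:", channel_sparsity(WEIGHTS))
```

The toy tensor shows why the columns can differ widely: most elements are zero, one whole input channel can be removed, yet no complete filter is empty.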

In the Compression Insights notebook we use matplotlib to plot a bar chart of this summary, which indeed shows unimpressive footprint compression.

Although the memory footprint compression is very low, this model actually saves 26.6% of the MACs compute.

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute
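
The gap between footprint savings and compute savings can be reasoned about with simple arithmetic: a convolution uses each weight once per output position, so layers that operate on large feature maps dominate the MAC count even when they hold few parameters. The sketch below uses two made-up layer shapes to illustrate this; the numbers are not taken from the ResNet20 checkpoint above.

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k


def conv_macs(in_ch, out_ch, k, out_h, out_w):
    """Multiply-accumulates: every weight is used once per output position."""
    return conv_params(in_ch, out_ch, k) * out_h * out_w


def totals(layers):
    """Return (total parameters, total MACs) for a list of layer descriptions."""
    params = sum(conv_params(layer["in_ch"], layer["out_ch"], layer["k"]) for layer in layers)
    macs = sum(conv_macs(**layer) for layer in layers)
    return params, macs


# Illustrative shapes: an early layer on a 32x32 feature map, a late one on 8x8.
EARLY = dict(in_ch=16, out_ch=16, k=3, out_h=32, out_w=32)
LATE = dict(in_ch=64, out_ch=64, k=3, out_h=8, out_w=8)
# The same early layer after removing half of its input channels.
EARLY_PRUNED = dict(EARLY, in_ch=8)

if __name__ == "__main__":
    dense_params, dense_macs = totals([EARLY, LATE])
    pruned_params, pruned_macs = totals([EARLY_PRUNED, LATE])
    print("parameters saved: {:.1%}".format(1 - pruned_params / dense_params))
    print("MACs saved:       {:.1%}".format(1 - pruned_macs / dense_macs))
```

In this toy setup, pruning half of the early layer's input channels removes about 3% of the parameters but a quarter of the MACs.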

Example: Post-training quantization

This example performs 8-bit quantization of ResNet20 for CIFAR10. We've included in the git repository the checkpoint of a ResNet20 model that we've trained with 32-bit floats, so we'll take this model and quantize it:

$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10 --resume ../ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize-eval --evaluate

The command-line above will save a checkpoint named quantized_checkpoint.pth.tar containing the quantized model parameters. See more examples here.
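
For readers new to the idea, the following pure-Python sketch shows range-based linear quantization of a handful of values to 8-bit codes and back. It is a simplified illustration with invented function names, not the code behind --quantize-eval.

```python
def linear_quantize(values, num_bits=8):
    """Map floats to unsigned integer codes; return (codes, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    codes = [min(max(int(round(v / scale)) + zero_point, qmin), qmax) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from integer codes."""
    return [scale * (c - zero_point) for c in codes]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
    codes, scale, zero_point = linear_quantize(weights)
    restored = dequantize(codes, scale, zero_point)
    for original, code, approx in zip(weights, codes, restored):
        print("{:+.3f} -> code {:3d} -> {:+.5f}".format(original, code, approx))
```

The integer codes only approximate the original values after they are mapped back through the scale and zero point; each restored value differs from the original by at most about half a quantization step.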

Explore the sample Jupyter notebooks

The set of notebooks that come with Distiller is described here, which also explains the steps to install the Jupyter notebook server.
After installing and running the server, take a look at the notebook covering pruning sensitivity analysis.

Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.

Running the tests

We are currently light on tests, and this is an area where contributions will be much appreciated.
There are two types of tests: system tests and unit-tests. To invoke the unit tests:

$ cd distiller/tests
$ pytest

We use CIFAR10 for the system tests, because its size makes for quicker tests. To invoke the system tests, you need to provide a path to the CIFAR10 dataset which you've already downloaded. Alternatively, you may invoke full_flow_tests.py without specifying the location of the CIFAR10 dataset and let the test download the dataset (for the first invocation only). Note that --cifar10-path defaults to the current directory.
The system tests are not short, and are even longer if the test needs to download the dataset.

$ cd distiller/tests
$ python full_flow_tests.py --cifar10-path=

The script exits with status 0 if all tests are successful, or status 1 otherwise.
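
Because the result is reported through the exit status, the tests are easy to drive from another script. The sketch below shows one way to do that with the standard library; the two commands are stand-ins that simply exit with 0 and 1, and would be replaced by the real test invocation.

```python
import subprocess
import sys


def run_and_check(command):
    """Run `command` and return True when it exits with status 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; substitute the actual test command line in practice.
    passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print("passing command ok:", run_and_check(passing))
    print("failing command ok:", run_and_check(failing))
```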

Generating the HTML documentation site

Install mkdocs and the required packages by executing:

$ pip3 install -r doc-requirements.txt

To build the project documentation run:

$ cd distiller/docs-src
$ mkdocs build --clean

This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details.

Community

Github projects using Distiller

  • DeGirum Pruned Models - a repository containing pruned models and related information.

  • TorchFI - TorchFI is a fault injection framework built on top of PyTorch for research purposes.

  • hsi-toolbox - Hyperspectral CNN compression and band selection

Research papers citing Distiller

If you used Distiller for your work, please use the following citation:

@article{nzmora2019distiller,
  author       = {Neta Zmora and
                  Guy Jacob and
                  Lev Zlotnik and
                  Bar Elharar and
                  Gal Novik},
  title        = {Neural Network Distiller: A Python Package For DNN Compression Research},
  month        = {October},
  year         = {2019},
  url          = {https://arxiv.org/abs/1910.12232}
}

Acknowledgments

Any published work is built on top of the work of many other people, and the credit belongs to too many people to list here.

  • The Python and PyTorch developer communities have shared many invaluable insights, examples and ideas on the Web.
  • The authors of the research papers implemented in the Distiller model-zoo have shared their research ideas, theoretical background and results.

Built With

  • PyTorch - The tensor and neural network framework used by Distiller.
  • Jupyter - Notebook serving.
  • TensorBoard - Used to view training graphs.
  • Cadene - Pretrained PyTorch models.

Disclaimer

Distiller is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and features are planned to be added to the library. Feedback and contributions from the open source and research communities are more than welcome.

Comments
  • Automated Deep Compression status

    Hello there, I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state. In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case? Also, is there any documentation for how to use ADC in Distiller?

    Thanks

    automated compression 
    opened by amjad-twalo 28
  • sensitivity analysis fail

    Hi Neta,

    I tried to run the sensitivity analysis for filter with the following command 'python3 compress_classifier.py -a resnet20_cifar --data ../../../data.cifar10/ -j 12 --resume=../ssl/checkpoints/checkpoint_trained_dense.pth.tar --sense=filter', but got an error, detailed log:

    Logging to TensorBoard - remember to execute the server:

    tensorboard --logdir='./logs'

    => loading checkpoint ../ssl/checkpoints/checkpoint_trained_dense.pth.tar
    Checkpoint keys: arch optimizer compression_sched state_dict best_top1 epoch
    best top@1: 92.540
    Loaded compression schedule from checkpoint (epoch 179)
    => loaded checkpoint '../ssl/checkpoints/checkpoint_trained_dense.pth.tar' (epoch 179)
    Optimizer Type: <class 'torch.optim.sgd.SGD'>
    Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
    Files already downloaded and verified
    Files already downloaded and verified
    Dataset sizes: training=45000 validation=5000 test=10000
    Running sensitivity tests
    Testing sensitivity of module.conv1.weight [0.0% sparsity]
    Traceback (most recent call last):
      File "compress_classifier.py", line 782, in <module>
        main()
      File "compress_classifier.py", line 339, in main
        return sensitivity_analysis(model, criterion, test_loader, pylogger, args)
      File "compress_classifier.py", line 750, in sensitivity_analysis
        group=args.sensitivity)
      File "/home/chongyu/application/distiller/distiller/sensitivity.py", line 108, in perform_sensitivity_analysis
        scheduler.on_epoch_begin(0)
      File "/home/chongyu/application/distiller/distiller/scheduler.py", line 112, in on_epoch_begin
        policy.on_epoch_begin(self.model, self.zeros_mask_dict, meta)
      File "/home/chongyu/application/distiller/distiller/policy.py", line 123, in on_epoch_begin
        self.is_last_epoch = meta['current_epoch'] == (meta['ending_epoch'] - 1)
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

    It looks like there is no valid value for meta['ending_epoch']. Can you kindly suggest how to solve it? Thanks.

    opened by cattpku 22
  • can not work for object detection model

    I tried to use Distiller for my Cascade R-CNN model, but it did not work and I need some help.

    I used the AGP pruning method and followed the schedule introduced in the guide doc to change my training code, selecting the parameters of the ResNet50 backbone (as used in the SENet model) for pruning. However, the total sparsity stayed at 0 throughout the process. What is the problem, and how should I adjust?

    In the epoch loop: [image: e]. In the train function: [image: t]. When I want to see the mask info, I get this error: [image: err]. I don't know why the masks are not working; I think the masks are set when the epoch begins.

    PS: I can't find any model compression project that supports object detection, including Distiller and PocketFlow. Why? What is the difference between object detection and classification models when we try to compress them?

    Looking forward to your answer, thanks.

    question pruning object detection 
    opened by Jakel21 16
  • Quantize my model question?

    Hi, I implemented my model quantization like this:

    import torch
    import distiller
    import distiller.apputils as apputils
    from distiller.quantization import PostTrainLinearQuantizer

    quantizer = PostTrainLinearQuantizer(model)
    quantizer.prepare_model(torch.rand(*your_input_shape))
    apputils.save_checkpoint(0, 'mymodel', model, optimizer=None, name='model', dir='quantization')

    But the size of the output model did not shrink to 1/4; it did not change at all. Could you please help me? Thanks!

    opened by BlossomingL 13
  • Post Training Quantization General Questions

    Hello,

    I'm trying to quantise my CNN classification model using simple post training quantisation. The paper by Krishnamoorthi on quantizing CNNs for efficient inference suggests two ways to go about this.

    1. Weight Only quantisation- Either quantise the tensors layer wise or channel wise, they recommend channel wise quantisation.
    2. Activations and weights both.

    I tried to perform weight only quantisation but am getting horrible results. My 99% MNIST model was degraded to an accuracy of 11% after post training quantisation. The code for quantising the pytorch tensors is given here:

    
    from collections import namedtuple

    QTensor = namedtuple('QTensor', ['tensor', 'scale', 'zero_point'])
    
    def quantize_tensor(x, num_bits=8):
        qmin = 0.
        qmax = 2.**num_bits - 1.
        min_val, max_val = x.min(), x.max()
    
        scale = (max_val - min_val) / (qmax - qmin)
    
        initial_zero_point = qmin - min_val / scale
    
        zero_point = 0
        if initial_zero_point < qmin:
            zero_point = qmin
        elif initial_zero_point > qmax:
            zero_point = qmax
        else:
            zero_point = initial_zero_point
    
        zero_point = int(zero_point)
        q_x = zero_point + x / scale
        q_x.clamp_(qmin, qmax).round_()
        q_x = q_x.round().byte()
        return QTensor(tensor=q_x, scale=scale, zero_point=zero_point)
    
    def dequantize_tensor(q_x):
        return q_x.scale * (q_x.tensor.float() - q_x.zero_point)
    

    And the function I run on the 99% trained model to perform post training quantisation is as follows:

    import torch.nn as nn

    def quantizeModel(model):
      """
      Post Training Quantisation
      """
      for m in model.modules():

          if isinstance(m, nn.Conv2d):
              weight = m.weight.data
              q_weight = weight.clone()

              # Conv2d weights are laid out as (out_channels, in_channels, kH, kW)
              num_output_channels = weight.shape[0]
              num_inp_channels = weight.shape[1]

              # per channel quantisation (here: one scale per 2D kernel)
              for i in range(num_output_channels):
                for j in range(num_inp_channels):
                  quantized_weight = quantize_tensor(weight[i,j])
                  # note: this stores the integer codes, not dequantised values
                  q_weight[i,j] = quantized_weight.tensor.float()
              m.weight.data = q_weight
          elif isinstance(m, nn.Linear):
              # for fully connected layers, layer wise quantisation
              weight = m.weight.data
              quantized_weight = quantize_tensor(weight)
              m.weight.data = quantized_weight.tensor.float()
      return model
    

    What am I doing wrong here? Any help would be deeply appreciated! I know this isn't Distiller code, but I was trying to build a very minimal example of post training quantisation and you folks are the experts. I am using a very simple model for MNIST classification. My model code is given here:

    
    
    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):
        def __init__(self, mnist=True):
          
            super(Net, self).__init__()
            if mnist:
              num_channels = 1
            else:
              num_channels = 3
              
            self.conv1 = nn.Conv2d(num_channels, 20, 5, 1)
            self.conv2 = nn.Conv2d(20, 50, 5, 1)
            self.fc1 = nn.Linear(4*4*50, 500)
            self.fc2 = nn.Linear(500, 10)
    
        
        def quantize(self):
          """
          Post Training Quantisation
          """
          q_tensors = {}
    
          for k, m in enumerate(self.modules()):
    
            if isinstance(m, nn.Conv2d):   
              weight= m.weight.data
              q_weight = weight.clone()
    
              num_inp_channels = weight.shape[0]
              num_output_channels = weight.shape[1]
    
              # per channel quantisation
              for i in range(num_inp_channels):
                for j in range(num_output_channels):
                  quantized_weight = quantize_tensor(weight[i,j])
                  q_weight[i,j] = quantized_weight.tensor.float()
    
              m.weight.data = q_weight
            
          
          
        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(x, 2, 2)
            x = F.relu(self.conv2(x))
            x = F.max_pool2d(x, 2, 2)
            x = x.view(-1, 4*4*50)        
            x = F.relu(self.fc1(x))
            x = self.fc2(x)
    
            return F.log_softmax(x, dim=1)
        
    

    Best, Karanbir Chahal

    quantization 
    opened by karanchahal 13
  • lr_scheduler doesn't work when start training from checkpoint

    Hello, I want to use the MultiStepMultiGammaLR scheduler as the lr_scheduler for my pruning run. When I use compress_classifier.py to prune resnet20_cifar from the beginning, with the lr_scheduler defined in the YAML file, it works well. But when I resume from a checkpoint to train and prune, the lr_scheduler defined in the YAML file doesn't work: the learning rate doesn't decay when the epoch reaches a defined milestone.

    I use the command below:

    python3 compress_classifier.py --arch resnet20_cifar dataset/ -p=50 --lr=0.3 --epochs=150 -b 128 --compress=resnet20_cifar_ele_pruning.yaml -j=1 --vs 0 --deterministic --resume=logs/resnet20_cifar_baseline/checkpoint.pth.tar

    Below is my YAML setting:

    version: 1
    pruners:
      low_pruner:
        class: AutomatedGradualPruner
        initial_sparsity : 0.05
        final_sparsity: 0.60
        weights: [module.layer1.2.conv1.weight,  module.layer1.2.conv1.weight,
                  module.layer1.0.conv1.weight,  module.layer1.0.conv2.weight,
                  module.layer1.1.conv1.weight,  module.layer1.1.conv2.weight]
    
      mid_pruner:
        class:  AutomatedGradualPruner
        initial_sparsity : 0.05
        final_sparsity: 0.67
        weights: [module.layer2.2.conv1.weight,  module.layer2.2.conv2.weight,
                  module.layer2.0.conv2.weight,  module.layer2.0.downsample.1.weight,
                  module.layer2.0.conv1.weight,  module.layer2.0.downsample.0.weight,
                  module.layer2.1.conv1.weight,  module.layer2.1.conv2.weight]
    
      high_pruner:
        class:  AutomatedGradualPruner
        initial_sparsity : 0.05
        final_sparsity: 0.76
        weights: [module.layer3.0.conv1.weight,  module.layer3.1.conv1.weight,
                  module.layer3.1.conv2.weight,  module.layer3.0.conv2.weight,
                  module.layer3.0.downsample.0.weight, module.layer3.0.downsample.1.weight,
                  module.fc.weight]
    lr_schedulers:
      training_lr:
        class: MultiStepMultiGammaLR
        milestones: [300, 302, 400]
        gammas: [0.1, 0.1, 0.5]
    
    policies:
        - pruner:
            instance_name: low_pruner
          starting_epoch: 300
          ending_epoch: 400
          frequency: 2
        - pruner:
            instance_name: mid_pruner
          starting_epoch: 300
          ending_epoch: 400
          frequency: 2
        - pruner:
            instance_name: high_pruner
          starting_epoch: 300
          ending_epoch: 400
          frequency: 2
        - lr_scheduler:
            instance_name: training_lr
          starting_epoch: 0
          ending_epoch: 400
          frequency: 1
    

    Is there any problem in my script and yaml setting?
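
For reference, a per-milestone gamma schedule of the kind configured above can be sketched in plain Python. The sketch assumes that the learning rate is multiplied by the matching gamma once each milestone epoch has been reached, in the spirit of PyTorch's MultiStepLR but with one gamma per milestone; it is an illustration with an invented function name, not Distiller's MultiStepMultiGammaLR code.

```python
def multi_gamma_lr(base_lr, epoch, milestones, gammas):
    """Learning rate at `epoch` when each reached milestone applies its own gamma."""
    if len(milestones) != len(gammas):
        raise ValueError("milestones and gammas must have the same length")
    lr = base_lr
    for milestone, gamma in zip(milestones, gammas):
        if epoch >= milestone:
            lr *= gamma
    return lr


if __name__ == "__main__":
    milestones, gammas = [300, 302, 400], [0.1, 0.1, 0.5]
    for epoch in (0, 149, 299, 300, 302, 400):
        print("epoch {:3d}: lr = {:.6f}".format(epoch, multi_gamma_lr(0.3, epoch, milestones, gammas)))
```

Under that assumption, milestones of 300, 302 and 400 leave the rate unchanged for every epoch below 300, so it may be worth checking which epoch range the resumed run actually covers.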

    help wanted 
    opened by stvreumi 13
  • Modify image size and training for Inception Models

    This PR is a fix for issue #422.

    The file data_loader had a fixed classification image size for ImageNet of [1, 3, 224, 224]. However, all Inception models require an input image size of [1, 3, 299, 299].

    To fix this issue, I modified the apputils/image_classifier.py file to add a new parameter to the load_data function. This function calls apputils.load_data, so I changed the corresponding function in apputils/data_loader.py.

    Also, image_classifier.py is modified to consider both losses from the normal classifier and aux_logits classifier of inception network, and the classification accuracy is calculated only from normal classifier.

    @nzmora : Please review the changes.

    PS: My fork is based on PyTorch 1.3, so all additional changes for PyTorch1.3 support are present in this PR.

    opened by soumendukrg 12
  • how to add middle layers' activation loss functions?

    If I want to train DoReFa-Net quantization on AlexNet, is the command line like this?

    python compress_classifier.py -a alexnet /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml

    We do not need to modify the code in compress_classifier.py to do this training, right?

    quantization 
    opened by brisker 12
  • No module named 'gitdb.utils.compat'.

    I installed all the environments and packages according to the tutorial and got this error when running the code: ModuleNotFoundError: No module named 'gitdb.utils.compat'. Ask for help, how can I resolve this error? thank you very much!

    opened by Blue-Eagle-10 10
  • Pruner doesnt prune anything

    Hi, for the last two days I have been trying to prune two models (mobilenetv1 and resnet50) using Distiller's CompressionScheduler and AutomatedGradualPruner. First of all, let me say that I didn't use the YAML config file for this; I incorporated Distiller directly in my own code, as that seemed the better choice without much hassle. I based this on the tiny example provided in compress_classifier.py, which reads:

    Integrating compression is very simple: simply add invocations of the appropriate
    compression_scheduler callbacks, for each stage in the training.  The training skeleton
    looks like the pseudo code below.  The boiler-plate Pytorch classification training
    is speckled with invocations of CompressionScheduler.
    For each epoch:
        compression_scheduler.on_epoch_begin(epoch)
        train()
        validate()
        compression_scheduler.on_epoch_end(epoch)
        save_checkpoint()
    train():
        For each training step:
            compression_scheduler.on_minibatch_begin(epoch)
            output = model(input)
            loss = criterion(output, target)
            compression_scheduler.before_backward_pass(epoch)
            loss.backward()
            compression_scheduler.before_parameter_optimization(epoch)
            optimizer.step()
            compression_scheduler.on_minibatch_end(epoch)
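
The callback order in the skeleton above can be exercised without Distiller by substituting a stand-in object that only records which hooks fire. RecordingScheduler and run_epoch are hypothetical names invented for this sketch, and the argument lists are simplified assumptions; a scheduler like this one, with no policies attached, changes nothing about training.

```python
class RecordingScheduler:
    """Hypothetical stand-in for a compression scheduler: it only logs callbacks."""

    def __init__(self):
        self.calls = []

    def _record(self, name):
        self.calls.append(name)

    def on_epoch_begin(self, epoch):
        self._record("on_epoch_begin")

    def on_minibatch_begin(self, epoch, batch, batches_per_epoch):
        self._record("on_minibatch_begin")

    def before_backward_pass(self, epoch, batch, batches_per_epoch, loss):
        self._record("before_backward_pass")
        return loss  # assumption: a real scheduler may hand back an adjusted loss

    def before_parameter_optimization(self, epoch, batch, batches_per_epoch):
        self._record("before_parameter_optimization")

    def on_minibatch_end(self, epoch, batch, batches_per_epoch):
        self._record("on_minibatch_end")

    def on_epoch_end(self, epoch):
        self._record("on_epoch_end")


def run_epoch(scheduler, epoch, batches_per_epoch):
    """Walk through one epoch, invoking the hooks in the documented order."""
    scheduler.on_epoch_begin(epoch)
    for batch in range(batches_per_epoch):
        scheduler.on_minibatch_begin(epoch, batch, batches_per_epoch)
        loss = 1.0  # placeholder for criterion(model(input), target)
        loss = scheduler.before_backward_pass(epoch, batch, batches_per_epoch, loss)
        # loss.backward() would run here
        scheduler.before_parameter_optimization(epoch, batch, batches_per_epoch)
        # optimizer.step() would run here
        scheduler.on_minibatch_end(epoch, batch, batches_per_epoch)
    scheduler.on_epoch_end(epoch)
    return scheduler.calls


if __name__ == "__main__":
    for name in run_epoch(RecordingScheduler(), epoch=0, batches_per_epoch=2):
        print(name)
```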
    
    

    So I did this and nothing worked! But before I show the snippet I used and ask my question, may I ask one right now? Does compression_scheduler do any compression by itself? I mean, by simply invoking it as above, would it run a default policy, or should we always add a policy for it to work? TL;DR: I used the bare compression scheduler and, obviously, nothing happened, except that I got slightly lower accuracy. I then added the AGP pruner to it like this, and again nothing happened:

    agp = distiller.pruning.AutomatedGradualPruner('retinaface', 0.15, 0.5, weights_agp_resnet_no_bn)
    scheduler = distiller.scheduler.CompressionScheduler(net)
    scheduler.add_policy(agp, epochs=[max_epoch], starting_epoch=50, ending_epoch=max_epoch, frequency=1)
    
    

    and this is what I have come up with so far:

    you said do : 
    """
    train():
        For each training step:
            compression_scheduler.on_minibatch_begin(epoch)
            output = model(input)
            loss = criterion(output, target)
            compression_scheduler.before_backward_pass(epoch)
            loss.backward()
            compression_scheduler.before_parameter_optimization(epoch)
            optimizer.step()
            compression_scheduler.on_minibatch_end(epoch)
    """
    

    and I did :

    
    def train_one_epoch(net, data_loader, optimizer, criterion, cfg, gamma, current_epoch, step_index,scheduler, device=torch.device('cpu')
                         ):
        
        net.train()
        epoch_size = len(data_loader)
        # load train data
        for iteration, (images, targets) in enumerate(data_loader):
            load_t0 = time.time()
            images = images.to(device)
            targets = [anno.to(device) for anno in targets]
    
            scheduler.on_minibatch_begin(current_epoch, iteration, epoch_size)
    
            # forward
            out = net(images)
    
            loss_l, loss_c, loss_landm = criterion(out, priors, targets)
            loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
    
            scheduler.before_backward_pass(current_epoch, iteration, epoch_size, loss, optimizer)
    
           # backprop
            optimizer.zero_grad()
            loss.backward()
    
            scheduler.before_parameter_optimization(current_epoch, iteration, epoch_size, optimizer)
            
            optimizer.step()
    
            scheduler.on_minibatch_end(current_epoch, iteration, epoch_size, optimizer)
    
            lr = adjust_learning_rate(optimizer, gamma, current_epoch, step_index, iteration, epoch_size)
    
            load_t1 = time.time()
            batch_time = load_t1 - load_t0
            eta = int(batch_time * (len(data_loader) - iteration))
            
            print('Epoch:{}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.8f} || Batchtime: {:.4f} s || ETA: {}'
                 .format(current_epoch, max_epoch, iteration + 1, epoch_size,
                  loss_l.item(), loss_c.item(), loss_landm.item(),
                 lr, batch_time, str(datetime.timedelta(seconds=eta))))
            
        return loss.item() 
    
    def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
        """Sets the learning rate
        # Adapted from PyTorch Imagenet example:
        # https://github.com/pytorch/examples/blob/master/imagenet/main.py
        """
        warmup_epoch = -1
        if epoch <= warmup_epoch:
            lr = 1e-6 + (initial_lr-1e-6) * iteration / (epoch_size * warmup_epoch)
        else:
            lr = initial_lr * (gamma ** (step_index))
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
        return lr
    
    

    And this is my main training loop. You said to do:

    """
    For each epoch:
        compression_scheduler.on_epoch_begin(epoch)
        train()
        validate()
        compression_scheduler.on_epoch_end(epoch)
        save_checkpoint()
    """
    

    and I did :

    
    if __name__ == '__main__':
        # train()
        ...
        if torch.cuda.is_available():
        ...
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=momentum, weight_decay=weight_decay)
        
        criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False)
        priorbox = PriorBox(cfg, image_size=(img_dim, img_dim))
        print('Training will happen on device {}'.format(device))
        with torch.no_grad():
            priors = priorbox.forward()
            priors = priors.to(device)
        ...
        #  I got this list by running this snippet : 
        #     model =  retinaface.RetinaFace(cfg=cfg_resnet50, phase='test')
        #     w=torch.load(weight_path, map_location=torch.device('cpu'))
        #     model.load_state_dict(w, strict=False)
        #     w_lst = [name for name,_ in w.items() if 'weight' in name]
        #     print(w_lst)
        weights_agp_resnet_no_bn = ['module.body.conv1.weight', 'module.body.layer1.0.conv1.weight', 'module.body.layer1.0.conv2.weight', 
                                    'module.body.layer1.0.conv3.weight', 'module.body.layer1.0.downsample.0.weight',
                                    'module.body.layer1.0.downsample.1.weight', 'module.body.layer1.1.conv1.weight',
                                    'module.body.layer1.1.conv2.weight', 'module.body.layer1.1.conv3.weight', 
                                    'module.body.layer1.2.conv1.weight', 'module.body.layer1.2.conv2.weight', 
                                    'module.body.layer1.2.conv3.weight', 'module.body.layer2.0.conv1.weight', 
                                    'module.body.layer2.0.conv2.weight', 'module.body.layer2.0.conv3.weight', 
                                    'module.body.layer2.0.downsample.0.weight', 'module.body.layer2.0.downsample.1.weight', 
                                    'module.body.layer2.1.conv1.weight', 'module.body.layer2.1.conv2.weight', 
                                    'module.body.layer2.1.conv3.weight', 'module.body.layer2.2.conv1.weight', 'module.body.layer2.2.conv2.weight', 
                                    'module.body.layer2.2.conv3.weight', 'module.body.layer2.3.conv1.weight', 'module.body.layer2.3.conv2.weight', 
                                    'module.body.layer2.3.conv3.weight', 'module.body.layer3.0.conv1.weight', 'module.body.layer3.0.conv2.weight', 
                                    'module.body.layer3.0.conv3.weight', 'module.body.layer3.0.downsample.0.weight', 'module.body.layer3.0.downsample.1.weight', 
                                    'module.body.layer3.1.conv1.weight', 'module.body.layer3.1.conv2.weight', 'module.body.layer3.1.conv3.weight', 
                                    'module.body.layer3.2.conv1.weight', 'module.body.layer3.2.conv2.weight', 'module.body.layer3.2.conv3.weight', 
                                    'module.body.layer3.3.conv1.weight', 'module.body.layer3.3.conv2.weight', 'module.body.layer3.3.conv3.weight', 
                                    'module.body.layer3.4.conv1.weight', 'module.body.layer3.4.conv2.weight', 'module.body.layer3.4.conv3.weight',
                                    'module.body.layer3.5.conv1.weight', 'module.body.layer3.5.conv2.weight', 'module.body.layer3.5.conv3.weight', 
                                    'module.body.layer4.0.conv1.weight', 'module.body.layer4.0.conv2.weight', 'module.body.layer4.0.conv3.weight',
                                    'module.body.layer4.0.downsample.0.weight', 'module.body.layer4.0.downsample.1.weight', 
                                    'module.body.layer4.1.conv1.weight', 'module.body.layer4.1.conv2.weight', 'module.body.layer4.1.conv3.weight', 
                                    'module.body.layer4.2.conv1.weight', 'module.body.layer4.2.conv2.weight', 'module.body.layer4.2.conv3.weight', 
                                    'module.fpn.output1.0.weight', 'module.fpn.output1.1.weight', 'module.fpn.output2.0.weight', 
                                    'module.fpn.output2.1.weight', 'module.fpn.output3.0.weight', 'module.fpn.output3.1.weight', 
                                    'module.fpn.merge1.0.weight', 'module.fpn.merge1.1.weight', 'module.fpn.merge2.0.weight', 
                                    'module.fpn.merge2.1.weight', 'module.ssh1.conv3X3.0.weight', 'module.ssh1.conv3X3.1.weight', 
                                    'module.ssh1.conv5X5_1.0.weight', 'module.ssh1.conv5X5_1.1.weight', 'module.ssh1.conv5X5_2.0.weight',
                                    'module.ssh1.conv5X5_2.1.weight', 'module.ssh1.conv7X7_2.0.weight', 'module.ssh1.conv7X7_2.1.weight', 
                                    'module.ssh1.conv7x7_3.0.weight', 'module.ssh1.conv7x7_3.1.weight', 'module.ssh2.conv3X3.0.weight', 
                                    'module.ssh2.conv3X3.1.weight', 'module.ssh2.conv5X5_1.0.weight', 'module.ssh2.conv5X5_1.1.weight', 
                                    'module.ssh2.conv5X5_2.0.weight', 'module.ssh2.conv5X5_2.1.weight', 'module.ssh2.conv7X7_2.0.weight',
                                    'module.ssh2.conv7X7_2.1.weight', 'module.ssh2.conv7x7_3.0.weight', 'module.ssh2.conv7x7_3.1.weight',
                                    'module.ssh3.conv3X3.0.weight', 'module.ssh3.conv3X3.1.weight', 'module.ssh3.conv5X5_1.0.weight', 
                                    'module.ssh3.conv5X5_1.1.weight', 'module.ssh3.conv5X5_2.0.weight', 'module.ssh3.conv5X5_2.1.weight', 
                                    'module.ssh3.conv7X7_2.0.weight', 'module.ssh3.conv7X7_2.1.weight', 'module.ssh3.conv7x7_3.0.weight', 
                                    'module.ssh3.conv7x7_3.1.weight', 'module.ClassHead.0.conv1x1.weight', 'module.ClassHead.1.conv1x1.weight', 
                                    'module.ClassHead.2.conv1x1.weight', 'module.BboxHead.0.conv1x1.weight', 'module.BboxHead.1.conv1x1.weight',
                                    'module.BboxHead.2.conv1x1.weight', 'module.LandmarkHead.0.conv1x1.weight', 'module.LandmarkHead.1.conv1x1.weight', 
                                    'module.LandmarkHead.2.conv1x1.weight']
    
        weights_agp= ['body.stage1.0.0.weight', 'body.stage1.0.1.weight', 'body.stage1.1.0.weight', 'body.stage1.1.1.weight',
              'body.stage1.1.3.weight', 'body.stage1.1.4.weight', 'body.stage1.2.0.weight', 'body.stage1.2.1.weight',
              'body.stage1.2.3.weight', 'body.stage1.2.4.weight', 'body.stage1.3.0.weight', 'body.stage1.3.1.weight',
              'body.stage1.3.3.weight', 'body.stage1.3.4.weight', 'body.stage1.4.0.weight', 'body.stage1.4.1.weight',
              'body.stage1.4.3.weight', 'body.stage1.4.4.weight', 'body.stage1.5.0.weight', 'body.stage1.5.1.weight',
              'body.stage1.5.3.weight', 'body.stage1.5.4.weight', 'body.stage2.0.0.weight', 'body.stage2.0.1.weight',
              'body.stage2.0.3.weight', 'body.stage2.0.4.weight', 'body.stage2.1.0.weight', 'body.stage2.1.1.weight',
              'body.stage2.1.3.weight', 'body.stage2.1.4.weight', 'body.stage2.2.0.weight', 'body.stage2.2.1.weight',
              'body.stage2.2.3.weight', 'body.stage2.2.4.weight', 'body.stage2.3.0.weight', 'body.stage2.3.1.weight',
              'body.stage2.3.3.weight', 'body.stage2.3.4.weight', 'body.stage2.4.0.weight', 'body.stage2.4.1.weight',
              'body.stage2.4.3.weight', 'body.stage2.4.4.weight', 'body.stage2.5.0.weight', 'body.stage2.5.1.weight',
              'body.stage2.5.3.weight', 'body.stage2.5.4.weight', 'body.stage3.0.0.weight', 'body.stage3.0.1.weight',
              'body.stage3.0.3.weight', 'body.stage3.0.4.weight', 'body.stage3.1.0.weight', 'body.stage3.1.1.weight',
              'body.stage3.1.3.weight', 'body.stage3.1.4.weight'] 
            #   ,'fpn.output1.0.weight', 'fpn.output1.1.weight', 'fpn.output2.0.weight', 'fpn.output2.1.weight',
            #   'fpn.output3.0.weight', 'fpn.output3.1.weight', 'fpn.merge1.0.weight', 'fpn.merge1.1.weight',
            #   'fpn.merge2.0.weight', 'fpn.merge2.1.weight', 
            #   'ssh1.conv3X3.0.weight', 'ssh1.conv3X3.1.weight', 'ssh1.conv5X5_1.0.weight', 'ssh1.conv5X5_1.1.weight',
            #   'ssh1.conv5X5_2.0.weight', 'ssh1.conv5X5_2.1.weight', 'ssh1.conv7X7_2.0.weight', 'ssh1.conv7X7_2.1.weight',
            #   'ssh1.conv7x7_3.0.weight', 'ssh1.conv7x7_3.1.weight', 'ssh2.conv3X3.0.weight', 'ssh2.conv3X3.1.weight',
            #   'ssh2.conv5X5_1.0.weight', 'ssh2.conv5X5_1.1.weight', 'ssh2.conv5X5_2.0.weight', 'ssh2.conv5X5_2.1.weight',
            #   'ssh2.conv7X7_2.0.weight', 'ssh2.conv7X7_2.1.weight', 'ssh2.conv7x7_3.0.weight', 'ssh2.conv7x7_3.1.weight',
            #   'ssh3.conv3X3.0.weight', 'ssh3.conv3X3.1.weight', 'ssh3.conv5X5_1.0.weight', 'ssh3.conv5X5_1.1.weight',
            #   'ssh3.conv5X5_2.0.weight', 'ssh3.conv5X5_2.1.weight', 'ssh3.conv7X7_2.0.weight', 'ssh3.conv7X7_2.1.weight',
            #   'ssh3.conv7x7_3.0.weight', 'ssh3.conv7x7_3.1.weight'#]
    
            #  ,'ClassHead.0.conv1x1.weight', 'ClassHead.1.conv1x1.weight', 'ClassHead.2.conv1x1.weight',
            #  'BboxHead.0.conv1x1.weight', 'BboxHead.1.conv1x1.weight', 'BboxHead.2.conv1x1.weight',
            #  'LandmarkHead.0.conv1x1.weight', 'LandmarkHead.1.conv1x1.weight','LandmarkHead.2.conv1x1.weight'] 
    
        agp = distiller.pruning.AutomatedGradualPruner('retinaface', 0.15, 0.5, weights_agp_resnet_no_bn)
        scheduler = distiller.scheduler.CompressionScheduler(net)
        scheduler.add_policy(agp, epochs=[max_epoch], starting_epoch=50, ending_epoch=max_epoch, frequency=1)
        compression_algorithm = agp.__class__.__name__
    
        interval = 20
        for i in range(max_epoch):
            scheduler.on_epoch_begin(i)
            net = net.to(device)
            train_one_epoch(net, data_loader, optimizer, criterion, cfg, gamma, i, step_index, scheduler, device)
            scheduler.on_epoch_end(i, optimizer)
            if i%interval==0:
                torch.save(net.state_dict(), save_folder + cfg['name'] + '_epoch_{}.pth'.format(i))
            
        fname = '_Final_{0}.pth'.format(compression_algorithm)
        torch.save(net.state_dict(), save_folder + cfg['name'] + fname)
        print('Pruning by {0} has been successfully done.'.format(compression_algorithm))
        
    

    So, based on what is explained in the mentioned source code, I believe I have done everything, yet I don't get any pruning! At first I thought that maybe my model is so small and compact that all the weights are equally important, so that when AGP touches a weight it finds that the weight is important and leaves it alone, and therefore the model doesn't shrink. However, when I tried ResNet50, the very same thing happened: the model snapshot on disk takes 109MB, and although I have set the pruning level to 50%, at epoch 80/100 it is still 109MB, signaling that no pruning has occurred.
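
For reference, the sparsity that a gradual schedule should reach at a given epoch can be computed by hand. The sketch below uses the cubic formula from the AGP paper (Zhu and Gupta, "To prune, or not to prune") with the values quoted in this issue (initial sparsity 0.15, final sparsity 0.5, starting epoch 50) and assumes `max_epoch` is 100. `agp_target_sparsity` is a hypothetical helper, and Distiller's own implementation may compute the schedule span slightly differently.

```python
def agp_target_sparsity(epoch, initial=0.15, final=0.5, start=50, end=100):
    """Cubic AGP schedule: ramps sparsity from `initial` to `final`."""
    if epoch < start:
        return 0.0  # assumption: nothing is pruned before the start epoch
    if epoch >= end:
        return final
    progress = (epoch - start) / (end - start)
    return final + (initial - final) * (1.0 - progress) ** 3


for epoch in (50, 60, 80, 100):
    print(epoch, round(agp_target_sparsity(epoch), 4))
```

Under these assumptions the target at epoch 80 is roughly 0.48, so if the pruner were active, close to half of the listed weights should already be zero by then. It may also be worth checking how `add_policy` interprets an explicit `epochs` list when `starting_epoch` and `ending_epoch` are passed alongside it, since the call above supplies both.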

    Am I doing something wrong here, or is this simply normal and MobileNet and ResNet can't be pruned like this? Thanks a lot in advance.
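
Checkpoint size is not a reliable signal for this question: `torch.save` writes dense tensors, so a weight that has been masked to zero takes the same space as any other value. A more direct check is to count the zeros in the saved weights. The following is a minimal sketch assuming plain PyTorch; `sparsity_report` is a hypothetical helper and not part of Distiller, and the small `nn.Sequential` only stands in for the real network.

```python
import torch
import torch.nn as nn


def sparsity_report(state_dict):
    """Return {name: fraction of zero elements} for multi-dimensional weights."""
    report = {}
    for name, tensor in state_dict.items():
        # Skip biases and 1-D batch-norm weights; they are usually not pruned.
        if name.endswith("weight") and tensor.dim() > 1:
            zeros = (tensor == 0).sum().item()
            report[name] = zeros / tensor.numel()
    return report


# Toy stand-in for the real network (assumption: any nn.Module behaves the same).
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Conv2d(8, 4, 3))
with torch.no_grad():
    net[0].weight[:4].zero_()  # zero half of the first layer's filters by hand

for name, fraction in sparsity_report(net.state_dict()).items():
    print(f"{name}: {fraction:.2%} zeros")
```

On a real run, the same function can be applied to a checkpoint loaded with `torch.load(path, map_location='cpu')`. Fractions that stay near zero after the pruning epochs would suggest that the masks were never applied.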

    opened by Coderx7 10
  • some problems with quantization


    Hi, I'm learning to use Distiller. When I tried quantization, I ran into a problem. In examples/classifier_compression/compress_classifier.py, I changed the model to my small net and used the CIFAR dataset. My small net is defined as below:

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x
    

    and changed the YAML file as below:

    version: 1

    quantizers:
      dorefa_quantizer:
        class: DorefaQuantizer
        bits_activations: 4
        bits_weights: 4
        bits_bias: 4
        overrides:
          conv1:
            bits_weights: null
            bits_activations: null
          fc3:
            bits_weights: null
            bits_activations: null

    lr_schedulers:
      pruning_lr:
        class: ExponentialLR
        gamma: 0.9

    policies:
      - quantizer:
          instance_name: 'dorefa_quantizer'
        starting_epoch: 1
        ending_epoch: 30
        frequency: 1

      - lr_scheduler:
          instance_name: pruning_lr
        starting_epoch: 24
        ending_epoch: 30
        frequency: 1

    The other arguments use their defaults or None. When I run compress_classifier.py, I get this error:

    File "compress_classifier.py", line 786, in <module>
      main()
    File "compress_classifier.py", line 292, in main
      msglogger.info(distiller.masks_sparsity_tbl_summary(model, compression_scheduler))
    File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
      self.gen.throw(type, value, traceback)
    File "/home/zhang/distiller/distiller/data_loggers/collector.py", line 469, in collectors_context
      yield collectors_dict
    File "compress_classifier.py", line 287, in main
      loggers=[tflogger, pylogger], args=args)
    File "compress_classifier.py", line 399, in train
      optimizer.step()
    File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 93, in step
      d_p.add_(weight_decay, p.data)
    RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

    I also checked all the tensors in the model; they are all torch.cuda.FloatTensor. How can I fix this?

    question quantization 
    opened by miku39p 10
  • QAT for LSTM


    Hi everyone

    I saw in your documentation that you support post training quantization of LSTMs by replacing the PyTorch modules by your own implementation. Do you also support quantization aware training of LSTMs with this technique? Do you already have a tutorial for this?

    Thanks a lot for your help!

    opened by schmiph2 0
  • --load-serialized will make model fail to prune


    I found that a model that is not wrapped in DataParallel fails to prune, i.e. --load-serialized disables pruning.

    When I run

    python compress_classifier.py -a=resnet20_cifar -p=50 ../../../data/cifar10/ -j=22 --epochs=1 --lr=0.001 --masks-sparsity --compress=../agp-pruning/resnet18.schedule_agp.yaml --load-serialized
    

    The total sparsity will always be 0.00

    Total sparsity: 0.00
    

    But if I run the same command line without --load-serialized

    python compress_classifier.py -a=resnet20_cifar -p=50 ../../../data/cifar10/ -j=22 --epochs=1 --lr=0.001 --masks-sparsity --compress=../agp-pruning/resnet18.schedule_agp.yaml
    

    The total sparsity will be 1.53 after 1 epoch

    Total sparsity: 1.53
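
`nn.DataParallel` registers the wrapped network as a submodule named `module`, so every parameter name gains a `module.` prefix, while a model that is not wrapped has no such prefix. If the schedule lists weights under one naming and the model uses the other, no parameter would match. That could explain the zero sparsity, though the issue does not confirm it. The sketch below shows a quick way to compare the two; `normalize` and `unmatched` are hypothetical helpers, and the parameter names are illustrative rather than taken from the schedule file.

```python
def normalize(name, prefix="module."):
    """Drop a leading DataParallel prefix, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name


def unmatched(schedule_names, model_names):
    """Return the schedule entries that match no model parameter name."""
    model = set(model_names)
    return [n for n in schedule_names if n not in model]


schedule = ["module.layer1.0.conv1.weight", "module.fc.weight"]
serialized_model = ["layer1.0.conv1.weight", "fc.weight"]

print(unmatched(schedule, serialized_model))  # both entries are unmatched
print(unmatched([normalize(n) for n in schedule], serialized_model))  # []
```

Running the same comparison against the names from `model.named_parameters()` and the weights listed in the YAML schedule would show whether the two namings agree.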
    
    opened by Little0o0 1
  • How to train my original dataset in distiller?


    Does anyone know how to use my own dataset with Distiller? It seems that Distiller only supports very basic datasets like MNIST, CIFAR10 and ImageNet.

    opened by priNs0123 1
  •  Quantization don't reduce the model file size


    I tried to use post-training quantization to convert my float32 model to int8, following the tutorial on quantizing GNMT. I changed the model code to the Distiller style and got a quantized model. Here is some information about the quantized model:

    (context_attn): MultiHeadedAttention(
              (linear_keys): RangeLinearQuantParamLayerWrapper(
                weights_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                output_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                accum_quant_settings=(num_bits=32 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                  inputs_quant_auto_fallback=True, forced_quant_settings_for_inputs=None
                scale_approx_mult_bits=None
                preset_activation_stats=True
                  output_scale=16.876938, output_zero_point=0.000000
                weights_scale=194.250000, weights_zero_point=0.000000
                (wrapped_module): Linear(in_features=512, out_features=512, bias=True)
              )
              (linear_values): RangeLinearQuantParamLayerWrapper(
                weights_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                output_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                accum_quant_settings=(num_bits=32 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                  inputs_quant_auto_fallback=True, forced_quant_settings_for_inputs=None
                scale_approx_mult_bits=None
                preset_activation_stats=True
                  output_scale=13.410025, output_zero_point=0.000000
                weights_scale=204.500000, weights_zero_point=0.000000
                (wrapped_module): Linear(in_features=512, out_features=512, bias=True)
              )
              (linear_query): RangeLinearQuantParamLayerWrapper(
                weights_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                output_quant_settings=(num_bits=8 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                accum_quant_settings=(num_bits=32 ; quant_mode=SYMMETRIC ; clip_mode=NONE ; clip_n_stds=None ; clip_half_range=False ; per_channel=False)
                  inputs_quant_auto_fallback=True, forced_quant_settings_for_inputs=None
                scale_approx_mult_bits=None
                preset_activation_stats=True
                  output_scale=15.167286, output_zero_point=0.000000
                weights_scale=200.000000, weights_zero_point=0.000000
                (wrapped_module): Linear(in_features=512, out_features=512, bias=True)
              )
    

    My model is a transformer model from OpenNMT. It seems I do get a correctly compressed model, but the model file is larger than the original, roughly 200MB -> 280MB. Is there any way to reduce the model size? I think that is the most important benefit of quantization. The script looks like this:

    model_path= sys.argv[1]
    
    parser = ArgumentParser(description='translate.py')
    opts.config_opts(parser)
    opts.translate_opts(parser)
    src = "test.en"
    output = src+".distiller_out"
    opt = parser.parse_args(f"--model {model_path} --src {src} --output {output} --gpu 0")
    
    #translate(opt)
    ArgumentParser.validate_translate_opts(opt)
    logger = init_logger(opt.log_file)
    
    translator = build_translator(opt, report_score=True)
    stats_file = "./acts_quantization_stats.yaml"
    
    def evaluate(model, output, num_batches=None):
        src_shards = split_corpus(opt.src, opt.shard_size)
        tgt_shards = split_corpus(opt.tgt, opt.shard_size)
        shard_pairs = zip(src_shards, tgt_shards)
    
        for i, (src_shard, tgt_shard) in enumerate(shard_pairs):
            logger.info("Translating shard %d." % i)
            translator.translate(
                src=src_shard,
                tgt=tgt_shard,
                src_dir=opt.src_dir,
                batch_size=opt.batch_size,
                batch_type=opt.batch_type,
                attn_debug=opt.attn_debug,
                align_debug=opt.align_debug
                )
    
        print("translate end")
    
    output = "output_file_distiller"
    
    if not os.path.isfile(stats_file): # Collect stats.
        #model_copy = deepcopy(model)
        model_copy = translator.model
        distiller.utils.assign_layer_fq_names(model_copy)
        
        def eval_for_stats(model):
            evaluate(model, output + '.temp', num_batches=None)
        collect_quant_stats(model_copy, eval_for_stats, save_dir='.')
        #del model_copy
        torch.cuda.empty_cache()
    
    quantizer = PostTrainLinearQuantizer(deepcopy(translator.model),
                                        mode="SYMMETRIC",  # As was suggested in GNMT's paper
                                        model_activation_stats=stats_file)
    for t, rf in quantizer.replacement_factory.items():
        if rf is not None:
            print("Replacing '{}' modules using '{}' function".format(t.__name__, rf.__name__))
    
    fake_input = torch.tensor([4,115,1480,73,12,4,18125,1424,234,26,12,3658,16278,36], dtype=torch.long)
    fake_input = fake_input.unsqueeze(-1).unsqueeze(-2)
    length  = torch.tensor([14])
    fake_inputs = (fake_input, fake_input, length)
    dummy_input = (torch.ones(1, 1, 2).to(dtype=torch.long),
                    torch.ones(1, 1, 2).to(dtype=torch.long),
                    torch.tensor([1]).to(dtype=torch.long),)
    quantizer.prepare_model(fake_inputs)
    print(quantizer.model)
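
The printed wrappers still contain a regular `Linear` as `wrapped_module`, and the file grew rather than shrank, so it seems likely (an assumption, not confirmed in the issue) that the checkpoint keeps the weights in a floating-point dtype and adds the scale and zero-point values on top. A file only gets smaller when the values are actually stored in a narrower integer type. The sketch below illustrates the difference with the standard-library `array` module and symmetric 8-bit quantization; `quantize_symmetric` is a hypothetical helper, not Distiller's implementation, and the weights are toy values.

```python
from array import array


def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using scale = (2**(bits-1) - 1) / max|x|."""
    q_max = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = q_max / max_abs
    return [round(v * scale) for v in values], scale


weights = [0.5, -0.25, 0.125, -0.5] * 1024  # 4096 toy weights
q_values, scale = quantize_symmetric(weights)

as_float32 = array("f", weights)   # original values, 4 bytes per element
as_float_q = array("f", q_values)  # quantized values still stored as floats
as_int8 = array("b", q_values)     # quantized values stored as signed bytes

for label, arr in [("float32", as_float32),
                   ("quantized, float storage", as_float_q),
                   ("quantized, int8 storage", as_int8)]:
    print(label, len(arr) * arr.itemsize, "bytes")
```

With float storage the quantized values occupy exactly as much space as the originals; only the int8 array is four times smaller.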
    
    opened by SefaZeng 0
  • Reduce the yolov3 model size of keras(.h5) or darknet(.weight)


    I want to reduce the size of a YOLOv3 model (in Keras (.h5) or Darknet (.weight) format) using pruning.

    In issue #62, I found the statement "Sure, you can use Distiller with any network, not just the models that come from TorchVision."

    Therefore, can I reduce the size of a Keras (.h5) or Darknet (.weight) model without converting it into a PyTorch model? In addition, I would appreciate it if you could let me know how to prune a YOLOv3 model (Keras (.h5) or Darknet (.weight)).

    opened by YsYusaito 0
Releases(v0.3.1)
  • v0.3.1(Apr 1, 2019)

  • v0.3.0(Feb 28, 2019)

  • v0.2.0(Jun 25, 2018)

    • PyTorch 0.4 support
    • An implementation of Baidu's RNN pruning paper from ICLR 2017 Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
    • Add a word language model pruning example using AGP and Baidu RNN pruning
    • Quantization aware training (4-bit quantization)
    • New models: pre-activation ResNet for ImageNet and CIFAR, and AlexNet with batch-norm
    • New quantization documentation content
  • v0.1.0(May 16, 2018)

    We're tagging this version, which uses PyTorch 0.3, because we want to move the 'master' branch to support PyTorch 0.4 and its API changes.

Owner
Intel Labs
PyTorch toolkit for biomedical imaging

farabio is a minimal PyTorch toolkit for out-of-the-box deep learning support in biomedical imaging. For further information, see Wikis and Docs.

San Askaruly 47 Dec 28, 2022
GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks This repository implements a capsule model Inten

Joel Huang 15 Dec 24, 2022
A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).

Code release for "Bayesian Compression for Deep Learning" In "Bayesian Compression for Deep Learning" we adopt a Bayesian view for the compression of

Karen Ullrich 190 Dec 30, 2022
Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Tacotron 2 (without wavenet) PyTorch implementation of Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions. This implementati

NVIDIA Corporation 4.1k Jan 03, 2023
higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

higher is a library providing support for higher-order optimization, e.g. through unrolled first-order optimization loops, of "meta" aspects of these

Facebook Research 1.5k Jan 03, 2023
Fast, general, and tested differentiable structured prediction in PyTorch

Torch-Struct: Structured Prediction Library A library of tested, GPU implementations of core structured prediction algorithms for deep learning applic

HNLP 1.1k Jan 07, 2023
A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

Torchmeta A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch. Torchmeta contains popular meta-learning bench

Tristan Deleu 1.7k Jan 06, 2023
Fast and Easy-to-use Distributed Graph Learning for PyTorch Geometric

Fast and Easy-to-use Distributed Graph Learning for PyTorch Geometric

Quiver Team 221 Dec 22, 2022
A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

GRAAL/GRAIL 534 Dec 17, 2022
ocaml-torch provides some ocaml bindings for the PyTorch tensor library.

ocaml-torch provides some ocaml bindings for the PyTorch tensor library. This brings to OCaml NumPy-like tensor computations with GPU acceleration and tape-based automatic differentiation.

Laurent Mazare 369 Jan 03, 2023
TorchSSL: A PyTorch-based Toolbox for Semi-Supervised Learning

TorchSSL: A PyTorch-based Toolbox for Semi-Supervised Learning

1k Dec 28, 2022
Code for paper "Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking"

model_based_energy_constrained_compression Code for paper "Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and

Haichuan Yang 16 Jun 15, 2022
A pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch.

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

Grégoire Payen de La Garanderie 234 Dec 07, 2022
PyTorch to TensorFlow Lite converter

PyTorch to TensorFlow Lite converter

Omer Ferhat Sarioglu 140 Dec 13, 2022
An implementation of Performer, a linear attention-based transformer, in Pytorch

Performer - Pytorch An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random

Phil Wang 900 Dec 22, 2022
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

Terence Parr 704 Dec 14, 2022
PyNIF3D is an open-source PyTorch-based library for research on neural implicit functions (NIF)-based 3D geometry representation.

PyNIF3D is an open-source PyTorch-based library for research on neural implicit functions (NIF)-based 3D geometry representation. It aims to accelerate research by providing a modular design that all

Preferred Networks, Inc. 96 Nov 28, 2022
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.

A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.

Fidelity Investments 56 Sep 13, 2022
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News March 3: v0.9.97 has various bug fixes and improvements: Bug fixes for NTXentLoss Efficiency improvement for AccuracyCalculator, by using torch i

Kevin Musgrave 5k Jan 02, 2023
Learning Sparse Neural Networks through L0 regularization

Example implementation of the L0 regularization method described at Learning Sparse Neural Networks through L0 regularization, Christos Louizos, Max W

AMLAB 202 Nov 10, 2022