DeepSparse Engine

CPU inference engine that delivers unprecedented performance for sparse models

Overview

The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of natural sparsity within neural networks to reduce the compute required and to accelerate memory-bound workloads. It is focused on model deployment and scaling machine learning pipelines, and fits into your existing deployments as an inference backend.

This repository includes package APIs along with examples to quickly get started learning about and actually running sparse models.

Related Products

  • SparseZoo: Neural network model repository for highly sparse models and optimization recipes
  • SparseML: Libraries for state-of-the-art deep neural network optimization algorithms, enabling simple pipeline integration with a few lines of code
  • Sparsify: Easy-to-use autoML interface to optimize deep neural networks for better inference performance and a smaller footprint

Compatibility

The DeepSparse Engine ingests models in the ONNX format, allowing for compatibility with PyTorch, TensorFlow, Keras, and many other frameworks that support it. This reduces the extra work of preparing your trained model for inference to just one step of exporting.

Quick Tour

To expedite inference and benchmarking on real models, we include the sparsezoo package. SparseZoo hosts inference-optimized models, trained with repeatable optimization recipes using state-of-the-art techniques from SparseML.

Quickstart with SparseZoo ONNX Models

MobileNetV1 Dense

Here is how to quickly perform inference with the DeepSparse Engine on a pre-trained dense MobileNetV1 from SparseZoo.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

# Download model and compile as optimized executable for your machine
model = classification.mobilenet_v1()
engine = compile_model(model, batch_size=batch_size)

# Fetch sample input and predict output using engine
inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
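
The elapsed time returned alongside the outputs can be turned into throughput and per-item latency. The following is a minimal sketch, assuming inference_time is the wall-clock time in seconds for the whole batch (an assumption worth checking against the engine documentation). The helper name summarize_timing is hypothetical and the numbers are illustrative, not measurements.

```python
def summarize_timing(batch_size, inference_time_s):
    """Return (items per second, milliseconds per item) for one timed batch."""
    if batch_size <= 0 or inference_time_s <= 0:
        raise ValueError("batch_size and inference_time_s must be positive")
    items_per_second = batch_size / inference_time_s
    ms_per_item = 1000.0 * inference_time_s / batch_size
    return items_per_second, ms_per_item


# Illustrative values only: a batch of 64 items that took 0.5 seconds.
# Assumption: the engine reports elapsed time in seconds.
items_per_second, ms_per_item = summarize_timing(64, 0.5)
print(f"{items_per_second:.1f} items/sec, {ms_per_item:.2f} ms/item")
```

With a real run, you would pass batch_size and the inference_time value from the snippet above instead of the illustrative numbers.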

MobileNetV1 Optimized

When exploring available optimized models, you can use the Zoo.search_optimized_models utility to find models that share a base.

Let us try this on the dense MobileNetV1 to see what is available.

from sparsezoo import Zoo
from sparsezoo.models import classification
print(Zoo.search_optimized_models(classification.mobilenet_v1()))

Output:

[Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate)]

Great. We can see there are two pruned versions targeting FP32: conservative, at 100% of baseline accuracy, and moderate, at >= 99% of baseline accuracy. There is also a pruned_quant variant targeting INT8.

Say you want to evaluate the best performance on FP32 and are okay with a small drop in accuracy. In that case, choose pruned-moderate over pruned-conservative.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

model = classification.mobilenet_v1(optim_name="pruned", optim_category="moderate")
engine = compile_model(model, batch_size=batch_size)

inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
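
The last segment of each stub printed earlier appears to line up with the optim_name and optim_category keyword arguments used above. The sketch below splits the stubs with plain string handling to make that mapping explicit. It does not use the sparsezoo package; the parse_stub helper and the field names other than optim_name and optim_category are assumptions inferred from the printed stubs.

```python
def parse_stub(stub):
    """Split a SparseZoo-style stub into named parts (field names are assumptions)."""
    parts = stub.strip("/").split("/")
    if len(parts) != 7:
        raise ValueError(f"expected 7 '/'-separated segments, got {len(parts)}")
    domain, sub_domain, architecture, framework, repo, dataset, optim = parts
    # The final segment looks like "<optim_name>-<optim_category>",
    # for example "pruned-moderate" or "pruned_quant-moderate".
    optim_name, _, optim_category = optim.partition("-")
    return {
        "domain": domain,
        "sub_domain": sub_domain,
        "architecture": architecture,
        "framework": framework,
        "repo": repo,
        "dataset": dataset,
        "optim_name": optim_name,
        "optim_category": optim_category,
    }


# Stubs copied from the search output shown earlier.
stubs = [
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate",
]
parsed = [parse_stub(s) for s in stubs]

# Keep only the pruned (FP32) variants and list their categories.
pruned_categories = [p["optim_category"] for p in parsed if p["optim_name"] == "pruned"]
print(pruned_categories)
```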

Quickstart with custom ONNX models

We accept ONNX files for custom models, too. Simply plug in your model to compare performance with other solutions.

> wget https://github.com/onnx/models/raw/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
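
When comparing performance with other solutions, it helps to time every backend the same way. The following is a generic timing sketch, not part of the DeepSparse API: the benchmark helper and the fake_model stand-in are hypothetical and use only the standard library. With the engine compiled above, you would pass engine.run and the generated inputs in place of the stand-in.

```python
import statistics
import time


def benchmark(run, inputs, warmup=5, iterations=20):
    """Time run(inputs) repeatedly and report median and mean latency in seconds."""
    for _ in range(warmup):
        run(inputs)  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run(inputs)
        latencies.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(latencies),
        "mean_s": statistics.mean(latencies),
        "iterations": iterations,
    }


# Stand-in workload so the sketch runs anywhere (hypothetical, not a real model).
def fake_model(batch):
    return [sum(row) for row in batch]


stats = benchmark(fake_model, [[1.0, 2.0, 3.0]] * 16)
print(stats)
```

Reporting the median alongside the mean makes the comparison less sensitive to occasional slow iterations.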

For a more in-depth read on available APIs and workflows, check out the examples and DeepSparse Engine documentation.

Hardware Support

The DeepSparse Engine is validated to work on x86 Intel and AMD CPUs running Linux operating systems.

It is highly recommended to run on a CPU with AVX-512 instructions available for optimal algorithms to be enabled.

Here is a table detailing specific support for some algorithms over different microarchitectures:

| x86 Extension | Microarchitectures | Activation Sparsity | Kernel Sparsity | Sparse Quantization |
| --- | --- | --- | --- | --- |
| AMD AVX2 | Zen 2, Zen 3 | not supported | optimized | not supported |
| Intel AVX2 | Haswell, Broadwell, and newer | not supported | optimized | not supported |
| Intel AVX-512 | Skylake, Cannon Lake, and newer | optimized | optimized | emulated |
| Intel AVX-512 VNNI (DL Boost) | Cascade Lake, Ice Lake, Cooper Lake, Tiger Lake | optimized | optimized | optimized |
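
The table can also be read as a lookup keyed by instruction set level. The sketch below transcribes it into a dictionary and picks a row from a CPU flags string such as the "flags" line in /proc/cpuinfo on Linux. The helper names are hypothetical, and the flag names avx2, avx512f, and avx512_vnni are the ones Linux reports, not something the engine exposes.

```python
# Support levels transcribed from the table above:
# (activation sparsity, kernel sparsity, sparse quantization).
# The AMD and Intel AVX2 rows are identical, so one "avx2" entry covers both.
SUPPORT = {
    "avx2": ("not supported", "optimized", "not supported"),
    "avx512": ("optimized", "optimized", "emulated"),
    "avx512_vnni": ("optimized", "optimized", "optimized"),
}


def isa_level(cpu_flags):
    """Return the highest matching table row for a space-separated flags string."""
    flags = set(cpu_flags.split())
    if "avx512f" in flags and "avx512_vnni" in flags:
        return "avx512_vnni"
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return None  # none of the listed extensions were found


def describe_support(cpu_flags):
    """Map a flags string to the support levels listed in the table."""
    level = isa_level(cpu_flags)
    if level is None:
        return None
    activation, kernel, quantization = SUPPORT[level]
    return {
        "isa": level,
        "activation_sparsity": activation,
        "kernel_sparsity": kernel,
        "sparse_quantization": quantization,
    }


# Hypothetical flags strings for illustration.
print(describe_support("fpu sse2 avx avx2"))
print(describe_support("fpu sse2 avx avx2 avx512f avx512_vnni"))
```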

Installation

This repository is tested on Python 3.6+, and ONNX 1.5.0+. It is recommended to install in a virtual environment to keep your system in order.
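
Before installing, you can check an environment against these minimums. The sketch below uses only the standard library; parse_version and meets_minimum are hypothetical helpers, and the version strings passed to them are examples rather than values read from an installed package.

```python
import sys


def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


python_ok = sys.version_info >= (3, 6)
print("Python 3.6+:", python_ok)
print("ONNX 1.9.0 satisfies 1.5.0+:", meets_minimum("1.9.0", "1.5.0"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.12.0" as older than "1.5.0".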

Install with pip using:

pip install deepsparse

Then, if you want to explore the examples, clone the repository and install any additional dependencies found in the example folders.

Notebooks

For some step-by-step examples, we have Jupyter notebooks showing how to compile models with the DeepSparse Engine, check the predictions for accuracy, and benchmark them on your hardware.

Available Models and Recipes

A number of pre-trained baseline and recalibrated models in the SparseZoo can be used with the engine for higher performance. The types available for each model architecture are noted in its SparseZoo model repository listing.

Resources and Learning More

Contributing

We appreciate contributions to the code, examples, and documentation as well as bug reports and feature requests! Learn how here.

Join the Community

For user help or questions about the DeepSparse Engine, use our GitHub Discussions. Everyone is welcome!

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please email us at [email protected] or fill out this form.

License

The project's binary containing the DeepSparse Engine is licensed under the Neural Magic Engine License.

Example files and scripts included in this repository are licensed under the Apache License Version 2.0 as noted.

Release History

Official builds are hosted on PyPI.

Track this project via GitHub Releases.

Citation

Find this project useful in your research or other communications? Please consider citing Neural Magic's paper:

@inproceedings{pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research},
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}, 
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}
Comments
  • YOLOv5 pruned_quant-aggressive_94 exception

    Describe the bug

    I was trying to run the demo code with the YOLOv5 pruned_quant-aggressive_94 model on a g4dn.2xlarge instance and encountered this exception.

    Stack trace

      | 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
      | 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
      | 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
      | 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
      | 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
      | 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
      | 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
      | 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
      | 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
      | 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
      | 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
      | 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
      | 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
      | 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
      | 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
      | 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
      | 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
      | 2021-12-16T15:36:13.559+01:00 | Backtrace:
      | 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
      | 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: [email protected]
    

    Environment

    1. Ubuntu 18.04
    2. Python 3.8
    3. ML framework version(s)
    torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
    torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl
    
    4. Other Python package versions
    sparseml==0.9.0
    sparsezoo==0.9.0
    numpy==1.21.4
    onnx==1.9.0
    onnxruntime==1.7.0
    

    Is there any chance you could help me out to debug that issue?

    bug 
    opened by SkalskiP 20
  • ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

    Describe the bug

    Trying to run the server-client example.

    Environment

    Include all relevant environment information:

    1. OS: Ubuntu 18.04
    2. Python version: 3.8
    3. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: deepsparse 0.1.1
    4. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

    {'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 8, 'available_cores_per_socket': 8, 'threads_per_core': 1, 'available_threads_per_core': 1, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 12582912}

    To Reproduce

    from deepsparse.utils import arrays_to_bytes, bytes_to_arrays

    Errors

    Traceback (most recent call last):
      File "server.py", line 62, in <module>
        from deepsparse.utils import arrays_to_bytes, bytes_to_arrays
    ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

    bug 
    opened by adrianosantospb 13
  • Zero shot text classification pipeline

    Based off of https://discuss.huggingface.co/t/new-pipeline-for-zero-shot-text-classification/681

    Implements zero shot text classification pipeline. Batch size is equal to the number of sequences and currently only supports dynamic label passing, with static label support to come in a future PR. This implementation allows for future implementation of zero shot text classification based on models trained on classes other than mnli.

    example dynamic labels:

    zero_shot_text_classifier = Pipeline.create(
        task="zero_shot_text_classification",
        model_scheme="mnli",
        model_config={"hypothesis_template": "This text is related to {}"},     
        model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
                   "huggingface/mnli/pruned80_quant-none-vnni")
    
    sequence_to_classify = "Who are you voting for in 2020?"
    candidate_labels = ["Europe", "public health", "politics"]
    zero_shot_text_classifier(sequences=sequence_to_classify, labels=candidate_labels)
    >>> ZeroShotTextClassificationOutput(
        sequences='Who are you voting for in 2020?',
        labels=['politics', 'public health', 'Europe'],
        scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])
    

    example static labels:

    zero_shot_text_classifier = Pipeline.create(
        task="zero_shot_text_classification",
        batch_size=3,
        model_scheme="mnli",
        model_config={"hypothesis_template": "This text is related to {}"},
        model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
                   "huggingface/mnli/pruned80_quant-none-vnni",
        labels=["politics", "Europe", "public health"]
    )
    
    sequence_to_classify = "Who are you voting for in 2020?"
    zero_shot_text_classifier(sequences=sequence_to_classify)
    >>> ZeroShotTextClassificationOutput(
        sequences='Who are you voting for in 2020?',
        labels=['politics', 'public health', 'Europe'],
        scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])
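
For readers unfamiliar with the NLI-based approach from the linked Hugging Face post, the label handling can be sketched as follows. This is not the pipeline's code. build_pairs and rank_labels are hypothetical helpers and the entailment logits are made-up numbers standing in for model outputs. The single-label scoring shown here (a softmax over the per-label entailment logits) follows the approach described in that post and is assumed to apply here.

```python
import math


def build_pairs(sequence, labels, hypothesis_template):
    """Create one (premise, hypothesis) pair per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score, descending."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda item: item[1],
        reverse=True,
    )
    return [label for label, _ in scored], [score for _, score in scored]


candidate_labels = ["Europe", "public health", "politics"]
pairs = build_pairs(
    "Who are you voting for in 2020?",
    candidate_labels,
    "This text is related to {}",
)

# Made-up entailment logits, one per (premise, hypothesis) pair.
ranked_labels, ranked_scores = rank_labels(candidate_labels, [-1.0, -0.9, 2.0])
print(pairs[2])
print(ranked_labels)
```

Each candidate label produces one premise and hypothesis pair, so the number of model inputs grows with the number of labels per sequence.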
    

    Evaluation results:

    Dataset: sst2
    Config: multi_class = True, hypothesis_template = "The sentiment of this text is {}"

    deepsparse.transformers.eval_downstream model_path -d sst2 --zero-shot True
    

    | model | accuracy (nm pipeline) | expected accuracy (hf pipeline) |
    | --- | --- | --- |
    | bert base | 0.823 | 0.823 |
    | 90% pruned bert | 0.779 | 0.779 |

    opened by mgoin 11
  • Converting onnx model to deepsparse

    Hi, I'm trying to convert an onnx model to a deepsparse model, here is the code:

    from deepsparse import compile_model
    from deepsparse.utils import generate_random_inputs
    onnx_filepath = "fom.onnx"
    batch_size = 1
    
    # Generate random sample input
    inputs = generate_random_inputs(onnx_filepath, batch_size)
    
    # Compile and run
    engine = compile_model(onnx_filepath, batch_size)
    outputs = engine.run(inputs)
    
    **Environment**
    Include all relevant environment information:
    1. Ubuntu 18.04:
    2. Python version 3.7.9. :
    3. DeepSparse version 0.8.0 :
    4. torch 1.9.0+cu102:
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
    6. CPU {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': True, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 18, 'available_cores_per_socket': 18, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256}
    
    **Errors**
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #0 of shape = [1, 3, 256, 256]
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #1 of shape = [1, 10, 2]
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #2 of shape = [1, 10, 2]
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized) (system=avx512, binary=avx512)
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized)
    Date: 12-05-2021 @ 12:58:29 EST
    OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
    Arch: x86_64
    CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
    Vendor: GenuineIntel
    OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
    Arch: x86_64
    CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
    Vendor: GenuineIntel
    Cores/sockets/threads: [36, 2, 72]
    Available cores/sockets/threads: [36, 2, 72]
    L1 cache size data/instruction: 32k/32k
    L2 cache size: 1Mb
    L3 cache size: 24.75Mb
    Total memory: 507.367G
    Free memory: 22.4387G
    
    Assertion at ./src/include/wand/engine/compute/planner.hpp:131
    
    Backtrace:
    0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    1# 0x00007F36EB17C234 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    2# 0x00007F36EB185889 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    3# 0x00007F36EB185982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    4# 0x00007F36EB18AA8A in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    5# 0x00007F36EB18AB00 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    6# 0x00007F36EA7E985D in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    7# 0x00007F36EA7EE443 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    8# 0x00007F36EA76BD6B in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    9# 0x00007F36EA75AB3F in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    10# 0x00007F36EA75C1C1 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    11# 0x00007F36EADA9668 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    12# 0x00007F36EADAC0A2 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    13# 0x00007F36EADAF3B9 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    14# 0x00007F36EA73B76C in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    15# 0x00007F36EA7414C3 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    16# 0x00007F36EA6FB982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    17# 0x00007F36EA6FBC05 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, wand::safe_type<wand::parallel::use_current_affinity_tag, bool>, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data/lib/python3.7/site-packages/deepsparse/avx512/libdeepsparse.so
    19# 0x00007F3771031D1B in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    20# 0x00007F3771031F39 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    21# 0x00007F377105D5C5 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    22# 0x00007F377104B250 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    23# _PyMethodDef_RawFastCallDict in python
    
    Please email a copy of this stack trace and any additional information to: [email protected]
    Aborted
    

    Do you have any ideas why the code is failing?

    bug 
    opened by joaanna 11
  • [BugFix] Server Computer Vision `from_file` fixes

    This PR is now a parent PR for 3 different bugs across deepsparse server. The from_file bugfix was tested locally for all 3 CV Pipelines.

    There is a bug in the current implementation of deepsparse.server: when files are sent by a client (in CV tasks), the server does not read the files sent over the network but instead looks for local files (on the server machine) with the same name. If such a file exists, the server uses it for inference and returns the result, giving the illusion that everything worked as intended. If no local file with that name is found, a FileNotFoundError is thrown on the server side (which should not happen, because the file is not meant to be on the server machine), as follows:

    FileNotFoundError: [Errno 2] No such file or directory: 'buddy.jpeg'
    

    This PR fixes this. The changes are as follows:

    1. Changes were made on the server side to rely on actual file pointers rather than filenames.
    2. The from_files factory method for all CV schemas was updated to accept an Iterable of file pointers rather than a List[str]. The List --> Iterable change makes the function depend on behavior rather than the concrete type; the str --> TextIO change makes it accept file pointers (TextIO is a type from the typing module for file pointers).
    3. The PIL.Image.open(...) function is now used to read the contents from the file pointer. This library is included with torchvision (a requirement for all CV tasks), so it does not require any additional installation steps or auto-installation code.
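
The idea behind these changes can be illustrated without the server: read from the open handles the client sent, and never reopen anything by name. The sketch below is not the PR's code. read_uploads is a hypothetical helper, it is annotated with BinaryIO (rather than the TextIO mentioned above) because the example data is bytes, and in-memory buffers stand in for uploaded files. Decoding the bytes with PIL is left out to keep the example dependency-free.

```python
import io
from typing import BinaryIO, Iterable, List


def read_uploads(files: Iterable[BinaryIO]) -> List[bytes]:
    """Read each upload from its open handle instead of reopening by filename."""
    payloads = []
    for handle in files:
        payloads.append(handle.read())
    return payloads


# In-memory handles stand in for files received over the network.
# Passing a generator shows that any iterable works, not just a list.
uploads = (io.BytesIO(data) for data in (b"first image bytes", b"second image bytes"))
contents = read_uploads(uploads)
print([len(c) for c in contents])
```

Because the function only iterates and calls read(), it works for lists, tuples, and generators of file-like objects, which is the behavior-based typing the second change describes.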

    This bug was found while fixing another bug in the documentation reported by @dbarbuzzi: the docs did not pass in an input with the correct batch size. That bug was also fixed as part of this pull request.

    Testing:

    Note: Please test with an image that is not in the directory from which the server is run, to check that the bug was actually fixed.

    Step 1: Check out this branch:

    git checkout server-file-fixes

    Step 2: Start the server:

    deepsparse.server \
        --task image_classification \
        --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
        --port 5543
    

    Step 3: Use the following code to make requests. The returned status must be [200] (an HTTP status code of 200 means success). Change the image path in the code accordingly:

    import requests
    
    url = 'http://0.0.0.0:5543/predict/from_files'
    image_path = "/home/dummy-data/buddy.jpeg"
    path = [
        image_path,
    ]
    files = [('request', open(img, 'rb')) for img in path]
    resp = requests.post(url=url, files=files)
    print(resp)
    

    Also fixes the following issue thanks to @dbogunowicz

    bug 
    opened by rahul-tuli 9
  • Failed building wheel for deepsparse and onnx

    Describe the bug

    Failed building wheel for deepsparse

    Environment

    Include all relevant environment information:

    1. OS [e.g. Ubuntu 18.04]: Ubuntu 18.04 via WSL on Windows 10
    2. Python version [e.g. 3.7]: Python 3.6.9 (default, Jun 29 2022, 11:45:57)

    To Reproduce

    pip install --upgrade deepsparse

    Errors

    Collecting deepsparse
      Using cached https://files.pythonhosted.org/packages/c9/38/442bcc9403aaf0dd082e23397a8a5e5ca43b5856058cffa0f5449c8f8a5c/deepsparse-1.1.0.tar.gz
    Requirement already up-to-date: click~=8.0.0 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Requirement already up-to-date: numpy>=1.16.3 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Collecting onnx<=1.12.0,>=1.5.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/2c/6a/39b0580858589a67c3322aabc2634f158391ffbf98fa410127533e7f1495/onnx-1.12.0.tar.gz
    Requirement already up-to-date: protobuf<4,>=3.12.2 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Collecting pydantic>=1.8.2 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/fe/27/0de772dcd0517770b265dbc3998ed3ee3aa2ba25ba67e3685116cbbbccc6/pydantic-1.9.2-py3-none-any.whl
    Collecting requests>=2.0.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl
    Collecting sparsezoo~=1.1.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/10/aa/147378c7d961986cafdcd2c6ea5461bfc2078b2337584d1100083a3aaa6c/sparsezoo-1.1.0-py3-none-any.whl
    Collecting tqdm>=4.0.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/47/bb/849011636c4da2e44f1253cd927cfb20ada4374d8b3a4e425416e84900cc/tqdm-4.64.1-py2.py3-none-any.whl
    Requirement already up-to-date: importlib-metadata; python_version < "3.8" in ./deepsparse/lib/python3.6/site-packages (from click~=8.0.0->deepsparse)
    Requirement already up-to-date: typing-extensions>=3.6.2.1 in ./deepsparse/lib/python3.6/site-packages (from onnx<=1.12.0,>=1.5.0->deepsparse)
    Collecting dataclasses>=0.6; python_version < "3.7" (from pydantic>=1.8.2->deepsparse)
      Using cached https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl
    Collecting urllib3<1.27,>=1.21.1 (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl
    Collecting charset-normalizer~=2.0.0; python_version >= "3" (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl
    Collecting certifi>=2017.4.17 (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl
    Collecting idna<4,>=2.5; python_version >= "3" (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl
    Collecting pyyaml>=5.1.0 (from sparsezoo~=1.1.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/b3/85/79b9e5b4e8d3c0ac657f4e8617713cca8408f6cdc65d2ee6554217cedff1/PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
    Collecting importlib-resources; python_version < "3.7" (from tqdm>=4.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl
    Requirement already up-to-date: zipp>=0.5 in ./deepsparse/lib/python3.6/site-packages (from importlib-metadata; python_version < "3.8"->click~=8.0.0->deepsparse)
    Building wheels for collected packages: deepsparse, onnx
      Running setup.py bdist_wheel for deepsparse ... error
      Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/deepsparse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpq080et7upip-wheel- --python-tag cp36:
      Loaded version 1.1.0 from /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/generated_version.py
      Checking to see if /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/arch.bin exists.. True
      /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
        warnings.warn(msg)
      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
         or: -c --help [cmd1 cmd2 ...]
         or: -c --help-commands
         or: -c cmd --help
    
      error: invalid command 'bdist_wheel'
    
      ----------------------------------------
      Failed building wheel for deepsparse
      Running setup.py clean for deepsparse
      Running setup.py bdist_wheel for onnx ... error
      Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/onnx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpcl3nl8sspip-wheel- --python-tag cp36:
      fatal: not a git repository (or any of the parent directories): .git
      /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
        warnings.warn(msg)
      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
         or: -c --help [cmd1 cmd2 ...]
         or: -c --help-commands
         or: -c cmd --help
    
      error: invalid command 'bdist_wheel'
    
      ----------------------------------------
      Failed building wheel for onnx
      Running setup.py clean for onnx
    Failed to build deepsparse onnx
    Installing collected packages: onnx, dataclasses, pydantic, urllib3, charset-normalizer, certifi, idna, requests, importlib-resources, tqdm, pyyaml, sparsezoo, deepsparse
      Running setup.py install for onnx ... -^canceled
    ^COperation cancelled by user
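Both wheel builds in the log above stop at `error: invalid command 'bdist_wheel'`. With setuptools, that command has traditionally been provided by the separate `wheel` package, so one thing worth checking is whether `wheel` is importable inside the virtual environment. The sketch below is only a diagnostic under that assumption; the helper name `missing_build_packages` is made up for illustration and is not part of DeepSparse or pip.

```python
import importlib.util


def missing_build_packages(names=("pip", "setuptools", "wheel")):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_packages()
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("pip, setuptools and wheel are all importable")
```

If `wheel` is reported as missing, installing it inside the environment before retrying is a reasonable first step, although the log alone does not confirm this as the root cause.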
    

    @mgoin @KSGulin @shubhra @Willtor @dbarbuzzi @tlrmchlsmth

    bug 
    opened by ErfolgreichCharismatisch 8
  • Huggingface base Wav2Vec2 model crashing

    Describe the bug

    Hello,

    I am trying to compile the ONNX-converted model of a sparse Hugging Face base Wav2Vec2 model (sparsity obtained via unstructured magnitude pruning) with compile_model:

    dse_network = compile_model(onnx_filepath, batch_size=batch_size, num_cores=1, num_streams=1)

    My kernel crashed and I received the following message:

    Backtrace:
    0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    1# 0x00007FFB125A27C4 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    2# 0x00007FFB125A8906 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    3# 0x00007FFB125A89F2 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    4# 0x00007FFB125B12FA in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    5# 0x00007FFB125B1370 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    6# 0x00007FFB11B1F76D in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    7# 0x00007FFB11B25BCF in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    8# 0x00007FFB11A92015 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    9# 0x00007FFB11A81939 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    10# 0x00007FFB11A82AF1 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    11# 0x00007FFB1213F938 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    12# 0x00007FFB121423B3 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    13# 0x00007FFB121456B9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    14# 0x00007FFB11A6312B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    15# 0x00007FFB11A6B3CE in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    16# 0x00007FFB11A11C1A in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    17# 0x00007FFB11A11ED5 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
    19# 0x00007FFBE3641649 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    20# 0x00007FFBE364184B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    21# 0x00007FFBE36788B6 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    22# 0x00007FFBE364B0F9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    23# 0x0000561F0FD79B66 in /opt/conda/bin/python

    Please email a copy of this stack trace and any additional information to: [email protected]

    Environment

    Include all relevant environment information:

    1. OS : Ubuntu 18.04.5 LTS
    2. Python version [e.g. 3.7]: Python 3.9.4
    3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 1.0.2
    4. ML framework version(s) [e.g. torch 1.7.1]: torch 1.11.0
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnxruntime 1.12.0, onnx 1.12.0
    6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:
    >>> import deepsparse.cpu
    >>> print(deepsparse.cpu.cpu_architecture())
    

    {'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 31719424, 'architecture': 'x86_64', 'available_cores_per_socket': 19, 'available_num_cores': 38, 'available_num_hw_threads': 76, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 19, 'isa': 'avx512', 'num_cores': 38, 'num_hw_threads': 76, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz', 'vnni': False}

    Do you have a solution for this? Thank you.

    bug 
    opened by Tim-blo 8
  • Cannot import deepsparse from WSL: cannot get cpu topology

    Describe the bug

    For testing purposes, I want to check whether my code works on Windows Subsystem for Linux (WSL2). I'm using Ubuntu 18.04 LTS.

    Once in Ubuntu on WSL, I create a new Python virtual env, then pip install deepsparse.

    After that, while trying to import deepsparse I get:

    >>> import deepsparse
    arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
    Traceback (most recent call last):
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
        info_str = subprocess.check_output(file_path).decode("utf-8")
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
     
    During handling of the above exception, another exception occurred:
     
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
        from .engine import *
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
        from deepsparse.lib import init_deepsparse_lib
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
        CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
        arch = cpu_architecture()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
        arch = _parse_arch_bin()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
        self.memo[args] = self.f(*args)
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
        raise OSError(
    OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
    

    Expected behavior

    Maybe it should work on WSL :)

    Environment

    Include all relevant environment information:

    1. OS [e.g. Ubuntu 18.04]: Ubuntu 18.04LTS (On Windows 10, WSL2)
    2. Python version [e.g. 3.7]: 3.9
    3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.8.0
    4. ML framework version(s) [e.g. torch 1.7.1]: 1.10.0
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
    6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

    This is basically what's not working

    To Reproduce

    Exact steps to reproduce the behavior:

    1. On Windows 10, activate WSL
    2. Install Ubuntu 18.04 from the Microsoft Store
    3. On Ubuntu, create a virtual env (I personally use mamba or conda)
    4. pip install deepsparse
    5. import deepsparse
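The traceback in this report shows `import deepsparse` itself raising `OSError` when `arch.bin` aborts. Until that is resolved, a script that must also run on WSL could guard the import. The sketch below is a hedged illustration using only the standard library; `looks_like_wsl` and `try_import` are hypothetical helper names, and the check relies on the assumption that WSL kernels report "microsoft" in their release string.

```python
import importlib
import platform


def looks_like_wsl(release=None):
    """Heuristic: WSL kernels usually include 'microsoft' in the release string."""
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def try_import(module_name):
    """Import a module, returning (module, None) or (None, exception)."""
    try:
        return importlib.import_module(module_name), None
    except (ImportError, OSError) as exc:
        # OSError is what the traceback above shows when arch.bin fails.
        return None, exc


if __name__ == "__main__":
    module, error = try_import("deepsparse")
    if module is None:
        where = "WSL" if looks_like_wsl() else "this system"
        print(f"deepsparse could not be imported on {where}: {error}")
```

This only makes the failure easier to report and handle; it does not make the CPU topology detection succeed.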

    Errors

    The assertion failure and traceback are identical to the output shown under "Describe the bug" above.
    

    bug 
    opened by clementpoiret 8
  • Does a C API exist for deepsparse or is this python only and are all benchmarks via python?

    Just a quick question: is it possible to use DeepSparse for inference directly from other languages, e.g. C++, C#, or similar? Or is all the code written in Python?

    enhancement 
    opened by nietras 8
  • getting low fps  & inference issue

    1. I used the example at https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo with this command:

    !python annotate.py \
      zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 \
      --source "/content/loc1min.mp4" \
      --quantized-inputs \
      --image-shape 416 416 \
      --save-dir '/content/ops/' \
      --model-config '/content/coco128.yaml' \
      --device 'cpu'

    I'm getting low FPS on CPU with the YOLOv5s model. Is this normal, or should I expect 50-60 FPS? You mention that the model will be 10x faster, but what I see is much lower.

    image

    2. I trained a model with the SparseML repo on coco128 for 40 epochs, converted the .pth model to ONNX, and ran the same inference script:

    !python annotate.py \
      /content/sparseml/integrations/ultralytics-yolov5/yolov5/runs/train/exp2/weights/best.onnx \
      --source "/content/loc1min.mp4" \
      --quantized-inputs \
      --image-shape 416 416 \
      --save-dir '/content/ops/' \
      --model-config '/content/coco128.yaml' \
      --device 'cpu'

    This run fails with the following issue:

    image

    What's wrong here? My goal is to train a model on custom data with SparseML and run inference with DeepSparse.

    bug 
    opened by akashAD98 7
  • no better speed on yolo quant

    Hi! How is it going?

    First, thanks for this great repo and for helping make models better and faster. I used your YOLO example to get better speed and compared the base, pruned, and quantized models as you describe, but all results were approximately the same. There is no VNNI warning, and my server runs Ubuntu 18. My code is:

    import os

    # models
    yolov5s_base = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none"
    yolov5s_pruned = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96"
    yolov5s_pruned_quant = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

    source_img = "img.bmp"

    print("\n base inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_base} --source {source_img} --image-shape 640 640"
    os.system(bash_cmd)

    print("\n pruned inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_pruned} --source {source_img} --image-shape 640 640"
    os.system(bash_cmd)

    print("\n pruned_quant inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_pruned_quant} --source {source_img} --quantized-inputs --image-shape 640 640"
    os.system(bash_cmd)

    When I run this code, I get these results:

    base inference:

    2022-03-08 20:28:15 main INFO Results will be saved to annotation_results/deepsparse-annotations-8
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none downloaded to /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
    2022-03-08 20:28:17 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:18 main INFO Inference 0 processed in 128.20696830749512 ms
    2022-03-08 20:28:18 main INFO Results saved to annotation_results/deepsparse-annotations-8

    pruned inference:

    2022-03-08 20:28:19 main INFO Results will be saved to annotation_results/deepsparse-annotations-9
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
    2022-03-08 20:28:21 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:23 main INFO Inference 0 processed in 124.91464614868164 ms
    2022-03-08 20:28:23 main INFO Results saved to annotation_results/deepsparse-annotations-9

    pruned_quant inference:

    2022-03-08 20:28:24 main INFO Results will be saved to annotation_results/deepsparse-annotations-10
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
    2022-03-08 20:28:26 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:28 main INFO Inference 0 processed in 114.76516723632812 ms
    2022-03-08 20:28:28 main INFO Results saved to annotation_results/deepsparse-annotations-10

    As you can see, the pruned-quantized model is barely faster. Please advise on how to get faster results. Thanks!
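To put the three logged latencies side by side, the short calculation below converts each one to frames per second and to a speedup relative to the base model. The numbers are copied from the log excerpts in this report; the variable names are only illustrative.

```python
# Latencies in milliseconds, copied from the three log excerpts in this report.
latency_ms = {
    "base": 128.20696830749512,
    "pruned": 124.91464614868164,
    "pruned_quant": 114.76516723632812,
}

baseline = latency_ms["base"]

# Speedup relative to the base model, and throughput at batch size 1.
speedup = {name: baseline / ms for name, ms in latency_ms.items()}
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}

for name, ms in latency_ms.items():
    print(f"{name:>12}: {ms:7.2f} ms  {fps[name]:5.2f} FPS  {speedup[name]:.2f}x vs base")
```

Each figure comes from a single timed inference on one image, which may include one-time start-up costs, so averaging over many iterations would likely give a firmer comparison.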

    opened by RasoulZamani 7
  • How is the quantized op inferred?

    Hello, just out of curiosity: how is the quantized integer conv op run at inference time? I don't think it is supported in onnxruntime, and it is not even a standard ONNX op.

    image

    documentation 
    opened by jinfagang 2
  • [Fix] Update the code for handling ragged numpy arrays in numpy >= 1.24.0

    Response to: https://github.com/neuralmagic/deepsparse/issues/825

    Since NumPy version 1.19.0, one must specify dtype=object when creating an array from "ragged" sequences; otherwise, one receives a warning:

    VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    

    Starting with NumPy version 1.24.0, this warning turns into an explicit error:

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.
    

    This PR adds the code change necessary to remove the error. As far as I know, only the transformers QA code requires the update.

    This fix should also be backward compatible at least dating back to 1.19 (in our requirements we honor numpy>=1.16.3, maybe it is time to bump this version up?)
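The kind of change described above can be illustrated with a minimal sketch. This is not the PR's actual diff, and the sample data is made up: it only shows that passing `dtype=object` lets NumPy build an array from sequences of different lengths without the warning or error quoted above.

```python
import numpy as np

# Rows of different lengths: a "ragged" nested sequence (made-up sample data).
ragged = [[1, 2, 3], [4, 5]]

# With dtype=object, NumPy creates a 1-D array whose elements are the
# original Python lists instead of trying to form a rectangular array.
arr = np.array(ragged, dtype=object)

print(arr.shape, arr.dtype)
```

Each element stays a plain Python list, so downstream code that expects a rectangular numeric array would still need to handle these fields separately.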

    opened by dbogunowicz 1
  • Broken Transformers QA Inference Pipeline

    Describe the bug

    Transformers QA pipeline fails on a simple inference task.

    Expected behavior

    The inference pipeline for Question Answering should work without raising any errors.

    Environment

    Python version: 3.8
    DeepSparse version: current main

    To Reproduce

    from deepsparse import Pipeline
    
    task = "question-answering"
    dense_qa_pipeline = Pipeline.create(
            task=task,
            model_path="zoo:nlp/question_answering/distilbert-none/pytorch/huggingface/squad/base-none",
            # or model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none",
            # was checking whether the problem is not model-dependent
        )
    
    question = "DeepSparse is sparsity-aware inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application"
    q_context = "What is DeepSparse?"
    
    dense_qa_pipeline(question=question, context=q_context)
    

    Errors

    None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
    DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.3.0.20221217 COMMUNITY | (d5bf112b) (release) (optimized) (system=avx2, binary=avx2)
    Traceback (most recent call last):
      File "/usr/lib/python3.8/code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
      File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/ubuntu/damian/deepsparse_copy/hehe.py", line 14, in <module>
        dense_output = dense_qa_pipeline(question=question, context=q_context)
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/pipeline.py", line 217, in __call__
        engine_inputs: List[numpy.ndarray] = self.process_inputs(pipeline_inputs)
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 261, in process_inputs
        {
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 262, in <dictcomp>
        key: numpy.array(tokenized_example[key][span])
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.
    

    Additional context

    The error occurs in https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/transformers/pipelines/question_answering.py#L260.

    When attempting to perform dictionary comprehension:

    {
                        key: numpy.array(tokenized_example[key][span])
                        for key in tokenized_example.keys()
                        if key not in self.onnx_input_names
                    }
    

    Here:

    self.onnx_input_names = ['input_ids', 'attention_mask', 'token_type_ids']
    tokenized_example.keys() = ['input_ids', 'token_type_ids', 'attention_mask', 'special_tokens_mask', 'offset_mapping', 'overflow_to_sample_mapping', 'example_id']

    As a result, we end up iterating over the list difference. One element of the resulting list, offset_mapping, is the culprit:

    [tokenized_example[key][0] for key in ['offset_mapping']]
    

    results in: image

    Calling numpy.array(...) on this data structure triggers the error in question.
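The list difference described in this report can be reproduced with plain Python, using the two key lists quoted above. The variable names `tokenized_keys` and `extra_keys` are illustrative and do not come from the DeepSparse source.

```python
# Key lists copied from the report above.
onnx_input_names = ["input_ids", "attention_mask", "token_type_ids"]
tokenized_keys = [
    "input_ids",
    "token_type_ids",
    "attention_mask",
    "special_tokens_mask",
    "offset_mapping",
    "overflow_to_sample_mapping",
    "example_id",
]

# Keys the dictionary comprehension iterates over: everything in the
# tokenized example that is not an ONNX model input.
extra_keys = [key for key in tokenized_keys if key not in onnx_input_names]

print(extra_keys)
```

`offset_mapping` is among the remaining keys, which matches the report's observation that it is the field passed to `numpy.array(...)` when the error is raised.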

    Interestingly, when @mwitiderrick attempted to reproduce the error inside a Colab notebook (using the last release rather than main), the problem disappeared: https://colab.research.google.com/drive/1aIrITYxgcR-5VmL4vm8P-6H4rvCBAeaX?usp=sharing However, it reappeared (on the last release) when he attempted to run the transformers QA pipeline in HF/Gradio: https://huggingface.co/spaces/neuralmagic/question-answering/blob/main/app.py

    bug 
    opened by dbogunowicz 0
  • Quantization and pruning for yolov7

    I would like to perform quantization-aware training for a custom object detector using the YOLOv7 architecture. Could you please let me know whether the functionality developed by DeepSparse for YOLOv5 can be used directly, or what modifications I would need to make to use it with YOLOv7? Any leads would be appreciated. Thanks

    enhancement 
    opened by Sri20021 0
Releases (v1.3.0)
Owner
Neural Magic