ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Overview

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, and XGBoost. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →
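As a rough illustration of "leveraging hardware accelerators where applicable", the sketch below picks execution providers in a preferred order and falls back to CPU. The preference order and the helper names `pick_providers` and `create_session` are assumptions made for this example; `onnxruntime.get_available_providers()` and the `providers` argument of `InferenceSession` belong to the ONNX Runtime Python API.

```python
from typing import List, Sequence

# Assumed preference order for illustration; adjust it to the target deployment.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are available, in preference order."""
    chosen = [p for p in preferred if p in available]
    # Keep the CPU provider as the last-resort fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def create_session(model_path: str):
    """Create an inference session using the providers chosen above."""
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

Given a model file (the path is a placeholder), `create_session("model.onnx")` would return a session that tries the listed providers in order.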

ONNX Runtime training can reduce model training time for transformer models on multi-node NVIDIA GPUs with a one-line addition to existing PyTorch training scripts. Learn more →
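The one-line addition usually amounts to wrapping the model. A minimal sketch, assuming the `torch-ort` package (which exposes `ORTModule`) is installed; the helper `wrap_for_ort_training` and its `wrapper` parameter are hypothetical and exist only so the snippet can be exercised without PyTorch.

```python
def wrap_for_ort_training(model, wrapper=None):
    """Wrap a PyTorch module so training runs through ONNX Runtime.

    `wrapper` is a hypothetical hook for testing; by default the real
    ORTModule class from the torch-ort package is used.
    """
    if wrapper is None:
        # Imported lazily: requires `pip install torch-ort` and PyTorch.
        from torch_ort import ORTModule
        wrapper = ORTModule
    return wrapper(model)
```

In an existing training script this would be the single changed line, `model = wrap_for_ort_training(model)`, with the optimizer and training loop left as they are.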

Get Started

General Information: onnxruntime.ai

Usage documentation and tutorials: onnxruntime.ai/docs

Companion sample repositories:

Build Pipeline Status

[Build status badges for Windows, Linux, Mac, Android, iOS, and WebAssembly; columns: CPU, GPU, EPs]

Data/Telemetry

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

This project is licensed under the MIT License.

Comments
  • Openvino ep 2021.4 v3.3

    Changes in the OpenVINO EP to enable IO Buffer Optimization and the Auto Plugin feature.

    Motivation and Context

    • Change was required to enable IO Buffer Optimization
    • Change was required to enable AutoPlugin, fix Multi, Hetero Flow
    • Change to the ONNX Runtime API to get the device location for an ORT Value tensor
    opened by sfatimar 79
  • Java API for onnxruntime

    Description: This pull request provides a Java 8 API using JNI. It has unit tests ported from the v0.5.0 release of the C# API; I'll work on porting the new tests from the master branch over the next few weeks. I assume there will be some design and naming discussion on this PR, so we can have that while I work on the unit tests.

    Currently it builds using a separate gradle project which I've tested on Mac & Linux. The build process involves running gradle clean build -x test; gradle build as the combination of a JNI and Java project in Gradle 5 isn't properly supported. I could do with some help integrating it into the CMake build system, but I've not used CMake much before. Integrating it into CMake will make it simpler to put in the appropriate provider compilation flags and fix the oddities in the build (as CMake has all the information necessary).

    opened by Craigacp 75
  • Support CUDA Graph

    Description

    This PR adds support for CUDA Graph. The feature can significantly reduce the CPU overhead of calling CUDA APIs by submitting the entire graph to the GPU with a single call to cudaGraphLaunch.

    Motivation and Context

    • Why is this change required? What problem does it solve? This feature helps reduce model latency, especially for online inference, when the CPU overhead described above is a bottleneck. For example, it can reduce the 95% latency of a transformer-based online inference model (with 148 million parameters) from 4.3ms to 2.1ms.
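To put the quoted figures in perspective, the arithmetic below converts the 4.3ms and 2.1ms latencies into a speedup factor and a percentage reduction. The function name `latency_improvement` is made up for this example; only the two latency values come from the PR description.

```python
def latency_improvement(before_ms: float, after_ms: float):
    """Return (speedup factor, percentage latency reduction)."""
    speedup = before_ms / after_ms
    reduction_pct = (before_ms - after_ms) / before_ms * 100.0
    return speedup, reduction_pct


speedup, reduction = latency_improvement(4.3, 2.1)
print(f"speedup: {speedup:.2f}x, latency reduction: {reduction:.1f}%")
# speedup: 2.05x, latency reduction: 51.2%
```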
    opened by feihugis 72
  • Resolve Optim Params Issues

    • Includes a test of Optimizer Parameter Groups for the ONNX BERT Model (3 variations)
    • Resolves the issue of not passing default hyperparameters for parameters not in a group
    • Resolves the issue of sending 'lambda_coef' instead of 'lambda' to the backend
    • Resolves the issue of sending lr to the backend as a hyperparameter
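The three fixes listed above can be pictured with a small sketch. It is not the code from the PR: the function names, the rename table, and the assumption that the learning rate travels to the backend separately (and so is dropped from the hyperparameter map) are all illustrative.

```python
from typing import Dict, Iterable, List

# Hypothetical rename table: user-facing name -> name expected by the backend.
BACKEND_NAMES = {"lambda_coef": "lambda"}
# Assumption: the learning rate is passed separately, not as a hyperparameter.
EXCLUDED = {"lr"}


def to_backend(hyperparams: Dict[str, float]) -> Dict[str, float]:
    """Rename keys for the backend and drop the excluded ones."""
    return {BACKEND_NAMES.get(k, k): v
            for k, v in hyperparams.items() if k not in EXCLUDED}


def build_backend_hyperparams(param_names: Iterable[str],
                              defaults: Dict[str, float],
                              param_groups: List[dict]) -> Dict[str, dict]:
    """Resolve per-parameter hyperparameters from defaults and group overrides."""
    per_param: Dict[str, dict] = {}
    for group in param_groups:
        overrides = {k: v for k, v in group.items() if k != "params"}
        for name in group["params"]:
            per_param[name] = {**defaults, **overrides}
    # Parameters that belong to no group still receive the defaults.
    for name in param_names:
        per_param.setdefault(name, dict(defaults))
    return {name: to_backend(hp) for name, hp in per_param.items()}


defaults = {"lr": 0.001, "alpha": 0.9, "lambda_coef": 0.01}
groups = [{"params": ["bert.embeddings.weight"], "lambda_coef": 0.0}]
names = ["bert.embeddings.weight", "bert.pooler.bias"]
print(build_backend_hyperparams(names, defaults, groups))
```

The parameter names are placeholders; the point is that the ungrouped parameter ends up with the default values, and the backend sees `lambda` rather than `lambda_coef`.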
    opened by rayankrish 68
  • Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Description: Extend Gist memory compression to support additional compression formats and a new priority execution order, along with other upgrades:

    • New Feature: GistPack1 compression. It compresses from float32/bool to 1 bit. It is used for lossless compression for dropout and relu nodes.
    • New Feature: GistPack8 compression. It compresses from 32 bits/16 bits to 8 bits. It is used for lossy compression for any operator.
    • New Feature: GistPackMsfp15 compression. It compresses 8 (or tile size) values, each 32 bits wide, to 8 (or tile size) values, each 7 bits wide (sign and mantissa), plus a single 8-bit shared exponent. It is used for lossy compression for any operator.
    • New Feature: GistPack16 compression. It compresses from 32 bits to 16 bits. It is used for lossy compression for any operator.
    • We also upgraded the Gist rule to support different operators. The rule is now generic: it works for any operator as long as a pattern map is provided. The pattern map has the target operator as key and the destination operator as value (e.g. PATTERN_MAP[Softmax] = {"SoftmaxGrad"}). The rule is operator-agnostic, which makes it straightforward for Gist to support new operators in the future.
    • New test for Priority execution order for nested compression.
    • Gist upgrade to support priority execution order to trigger encoder (compression) and decoder (decompression) accordingly.
    • Gist CLI: --use_gist, --op <which operator is being targeted, e.g. Softmax is op 1> --gist_compr <GistPack1|GistPack8|GistPack16|GistPackMsfp15>
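The idea behind the 1-bit format, and the size arithmetic for the Msfp15 format, can be sketched in a few lines. This is a pure-Python illustration, not the kernels from the PR; the bit order (least significant bit first) and the function names are assumptions.

```python
from typing import List, Sequence


def pack1(flags: Sequence[bool]) -> bytes:
    """Pack booleans into bytes, 8 flags per byte, least significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack1(packed: bytes, count: int) -> List[bool]:
    """Recover the first `count` booleans from a packed buffer."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]


def msfp15_bits(tile_size: int = 8) -> int:
    """Bits per tile: 7 bits per value plus one shared 8-bit exponent."""
    return tile_size * 7 + 8


mask = [True, False, True, True, False, False, False, True, True]
packed = pack1(mask)
assert unpack1(packed, len(mask)) == mask   # the round trip is lossless
print(len(packed))                          # 2 bytes for 9 flags
print(8 * 32 / msfp15_bits(8))              # 4.0: 256 bits shrink to 64 bits
```

A dropout mask is already boolean, which is why packing it to one bit per element loses nothing; the 8-bit, 16-bit, and Msfp15 formats trade precision for space instead.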

    Motivation and Context

    • Why is this change required? What problem does it solve? It fixes and improves the Gist optimizer rule by changing Gist operators to handle 1 input and 1 output without the need for an early encoder input or a late decoder output. It also adds new compression formats (Pack1, Pack8).
    training 
    opened by fninaparavecino 61
  • Multi-stream executor

    Description: This PR includes the following work:

    1. Provide stream and related synchronization abstractions in onnxruntime.
    2. Enhance onnxruntime's execution planner, executor, and memory arena to support executing multiple streams in parallel.
    3. Deprecate the parallel executor for CPU.
    4. Deprecate the Fence mechanism.
    5. Update the CUDA and TensorRT EPs to support the stream mechanism, so that different requests can run in different CUDA streams.

    Motivation and Context

    • Why is this change required? Currently, the execution plan is just a linear list of those primitives, and ORT executes them step by step. For any given graph, ORT serializes it to a fixed execution order. This sequential execution design simplifies most scenarios, but it has the following limitations:
    1. It is difficult to enable inter-node parallelization. We have a half-baked parallel executor, but it is very difficult to make it work with GPU.
    2. The fence mechanism works for the case of a single GPU stream plus a CPU thread, but when extended to multiple streams, it is difficult to manage synchronization across GPU streams.
    3. Our CUDA EP relies on the BFCArena to make memory management work with asynchronous GPU kernels, but the current BFCArena is not aware of streams, so it doesn't behave correctly when run with multiple streams.

    This PR enhances the existing execution plan and executor to support multi-stream execution. A unified algorithm manages both single-stream and multi-stream scenarios. The PR focuses mainly on infrastructure support: given a valid stream assignment, onnxruntime can execute it correctly. How to generate a good stream assignment for a given model is left to a future PR.
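One way to picture "given a valid stream assignment, execute it correctly" is to ask which dependencies need explicit synchronization. The sketch below is an illustration under simple assumptions, not the planner from the PR: work within one stream is taken to run in submission order, so only edges that cross streams need a wait. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def cross_stream_waits(order: List[str],
                       deps: Dict[str, List[str]],
                       stream_of: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (producer, consumer) pairs that need explicit synchronization.

    `order` is a topological order of the nodes, `deps` maps a node to the
    nodes it consumes from, and `stream_of` maps a node to its stream id.
    """
    position = {node: i for i, node in enumerate(order)}
    waits = []
    for consumer in order:
        for producer in deps.get(consumer, []):
            if position[producer] >= position[consumer]:
                raise ValueError(f"{producer} must be ordered before {consumer}")
            # Same-stream edges are covered by in-order execution of the stream.
            if stream_of[producer] != stream_of[consumer]:
                waits.append((producer, consumer))
    return waits


order = ["A", "B", "C", "D"]
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
two_streams = {"A": 0, "B": 0, "C": 1, "D": 0}
one_stream = {node: 0 for node in order}

print(cross_stream_waits(order, deps, two_streams))  # [('A', 'C'), ('C', 'D')]
print(cross_stream_waits(order, deps, one_stream))   # []
```

With a single stream the list is empty, which matches the idea of one algorithm covering both the single-stream and the multi-stream case.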

    opened by souptc 60
  • Amdmigraphx fix build error

    Description: Fixes a build error related to EP API changes.

    Motivation and Context

    1. ORT EPs now use a shared library and the EP APIs have changed, so AMD MIGraphX needs corresponding changes to work as an EP.
    2. Added a few operators that AMD MIGraphX implemented recently.
    • Why is this change required? What problem does it solve? See above explanation

    • If it fixes an open issue, please link to the issue here. No

    opened by scxiao 60
  • Python MacOS arm64 release binaries

    Describe the bug

    ONNX Runtime does not install using pip on M1.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 11.2.1
    • ONNX Runtime installed from (source or binary): pip
    • Python version: 3.9.1

    To Reproduce

    ~: uname -v
    Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101
    ~: which python3
    /opt/homebrew/bin/python3
    ~: which pip
    /opt/homebrew/bin/pip
    ~: python3 --version
    Python 3.9.1
    ~: pip install onnxruntime
    ERROR: Could not find a version that satisfies the requirement onnxruntime
    ERROR: No matching distribution found for onnxruntime
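This error typically appears when no published wheel matches the interpreter's platform. A small diagnostic sketch, using only the Python standard library, shows the values involved; the helper names are made up for this example.

```python
import platform
import sysconfig


def describe_platform() -> dict:
    """Collect the platform details that wheel selection depends on."""
    return {
        "system": platform.system(),              # e.g. "Darwin"
        "machine": platform.machine(),            # e.g. "arm64"
        "python": platform.python_version(),
        "platform_tag": sysconfig.get_platform(),
    }


def is_macos_arm64(system: str, machine: str) -> bool:
    """True for Apple Silicon Macs running a native arm64 interpreter."""
    return system == "Darwin" and machine == "arm64"


info = describe_platform()
print(info)
print("macOS arm64:", is_macos_arm64(info["system"], info["machine"]))
```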
    
    feature request 
    opened by lutzroeder 59
  • Bump numpy from 1.21.0 to 1.22.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_rocm4.3.1

    Bumps numpy from 1.21.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    api 
    opened by dependabot[bot] 55
  • [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    Description:

    Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044. Then expands the shared provider library loading to DNNL, OpenVINO, TensorRT, fixing #6553.

    Added a flag to the native library loading to allow users to supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as the individual library path specification conflicts with the way that the ONNX Runtime native code loads the shared library providers.

    I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.

    Motivation and Context

    • Why is this change required? What problem does it solve? Re-enables DNNL, OpenVINO and TensorRT in Java by allowing them to be packaged in the jar and dynamically loaded in the same way CUDA is.
    • If it fixes an open issue, please link to the issue here. Fixes #6553. Fixes #7044. Fixes #8003.
    opened by Craigacp 54
  • Jetson Xavier - building from source

    1. I tried the solution proposed here: `../build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu 2020-02-14 14:34:50,960 Build [INFO] - Build started 2020-02-14 14:34:50,960 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'sync', '--recursive'] Synchronizing submodule url for 'cmake/external/DNNLibrary' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/flatbuffers' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/glog' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/cub' Synchronizing submodule url for 'cmake/external/date' Synchronizing submodule url for 'cmake/external/eigen' Synchronizing submodule url for 'cmake/external/gemmlowp' Synchronizing submodule url for 'cmake/external/googletest' Synchronizing submodule url for 'cmake/external/grpc' Synchronizing submodule url for 'cmake/external/grpc/third_party/abseil-cpp' Synchronizing submodule url for 'cmake/external/grpc/third_party/benchmark' Synchronizing submodule url for 
'cmake/external/grpc/third_party/bloaty' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/libFuzzer' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/re2' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl-with-bazel' Synchronizing submodule url for 'cmake/external/grpc/third_party/cares/cares' Synchronizing submodule url for 'cmake/external/grpc/third_party/data-plane-api' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags/doc' Synchronizing submodule url for 'cmake/external/grpc/third_party/googleapis' Synchronizing submodule url for 'cmake/external/grpc/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxx' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxxabi' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/protoc-gen-validate' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/zlib' Synchronizing submodule url for 'cmake/external/mimalloc' Synchronizing submodule url 
for 'cmake/external/nsync' Synchronizing submodule url for 'cmake/external/onnx' Synchronizing submodule url for 'cmake/external/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/onnx-tensorrt' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/protobuf' Synchronizing submodule url for 'cmake/external/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/re2' Synchronizing submodule url for 'cmake/external/spdlog' Synchronizing submodule url for 'cmake/external/tvm' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/HalideIR' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dlpack' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dmlc-core' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/rang' Synchronizing submodule url for 'cmake/external/wil' 2020-02-14 14:34:52,305 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'update', '--init', '--recursive'] 2020-02-14 14:34:54,502 Build [INFO] - Generating CMake build tree 2020-02-14 14:34:54,504 Build [DEBUG] - Running subprocess in '/code/onnxruntime/build/Linux/Release' ['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', 
'-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release'] Use gtest from submodule -- Found PythonInterp: /usr/bin/python3 (found version "3.6.9") -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3.5") Use protobuf from submodule -- The CUDA compiler identification is NVIDIA 10.0.326 
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc - broken CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message): The CUDA compiler

      "/usr/local/cuda-10.0/bin/nvcc"

    is not able to compile a simple test program.

    It fails with the following output:

    Change Dir: /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make cmTC_bb43d/fast && /usr/bin/make -f CMakeFiles/cmTC_bb43d.dir/build.make CMakeFiles/cmTC_bb43d.dir/build
    make[1]: Entering directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_bb43d.dir/main.cu.o
    /usr/local/cuda-10.0/bin/nvcc    -cudart shared  -Xcompiler=-fPIE   -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
    Linking CUDA executable cmTC_bb43d
    /usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_bb43d.dir/link.txt --verbose=1
    /usr/bin/g++   CMakeFiles/cmTC_bb43d.dir/main.cu.o -o cmTC_bb43d  -lcudadevrt -lcudart_static  -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverEntrypoints()':
    :(.text+0x23488): undefined reference to `dlsym'
    :(.text+0x234b0): undefined reference to `dlsym'
    :(.text+0x234d4): undefined reference to `dlsym'
    :(.text+0x234f8): undefined reference to `dlsym'
    :(.text+0x2351c): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)::(.text+0x23540): more undefined references to `dlsym' follow
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::loadDriverInternal()':
    :(.text+0x288cc): undefined reference to `dlopen'
    :(.text+0x28904): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::__loadDriverInternalUtil()':
    :(.text+0x289e0): undefined reference to `dlopen'
    :(.text+0x28a14): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverInternal()':
    :(.text+0x2b664): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInit()':
    :(.text+0x5c7bc): undefined reference to `dlerror'
    :(.text+0x5c7c8): undefined reference to `dlopen'
    :(.text+0x5c7dc): undefined reference to `dlsym'
    :(.text+0x5c7e4): undefined reference to `dlerror'
    :(.text+0x5c7f4): undefined reference to `dlclose'
    :(.text+0x5c838): undefined reference to `dlerror'
    :(.text+0x5c844): undefined reference to `dlopen'
    :(.text+0x5c858): undefined reference to `dlsym'
    :(.text+0x5c860): undefined reference to `dlerror'
    :(.text+0x5c870): undefined reference to `dlclose'
    :(.text+0x5c8b4): undefined reference to `dlerror'
    :(.text+0x5c8c0): undefined reference to `dlopen'
    :(.text+0x5c8d4): undefined reference to `dlsym'
    :(.text+0x5c8dc): undefined reference to `dlerror'
    :(.text+0x5c8ec): undefined reference to `dlclose'
    :(.text+0x5c930): undefined reference to `dlerror'
    :(.text+0x5c93c): undefined reference to `dlopen'
    :(.text+0x5c950): undefined reference to `dlsym'
    :(.text+0x5c958): undefined reference to `dlerror'
    :(.text+0x5c968): undefined reference to `dlclose'
    :(.text+0x5c9a0): undefined reference to `dlerror'
    :(.text+0x5c9ac): undefined reference to `dlopen'
    :(.text+0x5c9c0): undefined reference to `dlsym'
    :(.text+0x5c9c8): undefined reference to `dlerror'
    :(.text+0x5c9d8): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreCreate(sem_t*, int)':
    :(.text+0x5d910): undefined reference to `sem_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreDestroy(sem_t*)':
    :(.text+0x5d92c): undefined reference to `sem_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreWait(sem_t*, unsigned int)':
    :(.text+0x5da10): undefined reference to `sem_timedwait'
    :(.text+0x5da48): undefined reference to `sem_wait'
    :(.text+0x5da60): undefined reference to `sem_trywait'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreSignal(sem_t*)':
    :(.text+0x5dab0): undefined reference to `sem_post'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRangeBug1778973WARInit()':
    :(.text+0x5f448): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5f464): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5f474): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5f484): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5f4a4): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosPosixInit()':
    :(.text+0x5f4f0): undefined reference to `dlerror'
    :(.text+0x5f4fc): undefined reference to `dlopen'
    :(.text+0x5f510): undefined reference to `dlsym'
    :(.text+0x5f518): undefined reference to `dlerror'
    :(.text+0x5f528): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRange(unsigned long, void*, void*, unsigned long)':
    :(.text+0x5f768): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibrary(char const*)':
    :(.text+0x5fc8c): undefined reference to `dlerror'
    :(.text+0x5fca0): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibraryUnsafe(char const*)':
    :(.text+0x5fcb4): undefined reference to `dlerror'
    :(.text+0x5fcc8): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosFreeLibrary(void*)':
    :(.text+0x5fcd4): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosGetProcAddress(void*, char const*)':
    :(.text+0x5fce8): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsAlloc(void (*)(void*))':
    :(.text+0x5fdec): undefined reference to `pthread_key_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsFree(unsigned int)':
    :(.text+0x5fe10): undefined reference to `pthread_key_delete'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsGetValue(unsigned int)':
    :(.text+0x5fe18): undefined reference to `pthread_getspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsSetValue(unsigned int, void*)':
    :(.text+0x5fe28): undefined reference to `pthread_setspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionWithSharedFlag(pthread_mutex_t*, int)':
    :(.text+0x5fef4): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff14): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff24): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ff34): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ff50): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSection(pthread_mutex_t*)':
    :(.text+0x5ff70): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff8c): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff9c): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ffac): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ffc8): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionShared(pthread_mutex_t*)':
    :(.text+0x5ffe8): undefined reference to `pthread_mutexattr_init'
    :(.text+0x60004): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x60014): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x60024): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x60040): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryEnterCriticalSection(pthread_mutex_t*)':
    :(.text+0x60058): undefined reference to `pthread_mutex_trylock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLockEx(void**, void*, unsigned long)':
    :(.text+0x600b4): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x600c4): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x600d4): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLock(void**)':
    :(.text+0x60114): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x60144): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x60154): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireReaderLock(void**)':
    :(.text+0x60164): undefined reference to `pthread_rwlock_rdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireWriterLock(void**)':
    :(.text+0x6016c): undefined reference to `pthread_rwlock_wrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireReaderLock(void**)':
    :(.text+0x6017c): undefined reference to `pthread_rwlock_tryrdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireWriterLock(void**)':
    :(.text+0x601a4): undefined reference to `pthread_rwlock_trywrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseReaderLock(void**)':
    :(.text+0x601c4): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseWriterLock(void**)':
    :(.text+0x601cc): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLockEx(void**)':
    :(.text+0x601d4): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLock(void**)':
    :(.text+0x601ec): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosOnce(int*, void (*)())':
    :(.text+0x60210): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateWithSharedFlag(pthread_cond_t*, int)':
    :(.text+0x60250): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreate(pthread_cond_t*)':
    :(.text+0x602b0): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateShared(pthread_cond_t*)':
    :(.text+0x60310): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreateWithName(cudart::CUOSthread_st**, int (*)(void*), void*, char const*)':
    :(.text+0x60564): undefined reference to `pthread_create'
    :(.text+0x60578): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreate(cudart::CUOSthread_st**, int (*)(void*), void*)':
    :(.text+0x60640): undefined reference to `pthread_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadJoin(cudart::CUOSthread_st*, int*)':
    :(.text+0x606a8): undefined reference to `pthread_join'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadDetach(cudart::CUOSthread_st*)':
    :(.text+0x60708): undefined reference to `pthread_detach'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosHasThreadExited(cudart::CUOSthread_st*)':
    :(.text+0x60758): undefined reference to `pthread_kill'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCreateNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x60ee0): undefined reference to `shm_unlink'
    :(.text+0x60ef8): undefined reference to `shm_open'
    :(.text+0x60f98): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmOpenNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x61124): undefined reference to `shm_open'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCloseEx(cudart::cuosShmInfoEx_st*, unsigned int, unsigned int)':
    :(.text+0x61370): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSetThreadName(cudart::CUOSthread_st*, char const*)':
    :(.text+0x62294): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int, sockaddr*, unsigned int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED2Ev[_ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiPiiEED2Ev[_ZN15CUOSdlsymLoaderIPFiPiiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long const*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPKmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPKmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)()>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFivEED2Ev[_ZN15CUOSdlsymLoaderIPFivEED5Ev]+0x18): undefined reference to `dlclose'
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_bb43d.dir/build.make:103: recipe for target 'cmTC_bb43d' failed
    make[1]: *** [cmTC_bb43d] Error 1
    make[1]: Leaving directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Makefile:138: recipe for target 'cmTC_bb43d/fast' failed
    make: *** [cmTC_bb43d/fast] Error 2
    

    CMake will not be able to correctly generate this project.
    Call Stack (most recent call first):
      CMakeLists.txt:715 (enable_language)

    -- Configuring incomplete, errors occurred!
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeOutput.log".
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeError.log".
    Traceback (most recent call last):
      File "/code/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
        sys.exit(main())
      File "/code/onnxruntime/tools/ci_build/build.py", line 972, in main
        args, cmake_extra_args)
      File "/code/onnxruntime/tools/ci_build/build.py", line 422, in generate_build_tree
        run_subprocess(cmake_args + ["-DCMAKE_BUILD_TYPE={}".format(config)], cwd=config_build_dir)
      File "/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
        return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
      File "/usr/lib/python3.6/subprocess.py", line 438, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
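
Every unresolved symbol in the log above falls into one of three families: `pthread_*`, `dl*` and `shm_*`. On older glibc-based Linux systems these are traditionally provided by libpthread, libdl and librt, so the log suggests that the CMake compiler check linked `libcudart_static.a` without them. The following is a minimal sketch, assuming log lines in the format shown above, that groups the missing symbols by the library that would usually supply them. The helper names (`classify`, `summarize`) and the prefix-to-flag mapping are illustrative assumptions and are not part of the ONNX Runtime build tooling.

```python
import re
from collections import defaultdict

# Matches the symbol name in linker lines such as:
#   :(.text+0x5fc8c): undefined reference to `dlerror'
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")

# Assumed mapping from symbol prefix to the linker flag for the glibc
# library that traditionally provides it (before glibc 2.34 merged
# these libraries into libc).
PREFIX_TO_FLAG = (
    ("pthread_", "-lpthread"),
    ("dl", "-ldl"),
    ("shm_", "-lrt"),
)


def classify(symbol):
    """Return the linker flag that usually resolves `symbol`, or 'unknown'."""
    for prefix, flag in PREFIX_TO_FLAG:
        if symbol.startswith(prefix):
            return flag
    return "unknown"


def summarize(log_text):
    """Group the unique undefined symbols in a linker log by linker flag."""
    groups = defaultdict(set)
    for symbol in UNDEFINED_REF.findall(log_text):
        groups[classify(symbol)].add(symbol)
    return {flag: sorted(symbols) for flag, symbols in groups.items()}


# A few lines copied from the log above.
sample_log = """\
:(.text+0x5fc8c): undefined reference to `dlerror'
:(.text+0x5fca0): undefined reference to `dlopen'
:(.text+0x5fdec): undefined reference to `pthread_key_create'
:(.text+0x60ef8): undefined reference to `shm_open'
:(.text+0x60210): undefined reference to `pthread_once'
"""

summary = summarize(sample_log)
for flag, symbols in sorted(summary.items()):
    print(flag, symbols)
```

Running the same function over the full log would show whether any symbol falls outside these three families, which would point to a different missing dependency.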

    opened by AndreV84 52
  • make WITHCACHE as an option in MacOS workflow

    Description

    1. Set the default value of WithCache to false in the macOS CI workflow as well.
    2. Add today's date to the cache key so that the cache size does not keep growing.

    With the cache enabled, the pipeline duration dropped from over 70 minutes to just over 10 minutes.
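
The second point can be sketched as follows. This is a minimal illustration, assuming the cache key is a fixed prefix followed by the current date; `make_cache_key` and the `ccache-macos` prefix are hypothetical names and are not taken from the actual workflow file.

```python
from datetime import date


def make_cache_key(prefix, today=None):
    """Build a cache key that changes once per day.

    `prefix` is a hypothetical identifier for the cache; `today` can be
    injected for testing and defaults to the current date.
    """
    today = today or date.today()
    return f"{prefix}-{today.isoformat()}"


# Fixed dates are used here so that the result is reproducible.
key_day_one = make_cache_key("ccache-macos", date(2023, 1, 8))
key_day_two = make_cache_key("ccache-macos", date(2023, 1, 9))
print(key_day_one)
print(key_day_two)
```

Because the key differs from one day to the next, each day presumably starts a fresh cache entry instead of adding to a single entry that grows without bound.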

    opened by mszhanyi 0
  • please reopen the issue

    Describe the issue

    Could you please reopen this issue? We get the same problem in opset_version=16. issue: https://github.com/microsoft/onnxruntime/issues/2756#issue-543199292.

    Urgency

    No response

    Target platform

    Windows

    Build script

    .

    Error / output

    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_123' Status Message: D:\a_work\1\s\onnxruntime\core\framework\op_kernel.cc:81 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,3,256,192} != {1,6,256,192}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.
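
To see which dimension is at fault, the two shapes in the message can be compared axis by axis. The sketch below assumes the message keeps the `{a,b,c,d} != {a,b,c,d}` format shown above; `mismatched_axes` is a hypothetical helper and not an ONNX Runtime API.

```python
import re

# Matches a pair of shapes such as "{1,3,256,192} != {1,6,256,192}".
SHAPE_PAIR = re.compile(r"\{([\d,]+)\}\s*!=\s*\{([\d,]+)\}")


def mismatched_axes(message):
    """Return (axis, left_dim, right_dim) for every axis where the shapes differ."""
    match = SHAPE_PAIR.search(message)
    if match is None:
        return []
    left = [int(d) for d in match.group(1).split(",")]
    right = [int(d) for d in match.group(2).split(",")]
    if len(left) != len(right):
        raise ValueError("shapes have different ranks")
    return [
        (axis, a, b)
        for axis, (a, b) in enumerate(zip(left, right))
        if a != b
    ]


message = (
    "Shape mismatch attempting to re-use buffer. "
    "{1,3,256,192} != {1,6,256,192}."
)
diffs = mismatched_axes(message)
print(diffs)
```

Only axis 1 differs (3 versus 6). Since the ONNX BatchNormalization operator takes its input as N x C x D1 x ..., this is the channel axis, so the shape recorded in the model for this node's output likely disagrees with the channel count produced at run time.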

    Visual Studio Version

    No response

    GCC / Compiler Version

    No response

    build platform:windows 
    opened by shu0o0yX 0
  • CUDNN error executing cudnnConvolutionForward

    Describe the issue

    Hi, I'm running the same ONNX model on many different machines in Azure (all of the same type, with the same configuration, Docker setup, etc.), and on some of them I get the following error on the first batch that is executed:

    <class 'onnxruntime.capi.onnxruntime_pybind11_state.Fail'>
    
    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'efficientnetb4/stem_conv/Conv2D' Status Message: CUDNN error executing cudnnConvolutionForward(s_.handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data)
    

    It happens only on some of the machines, and only on the first message.

    To reproduce

    onnxruntime-gpu==1.10.0

    import onnxruntime

    # fe_net_weights, output_layer and input come from the surrounding
    # application code, which is not shown in this report.
    ONNX_PROVIDERS = [
        ('CUDAExecutionProvider', {
            'device_id': 0,
            'cudnn_conv_algo_search': 'DEFAULT',
        }),
    ]
    ONNX_SESSION_OPTIONS = onnxruntime.SessionOptions()
    ONNX_SESSION_OPTIONS.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    efficientnet = onnxruntime.InferenceSession(str(fe_net_weights),
                                                sess_options=ONNX_SESSION_OPTIONS,
                                                providers=ONNX_PROVIDERS)

    # Run the first batch; this is the call that fails on some machines.
    efficientnet.run([output_layer], {"input": input})
    

    Urgency

    No response

    Platform

    Linux

    OS Version

    Ubuntu 20.04

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    onnxruntime-gpu==1.10.0

    ONNX Runtime API

    Python

    Architecture

    X64

    Execution Provider

    CUDA

    Execution Provider Library Version

    cuda 11.3.0, cudnn8

    ep:CUDA 
    opened by kfirgoldwsc 0
  • How to save inference onnx model?

    Describe the issue

    I can now build my own training session from a torch net, but when I save the ONNX model after training, BatchNormalization is still in training mode and cannot be fused into Conv. What should I do to save an inference model? current format: 1

    expect format: 0

    To reproduce

    2

    Urgency

    No response

    ONNX Runtime Installation

    Built from Source

    ONNX Runtime Version or Commit ID

    1.8.1

    PyTorch Version

    3.7

    Execution Provider

    CUDA

    Execution Provider Library Version

    No response

    training ep:CUDA 
    opened by ArtyZe 0
  • [MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0

    Description

    Update the MIGraphX version used in ORT to rocm-5.4.0

    Motivation and Context

    The previous branch, migraphx_for_ort, is no longer updated and has diverged too far from the latest MIGraphX release branch. More discussion here: https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

    opened by PeixuanZuo 0
  • Update HistogramCalibrater.collect_data method to reduce memory consumption

    Description

    Updated the HistogramCalibrater.collect_data method.

    Inference results are no longer appended to the self.intermediate_outputs list. Instead, the self.collector.collect method is called inside a while loop.

    Motivation and Context

    When CalibrationMethod.Entropy or CalibrationMethod.Percentile is specified, the HistogramCalibrater class is used.

    In the HistogramCalibrater.collect_data method, all intermediate outputs are gathered before the histograms are collected with the HistogramCollector class. This two-pass scheme consumes a lot of memory when a network has many intermediate output nodes and the CalibrationDataReader provides a large amount of data.

    Please note that quantized models are not identical after this change. I expect the difference to be harmless, though.
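
The difference between the two schemes can be illustrated with a toy model. This is a sketch under stated assumptions, not the code from this pull request: `ToyDataReader`, `ToyCollector` and `fake_inference` are hypothetical stand-ins for the calibration data reader, the histogram collector and the inference call, and the collector uses fixed-width bins so that both schemes can be compared directly.

```python
from collections import Counter


class ToyDataReader:
    """Hypothetical data reader: get_next() returns None when exhausted."""

    def __init__(self, batches):
        self._iter = iter(batches)

    def get_next(self):
        return next(self._iter, None)


class ToyCollector:
    """Hypothetical histogram collector with fixed-width bins."""

    def __init__(self, bin_width=1.0):
        self.bin_width = bin_width
        self.histogram = Counter()

    def collect(self, outputs):
        for value in outputs:
            self.histogram[int(value // self.bin_width)] += 1


def fake_inference(batch):
    """Stand-in for running the model on one batch."""
    return [2.0 * x for x in batch]


def collect_two_pass(reader, collector):
    """Buffer every output first, then build the histogram (old scheme)."""
    intermediate_outputs = []
    while True:
        batch = reader.get_next()
        if batch is None:
            break
        intermediate_outputs.append(fake_inference(batch))
    peak_buffered = len(intermediate_outputs)
    for outputs in intermediate_outputs:
        collector.collect(outputs)
    return peak_buffered


def collect_streaming(reader, collector):
    """Update the histogram inside the loop; hold one batch at a time."""
    peak_buffered = 0
    while True:
        batch = reader.get_next()
        if batch is None:
            break
        collector.collect(fake_inference(batch))
        peak_buffered = 1
    return peak_buffered


batches = [[0.1, 0.6], [1.2, 1.4], [2.9, 0.3]]

two_pass_collector = ToyCollector()
two_pass_peak = collect_two_pass(ToyDataReader(batches), two_pass_collector)

streaming_collector = ToyCollector()
streaming_peak = collect_streaming(ToyDataReader(batches), streaming_collector)

print(two_pass_peak, streaming_peak)
print(two_pass_collector.histogram == streaming_collector.histogram)
```

In this toy the number of buffered batches grows with the dataset in the two-pass version and stays at one in the streaming version. The histograms match here only because the bins are fixed in advance; a collector that adapts its range as data arrives may produce slightly different histograms, which could be why the quantized models are not identical after the change.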

    opened by beru 0
Releases(v1.13.1)
Owner
Microsoft
Open source projects and samples from Microsoft