DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

When used standalone, the DirectML API is a low-level DirectX 12 library suited to high-performance, low-latency applications such as frameworks, games, and other real-time workloads. DirectML's seamless interoperability with Direct3D 12, its low overhead, and its conformance across hardware make it ideal for accelerating machine learning when high performance is desired and the reliability and predictability of results across hardware is critical.

More information about DirectML can be found in Introduction to DirectML.

Visit the DirectX Landing Page for more resources for DirectX developers.

Getting Started with DirectML

DirectML is distributed as a system component of Windows 10, available as part of the operating system in Windows 10, version 1903 (10.0; Build 18362) and newer.

Starting with DirectML version 1.4.0, DirectML is also available as a standalone redistributable package (see Microsoft.AI.DirectML), which is useful for applications that wish to use a fixed version of DirectML, or when running on older versions of Windows 10.
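
As a minimal, illustrative check (not part of the DirectML API; it only uses Python's standard ctypes module on Windows), the sketch below confirms which DirectML.dll a process resolves. Under the regular Windows DLL search order, a redistributable copy in the application directory takes precedence over the in-box System32 copy:

    import ctypes
    from ctypes import wintypes

    # Load DirectML.dll through the normal Windows search order; this raises
    # OSError if no copy can be found (e.g., pre-1903 Windows 10 without the
    # redistributable alongside the executable).
    dml = ctypes.WinDLL("DirectML.dll")

    # GetModuleFileNameW reports the full path of the module actually loaded.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetModuleFileNameW.argtypes = [wintypes.HMODULE, wintypes.LPWSTR, wintypes.DWORD]
    kernel32.GetModuleFileNameW.restype = wintypes.DWORD

    buf = ctypes.create_unicode_buffer(260)
    kernel32.GetModuleFileNameW(dml._handle, buf, len(buf))
    print(buf.value)  # e.g. C:\Windows\System32\DirectML.dll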

Hardware requirements

DirectML requires a DirectX 12 capable device. Almost all commercially available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:

  • AMD GCN 1st Gen (Radeon HD 7000 series) and above
  • Intel Haswell (4th-gen Core) HD Integrated Graphics and above
  • NVIDIA Kepler (GTX 600 series) and above
  • Qualcomm Adreno 600 and above

For application developers

DirectML exposes a native C++ DirectX 12 API. The header and library (DirectML.h/DirectML.lib) are available as part of the redistributable NuGet package, and are also included in the Windows 10 SDK version 10.0.18362 or newer.

For users, data scientists, and researchers

DirectML is built in as a backend to several frameworks, such as Windows ML, ONNX Runtime, and TensorFlow.

See the following sections for more information:

DirectML Samples

DirectML C++ sample code is available under Samples.

  • HelloDirectML: A minimal "hello world" application that executes a single DirectML operator.
  • DirectMLSuperResolution: A sample that uses DirectML to execute a basic super-resolution model to upscale video from 540p to 1080p in real time.
  • yolov4: YOLOv4 is an object detection model capable of recognizing up to 80 different classes of objects in an image. This sample contains a complete end-to-end implementation of the model using DirectML, and is able to run in real time on a user-provided video stream.

DirectML Python sample code is available under Python/samples. The samples require PyDirectML, an open source Python projection library for DirectML, which can be built and installed to a Python executing environment from Python/src. Refer to the Python/README.md file for more details.

Windows ML on DirectML

Windows ML (WinML) is a high-performance, reliable API for deploying hardware-accelerated ML inference on Windows devices. DirectML provides the GPU backend for Windows ML.

DirectML acceleration can be enabled in Windows ML using the LearningModelDevice with any one of the DirectX DeviceKinds.

For more information, see Get Started with Windows ML.

ONNX Runtime on DirectML

ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more.

DirectML is available as an optional execution provider for ONNX Runtime that provides hardware acceleration when running on Windows 10.

For more information about getting started, see Using the DirectML execution provider.
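
For example, with the DirectML-enabled ONNX Runtime package installed, the provider is requested by name when creating an inference session. This is a minimal sketch; the model path and input name below are placeholders:

    import numpy as np
    import onnxruntime as ort

    # Ask for DirectML first; any operator it cannot handle falls back to CPU.
    session = ort.InferenceSession(
        "model.onnx",  # placeholder: any ONNX model file
        providers=["DmlExecutionProvider", "CPUExecutionProvider"],
    )

    # Run with dummy input data shaped for the model (placeholder input name).
    outputs = session.run(None, {"input": np.zeros((1, 3, 224, 224), np.float32)})
    print(session.get_providers())  # shows which providers are active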

TensorFlow with DirectML

TensorFlow is a popular open source platform for machine learning and a leading framework for training machine learning models.

DirectML acceleration for TensorFlow 1.15 is currently available for Public Preview. TensorFlow on DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

TensorFlow on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
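
As a quick smoke test, the preview package is driven through the standard TF 1.x API (this mirrors the snippet in the issue reports below); with device-placement logging enabled, ops should be placed on a DML device rather than a CUDA GPU:

    import tensorflow.compat.v1 as tf

    # Eager mode with device-placement logging so the DML device is visible.
    tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))

    # A trivial op; the placement log should name the DirectML device.
    print(tf.add([1.0, 2.0], [3.0, 4.0]))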

PyTorch with DirectML

DirectML acceleration for PyTorch 1.8.0 is currently available for Public Preview. PyTorch with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

PyTorch on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
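
As a minimal sketch for this preview, where DirectML is exposed as the 'dml' device string (the same usage that appears in the issue reports below):

    import torch

    # Allocate two tensors on the DirectML device and add them there.
    a = torch.randn(2000, 2000).to("dml")
    b = torch.randn(2000, 2000).to("dml")
    c = a + b

    # Printing a 'dml' tensor directly can hit operators the preview has not
    # implemented yet (see the issues below), so copy back to the CPU first.
    print(c.to("cpu").sum())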

Feedback

We look forward to hearing from you!

External Links

Documentation

DirectML programming guide
DirectML API reference

More information

Introducing DirectML (Game Developers Conference '19)
Accelerating GPU Inferencing with DirectML and DirectX 12 (SIGGRAPH '18)
Windows AI: hardware-accelerated ML on Windows devices (Microsoft Build '20)
Gaming with Windows ML (DirectX Developer Blog)
DirectML at GDC 2019 (DirectX Developer Blog)
DirectX Linux (DirectX Developer Blog)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • DirectML is x2.8 slower than CUDA

    I tested training the same deepfake model on the same hardware using tensorflow-cuda and tensorflow-directml. (my project https://github.com/iperov/DeepFaceLab)

    DirectML: avg iter time 626 ms [screenshot: DMLvsCUDA1]

    CUDA: avg iter time 222 ms [screenshot: DMLvsCUDA2]

    DirectML is x2.8 slower :-(

    I think that's what I was talking about here https://github.com/microsoft/DirectML/issues/104

    So what is the point of using DirectML if every millisecond of training acceleration is important in today's world?

    x2.8 slower is serious performance degradation. I reached the same speed in my weekend OpenCL NN library in pure python (https://github.com/iperov/litenn)

    But you guys are from Microsoft. Don't you think there is no point in further development of DirectML until it reaches the level of CUDA performance?

    opened by iperov 36
  • Could not load dynamic library 'libcuda.so.1'

    Followed the instructions here

    ~ » cat /proc/version
    Linux version 4.4.0-20150-Microsoft ([email protected]) (gcc version 5.4.0 (GCC) ) #1000-Microsoft Thu Jun 12 17:34:00 PST 2020
    

    I'm running build 20150, but am getting this error:

    Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow.compat.v1 as tf
    >>>
    >>> tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
    >>>
    >>> print(tf.add([1.0, 2.0], [3.0, 4.0]))
    2020-06-17 16:36:05.469811: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2020-06-17 16:36:05.469926: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
    2020-06-17 16:36:05.470029: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (MAKERPC): /proc/driver/nvidia/version does not exist
    2020-06-17 16:36:05.470532: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2020-06-17 16:36:05.483133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3400000000 Hz
    2020-06-17 16:36:05.487879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fffe52ac420 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-17 16:36:05.488038: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    tf.Tensor([4. 6.], shape=(2,), dtype=float32)
    
    opened by jflam 23
  • [installation] Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)

    Hi,

    After following the steps described in https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-wsl up to pip install tensorflow-directml,

    this error appeared:

    ERROR: Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)
    ERROR: No matching distribution found for tensorflow-directml

    BTW, I am using Python 3.8

    and I did pip list, which output:

        Package    Version
        ---------- -------------------
        certifi    2020.6.20
        pip        20.1.1
        setuptools 49.2.0.post20200714
        wheel      0.34.2

    opened by shuwang1 19
  • How to get available devices and set a specific device in Pytorch-DML?

    Hi. For accessing available devices in PyTorch, we'd normally do:

        print(f'available devices: {torch.cuda.device_count()}')
        print(f'current device: { torch.cuda.current_device()}')
    

    However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
    I thought the transition would be minimal, and stuff like this would work out of the box! Especially so after noting we can't write:

        print(f'available devices: {torch.dml.device_count()}')
        print(f'current device: { torch.dml.current_device()}')
    

    as it fails with the error:

    AttributeError: module 'torch.dml' has no attribute 'device_count'
    

    Apart from this, trying to specify a device using the form "dml:number" fails for any index above 0! That is, this fails for "dml:1":

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn(size=(2000,2000)).to(device=device)
       
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

    it outputs :

    running on cpu:
    --took 0.00 seconds
    running on dml:
    --took 0.01 seconds
    running on dml:0:
    --took 0.00 seconds
    running on dml:1:
    

    and that's it; it doesn't execute when it comes to "dml:1".

    Also, trying to do:

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn_like(a).to(device=device)
        
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

    Fails with the following error:

    running on cpu:
    --took 0.00 seconds
    running on dml:
    Traceback (most recent call last):
      File "g:\tests.py", line 1246, in <module>
        bench('dml')
      File "g:\tests.py", line 1235, in bench
        b = torch.randn_like(a).to(device=device)
    RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom 
    build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]
    
    
    pytorch-directml 
    opened by Coderx7 11
  • Conv2D-Fail: internal compiler error, abnormal program termination

    I ran across DirectML a few hours ago and am currently playing around with it on a Surface Pro 6 with an Intel HD Graphics 620. To set it all up, I followed this article to the letter: https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows

    For testing purposes, I used a slightly modified version of my small go-to script:

    import tensorflow.compat.v1 as tf 
    
    tf.enable_eager_execution(tf.ConfigProto(log_device_placement=False)) 
    
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    
    
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    
    train_images = train_images.reshape(60000, 28, 28, 1)
    train_images = train_images / 255.0
    
    test_images = test_images.reshape(10000, 28, 28, 1)
    test_images = test_images / 255.0
    
    #model = tf.keras.Sequential([
    #    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    #    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    #    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    #])
    
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, (3,3), activation=tf.nn.relu, input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    model.fit(train_images, train_labels, epochs=5)
    
    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    
    print('Test accuracy:', test_acc)
    

    The version of the model without convolutions runs absolutely fine. But as soon as I add the Conv2D layer, nothing works anymore.

    The entire output I get is:

    2021-04-23 21:23:05.241248: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library C:\Users\cyphus309\.conda\envs\directml\lib\site-packages\tensorflow_core\python/directml.b6e3bc69b89cfca5486e178bb9d51724d0c4a94a.dll
    2021-04-23 21:23:05.298554: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:249] DirectML device enumeration: found 1 compatible adapters.
    2021-04-23 21:23:05.299189: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2021-04-23 21:23:05.331743: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Intel(R) HD Graphics 620)
    2021-04-23 21:23:05.363568: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library Kernel32.dll
    Train on 60000 samples
    Epoch 1/5
    
    internal compiler error, abnormal program termination
    
    

    Any ideas?

    bug 
    opened by kampfhamster309 11
  • Tensorflow directml crashes my python session

    Hi,

    I've recently purchased a 6900 XT GPU which I would like to use with TensorFlow. I followed the installation guide on https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows, which worked, but the issue I have now is that whenever I try to use TensorFlow it closes my Python environment.

    I've attached an image to show what I mean. I can import TensorFlow fine, and it shows me that I have version 1.15.5 available. The problem is that when I want to check whether my GPU is available, I get two messages and then it crashes me out of my Python environment.

    Does anybody know how to solve this issue and what is going on?

    Thank you in advance!

    [screenshot: amd_tf_problem]

    bug 
    opened by bwintertkb 9
  • C++ DirectML.dll causes crash in debug x64 mode when using NuGet package Microsoft.AI.MachineLearning 1.5.2

    Hello,

    I'm experiencing a runtime crash with the C++ DirectML API in Debug x64 mode after upgrading my NuGet package Microsoft.AI.MachineLearning from version 1.4.0 to 1.5.2. There is no error in Release x64 mode.

    The reason I'm using this package is that the included DirectML.dll improves DirectML performance greatly. There seems to be an issue when creating a DirectML operator; the operator type is DML_OPERATOR_JOIN.

    Can you please help me identify the issue? Also how can I find the latest DirectML.dll file without downloading the package?

    [screenshot: DirectML dll error]

    opened by momower1 9
  • Performance will be improved by setting input strides=output strides for Clip in DirectMLX

    I am investigating the performance of MobileNet V2 from TFLite models with "nhwc" layout and MobileNet V2 from ONNX models with "nchw" layout, implemented with the DirectML and DirectMLX APIs.

    I find that the nhwc MobileNetV2 model has lots of Clip operators after Conv2d, and these Clips cost much time during inference. I guess that the Clip does a memory copy and hasn't been optimized in the compilation stage.

    I have a workaround for this problem: set Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h. The Clip is then optimized as if fused into Conv2d, and the inference time is significantly reduced to match the nchw MobileNetV2.

    When building the nhwc MobileNetV2 model, we need to append an Identity after each Conv2d to transpose the output tensor from the default nchw to nhwc, then transpose that output tensor from nhwc back to nchw as the next Conv2d's input. In my opinion, the Identity and Reinterpret could be optimized by DML in this model, like Conv0->Identity(nchw->nhwc)->Reinterpret strides(nhwc->nchw)->Conv1, similar to transpose sinking in the OpenVINO backend.

    I guess the Identity and Reinterpret sinking may be blocked when there is a Clip in between, like Conv0->Identity(nchw->nhwc)->Clip->Reinterpret strides(nhwc->nchw)->Conv1. I verified that if I remove the Identity and run Conv0->Reinterpret strides(nchw->nhwc)->Clip(input strides = output strides)->Reinterpret strides(nhwc->nchw)->Conv1, the inference time is much lower than before.

    So in conclusion, I suggest setting Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h.

    opened by mingmingtasd 8
  • TensorFlow & DirectML & ROCm performance and roadmap

    The current DirectML library for GPU is more than 2x slower than the TensorFlow CPU library. When will the DirectML team improve the performance of the library? Could you share a roadmap for DirectML? Will the DirectML team cooperate with the ROCm team (https://github.com/RadeonOpenCompute/ROCm), Intel, and Nvidia to improve performance?

    opened by YuriyTigiev 8
  • pytorch-directml simple command error

    Just trying a simple command with pytorch-directml 1.8.0a0.dev220224 and getting an error:

    >>> torch.tensor([1], dtype=torch.float32, device='dml')
    
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\tensor.py", line 193, in __repr__
        return torch._tensor_str._str(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 383, in _str
        return _str_intern(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
        tensor_str = _tensor_str(self, indent)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
        formatter = _Formatter(get_summarized_data(self) if summarize else self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
        nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
    RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\pytorch-directml\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
    

    CPU is fine:

    >>> torch.tensor([1], dtype=torch.float32, device='cpu')
    tensor([1.])
    
    pytorch-directml 
    opened by iperov 7
  • Is there any low power mode for DirectML

    Hi. I now have a model that is quick enough (120 fps) and will run at 20 fps; what I need is to use as little GPU power as possible. But I find the GPU frequency jumps to 1150 MHz too many times. For comparison, with Tencent Meeting (https://voovmeeting.com/download-center.html?from=1001), I found that when I enable human segmentation on an 8xxx laptop, the GPU frequency holds below 400 MHz while GPU load is over 75%, which is strange for the frequency policy.
    So I guess maybe DirectX 12 or DX11 has some low-power mode? Or some other way, e.g., adding some wait in each op (e.g., the convolution op)?

    opened by liyuming1978 7
  • pytorch-directml produces "[W dml_heap_allocator.cc:97] DML allocator out of memory!"

    I was trying to run the simple code below:

        import torch
        import torch_directml

        dml = torch_directml.device()
        print(f"dml={dml}")

        tensor1 = torch.tensor([1])
        print(tensor1)
        tensor1 = tensor1.to(dml)

    When running tensor1.to(dml), I got the following error:

        [W dml_heap_allocator.cc:97] DML allocator out of memory!
        Traceback (most recent call last):
          File "/home/fnz/workspace/direct-ml/main.py", line 9, in <module>
            tensor1 = tensor1.to(dml)
        RuntimeError: Unknown error -2147024882

    It seems that my pytorch-directml doesn't work at all.

    Below are my packages in conda:

        (direct_ml) [email protected]:~/workspace/direct-ml$ conda list | grep torch
        torch            1.13.1            pypi_0  pypi
        torch-directml   0.1.13.dev221216  pypi_0  pypi

    BTW, my environment is WSL2 on top of Windows 11 Pro.

    The tensorflow-directml package seems to work well.

    any idea ?

    thanks

    Feng

    opened by virtual-feng 1
  • torch-directml : torch.div with trunc rounding on int64 fails with RuntimeError

    Hi. Because 'aten::fmod.Tensor_out' is not implemented, I tried to implement it myself. I encountered a new error when using the rounding mode trunc with an int64 tensor.

    Code:

    import torch
    import torch_directml
    dml = torch_directml.device()
    
    a = torch.tensor([1,2,3]).to(dml)
    b = 2
    a = a - torch.div(a, b, rounding_mode="trunc") * b
    
    opened by Theucalyptus 0
  • Very low validation and testing accuracy on CNN

    Hello everyone. I am facing an issue; let me explain what I am trying to do. I have a traffic and road sign dataset that contains 43 classes, and I am trying to classify the images using the resnet34 pre-trained model. I have an AMD RX6600 GPU that I use for running the model, and for running on my AMD GPU I am using PyTorch DirectML. Up to this point everything has worked fine: training speed is fast enough, GPU utilization is near 100%, and training loss decreases per epoch.

    But when I check the model using validation data after one training phase, validation loss increases and validation accuracy is far too low, even though training looks fine. When I run the same code on my friend's PC, which has an NVIDIA GPU, all is OK: validation loss decreases, it converges, and I get an accuracy of 98%. I cannot figure out what the problem is; I also tuned the hyperparameters but had no luck. One strange thing is that this problem only arises when I use a CNN-based model. I ran the NLP pre-trained model BERT on my AMD GPU and there was no issue: validation loss decreases and it converges. Can anyone help me with this issue? I am giving the code below. Thanks in advance.

    [screenshot: Screenshot 2023-01-03 221733]

    opened by AtiqurRahmanAni 0
  • Spacy seems outdated + problems running attention...

    Disclaimer: NOT a coder. Generally curious individual with just enough copy-paste and google skills. I may not know what I'm talking about.

    Just playing around with the repo. The install failed for me because of the spacy version in requirements.txt. I'm using Python 3.10 on Ubuntu 22.10. Changing spacy to 3.4.4 worked (I had it cached, so I just did pip install spacy to see whichever version worked).

    It installed, but gave further warnings like:

        ⚠ As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the full pipeline package name 'en_core_web_sm' instead.
        Collecting en-core-web-sm==3.4.1...

    and

        ⚠ As of spaCy v3.0, shortcuts like 'de' are deprecated. Please use the full pipeline package name 'de_core_news_sm' instead.
        Collecting de-core-news-sm==3.4.0

    opened by Vidyut 0
  • Operator 'aten::amax.out' is not currently supported on the DML backend.

        C:\ProgramData\Anaconda3\envs\torchdml\lib\site-packages\torch\optim\adamax.py:231: UserWarning: The operator 'aten::amax.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:16.)
          torch.amax(norm_buf, 0, keepdim=False, out=exp_inf)

    opened by rmskmr05 0