cuSignal - RAPIDS Signal Processing Library

The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal that leverages GPU compute resources via CuPy, but it also contains Numba CUDA and raw CuPy CUDA kernels for additional speedups in selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, but it also emphasizes online processing with zero-copy (pinned, mapped) memory shared between CPU and GPU.

NOTE: For the latest stable README.md ensure you are on the latest branch.

Table of Contents

Quick Start

cuSignal has an API that mimics SciPy Signal. In-depth functionality is displayed in the notebooks section of the repo, but let's examine the workflow for Polyphase Resampling under multiple scenarios:

Scipy Signal (CPU)

import numpy as np
from scipy import signal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

%%timeit
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on 2x Xeon E5-2600 in 2.36 sec.
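resample_poly is built on an upfirdn operation (upsample, FIR filter, downsample). The sketch below is a naive pure-Python illustration of that operation's semantics; the `naive_upfirdn` helper is hypothetical and is not the SciPy or cuSignal implementation, which differ in edge handling and are heavily optimized:

```python
# Naive upfirdn sketch: illustrative only, not the library implementation.
def naive_upfirdn(h, x, up, down):
    # 1. Zero-stuff: insert (up - 1) zeros after each input sample.
    stuffed = []
    for v in x:
        stuffed.append(v)
        stuffed.extend([0.0] * (up - 1))
    # 2. FIR filter: full convolution with the filter taps h.
    out = [0.0] * (len(stuffed) + len(h) - 1)
    for i, v in enumerate(stuffed):
        for j, tap in enumerate(h):
            out[i + j] += v * tap
    # 3. Downsample: keep every down-th sample.
    return out[::down]

# Upsample [1, 2, 3] by 2 with a linear-interpolation filter.
print(naive_upfirdn([0.5, 1.0, 0.5], [1.0, 2.0, 3.0], up=2, down=1))
# → [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5, 0.0]
```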

cuSignal with Data Generated on the GPU with CuPy

import cupy as cp
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

gx = cp.linspace(start, stop, num_samps, endpoint=False) 
gy = cp.cos(-gx**2/6.0)

%%timeit
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 13.8 ms, a ~170x speedup over SciPy Signal.

cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.float64)

%%time
# Move data to GPU/CPU shared buffer and run polyphase resampler
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 174 ms.

cuSignal with Data Generated on the CPU and Copied to GPU [AVOID THIS FOR ONLINE SIGNAL PROCESSING]

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

%%time
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 637 ms.

Documentation

The complete cuSignal API documentation, including a full list of functionality and examples, can be found for both the Stable and Nightly (Experimental) releases.

cuSignal 0.17 API | cuSignal 0.18 Nightly

Installation

cuSignal has been tested on and supports all modern GPUs - from Maxwell to Ampere. While Anaconda is the preferred installation mechanism for cuSignal, developers and Jetson users should follow the source build instructions below. As of cuSignal 0.16, there isn't a cuSignal conda package for aarch64.

Conda, Linux OS

cuSignal can be installed with conda (Miniconda, or the full Anaconda distribution) from the rapidsai channel. If you're using a Jetson GPU, please follow the build instructions below.

For cusignal version == 0.17:

# For CUDA 10.1.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=10.2

# or, for CUDA 11.0
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=11.0

For the nightly version of cuSignal, currently 0.18a:

# For CUDA 10.1.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=10.1.2

# or, for CUDA 10.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=10.2

# or, for CUDA 11.0
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=11.0

cuSignal has been tested and confirmed to work with Python 3.6, 3.7, and 3.8.

See the Get RAPIDS version picker for more OS and version info.

Source, aarch64 (Jetson Nano, TK1, TX2, Xavier), Linux OS

In cuSignal 0.15 and beyond, we are moving our supported aarch64 Anaconda environment from conda4aarch64 to miniforge. Further, it's assumed that your Jetson device is running a current (>= 4.3) edition of JetPack and contains the CUDA Toolkit.

  1. Clone the repository

    # Set the location to cuSignal in an environment variable CUSIGNAL_HOME
    export CUSIGNAL_HOME=$(pwd)/cusignal
    
    # Download the cuSignal repo
    git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
  2. Install miniforge and create the cuSignal conda environment:

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_jetson_base.yml

    Note: Compilation and installation of CuPy can be quite lengthy (~30+ mins), particularly on the Jetson Nano. Please consider setting this environment variable to decrease the CuPy dependency install time:

    export CUPY_NVCC_GENERATE_CODE="arch=compute_XX,code=sm_XX", with XX being your GPU's compute capability. If you'd like to compile for multiple architectures (e.g. Nano and Xavier), concatenate the arch=... strings with semicolons.
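A multi-architecture setting might look like the following (assumed compute capabilities: the Nano's Maxwell GPU is sm_53, the Xavier is sm_72; verify against your device):

```shell
# Jetson Nano (sm_53) and Xavier (sm_72); arch strings joined with semicolons
export CUPY_NVCC_GENERATE_CODE="arch=compute_53,code=sm_53;arch=compute_72,code=sm_72"
```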

  3. Activate conda environment

    conda activate cusignal-dev

  4. Install cuSignal module

    cd $CUSIGNAL_HOME
    ./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
                # run ./build.sh -h to print the supported command line options.
  5. Once installed, periodically update environment

    cd $CUSIGNAL_HOME
    conda env update -f conda/environments/cusignal_jetson_base.yml
  6. Also, confirm unit testing via PyTest

    cd $CUSIGNAL_HOME/python
    pytest -v  # for verbose mode
    pytest -v -k <function name>  # for more select testing

Source, Linux OS

  1. Clone the repository

    # Set the location to cuSignal in an environment variable CUSIGNAL_HOME
    export CUSIGNAL_HOME=$(pwd)/cusignal
    
    # Download the cuSignal repo
    git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
  2. Download and install Anaconda or Miniconda then create the cuSignal conda environment:

    Base environment (core dependencies for cuSignal)

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_base.yml

    Full environment (including RAPIDS's cuDF, cuML, cuGraph, and PyTorch)

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_full.yml
  3. Activate conda environment

    conda activate cusignal-dev

  4. Install cuSignal module

    cd $CUSIGNAL_HOME
    ./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
                # run ./build.sh -h to print the supported command line options.
  5. Once installed, periodically update environment

    cd $CUSIGNAL_HOME
    conda env update -f conda/environments/cusignal_base.yml
  6. Also, confirm unit testing via PyTest

    cd $CUSIGNAL_HOME/python
    pytest -v  # for verbose mode
    pytest -v -k <function name>  # for more select testing

Source, Windows OS

We have confirmed that cuSignal successfully builds and runs on Windows by using CUDA on WSL. Please follow the instructions in the link to install WSL 2 and the associated CUDA drivers. You can then proceed to follow the cuSignal source build instructions below.

  1. Download and install Anaconda for Windows. In an Anaconda Prompt, navigate to your checkout of cuSignal.

  2. Create cuSignal conda environment

    conda create --name cusignal-dev

  3. Activate conda environment

    conda activate cusignal-dev

  4. Install cuSignal Core Dependencies

    conda install numpy numba scipy cudatoolkit pip
    pip install cupy-cudaXXX
    

    Where XXX is the version of the CUDA toolkit you have installed; CUDA 10.1, for example, corresponds to cupy-cuda101. See the CuPy Documentation for information on getting Windows wheels for other versions of CUDA.

  5. Install cuSignal module

    ./build.sh
    
  6. [Optional] Run tests. In the cuSignal top-level directory:

    pip install pytest
    pytest
    

Docker - All RAPIDS Libraries, including cuSignal

For cusignal version == 0.16:

# For CUDA 11.0
docker pull rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04

For the nightly version of cusignal

docker pull rapidsai/rapidsai-nightly:cuda11.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai-nightly:cuda11.0-runtime-ubuntu18.04

Please see the RAPIDS Release Selector for more information on supported Python, Linux, and CUDA versions.

SDR Integration

SoapySDR is a "vendor neutral and platform independent" library for software-defined radio usage. When used in conjunction with device (SDR) specific modules, SoapySDR allows for easy command-and-control of radios from Python or C++. To install SoapySDR into an existing cuSignal Conda environment, run:

conda install -c conda-forge soapysdr

A full list of modules specific to your SDR is listed here, but here are some common ones:

  • rtlsdr: conda install -c conda-forge soapysdr-module-rtlsdr
  • Pluto SDR: conda install -c conda-forge soapysdr-module-plutosdr
  • UHD: conda install -c conda-forge soapysdr-module-uhd

Another popular SDR library, specific to the rtl-sdr, is pyrtlsdr.

For examples using SoapySDR, pyrtlsdr, and cuSignal, please see the notebooks/sdr directory.

Please note, for most rtlsdr devices, you'll need to blacklist the libdvb driver in Linux. To do this, run sudo vi /etc/modprobe.d/blacklist.conf and add blacklist dvb_usb_rtl28xxu to the end of the file. Restart your computer upon completion.
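The same edit can be made non-interactively; a one-line alternative to editing the file in vi (assumes sudo privileges):

```shell
# Append the blacklist entry without opening an editor
echo 'blacklist dvb_usb_rtl28xxu' | sudo tee -a /etc/modprobe.d/blacklist.conf
```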

If you have a SDR that isn't listed above (like the LimeSDR), don't worry! You can symbolically link the system-wide Python bindings installed via apt-get to the local conda environment. Please file an issue if you run into any problems.

Benchmarking

cuSignal uses pytest-benchmark to compare performance between CPU and GPU signal processing implementations. To run cuSignal's benchmark suite, navigate to the topmost python directory ($CUSIGNAL_HOME/python) and run:

pytest --benchmark-enable --benchmark-gpu-disable

Benchmarks are disabled by default in setup.cfg, providing only test-correctness checks.
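For reference, a setup.cfg stanza that disables benchmarks by default might look like the following (an illustrative pytest-benchmark configuration, not necessarily the repository's exact file):

```ini
[tool:pytest]
addopts = --benchmark-disable
```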

As with the standard pytest tool, you can use the -v and -k flags for verbose mode and to select a specific benchmark to run. When interpreting the output, we recommend comparing the mean execution times reported.

To reduce the columns in the benchmark results table, add --benchmark-columns=LABELS, e.g. --benchmark-columns=min,max,mean. For more information on pytest-benchmark, please visit the Usage Guide.

The --benchmark-gpu-disable parameter disables memory checks from the RAPIDS GPU benchmark tool; doing so speeds up benchmarking.

If you wish to skip benchmarks of SciPy functions, add -m "not cpu".

Lastly, benchmarks are executed against the installed package; to benchmark recent changes to the source, rebuild cuSignal.

Example

pytest -k upfirdn2d -m "not cpu" --benchmark-enable --benchmark-gpu-disable --benchmark-columns=mean

Output

cusignal/test/test_filtering.py ..................  [100%]


---------- benchmark 'UpFirDn2d': 18 tests -----------
Name (time in us, mem in bytes)         Mean          
------------------------------------------------------
test_upfirdn2d_gpu[-1-1-3-256]      195.2299 (1.0)    
test_upfirdn2d_gpu[-1-9-3-256]      196.1766 (1.00)   
test_upfirdn2d_gpu[-1-1-7-256]      196.2881 (1.01)   
test_upfirdn2d_gpu[0-2-3-256]       196.9984 (1.01)   
test_upfirdn2d_gpu[0-9-3-256]       197.5675 (1.01)   
test_upfirdn2d_gpu[0-1-7-256]       197.9015 (1.01)   
test_upfirdn2d_gpu[-1-9-7-256]      198.0923 (1.01)   
test_upfirdn2d_gpu[-1-2-7-256]      198.3325 (1.02)   
test_upfirdn2d_gpu[0-2-7-256]       198.4676 (1.02)   
test_upfirdn2d_gpu[0-9-7-256]       198.6437 (1.02)   
test_upfirdn2d_gpu[0-1-3-256]       198.7477 (1.02)   
test_upfirdn2d_gpu[-1-2-3-256]      200.1589 (1.03)   
test_upfirdn2d_gpu[-1-2-2-256]      213.0316 (1.09)   
test_upfirdn2d_gpu[0-1-2-256]       213.0944 (1.09)   
test_upfirdn2d_gpu[-1-9-2-256]      214.6168 (1.10)   
test_upfirdn2d_gpu[0-2-2-256]       214.6975 (1.10)   
test_upfirdn2d_gpu[-1-1-2-256]      216.4033 (1.11)   
test_upfirdn2d_gpu[0-9-2-256]       217.1675 (1.11)   
------------------------------------------------------

Contributing Guide

Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project.

cuSignal Blogs and Talks

Comments
  • [BUG] cuSignal throws `RawModule` Error with CuPy 6.0 on Windows OS


    Hello, I have an Nvidia Tesla K40 in my HP Z440 PC, as shown below:

        import numba.cuda
        numba.cuda.detect()
        # Found 2 CUDA devices
        # id 0  b'Tesla K40c'   [SUPPORTED]  compute capability: 3.5  pci device id: 0  pci bus id: 3
        # id 1  b'Quadro K620'  [SUPPORTED]  compute capability: 5.0  pci device id: 0  pci bus id: 2
        # Summary: 2/2 devices are supported
        # Out[8]: True

    I have the Nvidia CUDA toolkit 11, as shown below:

        (base) PS C:\Windows\system32> nvcc --version
        nvcc: NVIDIA (R) Cuda compiler driver
        Copyright (c) 2005-2020 NVIDIA Corporation
        Built on Mon_Oct_12_20:54:10_Pacific_Daylight_Time_2020
        Cuda compilation tools, release 11.1, V11.1.105
        Build cuda_11.1.relgpu_drvr455TC455_06.29190527_0

    Below is output of pip list command (Only cusignal relevant parts are posted)

    clyent 1.2.2
    colorama 0.4.3
    comtypes 1.1.7
    conda 4.9.2
    conda-build 3.18.11
    conda-package-handling 1.6.0
    conda-verify 3.4.2
    contextlib2 0.6.0.post1
    cryptography 2.8
    cupy 6.0.0
    cupy-cuda111 8.1.0

    Trying to run the example code on your site:

        """
        Created on Thu Nov 12 13:30:18 2020
        @author: ljoseph
        """
        import cupy as cp
        import cusignal

        start = 0
        stop = 10
        num_samps = int(1e8)
        resample_up = 2
        resample_down = 3

        gx = cp.linspace(start, stop, num_samps, endpoint=False)
        gy = cp.cos(-gx**2/6.0)

        %%timeit
        gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

    Getting the below error message:

        runfile('C:/dpd/Python/Try/cusignal_1.py', wdir='C:/dpd/Python/Try')
        Traceback (most recent call last):

    File "C:\dpd\Python\Try\cusignal_1.py", line 21, in gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

    File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering\resample.py", line 422, in resample_poly y = upfirdn(h, x, up, down, axis)

    File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering\resample.py", line 521, in upfirdn return ufd.apply_filter(x, axis)

    File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering_upfirdn_cuda.py", line 214, in apply_filter _populate_kernel_cache(out.dtype, k_type)

    File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering_upfirdn_cuda.py", line 139, in populate_kernel_cache "cupy" + k_type + "" + str(np_type),

    File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\utils\helper_tools.py", line 54, in _get_function module = cp.RawModule(

    AttributeError: module 'cupy' has no attribute 'RawModule'

    Could you please help me on this.

    inactive-30d 
    opened by leyojoseph 33
  • [WIP] [ENH] Enhance FIR filter methods.


    This pull-request implements the following:

    • Initial conditions for the firfilter method.
    • Zero-phase shift method called firfilter2 (based on scipy.signal.filtfilt)
    • FIR filter design tool firwin2 (based on scipy.signal.firwin2)
    • Initial conditions constructor firfilter_zi.
    • Bindings for lfilter and lfilter_zi with NotImplementedError for IIR coefficients.
    • Ignore *.fatbin files.

    Dependencies:

    This pull-request depends on the cupy.apply_along_axis method that is scheduled to be released in version 9.0.0. Users can test this PR by installing cupy==9.0.0a1 from PyPI.

    Todo List:

    • [x] Implement tests for the new methods.
    • [ ] Wait for cupy==9.0.0 release.
    2 - In Progress conda improvement non-breaking Python 
    opened by luigifcruz 32
  • [DOC] Install information for Jetson Nano


    I think it might be possible to get this code running on the Jetson Nano, but I'm not able to at the moment.

    The conda version for the Nano that I'm trying is Archiconda, https://github.com/Archiconda .

    The Nano has Maxwell cores, which should be enough for this. But I don't think all the necessary dependencies have been put together in Archiconda, and I'm not exactly sure where to start. I get a series of PackagesNotFoundError messages saying that I can't install from current channels.

    Before I go too much further, happy to provide more details as requested, or to understand whether this is intended to work or known not to work for some more fundamental reason!

    good first issue 
    opened by vielmetti 25
  • [FEA] Implementation of lfilter.


    The lfilter function would be really useful for demodulation. For example, FM demodulation (de-emphasis and stereo).

    Scipy Signal: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html#scipy.signal.lfilter

    2 - In Progress feature request 
    opened by luigifcruz 24
  • [REVIEW] Added streams and double/complex64/complex128 functionality


    ~~I would like to determine why my Numba kernels are running so much slower than my CuPy kernels.~~

    ~~Using floats (Numba)~~ ~~# @cuda.jit(fastmath=True) # 38 registers - 157-190us~~ ~~# @cuda.jit(void(float32[:], float32[:], int32, int32, int32, int32, int32, float32[:],), fastmath=True) # 48 registers - 160-190us~~ ~~@cuda.jit(void(float32[:], float32[:], int64, int64, int64, int64, int64, float32[:],), fastmath=True) # 38 registers - 157-190us~~ ~~def _numba_upfirdn_1d_float(x, h_trans_flip, up, down, x_shape_a, h_per_phase, padded_len, out):~~

    ~~Using doubles (Numba)~~ ~~# @cuda.jit(fastmath=True) # 38 registers - 157-190us~~ ~~# @cuda.jit(void(float64[:], float64[:], int32, int32, int32, int32, int32, float64[:],), fastmath=True) # 48 registers - 160-190us~~ ~~@cuda.jit(void(float64[:], float64[:], int64, int64, int64, int64, int64, float64[:],), fastmath=True) # 39 registers - 157-190us~~ ~~def _numba_upfirdn_1d_double(x, h_trans_flip, up, down, x_shape_a, h_per_phase, padded_len, out):~~

    ~~Using floats (CuPy Raw Kernel)~~ ~~# 21 registers - ~10us~~

    ~~Using doubles (CuPy Raw Kernel)~~ ~~# 30 registers - ~42us~~

    I've moved Systems profiling to a second GPU (no X-server) and I'm getting comparable results.

    Time(%)  Total Time  Instances   Average  Minimum  Maximum  Name
    -------  ----------  ---------  --------  -------  -------  --------------------------------
       26.0     8041093        100   80410.9    78527    81535  _numba_upfirdn_1d_complex128$244
       23.6     7277675        100   72776.7    71648    74112  _cupy_upfirdn_1d_complex_double
       10.2     3142669        100   31426.7    31232    32128  _numba_upfirdn_1d_double$242
        9.4     2910471        100   29104.7    28960    30207  _numba_upfirdn_1d_complex64$243
        7.7     2379027        100   23790.3    23679    24704  _numba_upfirdn_1d_float$241
        6.7     2077875        100   20778.8    20448    20928  _cupy_upfirdn_1d_double
        4.5     1385173        100   13851.7    13728    13952  _cupy_upfirdn_1d_complex_float
        4.0     1244918        100   12449.2    12352    12512  _cupy_upfirdn_1d_float

    There is a significant difference between registers usage with CuPy but not Numba.

    These times are from Nsight Systems, using the attached python script. test.py.txt

    Also, is it possible to overload Numba kernels? Supposedly it is with https://github.com/numba/numba/issues/431, but I'm unable to make it work.

    I've added qdrep and sqlite files. It looks like CuPy compiles are getting cached in the same manner as Numba. cusignal.zip

    As I continue to optimize, I'm noticing that even though the CuPy kernels may be 2x faster than Numba, the launch time of the CuPy kernel takes 2x longer than Numba's, making Numba the favorable option... Screenshot below. It's quite possible I'm doing something sub-optimal. numba_vs_cupy

    opened by mnicely 23
  • [QST] Is lazy evaluation used ?


    I'm trying to do a simple benchmark of cuSignal vs. SciPy Signal on correlation. It seems that correlate2d completes immediately, and the buffer transfer takes 2.8 seconds. That makes no sense for a buffer size of 80 MBytes. Could it be that the correlation is evaluated lazily, only when the buffer transfer is requested?

    %time signal_t_gpu = cp.asarray(signal_t)
    %time pulse_t_2d_gpu = cp.asarray(pulse_t_2d)
    
    %time corr_cusig = cusignal.correlate2d(signal_t_gpu, pulse_t_2d_gpu, mode='valid')
    
    %time corr_cusig_np = corr_cusig.get()
    %time corr_cusig_np2 = cp.asnumpy(corr_cusig)
    %time corr_cusig_np3 = cp.asnumpy(corr_cusig)
    

    and getting :

    Wall time: 12 ms
    Wall time: 996 µs
    Wall time: 0 ns
    Wall time: 2.79 s
    Wall time: 30 ms
    Wall time: 30 ms
    
    question 
    opened by flytrex-vadim 22
  • [PR-REVIEW] GPU Accelerated SigMF Reader/Writer


    Info

    This PR is to solve #117.

    Initial commit is skeleton code that does the following

    1. Uses mmap to load a binary to RAM
    2. Copy to DRAM
    3. Parse binary to new location

    TODO:

    1. Add helper function to encapsulate read_bin and parse_bin based on format type.
    3 - Ready for Review 
    opened by mnicely 20
  • [BUG] channelize_poly() CUDA_ERROR_FILE_NOT_FOUND after Jetson conda install (0.16)


    Describe the bug channelize_poly() function is not finding cusignal/filtering/_channelizer.fatbin on Jetson Nano.

    Steps/Code to reproduce bug

    import cusignal
    cusignal.filtering.channelize_poly(cp.random.randn(128), cp.ones((16,))/16, 4)
    

    Expected behavior channelize_poly returns channelized output.

    Environment details (please complete the following information):

    • Environment location: [Jetson Nano SBC]
    • Method of cuSignal install: [conda] Following the Jetson install instructions to install a clone of branch-0.16:
    git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
    
    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_jetson_base.yml
    
    conda activate cusignal-dev
    
    cd $CUSIGNAL_HOME/python
    python setup.py install
    

    Running cusignal inside a python script (cuspec.py in stack trace).

    Additional context Ran test suite, and actually encountered similar errors (CUDA_ERROR_FILE_NOT_FOUND). I wasn't familiar with the test output so I wrote it off at the time. Do I need to build from source?

    Stack trace:

    Traceback (most recent call last):
      File "/home/evanmayer/github/rtlobs/cuspec.py", line 68, in <module>
        cusignal.filtering.channelize_poly(cp.random.randn(128), cp.ones((16,))/16, 4)
      File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/filtering.py", line 722, in channelize_poly
        _channelizer(x, h, y, n_chans, n_taps, n_pts)
      File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/_channelizer_cuda.py", line 84, in _channelizer
        _populate_kernel_cache(np_type, k_type)
      File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/_channelizer_cuda.py", line 54, in _populate_kernel_cache
        _cupy_kernel_cache[(str(np_type), k_type)] = _get_function(
      File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/utils/helper_tools.py", line 40, in _get_function
        module = cp.RawModule(path=dir + fatbin,)
      File "cupy/core/raw.pyx", line 260, in cupy.core.raw.RawModule.__init__
      File "cupy/cuda/function.pyx", line 191, in cupy.cuda.function.Module.load_file
      File "cupy/cuda/function.pyx", line 195, in cupy.cuda.function.Module.load_file
      File "cupy/cuda/driver.pyx", line 231, in cupy.cuda.driver.moduleLoad
      File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
    cupy.cuda.driver.CUDADriverError: CUDA_ERROR_FILE_NOT_FOUND: file not found
    
    bug ? - Needs Triage 
    opened by evanmayer 19
  • [BUG] [Jetson Nano Conda install hangs on installing pip dependencies]


    Describe the bug When creating the conda environment on a Jetson Nano Development kit, the installation proceeds until installing pip dependencies, where it hangs indefinitely.

    Steps/Code to reproduce bug Fresh Jetpack install on Jetson Nano board. Follow instructions for building from source on Jetson Nano exactly.

    Expected behavior Expected to install the environment.

    Environment details (please complete the following information):

    • Environment location: Jetson Nano board with Jetpack SDK
    • Method of cuSignal install: conda (specifically miniforge)

    I've never used conda before, so I don't know exactly what logs are needed, but this is the last output from the install before it hangs: Installing pip dependencies: ...working...

    2 - In Progress doc 
    opened by emeldar 18
  • [PR-REVIEW] Load fatbin at runtime


    This PR investigates loading kernels via PTX and cubins instead of compiling them at runtime.

    The idea is to skip the process from Source Code to PTX or cubin. This should eliminate the need to precompile desired kernels, making the UI a little more friendly.

    There are pros and cons to both PTX and cubin.

    PTX

    Pro: We can choose a single architecture (default 3.0) and any hardware will JIT based on compute capability.
    Con: This can leave performance on the table and can be slower than cubins.

    Cubin

    Pro: Optimal performance and a (slightly) faster load because it skips the JIT.
    Con: Requires a cubin for each supported architecture (i.e. sm30, sm35, sm50, sm52, sm60, sm62, sm70, sm72, sm75, sm80).

    Each method requires more space for files and the cu file (if we decide to load it).

    Currently, using PTX method.

    Anecdotal results: PTX and cubin are ~18x faster on the first pass.

    SOURCE CODE
    Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
    -------  --------------  ----------  --------------  --------------  --------------  --------------------
       78.3       259557232         100       2595572.3         2163073         2954536  gpu_lombscargle_4   
       18.9        62547903           1      62547903.0        62547903        62547903  gpu_lombscargle_0 <-- 
        1.0         3339638           1       3339638.0         3339638         3339638  gpu_lombscargle_1   
        0.9         2960195           1       2960195.0         2960195         2960195  gpu_lombscargle_2   
        0.9         2951219           1       2951219.0         2951219         2951219  gpu_lombscargle_3
    PTX
    Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
    -------  --------------  ----------  --------------  --------------  --------------  --------------------
       95.2       247349403         100       2473494.0         2121631         2904074  gpu_lombscargle_4   
        1.3         3447313           1       3447313.0         3447313         3447313  gpu_lombscargle_0 <--  
        1.2         3234977           1       3234977.0         3234977         3234977  gpu_lombscargle_1   
        1.1         2904661           1       2904661.0         2904661         2904661  gpu_lombscargle_2   
        1.1         2902191           1       2902191.0         2902191         2902191  gpu_lombscargle_3
    CUBIN
    Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
    -------  --------------  ----------  --------------  --------------  --------------  --------------------
       95.2       239095813         100       2390958.1         2065998         2840041  gpu_lombscargle_4   
        1.3         3325468           1       3325468.0         3325468         3325468  gpu_lombscargle_0 <--  
        1.3         3163933           1       3163933.0         3163933         3163933  gpu_lombscargle_1   
        1.1         2828210           1       2828210.0         2828210         2828210  gpu_lombscargle_2   
        1.1         2823180           1       2823180.0         2823180         2823180  gpu_lombscargle_3
    
    3 - Ready for Review 
    opened by mnicely 17
  • [REVIEW] Use Numba 0.49+ API where required


    https://github.com/numba/numba/pull/5197 refactors many of Numba's submodules. Mirror the required import changes in cusignal.

    This keeps cusignal in sync with numba master and the 0.49 release candidate but breaks compatibility with the current numba release (0.48).

    This should go in eventually but also needs to make sure we keep compatibility with whatever version of numba we're getting in the conda channels selected by our packaging (and certainly updates the minimum required version) - need help with this part.

    5 - Ready to Merge 
    opened by dicta 15
  • [DOC] Links need fix in README


    Report incorrect documentation

    Location of incorrect documentation Table of Contents in README.md

    Describe the problems or issues found in the documentation The provided hyperlinks (anchors) within README.md file are broken/not updated.

    Steps taken to verify documentation is incorrect List any steps you have taken: Checked by clicking the links, verified with latest links.

    Suggested fix for documentation In the Table of Contents "Installation" section in README.md:

    • Replace anchor tag for Conda: Linux OS installation from #conda-linux-os to #conda-linux-os-preferred.
    • Replace anchor tag for Source, aarch64 (Jetson Nano, TK1, TX2, Xavier, AGX Clara DevKit), Linux OS installation from #source-aarch64-jetson-nano-tk1-tx2-xavier-linux-os to #source-aarch64-jetson-nano-tk1-tx2-xavier-agx-clara-devkit-linux-os.
    doc ? - Needs Triage 
    opened by AGoyal0512 0
  • [gpuCI] Forward-merge branch-22.12 to branch-23.02 [skip gpuci]


    Forward-merge triggered by push to branch-22.12 that creates a PR to keep branch-23.02 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

    opened by GPUtester 1
  • Add differentiable polyphase resampler

    closes #491

    Adds a differentiable polyphase resampler in a new diff submodule. Includes some basic unit tests and a demonstration of how to incorporate the new layer in a PyTorch sequential model.

    improvement non-breaking Python inactive-30d 
    opened by mbolding3 8
  • [FEA] pytorch polyphase resampler

    Opening this issue to facilitate discussion. A PyTorch-compatible wrapper around cuSignal's polyphase resampler is here (WIP).

    @awthomp Where is a good place to put the source? Currently putting it in branch 22.08 in a new sub-directory python/cusignal/pytorch but curious if a place already exists.

    Also, the backward method works (it passes gradcheck and uses cusignal's correlate) but is not optimized. It can most likely be re-implemented as another resample_poly call (since it upsamples and convolves). That is on the to-do list.
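The property the backward pass leans on is that polyphase resampling (zero-stuff, FIR filter, decimate) is a linear map, so its adjoint is itself a correlation/resampling. A naive NumPy sketch of that structure (not cusignal's fused implementation; `resample_poly_ref` is an illustrative name) checking linearity:

```python
import numpy as np

def resample_poly_ref(x, up, down, h):
    """Reference polyphase resampler: upsample by zero insertion,
    apply FIR filter h, then keep every `down`-th sample."""
    xu = np.zeros(len(x) * up)
    xu[::up] = x                      # zero-stuffing upsampler
    y = np.convolve(xu, h)            # anti-imaging / anti-aliasing FIR
    return y[::down]                  # decimation

rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
h = np.hanning(9)
a, b = 2.0, -0.5

# Linearity: resample(a*x + b*y) == a*resample(x) + b*resample(y),
# which is exactly what lets the gradient be expressed as a correlation.
lhs = resample_poly_ref(a * x + b * y, 2, 3, h)
rhs = a * resample_poly_ref(x, 2, 3, h) + b * resample_poly_ref(y, 2, 3, h)
assert np.allclose(lhs, rhs)
```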

    2 - In Progress feature request inactive-30d 
    opened by mbolding3 12
  • [BUG] FutureWarning from CuPy on `resample`

    Describe the bug When resampling a CuPy array, we get a FutureWarning instructing us not to use a non-tuple sequence for multidimensional indexing:

    /opt/conda/envs/rapids/lib/python3.9/site-packages/cusignal-22.4.0a0+g8878bf7-py3.9.egg/cusignal/filtering/resample.py:269: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[cupy.array(seq)]`, which will result either in an error or a different result.
      Y[sl] = X[sl]
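The warning points at list-of-slices indexing. A minimal NumPy sketch of the deprecated pattern and the `arr[tuple(seq)]` fix the warning suggests (variable names assumed from the snippet above, not the actual resample.py code):

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)
Y = np.zeros_like(X)

# Per-axis slice list, built the way filtering code typically does it.
sl = [slice(None)] * X.ndim
sl[-1] = slice(0, 2)                 # restrict the last axis

# Deprecated: Y[sl] = X[sl] with a plain list. The fix is a tuple:
Y[tuple(sl)] = X[tuple(sl)]
assert np.array_equal(Y[:, :2], X[:, :2])
```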
    

    Steps/Code to reproduce bug With latest 22.06 conda nightlies:

    import cupy as cp
    import cusignal
    
    start = 0
    stop = 10
    num = int(1e8)
    resample_num = int(1e5)
    
    gx = cp.linspace(start, stop, num, endpoint=False)
    gy = cp.cos(-gx**2/6.0)
    
    gf = cusignal.resample(gy, resample_num)
    

    Expected behavior Would not expect this warning to appear when resampling.

    Environment details (please complete the following information):

    • Environment location: bare-metal
    • Method of cuSignal install: conda

    Additional context Came across this issue while testing the filtering examples notebook.

    doc ? - Needs Triage inactive-30d inactive-90d 
    opened by charlesbluca 2
Releases (v22.12.00)
  • v22.12.00(Dec 8, 2022)

    📖 Documentation

    • Readme update (#522) @awthomp
    • Revisit WSL install instructions: additional dependency needed for pytest (#514) @evanmayer

    🛠️ Improvements

    • Flake8 migrated to GitHub and that broke some pre-commit checks (#518) @jacobtomlinson
    • fix filtering.resample output for even values of num parameter (#517) @mattkinsey
    • Use rapidsai CODE_OF_CONDUCT.md (#516) @bdice
    • Update channel priority (#515) @bdice
    • Remove stale labeler (#512) @raydouglass
    • Add option for smaller dataset in IO notebook (#473) @charlesbluca
  • v23.02.00a(Dec 7, 2022)

  • v22.10.01(Oct 18, 2022)

  • v22.10.00(Oct 12, 2022)

  • v22.08.00(Aug 17, 2022)

    📖 Documentation

    • Switch to using common js & css (#499) @galipremsagar
    • Refresh README (#487) @awthomp

    🛠️ Improvements

    • Revert "Allow CuPy 11" (#497) @galipremsagar
    • Allow CuPy 11 (#494) @jakirkham
  • v22.06.00(Jun 7, 2022)

    🛠️ Improvements

    • Simplify conda recipes (#484) @Ethyling
    • Use conda to build python packages during GPU tests (#480) @Ethyling
    • Fix pinned buffer IO issues (#479) @charlesbluca
    • Use pre-commit to enforce Python style checks (#478) @charlesbluca
    • Extend get_pinned_mem to work with more dtype / shapes (#477) @charlesbluca
    • Add/fix installation sections for SDR example notebooks (#476) @charlesbluca
    • Use conda compilers (#461) @Ethyling
    • Build packages using mambabuild (#453) @Ethyling
  • v22.04.00(Apr 6, 2022)

    🐛 Bug Fixes

    • Fix docs builds (#455) @ajschmidt8

    📖 Documentation

    • Fixes a list of minor errors in the example codes #457 (#458) @sosae0

    🚀 New Features

    • adding complex parameter to chirp and additional tests (#450) @mnicely

    🛠️ Improvements

    • Temporarily disable new ops-bot functionality (#465) @ajschmidt8
    • Add .github/ops-bot.yaml config file (#463) @ajschmidt8
    • correlation lags function (#459) @sosae0
  • v22.02.00(Feb 2, 2022)

    🛠️ Improvements

    • Allow CuPy 10 (#448) @jakirkham
    • Speedup: Single-precision hilbert, resample, and lfilter_zi. (#447) @luigifcruz
    • Add Nemo Machine Translation to SDR Notebook (#445) @awthomp
    • Add citrinet and fm_demod cusignal function to notebook (#444) @awthomp
    • Add FM Demodulation to cuSignal (#443) @awthomp
    • Revamp Offline RTL-SDR Notebook - FM Demod and NeMo Speech to Text (#442) @awthomp
    • Bypass Covariance Matrix Calculation if Supplied in MVDR Beamformer (#437) @awthomp
  • v21.12.00(Dec 8, 2021)

    🐛 Bug Fixes

    • Data type conversion for cwt. (#429) @shevateng0
    • Fix indexing error in CWT (#425) @awthomp

    📖 Documentation

    • Use PyData Sphinx Theme for Generated Documentation (#436) @cmpadden
    • Doc fix for FIR Filters (#426) @awthomp

    🛠️ Improvements

    • Fix Changelog Merge Conflicts for branch-21.12 (#439) @ajschmidt8
    • remove use_numba from notebooks - deprecated (#433) @awthomp
    • Allow complex wavelet output for morlet2 (#428) @shevateng0
  • v21.10.00(Oct 7, 2021)

    🐛 Bug Fixes

    • Change type check to use isinstance instead of str compare (#415) @jrc-exp

    📖 Documentation

    • Fix typo in readme (#413) @awthomp
    • Add citation file (#412) @awthomp
    • README overhaul (#411) @awthomp

    🛠️ Improvements

    • Fix Forward-Merge Conflicts (#417) @ajschmidt8
    • Adding CFAR (#409) @mbolding3
    • support space in workspace (#349) @jolorunyomi
  • v21.08.00(Aug 4, 2021)

    🐛 Bug Fixes

    • Remove pytorch from cusignal CI/CD (#404) @awthomp
    • fix firwin bug where fs is ignored if nyq provided (#400) @awthomp
    • Fixed imaginary part being removed in delay mode of ambgfun (#397) @cliffburdick

    🛠️ Improvements

    • mvdr perf optimizations and addition of elementwise divide kernel (#403) @awthomp
    • Update sphinx config (#395) @ajschmidt8
    • Add Ambiguity Function (ambgfun) (#393) @awthomp
    • Fix 21.08 forward-merge conflicts (#392) @ajschmidt8
    • Adding MVDR (Capon) Beamformer (#383) @awthomp
    • Fix merge conflicts (#379) @ajschmidt8
  • v21.06.00(Jun 9, 2021)

    🛠️ Improvements

    • Perf Improvements for UPFIRDN (#378) @mnicely
    • Perf Improvements to SOS Filter (#377) @mnicely
    • Update environment variable used to determine cuda_version (#376) @ajschmidt8
    • Update CHANGELOG.md links for calver (#373) @ajschmidt8
    • Merge branch-0.19 into branch-21.06 (#372) @ajschmidt8
    • Update docs build script (#369) @ajschmidt8
  • v0.19.0(Apr 21, 2021)

    🐛 Bug Fixes

    • Fix bug in casting array to cupy (#340) @awthomp

    🚀 New Features

    • Add morlet2 (#336) @mnicely
    • Increment Max CuPy Version in CI (#328) @awthomp

    🛠️ Improvements

    • Add cusignal source dockerfile (#343) @awthomp
    • Update min scipy and cupy versions (#339) @awthomp
    • Add Taylor window (#338) @mnicely
    • Skip online signal processing tools testing (#337) @awthomp
    • Add 2D grid-stride loop to fix BUG 295 (#335) @mnicely
    • Update Changelog Link (#334) @ajschmidt8
    • Fix bug in bin_reader that ignored dtype (#333) @awthomp
    • Remove six dependency (#332) @awthomp
    • Prepare Changelog for Automation (#331) @ajschmidt8
    • Update 0.18 changelog entry (#330) @ajschmidt8
    • Fix merge conflicts in #315 (#316) @ajschmidt8
  • v0.18.0(Feb 24, 2021)

    Bug Fixes 🐛

    • Fix labeler.yml GitHub Action (#301) @ajschmidt8
    • Fix Branch 0.18 merge 0.17 (#298) @BradReesWork

    Documentation 📖

    • Add WSL instructions for cuSignal Windows Builds (#323) @awthomp
    • Fix Radar API Docs (#311) @awthomp
    • Update cuSignal Documentation to Include Radar Functions (#309) @awthomp
    • Specify CuPy install time on Jetson Platform (#306) @awthomp
    • Update README to optimize CuPy build time on Jetson (#305) @awthomp

    New Features 🚀

    • Include scaffolding for new radar/phased array module and add new pulse compression feature (#300) @awthomp

    Improvements 🛠️

    • Update stale GHA with exemptions & new labels (#321) @mike-wendt
    • Add GHA to mark issues/prs as stale/rotten (#319) @Ethyling
    • Prepare Changelog for Automation (#314) @ajschmidt8
    • Auto-label PRs based on their content (#313) @jolorunyomi
    • Fix typo in convolution jupyter notebook example (#310) @awthomp
    • Add Pulse-Doppler Processing to radartools (#307) @awthomp
    • Create labeler.yml (#299) @jolorunyomi
    • Clarify GPU timing in E2E Jupyter Notebook (#297) @awthomp
    • Bump cuSignal Version (#296) @awthomp
  • v0.17.0(Dec 10, 2020)

  • v0.16.0(Oct 21, 2020)

  • v0.15.0(Sep 16, 2020)
