CUDA Python Low-level Bindings

Overview

CUDA-Python

Building

Requirements

Dependencies of the CUDA-Python bindings and some versions that are known to work are as follows:

  • Driver: Linux (450.80.02 or later), Windows (456.38 or later)
  • CUDA Toolkit 11.0 to 11.4 - e.g. 11.4.48
  • Cython - e.g. 0.29.21
  • Versioneer - e.g. 0.20
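Before building, you can check a machine against the driver minimums above. Compare the versions numerically, not as strings. The sketch below is a minimal example under stated assumptions. The helper names (parse_driver_version, driver_is_supported) are hypothetical and not part of cuda-python. The version string is assumed to come from a tool such as nvidia-smi.

```python
# Hypothetical helpers (not part of cuda-python): check a driver version string
# against the minimum versions listed in the Requirements section.
MIN_DRIVER = {
    "Linux": (450, 80, 2),   # 450.80.02 or later
    "Windows": (456, 38),    # 456.38 or later
}


def parse_driver_version(version: str) -> tuple:
    """Turn a dotted driver version such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_supported(platform_name: str, version: str) -> bool:
    """Return True if `version` meets the documented minimum for `platform_name`."""
    minimum = MIN_DRIVER[platform_name]
    # Tuples of ints compare element by element, so (470, 57, 2) >= (450, 80, 2).
    return parse_driver_version(version) >= minimum
```

For example, driver_is_supported(platform.system(), "470.57.02") returns True on Linux. Integer tuples avoid lexicographic string comparison. That comparison would rank a hypothetical "1000.0" driver below "450.80.02".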

Compilation

To compile the extension in-place, run:

python setup.py build_ext --inplace

To compile the extension modules with debug information for use with gdb, pass the --debug argument to setup.py.
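If another Python script drives the build (for example, in CI), the commands above can be wrapped as shown below. This sketch assumes two things: it runs from the repository root, and --debug is simply appended to the build_ext invocation. The names build_ext_command and build_in_place are hypothetical helpers.

```python
import subprocess
import sys


def build_ext_command(debug: bool = False) -> list:
    """Return the in-place build command from the Compilation section."""
    # Use the current interpreter so the extensions match the active environment.
    cmd = [sys.executable, "setup.py", "build_ext", "--inplace"]
    if debug:
        # Debug build, for stepping through the extension modules with gdb.
        cmd.append("--debug")
    return cmd


def build_in_place(debug: bool = False) -> None:
    """Run the build; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(build_ext_command(debug), check=True)
```

Calling build_in_place(debug=True) from the repository root is then equivalent to running python setup.py build_ext --inplace --debug.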

Develop installation

You can use

python setup.py develop

to use the module in-place in your current Python environment (e.g. when testing a port of another library to these bindings).

Build the Docs

conda env create -f docs_src/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs_src
make html
open build/html/index.html

Publish the Docs

git checkout gh-pages
cd docs_src
make html
cp -a build/html/. ../docs/
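The final cp -a step assumes a POSIX shell. As a cross-platform alternative, Python's shutil.copytree can do the same copy. The sketch below is hedged and has a few limits:

  • The dirs_exist_ok flag needs Python 3.8 or later.
  • publish_html is a hypothetical helper name.
  • Unlike cp -a, it does not preserve file ownership.

```python
import shutil
from pathlib import Path


def publish_html(build_html: Path, docs_dir: Path) -> None:
    """Copy the built HTML tree into docs/, similar to `cp -a build/html/. ../docs/`."""
    # copytree copies hidden files too (e.g. .nojekyll) and uses shutil.copy2,
    # which keeps permission bits and timestamps. dirs_exist_ok=True merges into
    # an existing docs/ directory, overwriting same-named files, instead of failing.
    shutil.copytree(build_html, docs_dir, dirs_exist_ok=True)
```

Run it from docs_src after make html, e.g. publish_html(Path("build/html"), Path("../docs")).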

Testing

Requirements

Dependencies of the test execution and some versions that are known to work are as follows:

  • numpy-1.19.5
  • numba-0.53.1
  • matplotlib-3.3.4
  • scipy-1.6.3
  • pytest-benchmark-3.4.1
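You can compare a local environment against the versions listed above by reading the installed distribution versions with importlib.metadata from the standard library (Python 3.8+). The sketch below is minimal, and report_test_dependencies is a hypothetical helper. A mismatch is only informational, since the list gives versions known to work rather than hard requirements.

```python
from importlib import metadata

# Versions listed in the Testing > Requirements section as known to work.
KNOWN_GOOD = {
    "numpy": "1.19.5",
    "numba": "0.53.1",
    "matplotlib": "3.3.4",
    "scipy": "1.6.3",
    "pytest-benchmark": "3.4.1",
}


def report_test_dependencies(known_good=KNOWN_GOOD) -> dict:
    """Map each package to (installed version or None, known-good version)."""
    report = {}
    for name, tested in known_good.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = (installed, tested)
    return report


for pkg, (installed, tested) in report_test_dependencies().items():
    print(f"{pkg}: installed={installed or 'missing'} known-good={tested}")
```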

Unit-tests

You can run the included tests with:

pytest

Samples

You can run the included samples with:

pytest examples

Benchmark

You can run only the benchmark tests with:

pytest --benchmark-only

Examples

The included examples are:

  • examples/extra/jit_program.py: Demonstrates how to use the API to compile and launch a kernel on the device, including device memory allocation and deallocation, transfers between host and device, creation and use of streams, and context management.
  • examples/extra/numba_emm_plugin.py: Implements a Numba External Memory Management plugin, showing that this CUDA Python Driver API can coexist with other wrappers of the driver API.

Comments
  • Fails to build on AmazonLinux

    Hi,

    I checked out the package and tried to build it on AmazonLinux, but it fails to compile. Please see the build output below. I also tried all the other commands mentioned in the installation guide, but they all failed with the same issue.

    CUDA: 11.2, GCC: 9.3

    $ python setup.py build
    Compiling cuda/_cuda/ccuda.pyx because it changed.
    Compiling cuda/_cuda/cnvrtc.pyx because it changed.
    [1/2] Cythonizing cuda/_cuda/ccuda.pyx
    [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
    Compiling cuda/_lib/utils.pyx because it changed.
    [1/1] Cythonizing cuda/_lib/utils.pyx
    Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
    Compiling cuda/_lib/ccudart/utils.pyx because it changed.
    [1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
    [2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
    Compiling cuda/ccuda.pyx because it changed.
    Compiling cuda/ccudart.pyx because it changed.
    Compiling cuda/cnvrtc.pyx because it changed.
    Compiling cuda/cuda.pyx because it changed.
    Compiling cuda/cudart.pyx because it changed.
    Compiling cuda/nvrtc.pyx because it changed.
    [1/6] Cythonizing cuda/ccuda.pyx
    [2/6] Cythonizing cuda/ccudart.pyx
    [3/6] Cythonizing cuda/cnvrtc.pyx
    [4/6] Cythonizing cuda/cuda.pyx
    [5/6] Cythonizing cuda/cudart.pyx
    [6/6] Cythonizing cuda/nvrtc.pyx
    Compiling cuda/tests/test_ccuda.pyx because it changed.
    Compiling cuda/tests/test_ccudart.pyx because it changed.
    Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
    [1/3] Cythonizing cuda/tests/test_ccuda.pyx
    [2/3] Cythonizing cuda/tests/test_ccudart.pyx
    [3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/cuda
    copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
    creating build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
    creating build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
    creating build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    creating build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
    creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
    set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
    running build_ext
    building 'cuda._cuda.ccuda' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/cuda
    creating build/temp.linux-x86_64-3.8/cuda/_cuda
    /home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
    cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
     4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
          |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
          |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
          |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                                                                       ^
    cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
    16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
          |                                                                                              ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                  ^~~
    cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                         ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                                                     ^
    cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
    17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
          |                                                                                      ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                            ^
    cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                             ^
    cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                               ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                    ^
    cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                      ^~~
    cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                           ^~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                                                 ^
    cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                   )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                   ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                                                                  ^
    cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                                                               ^~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
    ---
    truncated due to git issue limit
    ---
    cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                            ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                                                  ^~~~
    cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                          )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                   ^~~~
    cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                               ^~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
    64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^~~~~~~~
    
    cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
    64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^
    
    cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                        ^
    cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                  ^~~~~~~~~~~~
          |                                                                  CUsurfObject
    cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                               ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                         ^
    cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
    65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
          |                                                                                             ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                                       ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                     )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
    65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                              ^
    cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                ^
    cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                     ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                                          ^~~~~~~~~~~~
          |                                                                                          __pyx_k_name
    cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
    74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
          |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
    cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                  ^~~~
    cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                     ^
    cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                             ^
    cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                               ^~~~~~~~~~~~~~~~~~~~
          |                                                               CUmemHandleType
    cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                                                     ^~~~~~~~
    cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                            )
    error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
    
    opened by pranavladkat 9
  • nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    Dear developers,

    I found that calling NVRTC to compile a program changes the preferred encoding of the running Python process.

    For more details and to reproduce the issue, please refer to this StackOverflow question.

    Do you have any idea why this happens, and how the preferred encoding can be restored to its original setting?

    Thank you in advance
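    One hypothesis, not confirmed in this report, is that NVRTC resets the process's LC_CTYPE locale. ANSI_X3.4-1968 is the codeset glibc reports for the plain C locale. Under that assumption, a minimal workaround sketch saves the LC_CTYPE setting before compiling and restores it afterwards. The helper name preserve_ctype_locale is hypothetical.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserve_ctype_locale():
    """Save the process's LC_CTYPE locale and restore it when the block exits."""
    saved = locale.setlocale(locale.LC_CTYPE)  # one argument: query only, changes nothing
    try:
        yield saved
    finally:
        # Runs even if the body raises, so the original setting always comes back.
        locale.setlocale(locale.LC_CTYPE, saved)


# Usage (hypothetical; prog is an NVRTC program created earlier):
#   with preserve_ctype_locale():
#       err, = nvrtc.nvrtcCompileProgram(prog, 0, [])
```

    Because the restore happens in a finally clause, the original setting is reinstated even when compilation raises.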

    opened by redsnic 5
  • Failed to dlopen libcuda.so in WSL environment

    Failed to dlopen libcuda.so in WSL environment

    from cuda import cuda
    cuda.cuInit(0)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Input In [2], in <module>
    ----> 1 cuda.cuInit(0)
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
    
    RuntimeError: Failed to dlopen libcuda.so
    

    This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib, which is not in dlopen's default search path. Libraries that link against libcuda are unaffected, because /etc/ld.so.conf.d/ld.wsl.conf tells the linker where to find the libraries. Unfortunately, dlopen doesn't use this file.

    As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.
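    The dynamic loader reads LD_LIBRARY_PATH when a process starts. The variable therefore has to be set before Python launches; assigning to os.environ inside an already-running interpreter won't affect dlopen in that interpreter. Below is a minimal sketch that builds such an environment for a child process. The helper name env_with_wsl_libs is hypothetical.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # where libcuda.so lives under WSL, per the report above


def env_with_wsl_libs(base_env=None):
    """Return a copy of base_env with WSL_LIB_DIR prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if WSL_LIB_DIR not in parts:
        parts.insert(0, WSL_LIB_DIR)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# Usage: run the failing call in a child process that can see the WSL driver directory.
#   import subprocess, sys
#   subprocess.run([sys.executable, "-c", "from cuda import cuda; print(cuda.cuInit(0))"],
#                  env=env_with_wsl_libs(), check=True)
```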

    opened by kkraus14 4
  • No module named 'examples'

    No module named 'examples'

    I changed into the examples directory to try running some of the samples:

    (cython) [email protected]:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
    Traceback (most recent call last):
      File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
        from examples.common import common
    ModuleNotFoundError: No module named 'examples'
    

    What am I doing wrong? I am looking at the PyPI package absolufy-imports to try to get this working.
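    The traceback shows the sample importing examples.common. That import only resolves when the repository root (the directory containing examples/) is on sys.path. Running the script from inside examples/0_Introduction puts that subdirectory on the path instead. Running from the repository root, e.g. with pytest examples as the build instructions describe, may be the simpler route.

    Below is a minimal sketch of the alternative. It assumes the <root>/examples/<category>/<script>.py layout seen in the traceback; add_repo_root_to_path is a hypothetical helper.

```python
import sys
from pathlib import Path


def repo_root_for(script):
    """Assumes the layout <root>/examples/<category>/<script>.py and returns <root>."""
    return Path(script).resolve().parents[2]


def add_repo_root_to_path(script):
    """Put the repository root first on sys.path so `import examples...` resolves."""
    root = str(repo_root_for(script))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# At the top of a sample, before `from examples.common import common`:
#   add_repo_root_to_path(__file__)
```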

    opened by nyck33 3
  • Adopting a set of "supported" python versions

    Adopting a set of "supported" python versions

    Right now the project doesn't define an explicit set of supported Python versions. NEP 29 provides an example of how this can be done:

    All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

    Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

    This policy also makes it possible to forecast which Python versions will be supported and, to some degree, the resources required to maintain the project, because PEP 602 normalizes the Python release schedule.

    There are at least two areas this practically impacts:

    • Support for version-specific issues. A defined set of supported versions allows some version-specific issues to be deemed in or out of scope and prioritized appropriately.
    • Binary distributions. Binaries are currently published on PyPI and via conda (the nvidia channel), and a support policy bounds which Python versions those binaries target.
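    The first NEP 29 rule can be made concrete with a minimal sketch that computes the supported set for a given date. The set is every minor version first released within the 42-month window, plus at least the two latest minor versions. The function names are hypothetical, and the x.y.0 release dates are from the CPython release history.

```python
from __future__ import annotations

import calendar
from datetime import date

# First release (x.y.0) dates of recent CPython minor versions.
RELEASES = {
    "3.7": date(2018, 6, 27),
    "3.8": date(2019, 10, 14),
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
}


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before d, clamping the day if needed."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) - months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def nep29_supported(releases: dict[str, date], today: date,
                    window_months: int = 42, min_versions: int = 2) -> list[str]:
    """Minor versions to support on `today` under the NEP 29 Python rule."""
    # Only versions already released, oldest first.
    released = sorted((rel, ver) for ver, rel in releases.items() if rel <= today)
    cutoff = months_before(today, window_months)
    keep = {ver for rel, ver in released if rel >= cutoff}
    if min_versions > 0:
        # "at minimum the two latest minor versions"
        keep.update(ver for _, ver in released[-min_versions:])
    return [ver for _, ver in released if ver in keep]
```

    With these dates, nep29_supported(RELEASES, date(2023, 1, 1)) keeps 3.8 through 3.11, since 3.7 was released before the 42-month cutoff of 1 July 2019.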
    opened by m3vaz 3
  • cudart.cudaSetDevice allocates memory on GPU other than target

    cudart.cudaSetDevice allocates memory on GPU other than target

    Environment: cuda-python 11.6.1, CUDA Toolkit 11.2, Ubuntu Linux.

    If you run something like the following on a multi-GPU machine:

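    from cuda import cuda, cudart

    device_num = 5
    err, = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_num)
    err, cuda_context = cuda.cuCtxCreate(0, device)
    # The runtime API takes the integer device ordinal, not a CUdevice handle
    err, = cudart.cudaSetDevice(device_num)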
    

    The call to cudart.cudaSetDevice correctly sets the device to 5, but it also allocates ~305 MB of memory on device 0 (or on whichever device is listed first in CUDA_VISIBLE_DEVICES). I suspect this behavior, possibly originating in the underlying CUDA runtime, is the root of many downstream issues in libraries like TensorFlow and PyTorch, where a user selects one device but still sees large allocations on other devices.

    305 MB may not sound like a lot, but I'm running a program on an NVIDIA DGX with 16 GPUs and 64 worker processes. That means 64 × 305 MB, about 19 GB of unusable memory, is allocated on GPU 0, which crashes the program.

    I cannot simply set CUDA_VISIBLE_DEVICES to avoid this. The workers communicate with their parent process through shared GPU memory (via CUipcMemHandle), and the parent process needs access to all GPUs. Additionally, each worker performs data augmentation on one GPU while writing output to another GPU with a different device ID.

    I am investigating a workaround that avoids calling cudart.cudaSetDevice altogether. Without that call, however, I cannot use the pointer returned by cuda.cuMemAlloc to create a PyTorch tensor; with it, the pointer works as expected.
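    Which physical GPU receives the extra allocation depends on how CUDA renumbers devices: inside a process, ordinal 0 is the first entry listed in CUDA_VISIBLE_DEVICES. Below is a minimal sketch of that mapping. visible_ordinal is a hypothetical helper; it assumes the variable is unset or a comma-separated list of integer indices, and it ignores UUID entries and CUDA_DEVICE_ORDER.

```python
from typing import Optional


def visible_ordinal(physical_index: int, cuda_visible_devices: Optional[str]) -> int:
    """Map a physical GPU index to the device ordinal CUDA uses inside the process."""
    if cuda_visible_devices is None:
        return physical_index  # variable unset: every device visible, no renumbering by the list
    visible = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    if physical_index not in visible:
        raise ValueError(
            f"GPU {physical_index} is not visible with "
            f"CUDA_VISIBLE_DEVICES={cuda_visible_devices!r}")
    return visible.index(physical_index)  # ordinal 0 is the first listed GPU


# e.g. with CUDA_VISIBLE_DEVICES="5,2", physical GPU 5 is device 0 in the process
```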

    opened by QuiteAFoxtrot 3
  • Use python exceptions instead of `err, ... =`

    Use python exceptions instead of `err, ... =`

    Congratulations on the GA release! 🥳

    I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

    The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

    In a future release, this may automatically raise exceptions using a Python object model.

    I'm not sure whether that means the errors will become subclasses of something like a CUDAError, or whether it should be read some other way. In any case, I was quite surprised by this choice of error-handling API.

    Why not make the functions raise on error by default? Right now, IIUC, every call has to unpack an extra err return value and handle it with something like ASSERT_DRV. That is a lot of work just to get the default behaviour of "fail if something unexpected happens". The alternative would be to add try: ... except: handling deliberately, wherever execution should continue despite a failure.

    It seems like a bad trade-off to me (high verbosity, and it's easy to forget an ASSERT_DRV), but maybe I'm overlooking something?

    The reason I'm raising this now is that this would be a pretty fundamental API change. If there's any chance of it happening at all (assuming the chance isn't already zero after GA), it should happen ASAP.
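    The (err, *values) convention can be wrapped in user code so that failures raise. Below is a minimal sketch. It assumes the first element is an IntEnum-style error code whose success value is 0, as CUDA_SUCCESS, cudaSuccess and NVRTC_SUCCESS are in the C headers. check and CudaCallError are hypothetical names, not part of cuda-python.

```python
class CudaCallError(RuntimeError):
    """Raised when a binding call returns a non-success error code."""

    def __init__(self, err):
        super().__init__(f"CUDA call failed with {err!r}")
        self.err = err


def check(result):
    """Unpack an (err, *values) tuple, raising CudaCallError instead of returning err."""
    err, *values = result
    if int(err) != 0:  # assumes success == 0, as for CUDA_SUCCESS / cudaSuccess / NVRTC_SUCCESS
        raise CudaCallError(err)
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


# Usage (hypothetical):
#   check(cuda.cuInit(0))
#   device = check(cuda.cuDeviceGet(0))
```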

    opened by h-vetinari 3
  • ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe"
    Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
    Commit hash:
    Installing torch and torchvision
    Traceback (most recent call last):
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in <module>
        prepare_enviroment()
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment
        run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch")
      File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run
        raise RuntimeError(message)
    RuntimeError: Couldn't install torch.
    Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
    Error code: 1
    stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

    stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)
    ERROR: No matching distribution found for torch==1.12.1+cu113

    Press any key to continue . . .

    opened by GreatHK 2
  • First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

          Error compiling Cython file:
          ------------------------------------------------------------
          ...
                  Get memory address of class instance
    
              """
              pass
    
          cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                             ^
          ------------------------------------------------------------
    
          cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
    

    I do have other CUDA versions installed alongside 11.6. However, the build output line Parsing headers in "/usr/local/cuda-11.6/include" suggests it is probably finding the right version. Any advice on how to get past this, or how to debug it? Thanks!
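    One way to confirm which toolkit the build is reading is to check the CUDA_VERSION macro in that directory's cuda.h. The macro encodes the version as major * 1000 + minor * 10, so 11.6 appears as 11060. Below is a minimal sketch; the helper names are hypothetical.

```python
import re
from pathlib import Path

# cuda.h defines CUDA_VERSION as major * 1000 + minor * 10, e.g. 11060 for CUDA 11.6.
_CUDA_VERSION_RE = re.compile(r"^\s*#define\s+CUDA_VERSION\s+(\d+)", re.MULTILINE)


def cuda_version_from_header(text):
    """Return (major, minor) parsed from the text of a cuda.h header."""
    match = _CUDA_VERSION_RE.search(text)
    if match is None:
        raise ValueError("CUDA_VERSION not defined in header text")
    value = int(match.group(1))
    return value // 1000, (value % 1000) // 10


def cuda_version_from_include_dir(include_dir):
    """Read <include_dir>/cuda.h and return its (major, minor) version."""
    return cuda_version_from_header((Path(include_dir) / "cuda.h").read_text())


# Expected to give (11, 6) if those headers really are CUDA 11.6:
#   cuda_version_from_include_dir("/usr/local/cuda-11.6/include")
```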

    opened by bertmaher 2
  • Remove duplicate code in vectorAddMMAP example

    Remove duplicate code in vectorAddMMAP example

    The code that determines the allocation granularity is duplicated: immediately after that check is performed, the same code appears again. This change removes the second copy.
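    For context, the granularity in that sample is obtained from cuMemGetAllocationGranularity, and sizes passed to cuMemCreate must be multiples of it. Below is a minimal sketch of the usual round-up step. round_up is a hypothetical helper, not the sample's code.

```python
def round_up(size: int, granularity: int) -> int:
    """Round size up to the next multiple of granularity."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return ((size + granularity - 1) // granularity) * granularity


# e.g. with a 2 MiB granularity, a 1-byte request becomes a 2 MiB allocation
```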

    opened by pentschev 2
  • _ZSt28__throw_bad_array_new_lengthv

    _ZSt28__throw_bad_array_new_lengthv

    ~/cuda-python$ pip install -e .
    Obtaining file:///home/vinuj/cuda-python
    Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
    Installing collected packages: cuda-python
      Attempting uninstall: cuda-python
        Found existing installation: cuda-python 11.7.1
        Uninstalling cuda-python-11.7.1:
          Successfully uninstalled cuda-python-11.7.1
      Running setup.py develop for cuda-python
    
        from cuda import cuda, cudart
    ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
    
    
    opened by vinutah 2
  • Dropping Python 3.7

    Dropping Python 3.7

    We're considering dropping support for Python 3.7 in the next release. Per NEP 29, Python 3.7 was scheduled to be dropped almost a year ago, and many related libraries have already dropped it.

    Let us know if you have concerns about dropping Python 3.7 in the next release. Thanks!

    opened by vzhurba01 0
  • No module named 'cuda._lib'; 'cuda' is not a package

    No module named 'cuda._lib'; 'cuda' is not a package

    After installing cuda-python by following the conda instructions in the cuda-python documentation, I tried to run

    from cuda import cuda, nvrtc
    

    as in the example, in the PyCharm Python console, but it raises an error:

    Traceback (most recent call last):
      File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
      File "cuda\cuda.pyx", line 1, in init cuda.cuda
        # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
    ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
    

    However, the same code runs successfully in a terminal:

    (hierot) D:\Projects\SimPlatform>python
    Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cuda import cuda, nvrtc
    >>>
    

    Please help me with the problem, thanks in advance. Further information provided on request.

    I searched for

    ModuleNotFoundError: No module named 'xxx'
    

    The suggested solutions are to configure the correct Python interpreter, but I believe mine is already configured properly.

    I also searched for

    No module named 'xxx'; 'yyy' is not a package
    

    Some answers say the cause is that the module name cuda (cuda.cuda) is shadowed by the package of the same name, and I think this might be the problem. Please check this.
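    The message 'cuda' is not a package says the name cuda resolved to a plain module rather than the installed package, which fits the shadowing hypothesis. Below is a small diagnostic sketch to run in both the PyCharm console and the terminal, so that their results can be compared. describe_import is a hypothetical helper.

```python
import importlib.util
import sys


def describe_import(name):
    """Report where a top-level import name resolves and whether it is a package."""
    module = sys.modules.get(name)
    if module is not None:  # already imported: report what is actually bound
        return {"source": "sys.modules",
                "origin": getattr(module, "__file__", None),
                "is_package": hasattr(module, "__path__")}
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current sys.path
    return {"source": "finder",
            "origin": spec.origin,
            "is_package": spec.submodule_search_locations is not None}


# Run in both environments, then compare the output:
#   print(describe_import("cuda"))
```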

    opened by HIEROT 0
  • Windows: ModuleNotFoundError: No module named 'win32api'

    Windows: ModuleNotFoundError: No module named 'win32api'

    Installing on Windows:

    python -m pip install cuda-python

    Then from python:

    from cuda import cuda

    Fails with

        File "cuda\cuda.pyx", line 1, in init cuda.cuda
    
        File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
    
        File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
    
      ModuleNotFoundError: No module named 'win32api'
    

    I can fix this by installing pypiwin32 manually, but I think it should be listed in requirements.txt, restricted to platform_system == "Windows".

    Thanks
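    The reporter's suggestion corresponds to a PEP 508 environment marker, which pip evaluates using platform.system(). Below is a minimal sketch, assuming pywin32 (which provides the win32api module) is the dependency to declare. The same requirement string works as a requirements.txt line or as an install_requires entry.

```python
import platform

# PEP 508 requirement with an environment marker; pip evaluates `platform_system`
# as platform.system(), so pywin32 is only installed on Windows.
PYWIN32_REQUIREMENT = 'pywin32; platform_system == "Windows"'


def pywin32_selected(system=None):
    """Mirror the marker above: True when running on (or asked about) Windows."""
    return (platform.system() if system is None else system) == "Windows"


# setup.py excerpt (hypothetical): setup(..., install_requires=[PYWIN32_REQUIREMENT])
```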

    opened by ilyasher 0
  • more inference time in cuda env compared to cpu (occurred only for a layer)

    more inference time in cuda env compared to cpu (occurred only for a layer)

    Dear sir/madam: When running inference with a deep learning model (SlowFast), my Python program seems to take more time in the CUDA environment than on the CPU. It's not the whole model: one specific layer takes more time under CUDA than on the CPU. I'm quite confused and hope someone can help. Here are the details.

    The layer in question is "slowway-conv1", as shown in the picture below of the SlowFast model structure.

    [Image: SlowFast model structure]

    My confusing results are below, the first for CUDA and the second for CPU.

    [Images: timing results on CUDA and on CPU]

    Under CUDA, the processing time of "conv1" (0.97 s) accounts for most of the processing time of the whole model (1.04 s). On the CPU, the processing time of "conv1" (0.07 s) is only a very small fraction of the whole model's processing time (4.43 s). I consider the CPU proportion reasonable given the amount of computation. Is my method of time measurement mistaken? I used the following code to measure the time cost.

    [Images: timing code]

    If I am making a mistake that causes this confusing result, please kindly point it out, or give me some ideas to help me solve this problem. Thank you very much!

    Yours,
    Koala
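    The timing code isn't preserved here, so this is a generic sketch rather than a diagnosis. Two common pitfalls affect timing GPU work from Python. First, kernels launch asynchronously, so elapsed time gets attributed to whichever later call waits for the GPU. Second, the first calls pay one-time initialization costs. The minimal timing helper below addresses both; when timing PyTorch on a GPU, sync would be something like torch.cuda.synchronize. time_op is a hypothetical name.

```python
import time
from typing import Callable


def time_op(fn: Callable[[], object],
            sync: Callable[[], None] = lambda: None,
            warmup: int = 3,
            iters: int = 10) -> float:
    """Mean wall-clock seconds per call of fn, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # absorb one-time costs such as initialization and caching
    sync()    # make sure queued asynchronous work has finished before timing
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()    # wait for the timed work itself to complete
    return (time.perf_counter() - start) / iters


# Usage with PyTorch on a GPU (model and x are assumed to exist):
#   per_call = time_op(lambda: model(x), sync=torch.cuda.synchronize)
```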

    opened by koalaaaaaaaaa 0
  • cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime for its version. This results in incorrect runtime versions if the runtime version is different from the version of cuda-python.

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L79-L82

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/utils.pyx#L37

    Additional context

    A workaround used in https://github.com/rapidsai/rmm/pull/946 is to use numba's API for this instead:

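    import numba.cuda
    
    def cudaRuntimeGetVersion():
        # numba queries the CUDA runtime library it has loaded and returns (major, minor)
        major, minor = numba.cuda.runtime.get_version()
        # Encode it the same way cudaRuntimeGetVersion() does, e.g. 11.8 -> 11080
        return major * 1000 + minor * 10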
    
    opened by bdice 6
Releases(v12.0.0)