CUDA Python Low-level Bindings

Overview

CUDA-Python

Building

Requirements

Dependencies of the CUDA-Python bindings and some versions that are known to work are as follows:

  • Driver: Linux (450.80.02 or later) Windows(456.38 or later)
  • CUDA Toolkit 11.0 to 11.4 - e.g. 11.4.48
  • Cython - e.g. 0.29.21
  • Versioneer - e.g. 0.20

Compilation

To compile the extension in-place, run:

python setup.py build_ext --inplace

To compile for debugging the extension modules with gdb, pass the --debug argument to setup.py.

Develop installation

You can use

python setup.py develop

to use the module in-place in your current Python environment (e.g. for testing of porting other libraries to use the binding).

Build the Docs

conda env create -f docs/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs
make html
open build/html/index.html

Build the Docs

conda env create -f docs_src/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs_src
make html
open build/html/index.html

Publish the Docs

git checkout gh-pages
cd docs_src
make html
cp -a build/html/. ../docs/

Testing

Requirements

Dependencies of the test execution and some versions that are known to work are as follows:

  • numpy-1.19.5
  • numba-0.53.1
  • matplotlib-3.3.4
  • scipy-1.6.3
  • pytest-benchmark-3.4.1

Unit-tests

You can run the included tests with:

pytest

Samples

You can run the included tests with:

pytest examples

Benchmark

You can run benchmark only tests with:

pytest --benchmark-only

Examples

The included examples are:

  • examples/extra/jit_program.py: Demonstrates the use of the API to compile and launch a kernel on the device. Includes device memory allocation / deallocation, transfers between host and device, creation and usage of streams, and context management.
  • examples/extra/numba_emm_plugin.py: Implements a Numba External Memory Management plugin, showing that this CUDA Python Driver API can coexist with other wrappers of the driver API.
Comments
  • Fails to build on AmazonLinux

    Fails to build on AmazonLinux

    Hi,

    I checked out the package and tried to build it on AmazonLinux but it fails to compile. Please see the build output below. I also tried all other commands there were mentioned in installation guide, but all failed with the same issue.

    Cuda : 11.2 GCC: 9.3

    $ python setup.py build
    Compiling cuda/_cuda/ccuda.pyx because it changed.
    Compiling cuda/_cuda/cnvrtc.pyx because it changed.
    [1/2] Cythonizing cuda/_cuda/ccuda.pyx
    [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
    Compiling cuda/_lib/utils.pyx because it changed.
    [1/1] Cythonizing cuda/_lib/utils.pyx
    Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
    Compiling cuda/_lib/ccudart/utils.pyx because it changed.
    [1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
    [2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
    Compiling cuda/ccuda.pyx because it changed.
    Compiling cuda/ccudart.pyx because it changed.
    Compiling cuda/cnvrtc.pyx because it changed.
    Compiling cuda/cuda.pyx because it changed.
    Compiling cuda/cudart.pyx because it changed.
    Compiling cuda/nvrtc.pyx because it changed.
    [1/6] Cythonizing cuda/ccuda.pyx
    [2/6] Cythonizing cuda/ccudart.pyx
    [3/6] Cythonizing cuda/cnvrtc.pyx
    [4/6] Cythonizing cuda/cuda.pyx
    [5/6] Cythonizing cuda/cudart.pyx
    [6/6] Cythonizing cuda/nvrtc.pyx
    Compiling cuda/tests/test_ccuda.pyx because it changed.
    Compiling cuda/tests/test_ccudart.pyx because it changed.
    Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
    [1/3] Cythonizing cuda/tests/test_ccuda.pyx
    [2/3] Cythonizing cuda/tests/test_ccudart.pyx
    [3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/cuda
    copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
    creating build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
    creating build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
    creating build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    creating build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
    creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
    set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
    running build_ext
    building 'cuda._cuda.ccuda' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/cuda
    creating build/temp.linux-x86_64-3.8/cuda/_cuda
    /home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
    cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
     4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
          |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
          |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
          |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                                                                       ^
    cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
    16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
          |                                                                                              ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                  ^~~
    cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                         ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                                                     ^
    cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
    17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
          |                                                                                      ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                            ^
    cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                             ^
    cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                               ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                    ^
    cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                      ^~~
    cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                           ^~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                                                 ^
    cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                   )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                   ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                                                                  ^
    cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                                                               ^~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
    ---
    truncated due to git issue limit
    ---
    cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                            ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                                                  ^~~~
    cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                          )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                   ^~~~
    cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                               ^~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
    64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^~~~~~~~
    
    cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
    64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^
    
    cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                        ^
    cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                  ^~~~~~~~~~~~
          |                                                                  CUsurfObject
    cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                               ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                         ^
    cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
    65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
          |                                                                                             ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                                       ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                     )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
    65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                              ^
    cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                ^
    cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                     ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                                          ^~~~~~~~~~~~
          |                                                                                          __pyx_k_name
    cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
    74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
          |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
    cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                  ^~~~
    cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                     ^
    cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                             ^
    cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                               ^~~~~~~~~~~~~~~~~~~~
          |                                                               CUmemHandleType
    cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                                                     ^~~~~~~~
    cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                            )
    error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
    
    opened by pranavladkat 9
  • nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    Dear developers,

    I found out that calling the NVRTC for compilation is changing the preferred encoding for the current Python instance.

    For more details and to reproduce the issue, please refer to this StackOverflow question.

    Do you have an idea on why this happens, and how it is possible to revert the preferred encoding to its original setting?

    Thank you in advance

    opened by redsnic 5
  • Failed to dlopen libcuda.so in WSL environment

    Failed to dlopen libcuda.so in WSL environment

    from cuda import cuda
    cuda.cuInit(0)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Input In [2], in <module>
    ----> 1 cuda.cuInit(0)
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
    
    RuntimeError: Failed to dlopen libcuda.so
    

    This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.

    As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.

    opened by kkraus14 4
  • No module named 'examples'

    No module named 'examples'

    I change directories to try to run some examples.

    (cython) [email protected]:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
    Traceback (most recent call last):
      File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
        from examples.common import common
    ModuleNotFoundError: No module named 'examples'
    

    What am I doing wrong? I am looking at pypi package called absolufy-imports to try to get this going.

    opened by nyck33 3
  • Adopting a set of

    Adopting a set of "supported" python versions

    Right now the project doesn't have any set of explicitly supported python versions. NEP 29 provides an example of how this can be done:

    All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

    Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

    This language also allows forecasting of python versions and forecasting (of some degree) of the resources required to maintain the project due to PEP 602 which normalizes the release schedule of python versions.

    There are at least two areas this practically impacts:

    • Support for version specific issues. Having a specified set of support versions allows some version specific issues to be termed in or out of scope, and be prioritized appropriately.
    • Binary distributions are currently made available on pypi and the nvidia channel of conda-forge, this bounds for which versions of python the binaries are targeted.
    opened by m3vaz 3
  • cudart.cudaSetDevice allocates memory on GPU other than target

    cudart.cudaSetDevice allocates memory on GPU other than target

    cuda-python 11.6.1 cuda toolkit 11.2 Ubuntu Linux

    If you run something like the following on a multi-GPU machine

    device_num = 5
    err, = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_num)
    err, cuda_context = cuda.cuCtxCreate(0, device)
    err, = cudart.cudaSetDevice(device)
    

    The call to cudart.cudaSetDevice will properly set your device to '5', but it will also allocate ~305 MB of memory on device 0 (or whichever is the 0th device in the device list provided by CUDA_VISIBLE_DEVICES). I think this issue (possibly in the C-CUDA runtime underneath?) may possibly be the root of many downstream issues in libraries like Tensorflow and Pytorch who have similar issues where a user selects a device but still gets tons of allocations on other devices. This 305 MB may not sound like a lot, but I'm running a program on an Nvidia-DGX with 16 GPUs and I have 64 worker processes, causing 64*305 = 19GB of unusable space to be allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to correct this problem because the workers are communicating via shared GPU memory (via cuIPCMemHandle) with their parent process, and the parent process needs access to all GPUs. Additionally, the worker processes are performing data augmentation on one GPU, while writing output to another GPU with a different device ID.

    I am trying to investigate a workaround to not call 'cudart.cudaSetDevice' at all, but when it is not called I cannot properly use the pointer given by cuda.cuMemAlloc to create a PyTorch tensor. When I call cudart.cudaSetDevice, I am able to use the pointer properly.

    opened by QuiteAFoxtrot 3
  • Use python exceptions instead of `err, ... =`

    Use python exceptions instead of `err, ... =`

    Congratulations on the GA release! 🥳

    I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

    The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

    In a future release, this may automatically raise exceptions using a Python object model.

    I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised about this choice of exception API

    Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err-return (and handle it with something like ASSERT_DRV). This seems like a really onerous task to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).

    It seems like a bad trade-off for me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?

    The reasons I'm raising this right now, is that this would be a pretty fundamental API change, and if there's any chance at all (assuming it's not already "zero" after GA), it would be ASAP.

    opened by h-vetinari 3
  • ERROR  Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe" Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] Commit hash: Installing torch and torchvision Traceback (most recent call last): File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in prepare_enviroment() File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch") File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run raise RuntimeError(message) RuntimeError: Couldn't install torch. Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 Error code: 1 stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

    stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none) ERROR: No matching distribution found for torch==1.12.1+cu113

    Press any key to continue . . .

    opened by GreatHK 2
  • First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

          Error compiling Cython file:
          ------------------------------------------------------------
          ...
                  Get memory address of class instance
    
              """
              pass
    
          cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                             ^
          ------------------------------------------------------------
    
          cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
    

    I do have other cuda versions installed alongside 11.6 but judging from the output of Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!

    opened by bertmaher 2
  • Remove duplicate code in vectorAddMMAP example

    Remove duplicate code in vectorAddMMAP example

    The code to determine granularity is duplicated, immediately after perforing that check, the same code exists. The second entry is being eliminated by this change.

    opened by pentschev 2
  • _ZSt28__throw_bad_array_new_lengthv

    _ZSt28__throw_bad_array_new_lengthv

    ~/cuda-python$ pip install -e .
    Obtaining file:///home/vinuj/cuda-python
    Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
    Installing collected packages: cuda-python
      Attempting uninstall: cuda-python
        Found existing installation: cuda-python 11.7.1
        Uninstalling cuda-python-11.7.1:
          Successfully uninstalled cuda-python-11.7.1
      Running setup.py develop for cuda-python
    
        from cuda import cuda, cudart
    ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
    
    
    opened by vinutah 2
  • Dropping Python 3.7

    Dropping Python 3.7

    We're considering dropping support for Python 3.7 for the next release. Per NEP 29, Python 3.7 drop schedule was almost a year ago and many associated libraries have already dropped it.

    Let us know if there's concerns in having Python 3.7 dropped next release. Thanks!

    opened by vzhurba01 0
  • No module named 'cuda._lib'; 'cuda' is not a package

    No module named 'cuda._lib'; 'cuda' is not a package

    After following the steps on cuda-python to install cuda-python with conda instruction, I try to

    from cuda import cuda, nvrtc
    

    as in the example in the pycharm python console, but it raises an error:

    Traceback (most recent call last):
      File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
      File "cuda\cuda.pyx", line 1, in init cuda.cuda
        # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
    ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
    

    But the code above can be successfully run in the terminal

    (hierot) D:\Projects\SimPlatform>python
    Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cuda import cuda, nvrtc
    >>>
    

    Please help me with the problem, thanks in advance. Further information provided on request.

    I searched with

    ModuleNotFoundError: No module named 'xxx'
    

    Solutions suggest configure correct python interpreter, but I believe my interpreter is already properly configured.

    And search with

    No module named 'xxx'; 'yyy' is not a package
    

    Some says the cause is the name cuda is shadowed by the package name cuda, I think it might be the problem. Please check this.

    opened by HIEROT 0
  • Windows: ModuleNotFoundError: No module named 'win32api'

    Windows: ModuleNotFoundError: No module named 'win32api'

    Installing on Windows:

    python -m pip install cuda-python

    Then from python:

    from cuda import cuda

    Fails with

        File "cuda\cuda.pyx", line 1, in init cuda.cuda
    
        File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
    
        File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
    
      ModuleNotFoundError: No module named 'win32api'
    

    I can fix this by installing pypiwin32 manually. But I think it should be listed in requirements.txt if platform_system is Windows.

    Thanks

    opened by ilyasher 0
  • more inference time in cuda env compared to cpu (occured only for a layer)

    more inference time in cuda env compared to cpu (occured only for a layer)

    Dear sir/madam: When I inference on a deep learning model (slowfast model), I'm facing a problem that my python program seems to take more inference time in cuda env compared to cpu. It's not the whole model but one specific layer takes more time on cuda env than cpu. I'm so confused that hope someone can help me with it. Here is the details. the specific layer is "slowway-conv1" layer as showned in the pic below representing the model structure of slowfast. image And my confusing result is as follows. the first for cuda and the second for cpu. image image In cuda env, I found the processing time of "conv1" (0.97s) accounts for a great proportion of the processing time of the whole model (1.04s), while in cpu env, the processing time of "conv1" (0.07s) only accounts for a very small proportion of the processing time of the whole model (4.43s). And I reckon that the proportion in cpu env is reasonable considering the calculation budget. Is my method of time measurement mistaken? I used the following code to measure time cost. image image If it's my fault that causing the confusing result, please kindly point out, or please give me some ideas to help me solve this problem. Thank you very much! Yours, Koala

    opened by koalaaaaaaaaa 0
  • cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime for its version. This results in incorrect runtime versions if the runtime version is different from the version of cuda-python.

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L79-L82

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/utils.pyx#L37

    Additional context

    A workaround used in https://github.com/rapidsai/rmm/pull/946 is to use numba's API for this instead:

    import numba.cuda
    
    def cudaRuntimeGetVersion():
        major, minor = numba.cuda.runtime.get_version()
        return major * 1000 + minor * 10
    
    opened by bdice 6
Releases(v12.0.0)
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks ACL 2020 Microsoft Research [Paper] [Video] Releasing [XtremeDistilTransf

Microsoft 125 Jan 04, 2023
Pre-Trained Image Processing Transformer (IPT)

Pre-Trained Image Processing Transformer (IPT) By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Cha

HUAWEI Noah's Ark Lab 332 Dec 18, 2022
Rule Based Classification Project For Python

Rule-Based-Classification-Project (ENG) Business Problem: A game company wants to create new level-based customer definitions (personas) by using some

Deniz Can OÄžUZ 4 Oct 29, 2022
📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: parameterize notebooks execute notebooks This

nteract 5.1k Jan 03, 2023
CCCL: Contrastive Cascade Graph Learning.

CCGL: Contrastive Cascade Graph Learning This repo provides a reference implementation of Contrastive Cascade Graph Learning (CCGL) framework as descr

Xovee Xu 19 Dec 05, 2022
Public implementation of the Convolutional Motif Kernel Network (CMKN) architecture

CMKN Implementation of the convolutional motif kernel network (CMKN) introduced in Ditz et al., "Convolutional Motif Kernel Network", 2021. Testing Yo

1 Nov 17, 2021
Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

TSOD Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer" Usage For training, open train_test, run p

Jinming Su 2 Dec 23, 2021
A Lightweight Experiment & Resource Monitoring Tool 📺

Lightweight Experiment & Resource Monitoring 📺 "Did I already run this experiment before? How many resources are currently available on my cluster?"

170 Dec 28, 2022
Density-aware Single Image De-raining using a Multi-stream Dense Network (CVPR 2018)

DID-MDN Density-aware Single Image De-raining using a Multi-stream Dense Network He Zhang, Vishal M. Patel [Paper Link] (CVPR'18) We present a novel d

He Zhang 224 Dec 12, 2022
Convnext-tf - Unofficial tensorflow keras implementation of ConvNeXt

ConvNeXt Tensorflow This is unofficial tensorflow keras implementation of ConvNe

29 Oct 06, 2022
HyperCube: Implicit Field Representations of Voxelized 3D Models

HyperCube: Implicit Field Representations of Voxelized 3D Models Authors: Magdalena Proszewska, Marcin Mazur, Tomasz Trzcinski, Przemysław Spurek [Pap

Magdalena Proszewska 3 Mar 09, 2022
Repository for the electrical and ICT benchmark model developed in the ERIGrid 2.0 project.

Benchmark Model Electrical and ICT System This repository contains the documentation, code, and models for the electrical and ICT benchmark model deve

ERIGrid 2.0 1 Nov 29, 2021
Text mining project; Using distilBERT to predict authors in the classification task authorship attribution.

DistilBERT-Text-mining-authorship-attribution Dataset used: https://www.kaggle.com/azimulh/tweets-data-for-authorship-attribution-modelling/version/2

1 Jan 13, 2022
MAT: Mask-Aware Transformer for Large Hole Image Inpainting

MAT: Mask-Aware Transformer for Large Hole Image Inpainting (CVPR2022, Oral) Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia [Paper] News This

254 Dec 29, 2022
State of the Art Neural Networks for Generative Deep Learning

pyradox-generative State of the Art Neural Networks for Generative Deep Learning Table of Contents pyradox-generative Table of Contents Installation U

Ritvik Rastogi 8 Sep 29, 2022
AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation

AirPose AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation Check the teaser video This repository contains the code of A

Robot Perception Group 41 Dec 05, 2022
Human motion synthesis using Unity3D

Human motion synthesis using Unity3D Prerequisite: Software: amc2bvh.exe, Unity 2017, Blender. Unity: RockVR (Video Capture), scenes, character models

Hao Xu 9 Jun 01, 2022
BirdCLEF 2021 - Birdcall Identification 4th place solution

BirdCLEF 2021 - Birdcall Identification 4th place solution My solution detail kaggle discussion Inference Notebook (best submission) Environment Use K

tattaka 42 Jan 02, 2023
Causal Imitative Model for Autonomous Driving

Causal Imitative Model for Autonomous Driving Mohammad Reza Samsami, Mohammadhossein Bahari, Saber Salehkaleybar, Alexandre Alahi. arXiv 2021. [Projec

VITA lab at EPFL 8 Oct 04, 2022
PyTorch implementation for NED. It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles.

Neural Emotion Director (NED) - Official Pytorch Implementation Example video of facial emotion manipulation while retaining the original mouth motion

Foivos Paraperas 89 Dec 23, 2022