Python 3 bindings for the NVML library. Get NVIDIA GPU status inside your program.


py3nvml

Documentation also available at readthedocs.

Python 3 compatible bindings to the NVIDIA Management Library. Can be used to query the state of the GPUs on your system. It was ported from the NVIDIA-provided Python bindings nvidia-ml-py, which only supported Python 2; I forked from version 7.352.0. The old library was itself a wrapper around the NVIDIA Management Library.

In addition to these NVIDIA functions for querying the state of the GPU, I have written a couple of functions/tools to help in using GPUs (particularly on a shared GPU server). These are:

  • A function to 'restrict' the available GPUs by setting the CUDA_VISIBLE_DEVICES environment variable.
  • A script for displaying a differently formatted nvidia-smi.

See the Utils section below for more info.

Updates in Version 0.2.3

To try to keep py3nvml somewhat up to date with the constantly evolving NVIDIA drivers, I have done some work on the py3nvml.py3nvml module. In particular, I have added all the constants that were missing from py3nvml but present in the NVIDIA source as of driver version 418.43. In addition, I have wrapped all of these constants in Enums so it is easier to see which constants go together. Finally, for all the functions in py3nvml.py3nvml I have copied in the C docstring. While this results in some strange-looking docstrings that are slightly incorrect for Python, they should give good guidance on the scope of each function, something which was ill-defined before.

I will also remove the py3nvml.nvidia_smi module in a future version, as I believe it was only ever meant as an example of how to use the NVML functions to query the GPUs, and it is now quite out of date. To get the same functionality, you can call nvidia-smi -q -x from Python with subprocess, as sketched below.
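
A minimal sketch of that replacement, assuming nvidia-smi is on your PATH (this is not part of py3nvml; the driver_version tag is taken from nvidia-smi's -q -x XML output):

import subprocess
import xml.etree.ElementTree as ET

# Run the same query that py3nvml.nvidia_smi wrapped, and capture its XML output
xml_output = subprocess.check_output(['nvidia-smi', '-q', '-x']).decode()

# Parse it and pull out one field as an example
root = ET.fromstring(xml_output)
print(root.findtext('driver_version'))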

Requires

Python 3.5+.

Installation

From PyPi:

$ pip install py3nvml

From GitHub:

$ pip install -e git+https://github.com/fbcotter/py3nvml#egg=py3nvml

Or, download and pip install:

$ git clone https://github.com/fbcotter/py3nvml
$ cd py3nvml
$ pip install .

Utils

(Added by me - not ported from NVIDIA library)

grab_gpus

You can call the grab_gpus(num_gpus, gpu_select, gpu_fraction=.95) function to check the available GPUs and set the CUDA_VISIBLE_DEVICES environment variable as need be. It determines whether a GPU is available by checking whether the fraction of its memory that is free is greater than or equal to gpu_fraction. The default of .95 allows a small amount of memory to be taken before the GPU is deemed 'used'.
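
Roughly speaking, the availability check is equivalent to the following (a minimal sketch built on the wrapped NVML calls, not the library's actual implementation; gpu_is_free is a hypothetical helper):

from py3nvml.py3nvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                             nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo)

def gpu_is_free(idx, gpu_fraction=0.95):
    # A GPU counts as free if at least gpu_fraction of its memory is unused
    handle = nvmlDeviceGetHandleByIndex(idx)
    info = nvmlDeviceGetMemoryInfo(handle)
    return info.free / info.total >= gpu_fraction

nvmlInit()
free_idxs = [i for i in range(nvmlDeviceGetCount()) if gpu_is_free(i)]
nvmlShutdown()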

I have found this useful as I work on a shared GPU server and like to use TensorFlow, which is very greedy: a call to tf.Session() grabs all visible GPUs.

E.g.

import py3nvml
import tensorflow as tf
py3nvml.grab_gpus(3)
sess = tf.Session() # now we only grab 3 gpus!

Or the following will grab 2 GPUs from the first 4 (and leave any higher GPUs untouched):

py3nvml.grab_gpus(num_gpus=2, gpu_select=[0,1,2,3])
sess = tf.Session()

This will look for 2 available GPUs among GPUs 0 to 3. The gpu_select option is not necessary; it only restricts the search space for grab_gpus.

You can adjust the memory threshold for deciding whether a GPU is free/used with the gpu_fraction parameter (default is .95):

# Will allocate a GPU if less than 20% of its memory is being used
py3nvml.grab_gpus(num_gpus=2, gpu_fraction=0.8)
sess = tf.Session()

This function has no return codes but may raise some warnings/exceptions (see the sketch after this list):

  • If the method could not connect to any NVIDIA GPUs, it will raise a RuntimeWarning.
  • If it could connect to the GPUs, but none were available, it will raise a ValueError.
  • If it could connect to the GPUs but fewer than the requested number were available, it will take everything it can and raise a RuntimeWarning.
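
For example, a caller that wants to degrade gracefully might wrap the call like this (a minimal sketch, assuming the warnings are raised as exceptions as described above):

import py3nvml

try:
    py3nvml.grab_gpus(num_gpus=2)
except ValueError:
    # Connected to the GPUs, but none were available
    raise
except RuntimeWarning:
    # Either no GPUs could be reached, or fewer than 2 were available
    pass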

get_free_gpus

This tool can query the GPU status. Unlike the default behaviour of grab_gpus, which checks the memory usage of a GPU, this function checks whether a process is running on a GPU. For a system with N GPUs, it returns a list of N booleans, where the nth value is True if no process was found running on GPU n. An example use is:

import py3nvml
free_gpus = py3nvml.get_free_gpus()
if True not in free_gpus:
    print('No free gpus found')

get_num_procs

This function is called by get_free_gpus. It simply returns a list of integers with the number of processes running on each gpu. E.g. if you had 1 process running on gpu 5 in an 8 gpu system, you would expect to get the following:

import py3nvml
num_procs = py3nvml.get_num_procs()
print(num_procs)
>>> [0, 0, 0, 0, 0, 1, 0, 0]

py3smi

I found the default nvidia-smi output was missing some useful info, so I made use of the py3nvml/nvidia_smi.py module to query the devices and get info on the GPUs, and then defined my own printout. I have included this as a script in scripts/py3smi. The print code is horribly messy, but the query code is very simple and should be understandable.

Running pip install will now put this script in your Python environment's bin directory, and you'll be able to run it from the command line. Here is a comparison of the two outputs:

https://i.imgur.com/TvdfkFE.png

https://i.imgur.com/UPSHr8k.png

For py3smi, you can specify an update period so it will refresh the feed every few seconds, similar to watch -n5 nvidia-smi; e.g. py3smi -l 5.

You can also get the full output (very similar to nvidia-smi) by running py3smi -f (this shows a slightly modified process info pane below).

Regular Usage

Visit the NVML reference for a list of the available functions and their help. The py3smi script is also a bit hacky, but it shows examples of querying the GPUs for info.

(Everything below here is ported from pynvml.)

from py3nvml.py3nvml import *
nvmlInit()
print("Driver Version: {}".format(nvmlSystemGetDriverVersion()))
# e.g. will print:
#   Driver Version: 352.00
deviceCount = nvmlDeviceGetCount()
for i in range(deviceCount):
    handle = nvmlDeviceGetHandleByIndex(i)
    print("Device {}: {}".format(i, nvmlDeviceGetName(handle)))
# e.g. will print:
#  Device 0 : Tesla K40c
#  Device 1 : Tesla K40c

nvmlShutdown()

Additionally, see py3nvml.nvidia_smi.py. This does the equivalent of the nvidia-smi command:

nvidia-smi -q -x

With

import py3nvml.nvidia_smi as smi
print(smi.XmlDeviceQuery())

Differences from NVML

The py3nvml library consists of Python methods which wrap several NVML functions, implemented in a C shared library. Each function's use is the same, with the following exceptions:

  1. Instead of returning error codes, failing error codes are raised as Python exceptions, i.e. calls should be wrapped with exception handlers:
try:
    nvmlDeviceGetCount()
except NVMLError as error:
    print(error)
  2. C function output parameters are returned from the corresponding Python function as tuples, rather than requiring pointers. E.g. the C function:
nvmlReturn_t nvmlDeviceGetEccMode(nvmlDevice_t device,
                                  nvmlEnableState_t *current,
                                  nvmlEnableState_t *pending);

Becomes

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
(current, pending) = nvmlDeviceGetEccMode(handle)
  3. C structs are converted into Python classes. E.g. the C struct:
nvmlReturn_t DECLDIR nvmlDeviceGetMemoryInfo(nvmlDevice_t device,
                                             nvmlMemory_t *memory);
typedef struct nvmlMemory_st {
    unsigned long long total;
    unsigned long long free;
    unsigned long long used;
} nvmlMemory_t;

Becomes:

info = nvmlDeviceGetMemoryInfo(handle)
print("Total memory: {}MiB".format(info.total >> 20))
# will print:
#   Total memory: 5375MiB
print("Free memory: {}".format(info.free >> 20))
# will print:
#   Free memory: 5319MiB
print("Used memory: ".format(info.used >> 20))
# will print:
#   Used memory: 55MiB
  4. Python handles string buffer creation. E.g. the C function:
nvmlReturn_t nvmlSystemGetDriverVersion(char* version,
                                        unsigned int length);

Can be called like so:

version = nvmlSystemGetDriverVersion()
nvmlShutdown()

  5. All meaningful NVML constants and enums are exposed in Python. E.g. the constant NVML_TEMPERATURE_GPU is available under py3nvml.NVML_TEMPERATURE_GPU.
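
For instance, such a constant can be passed straight to the wrapped query functions (a small sketch; it assumes nvmlInit() has already been called and handle is a device handle obtained as above):

temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
print("GPU temperature: {} C".format(temp))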

The NVML_VALUE_NOT_AVAILABLE constant is not used. Instead, None is mapped to the field.
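
In practice that means checking a field for None rather than comparing it against the C sentinel. A hedged sketch (whether a given field can actually be unavailable depends on your device and driver; handle is a device handle as above):

util = nvmlDeviceGetUtilizationRates(handle)
if util.gpu is None:
    print("GPU utilization not reported for this device")
else:
    print("GPU utilization: {}%".format(util.gpu))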

Release Notes (for pynvml)

Version 2.285.0

  • Added new functions for NVML 2.285. See NVML documentation for more information.
  • Ported to support Python 3.0 and Python 2.0 syntax.
  • Added nvidia_smi.py tool as a sample app.

Version 3.295.0

  • Added new functions for NVML 3.295. See NVML documentation for more information.
  • Updated nvidia_smi.py tool - Includes additional error handling

Version 4.304.0

  • Added new functions for NVML 4.304. See NVML documentation for more information.
  • Updated nvidia_smi.py tool

Version 4.304.3

  • Fixing nvmlUnitGetDeviceCount bug

Version 5.319.0

  • Added new functions for NVML 5.319. See NVML documentation for more information.

Version 6.340.0

  • Added new functions for NVML 6.340. See NVML documentation for more information.

Version 7.346.0

  • Added new functions for NVML 7.346. See NVML documentation for more information.

Version 7.352.0

  • Added new functions for NVML 7.352. See NVML documentation for more information.

COPYRIGHT

Copyright (c) 2011-2015, NVIDIA Corporation. All rights reserved.

LICENSE

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the NVIDIA Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments
  • gpu_temp_max_gpu_threshold missing

    I just found out that the GPU Max Operating Temp, exported with the XML tag gpu_temp_max_gpu_threshold, is missing from py3nvml.

    Do you have any plan to add it?

    Also, another missing tag is the cuda_version.

    opened by leinardi 13
  • fix invalid escape sequences

Python 3.7 gives a deprecation notice on N\A because it sees it as an escape sequence. I think it was intended as N/A, so this is a proposal to fix the warning.

    opened by thekyz 3
  • nvmlDeviceGetMemoryInfo has weird result

Hi, when I use nvmlDeviceGetMemoryInfo to get the GPU memory usage, it returns an incorrect number, e.g. 16MB, while watch -n 0.01 nvidia-smi shows ~11G used. Example code below:

    py3nvml.nvmlInit()
    device_count = py3nvml.nvmlDeviceGetCount()
    assert gpuid < device_count
    handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
    mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
    if mem_info != 'N/A':
        print(mem_info)
        used = mem_info.bar1Used >> 20
        total = mem_info.bar1Total >> 20
    else:
        used = 0
        total = 0
    
    opened by knsong 2
  • video_clock tag missing

    Another tag missing is the video_clock:

    		<clocks>
    			<graphics_clock>300 MHz</graphics_clock>
    			<sm_clock>300 MHz</sm_clock>
    			<mem_clock>405 MHz</mem_clock>
    			<video_clock>540 MHz</video_clock>
    		</clocks>
    		<max_clocks>
    			<graphics_clock>2175 MHz</graphics_clock>
    			<sm_clock>2175 MHz</sm_clock>
    			<mem_clock>7000 MHz</mem_clock>
    			<video_clock>1950 MHz</video_clock>
    		</max_clocks>
    
    opened by leinardi 2
  • module 'string' has no attribute 'join'

    Hi, When I tried to do e.g. print(nvmlDeviceGetName(h)) I got the following error.

    File "/home/**/build/nvidia-ml-py/src/py3nvml/py3nvml/py3nvml.py", line 399, in __str__
        return self.__class__.__name__ + "(" + string.join(result, ", ") + ")"
    AttributeError: module 'string' has no attribute 'join'
    

    I'm sorry but I'm blind to Python and I have no idea. Could you investigate it?

    opened by ikfj 2
  • Add constraint on minimal graphics card memory size

The new parameter allows discarding graphics cards that don't have enough RAM capacity. The parameter value should be in MiB.

    Updates the documentation to reflect the new value in arguments.

    opened by m5imunovic 1
  • Fix for more than 8 GPUs

grab_gpus failed for me on a system with more than 8 GPUs. My fix is to use the same length (nvmlDeviceGetCount) for gpu_check and gpu_select, which allows all the GPUs to be grabbed.

    opened by hallbjorn 1
  • could not get the GPU MEM usage percent

We can see almost 100% GPU memory usage, but when testing the code we could only get '0'.

My test code:

    from py3nvml.py3nvml import *
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    result = nvmlDeviceGetUtilizationRates(handle)
    print(result.memory)  # 0

    opened by spenly 1
  • [bug] Error class in nvidia_smi.py

Great code. Could you please check the handleError call in nvidia_smi.py line 184? I suggest replacing val = handleError(NVML_ERROR_NOT_SUPPORTED); with val = handleError(NVMLError(NVML_ERROR_NOT_SUPPORTED)); because the current version gives me an error when comparing the value attribute in the error handler.

    BR/thupalo

    opened by thupalo 1
  • Not defined?

    Hi, so the module was working before but for some reason it keeps giving me undefined for these commands:

nvmlDeviceGetFanSpeed(nvmlDeviceGetHandleByIndex(0))
    nvmlDeviceGetTemperature(nvmlDeviceGetHandleByIndex(0), 0)

I've got an NVIDIA GTX 1080 on Windows 10 with Python 3.6.00. All I was trying to do was read my temperature and fan speeds.

    opened by sometimescool22 1
  • Question when using py3nvml

I tried to run the code in the Usage part, but it reports "ModuleNotFoundError: No module named 'py3nvml.pynvml'". How can I fix this?

    opened by ghost 1
  • the type of nvmlDeviceGetPciInfo(handle).busId is "bytes" not "str"

When I call the nvmlDeviceGetPciInfo(handle) function, the "busId" is "bytes", not "str":

     handle = nvmlDeviceGetHandleByIndex(i)
     devId = nvmlDeviceGetPciInfo(handle).busId
    

    The whole nvmlPciInfo_t is:

    nvmlPciInfo_t(busId: b'0000:00:0A.0', domain: 0x0000, bus: 0x00, device: 0x0A, pciDeviceId: 0x1EB810DE, pciSubSystemId: 0x12A210DE, reserved0: 0, reserved1: 0, reserved2: 0, reserved3: 0)
    

    This line converts c_info to "str", but "busId" is "bytes" https://github.com/fbcotter/py3nvml/blob/master/py3nvml/py3nvml.py#L2646

    Is it expected output?

    My test environment is python3.6

    opened by WangKunLoveReading 0
Releases
  • 0.2.4(Oct 11, 2019)

    Minor update.

    • Fixed some alignment issues with long PIDs in py3smi.
    • Added the ability to call py3nvml.grab_gpus with num_gpus=-1, which grabs all available GPUs. Previously you could do this by setting num_gpus to a large number, but that would throw a warning if it couldn't get all the requested GPUs.
  • 0.2.3(Mar 4, 2019)

    Same notes as the "Updates in Version 0.2.3" section above.
  • 0.2.1(Jun 27, 2018)

    Updated the script to use multiple small queries rather than one big XML query. It can now also handle a GPU falling off the bus, still displaying info for the remaining GPUs.

  • 0.2.0(May 17, 2018)
