Description
I am trying to integrate fbpic, a well-known CUDA code (based on Python + Numba) for laser-plasma simulation, with signac. The integration repo is signac-driven-fbpic.
I managed to run it successfully on a single GPU, via python3 src/project.py run from inside the signac folder, but if I add --parallel I get
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
The goal is to get 8 independent copies of fbpic (with different input params) running in parallel on the 8 NVIDIA P100 GPUs that are on the same machine.
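For reference, here is a minimal sketch of the pattern I have in mind, outside of signac-flow: one process per GPU, started with the spawn method so that no CUDA state is inherited from the parent, and pinned to a device via CUDA_VISIBLE_DEVICES before anything touches CUDA. The names gpu_for_task, run_one and launch are hypothetical, the fbpic call is omitted, and I am not claiming this is how signac-flow should implement --parallel.

```python
import multiprocessing as mp
import os


def gpu_for_task(task_index, n_gpus=8):
    """Map a task index to a GPU index (round-robin)."""
    return task_index % n_gpus


def run_one(task_index, params, n_gpus=8):
    """Hypothetical worker: pin one GPU, then start the CUDA work."""
    # This must happen before anything in this process initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_task(task_index, n_gpus))
    # The real simulation would be built here from `params`
    # (fbpic import and setup omitted in this sketch).
    print(f"task {task_index}: GPU {os.environ['CUDA_VISIBLE_DEVICES']}, params {params}")


def launch(param_sets, n_gpus=8):
    """Start one spawned process per parameter set and wait for all of them."""
    if len(param_sets) > n_gpus:
        raise ValueError("this sketch runs at most one task per GPU")
    # "spawn" starts fresh interpreters instead of forking the parent.
    ctx = mp.get_context("spawn")
    procs = [
        ctx.Process(target=run_one, args=(i, params, n_gpus))
        for i, params in enumerate(param_sets)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]
```

launch(...) would have to be called from under an if __name__ == "__main__": guard, since spawn re-imports the main module in each child. I used one process per task rather than a pool because a pool worker could pick up a second task after CUDA is already initialized, at which point changing CUDA_VISIBLE_DEVICES no longer has any effect.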
To reproduce
Clone the signac-driven-fbpic repo and follow the install instructions. Then go to the signac subfolder and run
conda activate signac-driven-fbpic
python3 src/init.py
python3 src/project.py run --parallel
Error output
(signac-driven-fbpic) [email protected]:~/Development/signac-driven-fbpic/signac$ python3 src/project.py run --parallel --show-traceback
Using environment configuration: UnknownEnvironment
Serialize tasks|##############################################################################################|100%
ERROR: Encountered error during program execution: 'CUDA initialized before forking'
Execute with '--show-traceback' or '--debug' to get more information.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2727, in _fork_with_serialization
project._fork(project._loads_op(operation))
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1467, in _fork
self._operation_functions[operation.name](operation.job)
File "src/project.py", line 172, in run_fbpic
verbose_level=2,
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/main.py", line 232, in __init__
n_guard, n_damp, None, exchange_period, use_all_mpi_ranks )
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/boundaries/boundary_communicator.py", line 267, in __init__
self.d_left_damp = cuda.to_device( self.left_damp )
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/api.py", line 103, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 683, in auto_device
devobj = from_array_like(obj, stream=stream)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 621, in from_array_like
writeback=ary, stream=stream, gpu_data=gpu_data)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 102, in __init__
gpu_data = devices.get_context().memalloc(self.alloc_size)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 697, in memalloc
self._attempt_allocation(allocator)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 680, in _attempt_allocation
allocator()
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 695, in allocator
driver.cuMemAlloc(byref(ptr), bytesize)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 290, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 324, in _check_error
raise CudaDriverError("CUDA initialized before forking")
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "src/project.py", line 238, in <module>
Project().main()
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2721, in main
_exit_or_raise()
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2689, in main
args.func(args)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2414, in _main_run
run()
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/legacy.py", line 193, in wrapper
return func(self, jobs=jobs, names=names, *args, **kwargs)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1597, in run
np=np, timeout=timeout, progress=progress)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1421, in run_operations
pool, cloudpickle, operations, progress, timeout)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1458, in _run_operations_in_parallel
result.get(timeout=timeout)
File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
Relevant numba link.
System configuration
- Operating System: Ubuntu 16.04
- Version of Python: 3.6.8
- Version of signac: 1.1.0
- Version of signac-flow: 0.7.1
- NVIDIA Driver Version: 410.72