When running the Segger CLI to segment Xenium v4.0.0 data on a machine with two separate GPUs, I encounter a cascade of CUDA illegal-memory-access errors when both GPUs are visible. This appears to be caused by automatic distributed process spawning; limiting execution to a single GPU avoids the issue.
Steps to reproduce
- Run Segger CLI on a 2× NVIDIA GeForce RTX 4090 machine with the default environment.
- Observe output:
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
...
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------
- Segger crashes with multiple errors such as:
RuntimeError: parallel_for: failed to synchronize: cudaErrorIllegalAddress
During handling of the above exception, another exception occurred:
RuntimeError: CUDA error: an illegal memory access was encountered
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS
- Limiting the session to a single GPU before launching resolves the issue:
export CUDA_VISIBLE_DEVICES=0
Output:
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
...
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Execution completes without errors.
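For completeness, the same workaround can be applied programmatically when Segger is driven from a Python script rather than the shell. This is a minimal sketch, not Segger's API; the key point is that `CUDA_VISIBLE_DEVICES` must be set before any CUDA-aware library (torch, cupy, cudf, ...) is imported, because these libraries enumerate devices when they initialize, so setting the variable later has no effect.

```python
import os

# Hide the second GPU from this process. Must run before importing
# torch / cupy / cudf, which enumerate devices at initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Safe to import CUDA-aware libraries from here on, e.g.:
# import torch
# import cupy
```

With only one visible device, Lightning no longer spawns a second DDP rank, which matches the single-GPU log output above.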
Environment
- Python: 3.11.14
- Segger: 0.2.0
- PyTorch: 2.5.0+cu121
- Lightning: 2.6.0
- CUDA: 12.2.0
- NVIDIA Drivers: 535.247.01
- GPU: 2 × NVIDIA GeForce RTX 4090
Relevant packages and versions:
| Package | Version |
| --- | --- |
| torch_scatter | 2.1.2+pt25cu121 |
| cuml-cu12 | 25.4.0 |
| cugraph-cu12 | 25.4.1 |
| cuspatial-cu12 | 25.4.0 |
| cudf-cu12 | 25.4.0 |
| cupy-cuda12x | 13.6.0 |