Context
- Workflow: Release portable Linux PyTorch Wheels
- Workflow file: .github/workflows/release_portable_linux_pytorch_wheels.yml
- Failing run: ↗ View run
- Platform: Linux
- Impacted Arch: gfx1152, gfx1150, gfx1153, gfx908, gfx1151, gfx90a, gfx950-dcgpu, gfx94X-dcgpu
- PyTorch Versions: nightly, release/2.12, release/2.11
- Python Versions: all (3.10, 3.11, 3.12, 3.13, 3.14)
- rocm_version: 7.14.0a20260609
- RCCL version bundled in SDK: 2.29.7
Summary
All 15 build jobs for gfx1152 fail at the PyTorch HIP compile stage. NCCL_DEVICE_HAS_REDUCE_COPY is defined whenever the NCCL version is at least 2.29.7, a threshold the bundled RCCL 2.29.7 meets exactly, but the definition lacks the !defined(USE_ROCM) guard that was already applied to the related NCCL_HAS_SYMMEM_DEVICE_SUPPORT macro. As a result, the compiler enters an #ifdef NCCL_DEVICE_HAS_REDUCE_COPY block that uses NVIDIA-internal device-side NCCL types (ncclDevComm, ncclCoopCta, ncclDevCommRequirements, etc.) that RCCL does not expose.
Error
nightly + release/2.12 — fails in nccl_reduce_scatter_offset.cu:
FAILED: [code=1] caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu.o
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:78:5: error: unknown type name 'ncclDevComm'; did you mean 'ncclComm'?
78 | ncclDevComm devComm) {
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:88:9: error: unknown type name 'ncclCoopCta'
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:182:30: error: no member named 'get_devcomm' in 'c10d::symmetric_memory::NCCLDevCommManager'
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:184:5: error: unknown type name 'ncclDevCommRequirements'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1152.
release/2.11 — same root cause, fails in nccl_extension.cu:
FAILED: .../torch_hip_generated_nccl_extension.cu.o
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/nccl_extension.cu:317:19: error: use of undeclared identifier 'NCCLDevCommManager'
317 | auto& manager = NCCLDevCommManager::get(device);
Suspected Root Cause
In torch/csrc/distributed/c10d/symm_mem/nccl_dev_cap.hpp, the macro NCCL_DEVICE_HAS_REDUCE_COPY is defined whenever NCCL_VERSION_CODE is at least 2.29.7, without a !defined(USE_ROCM) guard:
// Correctly guarded (added Feb 2026):
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 28, 0)
#if !defined(USE_ROCM)
#define NCCL_HAS_SYMMEM_DEVICE_SUPPORT
#include <nccl_device.h> // defines ncclDevComm, ncclCoopCta, etc.
#endif
#endif
// Missing guard (added Apr 2026):
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
#define NCCL_DEVICE_HAS_REDUCE_COPY // ← no !defined(USE_ROCM)
#endif
Because RCCL 2.29.7 satisfies the version threshold, NCCL_DEVICE_HAS_REDUCE_COPY is defined on ROCm, but nccl_device.h is not included (correctly blocked by !defined(USE_ROCM)), so the device-side types it defines are absent.
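The inconsistent macro state described above can be reproduced outside PyTorch with a small mock. This is only a sketch: the NCCL_VERSION encoding is assumed to match the one in nccl.h for releases after 2.8, NCCL_VERSION_CODE and USE_ROCM are set by hand instead of coming from the RCCL headers and the HIP build flags, and the helper functions are hypothetical names used for illustration.

```cpp
// Sketch only: mock macros stand in for <nccl.h> and the HIP build flags.
// Assumed encoding for NCCL/RCCL releases after 2.8: X*10000 + Y*100 + Z.
#define NCCL_VERSION(X, Y, Z) ((X) * 10000 + (Y) * 100 + (Z))
#define NCCL_VERSION_CODE NCCL_VERSION(2, 29, 7)  // RCCL 2.29.7 from the SDK
#define USE_ROCM 1                                // defined for ROCm builds

// Guard logic as quoted from nccl_dev_cap.hpp. The #include <nccl_device.h>
// line is omitted because this mock has no NCCL headers available.
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 28, 0)
#if !defined(USE_ROCM)
#define NCCL_HAS_SYMMEM_DEVICE_SUPPORT
#endif
#endif

#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
#define NCCL_DEVICE_HAS_REDUCE_COPY  // no !defined(USE_ROCM) guard
#endif

// Hypothetical helpers that expose the resulting macro state as constants.
constexpr bool has_symmem_device_support() {
#ifdef NCCL_HAS_SYMMEM_DEVICE_SUPPORT
  return true;
#else
  return false;
#endif
}

constexpr bool has_reduce_copy() {
#ifdef NCCL_DEVICE_HAS_REDUCE_COPY
  return true;
#else
  return false;
#endif
}

// True when reduce-copy code paths are enabled although the device-side
// header that declares their types was never included.
constexpr bool guards_inconsistent() {
  return has_reduce_copy() && !has_symmem_device_support();
}

static_assert(guards_inconsistent(),
              "with USE_ROCM and version 2.29.7 the two guards disagree");
```

Removing the USE_ROCM definition from the mock makes both macros defined, so the static_assert then fails; that corresponds to the CUDA configuration, where the two guards agree.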
Suggested Fix
Add a !defined(USE_ROCM) guard around NCCL_DEVICE_HAS_REDUCE_COPY in torch/csrc/distributed/c10d/symm_mem/nccl_dev_cap.hpp in upstream pytorch (both nightly/main and ROCm/pytorch release/2.12):
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
+#if !defined(USE_ROCM)
#define NCCL_DEVICE_HAS_REDUCE_COPY
+#endif
#endif
This mirrors the pattern already applied to NCCL_HAS_SYMMEM_DEVICE_SUPPORT.
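As a rough check of the proposed guard, the mock below applies it under the same hand-set assumptions (NCCL_VERSION encoding assumed to match nccl.h for releases after 2.8, NCCL_VERSION_CODE and USE_ROCM defined manually). The #error consistency check, kReduceCopyEnabled, and reduce_copy_enabled() are hypothetical additions for illustration and are not part of PyTorch.

```cpp
// Sketch only: mock macros stand in for <nccl.h> and the HIP build flags.
#define NCCL_VERSION(X, Y, Z) ((X) * 10000 + (Y) * 100 + (Z))
#define NCCL_VERSION_CODE NCCL_VERSION(2, 29, 7)  // RCCL 2.29.7 from the SDK
#define USE_ROCM 1                                // defined for ROCm builds

// Existing, correctly guarded macro (header include omitted in this mock).
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 28, 0)
#if !defined(USE_ROCM)
#define NCCL_HAS_SYMMEM_DEVICE_SUPPORT
#endif
#endif

// Proposed form: the same !defined(USE_ROCM) guard around the newer macro.
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
#if !defined(USE_ROCM)
#define NCCL_DEVICE_HAS_REDUCE_COPY
#endif
#endif

// Hypothetical consistency check: fail early with a readable message if the
// reduce-copy path is ever enabled without the device-side header.
#if defined(NCCL_DEVICE_HAS_REDUCE_COPY) && !defined(NCCL_HAS_SYMMEM_DEVICE_SUPPORT)
#error "NCCL_DEVICE_HAS_REDUCE_COPY requires NCCL_HAS_SYMMEM_DEVICE_SUPPORT"
#endif

// Macro state of this translation unit as a constant.
#ifdef NCCL_DEVICE_HAS_REDUCE_COPY
constexpr bool kReduceCopyEnabled = true;
#else
constexpr bool kReduceCopyEnabled = false;
#endif

// Model of the proposed guard as a plain function, so that several
// configurations can be checked within one translation unit.
constexpr bool reduce_copy_enabled(int version_code, bool use_rocm) {
  return version_code >= NCCL_VERSION(2, 29, 7) && !use_rocm;
}

static_assert(!kReduceCopyEnabled, "ROCm build must not enable reduce-copy");
static_assert(!reduce_copy_enabled(NCCL_VERSION(2, 29, 7), true),
              "ROCm with RCCL 2.29.7 stays disabled");
static_assert(reduce_copy_enabled(NCCL_VERSION(2, 29, 7), false),
              "CUDA with NCCL 2.29.7 stays enabled");
```

With the guard in place, the ROCm configuration leaves both macros undefined, so the #ifdef NCCL_DEVICE_HAS_REDUCE_COPY blocks in nccl_reduce_scatter_offset.cu would be skipped, while CUDA builds at or above the threshold behave as before.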
Full Logs
- py 3.11, torch nightly
- py 3.12, torch release/2.12
- py 3.11, torch release/2.11
- py 3.10, torch nightly
- py 3.13, torch nightly
- py 3.10, torch release/2.12
- py 3.10, torch release/2.11
- py 3.12, torch release/2.11
- py 3.14, torch release/2.11
- py 3.13, torch release/2.12
- py 3.14, torch nightly
- py 3.12, torch nightly
- py 3.14, torch release/2.12
- py 3.13, torch release/2.11
- py 3.14, torch release/2.11