[Issue] Linux PyTorch : build fails — ncclDevComm undeclared in nccl_reduce_scatter_offset.cu (NCCL_DEVICE_HAS_REDUCE_COPY not guarded for USE_ROCM) #5714

@TriveniTadapaneni

Context

  • Workflow: Release portable Linux PyTorch Wheels
  • Workflow file: .github/workflows/release_portable_linux_pytorch_wheels.yml
  • Failing run: ↗ View run
  • Platform: Linux
  • Impacted Arch: gfx1152, gfx1150, gfx1153, gfx908, gfx1151, gfx90a, gfx950-dcgpu, gfx94X-dcgpu
  • PyTorch Versions: nightly, release/2.12, release/2.11
  • Python Versions: all (3.10, 3.11, 3.12, 3.13, 3.14)
  • rocm_version: 7.14.0a20260609
  • RCCL version bundled in SDK: 2.29.7

Summary

All 15 build jobs for gfx1152 fail at the PyTorch HIP compile stage. NCCL_DEVICE_HAS_REDUCE_COPY is defined whenever the NCCL version is at least 2.29.7, a threshold that the bundled RCCL 2.29.7 meets, but the definition lacks the !defined(USE_ROCM) guard already applied to the related NCCL_HAS_SYMMEM_DEVICE_SUPPORT macro. As a result, the compiler enters #ifdef NCCL_DEVICE_HAS_REDUCE_COPY blocks that use NVIDIA-internal device-side NCCL types (ncclDevComm, ncclCoopCta, ncclDevCommRequirements, etc.) that RCCL does not expose.

Error

nightly + release/2.12 — fails in nccl_reduce_scatter_offset.cu:

FAILED: [code=1] caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu.o
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:78:5: error: unknown type name 'ncclDevComm'; did you mean 'ncclComm'?
   78 |     ncclDevComm devComm) {
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:88:9: error: unknown type name 'ncclCoopCta'
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:182:30: error: no member named 'get_devcomm' in 'c10d::symmetric_memory::NCCLDevCommManager'
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/ops/nccl_reduce_scatter_offset.cu:184:5: error: unknown type name 'ncclDevCommRequirements'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1152.

release/2.11 — same root cause, fails in nccl_extension.cu:

FAILED: .../torch_hip_generated_nccl_extension.cu.o
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/torch/csrc/distributed/c10d/symm_mem/nccl_extension.cu:317:19: error: use of undeclared identifier 'NCCLDevCommManager'
  317 |   auto& manager = NCCLDevCommManager::get(device);

Suspected Root Cause

In torch/csrc/distributed/c10d/symm_mem/nccl_dev_cap.hpp, the macro NCCL_DEVICE_HAS_REDUCE_COPY is defined at NCCL ≥ 2.29.7 without a !defined(USE_ROCM) guard:

// Correctly guarded (added Feb 2026):
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 28, 0)
#if !defined(USE_ROCM)
#define NCCL_HAS_SYMMEM_DEVICE_SUPPORT
#include <nccl_device.h>   // defines ncclDevComm, ncclCoopCta, etc.
#endif
#endif

// Missing guard (added Apr 2026):
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
#define NCCL_DEVICE_HAS_REDUCE_COPY   // ← no !defined(USE_ROCM)
#endif

Because RCCL 2.29.7 satisfies the version threshold, NCCL_DEVICE_HAS_REDUCE_COPY is defined on ROCm, but nccl_device.h is not included (correctly blocked by !defined(USE_ROCM)), so the device-side types it defines are absent.
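
The mismatch described above can be reproduced without any NCCL or RCCL headers. The following is a minimal sketch, not the actual contents of nccl_dev_cap.hpp: every REPRO_* macro and kRepro* constant is a hypothetical stand-in, and the version encoding is a simplified assumption chosen only so the comparisons behave as in the issue.

```cpp
// Standalone reproduction of the guard mismatch. No NCCL or RCCL headers are
// needed; every REPRO_* name is a hypothetical stand-in for the real macro.

#define REPRO_USE_ROCM 1  // pretend this translation unit is a ROCm build
#define REPRO_VERSION(x, y, z) ((x) * 10000 + (y) * 100 + (z))  // simplified encoding
#define REPRO_VERSION_CODE REPRO_VERSION(2, 29, 7)  // RCCL version from the issue

// Guarded: skipped on ROCm, so the device-side types are never declared.
#if REPRO_VERSION_CODE >= REPRO_VERSION(2, 28, 0)
#if !defined(REPRO_USE_ROCM)
#define REPRO_HAS_SYMMEM_DEVICE_SUPPORT
#endif
#endif

// Unguarded: defined on ROCm as soon as the version threshold is met.
#if REPRO_VERSION_CODE >= REPRO_VERSION(2, 29, 7)
#define REPRO_DEVICE_HAS_REDUCE_COPY
#endif

// Mirror the macro state into constants so it can be checked at compile time.
#ifdef REPRO_HAS_SYMMEM_DEVICE_SUPPORT
constexpr bool kReproHasSymmemDeviceSupport = true;
#else
constexpr bool kReproHasSymmemDeviceSupport = false;
#endif

#ifdef REPRO_DEVICE_HAS_REDUCE_COPY
constexpr bool kReproDeviceHasReduceCopy = true;
#else
constexpr bool kReproDeviceHasReduceCopy = false;
#endif

// True when code gated on the reduce-copy macro would reference types that
// were never declared, which is the state the build errors point to.
constexpr bool repro_guards_mismatch() {
  return kReproDeviceHasReduceCopy && !kReproHasSymmemDeviceSupport;
}

static_assert(repro_guards_mismatch(),
              "expected the mismatch when REPRO_USE_ROCM is defined");
```

Removing the REPRO_USE_ROCM definition makes both stand-in macros defined together, so the static_assert then fails. That is the consistent state a CUDA build would see.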

Suggested Fix

Add a !defined(USE_ROCM) guard around the NCCL_DEVICE_HAS_REDUCE_COPY definition in torch/csrc/distributed/c10d/symm_mem/nccl_dev_cap.hpp in upstream pytorch (both nightly/main and ROCm/pytorch release/2.12):

 #if NCCL_VERSION_CODE >= NCCL_VERSION(2, 29, 7)
+#if !defined(USE_ROCM)
 #define NCCL_DEVICE_HAS_REDUCE_COPY
+#endif
 #endif

This mirrors the pattern already applied to NCCL_HAS_SYMMEM_DEVICE_SUPPORT.
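
To illustrate the effect of the guard, here is a minimal standalone sketch with the guard in place. It is an assumption-laden model rather than the real header: the FIXED_* macros and kFixed* constants are hypothetical stand-ins, the version encoding is simplified, and the #error tripwire is an optional extra that is not part of the suggested fix.

```cpp
// Standalone sketch of the proposed guard. FIXED_* names are hypothetical
// stand-ins; the real macros live in nccl_dev_cap.hpp.

#define FIXED_USE_ROCM 1  // pretend this translation unit is a ROCm build
#define FIXED_VERSION(x, y, z) ((x) * 10000 + (y) * 100 + (z))  // simplified encoding
#define FIXED_VERSION_CODE FIXED_VERSION(2, 29, 7)  // RCCL version from the issue

#if FIXED_VERSION_CODE >= FIXED_VERSION(2, 28, 0)
#if !defined(FIXED_USE_ROCM)
#define FIXED_HAS_SYMMEM_DEVICE_SUPPORT
#endif
#endif

#if FIXED_VERSION_CODE >= FIXED_VERSION(2, 29, 7)
#if !defined(FIXED_USE_ROCM)  // the added guard
#define FIXED_DEVICE_HAS_REDUCE_COPY
#endif
#endif

// Optional tripwire (an illustration only, not part of the suggested fix):
// fail early if the capability macro is ever enabled without device support.
#if defined(FIXED_DEVICE_HAS_REDUCE_COPY) && !defined(FIXED_HAS_SYMMEM_DEVICE_SUPPORT)
#error "reduce-copy capability enabled without device-side NCCL support"
#endif

// Mirror the macro state into constants so it can be checked at compile time.
#ifdef FIXED_HAS_SYMMEM_DEVICE_SUPPORT
constexpr bool kFixedHasSymmemDeviceSupport = true;
#else
constexpr bool kFixedHasSymmemDeviceSupport = false;
#endif

#ifdef FIXED_DEVICE_HAS_REDUCE_COPY
constexpr bool kFixedDeviceHasReduceCopy = true;
#else
constexpr bool kFixedDeviceHasReduceCopy = false;
#endif

// With the guard, the two capability flags always agree.
static_assert(kFixedDeviceHasReduceCopy == kFixedHasSymmemDeviceSupport,
              "capability flags should agree once both are guarded");
```

With FIXED_USE_ROCM defined, both flags are false, so code gated on the reduce-copy macro is skipped instead of referencing undeclared types. Without it, both flags are true.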
