Add a third GPU backend targeting AMD GPUs via HIP (ROCm). RDNA GPUs use wave32, matching CUDA's warp size, so the existing kernel logic in kernel_v1_impl.cuh is shared between CUDA and HIP with minimal #ifdef guards. The cooperative kernel launch pattern is also identical.

Changes:
- kernel_v1_impl.cuh: conditional include for HIP cooperative_groups
- event_buffer.h: recognize __HIP_PLATFORM_AMD__ alongside __CUDACC__
- kernel_v1.hip.cpp: HIP kernel launch wrapper with warp size assertion
- Cargo.toml: add hip = ["ulib/hip"] feature
- build.rs: compile kernel_v1.hip.cpp with hipcc, link amdhip64
- loom.rs: add sim_hip() function and #[cfg(feature = "hip")] dispatch
- vendor/eda-infra-rs: HIP support in ucc (cl_hip, bindgen) and ulib (Device::HIP, HipBuffer, FFI bindings, memfill.hip.cpp)
- CLAUDE.md: document HIP build commands and workflow

The hip feature is fully optional. Existing CUDA/Metal/CPU-only builds are unaffected.

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
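The loom.rs dispatch named above can be sketched as a feature-gated fallthrough. This is an illustrative shape only, not the actual loom.rs code; the function bodies are placeholders.

```rust
// Illustrative sketch of the #[cfg(feature = "hip")] dispatch pattern;
// sim_hip() here is a placeholder, not the real implementation.
#[cfg(feature = "hip")]
fn sim_hip() -> &'static str {
    "hip" // would launch the HIP kernel via kernel_v1.hip.cpp
}

fn simulate() -> &'static str {
    #[cfg(feature = "hip")]
    {
        return sim_hip();
    }
    // Fall back to the CPU path when no GPU feature is enabled.
    "cpu"
}

fn main() {
    println!("{}", simulate());
}
```

Because the hip feature is purely additive, builds without it compile this down to the CPU path only, which is why existing CUDA/Metal/CPU-only builds stay unaffected.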
Add a hip-on-nvidia CI job that validates the HIP code path on the existing nvidia-runner-1 using HIP's NVIDIA backend. This installs ROCm/HIP packages alongside the CUDA toolkit and runs the same timing test and X-propagation tests as the CUDA job. Also update cl_hip() in ucc to detect HIP_PLATFORM=nvidia and skip AMD-specific --offload-arch flags (which would fail with nvcc). Supports UCC_HIP_TARGETS=none for explicit opt-out. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
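The platform check in cl_hip() reduces to pure flag logic, which can be sketched as below. The function name and the "gfx1100" default target are assumptions for illustration; the real cl_hip() lives in ucc.

```rust
// Hypothetical sketch: decide which --offload-arch flags cl_hip() should
// emit. On HIP_PLATFORM=nvidia (or UCC_HIP_TARGETS=none) emit none, since
// hipcc would forward them to nvcc, which rejects them.
fn offload_arch_flags(hip_platform: Option<&str>, ucc_hip_targets: Option<&str>) -> Vec<String> {
    if hip_platform == Some("nvidia") || ucc_hip_targets == Some("none") {
        return Vec::new();
    }
    // Defaulting to "gfx1100" is an assumption for this sketch.
    let targets = ucc_hip_targets.unwrap_or("gfx1100");
    targets
        .split(',')
        .map(|t| format!("--offload-arch={}", t))
        .collect()
}

fn main() {
    assert!(offload_arch_flags(Some("nvidia"), None).is_empty());
    println!("{:?}", offload_arch_flags(None, Some("gfx90a,gfx1100")));
}
```

Treating UCC_HIP_TARGETS=none as an explicit opt-out keeps the NVIDIA detection and the manual override on the same code path.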
Use apt-key add (legacy but reliable) plus keyserver fallback for the specific signing key. The signed-by approach with dearmored keys failed on Ubuntu 24.04 (noble). Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
hip-runtime-nvidia pulls in nvidia-kernel-common which conflicts with the GPU runner's existing nvidia-kernel-common-570-server. Hold all existing nvidia-* packages before installing ROCm/HIP packages to prevent apt from trying to resolve driver conflicts. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
The hip-runtime-nvidia package depends on nvidia driver packages that conflict with the -570-server variants on the GPU runner. Use apt download + dpkg --force-depends to install just the HIP packages we need without resolving the full driver dependency tree. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
apt-get download without sudo cannot read root-owned apt config files (Permission denied on rocm.list). Use sudo and a temp dir. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
apt-get download fails completely if any listed package is missing. Download each HIP package individually with fallback for packages that don't exist in the repo (hip-runtime-nvidia-dev, hip-base). Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
…ions

Three fixes for HIP backend:
1. cl_hip() NVIDIA backend: generate a wrapper script around hipcc that strips -ffunction-sections/-fdata-sections before forwarding to hipcc. The cc crate adds these for clang-family compilers, but when hipcc wraps nvcc (HIP_PLATFORM=nvidia) they get passed through and nvcc rejects them.
2. Fix swapped hipMemcpy direction constants in copy() — (HIP,CPU) is host-to-device, (CPU,HIP) is device-to-host.
3. Refactor kernel_v1.hip.cpp warp size validation to a one-time check instead of checking on every kernel launch.

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
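The direction fix in item 2 reduces to a (dst, src) lookup. The sketch below is illustrative: Device is a stand-in for ulib's device type, and the numeric constants mirror HIP's hipMemcpyKind enum values.

```rust
// Sketch of the corrected copy() direction logic. Device stands in for
// ulib's device type; constants mirror HIP's hipMemcpyKind enum.
#[derive(Clone, Copy)]
enum Device {
    Cpu,
    Hip,
}

const HIP_MEMCPY_HOST_TO_DEVICE: u32 = 1;
const HIP_MEMCPY_DEVICE_TO_HOST: u32 = 2;
const HIP_MEMCPY_DEVICE_TO_DEVICE: u32 = 3;

// copy(dst, src): (HIP, CPU) is host-to-device, (CPU, HIP) is device-to-host.
fn memcpy_kind(dst: Device, src: Device) -> u32 {
    match (dst, src) {
        (Device::Hip, Device::Cpu) => HIP_MEMCPY_HOST_TO_DEVICE,
        (Device::Cpu, Device::Hip) => HIP_MEMCPY_DEVICE_TO_HOST,
        (Device::Hip, Device::Hip) => HIP_MEMCPY_DEVICE_TO_DEVICE,
        (Device::Cpu, Device::Cpu) => 0, // hipMemcpyHostToHost
    }
}

fn main() {
    assert_eq!(memcpy_kind(Device::Hip, Device::Cpu), HIP_MEMCPY_HOST_TO_DEVICE);
    assert_eq!(memcpy_kind(Device::Cpu, Device::Hip), HIP_MEMCPY_DEVICE_TO_HOST);
    println!("ok");
}
```

Swapping the two GPU-involving arms is exactly the bug described: data still moves, but in the wrong direction, which is why it surfaces as corrupted results rather than an error.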
The cc crate adds -Wall/-Wextra for clang-family compilers. hipcc passes them directly to nvcc which rejects them. Disable cc-crate warnings and add -Xcompiler -Wall manually in the wrapper. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
The linker couldn't find libamdhip64 because /opt/rocm/lib wasn't in the search path. Add cargo:rustc-link-search in both build.rs files using ROCM_PATH env var (defaulting to /opt/rocm). Also add LD_LIBRARY_PATH and LIBRARY_PATH to CI environment. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
On NVIDIA backend, libamdhip64.so doesn't exist — HIP functions are header-only CUDA wrappers. Compile thin hip_ffi_* wrapper functions with hipcc so they resolve to the correct runtime regardless of platform. Link amdhip64 on AMD, cudart on NVIDIA. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
When HIP_PLATFORM=nvidia, we link cudart instead of amdhip64. Add CUDA_PATH/lib64 to the linker search path so the linker can find libcudart on the CI runner. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
hipDeviceGetAttribute wraps cuDeviceGetAttribute from libcuda.so. Add cuda driver library to link list and stubs/ to search path. Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
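Taken together, the last few fixes amount to platform-dependent link directives in build.rs. The pure function below is a sketch of that decision, not the real build.rs; path layouts under ROCM_PATH and CUDA_PATH follow the defaults described in the commits.

```rust
// Sketch of the per-platform link directives from the commits above.
// On HIP_PLATFORM=nvidia, HIP is header-only CUDA wrappers: link cudart
// (runtime) plus cuda (driver, resolvable via lib64/stubs at link time)
// instead of libamdhip64.
fn hip_link_directives(hip_platform: &str, rocm_path: &str, cuda_path: &str) -> Vec<String> {
    let mut d = Vec::new();
    if hip_platform == "nvidia" {
        d.push(format!("cargo:rustc-link-search=native={}/lib64", cuda_path));
        d.push(format!("cargo:rustc-link-search=native={}/lib64/stubs", cuda_path));
        d.push("cargo:rustc-link-lib=cudart".to_string());
        d.push("cargo:rustc-link-lib=cuda".to_string());
    } else {
        d.push(format!("cargo:rustc-link-search=native={}/lib", rocm_path));
        d.push("cargo:rustc-link-lib=amdhip64".to_string());
    }
    d
}

fn main() {
    let nv = hip_link_directives("nvidia", "/opt/rocm", "/usr/local/cuda");
    assert!(nv.contains(&"cargo:rustc-link-lib=cuda".to_string()));
    let amd = hip_link_directives("amd", "/opt/rocm", "/usr/local/cuda");
    assert!(amd.contains(&"cargo:rustc-link-lib=amdhip64".to_string()));
    println!("ok");
}
```

Compiling the thin hip_ffi_* wrappers with hipcc is what makes this switch safe: the same Rust-visible symbols resolve against whichever runtime the chosen libraries provide.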
Summary
- Shared kernel_v1_impl.cuh between CUDA and HIP with minimal #ifdef guards — RDNA wave32 matches CUDA's warp size so kernel logic ports directly
- Added cl_hip() to ucc build system with NVIDIA backend support (HIP_PLATFORM=nvidia)
- FFI bindings to the HIP runtime (hipMalloc, hipFree, hipMemcpy, etc.) — no external Rust crate needed
- Added Device::HIP(u8) variant to ulib with full UVec<T> allocation/copy/sync support
- Added sim_hip() dispatch in loom binary with automatic device selection
- CI coverage via HIP_PLATFORM=nvidia

New files
- vendor/eda-infra-rs/ulib/src/hip_ffi.rs
- vendor/eda-infra-rs/ulib/csrc/memfill.hip.cpp
- csrc/kernel_v1.hip.cpp

Build & test
CPU-only, Metal, and benchmark builds are verified unaffected.
Test plan
- hip-on-nvidia job passes (HIP compiled with NVIDIA backend on existing GPU runner)
- cargo build -r --bin loom
- cargo build -r --features metal --bin loom
- cargo bench --bench event_buffer
- --check-with-cpu for correctness validation