From 8cf46a52de8025e4034ca5166c89b468616111b9 Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 14 Jan 2025 08:05:36 -0600 Subject: [PATCH] [AMDGPU][LTO] Disable adding `AMDGPULowerModuleLDSPass` to LTO This is a workaround to resolve https://ontrack-internal.amd.com/browse/SWDEV-502923 The AMDGPULowerModuleLDS pass is run 2 times. Both times are run on the combined/linked module for the entire input during LTO. The first run happens when lto::backend executes the the optimization pipeline (through calling opt) (https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTOBackend.cpp#L551) and the second run happens when lto::backend later calls codegen (https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTOBackend.cpp#L558). A crash happens in the second run because between the first and second runs, a new GV is added to the combined module: ``` @llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds = internal addrspace(3) global %llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds.t poison, align 16, !absolute_symbol !0 ``` This is the only absolute variable and it is not there yet during the first run of AMDGPULowerModuleLDS. In addition to the above absolute variable, we have other GVs that remain in the combined module (during the second run of the pass): ``` @_ZZ35__kmpc_nvptx_teams_reduce_nowait_v2E14ChunkTeamCount = internal unnamed_addr addrspace(3) global i32 undef, align 4 @_ZZ35__kmpc_nvptx_teams_reduce_nowait_v2E5Bound = internal unnamed_addr addrspace(3) global i32 undef, align 4 ``` which are not absolute, causing the issue (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp#L250). Disussing this with Jon Chesterfield, running the pass twice is actually incorrect (it should be run only once during code-gen, the first run of the pass creates "@llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds" which makes the second run fail). Commenting out these 2 lines "fixes" the issue: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L863-L864. We raised an issue upstream to further discuss proper fixes: https://github.com/llvm/llvm-project/issues/122891. --- llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 10 ++++++++-- llvm/test/CodeGen/AMDGPU/lto-lower-module-lds.ll | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 7256eec89008a5..28d5716bb626db 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -856,8 +856,14 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { // module is partitioned for codegen. if (EnableSwLowerLDS) PM.addPass(AMDGPUSwLowerLDSPass(*this)); - if (EnableLowerModuleLDS) - PM.addPass(AMDGPULowerModuleLDSPass(*this)); + + // Most likely, adding this pass here is incorrect. Commenting out on + // ATD for now until we resolve the issue upstream. See: + // https://github.com/llvm/llvm-project/issues/122891 for the issue and + // https://ontrack-internal.amd.com/browse/SWDEV-502923?focusedId=17904500&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17904500 + // for an explanation why this is likely wrong. + //if (EnableLowerModuleLDS) + // PM.addPass(AMDGPULowerModuleLDSPass(*this)); if (Level != OptimizationLevel::O0) { // Do we really need internalization in LTO? if (InternalizeSymbols) { diff --git a/llvm/test/CodeGen/AMDGPU/lto-lower-module-lds.ll b/llvm/test/CodeGen/AMDGPU/lto-lower-module-lds.ll index f1d946376afe06..094f2da6b27f53 100644 --- a/llvm/test/CodeGen/AMDGPU/lto-lower-module-lds.ll +++ b/llvm/test/CodeGen/AMDGPU/lto-lower-module-lds.ll @@ -1,3 +1,4 @@ +; XFAIL: * ; Default O0 ; RUN: opt -mtriple=amdgcn-- -mcpu=gfx1030 %s -o %t.bc