Skip to content

Move RemoveRedundantDbgInstrs outside of inner loop in JumpThreading #123008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 22, 2025

Conversation

huangjd
Copy link
Contributor

@huangjd huangjd commented Jan 15, 2025

This cleanup action only needs to be performed once when the entire optimization is converged. Doing it in every iteration has a very high time-complexity, as it queries every dbg value in a dense map

Compare before and after for one internal source file with many basic blocks

image
image

90% reduction in this extreme case.

…ading

This cleanup action only needs to be performed once when the entire optimization
is converged. Doing it in every iteration has a very high
time-complexity.
@llvmbot
Copy link
Member

llvmbot commented Jan 15, 2025

@llvm/pr-subscribers-llvm-transforms

Author: William Huang (huangjd)

Changes

This cleanup action only needs to be performed once when the entire optimization is converged. Doing it in every iteration has a very high time-complexity, as it queries every dbg value in a dense map

Compare before and after for one internal source file with many basic blocks

image
image

>90% reduction in this extreme case.


Full diff: https://github.com/llvm/llvm-project/pull/123008.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/JumpThreading.cpp (+7-6)
diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 300a564e222e16..aae54d9763789b 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -330,11 +330,6 @@ bool JumpThreadingPass::runImpl(Function &F_, FunctionAnalysisManager *FAM_,
       while (processBlock(&BB)) // Thread all of the branches we can over BB.
         Changed = ChangedSinceLastAnalysisUpdate = true;
 
-      // Jump threading may have introduced redundant debug values into BB
-      // which should be removed.
-      if (Changed)
-        RemoveRedundantDbgInstrs(&BB);
-
       // Stop processing BB if it's the entry or is now deleted. The following
       // routines attempt to eliminate BB and locating a suitable replacement
       // for the entry is non-trivial.
@@ -366,7 +361,6 @@ bool JumpThreadingPass::runImpl(Function &F_, FunctionAnalysisManager *FAM_,
             // detect and transform nested loops later.
             !LoopHeaders.count(&BB) && !LoopHeaders.count(Succ) &&
             TryToSimplifyUncondBranchFromEmptyBlock(&BB, DTU.get())) {
-          RemoveRedundantDbgInstrs(Succ);
           // BB is valid for cleanup here because we passed in DTU. F remains
           // BB's parent until a DTU->getDomTree() event.
           LVI->eraseBlock(&BB);
@@ -377,6 +371,13 @@ bool JumpThreadingPass::runImpl(Function &F_, FunctionAnalysisManager *FAM_,
     EverChanged |= Changed;
   } while (Changed);
 
+  // Jump threading may have introduced redundant debug values into BB which
+  // should be removed.
+  if (EverChanged)
+    for (auto &BB : *F) {
+      RemoveRedundantDbgInstrs(&BB);
+    }
+
   LoopHeaders.clear();
   return EverChanged;
 }

@huangjd huangjd marked this pull request as draft January 15, 2025 04:11
@huangjd
Copy link
Contributor Author

huangjd commented Jan 15, 2025

@SLTozer I saw the patch fe6f6cd giving the reason for introducing the dbg instruction cleanup step. It was causing a compiler performance issue where there's a function with many dbg values and jump threading runs many iteration to converge. I think moving dbg instruction cleanup outside the loop is ok?

@huangjd huangjd marked this pull request as ready for review January 15, 2025 04:23
@SLTozer
Copy link
Contributor

SLTozer commented Jan 15, 2025

Optimizing this seems valuable, but it probably needs to be checked against the original reproducer. Removing redundant debug instructions in InstCombine isn't necessary at all in most cases, but for some pathological inputs we see a massive multiplication of debug intrinsics until the compiler runs out of memory. The reason we therefore remove intrinsics inside the loop rather than after it is because if we wait until after all the iterations have run, we may have already produced enough intrinsics to break the compiler.

Unfortunately, the info for the specific build has since been discarded, and the buildbot on which the issue appeared is no longer active. But the old commit message indicates that it was an issue with ThreadSanitizer builds, so I can check 1) whether I can reproduce the original issue if the cleanup step is removed, and 2) whether this change leads to the error manifesting again.

@huangjd
Copy link
Contributor Author

huangjd commented Jan 16, 2025

There's also a test having issue so I need clarification.

In Transforms/JumpThreading/thread-debug-info.ll

; Test for the cloning of dbg.values on elided instructions -- down one path
; being threaded, the `and` in the function below is optimised away, but its
; debug-info should still be preserved.
; Similarly, the call to f1 gets cloned, its dbg.value should be cloned too.
define void @test16(i1 %c, i1 %c2, i1 %c3, i1 %c4) nounwind ssp !dbg !30 {
; CHECK-LABEL: define void @test16(i1
entry:
  %cmp = icmp sgt i32 undef, 1, !dbg !33
  br i1 %c, label %land.end, label %land.rhs, !dbg !33

land.rhs:
  br i1 %c2, label %lor.lhs.false.i, label %land.end, !dbg !33

lor.lhs.false.i:
  br i1 %c3, label %land.end, label %land.end, !dbg !33

; CHECK-LABEL: land.end.thr_comm:
; CHECK-NEXT:  #dbg_value(i32 0,
; CHECK-NEXT:  #dbg_value(i32 1,
; CHECK-NEXT:  call void @f1()
; CHECK-NEXT:  br i1 %c4,

; CHECK-LABEL: land.end:
; CHECK-NEXT:  %0 = phi i1
; CHECK-NEXT:  #dbg_value(i32 0,
land.end:
  %0 = phi i1 [ true, %entry ], [ false, %land.rhs ], [false, %lor.lhs.false.i], [false, %lor.lhs.false.i]
  call void @llvm.dbg.value(metadata i32 0, metadata !32, metadata !DIExpression()), !dbg !33
  %cmp12 = and i1 %cmp, %0, !dbg !33
  %xor1 = xor i1 %cmp12, %c4, !dbg !33
  call void @llvm.dbg.value(metadata i32 1, metadata !32, metadata !DIExpression()), !dbg !33
  call void @f1()
  br i1 %xor1, label %if.then, label %if.end, !dbg !33

if.then:
  ret void, !dbg !33

if.end:
  ret void, !dbg !33
}

Why should duplicated debug value be preserved for the first case (before first call to f1)?

@SLTozer
Copy link
Contributor

SLTozer commented Jan 16, 2025

Why should duplicated debug value be preserved for the first case (before first call to f1)?

They shouldn't be - looking at your changes and assuming that you're seeing the test fail with the removal of those debug records, I'd guess that the redundant dbg_values end up together as a result of modifications that InstCombine makes after the last cleanup during the final iteration of the loop. Now that you've moved the cleanup to the very end, the redundant dbg_values are being observed by the cleanup step and removed.

…e end of jump threading, so duplicated dbg values from block duplication can be removed
Copy link
Contributor

@SLTozer SLTozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given a try at reproducing the prior issue in a build of LLVM and the llvm-test-suite with -fsanitize=thread, and didn't see it appear, so this is probably fine to land - it's possible that the call to remove redundant debug records is no longer necessary at all, but optimizing it at the very least is useful. Just be aware that the issue might still exist on some inputs I've not tested, so a revert might be necessary if it appears on the buildbots or somewhere downstream.

This LGTM, but before merging: I've left some inline comments that should be addressed, and when you merge this make sure that the [NFC] tag is removed from the commit message - besides the fact that it could be misleading if this patch indeed results in downstream errors, it does clearly have a small change on output in some cases now.

@huangjd huangjd changed the title [NFC] Move RemoveRedundantDbgInstrs outside of inner loop in JumpThreading Move RemoveRedundantDbgInstrs outside of inner loop in JumpThreading Jan 21, 2025
@huangjd huangjd merged commit e45de3d into llvm:main Jan 22, 2025
6 of 7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 22, 2025

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-win running on avx512-intel64-win while building llvm at step 4 "cmake stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/3902

Here is the relevant piece of the build log for the reference
Step 4 (cmake stage 1) failure: 'cmake -G ...' (failure)
'cmake' is not recognized as an internal or external command,
operable program or batch file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants