[LoopPeel] Add new option to peeling loops to make PHIs into IVs #121104

kasuga-fj · 2024-12-25T11:16:55Z

LoopPeel currently considers PHI nodes only when they become loop invariants as a result of peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially increasing opportunities for other optimizations such as loop vectorization. E.g.,

// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}

In this case, peeling by 1 iteration makes im into an IV, enabling it to be handled by the loop vectorizer.

This patch adds a new feature to peel loops when to convert PHIs into IVs. At the moment this feature is disabled by default.

Enabling it allows to vectorize the above example. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: -O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv).

This PR is taken over from #94900
Related #81851

llvmbot · 2024-12-25T11:17:28Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

LoopPeel now only handles Phis when they become loop invariants by peeling. There are cases where peeling makes Phis loop invariants, and peeling in such cases is also useful for other optimizations, such as loop vectorization. For example, consider the following loops.

int im = N-1;
for (int i=0;i&lt;N;i++) {
  a[i] = b[i]+b[im];
  im = i;
}

In this case, peeling by 1 iteration makes im a loop induction, so we can vectorize the loop.
This patch allows to vectorize the kernel of s291 and s292 in TSVC. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: -O3 -ffast-math -mcpu=neoverse-v2).
Note that in some cases there was unnecessary peeling when tried with llvm-test-suite. The causes include peeling for a remainder loop of vectorization and the limitations of analysis by SCEV. However, as far as I've tried, these unnecessary peels do not affect performance.

This PR is taken over from #94900
Resolve #81851

Full diff: https://github.com/llvm/llvm-project/pull/121104.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+104-29)
(modified) llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll (+139)

diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 3cbde39b30b4e4..f17046db3ce470 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -13,6 +13,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/IVDescriptors.h"
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopIterator.h"
@@ -151,6 +152,32 @@ namespace {
 // corresponding calls to g are determined and the code for computing
 // x, y, and a can be removed.
 //
+// Similarly, there are cases where peeling makes Phi nodes loop-inductions
+// (i.e., the value is increased or decreased by a fixed amount on every
+// iteration). For example, consider the following function.
+//
+//   #define N 100
+//   void f(int a[], int b[]) {
+//     int im = N - 1;
+//     for (int i = 0; i < N; i++) {
+//       a[i] = b[i] + b[im];
+//       im = i;
+//     }
+//   }
+//
+// The IR of the loop will look something like the following.
+//
+//   %i = phi i32 [ 0, %entry ], [ %i.next, %for.body ]
+//   %im = phi i32 [ 99, %entry ], [ %i, %for.body ]
+//   ...
+//   %i.next = add nuw nsw i32 %i, 1
+//   ...
+//
+// In this case, %im becomes a loop-induction variable by peeling 1 iteration,
+// because %i is a loop-induction one. The peeling count can be determined by
+// the same algorithm with loop-invariant case. Such peeling is profitable for
+// loop-vectorization.
+//
 // The PhiAnalyzer class calculates how many times a loop should be
 // peeled based on the above analysis of the phi nodes in the loop while
 // respecting the maximum specified.
@@ -160,7 +187,7 @@ class PhiAnalyzer {
 
   // Calculate the sufficient minimum number of iterations of the loop to peel
   // such that phi instructions become determined (subject to allowable limits)
-  std::optional<unsigned> calculateIterationsToPeel();
+  std::optional<unsigned> calculateIterationsToPeel(ScalarEvolution &SE);
 
 protected:
   using PeelCounter = std::optional<unsigned>;
@@ -175,13 +202,17 @@ class PhiAnalyzer {
 
   // Calculate the number of iterations after which the given value
   // becomes an invariant.
-  PeelCounter calculate(const Value &);
+  PeelCounter calculate(Value &, ScalarEvolution &SE);
+
+  // Returns true if the \p Phi is an induction in the target loop. This
+  // funciton is a wrapper of `InductionDescriptor::isInductionPHI`.
+  bool isInductionPHI(PHINode *Phi, ScalarEvolution &SE) const;
 
   const Loop &L;
   const unsigned MaxIterations;
 
-  // Map of Values to number of iterations to invariance
-  SmallDenseMap<const Value *, PeelCounter> IterationsToInvariance;
+  // Map of Values to number of iterations to invariance or induction
+  SmallDenseMap<const Value *, PeelCounter> IterationsToInvarianceOrInduction;
 };
 
 PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
@@ -190,6 +221,39 @@ PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
   assert(MaxIterations > 0 && "no peeling is allowed?");
 }
 
+bool PhiAnalyzer::isInductionPHI(PHINode *Phi, ScalarEvolution &SE) const {
+  if (!SE.isSCEVable(Phi->getType()))
+    return false;
+
+  const SCEV *Expr = SE.getSCEV(Phi);
+
+  // Ignore casts because they are noisy for peeling. For example, consider
+  // following loop.
+  //
+  //   unsigned long N = ...;
+  //   for (unsigned int i = 0; i < N; i++) {
+  //     ...
+  //   }
+  //
+  // The IR of the loop becomes something like the following.
+  //
+  //   %i = phi i32 [ 0, %entry ], [ %i.next, %body ]
+  //   %conv = phi i64 [ 0, %entry ], [ %conv.next, %body ]
+  //   ...
+  //   %i.next = add i32 %i, 1
+  //   %conv.next = zext i32 %i.next to i64
+  //   ...
+  //
+  // The SCEV of %conv becomes something like (zext i32 {0,+,1}<nuw><%body> to
+  // i64), and this is not an induction. However, as for peeling, it is better
+  // to ignore such outermost casts to avoid unnecessary peeling.
+  while (const auto *Cast = dyn_cast<SCEVCastExpr>(Expr))
+    Expr = Cast->getOperand();
+
+  InductionDescriptor ID;
+  return InductionDescriptor::isInductionPHI(Phi, &L, &SE, ID, Expr);
+}
+
 // This function calculates the number of iterations after which the value
 // becomes an invariant. The pre-calculated values are memorized in a map.
 // N.B. This number will be Unknown or <= MaxIterations.
@@ -204,59 +268,70 @@ PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
 //           %y = phi(0, 5)
 //           %a = %y + 1
 //   G(%y) = Unknown otherwise (including phi not in header block)
-PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
+PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
   // If we already know the answer, take it from the map.
   // Otherwise, place Unknown to map to avoid infinite recursion. Such
   // cycles can never stop on an invariant.
-  auto [I, Inserted] = IterationsToInvariance.try_emplace(&V, Unknown);
+  auto [I, Inserted] =
+      IterationsToInvarianceOrInduction.try_emplace(&V, Unknown);
   if (!Inserted)
     return I->second;
 
   if (L.isLoopInvariant(&V))
     // Loop invariant so known at start.
-    return (IterationsToInvariance[&V] = 0);
-  if (const PHINode *Phi = dyn_cast<PHINode>(&V)) {
+    return (IterationsToInvarianceOrInduction[&V] = 0);
+  if (PHINode *Phi = dyn_cast<PHINode>(&V)) {
     if (Phi->getParent() != L.getHeader()) {
       // Phi is not in header block so Unknown.
-      assert(IterationsToInvariance[&V] == Unknown && "unexpected value saved");
+      assert(IterationsToInvarianceOrInduction[&V] == Unknown &&
+             "unexpected value saved");
       return Unknown;
     }
+
+    // If Phi is an induction, register it as a starting point.
+    if (isInductionPHI(Phi, SE))
+      return (IterationsToInvarianceOrInduction[&V] = 0);
+
     // We need to analyze the input from the back edge and add 1.
     Value *Input = Phi->getIncomingValueForBlock(L.getLoopLatch());
-    PeelCounter Iterations = calculate(*Input);
-    assert(IterationsToInvariance[Input] == Iterations &&
+    PeelCounter Iterations = calculate(*Input, SE);
+    assert(IterationsToInvarianceOrInduction[Input] == Iterations &&
            "unexpected value saved");
-    return (IterationsToInvariance[Phi] = addOne(Iterations));
+    return (IterationsToInvarianceOrInduction[Phi] = addOne(Iterations));
   }
   if (const Instruction *I = dyn_cast<Instruction>(&V)) {
     if (isa<CmpInst>(I) || I->isBinaryOp()) {
       // Binary instructions get the max of the operands.
-      PeelCounter LHS = calculate(*I->getOperand(0));
+      PeelCounter LHS = calculate(*I->getOperand(0), SE);
       if (LHS == Unknown)
         return Unknown;
-      PeelCounter RHS = calculate(*I->getOperand(1));
+      PeelCounter RHS = calculate(*I->getOperand(1), SE);
       if (RHS == Unknown)
         return Unknown;
-      return (IterationsToInvariance[I] = {std::max(*LHS, *RHS)});
+      return (IterationsToInvarianceOrInduction[I] = {std::max(*LHS, *RHS)});
     }
     if (I->isCast())
       // Cast instructions get the value of the operand.
-      return (IterationsToInvariance[I] = calculate(*I->getOperand(0)));
+      return (IterationsToInvarianceOrInduction[I] =
+                  calculate(*I->getOperand(0), SE));
   }
   // TODO: handle more expressions
 
   // Everything else is Unknown.
-  assert(IterationsToInvariance[&V] == Unknown && "unexpected value saved");
+  assert(IterationsToInvarianceOrInduction[&V] == Unknown &&
+         "unexpected value saved");
   return Unknown;
 }
 
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel() {
+std::optional<unsigned>
+PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
   unsigned Iterations = 0;
   for (auto &PHI : L.getHeader()->phis()) {
-    PeelCounter ToInvariance = calculate(PHI);
-    if (ToInvariance != Unknown) {
-      assert(*ToInvariance <= MaxIterations && "bad result in phi analysis");
-      Iterations = std::max(Iterations, *ToInvariance);
+    PeelCounter ToInvarianceOrInduction = calculate(PHI, SE);
+    if (ToInvarianceOrInduction != Unknown) {
+      assert(*ToInvarianceOrInduction <= MaxIterations &&
+             "bad result in phi analysis");
+      Iterations = std::max(Iterations, *ToInvarianceOrInduction);
       if (Iterations == MaxIterations)
         break;
     }
@@ -585,14 +660,14 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   // in TTI.getPeelingPreferences or by the flag -unroll-peel-count.
   unsigned DesiredPeelCount = TargetPeelCount;
 
-  // Here we try to get rid of Phis which become invariants after 1, 2, ..., N
-  // iterations of the loop. For this we compute the number for iterations after
-  // which every Phi is guaranteed to become an invariant, and try to peel the
-  // maximum number of iterations among these values, thus turning all those
-  // Phis into invariants.
+  // Here we try to get rid of Phis which become invariants or inductions after
+  // 1, 2, ..., N iterations of the loop. For this we compute the number for
+  // iterations after which every Phi is guaranteed to become an invariant or an
+  // induction, and try to peel the maximum number of iterations among these
+  // values, thus turning all those Phis into invariants or inductions.
   if (MaxPeelCount > DesiredPeelCount) {
     // Check how many iterations are useful for resolving Phis
-    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel();
+    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel(SE);
     if (NumPeels)
       DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
   }
@@ -610,7 +685,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
     if (DesiredPeelCount + AlreadyPeeled <= UnrollPeelMaxCount) {
       LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount
                         << " iteration(s) to turn"
-                        << " some Phis into invariants.\n");
+                        << " some Phis into invariants or inductions.\n");
       PP.PeelCount = DesiredPeelCount;
       PP.PeelProfiledIterations = false;
       return;
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
index e24eeef52de4e9..0362892dcb1cdb 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
@@ -197,3 +197,142 @@ for.body:
   %exitcond = icmp eq i32 %inc, 100000
   br i1 %exitcond, label %for.cond.cleanup, label %for.body
 }
+
+; Check that phi analysis can handle a binary operator with induction variable.
+define void @_Z6binaryv_induction() {
+; The phis become induction through the chain of phis, with a unary
+; instruction on a loop induction.  Check that the phis for x, a, and y become
+; a loop induction since x is based on y, which is based on a, which is based
+; on a binary add of a constant and i, which is a loop induction.
+; Consider the calls to g:
+; First iteration: g(0), x=0, g(0), y=1, a=2
+; Second iteration: g(0), x=1, g(2), y=3, a=3
+; Third iteration: g(1), x=3, g(3), y=4, a=4
+; Fourth iteration (and subsequent): g(i), x=i+1, g(i+1), y=i+2, a=i+2
+; Therefore, peeling 3 times removes the phi nodes.
+;
+; void g(int);
+; void binary() {
+;   int x = 0;
+;   int y = 0;
+;   int a = 0;
+;   for(int i = 0; i <100000; ++i) {
+;     g(x);
+;     x = y;
+;     g(a);
+;     y = a + 1;
+;     a = i + 2;
+;   }
+; }
+; CHECK-LABEL: @_Z6binaryv_induction(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_BEGIN:%.*]]
+; CHECK:       for.body.peel.begin:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL:%.*]]
+; CHECK:       for.body.peel:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    [[ADD_PEEL:%.*]] = add nuw nsw i32 0, 2
+; CHECK-NEXT:    [[INC_PEEL:%.*]] = add nuw nsw i32 0, 1
+; CHECK-NEXT:    [[EXITCOND_PEEL:%.*]] = icmp ne i32 [[INC_PEEL]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body.peel.next:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL2:%.*]]
+; CHECK:       for.body.peel2:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL]])
+; CHECK-NEXT:    [[ADD_PEEL3:%.*]] = add nuw nsw i32 [[INC_PEEL]], 2
+; CHECK-NEXT:    [[INC_PEEL4:%.*]] = add nuw nsw i32 [[INC_PEEL]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL5:%.*]] = icmp ne i32 [[INC_PEEL4]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next1:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL7:%.*]]
+; CHECK:       for.body.peel7:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL3]])
+; CHECK-NEXT:    [[ADD_PEEL8:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 2
+; CHECK-NEXT:    [[INC_PEEL9:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL10:%.*]] = icmp ne i32 [[INC_PEEL9]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL10]], label [[FOR_BODY_PEEL_NEXT6:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next6:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_NEXT11:%.*]]
+; CHECK:       for.body.peel.next11:
+; CHECK-NEXT:    br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK:       entry.peel.newph:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.cond.cleanup.loopexit:
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+; CHECK:       for.body:
+; CHECK-NEXT:    [[I:%.*]] = phi i32 [ [[INC_PEEL9]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[X:%.*]] = phi i32 [ [[ADD_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[Y:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[A:%.*]] = phi i32 [ [[ADD_PEEL8]], [[ENTRY_PEEL_NEWPH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[Y]] = phi i32 [ [[ADD_PEEL3]], [[ENTRY_PEEL_NEWPH]] ], [ [[A]], [[FOR_BODY]] ]
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[X]])
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[A]])
+; CHECK-NEXT:    [[ADD]] = add nuw nsw i32 [[I]], 2
+; CHECK-NEXT:    [[INC]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i32 [[INC]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  br label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %x = phi i32 [ 0, %entry ], [ %y, %for.body ]
+  %a = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %y = phi i32 [ 0, %entry ], [ %a, %for.body ]
+  tail call void @_Z1gi(i32 signext %x)
+  tail call void @_Z1gi(i32 signext %a)
+  %add = add nuw nsw i32 %i, 2
+  %inc = add nuw nsw i32 %i, 1
+  %exitcond = icmp ne i32 %inc, 100000
+  br i1 %exitcond, label %for.body, label %for.cond.cleanup
+}
+
+; Check that phi analysis can handle cast operations with induction variable.
+define void @_Z6induction_with_cast(ptr noundef %a, i64 noundef %size) {
+; The original code is like as follows. We don't need peel the loop to make
+; phis loop induction.
+;
+; void f(unsigned int *a, unsigned long N) {
+;   for (unsigned int i=0; i<N; i++)
+;     a[i] = 10;
+; }
+;
+; CHECK-LABEL: @_Z6induction_with_cast(
+; CHECK-NEXT:  for.body.preheader:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[CONV6:%.*]] = phi i64 [ [[CONV:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ]
+; CHECK-NEXT:    [[I_05:%.*]] = phi i32 [ [[ADD:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[A:%.*]], i64 [[CONV6]]
+; CHECK-NEXT:    store i32 10, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ADD]] = add i32 [[I_05]], 1
+; CHECK-NEXT:    [[CONV]] = zext i32 [[ADD]] to i64
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[SIZE:%.*]], [[CONV]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+for.body.preheader:
+  br label %for.body
+
+for.body:
+  %conv6 = phi i64 [ %conv, %for.body ], [ 0, %for.body.preheader ]
+  %i.05 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds nuw i32, ptr %a, i64 %conv6
+  store i32 10, ptr %arrayidx, align 4
+  %add = add i32 %i.05, 1
+  %conv = zext i32 %add to i64
+  %cmp = icmp ugt i64 %size, %conv
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup:
+  ret void
+}

kasuga-fj · 2025-01-15T07:15:46Z

ping

madhur13490 · 2025-01-15T07:45:54Z

Overall, I am fine with the change. However, I feel you should assess the compile-time impact of this to be on the safe side. I suspect a negative effect on compile-time because the change will also consider Phi becoming IV.

Please refer to compile-time-tracker.

Can you please add a test case for the below case too?

#define N 100
void f(int a[], int b[]) {
  int im = N - 1;
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + b[im];
    im = i;
  }
}

llvm/lib/Transforms/Utils/LoopPeel.cpp

kasuga-fj · 2025-01-15T08:11:49Z

@nikic Could you please evaluate the compile-time difference of this PR?

kasuga-fj · 2025-01-15T08:13:04Z

Can you please add a test case for the below case too?

Looks like I added the wrong test case, I will fix it. Thanks.

nikic · 2025-01-16T13:37:37Z

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=8fb29ba287d72392bd7900c33d2a8d2149126dbe&to=060d43fb0d920a663d5199ff37ebfdf204009177&stat=instructions:u

Haven't looked at the implementation yet...

kasuga-fj · 2025-01-17T06:38:23Z

Thanks!

@madhur13490 What do you think about the impact? I'm not sure if this is large regression or not. The +0.52% for SPASS is not negligible?

madhur13490 · 2025-01-22T07:18:39Z

@madhur13490 What do you think about the impact? I'm not sure if this is large regression or not. The +0.52% for SPASS is not negligible?

I am fine as long as @nikic is!

madhur13490 · 2025-01-22T07:21:33Z

Hi @jamieschmeiser @fhahn can you please also have a look?

jamieschmeiser · 2025-01-23T20:35:14Z

My only comments are stylistic. The reason I added the PhiAnalyzer and brought the code into the class when I reworked/expanded the algorithms was that it organized things and cleaned up the calling between the various functions. In that spirit, I would suggest adding ScalarEvolution as a member to PhiAnalyzer and setting it in the constructor rather than passing it around. This is a minor stylistic nit but (I think) it makes the code cleaner and more readable. I certainly wouldn't block this change for that if you do not agree. Thanks for doing this.

kasuga-fj · 2025-01-24T08:05:17Z

Hi @jamieschmeiser, I don't have a strong opinion on the style and have decided to follow the original thought. Thanks for your comment.

madhur13490 · 2025-01-28T08:17:01Z

LGTM. Please wait for @nikic for confirmation on compile-time impact.

kasuga-fj · 2025-02-12T02:23:04Z

I found that the SCEV calculation to call isInductionPHI was expensive. I replaced it with cheaper checks and the compilation time regression was mitigated on my local.

@nikic Could you evaluate the compile-time again?

LoopPeel now only handles Phis when they become loop invariants by peeling. There are cases where peeling makes Phis loop invariants, and peeling in such cases is also useful for other optimizations, such as loop vectorization. For example, consider the following loops. ``` int im = N-1; for (int i=0;i<N;i++) { a[i] = b[i]+b[im]; im = i; } ``` In this case, peeling by 1 iteration makes `im` a loop induction, so we can vectorize the loop. This patch allows to vectorize the kernel of s291 and s292 in TSVC. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2`). Note that in some cases there was unnecessary peeling when tried with llvm-test-suite. The causes include peeling for a remainder loop of vectorization and the limitations of analysis by SCEV. However, as far as I've tried, these unnecessary peels do not affect performance.

kasuga-fj · 2025-02-28T14:28:48Z

Gentle ping

madhur13490 · 2025-03-04T14:18:43Z

@kasuga-fj I don't think compile-time measurements has to be done by Nikita only. You can also follow the steps on compile-time-tracker and get it triggered automatically. If your repo is already on compile-time-tracker's radar, following the branch naming convention will trigger the run automatically. If you have already passed that stage, then please post your results.

nikic · 2025-03-04T14:22:43Z

Ah sorry, I did test this back then but forgot the report the results. They look good: https://llvm-compile-time-tracker.com/compare.php?from=8374d421861cd3d47e21ae7889ba0b4c498e8d85&to=53ef622e8ef3d26660f29c6dff4d1fb869270a69&stat=instructions:u

llvm/lib/Transforms/Utils/LoopPeel.cpp

nikic · 2025-03-04T14:26:27Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+bool PhiAnalyzer::isInductionPHI(const PHINode *Phi) const {
+  // Currently, we only support loops that consist of one basic block. In this
+  // case, the phi can become an IV if it has an incoming value from the basic
+  // block that this phi is also included.


Suggested change

// block that this phi is also included.

// block where the phi is defined.

But I don't really get why we need this limitation. The extension to multiple blocks seems trivial -- just need to look for the input from getLoopLatch() instead?

And once you do that you can also replace your loop with getIncomingValueForBlock.

The purpose of that is to limit the number of incoming edges from the loop to one, and your suggestion looks better to me. Thanks! (Also, your point about the English language is also very much appreciated.)

llvm/lib/Transforms/Utils/LoopPeel.cpp

nikic · 2025-03-04T14:32:26Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+      if (BinOp->getOpcode() != Instruction::Add &&
+          BinOp->getOpcode() != Instruction::Sub)
+        return false;
+      if (!BinOp->hasNoUnsignedWrap() || !BinOp->hasNoSignedWrap())


What's the reason for this check?

I added them just in case because I don't fully understand how other similar functions (like isInductionPHI) handle these flags. This is not to say that I've found any cases where something weird happens without this check.

Now I don't think this check is necessary. However, removing them happened undesirable peelings. It seems that this check just happens to work well to deflect undesired loops.

llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll

nikic · 2025-03-04T14:37:05Z

llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll

It would be good to also add some negative tests where peeling would not result in an induction / peeling is not desirable.

E.g. you could test a pseudo-IV where the increment is loop varying.

Added two cases where the peeling is not desirable. The current implementation only considers constant increments, so I don't think there are any cases where peeling does not result in IV (there are cases where the variable is originally IV without peeling).

nikic · 2025-03-04T14:38:46Z

Note that in some cases there was unnecessary peeling when tried with llvm-test-suite. The causes include peeling for a remainder loop of vectorization and the limitations of analysis by SCEV. However, as far as I've tried, these unnecessary peels do not affect performance.

Do you have an example handy? Generally having some unnecessary peeling is expected, but peeling remainder loops is not great.

Co-authored-by: Nikita Popov <[email protected]>

github-actions · 2025-03-05T07:13:03Z

✅ With the latest revision this PR passed the C/C++ code formatter.

kasuga-fj

Note that in some cases there was unnecessary peeling when tried with llvm-test-suite. The causes include peeling for a remainder loop of vectorization and the limitations of analysis by SCEV. However, as far as I've tried, these unnecessary peels do not affect performance.

Do you have an example handy? Generally having some unnecessary peeling is expected, but peeling remainder loops is not great.

I found such cases when I first submitted this patch (I used SCEV to detect IVs at the time), but I just tried to run llvm-test-suite and it didn't reproduce. I'm now checking the details of the difference with and without this patch, but there does not appear to be a significant difference. The following loops were newly peeled.

EDIT: Some peelings in the following list is unnecessary one. They are fixed in c6f1a3a and the peeling no longer happens. I've checked that all others are what peeled as intended.

MultiSource/Applications/ClamAV/libclamav_matcher-ac.c:888:6
~~MultiSource/Applications/JM/ldecod/vlc.c:886:3~~
~~MultiSource/Applications/JM/ldecod/vlc.c:961:3~~
~~MultiSource/Applications/JM/lencod/cabac.c:1492:3~~
~~MultiSource/Applications/oggenc/oggenc.c:47108:7~~
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c:434:2
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/../tsc.inc:3310:3
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/../tsc.inc:3342:3
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/../tsc.inc:3310:3
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/../tsc.inc:3342:3

llvm/lib/Transforms/Utils/LoopPeel.cpp

kasuga-fj · 2025-03-05T09:32:05Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+bool PhiAnalyzer::isInductionPHI(const PHINode *Phi) const {
+  // Currently, we only support loops that consist of one basic block. In this
+  // case, the phi can become an IV if it has an incoming value from the basic
+  // block that this phi is also included.


The purpose of that is to limit the number of incoming edges from the loop to one, and your suggestion looks better to me. Thanks! (Also, your point about the English language is also very much appreciated.)

kasuga-fj · 2025-03-05T09:49:14Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+      if (BinOp->getOpcode() != Instruction::Add &&
+          BinOp->getOpcode() != Instruction::Sub)
+        return false;
+      if (!BinOp->hasNoUnsignedWrap() || !BinOp->hasNoSignedWrap())


I added them just in case because I don't fully understand how other similar functions (like isInductionPHI) handle these flags. This is not to say that I've found any cases where something weird happens without this check.

kasuga-fj · 2025-03-05T13:35:33Z

llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll

Added two cases where the peeling is not desirable. The current implementation only considers constant increments, so I don't think there are any cases where peeling does not result in IV (there are cases where the variable is originally IV without peeling).

llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll

kasuga-fj · 2025-03-10T11:13:18Z

Updated the PR description.

kasuga-fj · 2025-03-14T12:06:26Z

Ping

kasuga-fj

Currently there are some cases where undesirable peeling happens. As I checked, there were roughly two major patterns:

Failed to analysis cast operations, like induction_undesirable_peel1 I added in test.
Peeling is performed properly to make PHIs into IVs, but such a peeling doesn't seem to be profitable.
- The main purpose of such peelings are for vectorization, but the loop cannot be vectorized due to other reason, e.g., it has function calls.

I think it's reasonable to address these issues in other patches. So, for now, I want to control this feature with a flag. After they are resolved, I'd like to enable it by default.

kasuga-fj · 2025-03-24T12:07:33Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+      if (BinOp->getOpcode() != Instruction::Add &&
+          BinOp->getOpcode() != Instruction::Sub)
+        return false;
+      if (!BinOp->hasNoUnsignedWrap() || !BinOp->hasNoSignedWrap())


Now I don't think this check is necessary. However, removing them happened undesirable peelings. It seems that this check just happens to work well to deflect undesired loops.

kasuga-fj · 2025-03-24T12:24:41Z

Gentle ping @nikic , thanks!

kasuga-fj · 2025-04-23T07:51:06Z

Gentle ping @nikic .
For now, I've added a flag and disabled this feature by default, so I don't think this patch will affect existing cases. I'd like to enable it by default in the future, after addressing the cases in peel-loop-phi-analysis-iv.ll, where undesired peeling happens.

madhur13490 · 2025-07-01T11:34:08Z

@kasuga-fj Are you planning to proceed on this patch?

kasuga-fj · 2025-07-01T13:10:57Z

Ah, sorry, I completely forgot about this PR. Thanks for the reminder, I will update it tomorrow.

kasuga-fj · 2025-07-02T12:10:19Z

Updated comments and tests.
@nikic I believe I've addressed all of your comments. Could you please take a look again? Thanks in advance!

llvmbot added the llvm:transforms label Dec 25, 2024

kasuga-fj requested review from nikic, sebpop and madhur13490 December 25, 2024 11:18

madhur13490 reviewed Jan 15, 2025

View reviewed changes

llvm/lib/Transforms/Utils/LoopPeel.cpp Outdated Show resolved Hide resolved

madhur13490 requested review from fhahn and jamieschmeiser January 22, 2025 07:20

nikic mentioned this pull request Jan 24, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

dtcxzyw mentioned this pull request Jan 24, 2025

pre-commit: PR121104 dtcxzyw/llvm-opt-benchmark#1991

Closed

madhur13490 approved these changes Jan 28, 2025

View reviewed changes

kasuga-fj force-pushed the peeling-loop-induction branch from d6164e7 to b21b998 Compare February 12, 2025 02:16

kasuga-fj force-pushed the peeling-loop-induction branch from 3006bb7 to 41fe38a Compare February 28, 2025 12:27

nikic reviewed Mar 4, 2025

View reviewed changes

Apply suggestions from code review

88272e4

Co-authored-by: Nikita Popov <[email protected]>

kasuga-fj added 2 commits March 5, 2025 07:27

Apply clang-format

ee26737

Address review comments

cd3fc9c

kasuga-fj commented Mar 5, 2025

View reviewed changes

kasuga-fj requested a review from nikic March 5, 2025 13:51

kasuga-fj added 3 commits March 6, 2025 10:27

Move the comments of the test outisde the function

9b03954

Fix the handling of binary instructions for inductions

c6f1a3a

Apply clang-format

0f41b26

kasuga-fj added 4 commits March 14, 2025 12:11

Fix comment

b05a507

Remove nsw/nuw check

bfddeff

Add an option to toggle peeling to make PHIs into IVs

1b18665

Revert unnecessary change

27a46a1

kasuga-fj commented Mar 24, 2025

View reviewed changes

kasuga-fj changed the title ~~[LoopPeel] Peel to make Phis loop inductions~~ [LoopPeel] Add new option to peeling loops to make PHIs into IVs Mar 24, 2025

kasuga-fj added 4 commits July 2, 2025 09:09

Merge branch 'main' into peeling-loop-induction

df8f554

Revise comments

cd9b436

Revise tests

9d88973

Remove unrelated change

39e8105

	// block that this phi is also included.
	// block where the phi is defined.

[LoopPeel] Add new option to peeling loops to make PHIs into IVs #121104

Are you sure you want to change the base?

[LoopPeel] Add new option to peeling loops to make PHIs into IVs #121104

Uh oh!

Conversation

kasuga-fj commented Dec 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 25, 2024

Uh oh!

kasuga-fj commented Jan 15, 2025

Uh oh!

madhur13490 commented Jan 15, 2025

Uh oh!

Uh oh!

kasuga-fj commented Jan 15, 2025

Uh oh!

kasuga-fj commented Jan 15, 2025

Uh oh!

nikic commented Jan 16, 2025

Uh oh!

kasuga-fj commented Jan 17, 2025

Uh oh!

madhur13490 commented Jan 22, 2025

Uh oh!

madhur13490 commented Jan 22, 2025

Uh oh!

jamieschmeiser commented Jan 23, 2025

Uh oh!

kasuga-fj commented Jan 24, 2025

Uh oh!

madhur13490 commented Jan 28, 2025

Uh oh!

kasuga-fj commented Feb 12, 2025

Uh oh!

kasuga-fj commented Feb 28, 2025

Uh oh!

madhur13490 commented Mar 4, 2025

Uh oh!

nikic commented Mar 4, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic commented Mar 4, 2025

Uh oh!

github-actions bot commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kasuga-fj left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kasuga-fj commented Dec 25, 2024 •

edited

Loading

github-actions bot commented Mar 5, 2025 •

edited

Loading

kasuga-fj left a comment •

edited

Loading