[VPlan] Consider address computation cost in VPInterleaveRecipe. #148808
Conversation
As a side-effect of llvm#106431, we started vectorising loops such as:

```cpp
void s351(float *a, float *b, float *c, long n) {
  float alpha = c[0];
  for (int i = 0; i < n; i += 5) {
    a[i] += alpha * b[i];
    a[i + 1] += alpha * b[i + 1];
    a[i + 2] += alpha * b[i + 2];
    a[i + 3] += alpha * b[i + 3];
    a[i + 4] += alpha * b[i + 4];
  }
}
```

(https://godbolt.org/z/o1fGfj6j7), which resulted in a slowdown of 54%.

The estimated cost of the scalar and vectorised versions is currently very close (51 vs 50.5 per lane), in favour of the vectorised version. This patch helps VPlan favour the scalar version by taking into account the cost of address computations in VPInterleaveRecipe, as is already done for the scalar version in LoopVectorizationCostModel::getMemoryInstructionCost and for other recipes such as VPWidenMemoryRecipe.

This affects

```
LLVM :: Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
```

due to the new cost estimate. I've updated the test accordingly, but please let me know if this is not appropriate.

As mentioned in llvm#82218, ideally the loop above should be rerolled before attempting vectorisation. I'm hoping the approach in this patch is a reasonable alternative in its own right in the meantime. If there's a better suggestion to achieve this, please let me know. :)
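To make the cost trade-off above concrete, the snippet below is a minimal, self-contained model of the per-lane comparison. The 51 and 50.5 figures and the address-computation cost of 1 come from this discussion; the VF of 4, the count of three interleave groups, and the way the extra cost is amortised over lanes are illustrative assumptions, not values taken from the actual cost model.

```cpp
#include <cstdio>

// Illustrative model only: vectorise when the vector plan's cost per lane is
// below the scalar cost per lane, and see how a small extra address
// computation term can tip the decision back towards the scalar loop.
int main() {
  const double ScalarCostPerLane = 51.0;  // figure quoted in the description
  const double VectorCostPerLane = 50.5;  // figure quoted in the description

  // Assumptions for illustration: VF of 4 and three interleave groups
  // (loads of a and b, store to a), each charged one address computation.
  const unsigned VF = 4;
  const unsigned NumInterleaveGroups = 3;
  const double AddressComputationCost = 1.0;
  const double ExtraCostPerLane =
      NumInterleaveGroups * AddressComputationCost / VF;

  const double AdjustedVectorCostPerLane = VectorCostPerLane + ExtraCostPerLane;
  std::printf("without patch: %.2f vs %.2f -> %s\n", ScalarCostPerLane,
              VectorCostPerLane,
              VectorCostPerLane < ScalarCostPerLane ? "vectorise" : "stay scalar");
  std::printf("with patch:    %.2f vs %.2f -> %s\n", ScalarCostPerLane,
              AdjustedVectorCostPerLane,
              AdjustedVectorCostPerLane < ScalarCostPerLane ? "vectorise"
                                                            : "stay scalar");
  return 0;
}
```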
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Ricardo Jesus (rj-jesus)

Full diff: https://github.com/llvm/llvm-project/pull/148808.diff (2 Files Affected):
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 75ade13b09d9c..73d84588fbe87 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3666,6 +3666,9 @@ InstructionCost VPInterleaveRecipe::computeCost(ElementCount VF,
InsertPos->getOpcode(), WideVecTy, IG->getFactor(), Indices,
IG->getAlign(), AS, Ctx.CostKind, getMask(), NeedsMaskForGaps);
+ // Add the address computation cost.
+ Cost += Ctx.TTI.getAddressComputationCost(WideVecTy);
+
if (!IG->isReverse())
return Cost;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
index 173766cc0a656..2917bc0404bfa 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
@@ -8,46 +8,31 @@ define void @test_complex_add_float(ptr %res, ptr noalias %A, ptr noalias %B, i6
; CHECK-LABEL: define void @test_complex_add_float(
; CHECK-SAME: ptr [[RES:%.*]], ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 8
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
-; CHECK-NEXT: [[GEP_A_0:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i64 [[TMP1]]
-; CHECK-NEXT: [[GEP_B_0:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[B]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[B]], i64 [[TMP1]]
-; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x float>, ptr [[GEP_A_0]], align 4
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[B]], i64 [[INDEX]]
+; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x float>, ptr [[TMP0]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x float> [[WIDE_VEC]], <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x float> [[WIDE_VEC]], <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[WIDE_VEC2:%.*]] = load <8 x float>, ptr [[TMP3]], align 4
+; CHECK-NEXT: [[WIDE_VEC2:%.*]] = load <8 x float>, ptr [[TMP1]], align 4
; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <8 x float> [[WIDE_VEC2]], <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <8 x float> [[WIDE_VEC2]], <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[WIDE_VEC5:%.*]] = load <8 x float>, ptr [[GEP_B_0]], align 4
-; CHECK-NEXT: [[STRIDED_VEC6:%.*]] = shufflevector <8 x float> [[WIDE_VEC5]], <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
-; CHECK-NEXT: [[STRIDED_VEC7:%.*]] = shufflevector <8 x float> [[WIDE_VEC5]], <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[WIDE_VEC8:%.*]] = load <8 x float>, ptr [[TMP5]], align 4
-; CHECK-NEXT: [[STRIDED_VEC9:%.*]] = shufflevector <8 x float> [[WIDE_VEC8]], <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
-; CHECK-NEXT: [[STRIDED_VEC10:%.*]] = shufflevector <8 x float> [[WIDE_VEC8]], <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[STRIDED_VEC]], [[STRIDED_VEC6]]
-; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[STRIDED_VEC3]], [[STRIDED_VEC9]]
-; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[STRIDED_VEC1]], [[STRIDED_VEC7]]
-; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[STRIDED_VEC4]], [[STRIDED_VEC10]]
-; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RES]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RES]], i64 [[TMP1]]
-; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> poison, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
-; CHECK-NEXT: store <8 x float> [[INTERLEAVED_VEC]], ptr [[TMP10]], align 4
-; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: [[INTERLEAVED_VEC11:%.*]] = shufflevector <8 x float> [[TMP13]], <8 x float> poison, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
-; CHECK-NEXT: store <8 x float> [[INTERLEAVED_VEC11]], ptr [[TMP11]], align 4
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
-; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[STRIDED_VEC]], [[STRIDED_VEC3]]
+; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[STRIDED_VEC1]], [[STRIDED_VEC4]]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RES]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> poison, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT: store <8 x float> [[INTERLEAVED_VEC]], ptr [[TMP4]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[SCALAR_PH]]
@@ -84,32 +69,24 @@ define void @test_complex_add_double(ptr %res, ptr noalias %A, ptr noalias %B, i
; CHECK-LABEL: define void @test_complex_add_double(
; CHECK-SAME: ptr [[RES:%.*]], ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 2
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
-; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[TMP1]]
-; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i64 [[TMP1]]
-; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = load <2 x double>, ptr [[TMP2]], align 4
-; CHECK-NEXT: [[STRIDED_VEC5:%.*]] = load <2 x double>, ptr [[TMP3]], align 4
-; CHECK-NEXT: [[STRIDED_VEC7:%.*]] = load <2 x double>, ptr [[TMP4]], align 4
-; CHECK-NEXT: [[STRIDED_VEC11:%.*]] = load <2 x double>, ptr [[TMP5]], align 4
-; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[STRIDED_VEC1]], [[STRIDED_VEC7]]
-; CHECK-NEXT: [[TMP15:%.*]] = fadd <2 x double> [[STRIDED_VEC5]], [[STRIDED_VEC11]]
-; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[INDEX]]
-; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[TMP1]]
-; CHECK-NEXT: store <2 x double> [[TMP8]], ptr [[TMP10]], align 4
-; CHECK-NEXT: store <2 x double> [[TMP15]], ptr [[TMP11]], align 4
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
-; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i64 [[INDEX]]
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[TMP0]], align 4
+; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <2 x double>, ptr [[TMP1]], align 4
+; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[WIDE_LOAD]], [[WIDE_LOAD1]]
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[INDEX]]
+; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[TMP3]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
+; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[SCALAR_PH]]
@@ -3666,6 +3666,9 @@ InstructionCost VPInterleaveRecipe::computeCost(ElementCount VF,
      InsertPos->getOpcode(), WideVecTy, IG->getFactor(), Indices,
      IG->getAlign(), AS, Ctx.CostKind, getMask(), NeedsMaskForGaps);

+  // Add the address computation cost.
+  Cost += Ctx.TTI.getAddressComputationCost(WideVecTy);
The main problem that I see here is that there is now an inconsistency between this and LoopVectorizationCostModel::getInterleaveGroupCost in the legacy cost model. This is potentially bad for two reasons:
- It can lead to more asserts firing when the legacy and vplan cost models don't match.
- In LoopVectorizationCostModel::setCostBasedWideningDecision we choose the best widening decision based on a comparison between interleaving, gather/scatter or scalar. If we just change VPInterleaveRecipe::computeCost then we only get the benefit of choosing a better interleave count/unroll factor, but the best decision may simply be to scalarise.
It might be good to see what happens if you add the same cost to LoopVectorizationCostModel::getInterleaveGroupCost?
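As an illustrative aside on the mismatch concern, here is a minimal, self-contained sketch (not LLVM code): the two helpers below are hypothetical stand-ins for the legacy and VPlan interleave-group costing, and the numbers are made up, but they show why adding the address-computation term on only one side breaks any legacy-vs-VPlan cost assertion.

```cpp
#include <cassert>

namespace {
constexpr unsigned AddressComputationCost = 1;

// Hypothetical stand-in for LoopVectorizationCostModel::getInterleaveGroupCost.
unsigned legacyInterleaveGroupCost(unsigned BaseCost, bool AddAddrCost) {
  return BaseCost + (AddAddrCost ? AddressComputationCost : 0);
}

// Hypothetical stand-in for VPInterleaveRecipe::computeCost with this patch,
// which always adds the address-computation term.
unsigned vplanInterleaveRecipeCost(unsigned BaseCost) {
  return BaseCost + AddressComputationCost;
}
} // namespace

int main() {
  const unsigned BaseCost = 8; // arbitrary interleave-group base cost

  // Patch as posted: only the VPlan side adds the term, so the models diverge.
  assert(legacyInterleaveGroupCost(BaseCost, /*AddAddrCost=*/false) !=
         vplanInterleaveRecipeCost(BaseCost));

  // Reviewer's suggestion: mirror the term in the legacy model so they agree.
  assert(legacyInterleaveGroupCost(BaseCost, /*AddAddrCost=*/true) ==
         vplanInterleaveRecipeCost(BaseCost));
  return 0;
}
```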
I think by itself the idea of this patch seems reasonable, but it's worth pointing out that even if we make better vectorisation choices the outcome is still very fragile. That's because the address computation cost is only 1. Really, if the performance of the loop drops 50% then the cost model is also off by 50%, so it feels like something more fundamental is broken here and there is a high risk of the same regression reappearing.
I don't think this is the correct fix.
There are usually no extra address computations that need to be done by interleave groups, unless there are gaps. We use the address from the first member, for which we will have a GEP recipe, which should already account for the cost of the GEP.
It may be that the cost of the interleave group itself is not accurate and TTI may need to be updated for Grace?
> The main problem that I see here is that there is now an inconsistency between this and LoopVectorizationCostModel::getInterleaveGroupCost

You're right, thanks for pointing this out! I'll update the patch shortly (I had initially also updated LoopVectorizationCostModel::getInterleaveGroupCost, but I wasn't aware of the first point you raised, and since the change to getInterleaveGroupCost didn't affect the motivating example, I eventually reverted it before opening the PR).
> I think by itself the idea of this patch seems reasonable, but it's worth pointing out that even if we make better vectorisation choices the outcome is still very fragile.
I agree, and I had planned to look at the code we are generating for the interleaved version separately. I just thought this seemed reasonable on its own to make the comparison with the scalar version a bit fairer, and it solves the problem for the time being.
For what it's worth, prior to #106431 we used to choose the scalar version because the plan generated happened to have one extra instruction that nudged the decision to prefer the scalar version.
> There are usually no extra address computations that need to be done by interleave groups, unless there are gaps. We use the address from the first member, for which we will have a GEP recipe, which should already account for the cost of the GEP.
Shouldn't the same logic apply to scalar loads/stores then, where we seem to take the cost of the address computation into account (unless I'm misreading the code)?
return TTI.getAddressComputationCost(ValTy) +
Yeah, we also add the cost when calculating gathers and scatters.
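For reference, here is a small hypothetical sketch (again, not LLVM types) of which widening choices already fold an address-computation term into their cost according to this thread, and which one this patch changes.

```cpp
#include <cstdio>

// Hypothetical summary of the thread: the enum and helper below are
// illustrative only and do not correspond to LLVM types.
enum class WideningKind { Scalarize, GatherScatter, Widen, Interleave };

bool includesAddressComputationCost(WideningKind K, bool WithThisPatch) {
  switch (K) {
  case WideningKind::Scalarize:     // via getMemoryInstructionCost
  case WideningKind::GatherScatter: // per the comment above
  case WideningKind::Widen:         // via VPWidenMemoryRecipe
    return true;
  case WideningKind::Interleave:    // VPInterleaveRecipe::computeCost
    return WithThisPatch;           // the term this patch adds
  }
  return false;
}

int main() {
  std::printf("interleave costing includes address computation: before=%d after=%d\n",
              includesAddressComputationCost(WideningKind::Interleave, false),
              includesAddressComputationCost(WideningKind::Interleave, true));
  return 0;
}
```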