[IR] Don't set strictfp on irrelevant calls #122735

Open
wants to merge 4 commits into main
Conversation

spavloff
Collaborator

IRBuilder incorrectly assigned the attribute "strictfp" to any intrinsic function call in some cases inside strictfp functions. According to the documentation in https://llvm.org/docs/LangRef.html#function-attributes:

    This attribute indicates that the function was called from a scope
    that requires strict floating-point semantics. LLVM will not attempt
    any optimizations that require assumptions about the floating-point
    rounding mode or that might alter the state of floating-point status
    flags that might otherwise be set or cleared by calling this
    function.

Intrinsic functions like llvm.fabs do not depend on the rounding mode and never raise FP exceptions, so these restrictions make no sense for them. If a function interacts with the FP environment, it must be either a constrained intrinsic or an external function.
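As an illustration, here is a minimal C++ sketch of that rule, assuming LLVM's IR API; `callNeedsStrictFP` is a hypothetical helper name, not code from this patch:

```
#include "llvm/IR/Function.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Hypothetical helper: in a strictfp function, a call needs the strictfp
// attribute only if the callee may interact with the FP environment.
static bool callNeedsStrictFP(const CallBase &CB) {
  const Function *Callee = CB.getCalledFunction();
  if (!Callee || !Callee->isIntrinsic())
    return true; // Indirect or ordinary external call: may do anything.
  if (isa<ConstrainedFPIntrinsic>(&CB))
    return true; // Constrained intrinsics model rounding and exceptions.
  // Remaining intrinsics (llvm.fabs, llvm.is.fpclass, ...) neither depend on
  // the rounding mode nor raise FP exceptions, so strictfp adds nothing.
  return false;
}
```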

@llvmbot
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-amdgpu

Author: Serge Pavlov (spavloff)

Changes

IRBuilder incorrectly assigned the attribute "strictfp" to any intrinsic function call in some cases inside strictfp functions. According to the documentation in https://llvm.org/docs/LangRef.html#function-attributes:

    This attribute indicates that the function was called from a scope
    that requires strict floating-point semantics. LLVM will not attempt
    any optimizations that require assumptions about the floating-point
    rounding mode or that might alter the state of floating-point status
    flags that might otherwise be set or cleared by calling this
    function.

Intrinsic functions like llvm.fabs do not depend on the rounding mode and never raise FP exceptions, so these restrictions make no sense for them. If a function interacts with the FP environment, it must be either a constrained intrinsic or an external function.


Patch is 289.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122735.diff

16 Files Affected:

  • (modified) clang/test/CodeGen/SystemZ/strictfp_builtins.c (+9-9)
  • (modified) clang/test/CodeGen/X86/strictfp_builtins.c (+3-3)
  • (modified) clang/test/CodeGen/cx-complex-range.c (+10-10)
  • (modified) clang/test/CodeGen/isfpclass.c (+9-9)
  • (modified) clang/test/CodeGen/strictfp-elementwise-bulitins.cpp (+31-32)
  • (modified) clang/test/CodeGen/strictfp_builtins.c (+10-10)
  • (modified) clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl (+4-4)
  • (modified) llvm/include/llvm/IR/IRBuilder.h (+2-14)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+3)
  • (modified) llvm/lib/IR/IRBuilder.cpp (+28)
  • (modified) llvm/lib/IR/IntrinsicInst.cpp (+11)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll (+294-294)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll (+260-300)
  • (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll (+2-2)
  • (modified) llvm/test/Transforms/HardwareLoops/scalar-while-strictfp.ll (+28-28)
diff --git a/clang/test/CodeGen/SystemZ/strictfp_builtins.c b/clang/test/CodeGen/SystemZ/strictfp_builtins.c
index 8c8f1f4cabd742..b60fd932d31c3f 100644
--- a/clang/test/CodeGen/SystemZ/strictfp_builtins.c
+++ b/clang/test/CodeGen/SystemZ/strictfp_builtins.c
@@ -9,7 +9,7 @@
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 15) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 15)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isnan_float(float f) {
@@ -21,7 +21,7 @@ int test_isnan_float(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 15) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 15)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isnan_double(double d) {
@@ -34,7 +34,7 @@ int test_isnan_double(double d) {
 // CHECK-NEXT:    [[LD:%.*]] = load fp128, ptr [[TMP0:%.*]], align 8
 // CHECK-NEXT:    store fp128 [[LD]], ptr [[LD_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[LD_ADDR]], align 8
-// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 15) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 15)
 // CHECK-NEXT:    ret i32 [[TMP2]]
 //
 int test_isnan_long_double(long double ld) {
@@ -46,7 +46,7 @@ int test_isnan_long_double(long double ld) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 48) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 48)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isinf_float(float f) {
@@ -58,7 +58,7 @@ int test_isinf_float(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 48) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 48)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isinf_double(double d) {
@@ -71,7 +71,7 @@ int test_isinf_double(double d) {
 // CHECK-NEXT:    [[LD:%.*]] = load fp128, ptr [[TMP0:%.*]], align 8
 // CHECK-NEXT:    store fp128 [[LD]], ptr [[LD_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[LD_ADDR]], align 8
-// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 48) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 48)
 // CHECK-NEXT:    ret i32 [[TMP2]]
 //
 int test_isinf_long_double(long double ld) {
@@ -83,7 +83,7 @@ int test_isinf_long_double(long double ld) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 4032) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f32(float [[TMP0]], i64 4032)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isfinite_float(float f) {
@@ -95,7 +95,7 @@ int test_isfinite_float(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 4032) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.s390.tdc.f64(double [[TMP0]], i64 4032)
 // CHECK-NEXT:    ret i32 [[TMP1]]
 //
 int test_isfinite_double(double d) {
@@ -108,7 +108,7 @@ int test_isfinite_double(double d) {
 // CHECK-NEXT:    [[LD:%.*]] = load fp128, ptr [[TMP0:%.*]], align 8
 // CHECK-NEXT:    store fp128 [[LD]], ptr [[LD_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[LD_ADDR]], align 8
-// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 4032) #[[ATTR2]]
+// CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.s390.tdc.f128(fp128 [[TMP1]], i64 4032)
 // CHECK-NEXT:    ret i32 [[TMP2]]
 //
 int test_isfinite_long_double(long double ld) {
diff --git a/clang/test/CodeGen/X86/strictfp_builtins.c b/clang/test/CodeGen/X86/strictfp_builtins.c
index 43e4060bef259b..c4a22dc5ee90fd 100644
--- a/clang/test/CodeGen/X86/strictfp_builtins.c
+++ b/clang/test/CodeGen/X86/strictfp_builtins.c
@@ -27,7 +27,7 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516)
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -43,7 +43,7 @@ void test_long_double_isinf(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504)
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -59,7 +59,7 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3)
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 88300041061aae..98735c4671ce52 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -1575,8 +1575,8 @@ _Complex float mulf(_Complex float a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]])
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]])
 // X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
@@ -2658,8 +2658,8 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]])
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]])
 // X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
@@ -2713,8 +2713,8 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // PRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load x86_fp80, ptr [[B_REALP]], align 16
 // PRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[B]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load x86_fp80, ptr [[B_IMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]]) #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]])
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]])
 // PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR4]]
 // PRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // PRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
@@ -3961,8 +3961,8 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[C_IMAG:%.*]] = load float, ptr [[C_IMAGP]], align 4
 // X86WINPRMTD_STRICT-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[C_REAL]], metadata !"fpexcept.strict") #[[ATTR3]]
 // X86WINPRMTD_STRICT-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[C_IMAG]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[CONV]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[CONV1]]) #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[CONV]])
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[CONV1]])
 // X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
@@ -4038,8 +4038,8 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // PRMTD_STRICT-NEXT:    [[C_IMAG:%.*]] = load float, ptr [[C_IMAGP]], align 4
 // PRMTD_STRICT-NEXT:    [[CONV:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f32(float [[C_REAL]], metadata !"fpexcept.strict") #[[ATTR4]]
 // PRMTD_STRICT-NEXT:    [[CONV1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f32(float [[C_IMAG]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV1]]) #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV]])
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV1]])
 // PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR4]]
 // PRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // PRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
diff --git a/clang/test/CodeGen/isfpclass.c b/clang/test/CodeGen/isfpclass.c
index 1bf60b8fbca176..58849eec340c67 100644
--- a/clang/test/CodeGen/isfpclass.c
+++ b/clang/test/CodeGen/isfpclass.c
@@ -15,7 +15,7 @@ _Bool check_isfpclass_finite(float x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isfpclass_finite_strict
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2:[0-9]+]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 504) #[[ATTR5:[0-9]+]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 504)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isfpclass_finite_strict(float x) {
@@ -36,7 +36,7 @@ _Bool check_isfpclass_nan_f32(float x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isfpclass_nan_f32_strict
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 3) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 3)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isfpclass_nan_f32_strict(float x) {
@@ -57,7 +57,7 @@ _Bool check_isfpclass_snan_f64(double x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isfpclass_snan_f64_strict
 // CHECK-SAME: (double noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f64(double [[X]], i32 1) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f64(double [[X]], i32 1)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isfpclass_snan_f64_strict(double x) {
@@ -78,7 +78,7 @@ _Bool check_isfpclass_zero_f16(_Float16 x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isfpclass_zero_f16_strict
 // CHECK-SAME: (half noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f16(half [[X]], i32 96) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f16(half [[X]], i32 96)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isfpclass_zero_f16_strict(_Float16 x) {
@@ -89,7 +89,7 @@ _Bool check_isfpclass_zero_f16_strict(_Float16 x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isnan
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 3) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 3)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isnan(float x) {
@@ -100,7 +100,7 @@ _Bool check_isnan(float x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isinf
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 516) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 516)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isinf(float x) {
@@ -111,7 +111,7 @@ _Bool check_isinf(float x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isfinite
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 504) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 504)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isfinite(float x) {
@@ -122,7 +122,7 @@ _Bool check_isfinite(float x) {
 // CHECK-LABEL: define dso_local noundef i1 @check_isnormal
 // CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 264) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i1 @llvm.is.fpclass.f32(float [[X]], i32 264)
 // CHECK-NEXT:    ret i1 [[TMP0]]
 //
 _Bool check_isnormal(float x) {
@@ -150,7 +150,7 @@ int4 check_isfpclass_nan_v4f32(float4 x) {
 // CHECK-LABEL: define dso_local range(i32 0, 2) <4 x i32> @check_isfpclass_nan_strict_v4f32
 // CHECK-SAME: (<4 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x i1> @llvm.is.fpclass.v4f32(<4 x float> [[X]], i32 3) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x i1> @llvm.is.fpclass.v4f32(<4 x float> [[X]], i32 3)
 // CHECK-NEXT:    [[TMP1:%.*]] = zext <4 x i1> [[TMP0]] to <4 x i32>
 // CHECK-NEXT:    ret <4 x i32> [[TMP1]]
 //
diff --git a/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp b/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
index 175ad22601839d..9ba1a557a90fee 100644
--- a/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
+++ b/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
@@ -20,7 +20,7 @@ float4 strict_fadd(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_absDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR2:[0-9]+]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_ABS:%.*]] = tail call <4 x float> @llvm.fabs.v4f32(<4 x float> [[A]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_ABS:%.*]] = tail call <4 x float> @llvm.fabs.v4f32(<4 x float> [[A]])
 // CHECK-NEXT:    ret <4 x float> [[ELT_ABS]]
 //
 float4 strict_elementwise_abs(float4 a) {
@@ -50,7 +50,7 @@ float4 strict_elementwise_min(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_maximumDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]])
 // CHECK-NEXT:    ret <4 x float> [[ELT_MAXIMUM]]
 //
 float4 strict_elementwise_maximum(float4 a, float4 b) {
@@ -60,7 +60,7 @@ float4 strict_elementwise_maximum(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_minimumDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]])
 // CHECK-NEXT:    ret <4 x float> [[ELT_MINIMUM]]
 //
 float4 strict_elementwise_minimum(float4 a, float4 b) {
@@ -70,7 +70,7 @@ float4 strict_elementwise_minimum(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_ceilDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:...
[truncated]

@llvmbot
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-clang


@spavloff
Collaborator Author

This patch was obtained from the commit "Don't set strictfp on irrelevant calls" from #109798.


github-actions bot commented Jan 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

switch (IID) {
#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
  case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"
Collaborator

This doesn't seem right. We have intrinsics to explicitly access the FP environment (llvm.set.rounding etc.). And any intrinsic which can call user code can access the FP environment.

You might need to specify this in the .td files (IntrStrictFP or something like that).

Collaborator Author

Indeed. I changed the implementation to use access to InaccessibleMemory. It is not precise, because this access type is used by other intrinsics too. In the future we could add a new kind of "memory" to represent the FP control and status bits and make this check precise.

bool IntrinsicInst::canAccessFPEnvironment(LLVMContext &C, Intrinsic::ID IID) {
  AttributeList Attrs = Intrinsic::getAttributes(C, IID);
  MemoryEffects ME = Attrs.getMemoryEffects();
  return ME.onlyAccessesInaccessibleMem();
}
Collaborator

I think you want doesAccessInaccessibleMem() (any function that can access inaccessible memory, not just functions which only access inaccessible memory). Which apparently doesn't exist at the moment, but should be straightforward to implement.
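A sketch of what such a helper might look like, assuming the MemoryEffects API from llvm/Support/ModRef.h; this function did not exist in LLVM at the time of this review:

```
#include "llvm/Support/ModRef.h"

using namespace llvm;

// Hypothetical helper: true if the function may read or write inaccessible
// memory at all, not only when that is the sole memory it accesses.
static bool doesAccessInaccessibleMem(MemoryEffects ME) {
  return isModOrRefSet(ME.getModRef(IRMemLocation::InaccessibleMem));
}
```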

Collaborator Author

Yes, fixed, thank you!

Member

Wouldn't a large fraction of existing functions that may or may not have any floating point in them qualify as doesAccessInaccessibleMem()? It seems like if we're going to model the FP environment like "memory", then we need a proper FP-environment modeling flag.

@spavloff added the floating-point (Floating-point math) label on Jan 15, 2025
Collaborator

@efriedma-quic left a comment

LGTM with one minor comment

if (IsFPConstrained) {
  if (const auto *Func = dyn_cast<Function>(Callee)) {
    // Some intrinsic functions in non-default FP mode must have FP operand
    // bundles to indicate a side effect due to read/write FP environment.
Collaborator

Please remove the reference to "FP operand bundles", since they don't exist at the moment.

Collaborator Author

Thanks!

Please remove the reference to "FP operand bundles", since they don't exist at the moment.

I removed this comment; it came from the bundle implementation and has no value for the current one.

@kpneal
Member

kpneal commented Jan 16, 2025

No, this is the wrong approach....

@kpneal
Member

kpneal commented Jan 16, 2025

A recap of the history of the strictfp attribute.

Originally we needed a way to prevent the Inliner from inlining a non-strictfp function into a strictfp function. We added the strictfp attribute to the function definition for this. Then we found that basic blocks were being optimized when they didn't belong to a function, so we couldn't get to where the strictfp attribute was located. The solution was to add the attribute to every function call. Simple.

Then I went to work on the Verifier changes. Originally I implemented it by checking to see if the presence of the attribute on a call site matched the presence on the function definition. Easy. But then I ran into a problem: Checks at call sites to see if an attribute is present will look through to the function declaration.

Thus, we had to weaken the rule. Now, the strictfp attribute is allowed at call sites when the attribute is missing from the calling function's definition. If the strictfp attribute is present on the calling function's definition, then the strictfp attribute is required on all call sites. This is the rule in my IR Verifier ticket, which was approved (D146845) but not pushed because we still have broken tests.
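For concreteness, a hedged sketch of that rule; this is illustrative, not the actual Verifier code from D146845:

```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// If the calling function is strictfp, every call site in it must also carry
// the strictfp attribute. The check reads the call site's own attribute list,
// because CallBase::hasFnAttr() looks through to the callee's declaration,
// which is exactly the pitfall described above.
static bool strictFPCallSitesOK(const Function &F) {
  if (!F.hasFnAttribute(Attribute::StrictFP))
    return true; // No constraint when the caller is not strictfp.
  for (const BasicBlock &BB : F)
    for (const Instruction &I : BB)
      if (auto *CB = dyn_cast<CallBase>(&I))
        if (!CB->getAttributes().hasFnAttr(Attribute::StrictFP))
          return false;
  return true;
}
```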

Clearly intrinsics don't get inlined by the Inliner. But what if an intrinsic gets lowered to a call to a real function? Especially if LTO is being used? Will we lose the information that the function now being called needs the strictfp attribute or something similar? Since we don't have a way to mark accessing the FP environment exactly, aren't we throwing away information?

Is it still possible to write a check in the Verifier to catch any places where the strictfp attribute, or some equivalent, is missing?

I'm not convinced we're ready to drop any uses of the strictfp attribute without having all places that check for it changed first. And does it make sense for the Inliner to be unable to inline a function if the called function satisfies onlyAccessesInaccessibleMem()? I don't think we're ready for a switchover.

@spavloff
Collaborator Author

Thank you for your retrospective view on strictfp. I didn't realize how closely it is related to inlining. This use case should be carefully analyzed to refine the meaning of the attribute.

Clearly intrinsics don't get inlined by the Inliner. But what if an intrinsic gets lowered to a call to a real function? Especially if LTO is being used? Will we lose the information that the function now being called needs the strictfp attribute or something similar? Since we don't have a way to mark accessing the FP environment exactly, aren't we throwing away information?

An intrinsic function is something the compiler knows a priori: it knows the intrinsic's effects and properties, in particular how it interacts with the FP environment. If it is implemented as a call to a real function (with a body), there is no problem; an external function call has side effects and may do anything, including accessing the FP environment. But if that body is inlined, things get more complicated.

  • If the caller is strictfp but the callee implementation is not, the inlined body is transformed into strictfp form by replacing FP operations with their constrained variants.

  • If the caller is not strictfp but the callee is, inlining should not take place (see the file llvm/test/Transforms/Inline/inline-strictfp.ll, test inlined_05).

  • If both caller and callee are strictfp or both are not, this is a trivial case.

In all these cases inlining does not create invalid constructs, and no additional information is needed.
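A hedged sketch of the second bullet's check, loosely following the attribute compatibility test the Inliner already performs; the function name and surrounding context are illustrative:

```
#include "llvm/Analysis/InlineCost.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Refuse to inline a strictfp callee into a non-strictfp caller; the inlined
// body would require strict semantics the host function does not provide.
static InlineResult checkStrictFPCompatibility(const Function &Caller,
                                               const Function &Callee) {
  if (Callee.hasFnAttribute(Attribute::StrictFP) &&
      !Caller.hasFnAttribute(Attribute::StrictFP))
    return InlineResult::failure("incompatible strictfp attributes");
  // Caller strictfp, callee not: inlining is legal, but the cloned body must
  // be rewritten to use constrained intrinsics (the first bullet above).
  return InlineResult::success();
}
```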

Is it still possible to write a check in the Verifier to catch any places where the strictfp attribute, or some equivalent, is missing?

I think we first need to refine the meaning of strictfp. Maybe this attribute can be used for the function body only, and for call sites MemoryEffects::doesAccessInaccessibleMem() would be enough?

@andykaylor
Contributor

What's the motivation for this change? Was the strictfp attribute blocking some optimization?

My understanding of the attribute is that it only indicates that strict floating-point semantics are (potentially?) required at the call site. It doesn't indicate that the called function does access the floating-point environment.

@spavloff
Collaborator Author

My understanding of the attribute is that it only indicates that strict floating-point semantics are (potentially?) required at the call site.

If strict floating-point semantics are required at this call site, they are required on every relevant call in this function. That means strictfp is a function attribute. Does anything prevent us from removing strictfp from all call sites?

@andykaylor
Contributor

My understanding of the attribute is that it only indicates that strict floating-point semantics are (potentially?) required at the call site.

If strict floating-point semantics are required at this call site, they are required on every relevant call in this function. That means strictfp is a function attribute. Does anything prevent us from removing strictfp from all call sites?

I think that takes us back to @kpneal's history: "Then we found that basic blocks were being optimized when they didn't belong to a function, so we couldn't get to where the strictfp attribute was located. The solution was to add the attribute to every function call." To handle the case where a block isn't owned by a function, we need the attribute at the call site. I don't know the specifics of how that case arises, but if we remove the attribute from the call site, we would have to do something to add it again when the block gets detached from the function (possibly during cloning?).

@spavloff
Collaborator Author

If strict floating-point semantics are required at this call site, they are required on every relevant call in this function. That means strictfp is a function attribute. Does anything prevent us from removing strictfp from all call sites?

I think that takes us back to @kpneal's history: "Then we found that basic blocks were being optimized when they didn't belong to a function, so we couldn't get to where the strictfp attribute was located. The solution was to add the attribute to every function call."

It sounds like a transient property, used in very specific cases. We could return to this problem later, when/if the transition to bundles happens.

What about this PR? With it, strictfp can be used as an indicator of a call that requires special handling, for both constrained functions and calls with FP bundles, while both coexist.

@kpneal
Member

kpneal commented Feb 4, 2025

To handle the case where a block isn't owned by a function, we need the attribute at the call site. I don't know the specifics of how that case arises, but if we remove the attribute from the call site, we would have to do something to add it again when the block gets detached from the function (possibly during cloning?).

And then remove it again when reinserting the BB into a function body? That sounds like a complication, one that may be missed by someone in the future who isn't focused on FP, and I haven't yet heard a reason we need all of this.

@spavloff -- Why do we need to change how we handle the strictfp attribute? What's the benefit?

@kpneal
Member

kpneal commented Feb 4, 2025

If strict floating-point semantics are required at this call site, they are required on every relevant call in this function. That means strictfp is a function attribute. Does anything prevent us from removing strictfp from all call sites?

I think that takes us back to @kpneal's history: "Then we found that basic blocks were being optimized when they didn't belong to a function, so we couldn't get to where the strictfp attribute was located. The solution was to add the attribute to every function call."

It sounds like a transient property, used in very specific cases. We could return to this problem later, when/if the transition to bundles happens.

Revisiting decisions made in the past is, in general, a good thing. Sometimes (frequently?) it doesn't result in any change, but that doesn't make it any less worthwhile.

What about this PR? With it, strictfp can be used as an indicator of a call that requires special handling, for both constrained functions and calls with FP bundles, while both coexist.

I think efforts to change the rules around the strictfp attribute are not warranted at this time. If, in the future, there's a real benefit to changing the rules then we can talk about it then.

@spavloff
Collaborator Author

spavloff commented Feb 5, 2025

Thank you for the clarification. Let's keep the meaning of this attribute as it currently stands.

The explanation raises some concern, though. OK, strictfp at call sites preserves the property of the call even while the basic block is detached. If a block is taken from a strictfp function, it should only be placed within another strictfp function. Default-mode functions have stricter requirements regarding their bodies; for instance, they cannot call functions such as fesetround. Strictfp functions have fewer limitations. Transferring code from a strictfp function to a default-mode function seems incorrect in the general case. Does the detached-block case really need strictfp calls?

Additionally, it's unfortunate that we use the same attribute for functions requiring strictfp semantics for their body, and for calls that originate from a strictfp function.

spavloff added a commit to spavloff/llvm-project that referenced this pull request Feb 17, 2025
This change was discussed in llvm#122735. As decided there, the current usage of strictfp on call sites remains intact.