-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
!cir.bool
lowering differs from original clang CodeGen.
#480
Comments
Hi, I'm new to open source and llvm, and I want start contributing with this issue. To my understanding, the codegen for CIR produces the return value with extra bits, which is redundant since it requires extra casting for other operations. One thing I'm confused is the comment mentioned that the problem might be generated from LowerExpectIntrinsicPass, which is the infrastructure from llvm. Do you have any suggestions where to look at to get started? Thanks! |
@wolfcomos nice, welcome! You can ignore the
|
After discussions with @sitio-couto this will be solved by his ABI work that is work in progress, so assigning to him. |
) The llvm's intrinsic `llvm.is.fpclass` is used to support multiple float point builtins: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-isfpclass > The `__builtin_isfpclass()` builtin is a generalization of functions > isnan, isinf, isfinite and some others defined by the C standard. It tests > if the floating-point value, specified by the first argument, falls into > any of data classes, specified by the second argument. I meant to support this by creating IntrinsicCallOp directly. But I can't make it due to #480 since the return type of the intrinsic will mismatch. So I have to create a new Op for it. But I feel it might not be too bad. At least it is more explicit and more expressive.
This also causes missed optimizations, e.g. https://godbolt.org/z/Kfcn8P7M7. That's a pretty contrived example, but I expect something similar to show up in the real world as well. What was the intended ABI-based solution for this? |
Hey @smeenai, let me add a bit more info on this: The original CodeGen has both a scalar and a memory representation for
I'm not sure if the ABI alone can address this issue. When I had this discussion with @bcardosolopes I was looking only into calling conventions, which does not cover all cases, and the approach I had in mind did not come to fruition. While the ABI-stuff might help us determine what the scalar/memory representation for a |
) The llvm's intrinsic `llvm.is.fpclass` is used to support multiple float point builtins: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-isfpclass > The `__builtin_isfpclass()` builtin is a generalization of functions > isnan, isinf, isfinite and some others defined by the C standard. It tests > if the floating-point value, specified by the first argument, falls into > any of data classes, specified by the second argument. I meant to support this by creating IntrinsicCallOp directly. But I can't make it due to #480 since the return type of the intrinsic will mismatch. So I have to create a new Op for it. But I feel it might not be too bad. At least it is more explicit and more expressive.
I am not super sure about the example in the top so can't tell what problem you are seeking here. These two codes are clearly not computing the same thing 🙏
|
https://godbolt.org/z/cKjxzf5nE should be up-to-date; note the different return types for regular Clang vs. ClangIR. |
I see now, thanks! In my local setup I get this output, Indicating that cir allocates both the local variable and the return value. Perhaps this is the difference?
I can try to have a look at it, but can't promise when will it be :) |
After trying to play with changing
I will try to track the "shoehorning" process in clang and see if we can find a non-ugly method to use it in CIR :) |
This patch adds support for `__builtin_isinf_sign`. The implementation has several limitations that result in discrepancies between the generated LLVM IR and the expected output. Firstly, it uses `IsFPClass` to determine if a value is infinite, whereas the original CGBuiltin implementation used direct comparisons with infinity constants. Secondly, due to llvm#480, there are numerous unnecessary type conversions occurring, such as converting from i1 to i8 and then back to i1. Additionally, `SignBitOp` cannot set the return type to `CIR_BoolType`, as doing so would lead to failures during the lowering to LLVM IR.
@orbiri thanks a bunch! |
I have a solution which I implemented in the
My main concern about this solution is that it will need to be duplicated to both On second thought, I would be surprised if we are the first who require this lowering pattern. I would like to suggest trying to ask in discourse how people suggest to address this issue of having separate data layout than the actual data. |
Published draft at #1158 :) This still doesn't solve the ellision that is required to solve this issue, but it lays the foundations for solving it. |
Thanks for exploring solutions here, it's about time we fix this! My suggestion is to check whether we should implement this as part of target lowering, since it's also tied to ABI (memory usage in and out of functions), wrote more content in the PR. |
This patch adds support for `__builtin_isinf_sign`. The implementation has several limitations that result in discrepancies between the generated LLVM IR and the expected output. Firstly, it uses `IsFPClass` to determine if a value is infinite, whereas the original CGBuiltin implementation used direct comparisons with infinity constants. Secondly, due to llvm#480, there are numerous unnecessary type conversions occurring, such as converting from i1 to i8 and then back to i1. Additionally, `SignBitOp` cannot set the return type to `CIR_BoolType`, as doing so would lead to failures during the lowering to LLVM IR.
While thinking about #1267, I came to think about a crucial reason for saving the retval as stack variable - missing the ability to carry and update values across loops. Indeed, when trying a more sophisticated example (https://godbolt.org/z/M7hWczcn9), one can see that vanilla clang also emits the same temp variable. It does so probably in order to avoid complex phi emission and defer it to mem2reg passes. I would claim it is nice to have to elide the P.S. the example above shows many more opportunities for minor optimizations as part of the codegen - many redundant pass through blocks, having two return sites instead of one, being able to emit an alloca of |
The logic in codegen is: (1) Always emit temp variable for return value (2) If return is of type direct/extend:
Later, [3] if the return value address is not used (which can be the case when the only use was zapped by ☝) - delete the return value argument. I have a few questions on this to the CIR designers:
[1] clangir/clang/lib/CodeGen/CGCall.cpp Line 3571 in 54d48d8
[2] clangir/clang/lib/CodeGen/CGCall.cpp Lines 3907 to 3908 in 54d48d8
[3] clangir/clang/lib/CodeGen/CodeGenFunction.cpp Lines 566 to 573 in 54d48d8
|
The general approach in CIRGen (and OG) is to be as simple as possible - if we decide to elide it should come as an extra pass (either out of generic dce from mem2reg, etc or some custom pass) |
+1
Agreed, we already have a special "name" for it "__retval", so we usually now exactly what alloca is used for returning values. It's also possible we could leverage that info for NRVO, etc. I'd say we should tag it.
Hell yea! My take is to match OG (so it's easy to compare the IR outputs). Does it take -O1 to do this optimization currently or is it always on? |
I believe the root problem here is that CIR lowers
!cir.bool
in a different way than the original clang CodeGen.In the original CodeGen,
bool
glvalues are lowered to LLVMi8
values, andbool
prvalues are lowered to LLVMi1
values, as illustrated in the following example:However, in CIRGen, all
!cir.bool
values are lowered to LLVMi8
values. The example above would be lowered to LLVMIR through CIR as:This divergence leads to the redundancy illustrated in the PR description. The result of a
cir.cmp
operation is currently expected to be lowered to ani8
value although it's a prvalue. After emitting anllvm.icmp
operation, you have to insert anllvm.zext
operation to extend thei1
toi8
. Thus the redundancy.Originally posted by @Lancern in #478 (comment)
The text was updated successfully, but these errors were encountered: