-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SDAG] Fix CSE for ADDRSPACECAST nodes #122912
[SDAG] Fix CSE for ADDRSPACECAST nodes #122912
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-nvptx Author: Alex MacLean (AlexMaclean) ChangesCorrect CSE in SelectionDAG can make DAG combining more effective and reduces the size of the DAG and thus should improve compile time. This change is a prerequisite for #121710 as it enables the removal or redundant loads of byval arguments in some cases. Full diff: https://github.com/llvm/llvm-project/pull/122912.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 0dfd0302ae5438..743ae4895a1b1c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -954,6 +954,12 @@ static void AddNodeIDCustom(FoldingSetNodeID &ID, const SDNode *N) {
ID.AddInteger(M);
break;
}
+ case ISD::ADDRSPACECAST: {
+ const AddrSpaceCastSDNode *ASC = cast<AddrSpaceCastSDNode>(N);
+ ID.AddInteger(ASC->getSrcAddressSpace());
+ ID.AddInteger(ASC->getDestAddressSpace());
+ break;
+ }
case ISD::TargetBlockAddress:
case ISD::BlockAddress: {
const BlockAddressSDNode *BA = cast<BlockAddressSDNode>(N);
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll b/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll
new file mode 100644
index 00000000000000..a1bf8042817b43
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll
@@ -0,0 +1,19 @@
+; RUN: llc < %s -mcpu=sm_80 -mattr=+ptx73 -debug-only=isel -o /dev/null 2>&1 | FileCheck %s
+
+; REQUIRES: asserts
+
+target triple = "nvptx64-nvidia-cuda"
+
+;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and
+;; after selection DAG ISel so most cases will be handled by one of these.
+define void @foo(ptr %p) {
+; CHECK-LABEL: Optimized legalized selection DAG: %bb.0 'foo:'
+; CHECK: addrspacecast[0 -> 5]
+; CHECK-NOT: addrspacecast[0 -> 5]
+; CHECK-LABEL: ===== Instruction selection begins
+;
+ %a1 = addrspacecast ptr %p to ptr addrspace(5)
+ call void @llvm.stackrestore(ptr %p)
+ store ptr %p, ptr addrspace(5) %a1
+ ret void
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but please wait for @arsenm's review as I'm not very familiar with the NodeID
infra.
;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and | ||
;; after selection DAG ISel so most cases will be handled by one of these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also hack around this with -O0, the CSE always happens at any opt level (a dirtier hack would be use -start-before/after). I'm not sure I understand how stackrestore helps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to get around the pre-SelectionDAG CSE by using the stackrestore which causes a addrspacecast to be added during DAG legalization, so that isn't a problem. However it seems like there is some post-ISel (perhaps DAG-CSE on the machine nodes?) which is preventing the impact of this change from being easily observed in the final output. That's why I'm using the debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-stop-after=finalize-isel might work for avoiding machine cse
Code change lgtm but a test that doesn't rely on asserts / checking debug output would be better |
NVPTX has a stackrestore instruction which takes a pointer in the local AS. During operation legalization in ISel we add an addrspacecast node in here to case from generic -> local. This test works by ensuring that the ASC used for the stackrestore does not introduce a new node. This was the simplest way I could think of to get around CSE occurring on the LLVM IR.
Yea, I agree it is not ideal, but I've spent some time on it and this is the best I can come up with at the moment. #121710 does rely on this change to ensure load elimination occurs for byval arguments, so once this lands we'll have much better non-debug coverage. In the meantime, given that this is a fairly simple and clearly correct fix, is the current method of testing sufficient? |
The 2 casts appear in the DAG if you just use -O0 for a more straightforward example:
|
;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and | ||
;; after selection DAG ISel so most cases will be handled by one of these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-stop-after=finalize-isel might work for avoiding machine cse
define void @foo(ptr %p) { | ||
; CHECK-LABEL: Optimized legalized selection DAG: %bb.0 'foo:' | ||
; CHECK: addrspacecast[0 -> 5] | ||
; CHECK-NOT: addrspacecast[0 -> 5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-NOT checks, especially of debug print scraping are fragile and should be avoided, or as loose as possible. These stop being useful if the debug format ever changes slightly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
For the negative checks, the best compromise I've found is to run them as a separate test "RUN". At least that way it's possible to correctly set the bounds on the areas where the negative checks should apply. Mixing them with the regular checks is hard to implement correctly in general.
That said, in this particular case the intent appears to be to verify that there's only one instance of addrspacecast, and this combination of check and check-not is reasonable. We could use CHECK-COUNT-1
to make it more obvious what we're doing here.
An alternative would be to autogenerate the checks and just match the exact output, but that's probably not going to work all that well with matching unstable debug printouts.
TL;DR; if we have to verify that there's only one addrrspacecast in debug printout, this is OK.
If you figure out the way to test non-debug output, I'd suggest switching to autogenerated checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've done my best to update the test using @arsenm's suggestion in #122912 (comment). I've also removed the -NOT
checks and tried to make the checks as generic as possible to avoid unnecessary dependencies on debug printing format.
However, I'm still not yet able to prevent some other sort of CSE in SDAG from making the effects of this change difficult to observe even directly after SDAG via -stop-after=finalize-isel.
Does the updated version look good?
1921a4d
to
7e9217b
Compare
7e9217b
to
9a367ce
Compare
Correct CSE in SelectionDAG can make DAG combining more effective and reduces the size of the DAG and thus should improve compile time.
This change is a prerequisite for #121710 as it enables the removal or redundant loads of byval arguments in some cases.