Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDAG] Fix CSE for ADDRSPACECAST nodes #122912

Merged
merged 2 commits into from
Jan 20, 2025

Conversation

AlexMaclean
Copy link
Member

Correct CSE in SelectionDAG can make DAG combining more effective and reduces the size of the DAG and thus should improve compile time.

This change is a prerequisite for #121710 as it enables the removal or redundant loads of byval arguments in some cases.

@AlexMaclean AlexMaclean self-assigned this Jan 14, 2025
@llvmbot llvmbot added backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well labels Jan 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 14, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Correct CSE in SelectionDAG can make DAG combining more effective and reduces the size of the DAG and thus should improve compile time.

This change is a prerequisite for #121710 as it enables the removal or redundant loads of byval arguments in some cases.


Full diff: https://github.com/llvm/llvm-project/pull/122912.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+6)
  • (added) llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll (+19)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 0dfd0302ae5438..743ae4895a1b1c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -954,6 +954,12 @@ static void AddNodeIDCustom(FoldingSetNodeID &ID, const SDNode *N) {
       ID.AddInteger(M);
     break;
   }
+  case ISD::ADDRSPACECAST: {
+    const AddrSpaceCastSDNode *ASC = cast<AddrSpaceCastSDNode>(N);
+    ID.AddInteger(ASC->getSrcAddressSpace());
+    ID.AddInteger(ASC->getDestAddressSpace());
+    break;
+  }
   case ISD::TargetBlockAddress:
   case ISD::BlockAddress: {
     const BlockAddressSDNode *BA = cast<BlockAddressSDNode>(N);
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll b/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll
new file mode 100644
index 00000000000000..a1bf8042817b43
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast-cse.ll
@@ -0,0 +1,19 @@
+; RUN: llc < %s -mcpu=sm_80 -mattr=+ptx73 -debug-only=isel -o /dev/null 2>&1 | FileCheck %s
+
+; REQUIRES: asserts
+
+target triple = "nvptx64-nvidia-cuda"
+
+;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and
+;; after selection DAG ISel so most cases will be handled by one of these.
+define void @foo(ptr %p) {
+; CHECK-LABEL: Optimized legalized selection DAG: %bb.0 'foo:'
+; CHECK:       addrspacecast[0 -> 5]
+; CHECK-NOT:   addrspacecast[0 -> 5]
+; CHECK-LABEL: ===== Instruction selection begins
+;
+  %a1 = addrspacecast ptr %p to ptr addrspace(5)
+  call void @llvm.stackrestore(ptr %p)
+  store ptr %p, ptr addrspace(5) %a1
+  ret void
+}

Copy link
Contributor

@justinfargnoli justinfargnoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but please wait for @arsenm's review as I'm not very familiar with the NodeID infra.

Comment on lines +7 to +8
;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and
;; after selection DAG ISel so most cases will be handled by one of these.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also hack around this with -O0, the CSE always happens at any opt level (a dirtier hack would be use -start-before/after). I'm not sure I understand how stackrestore helps here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get around the pre-SelectionDAG CSE by using the stackrestore which causes a addrspacecast to be added during DAG legalization, so that isn't a problem. However it seems like there is some post-ISel (perhaps DAG-CSE on the machine nodes?) which is preventing the impact of this change from being easily observed in the final output. That's why I'm using the debug output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-stop-after=finalize-isel might work for avoiding machine cse

@arsenm
Copy link
Contributor

arsenm commented Jan 15, 2025

Code change lgtm but a test that doesn't rely on asserts / checking debug output would be better

@AlexMaclean
Copy link
Member Author

I'm not sure I understand how stackrestore helps here.

NVPTX has a stackrestore instruction which takes a pointer in the local AS. During operation legalization in ISel we add an addrspacecast node in here to case from generic -> local. This test works by ensuring that the ASC used for the stackrestore does not introduce a new node. This was the simplest way I could think of to get around CSE occurring on the LLVM IR.

Code change lgtm but a test that doesn't rely on asserts / checking debug output would be better

Yea, I agree it is not ideal, but I've spent some time on it and this is the best I can come up with at the moment. #121710 does rely on this change to ensure load elimination occurs for byval arguments, so once this lands we'll have much better non-debug coverage.

In the meantime, given that this is a fairly simple and clearly correct fix, is the current method of testing sufficient?

@arsenm
Copy link
Contributor

arsenm commented Jan 15, 2025

Yea, I agree it is not ideal, but I've spent some time on it and this is the best I can come up with at the moment. #121710 does rely on this change to ensure load elimination occurs for byval arguments, so once this lands we'll have much better non-debug coverage.

The 2 casts appear in the DAG if you just use -O0 for a more straightforward example:

define void @foo(ptr %p) {
   %a1 = addrspacecast ptr %p to ptr addrspace(5)
   %a2 = addrspacecast ptr %p to ptr addrspace(5)
   store i32 0, ptr addrspace(5) %a1
   store i32 0, ptr addrspace(5) %a2
   ret void
 }

Comment on lines +7 to +8
;; Selection DAG CSE is hard to test since we run CSE/GVN on the IR before and
;; after selection DAG ISel so most cases will be handled by one of these.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-stop-after=finalize-isel might work for avoiding machine cse

define void @foo(ptr %p) {
; CHECK-LABEL: Optimized legalized selection DAG: %bb.0 'foo:'
; CHECK: addrspacecast[0 -> 5]
; CHECK-NOT: addrspacecast[0 -> 5]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-NOT checks, especially of debug print scraping are fragile and should be avoided, or as loose as possible. These stop being useful if the debug format ever changes slightly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

For the negative checks, the best compromise I've found is to run them as a separate test "RUN". At least that way it's possible to correctly set the bounds on the areas where the negative checks should apply. Mixing them with the regular checks is hard to implement correctly in general.

That said, in this particular case the intent appears to be to verify that there's only one instance of addrspacecast, and this combination of check and check-not is reasonable. We could use CHECK-COUNT-1 to make it more obvious what we're doing here.

An alternative would be to autogenerate the checks and just match the exact output, but that's probably not going to work all that well with matching unstable debug printouts.

TL;DR; if we have to verify that there's only one addrrspacecast in debug printout, this is OK.
If you figure out the way to test non-debug output, I'd suggest switching to autogenerated checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've done my best to update the test using @arsenm's suggestion in #122912 (comment). I've also removed the -NOT checks and tried to make the checks as generic as possible to avoid unnecessary dependencies on debug printing format.

However, I'm still not yet able to prevent some other sort of CSE in SDAG from making the effects of this change difficult to observe even directly after SDAG via -stop-after=finalize-isel.

Does the updated version look good?

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream/dag-asc-cse branch from 1921a4d to 7e9217b Compare January 20, 2025 00:15
@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream/dag-asc-cse branch from 7e9217b to 9a367ce Compare January 20, 2025 00:16
@AlexMaclean AlexMaclean merged commit 3606876 into llvm:main Jan 20, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants