JIT: Mark throw blocks with heuristic-derived weights as rarely run during morph #117692
Conversation
Pull Request Overview
This PR updates the JIT's handling of throw blocks during morph, marking them as rarely run when no profile-derived weight is available.
- Adds a heuristic that assumes throw blocks without profile data are rare.
- Calls bbSetRunRarely() on eligible blocks during morph.
Comments suppressed due to low confidence (1)
src/coreclr/jit/fgbasic.cpp:187
- Consider adding a unit or regression test to verify that throw blocks without profile weights are correctly marked as rarely run, preventing future regressions.
// Heuristic: Throw blocks without profile-derived weights are presumed to be rare.
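The heuristic can be sketched as a simplified, self-contained model. Note that `BasicBlock`, its fields, and `markRareThrowBlocks` here are illustrative stand-ins, not the actual coreclr types or the code this PR changes:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the JIT's basic-block bookkeeping; the real
// type lives in src/coreclr/jit and carries far more state.
struct BasicBlock
{
    double bbWeight         = 1.0;   // block weight, heuristic-derived by default
    bool   hasProfileWeight = false; // true when the weight came from PGO data
    bool   endsInThrow      = false; // block ends in a call to a throw helper
    bool   isRunRarely      = false;

    void bbSetRunRarely()
    {
        bbWeight    = 0.0;
        isRunRarely = true;
    }
};

// The heuristic in miniature: during morph, a throw block whose weight
// was merely guessed (no profile data) is presumed rare and marked so.
void markRareThrowBlocks(std::vector<BasicBlock>& blocks)
{
    for (BasicBlock& block : blocks)
    {
        if (block.endsInThrow && !block.hasProfileWeight)
        {
            block.bbSetRunRarely();
        }
    }
}
```

Blocks whose weights come from real profile data are deliberately left alone, which is why PGO scenarios are unaffected by the change.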
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib @AndyAyersMS. Diffs show size increases from throw blocks being moved out-of-line. As expected, size diffs are larger on x86/x64 due to branch lengths changing. PGO scenarios aren't impacted, so this change seems relatively low-risk.
/ba-g build timeouts
Will this make the issue in #115918 actually more likely? Or could it only affect some late inlining that happens after morph?
Since this happens after the main inlining phase, I wouldn't expect this to affect future attempts to inline the call in question. Also, while unaffected by this change, if a no-inline call is not inlined during one particular compilation, I think it's quite unlikely it should ever be inlined.
But such a decision is actually incorrect when it is made due to low/zero block weight and then persisted: throw blocks don't contain only calls to throw helpers; they can also contain calls to simple methods like conversions (highlighted in the issue) that are also used by other methods (and the throw blocks themselves would benefit from inlining trivial methods). The change in this PR is correct, but it just makes the other issue somewhat more common (visible in the diffs for this PR as well).
Right, I think that ought to be treated as a separate issue from #115918. In the absence of PGO data, the JIT is pretty bad at coming up with sensible block weights. You are likely seeing weights of 0.5 on no-return blocks with some frequency because such blocks are guarded by some conditional, and the JIT puts little effort into coming up with a smart guess at the likelihood of the condition being true, so we just assume it's 50/50. In .NET 11, we hope to alleviate this somewhat by coming up with smarter edge likelihood heuristics when we don't have PGO data. The PGO version of the scenario described in #115918 seems like a much more pressing issue.
Could you please share one of the diffs you're referring to? I wouldn't expect this change to influence inlining decisions; we usually don't know if a call is no-return until we've tried to inline it, and if we don't inline it, it's not until morph that we make the block a cold throw helper.
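The 50/50 guess described above can be made concrete with a toy weight-propagation model (the names here are hypothetical, not the JIT's actual code): a block's weight is the sum, over its incoming edges, of the predecessor's weight times the edge's likelihood, and without PGO data a conditional edge is assumed to have likelihood 0.5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of likelihood-based block weights. Each incoming
// edge contributes (predecessor weight x edge likelihood) to the block.
struct FlowEdge
{
    double predWeight;
    double likelihood; // assumed 0.5 for conditionals when no PGO data exists
};

double computeBlockWeight(const std::vector<FlowEdge>& incoming)
{
    double weight = 0.0;
    for (const FlowEdge& edge : incoming)
    {
        weight += edge.predWeight * edge.likelihood;
    }
    return weight;
}
```

Under this model, a throw block guarded by a single conditional in a method with entry weight 1.0 inherits a weight of 0.5, which is why no-return blocks so often look warmer than they really are without profile data.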
From the diffs I didn't find any inlining issues, as expected, but the zero block weights do seem to have somewhat unexpected changes for a FullOpts method like the one below. The performance impact is tiny here, but it highlights the effects of bbWeight=0.

```diff
  sub rsp, 40
- ;; size=4 bbWeight=1 PerfScore 0.25
-G_M749_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ mov qword ptr [rsp+0x30], rcx
+ ;; size=9 bbWeight=0 PerfScore 0.00
+G_M749_IG02: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ mov rcx, qword ptr [rsp+0x30]
  mov qword ptr [rsp+0x20], rcx
  lea rcx, [rsp+0x20]
  xor rdx, rdx
@@ -28,9 +30,9 @@ G_M749_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ; gcrRegs -[rdx]
  ; gcr arg pop 0
  int3
- ;; size=19 bbWeight=1 PerfScore 5.00
+ ;; size=24 bbWeight=0 PerfScore 0.00
-; Total bytes of code 23, prolog size 4, PerfScore 5.25, instruction count 6, allocated bytes for code 23 (MethodHash=4138fd12) for method System.Convert:ToInt64(System.DateTime):long (FullOpts)
+; Total bytes of code 33, prolog size 4, PerfScore 0.00, instruction count 8, allocated bytes for code 33 (MethodHash=4138fd12) for method System.Convert:ToInt64(System.DateTime):long (FullOpts)
```
Thanks for finding that. I hypothesize the diffs from this change are dominated by changes in code layout, but I was also expecting some extraneous churn from register allocation, like in the example you shared. Our register allocator is known to be vulnerable to spurious diffs from changes in block weights. This stems from various sites in the allocator where we have to linearize control flow by picking some preferred successor or predecessor of a block based on their weights; these sites predate our switch over to likelihood-based edge weights, and should probably be changed at some point to reflect the likelihood of control flow transfer between blocks, rather than the blocks' absolute execution counts. Unfortunately, such changes are hard to make with confidence, since the churn is so widespread. The register allocator is long overdue for an overhaul, and I'm personally hoping we make time to do some of this work in .NET 11. At the moment, it's too early to say what exactly we hope to accomplish.
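The weight-vs-likelihood distinction described above can be illustrated with a toy example (all names are hypothetical, not the allocator's real code): picking a preferred successor by absolute block weight can disagree with picking by edge likelihood, because a successor can be globally hot through its other predecessors while the edge to it from the current block is cold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate successor of the block being linearized.
struct Successor
{
    int    id;
    double blockWeight;    // absolute execution weight of the successor block
    double edgeLikelihood; // likelihood of taking the edge to it from here
};

// Count-based choice: prefer the successor with the highest absolute weight.
int preferredByBlockWeight(const std::vector<Successor>& succs)
{
    size_t best = 0;
    for (size_t i = 1; i < succs.size(); i++)
        if (succs[i].blockWeight > succs[best].blockWeight)
            best = i;
    return succs[best].id;
}

// Likelihood-based choice: prefer the most likely control-flow transfer.
int preferredByLikelihood(const std::vector<Successor>& succs)
{
    size_t best = 0;
    for (size_t i = 1; i < succs.size(); i++)
        if (succs[i].edgeLikelihood > succs[best].edgeLikelihood)
            best = i;
    return succs[best].id;
}
```

When a successor has weight 10 but is reached from here with likelihood 0.1, while another has weight 2 but likelihood 0.9, the two policies pick different successors; the count-based policy is the one that makes the allocator sensitive to absolute weight churn like the bbWeight=0 diffs above.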
Fixes #117577 (comment).