Skip to content

JIT: Mark throw blocks with heuristic-derived weights as rarely run during morph #117692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 16, 2025

Conversation

amanasifkhalid
Copy link
Member

Fixes #117577 (comment).

@Copilot Copilot AI review requested due to automatic review settings July 15, 2025 23:34
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 15, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the JIT’s basic block conversion for throw blocks by marking them as rarely run when no profile-derived weight is available.

  • Adds heuristic to assume throw blocks without profile data are rare.
  • Calls bbSetRunRarely() on eligible blocks during morph.
Comments suppressed due to low confidence (1)

src/coreclr/jit/fgbasic.cpp:187

  • Consider adding a unit or regression test to verify that throw blocks without profile weights are correctly marked as rarely run, preventing future regressions.
    // Heuristic: Throw blocks without profile-derived weights are presumed to be rare.

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib @AndyAyersMS. Diffs show size increases from throw blocks being moved out-of-line. As expected, size diffs are larger on x86/x64 due to branch lengths changing. PGO scenarios aren't impacted, so this change seems relatively low-risk.

@amanasifkhalid
Copy link
Member Author

/ba-g build timeouts

@amanasifkhalid amanasifkhalid merged commit 950a778 into dotnet:main Jul 16, 2025
103 of 112 checks passed
@amanasifkhalid amanasifkhalid deleted the mark-throw-blocks-rare branch July 16, 2025 17:25
@pentp
Copy link
Contributor

pentp commented Jul 16, 2025

Will this make the issue in #115918 actually more likely? Or it could only affect some late-inlining that happens after morph?

@amanasifkhalid
Copy link
Member Author

Will this make the issue in #115918 actually more likely? Or it could only affect some late-inlining that happens after morph?

Since this happens after the main inlining phase, I wouldn't expect this to affect future attempts to inline the call in question. Also, while unaffected by this change, if a no-inline call is not inlined during one particular compilation, I think it's quite unlikely it should ever be inlined.

@pentp
Copy link
Contributor

pentp commented Jul 16, 2025

Since this happens after the main inlining phase, I wouldn't expect this to affect future attempts to inline the call in question. Also, while unaffected by this change, if a no-inline call is not inlined during one particular compilation, I think it's quite unlikely it should ever be inlined.

But such a decision is actually incorrect when done due to low/zero block weight and persisted - throw blocks don't contain only calls to throw helpers, they can also contain calls to simple methods like conversions (highlighted in the issue) that are used by other methods also (and the throw blocks would also benefit from inlining trivial methods).

The change in this PR is correct, but it just makes the other issue somewhat more common (visible in diffs for this PR also).

@amanasifkhalid
Copy link
Member Author

But such a decision is actually incorrect when done due to low/zero block weight and persisted - throw blocks don't contain only calls to throw helpers, they can also contain calls to simple methods like conversions (highlighted in the issue) that are used by other methods also (and the throw blocks would also benefit from inlining trivial methods).

Right, I think that ought to be treated a separate issue from #115918. In the absence of PGO data, the JIT is pretty bad at coming up with sensible block weights. You are likely seeing weights of 0.5 on no-return blocks with some frequency because such blocks are guarded by some conditional, and the JIT puts little effort into coming up with a smart guess at the likelihood for the condition being true, so we just assume it's 50/50. In .NET 11, we hope to alleviate this somewhat by coming up with smarter edge likelihood heuristics when we don't have PGO data.

The PGO version of the scenario described in #115918 seems like a much more pressing issue.

The change in this PR is correct, but it just makes the other issue somewhat more common (visible in diffs for this PR also).

Could you please share one of the diffs you're referring to? I wouldn't expect this change to influence inlining decisions; we usually don't know if a call is no-return until we've tried to inline it, and if we don't inline it, it's not until morph that we make the block a cold throw helper.

@pentp
Copy link
Contributor

pentp commented Jul 16, 2025

From the diffs I didn't find any inlining issues as expected, but the zero block weights do seem to have somewhat unexpected changes for a FullOpts method like below. The performance impact is tiny here, but it highlights the effects of bbWeight=0.

        sub      rsp, 40
-						;; size=4 bbWeight=1 PerfScore 0.25
-G_M749_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      qword ptr [rsp+0x30], rcx
+						;; size=9 bbWeight=0 PerfScore 0.00
+G_M749_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       mov      rcx, qword ptr [rsp+0x30]
        mov      qword ptr [rsp+0x20], rcx
        lea      rcx, [rsp+0x20]
        xor      rdx, rdx
@@ -28,9 +30,9 @@ G_M749_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rdx]
        ; gcr arg pop 0
        int3     
-						;; size=19 bbWeight=1 PerfScore 5.00
+						;; size=24 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 23, prolog size 4, PerfScore 5.25, instruction count 6, allocated bytes for code 23 (MethodHash=4138fd12) for method System.Convert:ToInt64(System.DateTime):long (FullOpts)
+; Total bytes of code 33, prolog size 4, PerfScore 0.00, instruction count 8, allocated bytes for code 33 (MethodHash=4138fd12) for method System.Convert:ToInt64(System.DateTime):long (FullOpts)

@amanasifkhalid
Copy link
Member Author

but the zero block weights do seem to have somewhat unexpected changes for a FullOpts method like below

Thanks for finding that. I hypothesize the diffs from this change are dominated by changes in code layout, but I was also expecting some extraneous churn from register allocation, like in the example you shared. Our register allocator is known to be vulnerable to spurious diffs from changes in block weights. This stems from various sites in the allocator where we have to linearize control flow by picking some preferred successor or predecessor of a block based on their weights; these sites predate our switch over to likelihood-based edge weights, and should probably be changed at some point to reflect the likelihood of control flow transfer between blocks, rather than the blocks' absolute execution counts. Unfortunately, such changes are hard to make with confidence, since the churn is so widespread.

The register allocator is long due for an overhaul, and I'm personally hoping we make time to do some of this work in .NET 11. At the moment, it's too early to say what exactly we hope to accomplish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Rarely run exception block is not placed at the end, regression
4 participants