8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow #23916

@marc-chevalier marc-chevalier commented Mar 5, 2025

Math.*Exact intrinsics can cause many deoptimizations when they are called repeatedly with arguments that overflow.
This fix proposes to stop relying on the intrinsics once too_many_traps() has been reached.

Benchmarks show that this issue affects every Math.*Exact function, and that this fix improves all of them.
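
To make the scenario concrete, here is a minimal sketch of the calling pattern at issue: a hot loop in which Math.addExact overflows on every iteration and the resulting ArithmeticException is caught. The class and method names are hypothetical and this is not the JMH benchmark from the PR; it only illustrates the shape of the workload.

```java
public class OverflowLoopSketch {
    // Hypothetical helper: returns 0 when the addition overflows.
    static int addOrZero(int a, int b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            return 0; // overflow path
        }
    }

    // Counts how many of n additions overflowed.
    // Integer.MAX_VALUE plus any value in 1..8 always overflows.
    static int countOverflows(int n) {
        int overflows = 0;
        for (int i = 0; i < n; i++) {
            if (addOrZero(Integer.MAX_VALUE, 1 + (i & 7)) == 0) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        System.out.println(countOverflows(100_000));
    }
}
```

Since every call takes the overflow path, the exceptional branch is the common case here rather than the rare one.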

tl;dr:

  • C1: no problem, no change
  • C2:
    • with intrinsics:
      • with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
      • without overflow: no problem, no change
    • without intrinsics: no problem, no change

Before the fix:

Benchmark                                           (SIZE)  Mode  Cnt     Score      Error  Units
MathExact.C1_1.loopAddIInBounds                    1000000  avgt    3     1.272 ±    0.048  ms/op
MathExact.C1_1.loopAddIOverflow                    1000000  avgt    3   641.917 ±   58.238  ms/op
MathExact.C1_1.loopAddLInBounds                    1000000  avgt    3     1.402 ±    0.842  ms/op
MathExact.C1_1.loopAddLOverflow                    1000000  avgt    3   671.013 ±  229.425  ms/op
MathExact.C1_1.loopDecrementIInBounds              1000000  avgt    3     3.722 ±   22.244  ms/op
MathExact.C1_1.loopDecrementIOverflow              1000000  avgt    3   653.341 ±  279.003  ms/op
MathExact.C1_1.loopDecrementLInBounds              1000000  avgt    3     2.525 ±    0.810  ms/op
MathExact.C1_1.loopDecrementLOverflow              1000000  avgt    3   656.750 ±  141.792  ms/op
MathExact.C1_1.loopIncrementIInBounds              1000000  avgt    3     4.621 ±   12.822  ms/op
MathExact.C1_1.loopIncrementIOverflow              1000000  avgt    3   651.608 ±  274.396  ms/op
MathExact.C1_1.loopIncrementLInBounds              1000000  avgt    3     2.576 ±    3.316  ms/op
MathExact.C1_1.loopIncrementLOverflow              1000000  avgt    3   662.216 ±   71.879  ms/op
MathExact.C1_1.loopMultiplyIInBounds               1000000  avgt    3     1.402 ±    0.587  ms/op
MathExact.C1_1.loopMultiplyIOverflow               1000000  avgt    3   615.836 ±  252.137  ms/op
MathExact.C1_1.loopMultiplyLInBounds               1000000  avgt    3     2.906 ±    5.718  ms/op
MathExact.C1_1.loopMultiplyLOverflow               1000000  avgt    3   655.576 ±  147.432  ms/op
MathExact.C1_1.loopNegateIInBounds                 1000000  avgt    3     2.023 ±    0.027  ms/op
MathExact.C1_1.loopNegateIOverflow                 1000000  avgt    3   639.136 ±   30.841  ms/op
MathExact.C1_1.loopNegateLInBounds                 1000000  avgt    3     2.422 ±    3.590  ms/op
MathExact.C1_1.loopNegateLOverflow                 1000000  avgt    3   638.837 ±   49.512  ms/op
MathExact.C1_1.loopSubtractIInBounds               1000000  avgt    3     1.255 ±    0.799  ms/op
MathExact.C1_1.loopSubtractIOverflow               1000000  avgt    3   637.857 ±  231.804  ms/op
MathExact.C1_1.loopSubtractLInBounds               1000000  avgt    3     1.412 ±    0.602  ms/op
MathExact.C1_1.loopSubtractLOverflow               1000000  avgt    3   642.113 ±  251.349  ms/op
MathExact.C1_2.loopAddIInBounds                    1000000  avgt    3     1.748 ±    1.095  ms/op
MathExact.C1_2.loopAddIOverflow                    1000000  avgt    3   654.617 ±  287.678  ms/op
MathExact.C1_2.loopAddLInBounds                    1000000  avgt    3     2.004 ±    1.655  ms/op
MathExact.C1_2.loopAddLOverflow                    1000000  avgt    3   670.791 ±   93.689  ms/op
MathExact.C1_2.loopDecrementIInBounds              1000000  avgt    3     5.306 ±   65.215  ms/op
MathExact.C1_2.loopDecrementIOverflow              1000000  avgt    3   650.425 ±  461.740  ms/op
MathExact.C1_2.loopDecrementLInBounds              1000000  avgt    3     5.484 ±   42.778  ms/op
MathExact.C1_2.loopDecrementLOverflow              1000000  avgt    3   656.747 ±  333.281  ms/op
MathExact.C1_2.loopIncrementIInBounds              1000000  avgt    3     3.077 ±    1.677  ms/op
MathExact.C1_2.loopIncrementIOverflow              1000000  avgt    3   634.510 ±   51.365  ms/op
MathExact.C1_2.loopIncrementLInBounds              1000000  avgt    3     3.902 ±   18.471  ms/op
MathExact.C1_2.loopIncrementLOverflow              1000000  avgt    3   656.465 ±  227.014  ms/op
MathExact.C1_2.loopMultiplyIInBounds               1000000  avgt    3     2.384 ±   10.045  ms/op
MathExact.C1_2.loopMultiplyIOverflow               1000000  avgt    3   624.029 ±  342.084  ms/op
MathExact.C1_2.loopMultiplyLInBounds               1000000  avgt    3     3.247 ±    0.735  ms/op
MathExact.C1_2.loopMultiplyLOverflow               1000000  avgt    3   661.427 ±  100.744  ms/op
MathExact.C1_2.loopNegateIInBounds                 1000000  avgt    3     3.061 ±    1.148  ms/op
MathExact.C1_2.loopNegateIOverflow                 1000000  avgt    3   645.241 ±  323.824  ms/op
MathExact.C1_2.loopNegateLInBounds                 1000000  avgt    3     3.211 ±    0.068  ms/op
MathExact.C1_2.loopNegateLOverflow                 1000000  avgt    3   658.846 ±  204.524  ms/op
MathExact.C1_2.loopSubtractIInBounds               1000000  avgt    3     1.717 ±    0.161  ms/op
MathExact.C1_2.loopSubtractIOverflow               1000000  avgt    3   644.287 ±  301.787  ms/op
MathExact.C1_2.loopSubtractLInBounds               1000000  avgt    3     3.976 ±   11.982  ms/op
MathExact.C1_2.loopSubtractLOverflow               1000000  avgt    3   660.871 ±   16.538  ms/op
MathExact.C1_3.loopAddIInBounds                    1000000  avgt    3     4.380 ±   42.598  ms/op
MathExact.C1_3.loopAddIOverflow                    1000000  avgt    3   686.766 ±  511.146  ms/op
MathExact.C1_3.loopAddLInBounds                    1000000  avgt    3     5.445 ±   49.738  ms/op
MathExact.C1_3.loopAddLOverflow                    1000000  avgt    3   641.936 ±   32.769  ms/op
MathExact.C1_3.loopDecrementIInBounds              1000000  avgt    3     8.340 ±   69.455  ms/op
MathExact.C1_3.loopDecrementIOverflow              1000000  avgt    3   682.239 ±  212.017  ms/op
MathExact.C1_3.loopDecrementLInBounds              1000000  avgt    3     6.048 ±    0.651  ms/op
MathExact.C1_3.loopDecrementLOverflow              1000000  avgt    3   670.924 ±   42.037  ms/op
MathExact.C1_3.loopIncrementIInBounds              1000000  avgt    3     7.970 ±   63.664  ms/op
MathExact.C1_3.loopIncrementIOverflow              1000000  avgt    3   684.490 ±  197.407  ms/op
MathExact.C1_3.loopIncrementLInBounds              1000000  avgt    3     8.780 ±   86.737  ms/op
MathExact.C1_3.loopIncrementLOverflow              1000000  avgt    3   660.941 ±  172.305  ms/op
MathExact.C1_3.loopMultiplyIInBounds               1000000  avgt    3     3.241 ±    0.567  ms/op
MathExact.C1_3.loopMultiplyIOverflow               1000000  avgt    3   630.455 ±  138.458  ms/op
MathExact.C1_3.loopMultiplyLInBounds               1000000  avgt    3     5.906 ±    0.662  ms/op
MathExact.C1_3.loopMultiplyLOverflow               1000000  avgt    3   693.248 ±  539.146  ms/op
MathExact.C1_3.loopNegateIInBounds                 1000000  avgt    3     6.394 ±    7.757  ms/op
MathExact.C1_3.loopNegateIOverflow                 1000000  avgt    3   644.722 ±   56.929  ms/op
MathExact.C1_3.loopNegateLInBounds                 1000000  avgt    3     7.610 ±   41.533  ms/op
MathExact.C1_3.loopNegateLOverflow                 1000000  avgt    3   670.166 ±   14.496  ms/op
MathExact.C1_3.loopSubtractIInBounds               1000000  avgt    3     3.345 ±    1.977  ms/op
MathExact.C1_3.loopSubtractIOverflow               1000000  avgt    3   677.317 ±   22.878  ms/op
MathExact.C1_3.loopSubtractLInBounds               1000000  avgt    3     3.226 ±    0.122  ms/op
MathExact.C1_3.loopSubtractLOverflow               1000000  avgt    3   643.642 ±   65.217  ms/op
MathExact.C2.loopAddIInBounds                      1000000  avgt    3     1.217 ±    1.694  ms/op
MathExact.C2.loopAddIOverflow                      1000000  avgt    3  3995.424 ± 1177.165  ms/op
MathExact.C2.loopAddLInBounds                      1000000  avgt    3     2.404 ±    0.053  ms/op
MathExact.C2.loopAddLOverflow                      1000000  avgt    3  3997.984 ±  612.558  ms/op
MathExact.C2.loopDecrementIInBounds                1000000  avgt    3     2.014 ±    0.176  ms/op
MathExact.C2.loopDecrementIOverflow                1000000  avgt    3  3828.615 ±  260.670  ms/op
MathExact.C2.loopDecrementLInBounds                1000000  avgt    3     1.986 ±    1.536  ms/op
MathExact.C2.loopDecrementLOverflow                1000000  avgt    3  4075.934 ±  263.798  ms/op
MathExact.C2.loopIncrementIInBounds                1000000  avgt    3     2.238 ±    6.380  ms/op
MathExact.C2.loopIncrementIOverflow                1000000  avgt    3  3927.929 ±  837.162  ms/op
MathExact.C2.loopIncrementLInBounds                1000000  avgt    3     1.971 ±    1.232  ms/op
MathExact.C2.loopIncrementLOverflow                1000000  avgt    3  3915.202 ± 1024.956  ms/op
MathExact.C2.loopMultiplyIInBounds                 1000000  avgt    3     1.175 ±    0.509  ms/op
MathExact.C2.loopMultiplyIOverflow                 1000000  avgt    3  3803.719 ± 1583.828  ms/op
MathExact.C2.loopMultiplyLInBounds                 1000000  avgt    3     0.937 ±    0.631  ms/op
MathExact.C2.loopMultiplyLOverflow                 1000000  avgt    3  4023.742 ±  967.498  ms/op
MathExact.C2.loopNegateIInBounds                   1000000  avgt    3     2.129 ±    1.094  ms/op
MathExact.C2.loopNegateIOverflow                   1000000  avgt    3  3850.484 ±  464.979  ms/op
MathExact.C2.loopNegateLInBounds                   1000000  avgt    3     2.247 ±    9.714  ms/op
MathExact.C2.loopNegateLOverflow                   1000000  avgt    3  3911.853 ±  362.961  ms/op
MathExact.C2.loopSubtractIInBounds                 1000000  avgt    3     1.141 ±    1.579  ms/op
MathExact.C2.loopSubtractIOverflow                 1000000  avgt    3  3917.533 ±  628.485  ms/op
MathExact.C2.loopSubtractLInBounds                 1000000  avgt    3     2.232 ±   22.329  ms/op
MathExact.C2.loopSubtractLOverflow                 1000000  avgt    3  3995.088 ±  302.549  ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds        1000000  avgt    3     1.488 ±   12.243  ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow        1000000  avgt    3   585.568 ±  106.360  ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds        1000000  avgt    3     2.234 ±   23.010  ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow        1000000  avgt    3   602.290 ±  212.146  ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds  1000000  avgt    3     4.705 ±   36.814  ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow  1000000  avgt    3   590.212 ±  280.334  ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds  1000000  avgt    3     2.374 ±   13.667  ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow  1000000  avgt    3   583.053 ±   50.535  ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds  1000000  avgt    3     3.966 ±   15.366  ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow  1000000  avgt    3   591.683 ±  171.580  ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds  1000000  avgt    3     3.682 ±   23.147  ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow  1000000  avgt    3   601.325 ±   10.597  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds   1000000  avgt    3     1.307 ±    0.235  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow   1000000  avgt    3   570.615 ±   50.808  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds   1000000  avgt    3     1.087 ±    0.486  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow   1000000  avgt    3   595.713 ±  162.773  ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds     1000000  avgt    3     1.874 ±    0.954  ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow     1000000  avgt    3   596.588 ±   68.081  ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds     1000000  avgt    3     2.337 ±   12.164  ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow     1000000  avgt    3   573.711 ±   63.243  ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds   1000000  avgt    3     1.085 ±    0.815  ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow   1000000  avgt    3   579.489 ±   61.399  ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds   1000000  avgt    3     1.020 ±    0.161  ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow   1000000  avgt    3   580.578 ±  167.454  ms/op

After:

Benchmark                                           (SIZE)  Mode  Cnt    Score     Error  Units
MathExact.C1_1.loopAddIInBounds                    1000000  avgt    3    1.369 ±   0.462  ms/op
MathExact.C1_1.loopAddIOverflow                    1000000  avgt    3  635.020 ± 106.156  ms/op
MathExact.C1_1.loopAddLInBounds                    1000000  avgt    3    1.371 ±   0.020  ms/op
MathExact.C1_1.loopAddLOverflow                    1000000  avgt    3  633.864 ±  72.176  ms/op
MathExact.C1_1.loopDecrementIInBounds              1000000  avgt    3    2.053 ±   0.330  ms/op
MathExact.C1_1.loopDecrementIOverflow              1000000  avgt    3  634.675 ±  79.427  ms/op
MathExact.C1_1.loopDecrementLInBounds              1000000  avgt    3    3.798 ±  38.502  ms/op
MathExact.C1_1.loopDecrementLOverflow              1000000  avgt    3  650.880 ± 123.220  ms/op
MathExact.C1_1.loopIncrementIInBounds              1000000  avgt    3    2.305 ±   4.829  ms/op
MathExact.C1_1.loopIncrementIOverflow              1000000  avgt    3  648.231 ±  39.012  ms/op
MathExact.C1_1.loopIncrementLInBounds              1000000  avgt    3    2.627 ±   3.129  ms/op
MathExact.C1_1.loopIncrementLOverflow              1000000  avgt    3  663.671 ± 446.140  ms/op
MathExact.C1_1.loopMultiplyIInBounds               1000000  avgt    3    1.479 ±   0.102  ms/op
MathExact.C1_1.loopMultiplyIOverflow               1000000  avgt    3  627.959 ± 297.291  ms/op
MathExact.C1_1.loopMultiplyLInBounds               1000000  avgt    3    2.718 ±   0.806  ms/op
MathExact.C1_1.loopMultiplyLOverflow               1000000  avgt    3  655.310 ± 112.686  ms/op
MathExact.C1_1.loopNegateIInBounds                 1000000  avgt    3    2.079 ±   2.166  ms/op
MathExact.C1_1.loopNegateIOverflow                 1000000  avgt    3  640.530 ± 152.489  ms/op
MathExact.C1_1.loopNegateLInBounds                 1000000  avgt    3    3.168 ±  16.524  ms/op
MathExact.C1_1.loopNegateLOverflow                 1000000  avgt    3  650.823 ±  58.420  ms/op
MathExact.C1_1.loopSubtractIInBounds               1000000  avgt    3    2.325 ±  27.865  ms/op
MathExact.C1_1.loopSubtractIOverflow               1000000  avgt    3  632.198 ± 280.799  ms/op
MathExact.C1_1.loopSubtractLInBounds               1000000  avgt    3    1.478 ±   0.281  ms/op
MathExact.C1_1.loopSubtractLOverflow               1000000  avgt    3  626.481 ±  47.028  ms/op
MathExact.C1_2.loopAddIInBounds                    1000000  avgt    3    1.850 ±   0.462  ms/op
MathExact.C1_2.loopAddIOverflow                    1000000  avgt    3  640.668 ± 217.610  ms/op
MathExact.C1_2.loopAddLInBounds                    1000000  avgt    3    1.823 ±   0.123  ms/op
MathExact.C1_2.loopAddLOverflow                    1000000  avgt    3  643.123 ± 174.505  ms/op
MathExact.C1_2.loopDecrementIInBounds              1000000  avgt    3    6.435 ±  54.316  ms/op
MathExact.C1_2.loopDecrementIOverflow              1000000  avgt    3  649.622 ±  15.314  ms/op
MathExact.C1_2.loopDecrementLInBounds              1000000  avgt    3    4.315 ±  26.421  ms/op
MathExact.C1_2.loopDecrementLOverflow              1000000  avgt    3  649.018 ± 386.320  ms/op
MathExact.C1_2.loopIncrementIInBounds              1000000  avgt    3    3.444 ±   1.375  ms/op
MathExact.C1_2.loopIncrementIOverflow              1000000  avgt    3  628.711 ±  51.292  ms/op
MathExact.C1_2.loopIncrementLInBounds              1000000  avgt    3    3.351 ±   0.483  ms/op
MathExact.C1_2.loopIncrementLOverflow              1000000  avgt    3  653.560 ± 160.718  ms/op
MathExact.C1_2.loopMultiplyIInBounds               1000000  avgt    3    1.860 ±   0.633  ms/op
MathExact.C1_2.loopMultiplyIOverflow               1000000  avgt    3  620.883 ±  54.516  ms/op
MathExact.C1_2.loopMultiplyLInBounds               1000000  avgt    3    3.998 ±  16.269  ms/op
MathExact.C1_2.loopMultiplyLOverflow               1000000  avgt    3  671.956 ±  93.092  ms/op
MathExact.C1_2.loopNegateIInBounds                 1000000  avgt    3    4.415 ±  44.105  ms/op
MathExact.C1_2.loopNegateIOverflow                 1000000  avgt    3  661.902 ± 224.843  ms/op
MathExact.C1_2.loopNegateLInBounds                 1000000  avgt    3    3.492 ±   0.738  ms/op
MathExact.C1_2.loopNegateLOverflow                 1000000  avgt    3  634.946 ± 150.491  ms/op
MathExact.C1_2.loopSubtractIInBounds               1000000  avgt    3    1.712 ±   0.066  ms/op
MathExact.C1_2.loopSubtractIOverflow               1000000  avgt    3  651.508 ±  76.022  ms/op
MathExact.C1_2.loopSubtractLInBounds               1000000  avgt    3    1.949 ±   0.201  ms/op
MathExact.C1_2.loopSubtractLOverflow               1000000  avgt    3  627.459 ±  26.817  ms/op
MathExact.C1_3.loopAddIInBounds                    1000000  avgt    3    7.378 ±   4.301  ms/op
MathExact.C1_3.loopAddIOverflow                    1000000  avgt    3  647.275 ± 177.062  ms/op
MathExact.C1_3.loopAddLInBounds                    1000000  avgt    3    3.427 ±   0.037  ms/op
MathExact.C1_3.loopAddLOverflow                    1000000  avgt    3  643.735 ± 227.934  ms/op
MathExact.C1_3.loopDecrementIInBounds              1000000  avgt    3    5.680 ±   0.497  ms/op
MathExact.C1_3.loopDecrementIOverflow              1000000  avgt    3  666.431 ±   8.006  ms/op
MathExact.C1_3.loopDecrementLInBounds              1000000  avgt    3    6.897 ±  24.615  ms/op
MathExact.C1_3.loopDecrementLOverflow              1000000  avgt    3  683.691 ±  52.892  ms/op
MathExact.C1_3.loopIncrementIInBounds              1000000  avgt    3    5.743 ±   0.602  ms/op
MathExact.C1_3.loopIncrementIOverflow              1000000  avgt    3  670.027 ± 175.208  ms/op
MathExact.C1_3.loopIncrementLInBounds              1000000  avgt    3    6.157 ±   2.876  ms/op
MathExact.C1_3.loopIncrementLOverflow              1000000  avgt    3  673.410 ± 245.939  ms/op
MathExact.C1_3.loopMultiplyIInBounds               1000000  avgt    3    3.220 ±   0.165  ms/op
MathExact.C1_3.loopMultiplyIOverflow               1000000  avgt    3  640.165 ± 505.006  ms/op
MathExact.C1_3.loopMultiplyLInBounds               1000000  avgt    3    7.986 ±  62.547  ms/op
MathExact.C1_3.loopMultiplyLOverflow               1000000  avgt    3  681.282 ± 107.856  ms/op
MathExact.C1_3.loopNegateIInBounds                 1000000  avgt    3    7.133 ±  18.111  ms/op
MathExact.C1_3.loopNegateIOverflow                 1000000  avgt    3  680.976 ± 285.486  ms/op
MathExact.C1_3.loopNegateLInBounds                 1000000  avgt    3    7.405 ±  37.040  ms/op
MathExact.C1_3.loopNegateLOverflow                 1000000  avgt    3  681.574 ± 173.484  ms/op
MathExact.C1_3.loopSubtractIInBounds               1000000  avgt    3    3.971 ±  16.942  ms/op
MathExact.C1_3.loopSubtractIOverflow               1000000  avgt    3  655.780 ± 230.793  ms/op
MathExact.C1_3.loopSubtractLInBounds               1000000  avgt    3    3.369 ±   3.844  ms/op
MathExact.C1_3.loopSubtractLOverflow               1000000  avgt    3  634.824 ±  20.350  ms/op
MathExact.C2.loopAddIInBounds                      1000000  avgt    3    2.461 ±   2.936  ms/op
MathExact.C2.loopAddIOverflow                      1000000  avgt    3  589.095 ± 151.126  ms/op
MathExact.C2.loopAddLInBounds                      1000000  avgt    3    0.978 ±   0.604  ms/op
MathExact.C2.loopAddLOverflow                      1000000  avgt    3  590.511 ±  64.618  ms/op
MathExact.C2.loopDecrementIInBounds                1000000  avgt    3    1.981 ±   0.443  ms/op
MathExact.C2.loopDecrementIOverflow                1000000  avgt    3  593.578 ±  32.752  ms/op
MathExact.C2.loopDecrementLInBounds                1000000  avgt    3    2.924 ±  29.455  ms/op
MathExact.C2.loopDecrementLOverflow                1000000  avgt    3  601.392 ± 936.568  ms/op
MathExact.C2.loopIncrementIInBounds                1000000  avgt    3    2.697 ±  22.142  ms/op
MathExact.C2.loopIncrementIOverflow                1000000  avgt    3  602.418 ± 199.763  ms/op
MathExact.C2.loopIncrementLInBounds                1000000  avgt    3    1.954 ±   0.396  ms/op
MathExact.C2.loopIncrementLOverflow                1000000  avgt    3  601.183 ± 156.439  ms/op
MathExact.C2.loopMultiplyIInBounds                 1000000  avgt    3    1.530 ±   7.954  ms/op
MathExact.C2.loopMultiplyIOverflow                 1000000  avgt    3  566.677 ±  45.992  ms/op
MathExact.C2.loopMultiplyLInBounds                 1000000  avgt    3    2.184 ±  22.242  ms/op
MathExact.C2.loopMultiplyLOverflow                 1000000  avgt    3  600.233 ± 234.648  ms/op
MathExact.C2.loopNegateIInBounds                   1000000  avgt    3    2.130 ±   1.028  ms/op
MathExact.C2.loopNegateIOverflow                   1000000  avgt    3  593.145 ± 337.886  ms/op
MathExact.C2.loopNegateLInBounds                   1000000  avgt    3    2.600 ±  20.795  ms/op
MathExact.C2.loopNegateLOverflow                   1000000  avgt    3  592.288 ± 138.321  ms/op
MathExact.C2.loopSubtractIInBounds                 1000000  avgt    3    1.081 ±   0.265  ms/op
MathExact.C2.loopSubtractIOverflow                 1000000  avgt    3  575.884 ± 200.113  ms/op
MathExact.C2.loopSubtractLInBounds                 1000000  avgt    3    1.016 ±   0.792  ms/op
MathExact.C2.loopSubtractLOverflow                 1000000  avgt    3  589.873 ±  52.521  ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds        1000000  avgt    3    2.166 ±  10.999  ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow        1000000  avgt    3  586.660 ± 229.451  ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds        1000000  avgt    3    1.054 ±   0.528  ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow        1000000  avgt    3  572.511 ±  76.440  ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds  1000000  avgt    3    1.907 ±   0.149  ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow  1000000  avgt    3  599.262 ± 600.992  ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds  1000000  avgt    3    1.820 ±   0.106  ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow  1000000  avgt    3  570.464 ±  44.418  ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds  1000000  avgt    3    1.914 ±   0.131  ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow  1000000  avgt    3  575.143 ± 160.185  ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds  1000000  avgt    3    1.818 ±   0.288  ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow  1000000  avgt    3  589.998 ±  33.029  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds   1000000  avgt    3    1.960 ±  10.135  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow   1000000  avgt    3  571.497 ± 264.484  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds   1000000  avgt    3    1.061 ±   0.198  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow   1000000  avgt    3  585.139 ± 317.175  ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds     1000000  avgt    3    2.611 ±  22.325  ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow     1000000  avgt    3  579.911 ± 140.426  ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds     1000000  avgt    3    2.233 ±   2.774  ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow     1000000  avgt    3  572.368 ±  81.851  ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds   1000000  avgt    3    3.162 ±  38.115  ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow   1000000  avgt    3  582.794 ±  65.622  ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds   1000000  avgt    3    1.028 ±   0.255  ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow   1000000  avgt    3  577.491 ±  69.778  ms/op

Is it worth having intrinsics at all? @eme64 wondered, so I tried with this code:

public class Test {
    final static int N = 500_000_000;

    public static int test(int i) {
        try{
            return Math.multiplyExact(i, i);
        } catch (Throwable e){
            return 0;
        }
    }

    public static void loop() {
        for(int i = 0; i < N; i++) {
            test(i % 32_768);
        }
    }

    public static void main(String[] args) {
        loop();
    }
}

and with many more runs (50 instead of 3), under a more stable load on the rest of the system.

No intrinsic (inlined Java implementation):

Benchmark 1: ~/jdk/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,"Test*::test*" -XX:-UseOnStackReplacement Test.java
  Time (mean ± σ):      8.651 s ±  0.902 s    [User: 8.517 s, System: 0.155 s]
  Range (min … max):    6.853 s … 10.439 s    50 runs

Always intrinsic (current behavior, and new behavior in absence of overflow, like in this example):

Benchmark 1: ~/jdk/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,"Test*::test*" -XX:-UseOnStackReplacement Test.java
  Time (mean ± σ):      8.222 s ±  1.024 s    [User: 8.090 s, System: 0.155 s]
  Range (min … max):    6.667 s … 10.406 s    50 runs

So it's... not very conclusive, but likely to be a bit useful. The gap between the means is about 0.4s, which is less than half the standard deviation.
Still, it seems good to have.
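
The comparison above can be checked with a few lines of arithmetic. This is only a sketch using the means and standard deviations reported by the two runs; the class and method names are made up for the illustration.

```java
public class MeanGapCheck {
    // Absolute difference between two means, in seconds.
    static double gap(double meanA, double meanB) {
        return Math.abs(meanA - meanB);
    }

    public static void main(String[] args) {
        // Values copied from the two benchmark summaries.
        double noIntrinsicMean = 8.651, noIntrinsicSigma = 0.902;
        double intrinsicMean = 8.222, intrinsicSigma = 1.024;

        double g = gap(noIntrinsicMean, intrinsicMean);
        System.out.println("gap between means (s): " + g);
        System.out.println("below half of the smaller sigma: " + (g < noIntrinsicSigma / 2));
        System.out.println("below half of the larger sigma: " + (g < intrinsicSigma / 2));
    }
}
```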

From a more theoretical point of view, we can see that the code generated for the intrinsic is mostly a mul and a jo, while it is much more complicated for the inlined Java implementation (with many mov, movsx, cmp and conditional jumps, looking a lot like the Java code).
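
For reference, the int version of the check in plain Java amounts to a widening multiplication followed by a range test, which is plausibly where the extra sign extensions and compares come from. The sketch below re-implements that check under a hypothetical name and compares it against Math.multiplyExact; it is an illustration, not the code under review.

```java
import java.util.Objects;

public class MultiplyExactSketch {
    // Hypothetical plain-Java overflow check:
    // multiply in 64 bits, then verify the result fits in 32 bits.
    static int multiplyExactJava(int x, int y) {
        long r = (long) x * (long) y;
        if ((int) r != r) {
            throw new ArithmeticException("integer overflow");
        }
        return (int) r;
    }

    // True if the sketch and Math.multiplyExact agree:
    // either both return the same value or both throw.
    static boolean agrees(int x, int y) {
        Integer a = null;
        Integer b = null;
        try {
            a = multiplyExactJava(x, y);
        } catch (ArithmeticException e) {
            // leave a as null to mark overflow
        }
        try {
            b = Math.multiplyExact(x, y);
        } catch (ArithmeticException e) {
            // leave b as null to mark overflow
        }
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // 32767 is the largest argument used by the test loop (i % 32_768).
        System.out.println(agrees(32_767, 32_767)); // no overflow
        System.out.println(agrees(46_341, 46_341)); // overflows an int
    }
}
```

The largest argument produced by `i % 32_768` squares to a value that still fits in an int, which is consistent with the statement that this example runs without overflow.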

Thanks,
Marc


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916
$ git checkout pull/23916

Update a local copy of the PR:
$ git checkout pull/23916
$ git pull https://git.openjdk.org/jdk.git pull/23916/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23916

View PR using the GUI difftool:
$ git pr show -t 23916

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23916.diff

Using Webrev

Link to Webrev Comment

@marc-chevalier marc-chevalier changed the title from "Limit inlining of math Exact operations in case of too many deopts" to "8346989: Deoptimization and re-compilation cycle with C2 compiled code" Mar 5, 2025
bridgekeeper bot commented Mar 5, 2025

👋 Welcome back marc-chevalier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Mar 5, 2025

@marc-chevalier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow

Reviewed-by: thartmann, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 86 new commits pushed to the master branch:

  • 41d4a0d: 8352392: AIX: implement attach API v2 and streaming output
  • 1c2a553: 8327858: Improve spliterator and forEach for single-element immutable collections
  • a449aee: 8350704: Create tests to ensure the failure behavior of core reflection APIs
  • 57df89c: 8353684: [BACKOUT] j.u.l.Handler classes create deadlock risk via synchronized publish() method
  • ebcb9a8: 8349206: j.u.l.Handler classes create deadlock risk via synchronized publish() method
  • d894b78: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable
  • db08726: 8352966: Opensource Several Font related tests - Batch 2
  • 6b7b324: 8351431: Type annotations on new class creation expressions can't be retrieved
  • 64b691a: 8271870: G1: Add objArray splitting when scanning object with evacuation failure
  • b428cda: 8349686: [s390x] C1: Improve Class.isInstance intrinsic
  • ... and 76 more: https://git.openjdk.org/jdk/compare/25925138b0a7d781d9293e52a8c9520329a85219...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eme64, @iwanowww, @TobiHartmann) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk bot commented Mar 5, 2025

@marc-chevalier The following labels will be automatically applied to this pull request:

  • graal
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@eme64 eme64 left a comment

The benchmark generally looks good to me, I only have some minor suggestions ;)

eme64 commented Mar 6, 2025

> Is it worth inlining at all? @eme64 wondered, so I tried with this code:

You ask this in the PR description. I think I was not thinking about inlining but rather using the intrinsic. How much speedup does the intrinsic really deliver? Is it really better than pure Java?

eme64 commented Mar 6, 2025

Ah. And is this only about multiplyExact, or are there other methods affected? Would be nice to extend the benchmark to those as well.

And yet another idea: you could probably write an IR test that checks that we at first have the compilation with the trap, and another test where we trap too much and then get a different compilation (without the intrinsic?).

Plus: the issue title is very generic. I think it should mention something about Math.*Exact as well ;)

marc-chevalier commented Mar 6, 2025

> You ask this in the PR description. I think I was not thinking about inlining but rather using the intrinsic. How much speedup does the intrinsic really deliver? Is it really better than pure Java?

My fault. I used "inline" instead of "intrinsic" because the functions implementing the intrinsic are called inline_math_mathExact and the like. So, I compared the intrinsic vs. the pure Java implementation, which happens to be inlined. And the intrinsic is a bit better.

I'll edit the text to fix that.
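For readers unfamiliar with the pure Java path being compared here, the following is a minimal sketch of an `int` overflow check in the style of `java.lang.Math.addExact`: add, then test the sign bits. The class name is hypothetical and the code is an illustration, not a copy taken from this PR.

```java
class PureJavaAddExact {
    // Overflow occurred iff both operands have the same sign
    // and the sign of the result differs from it.
    static int addExact(int x, int y) {
        int r = x + y;
        if (((x ^ r) & (y ^ r)) < 0) {
            throw new ArithmeticException("integer overflow");
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(addExact(1, 2));
        try {
            addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

When this method is inlined rather than intrinsified, the overflow case allocates and throws a regular exception instead of going through an uncommon trap.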

@marc-chevalier marc-chevalier marked this pull request as ready for review March 7, 2025 14:18
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 7, 2025

mlbridge bot commented Mar 7, 2025

Webrevs

Contributor

@iwanowww iwanowww left a comment

Nice benchmark, Marc!

@@ -1961,7 +1961,7 @@ void LibraryCallKit::inline_math_mathExact(Node* math, Node *test) {
set_i_o(i_o());

uncommon_trap(Deoptimization::Reason_intrinsic,
Contributor

What about using builtin_throw here? (Requires some tuning on builtin_throw side.) How much does it affect performance? Also, passing must_throw = true into uncommon_trap may help a bit here as well.

Member Author

Using builtin_throw sounds nice! But indeed, it won't work so directly. I want to avoid the intrinsic once too_many_traps is reached, and that is also the only point at which builtin_throw starts to do something. But if I rely only on builtin_throw, then, when built-in throwing is not possible (that is, when treat_throw_as_hot && method()->can_omit_stack_trace() is false), we will have the repeated deopt again.

There is also throwing the right exception, which is right now determined only by the reason (which adapts poorly to this case).

I guess that's what you meant by tuning: be able to know if we would built-in throw, and if so, do it, otherwise, prevent infinitely repeated deopt.

The way I see doing that is by (maybe optionally) providing the preallocated exception to throw as a parameter, so that we don't have to rely on the "reason to exception" decision (or we can override it), and by factoring out the decision whether we can take the nice branch of builtin_throw, so that we can bail out of the intrinsic if we can't fast throw before we start setting up the intrinsic (which we would then need to undo). Does that match what you had in mind, or do you have another suggestion?

Member

I think adapting and re-using builtin_throw like you described is reasonable, but I'll let @iwanowww confirm 🙂

Contributor

Yes, that's basically what I had in mind.

Currently, the intrinsic focuses on the well-behaved case (overflows are very rare). builtin_throw() covers more ground and optimizes for scenarios where exceptions are thrown. But it depends on ciMethod::can_omit_stack_trace(), so the -XX:-OmitStackTraceInFastThrow mode will still suffer from the original problem (continuous deoptimizations), plus a round of recompilations before giving up.

I suggest improving and reusing builtin_throw here and adding checks in the intrinsic to guard against the problematic scenario with continuous deoptimizations. IMO it improves the performance model for a wide range of use cases while addressing pathological scenarios.
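As a Java-level analogy for why the fast-throw path is cheap when stack traces may be omitted, the sketch below reuses a single preallocated exception whose `fillInStackTrace` does nothing. This only models the idea; it is not how HotSpot implements `builtin_throw`, and all names in it are hypothetical.

```java
class PreallocatedOverflow {
    // An ArithmeticException that never records a stack trace, so creating
    // and throwing it does not require walking the stack.
    static final class StacklessArithmeticException extends ArithmeticException {
        StacklessArithmeticException(String message) {
            super(message);
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this;
        }
    }

    // Allocated once and reused for every overflow.
    static final ArithmeticException PREALLOCATED =
            new StacklessArithmeticException("integer overflow");

    static int addOrFastThrow(int x, int y) {
        int r = x + y;
        if (((x ^ r) & (y ^ r)) < 0) {
            throw PREALLOCATED;
        }
        return r;
    }

    public static void main(String[] args) {
        try {
            addOrFastThrow(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("stack frames recorded: " + e.getStackTrace().length);
        }
    }
}
```

The trade-off is the one the flag name suggests: the throw is fast, but the exception carries no backtrace, which is why this path is only acceptable when omitting stack traces is allowed.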

Member Author

So, I have done something like that (getting the exception object to throw from a parameter, and factoring out the logic that decides whether builtin_throw is possible, so we can bail out of the intrinsic instead of cycling again). Tests seem to pass in the various cases I wrote. As for the benchmark, it's quite a change. I post only the new part; the rest is pretty much the same. C2_no_builtin_throw does what the original C2 did (no builtin throw, just bailing out of intrinsics to cut our losses), and the new C2 is with builtin_throw. tl;dr: builtin_throw makes the overflow case of the same order as the in-bounds cases (1-4ms) instead of being more than 100 times larger (600-700ms with C1, C2 without intrinsics, C2 with bailing out).

MathExact.C2.loopAddIInBounds                         1000000  avgt    3    1.657 ±   11.994  ms/op
MathExact.C2.loopAddIOverflow                         1000000  avgt    3    1.313 ±    4.188  ms/op
MathExact.C2.loopAddLInBounds                         1000000  avgt    3    0.980 ±    0.396  ms/op
MathExact.C2.loopAddLOverflow                         1000000  avgt    3    2.474 ±    3.473  ms/op
MathExact.C2.loopDecrementIInBounds                   1000000  avgt    3    3.733 ±   13.709  ms/op
MathExact.C2.loopDecrementIOverflow                   1000000  avgt    3    2.792 ±   23.724  ms/op
MathExact.C2.loopDecrementLInBounds                   1000000  avgt    3    2.761 ±   24.744  ms/op
MathExact.C2.loopDecrementLOverflow                   1000000  avgt    3    2.730 ±   23.065  ms/op
MathExact.C2.loopIncrementIInBounds                   1000000  avgt    3    3.134 ±   20.980  ms/op
MathExact.C2.loopIncrementIOverflow                   1000000  avgt    3    3.271 ±    8.876  ms/op
MathExact.C2.loopIncrementLInBounds                   1000000  avgt    3    2.756 ±   22.912  ms/op
MathExact.C2.loopIncrementLOverflow                   1000000  avgt    3    4.549 ±    9.543  ms/op
MathExact.C2.loopMultiplyIInBounds                    1000000  avgt    3    1.268 ±    0.574  ms/op
MathExact.C2.loopMultiplyIOverflow                    1000000  avgt    3    1.572 ±   11.171  ms/op
MathExact.C2.loopMultiplyLInBounds                    1000000  avgt    3    1.021 ±    1.054  ms/op
MathExact.C2.loopMultiplyLOverflow                    1000000  avgt    3    3.167 ±   20.666  ms/op
MathExact.C2.loopNegateIInBounds                      1000000  avgt    3    3.575 ±   29.997  ms/op
MathExact.C2.loopNegateIOverflow                      1000000  avgt    3    4.222 ±    9.041  ms/op
MathExact.C2.loopNegateLInBounds                      1000000  avgt    3    4.452 ±    6.680  ms/op
MathExact.C2.loopNegateLOverflow                      1000000  avgt    3    4.739 ±   34.662  ms/op
MathExact.C2.loopSubtractIInBounds                    1000000  avgt    3    1.087 ±    0.539  ms/op
MathExact.C2.loopSubtractIOverflow                    1000000  avgt    3    3.027 ±    9.709  ms/op
MathExact.C2.loopSubtractLInBounds                    1000000  avgt    3    1.197 ±    5.763  ms/op
MathExact.C2.loopSubtractLOverflow                    1000000  avgt    3    1.765 ±   10.037  ms/op
MathExact.C2_no_builtin_throw.loopAddIInBounds        1000000  avgt    3    2.310 ±    2.990  ms/op
MathExact.C2_no_builtin_throw.loopAddIOverflow        1000000  avgt    3  594.036 ±  500.000  ms/op
MathExact.C2_no_builtin_throw.loopAddLInBounds        1000000  avgt    3    1.577 ±   14.053  ms/op
MathExact.C2_no_builtin_throw.loopAddLOverflow        1000000  avgt    3  631.345 ±   75.836  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIInBounds  1000000  avgt    3    2.090 ±    0.937  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIOverflow  1000000  avgt    3  618.080 ±   38.047  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLInBounds  1000000  avgt    3    4.164 ±    6.184  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLOverflow  1000000  avgt    3  596.031 ±  584.159  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIInBounds  1000000  avgt    3    2.383 ±   11.729  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIOverflow  1000000  avgt    3  626.425 ±  134.612  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLInBounds  1000000  avgt    3    2.345 ±   13.927  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLOverflow  1000000  avgt    3  630.535 ±   99.348  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIInBounds   1000000  avgt    3    1.419 ±    4.289  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIOverflow   1000000  avgt    3  587.796 ±   52.215  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLInBounds   1000000  avgt    3    0.934 ±    0.272  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLOverflow   1000000  avgt    3  589.736 ±  347.848  ms/op
MathExact.C2_no_builtin_throw.loopNegateIInBounds     1000000  avgt    3    2.236 ±    5.749  ms/op
MathExact.C2_no_builtin_throw.loopNegateIOverflow     1000000  avgt    3  618.711 ±  725.158  ms/op
MathExact.C2_no_builtin_throw.loopNegateLInBounds     1000000  avgt    3    2.605 ±   17.373  ms/op
MathExact.C2_no_builtin_throw.loopNegateLOverflow     1000000  avgt    3  627.055 ±  184.767  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIInBounds   1000000  avgt    3    1.006 ±    0.584  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIOverflow   1000000  avgt    3  588.062 ±  403.116  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLInBounds   1000000  avgt    3    0.978 ±    0.193  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLOverflow   1000000  avgt    3  611.004 ±  456.779  ms/op

Contributor

eme64 commented Mar 20, 2025

@iwanowww Are you still reviewing or should I have a look?

@marc-chevalier marc-chevalier changed the title 8346989: Deoptimization and re-compilation cycle with C2 compiled code 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow Mar 24, 2025
Contributor

@iwanowww iwanowww left a comment

Thanks, Marc.

It looks a bit too convoluted to me. IMO an unconditional call to builtin_throw, plus a too_many_traps check, should do the job. Am I missing something important here?

@@ -275,7 +275,10 @@ class GraphKit : public Phase {

// Helper to throw a built-in exception.
// The JVMS must allow the bytecode to be re-executed via an uncommon trap.
void builtin_throw(Deoptimization::DeoptReason reason);
// If `exception_object` is nullptr, the exception to throw will be guessed based on `reason`
void builtin_throw(Deoptimization::DeoptReason reason, ciInstance* exception_object = nullptr);
Contributor

Please, introduce a new overload instead.

I suggest extracting the Deoptimization::DeoptReason -> ciInstance mapping into a helper method and turning void builtin_throw(Deoptimization::DeoptReason reason) into a wrapper:

void GraphKit::builtin_throw(Deoptimization::DeoptReason reason) {
   builtin_throw(reason, exception_on_deopt(reason));
}
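The same overload-plus-wrapper shape can be modelled in plain Java. In this sketch the enum, the reason-to-exception mapping, and the string results are all illustrative stand-ins, not HotSpot code; the helper plays the role of the suggested exception_on_deopt.

```java
class BuiltinThrowOverloads {
    // Hypothetical stand-in for Deoptimization::DeoptReason.
    enum DeoptReason { NULL_CHECK, DIV0_CHECK, RANGE_CHECK, INTRINSIC }

    // Models the suggested helper: guess the exception type from the reason.
    // Returns null when the reason alone does not determine an exception.
    static Class<? extends RuntimeException> exceptionOnDeopt(DeoptReason reason) {
        switch (reason) {
            case NULL_CHECK:
                return NullPointerException.class;
            case DIV0_CHECK:
                return ArithmeticException.class;
            case RANGE_CHECK:
                return ArrayIndexOutOfBoundsException.class;
            default:
                return null;
        }
    }

    // Existing entry point, now a thin wrapper around the new overload.
    static String builtinThrow(DeoptReason reason) {
        return builtinThrow(reason, exceptionOnDeopt(reason));
    }

    // New overload: the caller decides which exception to throw.
    static String builtinThrow(DeoptReason reason, Class<? extends RuntimeException> exception) {
        if (exception == null) {
            return "uncommon_trap(" + reason + ")";
        }
        return "throw " + exception.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(builtinThrow(DeoptReason.DIV0_CHECK));
        System.out.println(builtinThrow(DeoptReason.INTRINSIC));
        System.out.println(builtinThrow(DeoptReason.INTRINSIC, ArithmeticException.class));
    }
}
```

With two overloads, existing callers keep the one-argument form, while the intrinsic can pass the exception explicitly because its deoptimization reason does not identify one.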

uncommon_trap(Deoptimization::Reason_intrinsic,
Deoptimization::Action_none);
if (use_builtin_throw) {
builtin_throw(Deoptimization::Reason_intrinsic, env()->ArithmeticException_instance());
Contributor

I suggest calling builtin_throw() unconditionally. It should handle the uncommon_trap case as well.

What makes sense is to ensure that builtin_throw() doesn't change the deoptimization reason. It can be implemented with an extra argument to the new GraphKit::builtin_throw overload (e.g., bool allow_deopt_reason_none).

// If builtin_throw would work (notably, the throw is hot and we don't care about backtraces),
// instead of bailing out on intrinsic or potentially deopting, let's do that!
use_builtin_throw = true;
} else if (too_many_traps(Deoptimization::Reason_intrinsic)) {
Contributor

Why is a too_many_traps(Deoptimization::Reason_intrinsic) check not enough here?

Member Author

Actually, yes, there is a reason I've made it so weird (and I agree it's pretty convoluted).
builtin_throw kicks in if too_many_traps(reason) is true (and another case, but it might not apply):

if (ProfileTraps) {
  if (too_many_traps(reason)) {
    treat_throw_as_hot = true;
  }
  // (If there is no MDO at all, assume it is early in
  // execution, and that any deopts are part of the
  // startup transient, and don't need to be remembered.)
  // Also, if there is a local exception handler, treat all throws
  // as hot if there has been at least one in this method.
  if (C->trap_count(reason) != 0
      && method()->method_data()->trap_count(reason) != 0
      && has_exception_handler()) {
    treat_throw_as_hot = true;
  }
}

If treat_throw_as_hot is false (so before too many traps), it just ends up as an uncommon_trap with the Action_maybe_recompile action. That is fine at first. But later, we would like builtin_throw to do its job, and it can only do so when this condition holds:
if (treat_throw_as_hot && method()->can_omit_stack_trace()) {

That is not the same as too_many_traps(reason), which means that:

  • if we don't bail out of the intrinsic on too_many_traps(reason), we may end up in the same situation as in the bug, with deopt cycles, whenever builtin_throw doesn't do its job (for instance when method()->can_omit_stack_trace() is false)
  • if we bail out of the intrinsic on too_many_traps(reason), then builtin_throw never gets a hot enough throw that it can speed up, and we have the same situation as in my first version, before you suggested builtin_throw (with similar performance for C2 and C1).

In other words, we need too_many_traps(reason) to be reached for builtin_throw to have a chance to do something, but it might not, and in that case we need to bail out of the intrinsic; otherwise, we will repeatedly deopt. So, when too_many_traps(reason) is true, we have two options: either we hand over to builtin_throw or we bail out. And to avoid the deopt cycles, we must know in advance whether builtin_throw will do its job or just default to an uncommon_trap again (in which case, bailing out is better). This is why I extracted the condition for builtin_throw into builtin_throw_applies: so that the intrinsic can decide what is best to do.

Some of your suggestions are still relevant tho! I'll apply them.
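The three-way choice described above can be written down as a small decision function. This is a plain-Java model under simplifying assumptions, not the HotSpot code: the boolean parameters stand in for treat_throw_as_hot, method()->can_omit_stack_trace() and too_many_traps(reason), and the enum and method names are hypothetical.

```java
class IntrinsicDecision {
    enum Choice {
        INTRINSIC_WITH_UNCOMMON_TRAP, // overflow is rare so far: deopt if it happens
        INTRINSIC_WITH_BUILTIN_THROW, // overflow is hot and a fast throw is possible
        BAIL_OUT_OF_INTRINSIC         // overflow is hot but no fast throw: use plain Java
    }

    // Models the role of builtin_throw_applies: the fast-throw branch is only
    // taken when the throw is hot and stack traces may be omitted.
    static boolean builtinThrowApplies(boolean treatThrowAsHot, boolean canOmitStackTrace) {
        return treatThrowAsHot && canOmitStackTrace;
    }

    static Choice decide(boolean treatThrowAsHot, boolean canOmitStackTrace, boolean tooManyTraps) {
        if (builtinThrowApplies(treatThrowAsHot, canOmitStackTrace)) {
            return Choice.INTRINSIC_WITH_BUILTIN_THROW;
        }
        if (tooManyTraps) {
            // builtin_throw would fall back to an uncommon trap again,
            // so keeping the intrinsic would repeat the deopt cycle.
            return Choice.BAIL_OUT_OF_INTRINSIC;
        }
        return Choice.INTRINSIC_WITH_UNCOMMON_TRAP;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true, false));
        System.out.println(decide(true, true, true));
        System.out.println(decide(true, false, true));
    }
}
```

The third case is the one that corresponds to running with -XX:-OmitStackTraceInFastThrow: the trap limit is reached but no fast throw is available, so the model bails out instead of trapping again.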


openjdk bot commented Mar 31, 2025

@marc-chevalier this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed rfr Pull request is ready for review labels Mar 31, 2025
@openjdk openjdk bot added rfr Pull request is ready for review and removed merge-conflict Pull request has merge conflict with target branch labels Mar 31, 2025
Member Author

I've applied the suggested refactoring. It looks fine to me, tests seem happy, and the microbenchmark shows a similar profile.

Contributor

@iwanowww iwanowww left a comment

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 2, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 2, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 2, 2025
Member

@TobiHartmann TobiHartmann left a comment

Took me a while to parse the code but the refactoring definitely improves the situation 🙂 Looks good!

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 3, 2025
Member Author

I've made the test flags tighter as discussed offline. I'll need a fresh approval.

And for completeness, here are the benchmark results for this latest state. We can see that things behave as we expect: builtin_throw is taken and makes the situation a lot better. When intrinsics or builtin_throw are disabled, we see C1-like performance.

Benchmark                                              (SIZE)  Mode  Cnt    Score      Error  Units
MathExact.C1_1.loopAddIInBounds                       1000000  avgt    3    1.616 ±    7.813  ms/op
MathExact.C1_1.loopAddIOverflow                       1000000  avgt    3  654.971 ±  573.250  ms/op
MathExact.C1_1.loopAddLInBounds                       1000000  avgt    3    1.398 ±    0.274  ms/op
MathExact.C1_1.loopAddLOverflow                       1000000  avgt    3  629.620 ±   41.181  ms/op
MathExact.C1_1.loopDecrementIInBounds                 1000000  avgt    3    2.048 ±    0.340  ms/op
MathExact.C1_1.loopDecrementIOverflow                 1000000  avgt    3  681.702 ±   63.721  ms/op
MathExact.C1_1.loopDecrementLInBounds                 1000000  avgt    3    3.057 ±   13.688  ms/op
MathExact.C1_1.loopDecrementLOverflow                 1000000  avgt    3  660.457 ±  295.393  ms/op
MathExact.C1_1.loopIncrementIInBounds                 1000000  avgt    3    2.531 ±   13.692  ms/op
MathExact.C1_1.loopIncrementIOverflow                 1000000  avgt    3  647.970 ±   65.451  ms/op
MathExact.C1_1.loopIncrementLInBounds                 1000000  avgt    3    5.350 ±   25.080  ms/op
MathExact.C1_1.loopIncrementLOverflow                 1000000  avgt    3  681.097 ±   72.604  ms/op
MathExact.C1_1.loopMultiplyIInBounds                  1000000  avgt    3    1.552 ±    3.145  ms/op
MathExact.C1_1.loopMultiplyIOverflow                  1000000  avgt    3  648.402 ±   62.995  ms/op
MathExact.C1_1.loopMultiplyLInBounds                  1000000  avgt    3    2.501 ±    0.720  ms/op
MathExact.C1_1.loopMultiplyLOverflow                  1000000  avgt    3  701.498 ±   47.948  ms/op
MathExact.C1_1.loopNegateIInBounds                    1000000  avgt    3    2.074 ±    0.949  ms/op
MathExact.C1_1.loopNegateIOverflow                    1000000  avgt    3  665.143 ±  537.941  ms/op
MathExact.C1_1.loopNegateLInBounds                    1000000  avgt    3    5.487 ±    7.165  ms/op
MathExact.C1_1.loopNegateLOverflow                    1000000  avgt    3  687.085 ±   20.738  ms/op
MathExact.C1_1.loopSubtractIInBounds                  1000000  avgt    3    1.329 ±    0.769  ms/op
MathExact.C1_1.loopSubtractIOverflow                  1000000  avgt    3  683.922 ±   70.434  ms/op
MathExact.C1_1.loopSubtractLInBounds                  1000000  avgt    3    1.384 ±    0.386  ms/op
MathExact.C1_1.loopSubtractLOverflow                  1000000  avgt    3  664.380 ±  480.847  ms/op
MathExact.C1_2.loopAddIInBounds                       1000000  avgt    3    1.862 ±    0.815  ms/op
MathExact.C1_2.loopAddIOverflow                       1000000  avgt    3  660.421 ±  506.723  ms/op
MathExact.C1_2.loopAddLInBounds                       1000000  avgt    3    1.829 ±    0.221  ms/op
MathExact.C1_2.loopAddLOverflow                       1000000  avgt    3  681.209 ±   78.976  ms/op
MathExact.C1_2.loopDecrementIInBounds                 1000000  avgt    3    3.533 ±   11.302  ms/op
MathExact.C1_2.loopDecrementIOverflow                 1000000  avgt    3  682.639 ±  225.392  ms/op
MathExact.C1_2.loopDecrementLInBounds                 1000000  avgt    3    3.402 ±    1.031  ms/op
MathExact.C1_2.loopDecrementLOverflow                 1000000  avgt    3  697.283 ±  306.867  ms/op
MathExact.C1_2.loopIncrementIInBounds                 1000000  avgt    3    3.326 ±    5.072  ms/op
MathExact.C1_2.loopIncrementIOverflow                 1000000  avgt    3  658.514 ±  636.731  ms/op
MathExact.C1_2.loopIncrementLInBounds                 1000000  avgt    3    3.718 ±    0.422  ms/op
MathExact.C1_2.loopIncrementLOverflow                 1000000  avgt    3  693.863 ±   49.201  ms/op
MathExact.C1_2.loopMultiplyIInBounds                  1000000  avgt    3    1.924 ±    2.800  ms/op
MathExact.C1_2.loopMultiplyIOverflow                  1000000  avgt    3  609.308 ±   94.814  ms/op
MathExact.C1_2.loopMultiplyLInBounds                  1000000  avgt    3    3.459 ±    0.625  ms/op
MathExact.C1_2.loopMultiplyLOverflow                  1000000  avgt    3  713.503 ±  556.995  ms/op
MathExact.C1_2.loopNegateIInBounds                    1000000  avgt    3    3.195 ±    0.726  ms/op
MathExact.C1_2.loopNegateIOverflow                    1000000  avgt    3  684.176 ±   27.164  ms/op
MathExact.C1_2.loopNegateLInBounds                    1000000  avgt    3    3.483 ±    0.947  ms/op
MathExact.C1_2.loopNegateLOverflow                    1000000  avgt    3  656.284 ±  582.286  ms/op
MathExact.C1_2.loopSubtractIInBounds                  1000000  avgt    3    1.728 ±    0.315  ms/op
MathExact.C1_2.loopSubtractIOverflow                  1000000  avgt    3  688.029 ±   25.201  ms/op
MathExact.C1_2.loopSubtractLInBounds                  1000000  avgt    3    1.941 ±    0.169  ms/op
MathExact.C1_2.loopSubtractLOverflow                  1000000  avgt    3  694.341 ±  339.431  ms/op
MathExact.C1_3.loopAddIInBounds                       1000000  avgt    3    3.122 ±    0.910  ms/op
MathExact.C1_3.loopAddIOverflow                       1000000  avgt    3  688.731 ±  308.210  ms/op
MathExact.C1_3.loopAddLInBounds                       1000000  avgt    3    5.492 ±   36.236  ms/op
MathExact.C1_3.loopAddLOverflow                       1000000  avgt    3  697.053 ±  229.958  ms/op
MathExact.C1_3.loopDecrementIInBounds                 1000000  avgt    3    9.155 ±   72.182  ms/op
MathExact.C1_3.loopDecrementIOverflow                 1000000  avgt    3  708.458 ±  788.701  ms/op
MathExact.C1_3.loopDecrementLInBounds                 1000000  avgt    3    6.402 ±    3.658  ms/op
MathExact.C1_3.loopDecrementLOverflow                 1000000  avgt    3  705.992 ±  213.542  ms/op
MathExact.C1_3.loopIncrementIInBounds                 1000000  avgt    3    7.699 ±   61.434  ms/op
MathExact.C1_3.loopIncrementIOverflow                 1000000  avgt    3  697.353 ±  105.457  ms/op
MathExact.C1_3.loopIncrementLInBounds                 1000000  avgt    3    6.380 ±    0.839  ms/op
MathExact.C1_3.loopIncrementLOverflow                 1000000  avgt    3  669.240 ±  522.870  ms/op
MathExact.C1_3.loopMultiplyIInBounds                  1000000  avgt    3    3.225 ±    0.140  ms/op
MathExact.C1_3.loopMultiplyIOverflow                  1000000  avgt    3  624.811 ±  457.059  ms/op
MathExact.C1_3.loopMultiplyLInBounds                  1000000  avgt    3    6.110 ±    1.265  ms/op
MathExact.C1_3.loopMultiplyLOverflow                  1000000  avgt    3  718.460 ±   68.166  ms/op
MathExact.C1_3.loopNegateIInBounds                    1000000  avgt    3    6.085 ±    1.430  ms/op
MathExact.C1_3.loopNegateIOverflow                    1000000  avgt    3  675.036 ±  341.177  ms/op
MathExact.C1_3.loopNegateLInBounds                    1000000  avgt    3    9.410 ±   93.522  ms/op
MathExact.C1_3.loopNegateLOverflow                    1000000  avgt    3  652.042 ±  166.119  ms/op
MathExact.C1_3.loopSubtractIInBounds                  1000000  avgt    3    3.432 ±   11.899  ms/op
MathExact.C1_3.loopSubtractIOverflow                  1000000  avgt    3  654.208 ±  120.258  ms/op
MathExact.C1_3.loopSubtractLInBounds                  1000000  avgt    3    5.166 ±   38.529  ms/op
MathExact.C1_3.loopSubtractLOverflow                  1000000  avgt    3  691.094 ±   80.676  ms/op
MathExact.C2.loopAddIInBounds                         1000000  avgt    3    2.276 ±    1.750  ms/op
MathExact.C2.loopAddIOverflow                         1000000  avgt    3    1.173 ±    1.392  ms/op
MathExact.C2.loopAddLInBounds                         1000000  avgt    3    0.985 ±    0.167  ms/op
MathExact.C2.loopAddLOverflow                         1000000  avgt    3    1.990 ±    5.310  ms/op
MathExact.C2.loopDecrementIInBounds                   1000000  avgt    3    2.072 ±    0.173  ms/op
MathExact.C2.loopDecrementIOverflow                   1000000  avgt    3    1.911 ±    0.288  ms/op
MathExact.C2.loopDecrementLInBounds                   1000000  avgt    3    1.845 ±    0.424  ms/op
MathExact.C2.loopDecrementLOverflow                   1000000  avgt    3    2.757 ±   27.268  ms/op
MathExact.C2.loopIncrementIInBounds                   1000000  avgt    3    2.136 ±    0.517  ms/op
MathExact.C2.loopIncrementIOverflow                   1000000  avgt    3    2.199 ±    4.024  ms/op
MathExact.C2.loopIncrementLInBounds                   1000000  avgt    3    1.957 ±    0.365  ms/op
MathExact.C2.loopIncrementLOverflow                   1000000  avgt    3    2.053 ±    0.779  ms/op
MathExact.C2.loopMultiplyIInBounds                    1000000  avgt    3    1.174 ±    0.941  ms/op
MathExact.C2.loopMultiplyIOverflow                    1000000  avgt    3    1.971 ±   10.040  ms/op
MathExact.C2.loopMultiplyLInBounds                    1000000  avgt    3    0.997 ±    0.318  ms/op
MathExact.C2.loopMultiplyLOverflow                    1000000  avgt    3    2.847 ±    4.548  ms/op
MathExact.C2.loopNegateIInBounds                      1000000  avgt    3    4.783 ±    2.454  ms/op
MathExact.C2.loopNegateIOverflow                      1000000  avgt    3    1.915 ±    0.009  ms/op
MathExact.C2.loopNegateLInBounds                      1000000  avgt    3    2.824 ±   28.297  ms/op
MathExact.C2.loopNegateLOverflow                      1000000  avgt    3    4.766 ±   32.627  ms/op
MathExact.C2.loopSubtractIInBounds                    1000000  avgt    3    0.990 ±    0.264  ms/op
MathExact.C2.loopSubtractIOverflow                    1000000  avgt    3    1.181 ±    2.120  ms/op
MathExact.C2.loopSubtractLInBounds                    1000000  avgt    3    2.363 ±    1.575  ms/op
MathExact.C2.loopSubtractLOverflow                    1000000  avgt    3    2.429 ±    7.120  ms/op
MathExact.C2_no_builtin_throw.loopAddIInBounds        1000000  avgt    3    1.040 ±    0.181  ms/op
MathExact.C2_no_builtin_throw.loopAddIOverflow        1000000  avgt    3  580.950 ±  112.050  ms/op
MathExact.C2_no_builtin_throw.loopAddLInBounds        1000000  avgt    3    1.223 ±    5.700  ms/op
MathExact.C2_no_builtin_throw.loopAddLOverflow        1000000  avgt    3  585.712 ±   61.699  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIInBounds  1000000  avgt    3    2.114 ±    0.663  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIOverflow  1000000  avgt    3  604.866 ±  578.502  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLInBounds  1000000  avgt    3    2.167 ±    9.268  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLOverflow  1000000  avgt    3  621.175 ±  225.858  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIInBounds  1000000  avgt    3    1.950 ±    0.326  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIOverflow  1000000  avgt    3  633.735 ±  830.255  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLInBounds  1000000  avgt    3    2.397 ±   11.911  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLOverflow  1000000  avgt    3  627.599 ±  141.709  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIInBounds   1000000  avgt    3    1.167 ±    1.187  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIOverflow   1000000  avgt    3  623.224 ±  298.374  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLInBounds   1000000  avgt    3    0.944 ±    0.743  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLOverflow   1000000  avgt    3  658.380 ±  137.021  ms/op
MathExact.C2_no_builtin_throw.loopNegateIInBounds     1000000  avgt    3    2.119 ±    0.642  ms/op
MathExact.C2_no_builtin_throw.loopNegateIOverflow     1000000  avgt    3  643.102 ±  452.213  ms/op
MathExact.C2_no_builtin_throw.loopNegateLInBounds     1000000  avgt    3    2.036 ±    0.862  ms/op
MathExact.C2_no_builtin_throw.loopNegateLOverflow     1000000  avgt    3  586.103 ±   26.173  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIInBounds   1000000  avgt    3    2.552 ±    3.677  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIOverflow   1000000  avgt    3  635.294 ±  217.034  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLInBounds   1000000  avgt    3    1.093 ±    1.685  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLOverflow   1000000  avgt    3  661.541 ± 1358.199  ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds           1000000  avgt    3    2.185 ±   15.103  ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow           1000000  avgt    3  831.812 ± 1260.546  ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds           1000000  avgt    3    2.145 ±    0.088  ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow           1000000  avgt    3  709.930 ±  658.722  ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds     1000000  avgt    3    2.288 ±    0.950  ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow     1000000  avgt    3  646.879 ±  186.231  ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds     1000000  avgt    3    1.894 ±    0.421  ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow     1000000  avgt    3  641.577 ±  323.040  ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds     1000000  avgt    3    2.027 ±    0.249  ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow     1000000  avgt    3  657.092 ±  229.818  ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds     1000000  avgt    3    3.220 ±   16.992  ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow     1000000  avgt    3  603.468 ±   73.240  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds      1000000  avgt    3    1.295 ±    0.413  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow      1000000  avgt    3  593.005 ±  576.291  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds      1000000  avgt    3    1.093 ±    0.916  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow      1000000  avgt    3  618.956 ±  554.204  ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds        1000000  avgt    3    2.035 ±    0.047  ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow        1000000  avgt    3  650.591 ± 1248.923  ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds        1000000  avgt    3    3.505 ±   20.475  ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow        1000000  avgt    3  660.686 ±  201.612  ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds      1000000  avgt    3    1.109 ±    0.726  ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow      1000000  avgt    3  670.468 ±  475.269  ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds      1000000  avgt    3    1.208 ±    0.806  ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow      1000000  avgt    3  597.522 ±   32.465  ms/op

Member

Great, thank you!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 3, 2025
@marc-chevalier marc-chevalier changed the title 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow Apr 3, 2025
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed ready Pull request is ready to be integrated labels Apr 3, 2025
Member Author

/integrate

Thanks @iwanowww and @TobiHartmann!

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 4, 2025

openjdk bot commented Apr 4, 2025

@marc-chevalier
Your change (at version e7c8f3e) is now ready to be sponsored by a Committer.

Member

/sponsor


openjdk bot commented Apr 7, 2025

Going to push as commit 97ed536.
Since your change was applied there have been 105 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 7, 2025
@openjdk openjdk bot closed this Apr 7, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Apr 7, 2025

openjdk bot commented Apr 7, 2025

@TobiHartmann @marc-chevalier Pushed as commit 97ed536.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


4 participants