[ty] LSP Benchmarks #21625

MichaReiser · 2025-11-25T08:01:25Z

Adds two LSP-specific benchmarks to ty's benchmark suite: `test_fetch_diagnostics` (startup) and `test_incremental_edit` (incremental edits).
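For readers unfamiliar with what an LSP client sends during an edit, here is a minimal sketch of framing a `textDocument/didChange` notification using the LSP base protocol (`Content-Length` header followed by a JSON-RPC body). This is not taken from `lsp_client.py`; the helper name `frame_lsp_message`, the file URI, and the edit itself are hypothetical and chosen only for illustration.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the LSP base-protocol header.

    Hypothetical helper for illustration; not the benchmark's actual client code.
    """
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts the bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Assumed example edit: insert a comment line at the start of a document.
did_change = {
    "jsonrpc": "2.0",
    "method": "textDocument/didChange",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py", "version": 2},
        "contentChanges": [
            {
                "range": {
                    "start": {"line": 0, "character": 0},
                    "end": {"line": 0, "character": 0},
                },
                "text": "# edited\n",
            }
        ],
    },
}

message = frame_lsp_message(did_change)
```

A benchmark along these lines would presumably time the interval between sending such a message and receiving updated diagnostics, but the exact requests each server receives are defined by the PR's code, not by this sketch.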

Test plan

Startup

----------------------------------------------------------------------------------------- benchmark 'black': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                              Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[black-ty]           45.0022 (1.0)       50.5992 (1.0)       48.1661 (1.0)      2.1093 (1.56)      48.7534 (1.0)      3.8456 (2.35)          5;0  20.7615 (1.0)          10           1
test_fetch_diagnostics[black-pyrefly]     130.7642 (2.91)     134.9948 (2.67)     132.4777 (2.75)     1.3516 (1.0)      132.1238 (2.71)     1.6354 (1.0)           3;0   7.5484 (0.36)         10           1
test_fetch_diagnostics[black-pyright]     259.8113 (5.77)     270.9675 (5.36)     265.2209 (5.51)     3.2077 (2.37)     265.2749 (5.44)     3.7415 (2.29)          3;0   3.7704 (0.18)         10           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------- benchmark 'discord.py': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                                   Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[discord.py-ty]          102.0583 (1.0)      112.3228 (1.0)      106.5086 (1.0)      3.1388 (1.0)      107.1197 (1.0)      3.9039 (1.0)           3;0  9.3889 (1.0)          10           1
test_fetch_diagnostics[discord.py-pyrefly]     396.0137 (3.88)     418.2918 (3.72)     403.7360 (3.79)     6.8992 (2.20)     402.4430 (3.76)     9.0345 (2.31)          3;0  2.4769 (0.26)         10           1
test_fetch_diagnostics[discord.py-pyright]     543.4070 (5.32)     559.0652 (4.98)     551.4626 (5.18)     5.0122 (1.60)     551.1797 (5.15)     5.9366 (1.52)          3;0  1.8134 (0.19)         10           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------- benchmark 'homeassistant': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[homeassistant-ty]          106.4813 (1.0)      110.0322 (1.0)      107.8758 (1.0)      1.1792 (1.0)      107.7968 (1.0)      1.2990 (1.0)           3;0  9.2699 (1.0)          10           1
test_fetch_diagnostics[homeassistant-pyrefly]     211.0191 (1.98)     220.0760 (2.00)     214.8664 (1.99)     2.6738 (2.27)     214.5144 (1.99)     2.4436 (1.88)          3;1  4.6541 (0.50)         10           1
test_fetch_diagnostics[homeassistant-pyright]     923.6263 (8.67)     955.3790 (8.68)     937.9460 (8.69)     9.9209 (8.41)     935.8720 (8.68)     7.4282 (5.72)          3;2  1.0662 (0.12)         10           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------- benchmark 'isort': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                              Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[isort-ty]           40.9938 (1.0)       49.5640 (1.0)       44.8959 (1.0)      2.4222 (1.0)       45.2774 (1.0)      2.6837 (1.0)           3;0  22.2737 (1.0)          10           1
test_fetch_diagnostics[isort-pyrefly]      96.3884 (2.35)     113.4274 (2.29)     103.4755 (2.30)     4.9937 (2.06)     104.3979 (2.31)     7.6517 (2.85)          4;0   9.6641 (0.43)         10           1
test_fetch_diagnostics[isort-pyright]     321.9132 (7.85)     330.8636 (6.68)     325.5137 (7.25)     3.3694 (1.39)     324.8571 (7.17)     6.2640 (2.33)          3;0   3.0721 (0.14)         10           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------- benchmark 'jinja': 3 tests -----------------------------------------------------------------------------------------
Name (time in ms)                              Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[jinja-pyrefly]     124.4310 (1.0)      130.5832 (1.0)      127.2086 (1.0)       1.8860 (1.0)      127.3995 (1.0)       3.3252 (1.0)           4;0  7.8611 (1.0)          10           1
test_fetch_diagnostics[jinja-ty]          127.5364 (1.02)     137.5456 (1.05)     132.2176 (1.04)      3.0441 (1.61)     131.9572 (1.04)      4.2723 (1.28)          3;0  7.5633 (0.96)         10           1
test_fetch_diagnostics[jinja-pyright]     286.0230 (2.30)     336.9109 (2.58)     307.5707 (2.42)     18.6872 (9.91)     307.0353 (2.41)     31.6510 (9.52)          5;0  3.2513 (0.41)         10           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------ benchmark 'pandas': 3 tests ------------------------------------------------------------------------------------------
Name (time in ms)                               Min                   Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[pandas-ty]          279.5276 (1.0)        302.6971 (1.0)      290.5714 (1.0)       8.4075 (1.0)      287.1560 (1.0)      14.6762 (1.0)           3;0  3.4415 (1.0)          10           1
test_fetch_diagnostics[pandas-pyrefly]     515.2284 (1.84)       544.7320 (1.80)     524.7709 (1.81)     11.9116 (1.42)     517.7050 (1.80)     17.4099 (1.19)          2;0  1.9056 (0.55)         10           1
test_fetch_diagnostics[pandas-pyright]     933.7581 (3.34)     1,002.1076 (3.31)     974.3720 (3.35)     23.6602 (2.81)     981.7017 (3.42)     41.5926 (2.83)          4;0  1.0263 (0.30)         10           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------- benchmark 'prefect': 3 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[prefect-ty]          119.6068 (1.0)      124.3774 (1.0)      121.5190 (1.0)       1.5453 (1.0)      121.2688 (1.0)       2.1728 (1.0)           4;0  8.2292 (1.0)          10           1
test_fetch_diagnostics[prefect-pyrefly]     428.1855 (3.58)     473.9171 (3.81)     446.3560 (3.67)     16.1744 (10.47)    444.7778 (3.67)     27.8795 (12.83)         5;0  2.2404 (0.27)         10           1
test_fetch_diagnostics[prefect-pyright]     861.3169 (7.20)     907.2987 (7.29)     878.4323 (7.23)     17.0292 (11.02)    872.1708 (7.19)     29.8330 (13.73)         3;0  1.1384 (0.14)         10           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------ benchmark 'pytorch': 3 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fetch_diagnostics[pytorch-ty]           47.4029 (1.0)       52.9500 (1.0)       50.7074 (1.0)       1.6112 (1.0)       51.1733 (1.0)       1.2666 (1.0)           3;1  19.7210 (1.0)          10           1
test_fetch_diagnostics[pytorch-pyrefly]     166.4415 (3.51)     175.2218 (3.31)     168.4454 (3.32)      2.6925 (1.67)     167.3733 (3.27)      2.6971 (2.13)          1;1   5.9366 (0.30)         10           1
test_fetch_diagnostics[pytorch-pyright]     454.2465 (9.58)     508.4146 (9.60)     472.4543 (9.32)     14.9319 (9.27)     469.2766 (9.17)     12.8563 (10.15)         2;1   2.1166 (0.11)         10           1

Incremental

----------------------------------------------------------------------------------------- benchmark 'black': 3 tests -----------------------------------------------------------------------------------------
Name (time in ms)                             Min                 Max                Mean            StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[black-ty]            9.4605 (1.0)        9.7666 (1.0)        9.6273 (1.0)      0.1018 (1.0)        9.6256 (1.0)       0.1905 (1.0)           5;0  103.8714 (1.0)          10           1
test_incremental_edit[black-pyrefly]     170.8236 (18.06)    193.0729 (19.77)    185.7380 (19.29)    6.6044 (64.88)    185.7899 (19.30)     8.3591 (43.88)         3;0    5.3839 (0.05)         10           1
test_incremental_edit[black-pyright]     435.0229 (45.98)    458.8385 (46.98)    448.5559 (46.59)    7.8568 (77.18)    448.8126 (46.63)    12.3359 (64.76)         3;0    2.2294 (0.02)         10           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------- benchmark 'discord.py': 3 tests ------------------------------------------------------------------------------------------
Name (time in ms)                                  Min                 Max                Mean             StdDev              Median                 IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[discord.py-ty]           11.6751 (1.0)       12.3491 (1.0)       11.9583 (1.0)       0.2168 (1.0)       11.9407 (1.0)        0.1131 (1.0)           4;3  83.6240 (1.0)          10           1
test_incremental_edit[discord.py-pyrefly]     363.2718 (31.12)    480.7902 (38.93)    423.8286 (35.44)    48.6729 (224.53)   436.7698 (36.58)    104.9666 (928.23)        6;0   2.3594 (0.03)         10           1
test_incremental_edit[discord.py-pyright]     458.0812 (39.24)    481.0607 (38.96)    475.4985 (39.76)     6.8778 (31.73)    478.0520 (40.04)      3.6961 (32.69)         1;2   2.1031 (0.03)         10           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------- benchmark 'homeassistant': 3 tests ---------------------------------------------------------------------------------------------
Name (time in ms)                                       Min                   Max                  Mean             StdDev                Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[homeassistant-ty]             21.8815 (1.0)         22.8287 (1.0)         22.3703 (1.0)       0.2974 (1.0)         22.3340 (1.0)       0.4893 (1.0)           3;0  44.7021 (1.0)          10           1
test_incremental_edit[homeassistant-pyright]       514.2854 (23.50)      535.6224 (23.46)      522.5869 (23.36)     7.5869 (25.51)      519.4796 (23.26)    12.0478 (24.62)         3;0   1.9136 (0.04)         10           1
test_incremental_edit[homeassistant-pyrefly]     1,105.8705 (50.54)    1,195.8744 (52.38)    1,153.7581 (51.58)    28.0634 (94.37)    1,160.8398 (51.98)    39.3058 (80.33)         3;0   0.8667 (0.02)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------- benchmark 'isort': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                             Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[isort-ty]           10.1290 (1.0)       11.3836 (1.0)       10.5138 (1.0)      0.3575 (1.0)       10.4004 (1.0)      0.2091 (1.0)           2;1  95.1132 (1.0)          10           1
test_incremental_edit[isort-pyrefly]     127.0601 (12.54)    145.5532 (12.79)    135.0604 (12.85)    4.9420 (13.82)    134.0219 (12.89)    4.9549 (23.69)         2;1   7.4041 (0.08)         10           1
test_incremental_edit[isort-pyright]     391.5492 (38.66)    416.5735 (36.59)    407.5843 (38.77)    7.5267 (21.05)    409.3451 (39.36)    8.4329 (40.32)         3;0   2.4535 (0.03)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------- benchmark 'jinja': 3 tests ----------------------------------------------------------------------------------------
Name (time in ms)                             Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[jinja-ty]           52.8323 (1.0)       58.4933 (1.0)       54.4423 (1.0)      1.6256 (1.0)       54.2111 (1.0)      1.5798 (1.0)           1;1  18.3681 (1.0)          10           1
test_incremental_edit[jinja-pyrefly]     190.0055 (3.60)     200.0931 (3.42)     196.0336 (3.60)     3.3738 (2.08)     195.8991 (3.61)     4.6862 (2.97)          4;0   5.1012 (0.28)         10           1
test_incremental_edit[jinja-pyright]     449.5884 (8.51)     466.3390 (7.97)     456.7712 (8.39)     5.4815 (3.37)     455.8810 (8.41)     6.7072 (4.25)          3;0   2.1893 (0.12)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------- benchmark 'pandas': 3 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                                Min                   Max                  Mean              StdDev                Median                 IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[pandas-ty]             43.4786 (1.0)         71.6500 (1.0)         52.6861 (1.0)       10.6539 (1.0)         46.6107 (1.0)       19.4685 (2.06)          3;0  18.9803 (1.0)          10           1
test_incremental_edit[pandas-pyright]       475.8649 (10.94)      531.3990 (7.42)       500.1655 (9.49)      18.0852 (1.70)       497.7416 (10.68)      9.4399 (1.0)           4;3   1.9993 (0.11)         10           1
test_incremental_edit[pandas-pyrefly]     3,502.8000 (80.56)    4,395.4530 (61.35)    3,955.8306 (75.08)    257.6791 (24.19)    3,951.3062 (84.77)    303.0145 (32.10)         3;0   0.2528 (0.01)         10           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------- benchmark 'prefect': 3 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                                 Min                   Max                  Mean              StdDev                Median                 IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[prefect-ty]              4.8863 (1.0)          5.4782 (1.0)          5.1683 (1.0)        0.1729 (1.0)          5.1511 (1.0)        0.2274 (1.0)           2;0  193.4889 (1.0)          10           1
test_incremental_edit[prefect-pyright]       557.9992 (114.20)     577.8817 (105.49)     568.2647 (109.95)     6.2294 (36.03)      567.5641 (110.18)     9.2906 (40.85)         3;0    1.7597 (0.01)         10           1
test_incremental_edit[prefect-pyrefly]     2,342.2367 (479.34)   3,614.2212 (659.74)   2,930.9432 (567.11)   475.6049 (>1000.0)  2,937.2390 (570.21)   915.9068 (>1000.0)       4;0    0.3412 (0.00)         10           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------- benchmark 'pytorch': 3 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                                 Min                   Max                  Mean              StdDev                Median                 IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_incremental_edit[pytorch-ty]              4.4972 (1.0)          5.1331 (1.0)          4.7089 (1.0)        0.1941 (1.0)          4.6489 (1.0)        0.1583 (1.0)           3;1  212.3648 (1.0)          10           1
test_incremental_edit[pytorch-pyright]       383.8624 (85.36)      390.7582 (76.13)      386.2723 (82.03)      2.1002 (10.82)      386.0654 (83.04)      2.5315 (15.99)         3;0    2.5888 (0.01)         10           1
test_incremental_edit[pytorch-pyrefly]     2,218.3648 (493.28)   2,574.7509 (501.60)   2,388.2273 (507.18)   108.1246 (557.11)   2,376.1079 (511.11)   162.9907 (>1000.0)       2;0    0.4187 (0.00)         10           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================================================ 24 passed, 3 skipped, 2745 warnings in 535.80s (0:08:55) ============================================================
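The derived columns in the tables above can be checked by hand. The sketch below recomputes OPS and the parenthesised relative factor for the `black` startup benchmark from the reported means. Because the tables report time in milliseconds, `1 / Mean` becomes `1000 / mean_ms`. The function names are illustrative and not part of pytest-benchmark.

```python
def ops_from_mean_ms(mean_ms: float) -> float:
    """Operations per second for a mean duration given in milliseconds."""
    return 1000.0 / mean_ms


def relative_to_best(value: float, best: float) -> float:
    """Factor shown in parentheses: the value divided by the best value in the column."""
    return value / best


# Means copied from the 'black' startup table (time in ms).
ty_mean_ms = 48.1661
pyrefly_mean_ms = 132.4777

ty_ops = ops_from_mean_ms(ty_mean_ms)
pyrefly_factor = relative_to_best(pyrefly_mean_ms, ty_mean_ms)

print(f"ty OPS: {ty_ops:.4f}")                   # table reports 20.7615
print(f"pyrefly vs ty: {pyrefly_factor:.2f}x")   # table reports (2.75)
```

The same division explains the large factors in the incremental tables: each parenthesised number is relative to the fastest entry in that column for that project.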

astral-sh-bot · 2025-11-25T08:11:59Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

scripts/ty_benchmark/src/benchmark/test_lsp_diagnostics.py

AlexWaygood

A couple of nits from skimming ;)

this looks really cool, thank you!!

scripts/ty_benchmark/src/benchmark/lsp_client.py

scripts/ty_benchmark/src/benchmark/projects.py

scripts/ty_benchmark/src/benchmark/test_lsp_diagnostics.py

MichaReiser added the `ty` label (Multi-file analysis & type inference) Nov 25, 2025

MichaReiser force-pushed the micha/lsp-benchmarks branch from 5f3016c to fad88b8 November 28, 2025 12:39

MichaReiser added the `internal` label (An internal refactor or improvement) Nov 28, 2025

MichaReiser marked this pull request as ready for review November 28, 2025 12:58

MichaReiser requested review from AlexWaygood, carljm, dcreager and sharkdp as code owners November 28, 2025 12:58

MichaReiser requested a review from charliermarsh November 28, 2025 21:29

charliermarsh approved these changes Nov 29, 2025

scripts/ty_benchmark/src/benchmark/test_lsp_diagnostics.py (outdated, resolved)

AlexWaygood reviewed Nov 29, 2025

MichaReiser added 11 commits December 1, 2025 11:41

Basic LSP benchmarks

0a4c69c

Almost working benchmarks

1872a30

Something that's close to working

0b22cf4

Discard changes to crates/ty_server/src/server/main_loop.rs

6d652e6

More improvements

09864f5

More improvements

64343d7

Docs

b8010d3

Update snapshots after Pyrefly/ty upgrade

d2b09cc

Discard changes to scripts/ty_benchmark/snapshots/discord.py_mypy.txt

964c33e

Disable indexing for pyrefly

c6ee46d

Code review feedback

1b33102

MichaReiser force-pushed the micha/lsp-benchmarks branch from 096e539 to 1b33102 December 1, 2025 11:17

Code review feedback

488dc0f

MichaReiser merged commit 2e229aa into main Dec 1, 2025
36 checks passed

MichaReiser deleted the micha/lsp-benchmarks branch December 1, 2025 11:33

4 participants
