
Benchmark tests #55

Draft · wants to merge 5 commits into base: main
Conversation

@AlanCoding (Member) commented Feb 9, 2025

Fixes #10

@AlanCoding (Member Author)

Output from github checks as of this last push:

--------------------------------------------- benchmark 'by_system': 2 tests --------------------------------------------
Name (time in ms)                        Mean                 Min                 Max            StdDev            Rounds
-------------------------------------------------------------------------------------------------------------------------
test_clear_time_with_full_server     272.1684 (1.0)      268.4134 (1.0)      275.7764 (1.0)      2.7523 (1.0)           5
test_clear_time_with_only_pool       272.8960 (1.00)     271.1785 (1.01)     278.5536 (1.01)     3.1687 (1.15)          5
-------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------- benchmark 'by_task': 4 tests ---------------------------------------------------
Name (time in ms)                               Mean                   Min                   Max            StdDev            Rounds
------------------------------------------------------------------------------------------------------------------------------------
test_clear_sleep_by_task_number[1]           11.0525 (1.0)         10.8961 (1.0)         11.2495 (1.0)      0.0750 (1.0)          76
test_clear_sleep_by_task_number[10]          33.2726 (3.01)        32.9364 (3.02)        33.8469 (3.01)     0.2130 (2.84)         21
test_clear_sleep_by_task_number[100]        272.8379 (24.69)      271.3385 (24.90)      278.4673 (24.75)    3.1478 (41.97)         5
test_clear_sleep_by_task_number[1000]     2,688.8506 (243.28)   2,677.9305 (245.77)   2,702.0961 (240.20)   9.9851 (133.13)        5
------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------- benchmark 'by_worker_math': 6 tests -----------------------------------------------
Name (time in ms)                             Mean                   Min                   Max             StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------------
test_clear_math_by_worker_count[24]       705.4190 (1.0)        700.6283 (1.0)        709.1435 (1.0)       3.6526 (1.25)          5
test_clear_math_by_worker_count[50]       709.9372 (1.01)       701.5107 (1.00)       728.2789 (1.03)     10.5836 (3.64)          5
test_clear_math_by_worker_count[75]       706.0603 (1.00)       701.7364 (1.00)       709.5937 (1.00)      2.9115 (1.0)           5
test_clear_math_by_worker_count[12]       710.7516 (1.01)       707.2474 (1.01)       717.0115 (1.01)      4.2373 (1.46)          5
test_clear_math_by_worker_count[4]        715.9432 (1.01)       711.6714 (1.02)       724.5065 (1.02)      5.6573 (1.94)          5
test_clear_math_by_worker_count[1]      1,669.4381 (2.37)     1,656.6375 (2.36)     1,682.2486 (2.37)     11.4911 (3.95)          5
-----------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------- benchmark 'by_worker_sleep': 6 tests ----------------------------------------------
Name (time in ms)                              Mean                   Min                   Max            StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------------
test_clear_sleep_by_worker_count[75]        36.2413 (1.0)         35.3554 (1.0)         37.1886 (1.0)      0.8152 (2.10)          5
test_clear_sleep_by_worker_count[50]        41.5776 (1.15)        40.9844 (1.16)        41.9939 (1.13)     0.3878 (1.0)           6
test_clear_sleep_by_worker_count[24]        59.4188 (1.64)        56.7509 (1.61)        60.3596 (1.62)     1.0737 (2.77)          9
test_clear_sleep_by_worker_count[12]       102.2730 (2.82)        98.4321 (2.78)       108.8095 (2.93)     2.9042 (7.49)          8
test_clear_sleep_by_worker_count[4]        272.9203 (7.53)       270.7186 (7.66)       278.1032 (7.48)     3.0140 (7.77)          5
test_clear_sleep_by_worker_count[1]      1,070.3013 (29.53)    1,064.9014 (30.12)    1,077.3180 (28.97)    4.8643 (12.54)         5
-----------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

@AlanCoding (Member Author)

I added test_clear_no_op_by_task_number, but now I'm remembering why I didn't do this before. It gets about 7k tasks per second, but that is really just measuring the speed of the test itself. I see no legitimate way to make it faster: the tests are synchronous, so the test process will obviously become the bottleneck before the actual dispatcher service does. Presumably you could swarm postgres with a bunch of clients producing messages to get a more accurate count, but I have no motivation to do that.
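To illustrate why a synchronous test floor-limits the measurement, here is a minimal sketch (names and structure are hypothetical, not the PR's actual test): timing a tight loop of no-op submissions measures the client loop, not the service.

```python
import time

def measure_submission_rate(submit, count=1000):
    """Time `count` synchronous calls to `submit` and return calls/second.

    Because the loop is synchronous, this measures the client side only:
    the test process becomes the bottleneck long before the dispatcher
    service itself does. Hypothetical sketch, not the PR's actual code.
    """
    start = time.perf_counter()
    for _ in range(count):
        submit()
    elapsed = time.perf_counter() - start
    return count / elapsed

# With a no-op callable the rate reflects pure loop overhead, which is
# why such a test tops out at the speed of the test harness itself.
rate = measure_submission_rate(lambda: None)
```

A parallel producer swarm against postgres, as mentioned above, would be the way around this ceiling, at the cost of much more test machinery.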

@AlanCoding (Member Author)

Part of the reason I would like to get this in is that #9 will probably share a version of the subprocess fixture that stands up an on-demand dispatcher service for a test. That would make a ton of sense for the dab_task app. But I hesitate to look into that as long as this stays open.
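The kind of shared fixture described above could be sketched roughly as follows. This is a hedged illustration only: the default command is a placeholder process, not the project's actual entrypoint, and the real PR presumably wraps something like this in a pytest fixture.

```python
import contextlib
import subprocess
import sys

@contextlib.contextmanager
def dispatcher_service(argv=None):
    """Start a service subprocess for the duration of a test.

    Sketch under assumptions: the command below is a stand-in long-running
    process, not the dispatcher's real CLI. A pytest version would use
    @pytest.fixture with the same start/terminate shape.
    """
    cmd = argv or [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```

Sharing one such fixture between the benchmark tests here and the integration tests in #9 avoids duplicating the startup/teardown logic.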

@Alex-Izquierdo (Collaborator) left a comment


LGTM, but I would suggest not giving priority to benchmarks at this point.

@AlanCoding (Member Author)

Here is a newly added test:

------------------------------------------------ benchmark 'control': 7 tests -----------------------------------------------
Name (time in ms)                            Mean                 Min                 Max            StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------
test_alive_benchmark                      14.6062 (1.0)       14.1533 (1.0)       15.1241 (1.0)      0.2238 (1.0)          33
test_alive_benchmark_while_busy[0]        21.1566 (1.45)      20.4147 (1.44)      22.2239 (1.47)     0.4106 (1.83)         42
test_alive_benchmark_while_busy[3]        22.6903 (1.55)      21.8802 (1.55)      23.8478 (1.58)     0.4461 (1.99)         40
test_alive_benchmark_while_busy[4]        25.8459 (1.77)      25.1231 (1.78)      28.4265 (1.88)     0.7736 (3.46)         37
test_alive_benchmark_while_busy[5]        26.5553 (1.82)      25.6479 (1.81)      28.5576 (1.89)     0.5955 (2.66)         36
test_alive_benchmark_while_busy[10]       38.0772 (2.61)      37.0583 (2.62)      44.3333 (2.93)     1.4217 (6.35)         25
test_alive_benchmark_while_busy[100]     284.7240 (19.49)    282.8920 (19.99)    286.3878 (18.94)    1.5827 (7.07)          5
-----------------------------------------------------------------------------------------------------------------------------

I had some flaky failures, but realized this is because this code pre-dates other changes and had not yet adopted the ready_event, which ensures the dispatcher is listening before we send messages. Blocking on that event before starting the benchmark seems to work extremely well.
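The gating pattern described above can be sketched like this (names such as `run_benchmark_round` and the exact event API are illustrative assumptions, not the project's real interface):

```python
import asyncio

async def run_benchmark_round(ready_event, send_messages):
    """Block on the service's ready event before sending benchmark traffic.

    `ready_event` stands in for the event the dispatcher sets once its
    listener is up; without this gate, early messages race the connection
    setup and the benchmark flakes. Hypothetical sketch, not actual code.
    """
    await asyncio.wait_for(ready_event.wait(), timeout=5)
    await send_messages()
```

Waiting with a timeout also turns a dispatcher that never comes up into a clean test failure instead of a hang.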

With the GitHub results, this shows a very satisfying increase with busyness, even with the re-connection issues polluting the data.

@AlanCoding mentioned this pull request Feb 18, 2025
@AlanCoding marked this pull request as draft February 24, 2025 04:18
Adopt new error handling patterns done elsewhere

Properly parameterize the worker number

Move event trigger to drain_queue method

Fix changed event meanings

Add artifacting of benchmark data

Add benchmark test for control task

Add some control message benchmarks

Combine with existing test methods module

Update unit test

Update to new config problems

Avoid retyping no longer necessary

Do some modernization

combine test_pool files

Update test to new pattern

Use better start_working call
@AlanCoding (Member Author)

Passing again (1m runtime), but locally I am still seeing flaky failures such as:

FAILED tests/benchmark/test_control.py::test_alive_benchmark_while_busy[3] - AssertionError: assert [] == [{'node_id': 'benchmark-server'}]
  
  Right contains one more item: {'node_id': 'benchmark-server'}
  
  Full diff:
  + []
  - [
  -     {
  -         'node_id': 'benchmark-server',
  -     },
  - ]

Also, changes in the meantime have altered the connection handling, which greatly changes the results for control messages.

----------------------------------------------- benchmark 'control': 7 tests ----------------------------------------------
Name (time in ms)                           Mean                Min                Max             StdDev            Rounds
---------------------------------------------------------------------------------------------------------------------------
test_alive_benchmark_while_busy[0]        1.9886 (1.0)       1.0918 (1.0)       7.9138 (1.31)      1.3613 (1.0)         106
test_alive_benchmark                      2.5241 (1.27)      1.2705 (1.16)      6.0438 (1.0)       1.4140 (1.04)         32
test_alive_benchmark_while_busy[3]        3.2259 (1.62)      1.8900 (1.73)     15.8701 (2.63)      1.8881 (1.39)        190
test_alive_benchmark_while_busy[4]        3.6245 (1.82)      2.2871 (2.09)     12.9793 (2.15)      1.7260 (1.27)        117
test_alive_benchmark_while_busy[5]        4.1748 (2.10)      2.6307 (2.41)      9.6280 (1.59)      1.3930 (1.02)        112
test_alive_benchmark_while_busy[10]       6.3618 (3.20)      4.0869 (3.74)     14.8314 (2.45)      2.1895 (1.61)         74
test_alive_benchmark_while_busy[100]     38.2911 (19.26)    31.3676 (28.73)    60.7578 (10.05)    10.2007 (7.49)         17
---------------------------------------------------------------------------------------------------------------------------

This is fairly consistently about 10x faster. It still shows the climb with busyness. There is also a major outlier in the max values, which could be some other unexpected connection opening or something.

Successfully merging this pull request may close these issues:
Add test for responsiveness during scaleup burst