
Benchmark tests #55

Draft · wants to merge 5 commits into base: main
Conversation

@AlanCoding (Member) commented Feb 9, 2025

Fixes #10

@AlanCoding (Member Author)

Output from github checks as of this last push:

--------------------------------------------- benchmark 'by_system': 2 tests --------------------------------------------
Name (time in ms)                        Mean                 Min                 Max            StdDev            Rounds
-------------------------------------------------------------------------------------------------------------------------
test_clear_time_with_full_server     272.1684 (1.0)      268.4134 (1.0)      275.7764 (1.0)      2.7523 (1.0)           5
test_clear_time_with_only_pool       272.8960 (1.00)     271.1785 (1.01)     278.5536 (1.01)     3.1687 (1.15)          5
-------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------- benchmark 'by_task': 4 tests ---------------------------------------------------
Name (time in ms)                               Mean                   Min                   Max            StdDev            Rounds
------------------------------------------------------------------------------------------------------------------------------------
test_clear_sleep_by_task_number[1]           11.0525 (1.0)         10.8961 (1.0)         11.2495 (1.0)      0.0750 (1.0)          76
test_clear_sleep_by_task_number[10]          33.2726 (3.01)        32.9364 (3.02)        33.8469 (3.01)     0.2130 (2.84)         21
test_clear_sleep_by_task_number[100]        272.8379 (24.69)      271.3385 (24.90)      278.4673 (24.75)    3.1478 (41.97)         5
test_clear_sleep_by_task_number[1000]     2,688.8506 (243.28)   2,677.9305 (245.77)   2,702.0961 (240.20)   9.9851 (133.13)        5
------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------- benchmark 'by_worker_math': 6 tests -----------------------------------------------
Name (time in ms)                             Mean                   Min                   Max             StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------------
test_clear_math_by_worker_count[24]       705.4190 (1.0)        700.6283 (1.0)        709.1435 (1.0)       3.6526 (1.25)          5
test_clear_math_by_worker_count[50]       709.9372 (1.01)       701.5107 (1.00)       728.2789 (1.03)     10.5836 (3.64)          5
test_clear_math_by_worker_count[75]       706.0603 (1.00)       701.7364 (1.00)       709.5937 (1.00)      2.9115 (1.0)           5
test_clear_math_by_worker_count[12]       710.7516 (1.01)       707.2474 (1.01)       717.0115 (1.01)      4.2373 (1.46)          5
test_clear_math_by_worker_count[4]        715.9432 (1.01)       711.6714 (1.02)       724.5065 (1.02)      5.6573 (1.94)          5
test_clear_math_by_worker_count[1]      1,669.4381 (2.37)     1,656.6375 (2.36)     1,682.2486 (2.37)     11.4911 (3.95)          5
-----------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------- benchmark 'by_worker_sleep': 6 tests ----------------------------------------------
Name (time in ms)                              Mean                   Min                   Max            StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------------
test_clear_sleep_by_worker_count[75]        36.2413 (1.0)         35.3554 (1.0)         37.1886 (1.0)      0.8152 (2.10)          5
test_clear_sleep_by_worker_count[50]        41.5776 (1.15)        40.9844 (1.16)        41.9939 (1.13)     0.3878 (1.0)           6
test_clear_sleep_by_worker_count[24]        59.4188 (1.64)        56.7509 (1.61)        60.3596 (1.62)     1.0737 (2.77)          9
test_clear_sleep_by_worker_count[12]       102.2730 (2.82)        98.4321 (2.78)       108.8095 (2.93)     2.9042 (7.49)          8
test_clear_sleep_by_worker_count[4]        272.9203 (7.53)       270.7186 (7.66)       278.1032 (7.48)     3.0140 (7.77)          5
test_clear_sleep_by_worker_count[1]      1,070.3013 (29.53)    1,064.9014 (30.12)    1,077.3180 (28.97)    4.8643 (12.54)         5
-----------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

@AlanCoding (Member Author)

I added test_clear_no_op_by_task_number, but now I'm remembering why I didn't do this before. It gets about 7k tasks per second, but that is really just measuring the speed of the test itself. I see no legitimate way to make it faster: the tests are synchronous, so the test process will obviously become the bottleneck before the actual dispatcher service does. Presumably you could swarm postgres with a bunch of clients producing messages to get a more accurate count, but I have no motivation to do that.
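To illustrate why a synchronous test floor-limits the measurement, here is a minimal sketch (names and structure are hypothetical, not the PR's actual test): timing a tight loop of no-op submissions measures the client loop, not the service.

```python
import time

def measure_submission_rate(submit, count=1000):
    """Time `count` synchronous calls to `submit` and return calls/second.

    Because the loop is synchronous, this measures the client side only:
    the test process becomes the bottleneck long before the dispatcher
    service itself does. Hypothetical sketch, not the PR's actual code.
    """
    start = time.perf_counter()
    for _ in range(count):
        submit()
    elapsed = time.perf_counter() - start
    return count / elapsed

# With a no-op callable the rate reflects pure loop overhead, which is
# why such a test tops out at the speed of the test harness itself.
rate = measure_submission_rate(lambda: None)
```

A parallel producer swarm against postgres, as mentioned above, would be the way around this ceiling, at the cost of much more test machinery.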

@AlanCoding (Member Author)

Part of the reason I would like to get this in is that #9 will probably share a version of the subprocess fixture that stands up an on-demand dispatcher service for a test. That would make a ton of sense for the dab_task app. But I hesitate to look into that as long as this stays open.
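The kind of shared fixture described above could be sketched roughly as follows. This is a hedged illustration only: the default command is a placeholder process, not the project's actual entrypoint, and the real PR presumably wraps something like this in a pytest fixture.

```python
import contextlib
import subprocess
import sys

@contextlib.contextmanager
def dispatcher_service(argv=None):
    """Start a service subprocess for the duration of a test.

    Sketch under assumptions: the command below is a stand-in long-running
    process, not the dispatcher's real CLI. A pytest version would use
    @pytest.fixture with the same start/terminate shape.
    """
    cmd = argv or [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```

Sharing one such fixture between the benchmark tests here and the integration tests in #9 avoids duplicating the startup/teardown logic.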

@Alex-Izquierdo (Collaborator) left a comment


LGTM, but I would suggest not giving priority to benchmarks at this point.

@AlanCoding (Member Author)

Here is a newly added test:

------------------------------------------------ benchmark 'control': 7 tests -----------------------------------------------
Name (time in ms)                            Mean                 Min                 Max            StdDev            Rounds
-----------------------------------------------------------------------------------------------------------------------------
test_alive_benchmark                      14.6062 (1.0)       14.1533 (1.0)       15.1241 (1.0)      0.2238 (1.0)          33
test_alive_benchmark_while_busy[0]        21.1566 (1.45)      20.4147 (1.44)      22.2239 (1.47)     0.4106 (1.83)         42
test_alive_benchmark_while_busy[3]        22.6903 (1.55)      21.8802 (1.55)      23.8478 (1.58)     0.4461 (1.99)         40
test_alive_benchmark_while_busy[4]        25.8459 (1.77)      25.1231 (1.78)      28.4265 (1.88)     0.7736 (3.46)         37
test_alive_benchmark_while_busy[5]        26.5553 (1.82)      25.6479 (1.81)      28.5576 (1.89)     0.5955 (2.66)         36
test_alive_benchmark_while_busy[10]       38.0772 (2.61)      37.0583 (2.62)      44.3333 (2.93)     1.4217 (6.35)         25
test_alive_benchmark_while_busy[100]     284.7240 (19.49)    282.8920 (19.99)    286.3878 (18.94)    1.5827 (7.07)          5
-----------------------------------------------------------------------------------------------------------------------------

I had some flaky failures, but realized this is because this code pre-dates other changes and had not yet adopted the ready_event, which ensures the dispatcher is listening before we send messages. Blocking on that event before starting the benchmark seems to work extremely well.
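The gating pattern described above can be sketched like this (names such as `run_benchmark_round` and the exact event API are illustrative assumptions, not the project's real interface):

```python
import asyncio

async def run_benchmark_round(ready_event, send_messages):
    """Block on the service's ready event before sending benchmark traffic.

    `ready_event` stands in for the event the dispatcher sets once its
    listener is up; without this gate, early messages race the connection
    setup and the benchmark flakes. Hypothetical sketch, not actual code.
    """
    await asyncio.wait_for(ready_event.wait(), timeout=5)
    await send_messages()
```

Waiting with a timeout also turns a dispatcher that never comes up into a clean test failure instead of a hang.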

With the GitHub results, this shows a very satisfying increase with busyness, even with the re-connection issues polluting the data.

@AlanCoding mentioned this pull request Feb 18, 2025
@AlanCoding marked this pull request as draft February 24, 2025 04:18
Adopt new error handling patterns done elsewhere

Properly parameterize the worker number

Move event trigger to drain_queue method

Fix changed event meanings

Add artifacting of benchmark data

Add benchmark test for control task

Add some control message benchmarks

Combine with existing test methods module

Update unit test

Update to new config problems

Avoid retyping no longer necessary

Do some modernization

combine test_pool files

Update test to new pattern

Use better start_working call
@AlanCoding (Member Author)

Passing again (1m runtime), but locally I am still seeing flaky failures such as:

FAILED tests/benchmark/test_control.py::test_alive_benchmark_while_busy[3] - AssertionError: assert [] == [{'node_id': 'benchmark-server'}]
  
  Right contains one more item: {'node_id': 'benchmark-server'}
  
  Full diff:
  + []
  - [
  -     {
  -         'node_id': 'benchmark-server',
  -     },
  - ]

Also, changes in the meantime have altered the connection handling, which greatly changes the results for control messages.

----------------------------------------------- benchmark 'control': 7 tests ----------------------------------------------
Name (time in ms)                           Mean                Min                Max             StdDev            Rounds
---------------------------------------------------------------------------------------------------------------------------
test_alive_benchmark_while_busy[0]        1.9886 (1.0)       1.0918 (1.0)       7.9138 (1.31)      1.3613 (1.0)         106
test_alive_benchmark                      2.5241 (1.27)      1.2705 (1.16)      6.0438 (1.0)       1.4140 (1.04)         32
test_alive_benchmark_while_busy[3]        3.2259 (1.62)      1.8900 (1.73)     15.8701 (2.63)      1.8881 (1.39)        190
test_alive_benchmark_while_busy[4]        3.6245 (1.82)      2.2871 (2.09)     12.9793 (2.15)      1.7260 (1.27)        117
test_alive_benchmark_while_busy[5]        4.1748 (2.10)      2.6307 (2.41)      9.6280 (1.59)      1.3930 (1.02)        112
test_alive_benchmark_while_busy[10]       6.3618 (3.20)      4.0869 (3.74)     14.8314 (2.45)      2.1895 (1.61)         74
test_alive_benchmark_while_busy[100]     38.2911 (19.26)    31.3676 (28.73)    60.7578 (10.05)    10.2007 (7.49)         17
---------------------------------------------------------------------------------------------------------------------------

This is fairly consistently about 10x faster. It still shows the climb with busyness. There is also a major outlier in the max values, which could be some other unexpected connection opening or something.

Successfully merging this pull request may close these issues:
Add test for responsiveness during scaleup burst