Commit 7c7cb95

dipannita08 authored and copybara-github committed
BEGIN_PUBLIC Refactor GoodputMonitor to use multiprocessing instead of multithreading. END_PUBLIC
This change moves the background goodput, step deviation, and rolling window goodput upload tasks into separate processes instead of separate threads. This improves isolation and prevents potential issues with shared resources and GIL limitations. Configuration is now passed to the worker functions, and each worker initializes its own clients. The change also improves debug logging and makes the management of child processes and resources resilient to failures.

Further details in: https://b.corp.google.com/issues/439660448#comment29

Tested:
- Unit tests
- E2E AXLearn workload with simulated restart segments on v4 with test package https://test.pypi.org/project/ml-goodput-measurement/0.1.44/
- https://cloudlogging.app.goo.gl/6wvrmByBYWsbrBui7
- https://screenshot.googleplex.com/BMSgtJWFrnFaFZV
- https://screenshot.googleplex.com/Afhrrvps3mkDGe7
- Customer validation (https://cloudlogging.app.goo.gl/7mPJJfhQi7d64fX7A)

PiperOrigin-RevId: 805092698
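The commit message describes a pattern without showing it: each background task runs in its own process, receives plain configuration instead of shared clients, builds its own client, and is shut down in a way that tolerates a stuck child. Below is a minimal, hypothetical sketch of that general pattern using Python's standard `multiprocessing` module. It is not the GoodputMonitor code, and `WorkerConfig`, `FakeMetricsClient`, `upload_worker`, `start_worker`, and `stop_worker` are illustrative names. It assumes the `fork` start method so it runs as a plain script on Linux or macOS; with `spawn`, the driver code would need an `if __name__ == "__main__":` guard, and everything passed to the worker must be picklable (a frozen dataclass of plain fields is).

```python
import logging
import multiprocessing as mp
import time
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(processName)s: %(message)s")


@dataclass(frozen=True)
class WorkerConfig:
    """Hypothetical plain-data config handed to a worker instead of live clients."""

    job_name: str
    interval_s: float


class FakeMetricsClient:
    """Hypothetical stand-in for a client each worker builds for itself."""

    def __init__(self, job_name: str) -> None:
        self.job_name = job_name
        self.calls = 0

    def upload(self, value: int) -> None:
        self.calls += 1
        if self.calls == 2:  # Simulate one transient failure.
            raise ConnectionError("transient upload failure")
        logging.info("[%s] uploaded %d", self.job_name, value)


def upload_worker(config, stop_event, upload_count) -> None:
    """Runs in the child process and creates its own client from the config."""
    client = FakeMetricsClient(config.job_name)
    step = 0
    while not stop_event.is_set():
        step += 1
        try:
            client.upload(step)
            with upload_count.get_lock():
                upload_count.value += 1
        except Exception:  # A failed upload is logged; the loop keeps running.
            logging.exception("upload %d failed; retrying next cycle", step)
        stop_event.wait(config.interval_s)  # Sleeps, but wakes early on stop.


def start_worker(ctx, config):
    """Starts one background upload process and returns its handles."""
    stop_event = ctx.Event()
    upload_count = ctx.Value("i", 0)  # Shared counter, readable by the parent.
    proc = ctx.Process(
        target=upload_worker,
        args=(config, stop_event, upload_count),
        name=f"upload-{config.job_name}",
        daemon=True,
    )
    proc.start()
    return proc, stop_event, upload_count


def stop_worker(proc, stop_event, timeout_s: float = 5.0) -> None:
    """Asks the worker to exit cleanly, terminating it only if it hangs."""
    stop_event.set()
    proc.join(timeout_s)
    if proc.is_alive():
        logging.warning("%s did not exit in %.1fs; terminating", proc.name, timeout_s)
        proc.terminate()
        proc.join()


# 'fork' (POSIX only) lets this run as a plain top-level script.
ctx = mp.get_context("fork")
proc, stop_event, upload_count = start_worker(ctx, WorkerConfig("goodput", 0.05))

# Wait until two uploads succeed, i.e. the worker survived the simulated failure.
deadline = time.monotonic() + 10.0
while upload_count.value < 2 and time.monotonic() < deadline:
    time.sleep(0.05)

stop_worker(proc, stop_event)
print(f"uploads={upload_count.value} exitcode={proc.exitcode}")
```

Because `stop_event.wait(interval)` doubles as the sleep between uploads, `stop_worker` can end the loop promptly; `terminate()` is only a fallback for a worker that ignores the stop signal.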
1 parent d643a2d commit 7c7cb95

2 files changed (+950, -709 lines)