Skip to content

Releases: AI-Hypercomputer/ml-goodput-measurement

Release: ml-goodput-measurement v0.0.15

11 Sep 22:15
Compare
Choose a tag to compare

Changes:

  • Refactor the GoodputMonitor to use multiprocessing instead of multithreading.
  • Add more context with structured logging for metric uploads to GCM Monitoring.

Full Changelog: v0.0.14...v0.0.15

Release ml-goodput-measurement v0.0.13

25 Jul 16:53
Compare
Choose a tag to compare

PyPi Package can be installed from here

Changelog:

[0.0.13] - 2025-07-25

  • Fix occasional gaps in cumulative Goodput Monitor dashboard.

[0.0.12] - 2025-06-25

  • Support monitoring disruption badput due to infrastructure recovery.
  • Support monitoring ideal step time.
  • Code clean up and bug fixes.

[0.0.11] - 2025-06-06

  • Support for monitoring performance degradations.
  • Support for monitoring rolling window Goodput.
  • Force upload of final metrics and safe exit.

[0.0.10] - 2025-04-28

  • Support for custom badput events which are synchronous and training-overlapped.
  • Handling of edge case caching scenario.

[0.0.9] - SKIPPED

  • Used for external testing. Please upgrade to 0.0.10.

[0.0.8] - 2025-04-03

  • Fix computation of ideal step time when step_times is empty.

[0.0.7] - 2025-03-24

  • Cache updates to Other/Unknown Badput.
  • Exclude monitoring asynchronous Badput types in GCM.
  • Total and last step updates with hidden events.
  • Interval Query Monitoring in GCM.

[0.0.6] - 2025-03-17

  • Updates to data loading Badput buckets (Separated into Async & Sync).
  • Short term fix to Pathways SuspendResume anomalous step time detection.
  • Updates to account for Pathways Elastic Training.
  • Automatic asynchronous upload of goodput, badput and step time deviation metrics to GCM.

Release ml-goodput-measurement v0.0.5

04 Feb 02:18
Compare
Choose a tag to compare

[0.0.5] - 2025-02-03

  • Goodput Cache and library improvements.
  • Query and Monitor API support for checkpoint save and restore.
  • Interval Query API support.
  • Query and Monitor API support for step time deviation.

PyPi Package can be installed from here

Initial release of ML Goodput Measurement Library

01 Mar 01:06
Compare
Choose a tag to compare
  • Bug Fixes
    • Fixes a typing mismatch in total step time calculation.
  • Code cleanup

Full Changelog: v0.0.1...v0.0.2

Initial release of ML Goodput Measurement PyPi package

27 Feb 00:50
Compare
Choose a tag to compare
  • Feature: Contains the Goodput module which allows logging and retrieval of training job's overall productive Goodput