Move chatty logs from INFO to DEBUG #346

Merged
shurkat-nvidia merged 3 commits into NVIDIA:main from shurkat-nvidia:log-verbosity-reduction
Jun 5, 2026

@shurkat-nvidia

Contributor

Moves the following three log messages from INFO to DEBUG level:

  • PersistentAsyncCaller: ..., Starting Async Caller
  • PersistentAsyncCaller: ..., Destroying Async Caller
  • Cleaning up worker data cache with ... entries

The two "PersistentAsyncCaller: ..." messages stay at INFO on rank 0 only and move to DEBUG on all other ranks. "Cleaning up worker data cache" moves to DEBUG on all ranks.

@greptile-apps

greptile-apps Bot commented Jun 4, 2026

Contributor

Greptile Summary

This PR reduces log noise by demoting three lifecycle messages in PersistentAsyncCaller from INFO to DEBUG, with "Starting Async Caller" and "Destroying Async Caller" retaining INFO only for rank 0.

  • _start_worker and close now emit INFO only on rank 0 and DEBUG on all other ranks, keeping a single representative lifecycle signal visible at INFO in multi-rank runs.
  • cleanup_worker_data_cache demotes its cache-cleanup message to DEBUG for all ranks unconditionally, consistent with it being a worker-internal detail rather than a user-facing event.
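
The rank-conditional pattern described in the two bullets above can be sketched roughly as follows. This is a minimal illustration using the standard `logging` module, not the code from `core.py`: the logger name `async_caller_sketch` and the helpers `lifecycle_level`, `log_start`, and `log_cache_cleanup` are hypothetical names introduced here, and the message strings are copied from the PR description.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("async_caller_sketch")


def lifecycle_level(rank: int) -> int:
    """Return INFO for rank 0 and DEBUG for every other rank."""
    return logging.INFO if rank == 0 else logging.DEBUG


def log_start(rank: int) -> None:
    # Lifecycle message: visible at INFO on rank 0 only (assumed call shape).
    logger.log(
        lifecycle_level(rank),
        "PersistentAsyncCaller: %d, Starting Async Caller",
        rank,
    )


def log_cache_cleanup(num_entries: int) -> None:
    # Worker-internal detail: DEBUG on all ranks, no rank check needed.
    logger.debug("Cleaning up worker data cache with %d entries", num_entries)
```

With a handler threshold of INFO, a multi-rank run under this sketch would surface one "Starting Async Caller" line (from rank 0) and no cache-cleanup lines; lowering the threshold to DEBUG brings back the per-rank messages.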

Confidence Score: 5/5

Safe to merge: the change is a targeted log-level adjustment with no functional impact on the checkpointing logic.

All three changes are isolated to logger call sites and do not touch any checkpoint scheduling, worker lifecycle, or data-handling code paths. The rank-conditional pattern applied in both _start_worker and close is consistent and matches the stated intent of the PR.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| src/nvidia_resiliency_ext/checkpointing/async_ckpt/core.py | Log-level adjustments in three locations: Starting/Destroying Async Caller kept at INFO for rank 0 and demoted to DEBUG for other ranks; Cleaning up worker data cache demoted to DEBUG for all ranks. |

Sequence Diagram

sequenceDiagram
    participant R0 as Rank 0
    participant RN as Rank N (N>0)
    participant LOG as Logger

    Note over R0,LOG: _start_worker
    R0->>LOG: INFO "PersistentAsyncCaller: 0, Starting Async Caller"
    RN->>LOG: DEBUG "PersistentAsyncCaller: N, Starting Async Caller"

    Note over R0,LOG: close
    R0->>LOG: INFO "PersistentAsyncCaller: 0, Destroying Async Caller"
    RN->>LOG: DEBUG "PersistentAsyncCaller: N, Destroying Async Caller"

    Note over R0,LOG: cleanup_worker_data_cache (classmethod)
    R0->>LOG: DEBUG "Cleaning up worker data cache with X entries"
    RN->>LOG: DEBUG "Cleaning up worker data cache with X entries"

Reviews (4): Last reviewed commit: "Make destroying messages symmetric with ..."

@hexinw-nvidia added the ci-approved (Approved to run CI) label Jun 4, 2026
ankurv-nvidia previously approved these changes Jun 4, 2026
@shurkat-nvidia force-pushed the log-verbosity-reduction branch from 99e9fd0 to 1e94053 June 4, 2026 21:45
Comment thread src/nvidia_resiliency_ext/checkpointing/async_ckpt/core.py Outdated
@ankurv-nvidia self-requested a review June 4, 2026 22:16
@shurkat-nvidia merged commit 75dc8a5 into NVIDIA:main Jun 5, 2026
6 checks passed
@shurkat-nvidia deleted the log-verbosity-reduction branch June 5, 2026 16:17

3 participants