Skip to content

Add metrics_log_dir option to LoggingConfig#992

Merged
mcgibbon merged 6 commits intomainfrom
feature/dump_metrics
Mar 20, 2026
Merged

Add metrics_log_dir option to LoggingConfig#992
mcgibbon merged 6 commits intomainfrom
feature/dump_metrics

Conversation

@mcgibbon
Copy link
Contributor

@mcgibbon mcgibbon commented Mar 19, 2026

This PR adds the ability to log wandb metrics to a local directory. Currently a jsonl file of scalar metrics is written, but this may be updated to include other metric types in later PRs.

Changes:

  • Added DiskMetricLogger for logging scalar metrics to disk

  • Wired DiskMetricLogger into WandB, MockWandB, and LoggingConfig

  • Added metrics_log_dir option to LoggingConfig which, if given, will have wandb metrics logged to it.

  • Require step when logging to WandB to facilitate disk logging and avoid missing logs when step is not passed.

  • Tests added

Resolves # (delete if none)

mcgibbon and others added 3 commits March 19, 2026 16:35
Adds a JSONL-based metric logger that writes scalar metrics to a
`metrics.jsonl` file within a specified directory. The logger is
resume-safe: on construction it reads any existing file to determine a
high-water mark, and silently skips steps at or below that mark. This
will be wired into the WandB singleton in a follow-up commit.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a `metrics_log_dir` parameter to `WandB.configure()` that enables
writing scalar metrics to disk as JSONL, independent of whether wandb
is enabled. This allows collaborators without wandb access to still
capture training metrics. The same parameter is added to `MockWandB`
and to `LoggingConfig` so it can be set via YAML config.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
This ensures disk metric logging is never silently skipped due to a
missing step. The single call site that omitted step (benchmark/run.py)
now passes step=0.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@mcgibbon
Copy link
Contributor Author

Note the metrics logging does not require log_to_wandb=True.

@jpdunc23 jpdunc23 self-requested a review March 20, 2026 17:10
Copy link
Member

@jpdunc23 jpdunc23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any major issues that should block merge, so approved! Left a few suggestions that I think are worthwhile, but up to you.

Comment on lines +54 to +55
if self._high_water_mark is not None and step <= self._high_water_mark:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WandB logging to a past step issues a warning. I think it would be good to do the same here for visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added a logging.warning with the step and high-water mark values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent review suggested the following to avoid leaking disk logger file handles:

# fme/core/testing/wandb.py, inside mock_wandb()
try:
    yield wandb.singleton
finally:
    if hasattr(wandb.singleton, '_disk_logger') and wandb.singleton._disk_logger is not None:
        wandb.singleton._disk_logger.close()
    wandb.singleton = original

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added cleanup in the finally block of mock_wandb.

"""Return only JSON-serializable scalar entries from *data*."""
result: dict[str, Any] = {}
for key, value in data.items():
if isinstance(value, int | float | bool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From agent review:

In _extract_scalars, isinstance(value, float) matches NaN and Inf, which bypass the json.dumps validation path. Python's json.dumps produces non-standard NaN / Infinity tokens (allowed by default via allow_nan=True), making the JSONL unreadable by strict JSON parsers. ML training metrics can legitimately be NaN (e.g., loss explosion).

Apparently this may lead to parsing errors for certain versions of jq and possibly for some pandas.read_json usages. I don't think we should change the behavior here, but maybe a brief note in the docs for DiskMetricLogger would be helpful to say that these values are possible or maybe just encourage use of the read_metrics util.

I guess NaN / Inf values are also not included in tests, might be worthwhile to add them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for NaN/Inf round-tripping through read_metrics, with a docstring noting the strict-parser caveat.

result[key] = value
else:
try:
json.dumps(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From agent review:

The fallback json.dumps(value) path accepts lists, dicts, None, etc. -- not just scalars. Consider renaming to _extract_serializable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, renamed to _extract_serializable.

Non-JSON-serializable values (e.g. images, tensors) are silently dropped.
"""

def __init__(self, directory: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From agent review:

Accepting str | os.PathLike would be a minor ergonomic improvement.

return str(tmp_path / "metrics")


class TestDiskMetricLogger:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Remove the class def here.

I see agents using these test class defs a lot, which I think it can be useful to group related tests (as in test_wandb.py in this PR) or to simplify test naming since it allows for reuse of the same test_ method name in distinct classes. But in this case I think it just adds unnecessary nesting.

mcgibbon and others added 2 commits March 20, 2026 20:02
- Add warning when logging to a step at or below the high-water mark
- Rename _extract_scalars to _extract_serializable (accepts non-scalars)
- Accept str | os.PathLike for directory parameters
- Close disk logger file handle in mock_wandb finally block
- Flatten test class to module-level functions
- Add test for NaN/Inf value round-tripping

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@mcgibbon mcgibbon enabled auto-merge (squash) March 20, 2026 20:05
@mcgibbon mcgibbon merged commit 4819add into main Mar 20, 2026
7 checks passed
@mcgibbon mcgibbon deleted the feature/dump_metrics branch March 20, 2026 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants