Add metrics_log_dir option to LoggingConfig by mcgibbon · Pull Request #992 · ai2cm/ace

mcgibbon · 2026-03-19T19:37:43Z

This PR adds the ability to log wandb metrics to a local directory. Currently a jsonl file of scalar metrics is written, but this may be updated to include other metric types in later PRs.

Changes:

- Added `DiskMetricLogger` for logging scalar metrics to disk.
- Wired `DiskMetricLogger` into `WandB`, `MockWandB`, and `LoggingConfig`.
- Added a `metrics_log_dir` option to `LoggingConfig`; if given, wandb metrics are logged to that directory.
- Made `step` required when logging to WandB, to facilitate disk logging and avoid missing logs when `step` is not passed.
- Added tests.

Resolves # (delete if none)

Adds a JSONL-based metric logger that writes scalar metrics to a `metrics.jsonl` file within a specified directory. The logger is resume-safe: on construction it reads any existing file to determine a high-water mark, and silently skips steps at or below that mark. This will be wired into the WandB singleton in a follow-up commit.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
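As a rough illustration of the resume-safe behaviour described in this commit message, here is a minimal sketch. It is not the code from this PR: the class name `SketchMetricLogger`, the `step` key stored in each record, and the record layout are assumptions. Only the `metrics.jsonl` file name and the high-water-mark idea come from the description above.

```python
import json
import os


class SketchMetricLogger:
    """Hypothetical resume-safe JSONL logger (not the PR's DiskMetricLogger)."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self._path = os.path.join(directory, "metrics.jsonl")
        # Read any existing file once to find the highest step already written.
        self._high_water_mark = None
        if os.path.exists(self._path):
            with open(self._path) as f:
                for line in f:
                    if line.strip():
                        step = json.loads(line)["step"]
                        if self._high_water_mark is None or step > self._high_water_mark:
                            self._high_water_mark = step
        self._file = open(self._path, "a")

    def log(self, data, step):
        # Skip steps that an earlier run already wrote.
        # In this sketch the mark is set at construction only.
        if self._high_water_mark is not None and step <= self._high_water_mark:
            return
        self._file.write(json.dumps({"step": step, **data}) + "\n")
        self._file.flush()

    def close(self):
        self._file.close()
```

With this shape, a restarted job that replays an already-logged step appends nothing for it, so the file keeps one record per step across restarts.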

Add a `metrics_log_dir` parameter to `WandB.configure()` that enables writing scalar metrics to disk as JSONL, independent of whether wandb is enabled. This allows collaborators without wandb access to still capture training metrics. The same parameter is added to `MockWandB` and to `LoggingConfig` so it can be set via YAML config.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Making `step` a required argument in `WandB.log()` ensures disk metric logging is never silently skipped due to a missing step. The single call site that omitted `step` (benchmark/run.py) now passes step=0.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
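The two commit messages above describe a logging path that writes to disk whether or not wandb is enabled, and a `step` argument that can no longer be omitted. A minimal sketch of that shape follows. The class `SketchTracker`, its `enabled` flag, and the in-memory `remote_calls` list standing in for wandb are all hypothetical; the real `WandB.configure()` and `WandB.log()` signatures may differ.

```python
import json
import os


class SketchTracker:
    """Hypothetical stand-in for a wandb wrapper; not the fme API."""

    def __init__(self, enabled, metrics_log_dir=None):
        self.enabled = enabled
        self.remote_calls = []  # stands in for calls forwarded to wandb
        self._file = None
        if metrics_log_dir is not None:
            os.makedirs(metrics_log_dir, exist_ok=True)
            path = os.path.join(metrics_log_dir, "metrics.jsonl")
            self._file = open(path, "a")

    def log(self, data, step):
        # `step` has no default, so omitting it raises TypeError at the call site.
        if self.enabled:
            self.remote_calls.append((step, data))
        if self._file is not None:
            # Disk logging does not depend on `enabled`.
            self._file.write(json.dumps({"step": step, **data}) + "\n")
            self._file.flush()

    def close(self):
        if self._file is not None:
            self._file.close()
```

Because `step` is a required parameter, a caller that forgets it fails immediately instead of producing a record that the disk logger cannot place.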

mcgibbon · 2026-03-19T20:07:46Z

Note the metrics logging does not require log_to_wandb=True.

jpdunc23

I don't see any major issues that should block merge, so approved! Left a few suggestions that I think are worthwhile, but up to you.

jpdunc23 · 2026-03-20T18:16:09Z

fme/core/disk_metric_logger.py

+        if self._high_water_mark is not None and step <= self._high_water_mark:
+            return


WandB logging to a past step issues a warning. I think it would be good to do the same here for visibility.

Done, added a logging.warning with the step and high-water mark values.
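For readers following the thread, the revised check might look roughly like the sketch below. The helper name `should_write` and the exact message text are assumptions; the thread only states that a `logging.warning` with the step and high-water mark values was added.

```python
import logging

logger = logging.getLogger(__name__)


def should_write(step, high_water_mark):
    """Return True if `step` is new; warn and return False otherwise."""
    if high_water_mark is not None and step <= high_water_mark:
        logger.warning(
            "Skipping metrics for step %d: at or below high-water mark %d",
            step,
            high_water_mark,
        )
        return False
    return True
```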

jpdunc23 · 2026-03-20T18:21:53Z

fme/core/testing/wandb.py

Agent review suggested the following to avoid leaking disk logger file handles:

    # fme/core/testing/wandb.py, inside mock_wandb()
    try:
        yield wandb.singleton
    finally:
        if hasattr(wandb.singleton, '_disk_logger') and wandb.singleton._disk_logger is not None:
            wandb.singleton._disk_logger.close()
        wandb.singleton = original

Done, added cleanup in the finally block of mock_wandb.

jpdunc23 · 2026-03-20T18:34:06Z

fme/core/disk_metric_logger.py

+    """Return only JSON-serializable scalar entries from *data*."""
+    result: dict[str, Any] = {}
+    for key, value in data.items():
+        if isinstance(value, int | float | bool):


From agent review:

In _extract_scalars, isinstance(value, float) matches NaN and Inf, which bypass the json.dumps validation path. Python's json.dumps produces non-standard NaN / Infinity tokens (allowed by default via allow_nan=True), making the JSONL unreadable by strict JSON parsers. ML training metrics can legitimately be NaN (e.g., loss explosion).

Apparently this may lead to parsing errors for certain versions of jq and possibly for some pandas.read_json usages. I don't think we should change the behavior here, but maybe a brief note in the docs for DiskMetricLogger would be helpful to say that these values are possible or maybe just encourage use of the read_metrics util.

I guess NaN / Inf values are also not included in tests; it might be worthwhile to add them.

Added a test for NaN/Inf round-tripping through read_metrics, with a docstring noting the strict-parser caveat.
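The strict-parser caveat can be reproduced with the standard library alone. The sketch below uses only `json` from the standard library, not the PR's `read_metrics` helper (whose signature is not shown in this thread); the record keys are made up for illustration.

```python
import json
import math

# By default json.dumps emits the bare tokens NaN and Infinity,
# which are not part of the JSON standard.
line = json.dumps({"step": 3, "loss": float("nan"), "grad_norm": float("inf")})

# Python's own parser accepts these tokens, so the values round-trip.
record = json.loads(line)

# A strict encoder refuses to emit them.
try:
    json.dumps({"loss": float("nan")}, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
```

Tools that follow the JSON standard strictly may reject such lines, which is why reading the file back through a Python-based helper is the safer route.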

jpdunc23 · 2026-03-20T18:36:28Z

fme/core/disk_metric_logger.py

+            result[key] = value
+        else:
+            try:
+                json.dumps(value)


From agent review:

The fallback json.dumps(value) path accepts lists, dicts, None, etc. -- not just scalars. Consider renaming to _extract_serializable.

Done, renamed to _extract_serializable.
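To make the point about the fallback path concrete, here is a sketch of a filter with the same overall shape as the diff excerpt. Only part of the real function is visible in this thread, so the `except` branch and the name `extract_serializable_sketch` are assumptions.

```python
import json
from typing import Any


def extract_serializable_sketch(data: dict[str, Any]) -> dict[str, Any]:
    """Keep entries that json.dumps can encode; drop the rest."""
    result: dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, (int, float, bool)):
            result[key] = value
        else:
            try:
                json.dumps(value)
            except (TypeError, ValueError):
                # Not JSON-serializable (e.g. an image or tensor object).
                continue
            result[key] = value
    return result
```

Lists, dicts, strings, and `None` all pass the `json.dumps` probe, which is why a name mentioning "scalars" would be misleading.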

jpdunc23 · 2026-03-20T18:38:13Z

fme/core/disk_metric_logger.py

+    Non-JSON-serializable values (e.g. images, tensors) are silently dropped.
+    """
+
+    def __init__(self, directory: str):


From agent review:

Accepting str | os.PathLike would be a minor ergonomic improvement.

jpdunc23 · 2026-03-20T18:58:48Z

fme/core/test_disk_metric_logger.py

+    return str(tmp_path / "metrics")
+
+
+class TestDiskMetricLogger:


Nit: Remove the class def here.

I see agents using these test class defs a lot. I think they can be useful for grouping related tests (as in test_wandb.py in this PR) or for simplifying test naming, since they allow reuse of the same test_ method name in distinct classes. But in this case I think the class just adds unnecessary nesting.

- Add warning when logging to a step at or below the high-water mark
- Rename _extract_scalars to _extract_serializable (accepts non-scalars)
- Accept str | os.PathLike for directory parameters
- Close disk logger file handle in mock_wandb finally block
- Flatten test class to module-level functions
- Add test for NaN/Inf value round-tripping

Co-Authored-By: Claude Opus 4.6 <[email protected]>

mcgibbon and others added 3 commits March 19, 2026 16:35

Make step a required argument in WandB.log()

0612103


Merge branch 'main' into feature/dump_metrics

15c7827

jpdunc23 self-requested a review March 20, 2026 17:10

jpdunc23 approved these changes Mar 20, 2026


mcgibbon and others added 2 commits March 20, 2026 20:02

Merge branch 'main' into feature/dump_metrics

9890c8f

mcgibbon enabled auto-merge (squash) March 20, 2026 20:05

mcgibbon merged commit 4819add into main Mar 20, 2026
7 checks passed

mcgibbon deleted the feature/dump_metrics branch March 20, 2026 20:21

oliverwm1 mentioned this pull request Mar 23, 2026

Dump wandb metrics to disk #976

Closed

mcgibbon merged 6 commits into main from feature/dump_metrics
