Add metrics_log_dir option to LoggingConfig#992
Conversation
Adds a JSONL-based metric logger that writes scalar metrics to a `metrics.jsonl` file within a specified directory. The logger is resume-safe: on construction it reads any existing file to determine a high-water mark, and silently skips steps at or below that mark. This will be wired into the WandB singleton in a follow-up commit. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a `metrics_log_dir` parameter to `WandB.configure()` that enables writing scalar metrics to disk as JSONL, independent of whether wandb is enabled. This allows collaborators without wandb access to still capture training metrics. The same parameter is added to `MockWandB` and to `LoggingConfig` so it can be set via YAML config. Co-Authored-By: Claude Opus 4.6 <[email protected]>
This ensures disk metric logging is never silently skipped due to a missing step. The single call site that omitted step (benchmark/run.py) now passes step=0. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Note the metrics logging does not require log_to_wandb=True. |
jpdunc23
left a comment
There was a problem hiding this comment.
I don't see any major issues that should block merge, so approved! Left a few suggestions that I think are worthwhile, but up to you.
| if self._high_water_mark is not None and step <= self._high_water_mark: | ||
| return |
There was a problem hiding this comment.
WandB logging to a past step issues a warning. I think it would be good to do the same here for visibility.
There was a problem hiding this comment.
Done, added a logging.warning with the step and high-water mark values.
There was a problem hiding this comment.
Agent review suggested the following to avoid leaking disk logger file handles:
# fme/core/testing/wandb.py, inside mock_wandb()
try:
yield wandb.singleton
finally:
if hasattr(wandb.singleton, '_disk_logger') and wandb.singleton._disk_logger is not None:
wandb.singleton._disk_logger.close()
wandb.singleton = originalThere was a problem hiding this comment.
Done, added cleanup in the finally block of mock_wandb.
| """Return only JSON-serializable scalar entries from *data*.""" | ||
| result: dict[str, Any] = {} | ||
| for key, value in data.items(): | ||
| if isinstance(value, int | float | bool): |
There was a problem hiding this comment.
From agent review:
In
_extract_scalars,isinstance(value, float)matches NaN and Inf, which bypass thejson.dumpsvalidation path. Python'sjson.dumpsproduces non-standardNaN/Infinitytokens (allowed by default viaallow_nan=True), making the JSONL unreadable by strict JSON parsers. ML training metrics can legitimately be NaN (e.g., loss explosion).
Apparently this may lead to parsing errors for certain versions of jq and possibly for some pandas.read_json usages. I don't think we should change the behavior here, but maybe a brief note in the docs for DiskMetricLogger would be helpful to say that these values are possible or maybe just encourage use of the read_metrics util.
I guess NaN / Inf values are also not included in tests, might be worthwhile to add them.
There was a problem hiding this comment.
Added a test for NaN/Inf round-tripping through read_metrics, with a docstring noting the strict-parser caveat.
| result[key] = value | ||
| else: | ||
| try: | ||
| json.dumps(value) |
There was a problem hiding this comment.
From agent review:
The fallback json.dumps(value) path accepts lists, dicts, None, etc. -- not just scalars. Consider renaming to
_extract_serializable.
There was a problem hiding this comment.
Done, renamed to _extract_serializable.
fme/core/disk_metric_logger.py
Outdated
| Non-JSON-serializable values (e.g. images, tensors) are silently dropped. | ||
| """ | ||
|
|
||
| def __init__(self, directory: str): |
There was a problem hiding this comment.
From agent review:
Accepting
str | os.PathLikewould be a minor ergonomic improvement.
fme/core/test_disk_metric_logger.py
Outdated
| return str(tmp_path / "metrics") | ||
|
|
||
|
|
||
| class TestDiskMetricLogger: |
There was a problem hiding this comment.
Nit: Remove the class def here.
I see agents using these test class defs a lot, which I think it can be useful to group related tests (as in test_wandb.py in this PR) or to simplify test naming since it allows for reuse of the same test_ method name in distinct classes. But in this case I think it just adds unnecessary nesting.
- Add warning when logging to a step at or below the high-water mark - Rename _extract_scalars to _extract_serializable (accepts non-scalars) - Accept str | os.PathLike for directory parameters - Close disk logger file handle in mock_wandb finally block - Flatten test class to module-level functions - Add test for NaN/Inf value round-tripping Co-Authored-By: Claude Opus 4.6 <[email protected]>
This PR adds the ability to log wandb metrics to a local directory. Currently a jsonl file of scalar metrics is written, but this may be updated to include other metric types in later PRs.
Changes:
Added DiskMetricLogger for logging scalar metrics to disk
Wired DiskMetricLogger into WandB, MockWandB, and LoggingConfig
Added
metrics_log_diroption to LoggingConfig which, if given, will have wandb metrics logged to it.Require
stepwhen logging to WandB to facilitate disk logging and avoid missing logs when step is not passed.Tests added
Resolves # (delete if none)