Improve MLflow artifact upload diagnostics and error handling #431

Merged

gphuang merged 1 commit into feat/6-mlflow-logging from copilot/sub-pr-378 on Dec 17, 2025

Copilot AI commented Dec 17, 2025

Addresses diagnostic and error-handling gaps in the MLflow artifact upload implementation, based on review feedback.

Changes

  • Consistent failure tracking: reset the consecutive_failures counter after each successful trace file upload, matching the existing behavior for log file uploads
  • Better diagnostics in upload_mlflow_artifacts: handle the MLflow-not-requested case separately from the wrong-rank case, so that an upload that would otherwise be skipped silently produces a clear message
  • Additional warning: emit a warning when MLflow is requested but the writer is not initialized, instead of failing silently
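The counter behavior in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: `upload_with_failure_tracking`, `max_consecutive_failures`, and the `upload` callable are hypothetical names; only the idea of resetting `consecutive_failures` after every successful upload comes from the description above.

```python
from typing import Callable, Iterable, List


def upload_with_failure_tracking(
    paths: Iterable[str],
    upload: Callable[[str], None],
    max_consecutive_failures: int = 3,
) -> List[str]:
    """Upload each path, giving up after too many failures in a row.

    Hypothetical sketch: `upload` stands in for whatever call actually
    sends a file to MLflow and is expected to raise on failure.
    """
    uploaded: List[str] = []
    consecutive_failures = 0
    for path in paths:
        try:
            upload(path)
        except Exception as exc:  # any upload error counts as one failure
            consecutive_failures += 1
            print(f"WARNING: failed to upload {path}: {exc}")
            if consecutive_failures >= max_consecutive_failures:
                print("WARNING: too many consecutive failures, stopping uploads")
                break
        else:
            uploaded.append(path)
            # Reset on success so only an unbroken run of failures stops the loop;
            # the same rule would apply to trace files and log files alike.
            consecutive_failures = 0
    return uploaded
```

With the reset in place, an isolated failure between successful uploads never reaches the limit; without it, scattered failures across a long list of trace files would accumulate and stop uploads early.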

Example

# Before: Silent failure if MLflow writer is None for any reason
if mlflow_writer is None:
    return None

# After: Distinguish between expected and unexpected None cases
if mlflow_run_name is None:
    return None  # MLflow not requested - expected

if args.rank != expected_rank:
    debug_rank_all("WARNING: called from wrong rank...")  # Wrong rank - expected
    return None

if mlflow_writer is None:
    debug_rank_0("WARNING: writer not initialized...")  # Unexpected - helps debugging
    return None  # still skip the upload, but now with a visible warning
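To make the three cases in the After snippet testable in isolation, the guards can be factored into a small function that reports why an upload was skipped. This is a hedged sketch: `check_upload_preconditions`, its parameters, and the use of Python's standard `logging` module in place of `debug_rank_0` / `debug_rank_all` are assumptions for illustration; the real `upload_mlflow_artifacts` signature and which rank owns the writer are not stated in this PR.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("mlflow_upload_sketch")


def check_upload_preconditions(
    mlflow_run_name: Optional[str],
    rank: int,
    expected_rank: int,
    mlflow_writer: Optional[Any],
) -> Optional[str]:
    """Return None if the upload may proceed, else a short skip reason.

    Hypothetical helper that mirrors the order of checks in the snippet above.
    """
    if mlflow_run_name is None:
        # MLflow was never requested: expected, so skip quietly.
        return "mlflow-not-requested"
    if rank != expected_rank:
        # Called from a rank that does not own the writer: expected, but noted.
        logger.warning("called from rank %d, expected rank %d", rank, expected_rank)
        return "wrong-rank"
    if mlflow_writer is None:
        # MLflow requested on the right rank, yet no writer exists: unexpected.
        logger.warning("MLflow requested but writer not initialized")
        return "writer-not-initialized"
    return None
```

Returning a reason string rather than a bare None lets a caller or a unit test tell the three skip paths apart, which is the distinction this change is after.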

Copilot AI mentioned this pull request Dec 17, 2025
Copilot AI changed the title from "[WIP] Add automatic upload of PyTorch logs to MLflow" to "Improve MLflow artifact upload diagnostics and error handling" Dec 17, 2025
Copilot AI requested a review from gphuang December 17, 2025 15:02
gphuang marked this pull request as ready for review December 17, 2025 15:32
Copilot AI review requested due to automatic review settings December 17, 2025 15:32
gphuang merged commit ea6d0a9 into feat/6-mlflow-logging on Dec 17, 2025
1 check passed
Copilot AI (Contributor) left a comment

Copilot wasn't able to review any files in this pull request.


gphuang deleted the copilot/sub-pr-378 branch December 17, 2025 15:32

3 participants