Feat/ddp mlflow #655

Open
wants to merge 5 commits into
base: main

KeitaW
Contributor

@KeitaW KeitaW commented Apr 28, 2025

Issue #, if available:

Description of changes:

This PR makes the DDP test case compatible with both CPU and GPU, and adds MLflow support (local tracking by default, with an optional remote tracking server).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
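
One common way to make a DDP script run on both CPU and GPU is to pick the process-group backend and device from CUDA availability. The sketch below is not taken from this PR's diff: `select_backend_and_device` is a hypothetical helper, and it assumes the usual PyTorch pairing of `nccl` with GPUs and `gloo` with CPUs.

```python
from typing import Tuple


def select_backend_and_device(cuda_available: bool, local_rank: int = 0) -> Tuple[str, str]:
    """Return (backend, device) for torch.distributed (hypothetical helper)."""
    if cuda_available:
        # One GPU per process, indexed by the local rank.
        return "nccl", f"cuda:{local_rank}"
    # CPU-only hosts fall back to the gloo backend.
    return "gloo", "cpu"


# Intended use inside the training script (requires PyTorch, launched via torchrun):
#   import os, torch
#   import torch.distributed as dist
#   backend, device = select_backend_and_device(
#       torch.cuda.is_available(), int(os.environ.get("LOCAL_RANK", 0)))
#   dist.init_process_group(backend=backend)
```

Keeping the decision in a pure function means it can be checked without a GPU or a PyTorch install.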

@KeitaW KeitaW requested review from allela-roy and nghtm April 28, 2025 19:36
@KeitaW KeitaW self-assigned this Apr 28, 2025
self.epochs_run = 0
self.snapshot_path = snapshot_path
self.use_mlflow = use_mlflow
self.tracking_uri = tracking_uri if tracking_uri else f"file://{os.environ['HOME']}/mlruns"
Contributor Author

The test case is already compatible with both local and remote tracking servers.
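
The fallback in the snippet above can be sketched as a small standalone function. This is an illustration, not the PR's code: `resolve_tracking_uri` is a hypothetical name, and it uses `Path.home()` instead of `os.environ['HOME']` on the assumption that the script may run where `HOME` is unset.

```python
from pathlib import Path
from typing import Optional


def resolve_tracking_uri(tracking_uri: Optional[str] = None) -> str:
    """Return the given URI, or a local file store under the home directory."""
    if tracking_uri:
        # A remote server URL (or any other explicit URI) is passed through unchanged.
        return tracking_uri
    # Path.home() does not raise KeyError when HOME is missing from the environment.
    return (Path.home() / "mlruns").as_uri()
```

The returned string would then be handed to `mlflow.set_tracking_uri(...)` before any run is started.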

@allela-roy
Contributor

@KeitaW, to showcase MLflow, shouldn't we use SageMaker Managed MLflow, as described here: https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html

@allela-roy
Contributor

Working on compatibility with Managed MLflow.

@KeitaW
Contributor Author

KeitaW commented Apr 29, 2025

@KeitaW , to showcase MLflow, shouldn't we use the SageMaker Managed MLflow as described here https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html

Thanks for testing.
It should work with Managed MLflow too. Setup instructions should live in the architecture subdirectory, not in the test case (we may add a link to the guidance).

@allela-roy
Copy link
Contributor

OK, then let's add pip install mlflow==2.13.2 sagemaker-mlflow==0.1.0 to 0.create-venv.sh as well.
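
As a rough sketch of how the pins above could be verified after the virtual environment is created, the following uses only the standard library. `find_mismatches` and `REQUIRED` are hypothetical names, and nothing here reflects the actual contents of 0.create-venv.sh.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Pins taken from the comment above.
REQUIRED = {"mlflow": "2.13.2", "sagemaker-mlflow": "0.1.0"}


def find_mismatches(required: Dict[str, str]) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches = {}
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {REQUIRED[name]}, found {found}")
```

Running the script inside the environment prints nothing when both packages match their pins.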

@KeitaW KeitaW added the enhancement New feature or request label May 1, 2025
2 participants