Feat/ddp mlflow #655

KeitaW · 2025-04-28T19:36:16Z

Issue #, if available:

Description of changes:

This PR makes the DDP test case compatible with both CPU and GPU and adds MLflow support (local tracking by default, with an optional remote tracking server).
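One common way to make a DDP script run on both CPU and GPU is to pick the process-group backend and device from CUDA availability. The sketch below is an illustration only, not the code in this PR; the helper name `select_backend_and_device` and its parameters are hypothetical.

```python
from typing import Tuple


def select_backend_and_device(cuda_available: bool, local_rank: int = 0) -> Tuple[str, str]:
    """Return a (backend, device) pair for one DDP process.

    Hypothetical helper: NCCL requires CUDA devices, so CPU-only runs
    fall back to the Gloo backend and the "cpu" device.
    """
    if cuda_available:
        # One GPU per process, indexed by the process's local rank.
        return "nccl", f"cuda:{local_rank}"
    return "gloo", "cpu"


if __name__ == "__main__":
    print(select_backend_and_device(cuda_available=False))
    print(select_backend_and_device(cuda_available=True, local_rank=1))
```

In a training script, `cuda_available` would typically come from `torch.cuda.is_available()`, and the returned backend would be passed to `torch.distributed.init_process_group(backend=...)`.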

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

KeitaW · 2025-04-28T21:07:19Z

3.test_cases/pytorch/ddp/ddp.py

+        self.epochs_run = 0
+        self.snapshot_path = snapshot_path
+        self.use_mlflow = use_mlflow
+        self.tracking_uri = tracking_uri if tracking_uri else f"file://{os.environ['HOME']}/mlruns"


The test case is already compatible with local/remote tracking server.
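The fallback in the diff above can be sketched as a small standalone function. This is a minimal illustration under assumptions, not the PR's implementation: the name `resolve_tracking_uri` is hypothetical, and it uses `Path.home()` instead of `os.environ['HOME']` so that an unset `HOME` variable does not raise `KeyError`.

```python
from pathlib import Path
from typing import Optional


def resolve_tracking_uri(tracking_uri: Optional[str] = None) -> str:
    """Return the given tracking URI, or a local file store under the home directory.

    Hypothetical helper mirroring the default in the diff: an empty or
    missing URI falls back to file://<home>/mlruns.
    """
    if tracking_uri:
        return tracking_uri
    # as_uri() produces a file:// URI from an absolute path.
    return (Path.home() / "mlruns").as_uri()


if __name__ == "__main__":
    print(resolve_tracking_uri())
    print(resolve_tracking_uri("http://localhost:5000"))
```

The result would then be handed to `mlflow.set_tracking_uri(...)` before starting a run, so the same code path covers both local and remote tracking.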

allela-roy · 2025-04-29T08:24:52Z

@KeitaW, to showcase MLflow, shouldn't we use SageMaker Managed MLflow, as described here? https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html

allela-roy · 2025-04-29T08:34:01Z

Working on the compatibility with Managed MLflow

KeitaW · 2025-04-29T15:01:59Z

> @KeitaW, to showcase MLflow, shouldn't we use SageMaker Managed MLflow, as described here? https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html

Thanks for testing.
It should work with Managed MLflow too. Setup instructions should go in the architecture subdirectory, not in the test case (we may add a link to the guidance).

allela-roy · 2025-04-30T07:42:28Z

OK, then let's add pip install mlflow==2.13.2 sagemaker-mlflow==0.1.0 to 0.create-venv.sh as well.
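As background for why the plugin is needed: to my understanding, with `sagemaker-mlflow` installed, a Managed MLflow tracking server is addressed by passing its ARN as the tracking URI, while local and self-hosted servers use `file://` or `http(s)://` URIs. The sketch below only classifies a URI by its prefix; the helper name `tracking_backend_kind`, the category labels, and the example ARN are all hypothetical placeholders.

```python
def tracking_backend_kind(tracking_uri: str) -> str:
    """Classify a tracking URI by its prefix (hypothetical helper).

    Assumption: SageMaker Managed MLflow tracking servers are identified
    by an ARN that starts with "arn:aws:sagemaker:".
    """
    if tracking_uri.startswith("arn:aws:sagemaker:"):
        return "sagemaker-managed"
    if tracking_uri.startswith(("http://", "https://")):
        return "remote-server"
    # Anything else (file:// URIs, plain paths) is treated as a local store.
    return "local"


if __name__ == "__main__":
    # Placeholder ARN for illustration only; not a real tracking server.
    example_arn = "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/example"
    for uri in (example_arn, "http://localhost:5000", "file:///tmp/mlruns"):
        print(uri, "->", tracking_backend_kind(uri))
```

A check like this could be used to warn early when an ARN is supplied but the plugin is not installed, rather than failing at the first logging call.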

3.test_cases/pytorch/ddp/slurm/0.create-venv.sh

KeitaW added 2 commits April 28, 2025 19:35

ignore venv

645d6e4

make ddp test case compatible with CPU/GPU and add mlflow support

6e3d8e3

KeitaW requested review from allela-roy and nghtm April 28, 2025 19:36

make the test case compatible with Managed MLFlow

ca72ae1

KeitaW self-assigned this Apr 28, 2025

update

8f0f291

KeitaW added the "enhancement" (New feature or request) label May 1, 2025

KeitaW commented May 3, 2025

3.test_cases/pytorch/ddp/slurm/0.create-venv.sh (outdated, resolved)

Update 3.test_cases/pytorch/ddp/slurm/0.create-venv.sh

f98d519
