
Add flyte.replay() for re-executing runs with original or updated code #906

Open
kumare3 wants to merge 1 commit into main from flyte-replay

kumare3 commented Apr 5, 2026

Replay lets users re-run a previous execution using the same inputs and RunSpec, optionally swapping in a new TaskTemplate with updated code. This is useful for:

  • Debugging: reproduce a failed run locally with the exact same inputs
  • Iterating: re-run with modified task code against production inputs
  • Testing: verify a fix against the inputs that caused a failure
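To make these semantics concrete, here is a self-contained sketch of the replay idea that uses a plain in-memory store instead of a Flyte backend. Everything in it (`RecordedRun`, `RUNS`, `replay_sketch`, and the two example tasks) is hypothetical. It models only the behavior described above: the recorded inputs are reused verbatim, and the task is either the original one or a substitute.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class RecordedRun:
    """Hypothetical record of a past run: the task that ran and its exact inputs."""
    task: Callable[..., Any]
    inputs: Dict[str, Any] = field(default_factory=dict)


def divide(a: int, b: int) -> float:
    # Original task: fails when b == 0.
    return a / b


def divide_fixed(a: int, b: int) -> float:
    # Updated task: guards against division by zero.
    return a / b if b != 0 else 0.0


# Hypothetical run store keyed by run name; a real backend would persist this.
RUNS: Dict[str, RecordedRun] = {
    "my-run-name": RecordedRun(task=divide, inputs={"a": 10, "b": 0}),
}


def replay_sketch(run_name: str, task: Optional[Callable[..., Any]] = None) -> Any:
    """Re-run a recorded run with its original inputs, optionally swapping the task."""
    run = RUNS[run_name]
    chosen = task if task is not None else run.task
    return chosen(**run.inputs)


if __name__ == "__main__":
    try:
        replay_sketch("my-run-name")  # reproduces the original failure
    except ZeroDivisionError as exc:
        print(f"original task failed: {exc}")
    print(replay_sketch("my-run-name", task=divide_fixed))  # prints 0.0
```

Replaying without a substitute reproduces the failure. Passing the fixed task checks the fix against the exact inputs that caused it.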

Usage examples:

```python
import flyte

# my_task and my_updated_task are assumed to be tasks defined elsewhere in your project.

# Replay remotely with the original task template
flyte.replay("my-run-name")

# Replay a specific action within a run
flyte.replay("my-run-name", action_name="a1")

# Replay with updated code
flyte.replay("my-run-name", task_template=my_updated_task)

# Replay locally (requires a task_template with executable code)
flyte.with_replaycontext(mode="local").replay(
    "my-run-name", task_template=my_task
)

# Replay remotely with an explicit mode
flyte.with_replaycontext(mode="remote").replay("my-run-name")
```

Implementation notes:

  • Follows the same pattern as _Runner/with_runcontext(): _Replayer holds the configuration, with_replaycontext() returns a _Replayer, and the top-level replay() uses the default configuration
  • Local replay always requires a task_template, because without one there is no Python function to execute; this is validated early, before any network calls
  • Extracts the shared local execution logic into run_task_locally() in _run.py, eliminating duplication between _Runner._run_local and _Replayer._replay_local

Signed-off-by: Ketan Umare