Skip to content

fix: populate LogContext from live JobSet pods for clustered tasks#7464

Merged
AdilFayyaz merged 5 commits into
mainfrom
adil/jobsets-log-context
Jun 8, 2026
Merged

fix: populate LogContext from live JobSet pods for clustered tasks#7464
AdilFayyaz merged 5 commits into
mainfrom
adil/jobsets-log-context

Conversation

@AdilFayyaz

Copy link
Copy Markdown
Contributor

Why are the changes needed?

The clustered (JobSet) plugin only emitted templated log URLs synthesized from predicted pod names, which requires a pod-log template to be configured and breaks when real pod names carry the Job-assigned random suffix. The console needs a structured LogContext to fetch logs natively regardless of log-template config.

What changes were proposed in this pull request?

  • Add getLogContext, which lists a JobSet's live child pods (via the jobset.sigs.k8s.io/jobset-name label) and builds a core.LogContext from the real pods using flytek8s.BuildPodLogContext.
  • Identify the primary (rank-0) pod by name prefix and exclude Pending pods that have no logs/container IDs yet; best-effort — returns nil on list error or when no pods are ready, falling back to the existing templated Logs path.
  • Wire LogContext into TaskInfo in GetTaskPhase.
  • Refactor test helper dummyPluginCtx to accept a client.Reader and add emptyK8sReader for tests that don't inspect pods.

How was this patch tested?

  • go test ./flyteplugins/go/tasks/plugins/k8s/clustered/...
  • New TestGetTaskPhase_LogContext verifies a running JobSet with live worker pods produces a LogContext where the rank-0 pod is marked primary (matched despite its random suffix) and Pending pods are excluded.
  • Existing phase tests updated to pass an explicit K8sReader and continue to pass.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Stack

If you do use git town to manage PR Stacks, the stack relevant to this PR
will show below. Otherwise, you can ignore this section.

Docs link

Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@AdilFayyaz AdilFayyaz self-assigned this Jun 1, 2026
Copilot AI review requested due to automatic review settings June 1, 2026 22:08
@github-actions github-actions Bot added the flyte2 label Jun 1, 2026
@AdilFayyaz AdilFayyaz requested a review from pingsutw June 1, 2026 22:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves log retrieval for clustered (JobSet) tasks by populating TaskInfo.LogContext from live JobSet child pods (rather than only emitting templated/predicted pod log URLs). This enables the console/dataproxy to fetch logs natively even when pod name templates are missing or when real pod names include random suffixes.

Changes:

  • Populate pluginsCore.TaskInfo.LogContext in GetTaskPhase using a new getLogContext helper.
  • Implement getLogContext to list JobSet child pods via the jobset.sigs.k8s.io/jobset-name label and build per-pod contexts.
  • Refactor clustered plugin tests to pass an explicit K8sReader and add a new test validating LogContext behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
flyteplugins/go/tasks/plugins/k8s/clustered/phase.go Wires LogContext into TaskInfo returned by GetTaskPhase.
flyteplugins/go/tasks/plugins/k8s/clustered/logs.go Adds getLogContext to build structured log metadata from live JobSet pods.
flyteplugins/go/tasks/plugins/k8s/clustered/clustered_test.go Refactors plugin context helpers for K8sReader and adds/updates tests for LogContext.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/clustered_test.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/clustered_test.go Outdated
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go Outdated
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 19:37

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go Outdated
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go Outdated
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@AdilFayyaz AdilFayyaz requested a review from pingsutw June 2, 2026 20:57
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 4, 2026 21:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/clustered_test.go
Comment thread flyteplugins/go/tasks/plugins/k8s/clustered/logs.go
@AdilFayyaz AdilFayyaz merged commit a3526bd into main Jun 8, 2026
22 checks passed
@AdilFayyaz AdilFayyaz deleted the adil/jobsets-log-context branch June 8, 2026 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants