Skip to content

fix(v2): preserve terminal phase on TaskAction delete events#7293

Merged
pingsutw merged 1 commit intov2from
fix/v2-aborted-after-terminal
Apr 27, 2026
Merged

fix(v2): preserve terminal phase on TaskAction delete events#7293
pingsutw merged 1 commit intov2from
fix/v2-aborted-after-terminal

Conversation

@pingsutw
Copy link
Copy Markdown
Member

@pingsutw pingsutw commented Apr 27, 2026

Why are the changes needed?

Successful (and failed) runs were flipping to Aborted in the UI some time after completion, even without restarting devbox — most reproducibly after the laptop slept and woke.

  • DeleteFunc fires for both real deletes and DeletedFinalStateUnknown tombstones (delivered when the informer missed events during a watch gap, e.g. sleep/resume).
  • buildActionUpdate then unconditionally overwrote phase with ABORTED for any delete event, even for TaskActions that had already reached Succeeded/Failed.
  • notifyRunService pushed that ABORTED phase to the run service, overwriting the recorded terminal status in the DB → UI shows Aborted.

The pre-existing labelTerminalStatusRecorded guard (shouldSkipTaskAction) was only consulted by AddFunc/UpdateFunc, never by DeleteFunc.

What changes were proposed in this pull request?

Two guards in actions/k8s/client.go:

  1. DeleteFunc now calls shouldSkipTaskAction after unwrapping the tombstone, so deletes for actions already labeled terminal-status-recorded are dropped before dispatch.
  2. buildActionUpdate only forces ABORTED when the current phase is not terminal. This preserves Succeeded/Failed/TimedOut for legitimate post-terminal CR cleanup as well.

How was this patch tested?

go test ./actions/k8s/... -count=1 — passes.

New unit tests in actions/k8s/client_test.go:

  • TestBuildActionUpdate_DeleteAfterTerminalPreservesPhase — Succeeded and Failed cases (the bug scenario).
  • TestBuildActionUpdate_DeleteOnNonTerminalForcesAborted — confirms in-flight deletes still produce ABORTED.

Labels

  • fixed

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

When the controller-runtime informer (added in #7272) delivers a delete
event — including DeletedFinalStateUnknown tombstones after a watch gap
(e.g. laptop sleep/resume) — buildActionUpdate unconditionally forced
the phase to ABORTED. notifyRunService then overwrote the previously
recorded Succeeded/Failed status in the DB, so completed runs flipped
to Aborted in the UI.

Two guards:

- DeleteFunc now calls shouldSkipTaskAction so tombstones for actions
  already labeled terminal-status-recorded are dropped before dispatch.
- buildActionUpdate only forces ABORTED when the current phase is not
  terminal, preserving Succeeded/Failed/TimedOut on legitimate deletes
  too (e.g. CR GC after the run finished).

Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw pingsutw self-assigned this Apr 27, 2026
@pingsutw pingsutw added this to the V2 GA milestone Apr 27, 2026
@pingsutw pingsutw merged commit 09a16b9 into v2 Apr 27, 2026
20 checks passed
@pingsutw pingsutw deleted the fix/v2-aborted-after-terminal branch April 27, 2026 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants