
FEATURE: SSE Resumption in execd #583

Open
Pangjiping wants to merge 10 commits into alibaba:main from Pangjiping:feat/sse-resumption


Pangjiping (Collaborator) commented Mar 27, 2026

Summary

Problem

  • POST /command streams output over SSE; if the client disconnects, it loses the tail of the stream and cannot catch up on or cleanly reconnect to the same execution.
  • Each stdout/stderr chunk needs a stable, monotonic event id (eid) so clients can unambiguously request everything after a given after_eid.
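For illustration, a client-side sketch in Go of the contract described above: every chunk carries an eid, and a reconnecting client asks for everything strictly after the last eid it saw. The helper names formatSSE and resumeURL, the execd.local host, the JSON payload, carrying the eid in the standard SSE `id:` field, and passing after_eid as a query parameter are all assumptions for this example, not details taken from the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// formatSSE renders one SSE event (hypothetical helper). Carrying the eid in
// the standard "id:" field is an assumption; execd may put it in the payload.
func formatSSE(eid int64, data string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "id: %d\n", eid)
	// Each payload line becomes its own "data:" line, as the SSE format requires.
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n") // a blank line terminates the event
	return b.String()
}

// resumeURL builds the reconnect URL for a command (hypothetical helper).
// Sending after_eid as a query parameter is an assumption about the spec.
func resumeURL(base, commandID string, lastEID int64) (string, error) {
	u, err := url.Parse(strings.TrimSuffix(base, "/") + "/command/" + url.PathEscape(commandID) + "/resume")
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("after_eid", strconv.FormatInt(lastEID, 10))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	fmt.Print(formatSSE(7, `{"type":"stdout","text":"hello"}`))
	// A client that last saw eid 7 asks for everything strictly after it.
	u, err := resumeURL("http://execd.local", "cmd-123", 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Because eids only grow, the client needs to remember a single number; browsers' EventSource uses the same idea with the `id:` field and the Last-Event-ID header, which is why that field is a natural (but here assumed) carrier.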

What we changed (how it’s solved)

  • pkg/sbuf: a bounded ring-buffer store per stream id, keyed by eid, with strict monotonicity checks; it holds events for replay after a disconnect.
  • command_stream and the SSE path: for RunCommand / RunCode with resumeEnabled, the primary SSE connection registers a hub, and writeSSE both writes to the live connection and appends stdout/stderr frames (eid > 0) to the buffer. When the request ends, cleanup removes the hub and the buffer entry.
  • GET /command/:id/resume: EventsAfter replays the buffered SSE payloads; then, if the command is still running and no primary SSE connection holds the live slot, tryAttachResume attaches the new response writer as the sole live consumer. If the primary stream is still active, the endpoint returns 409.
  • specs/execd-api.yaml: documents GET /command/{id}/resume, the after_eid parameter, and the responses (200 SSE stream, 409 via the shared Conflict response), aligned with the implementation.

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Checklist

  • Linked Issue or clearly described motivation: execd for SSE Resumption Between execd and SDKs #507
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 839e0aa277

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

ninan-nn previously approved these changes Mar 27, 2026
ninan-nn (Collaborator) left a comment


LGTM. I think this should be described as best-effort resume rather than lossless resumption. There are still at least two event-loss windows: concurrent stdout/stderr appends can drop older eids because of the buffer's strict monotonic check, and the replay-then-attach flow can miss events generated between the buffer snapshot and the live handoff. So reconnecting improves recoverability, but it does not yet guarantee gap-free continuation.

Pangjiping force-pushed the feat/sse-resumption branch from e2e4ff4 to 7ac7afd on March 28, 2026 08:31

Labels

component/execd, feature (New feature or request)


3 participants