Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 839e0aa277
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ninan-nn
left a comment
There was a problem hiding this comment.
LGTM : > I think this should be described as best-effort resume rather than lossless resumption. There are still at least two event-loss windows: concurrent stdout/stderr append can drop older eids due to the buffer’s strict monotonic check, and the replay-then-attach flow can miss events generated between the buffer snapshot and live handoff. So reconnect improves recoverability, but it does not yet guarantee gap-free continuation.
839e0aa to
441e3b5
Compare
e2e4ff4 to
7ac7afd
Compare
# Conflicts: # components/execd/pkg/web/model/error.go
Summary
Problem
POST /commandstream output over SSE; if the client disconnects, they lose the tail of the stream and cannot catch up or reconnect to the same execution cleanly.What we changed (how it’s solved)
pkg/sbuf: bounded ring-buffer store per stream id, keyed by eid, with strict monotonic checks - storage for replay after disconnect.RunCommand/RunCodewith resumeEnabled, the primary SSE registers a hub, and writeSSE both writes to the live connection and appends stdout/stderr frames (eid > 0) to the buffer; on request end, cleanup removes hub + buffer entry.GET /command/:id/resume: EventsAfter replays buffered SSE payloads, then if the command is still running and no primary SSE holds the live slot, tryAttachResume attaches the new response writer as the sole live consumer; 409 if the primary stream is still active.GET /command/{id}/resume, after_eid, responses (200 SSE, 409 via shared Conflict), aligned with the implementation.Testing
Breaking Changes
Checklist
execdand SDKs #507