Skip to content

feat(execd): tune jupyter idle polling and sse completion wait#577

Merged
Pangjiping merged 3 commits intoalibaba:mainfrom
skyler0513:feat/execd-jupyter-idle-polling
Mar 28, 2026
Merged

feat(execd): tune jupyter idle polling and sse completion wait#577
Pangjiping merged 3 commits intoalibaba:mainfrom
skyler0513:feat/execd-jupyter-idle-polling

Conversation

@skyler0513
Copy link
Copy Markdown
Contributor

Summary

This PR improves execd's Jupyter execution stream handling and SSE tail latency behavior.

Changes included:

  • add configurable Jupyter idle polling interval via EXECD_JUPYTER_IDLE_POLL_INTERVAL / --jupyter-idle-poll-interval
  • replace the fixed 300ms post-idle polling sleep with the configured poll interval
  • update RunCode and RunInSession to wait for OnExecuteComplete as a best-effort signal instead of always sleeping for a fixed graceful shutdown window

Background

This change was motivated by an execution latency regression observed on code execution nodes.

Based on local observation:

  • without sandbox, execution latency was around 60ms
  • with sandbox enabled, execution latency increased to around 2.4s

After tracing the slow path, the added tail latency was mainly caused by fixed waiting behavior in the execd execution flow. With this change, the observed latency was
reduced to around 100ms in the same scenario.

Motivation

Previously:

  • Jupyter post-idle polling used a hard-coded 300ms interval
  • SSE handlers always paid a fixed tail latency because the controller slept unconditionally after execution

This change makes the Jupyter idle polling behavior configurable and reduces unnecessary fixed response tail latency for SSE endpoints.

Testing

Executed under components/execd:

  • go test ./pkg/flag
  • go test ./pkg/jupyter/execute
  • go test ./...
  • go test -race ./...
  • golangci-lint run -v ./...

Results:

  • go test ./pkg/flag: pass
  • go test ./pkg/jupyter/execute: pass
  • go test ./...: fails on existing pkg/web/controller TestReadMetrics
  • go test -race ./...: fails on existing pkg/web/controller TestReadMetrics and an existing race in pkg/runtime
  • golangci-lint run -v ./...: reports existing gocognit complexity issue at pkg/jupyter/execute/execute.go:117

Notes

  • The repository currently has pre-existing test/lint issues unrelated to this change.
  • This PR only modifies components/execd.

Copilot AI review requested due to automatic review settings March 26, 2026 08:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces execd’s SSE tail latency and makes Jupyter “idle” post-processing polling configurable, targeting an observed execution latency regression in sandboxed environments.

Changes:

  • Add a configurable Jupyter idle polling interval via EXECD_JUPYTER_IDLE_POLL_INTERVAL / --jupyter-idle-poll-interval.
  • Replace the fixed 300ms post-idle polling sleep with the configured interval in Jupyter streaming execution.
  • Update SSE endpoints (RunCode, RunInSession) to wait for OnExecuteComplete (best-effort) instead of always sleeping a fixed graceful shutdown window.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
components/execd/pkg/web/controller/codeinterpreting.go Switch SSE tail behavior from unconditional sleep to best-effort completion wait.
components/execd/pkg/jupyter/execute/execute.go Make post-idle polling interval configurable rather than hard-coded.
components/execd/pkg/flag/parser.go Add env/CLI parsing for the new Jupyter idle poll interval.
components/execd/pkg/flag/flags.go Add the new global flag variable for Jupyter idle polling interval.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Pangjiping Pangjiping self-assigned this Mar 26, 2026
@Pangjiping Pangjiping added feature New feature or request component/execd labels Mar 26, 2026
@Pangjiping
Copy link
Copy Markdown
Collaborator

Could you fix those go linter error by make golint under components/execd? 😊

@skyler0513
Copy link
Copy Markdown
Contributor Author

I fixed two blockers: the golangci-lint failure and the C# E2E error-contract regression.
For lint, I refactored ExecuteCodeStream into smaller helpers (no behavior change) so cognitive complexity now passes. For C# E2E, I fixed Jupyter terminal-event semantics
so failed executions emit error as terminal and do not emit complete, which prevents error events from being dropped on the client side. I also added regression tests for
both paths.

@Pangjiping
Copy link
Copy Markdown
Collaborator

I fixed two blockers: the golangci-lint failure and the C# E2E error-contract regression. For lint, I refactored ExecuteCodeStream into smaller helpers (no behavior change) so cognitive complexity now passes. For C# E2E, I fixed Jupyter terminal-event semantics so failed executions emit error as terminal and do not emit complete, which prevents error events from being dropped on the client side. I also added regression tests for both paths.

Thanks for your GREAT submission. I will review it and merge it as soon as possible.
It looks like this solves the issue we’ve long been criticized for — overly conservative connection keep-alive by sleep 😂😂.

Copy link
Copy Markdown
Collaborator

@Pangjiping Pangjiping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Pangjiping Pangjiping merged commit 4346675 into alibaba:main Mar 28, 2026
9 checks passed
@skyler0513
Copy link
Copy Markdown
Contributor Author

LGTM

Hello,Jiping, What is the latest available execd image version?

@Pangjiping
Copy link
Copy Markdown
Collaborator

LGTM

Hello,Jiping, What is the latest available execd image version?

I’ll take care of it today or tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/execd feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants