
bug: gc sling does not wake warm-idle workers — routed beads stall until manual nudge #1129

@sjarmak


Summary

gc sling <worker> <bead-id> sets gc.routed_to metadata and wraps the bead in a convoy/wisp, but does NOT send a wake signal to the target worker. As a result, routed beads sit unclaimed indefinitely against any worker that is not currently in an active turn cycle.

Root cause: --nudge defaults to false (cmd/gc/cmd_sling.go:105), and gc hook --inject is wired to the Stop hook only (internal/hooks/config/claude.json:42-51), not SessionStart. A warm-idle worker session (one that ran gc prime and has been sitting at its idle prompt with no active turn) never fires Stop, so the work-check never runs and routed work is never surfaced.
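The root cause can be shown with a toy Go model of the hook wiring. The worker type and its methods are hypothetical stand-ins, not gc's real types. The sketch assumes only what is stated above: SessionStart runs gc prime with no work-check, gc hook --inject runs on Stop, and Stop fires only when a turn ends.

```go
package main

import "fmt"

// worker is a hypothetical model of one worker session (not gc's real type).
type worker struct {
	inTurn   bool     // true only while the agent is processing a turn
	routed   []string // beads whose gc.routed_to points at this worker
	surfaced []string // beads the Stop-hook work-check has injected
}

// sessionStart models SessionStart: it runs gc prime only, with no work-check.
func (w *worker) sessionStart() {}

// sling models gc sling without --nudge: it records routing and sends no wake signal.
func (w *worker) sling(bead string) { w.routed = append(w.routed, bead) }

// prompt models any user-prompt nudge: it starts a turn.
func (w *worker) prompt() { w.inTurn = true }

// endTurn is the only path that fires Stop, which runs gc hook --inject.
func (w *worker) endTurn() {
	if !w.inTurn {
		return // a warm-idle session has no turn to end, so Stop never fires
	}
	w.inTurn = false
	w.surfaced = append(w.surfaced, w.routed...)
	w.routed = nil
}

func main() {
	w := &worker{}
	w.sessionStart()
	w.sling("bead-1")
	w.endTurn()                  // "just waiting": no turn in flight, so this is a no-op
	fmt.Println(len(w.surfaced)) // 0: the routed bead stalls

	w.prompt()              // manual nudge starts a turn
	w.endTurn()             // the turn ends, Stop fires, the work-check runs
	fmt.Println(w.surfaced) // [bead-1]
}
```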

Relationship to #1027

#1027 addressed the cold-spawn pool case: a sling that spawns a fresh polecat now triggers a first turn via prompt_mode="arg" + PromptSuffix. That fix does NOT cover the warm-idle case (session already spawned, primed, and idle). @julianknutsen's closing comment on #1027 explicitly flagged this as a separate follow-up:

Caveat: if we later want gc sling --nudge itself to enqueue a post-start reminder for cold pool targets, that remains separate follow-up work because the cold pool branch still does not enqueue a sling reminder before an instance exists.

Reproduction

  1. Let a worker session quiesce at its idle prompt (gc prime done, no active turn).
  2. From any caller: gc sling <rig>/<worker> <bead-id> (without --nudge).
  3. Observe:
    • The bead has metadata.gc.routed_to = <worker>
    • The bead is wrapped in a convoy (sling-<id>) and a wisp with the target formula ✓
    • bd ready excludes the bead (it is considered hooked) ✓
    • gc hook <rig>/<worker> returns exit=1 (no work detected) — the hook does not surface routed-to-this-worker beads as claimable ✓
    • The worker session stays at its idle prompt; the bead's status stays open and its assignee stays null
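A small diagnostic predicate can flag beads in the stalled state observed in step 3. This is a sketch: the bead struct is a hypothetical, trimmed-down view of a bead record, and it assumes gc.routed_to is readable as a string key in a metadata map, with an empty Assignee standing in for null.

```go
package main

import "fmt"

// bead is a hypothetical view of a bead record; only the fields named in
// the reproduction steps are modeled.
type bead struct {
	ID       string
	Status   string            // e.g. "open"
	Assignee string            // empty string stands in for null
	Metadata map[string]string // e.g. "gc.routed_to" -> worker
}

// stalledFor reports whether b shows the symptom from this issue: routed to
// worker by gc sling, yet still open and unclaimed.
func stalledFor(b bead, worker string) bool {
	return b.Metadata["gc.routed_to"] == worker && // indexing a nil map yields ""
		b.Status == "open" &&
		b.Assignee == ""
}

func main() {
	b := bead{
		ID:       "bead-1",
		Status:   "open",
		Metadata: map[string]string{"gc.routed_to": "scix-worker"},
	}
	fmt.Println(stalledFor(b, "scix-worker")) // true
	b.Assignee = "scix-worker"
	fmt.Println(stalledFor(b, "scix-worker")) // false
}
```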

Observed in practice (ds-research city)

  • Slung scix_experiments-wqr.9.1 to scix-worker at T+0. Sat idle.
  • Slung scix_experiments-wqr.9.2 to scix-worker-2 at T+0. Sat idle.
  • Slung scix_experiments-wqr.9.6 to scix-worker-3 at T+50min. Claimed + closed within 2 minutes.
  • Difference: scix-worker-3 happened to be in an active turn cycle at T+50min (finishing prior work). Its Stop hook fired, surfaced 9.6, next turn claimed it.
  • wqr.9.1 and 9.2 were slung to sessions that were already quiescent, so they never fired Stop and never saw the work.

Workaround (confirmed)

gc session nudge <active-session-id> "check for assigned work"

Any user-prompt nudge unsticks the worker: it starts a turn, the turn ends, Stop fires, and gc hook --inject surfaces the routed work into the next turn. Time-to-claim: ~30-60s.
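To script the confirmed workaround across several stuck sessions, a thin Go wrapper around the same command could look like this. It assumes gc is on PATH. The nudgeArgs and nudge helpers are hypothetical names; only the gc session nudge invocation itself comes from the workaround above.

```go
package main

import (
	"fmt"
	"os/exec"
)

// nudgeArgs builds the argv for: gc session nudge <sid> "check for assigned work".
func nudgeArgs(sessionID string) []string {
	return []string{"session", "nudge", sessionID, "check for assigned work"}
}

// nudge runs the workaround command and returns its output on failure.
// Assumes the gc binary is on PATH.
func nudge(sessionID string) error {
	out, err := exec.Command("gc", nudgeArgs(sessionID)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("gc session nudge %s: %w: %s", sessionID, err, out)
	}
	return nil
}

func main() {
	// Print the argv only; calling nudge would require a live gc install.
	fmt.Println(nudgeArgs("sess-123"))
}
```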

Non-workarounds

  • gc session reset <sid> — spawns a fresh session but the new SessionStart still only runs gc prime, no work-check.
  • Session turnover from idle timeout — same reason.
  • Just waiting — Stop never fires because the session has no turn to end.

Acceptance criteria

gc sling to an idle-but-running worker results in the bead being claimed (assignee set, worker started) within 60s without any manual session nudge. Behavior is consistent whether the worker was previously active or quiescent.

Proposed fix

Auto-nudge when the sling target has a running session, regardless of the --nudge flag. Keep --nudge as an explicit override for the cold-target case (still useful for enqueueing against a target that hasn't spawned yet).

Implementation lives in the CLI layer at cmd/gc/cmd_sling.go post-DoSling dispatch (not sling.finalize), to preserve the Layer 2 → Layer 0 layering boundary and avoid threading runtime.Provider into SlingDeps. The API handler path (internal/api/handler_sling.go) remains nudge-free to avoid adding WaitIdle latency to HTTP requests.

Environment

  • Repro: local + ds-research city
  • gc from main HEAD
