- Status: Draft
- Tracking issue: to be filed
- Author: @shangdinggu
- Last updated: 2026-05-08
- Builds on: 0019-llm-runner.md, 0022-llm-tool-calling.md, 0026-ipc-streaming.md

RFC 0026 provided the IPC substrate for streaming chunks. This RFC
plugs the LLM runner into it: when `stream=True` is set in the
init payload, the provider's text deltas flow as IPC chunk
messages to the supervisor's `on_chunk` callback in real time.
A response that takes 30 seconds to generate now appears
incrementally instead of after a 30-second wait.

The design is minimal:
- Optional. `stream=False` (default) keeps the existing RFC 0019 single-shot path. No behaviour change for current callers.
- Provider opt-in. Providers that want to support streaming add a `stream(request, on_delta)` method. Providers without it are silently used non-streaming when `stream=True` is requested: graceful degradation, no error.
- Text deltas only. Tool-use blocks come as one piece. A future RFC may stream tool-use input deltas, but Anthropic's SDK already returns them whole.
- Multi-iteration aware. In a tool-calling loop (RFC 0022), streaming applies only to iterations that produce text output. Tool-use iterations run non-streaming.

`LlmRequest` gains one field:

```python
stream: bool = False
```

Validated in `__post_init__` (must be `bool`). Round-trips via
`to_dict` / `from_dict`.

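The validation and round-trip described above could look roughly like the sketch below. It is not the RFC 0019 class: `LlmRequestSketch`, its `model` and `prompt` fields, and the choice of `TypeError` are all assumptions made for illustration. Only the `stream` field and its default come from this RFC.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class LlmRequestSketch:
    # `model` and `prompt` are placeholder fields for this sketch;
    # only `stream` is specified by this RFC.
    model: str
    prompt: str
    stream: bool = False

    def __post_init__(self) -> None:
        # Reject truthy non-bools such as 1 or "yes".
        if not isinstance(self.stream, bool):
            raise TypeError(
                f"stream must be bool, got {type(self.stream).__name__}"
            )

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "LlmRequestSketch":
        # Payloads written before this RFC have no "stream" key.
        return cls(
            model=data["model"],
            prompt=data["prompt"],
            stream=data.get("stream", False),
        )
```

Defaulting the missing key to `False` in `from_dict` is what would keep payloads from existing callers on the single-shot path.
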
Providers MAY implement:
```python
def stream(self, request: LlmRequest,
           on_delta: Callable[[str], None]) -> LlmResponse:
    """Stream text deltas via on_delta, return final LlmResponse."""
```

`on_delta` is called for each chunk of text content as it
arrives. The final returned `LlmResponse` carries the full text,
token counts, and `finish_reason`.

A provider without `stream()` falls back to non-streaming
`__call__`: no chunks are emitted but the call still works.

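One way to express the optional method as a type is a runtime-checkable `Protocol`. This is only a sketch of the contract: `SupportsStream`, `PlainProvider`, `StreamingProvider` and `run` are hypothetical names, and the stand-in providers return a plain `str` where a real provider would return an `LlmResponse`.

```python
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class SupportsStream(Protocol):
    """Hypothetical structural type for providers that opt in."""

    def stream(self, request: Any, on_delta: Callable[[str], None]) -> Any:
        ...


class PlainProvider:
    """Stand-in provider with only the single-shot path."""

    def __call__(self, request: Any) -> str:
        return "full text"


class StreamingProvider(PlainProvider):
    """Stand-in provider that also implements the optional method."""

    def stream(self, request: Any, on_delta: Callable[[str], None]) -> str:
        text = self(request)
        for ch in text:
            on_delta(ch)
        return text


def run(provider: Any, request: Any, on_delta: Callable[[str], None]) -> str:
    # Fall back to __call__ when the provider has no stream() method.
    if isinstance(provider, SupportsStream):
        return provider.stream(request, on_delta)
    return provider(request)
```

With `PlainProvider`, `run` returns the same text and never calls `on_delta`, which is the graceful degradation described above.
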
`ScriptedMockProvider` supports streaming for deterministic tests.
It emits the response's text one character at a time via
`on_delta`, then returns the full response (matching the next
entry in its scripted list).

```python
def stream(self, request, on_delta):
    response = self(request)  # advances cursor
    for ch in response.text:
        on_delta(ch)
    return response
```

The Anthropic provider uses the SDK's `client.messages.stream()`
context manager. The SDK's `text_stream` iterator yields
incremental text. We pump each delta to `on_delta` and assemble
the final response from the final message.

```python
def stream(self, request, on_delta):
    self._ensure_client()
    kwargs = self._build_kwargs(request)
    with self._client.messages.stream(**kwargs) as stream:
        # Forward each incremental text delta as it arrives.
        for delta in stream.text_stream:
            on_delta(delta)
        # The accumulated message carries full text, usage and stop reason.
        final = stream.get_final_message()
    return self._convert(final, request.model)
```

(Tool-use blocks in the streamed response stay whole; Anthropic
emits them as a finished `tool_use` block at the end.)

In cc_kernel/runner/llm/__main__.py:
```python
stream = bool(payload.get("stream", False))
provider_supports_stream = (
    hasattr(provider, "stream") and callable(provider.stream)
)

# In the iteration loop:
if stream and provider_supports_stream:
    def on_delta(text: str) -> None:
        chan.send({
            "op": "chunk",
            "kind": "text",
            "content": text,
            "metadata": {"iter": it},
        })
    response = provider.stream(request, on_delta)
else:
    response = provider(request)
```

The chunk's `metadata.iter` lets a UI distinguish text from
different iterations of a tool-calling loop.

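On the receiving side, a consumer could use `metadata.iter` to keep each iteration's text separate. The sketch below assumes the `on_chunk` callback is handed the chunk message as a dict in the shape sent by the runner. RFC 0026 defines the real callback signature, and `ChunkCollector` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any


class ChunkCollector:
    """Hypothetical on_chunk consumer that groups text by iteration."""

    def __init__(self) -> None:
        self.by_iter: dict[int, list[str]] = defaultdict(list)

    def on_chunk(self, msg: dict[str, Any]) -> None:
        # Ignore anything that is not a text chunk.
        if msg.get("op") != "chunk" or msg.get("kind") != "text":
            return
        it = msg.get("metadata", {}).get("iter", 0)
        self.by_iter[it].append(msg["content"])

    def text_for(self, it: int) -> str:
        # Iterations that emitted no chunks yield an empty string.
        return "".join(self.by_iter.get(it, []))
```

A UI built this way could render one text block per iteration and simply show nothing for tool-use iterations.
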
- `LlmRequest.stream` defaults to `False` → existing single-turn and tool-calling tests are unchanged.
- Providers without `stream()` are silently used non-streaming → no breakage of existing test mocks.
- `RunnerExitInfo.text` still reflects the full text regardless of streaming: same final output, just produced incrementally.
- `info.chunks` (RFC 0026) is populated with per-delta entries when streaming.

A PR claiming this RFC must:
- `LlmRequest(stream=True).to_dict()` round-trips.
- `ScriptedMockProvider.stream("hello", ...)` calls `on_delta` with each character: 'h', 'e', 'l', 'l', 'o'.
- LLM runner with `stream=True` + scripted "hi" → 2 IPC chunk messages with content 'h', 'i' before the exit.
- Supervisor's `on_chunk` callback receives them in order.
- `stream=False` (default) sends NO chunk messages.
- Provider without `stream()` method + `stream=True` still works (non-streaming fallback, no chunks).
- Multi-iteration tool calling: tool_use iteration emits 0 chunks; final text iteration emits per-delta chunks.
- `info.text` matches the assembled deltas (same text either way).
- No file outside `cc_kernel/`, `tests/`, `docs/RFC/` modified.
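
Several of the criteria above can be exercised without a real supervisor. The following is a self-contained sketch, not the project's test suite: `FakeResponse`, `FakeChannel`, `ScriptedStub`, `NoStreamStub` and `run_once` are hypothetical stand-ins that mirror the runner branch and the scripted provider described in this RFC.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResponse:
    text: str


class FakeChannel:
    """Records messages instead of writing to a real IPC channel."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    def send(self, msg: dict) -> None:
        self.sent.append(msg)


class ScriptedStub:
    """Stand-in for ScriptedMockProvider: replays scripted texts in order."""

    def __init__(self, texts: list[str]) -> None:
        self._texts = list(texts)
        self._cursor = 0

    def __call__(self, request: object) -> FakeResponse:
        response = FakeResponse(self._texts[self._cursor])
        self._cursor += 1
        return response

    def stream(self, request: object,
               on_delta: Callable[[str], None]) -> FakeResponse:
        response = self(request)  # advances cursor
        for ch in response.text:
            on_delta(ch)
        return response


class NoStreamStub:
    """Provider without stream(), to exercise the fallback."""

    def __call__(self, request: object) -> FakeResponse:
        return FakeResponse("hi")


def run_once(provider, chan: FakeChannel, stream: bool,
             it: int = 0) -> FakeResponse:
    # Same branch as the runner snippet in this RFC, for one iteration.
    if stream and callable(getattr(provider, "stream", None)):
        def on_delta(text: str) -> None:
            chan.send({"op": "chunk", "kind": "text",
                       "content": text, "metadata": {"iter": it}})
        return provider.stream(None, on_delta)
    return provider(None)


def test_streaming_emits_one_chunk_per_character() -> None:
    chan = FakeChannel()
    response = run_once(ScriptedStub(["hi"]), chan, stream=True)
    assert [m["content"] for m in chan.sent] == ["h", "i"]
    assert "".join(m["content"] for m in chan.sent) == response.text


def test_no_chunks_without_streaming_or_support() -> None:
    chan = FakeChannel()
    run_once(ScriptedStub(["hi"]), chan, stream=False)
    run_once(NoStreamStub(), chan, stream=True)
    assert chan.sent == []
```

The real tests would drive the actual runner and supervisor. This sketch only checks the branching logic and the chunk message shape.
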