
fix(streaming): wire first-token retry and return correct finish_reason on truncation #114

Open
kilhyeonjun wants to merge 2 commits into jwadow:main from kilhyeonjun:feat/streaming-reliability

Conversation

@kilhyeonjun
Contributor

Summary

Fixes #113. Two streaming reliability fixes:

  1. Wire stream_with_first_token_retry into the OpenAI streaming path. The retry mechanism was implemented but never connected. routes_openai.py now uses it with a closure that reuses the initial validated response on the first attempt and issues new requests only on retries.

  2. Return the correct finish_reason/stop_reason on truncation. When the Kiro API truncates a response mid-stream (no completion signals), the gateway now returns finish_reason: "length" (OpenAI) / stop_reason: "max_tokens" (Anthropic) instead of "stop" / "end_turn".
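The closure pattern described in point 1 can be sketched as below. This is an illustrative model only: the names `stream_with_first_token_retry` and `FIRST_TOKEN_MAX_RETRIES` come from the PR, but the function bodies, the factory helper, and the retry-count value are assumptions, not the gateway's actual code.

```python
FIRST_TOKEN_MAX_RETRIES = 3  # name from the PR; the value here is assumed


def stream_with_first_token_retry(make_request, max_retries=FIRST_TOKEN_MAX_RETRIES):
    """Yield tokens from make_request(attempt), retrying while the
    upstream stream produces no first token at all (sketch)."""
    for attempt in range(max_retries + 1):
        stream = iter(make_request(attempt))
        try:
            first = next(stream)  # wait for the first token
        except StopIteration:
            continue  # no first token: retry with a fresh upstream request
        yield first
        yield from stream
        return
    raise RuntimeError("no first token after retries")


def make_retry_request_factory(initial_response, send_new_request):
    """Build the closure: reuse the already-validated 200 response on the
    first attempt; re-issue the upstream request only on retries."""
    def make_retry_request(attempt):
        if attempt == 0:
            return initial_response  # avoid double-requesting
        return send_new_request()
    return make_retry_request
```

The key design point is that attempt 0 never re-contacts the upstream API: the response that was already validated to be a 200 is consumed directly, and a fresh request happens only when that stream yields nothing.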

Changes

  • routes_openai.py: Replace stream_kiro_to_openai with stream_with_first_token_retry in the streaming handler, using a make_retry_request closure that reuses the initial 200 response on the first attempt.
  • streaming_openai.py: When content_was_truncated is true, set finish_reason = "length" instead of "stop".
  • streaming_anthropic.py: When content_was_truncated is true, set stop_reason = "max_tokens" instead of "end_turn".
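The last two bullets amount to a small mapping from the truncation flag to the reported reason. The helper below is purely illustrative; the PR sets these values inline in each streaming module rather than through a shared function.

```python
def completion_reason(content_was_truncated: bool, api: str) -> str:
    """Map the truncation flag to the reason string each API reports
    (hypothetical helper; the PR assigns these values inline)."""
    if api == "openai":
        return "length" if content_was_truncated else "stop"
    if api == "anthropic":
        return "max_tokens" if content_was_truncated else "end_turn"
    raise ValueError(f"unknown api: {api!r}")
```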

Behavior

| Scenario | Before | After |
| --- | --- | --- |
| First token timeout (OpenAI) | Immediate failure, no retry | Up to 3 retries (FIRST_TOKEN_MAX_RETRIES) |
| Kiro API truncation (OpenAI) | finish_reason: "stop" | finish_reason: "length" |
| Kiro API truncation (Anthropic) | stop_reason: "end_turn" | stop_reason: "max_tokens" |
| Normal completion | finish_reason: "stop" | finish_reason: "stop" (unchanged) |

Testing

Verified locally via Docker:

  • Streaming responses complete normally with finish_reason: "stop"
  • First-token retry mechanism is wired (confirmed via log path routes_openai:stream_wrapper)
  • Truncation detection logic unchanged — only the reported finish_reason differs
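A local check like the first bullet can be scripted by scanning the SSE stream for the final finish_reason. The snippet below is a sketch of that verification, not part of the PR; the chunk shape follows the OpenAI streaming format, and how you obtain the SSE lines (e.g. which host/port the gateway runs on) is up to your setup.

```python
import json


def last_finish_reason(sse_lines):
    """Scan 'data:' lines from an OpenAI-style chat completion stream
    and return the final non-null finish_reason, if any."""
    reason = None
    for line in sse_lines:
        if not line.startswith("data: ") or line.strip() == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        fr = chunk["choices"][0].get("finish_reason")
        if fr is not None:
            reason = fr
    return reason
```

A normally completed stream should end with "stop"; after this PR, a truncated one should end with "length" instead.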

…on on truncation

Two issues fixed:

1. stream_with_first_token_retry was implemented but never wired into
   the OpenAI streaming path. routes_openai.py called stream_kiro_to_openai
   directly, bypassing retry logic entirely. Now uses
   stream_with_first_token_retry with a make_request closure that reuses
   the initial validated response on first attempt.

2. When Kiro API truncates a response mid-stream (no usage/context_usage
   completion signals), the gateway returned finish_reason="stop" (OpenAI)
   / stop_reason="end_turn" (Anthropic), making clients believe the
   response completed normally. Now returns "length" / "max_tokens"
   respectively, so clients can detect truncation and request continuation.

Constraint: First attempt reuses the already-validated 200 response to avoid double-requesting
Confidence: high
Scope-risk: moderate
cla-bot added the cla-signed label (Contributor License Agreement has been signed) on Mar 25, 2026
Tests that mock parse_kiro_stream without usage/context_usage events
now trigger truncation detection, causing finish_reason="length" and
stop_reason="max_tokens" instead of the expected "stop"/"end_turn".

Add usage event (OpenAI) and context_usage event (Anthropic) to mock
streams so they represent normal completions, not truncated ones.
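The interaction between the mocks and truncation detection can be pictured with a minimal heuristic. Event shapes below are assumptions for illustration, not the actual Kiro wire format.

```python
def looks_truncated(events):
    """Per the PR description: a stream that ends without a
    usage/context_usage completion signal is treated as truncated."""
    return not any(e.get("type") in ("usage", "context_usage") for e in events)


# A mock with no completion signal now trips truncation detection...
bare_mock = [{"type": "content", "text": "Hello"}]

# ...so the fixed mocks append the appropriate completion event,
# representing a normal (non-truncated) completion.
openai_mock = bare_mock + [{"type": "usage"}]
anthropic_mock = bare_mock + [{"type": "context_usage"}]
```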

Confidence: high
Scope-risk: narrow


Development

Successfully merging this pull request may close these issues.

Streaming: first-token retry not wired in OpenAI path; truncation returns wrong finish_reason
