fix(streaming): wire first-token retry and return correct finish_reason on truncation #114
Open
kilhyeonjun wants to merge 2 commits into jwadow:main from
Conversation
…on on truncation

Two issues fixed:

1. `stream_with_first_token_retry` was implemented but never wired into the OpenAI streaming path. `routes_openai.py` called `stream_kiro_to_openai` directly, bypassing retry logic entirely. Now uses `stream_with_first_token_retry` with a `make_request` closure that reuses the initial validated response on the first attempt.

2. When the Kiro API truncates a response mid-stream (no usage/context_usage completion signals), the gateway returned `finish_reason="stop"` (OpenAI) / `stop_reason="end_turn"` (Anthropic), making clients believe the response completed normally. Now returns `"length"` / `"max_tokens"` respectively, so clients can detect truncation and request continuation.

Constraint: first attempt reuses the already-validated 200 response to avoid double-requesting.

Confidence: high
Scope-risk: moderate
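The wiring described in point 1 can be sketched as below. The function and closure names mirror the PR description, but the bodies are illustrative stand-ins, not the gateway's actual code; the demo simulates a stream that dies before its first token to show the retry path.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_with_first_token_retry(
    make_request: Callable[[int], Awaitable[AsyncIterator[str]]],
    max_attempts: int = 2,
) -> AsyncIterator[str]:
    """Re-issue the upstream request until a first token arrives,
    then pass the rest of the stream through untouched."""
    for attempt in range(max_attempts):
        stream = await make_request(attempt)
        it = stream.__aiter__()
        try:
            first = await it.__anext__()
        except StopAsyncIteration:
            continue  # stream ended before the first token: retry
        yield first
        async for chunk in it:
            yield chunk
        return


async def demo():
    async def _aiter(items):
        for x in items:
            yield x

    attempts = []

    async def make_request(attempt: int):
        # In the real handler, attempt 0 would reuse the already-validated
        # 200 response; retries would issue a fresh upstream request.
        attempts.append(attempt)
        if attempt == 0:
            return _aiter([])  # simulate dying before the first token
        return _aiter(["hello", " world"])

    chunks = [c async for c in stream_with_first_token_retry(make_request)]
    return chunks, attempts


print(asyncio.run(demo()))  # → (['hello', ' world'], [0, 1])
```

The key design point from the PR is the closure: `make_request` captures the initial validated response so the first attempt never double-requests, while later attempts can build a fresh request.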
Tests that mock `parse_kiro_stream` without usage/context_usage events now trigger truncation detection, causing `finish_reason="length"` and `stop_reason="max_tokens"` instead of the expected `"stop"`/`"end_turn"`. Add a usage event (OpenAI) and a context_usage event (Anthropic) to the mock streams so they represent normal completions, not truncated ones.

Confidence: high
Scope-risk: narrow
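A minimal sketch of why the mocks needed updating, assuming the completion-signal semantics described above; the event shapes and the `detect_truncation` helper are illustrative, not the repo's actual types.

```python
def detect_truncation(events):
    """Mimic the gateway's rule: a stream that never emits a
    usage/context_usage completion signal is treated as truncated."""
    saw_completion = any(
        e.get("type") in ("usage", "context_usage") for e in events
    )
    return "stop" if saw_completion else "length"


# A mock stream with only text events now reads as truncated...
truncated_mock = [{"type": "text", "text": "partial answ"}]

# ...so the fixed tests append a completion signal to represent
# a normal, non-truncated completion.
normal_mock = [
    {"type": "text", "text": "full answer"},
    {"type": "usage", "input_tokens": 10, "output_tokens": 5},
]

print(detect_truncation(truncated_mock))  # → length
print(detect_truncation(normal_mock))     # → stop
```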
Summary
Fixes #113 — Two streaming reliability fixes:
1. **Wire `stream_with_first_token_retry` into the OpenAI streaming path**: the retry mechanism was implemented but never connected. Now `routes_openai.py` uses it with a closure that reuses the initial validated response on the first attempt, only creating new requests on retry.

2. **Return correct finish_reason/stop_reason on truncation**: when the Kiro API truncates mid-stream (no completion signals), the gateway now returns `finish_reason: "length"` (OpenAI) / `stop_reason: "max_tokens"` (Anthropic) instead of `"stop"`/`"end_turn"`.

Changes

- `routes_openai.py`: replace `stream_kiro_to_openai` with `stream_with_first_token_retry` in the streaming handler. Uses a `make_retry_request` closure that reuses the initial 200 response on the first attempt.
- `streaming_openai.py`: when `content_was_truncated` is true, set `finish_reason = "length"` instead of `"stop"`.
- `streaming_anthropic.py`: when `content_was_truncated` is true, set `stop_reason = "max_tokens"` instead of `"end_turn"`.

Behavior
| Scenario | Before | After |
| --- | --- | --- |
| OpenAI, truncated mid-stream | `finish_reason: "stop"` | `finish_reason: "length"` |
| Anthropic, truncated mid-stream | `stop_reason: "end_turn"` | `stop_reason: "max_tokens"` |
| Normal completion | `finish_reason: "stop"` | `finish_reason: "stop"` (unchanged) |

Testing
Verified locally via Docker:
- Normal completion returns `finish_reason: "stop"` (`routes_openai:stream_wrapper`)
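With the corrected signal, a client can now distinguish truncation from normal completion and request a continuation. A hedged sketch; `chat` is a hypothetical helper wrapping the gateway's chat-completions endpoint, not part of this PR.

```python
def collect_full_answer(chat, messages, max_rounds=3):
    """Keep requesting continuations while the gateway reports
    finish_reason 'length' (i.e. a truncated response)."""
    parts = []
    for _ in range(max_rounds):
        reply = chat(messages)
        parts.append(reply["content"])
        if reply["finish_reason"] != "length":
            break  # completed normally ("stop")
        # Truncated: feed the partial answer back and ask to continue.
        messages = messages + [
            {"role": "assistant", "content": reply["content"]},
            {"role": "user", "content": "Continue."},
        ]
    return "".join(parts)


# Fake gateway that truncates once, then completes:
replies = iter([
    {"content": "first half, ", "finish_reason": "length"},
    {"content": "second half.", "finish_reason": "stop"},
])
print(collect_full_answer(lambda m: next(replies),
                          [{"role": "user", "content": "hi"}]))
# → first half, second half.
```

Before this PR, the gateway reported `"stop"` even for truncated streams, so this loop would have exited after the first round with only the partial answer.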