Commit 7240cdb

fix(models): surface empty LiteLlm streaming completions as error event
Streaming completions where the provider returns a finish_reason but no text and no tool calls currently produce ZERO yielded LlmResponse events: ``aggregated_llm_response`` only gets set when ``(text or reasoning_parts)`` is truthy, and ``aggregated_llm_response_with_tool_call`` needs a function_call. With neither, the loop simply exits and the downstream Runner observes a silent successful empty stream.

This pattern is reported across multiple stalled fix attempts:

* #5394: AnthropicLlm never populates finish_reason on LlmResponse
* #5006: retry with resume message when model returns empty response
* #5636: surface error when model returns STOP with empty content
* #3618 / #3699: Handle empty message in LiteLLM response

It hits providers under several real conditions: anthropic content_filter, gemini 2.5-flash-lite STOP-with-empty after tool calls, 0-token completions under safety, model_not_found responses normalized to stop, etc. From the user's perspective the agent "successfully" ends a turn with no visible output.

Fix

- Track ``last_finish_reason`` + ``last_model_version`` across the stream so we can attribute the empty response.
- After both ``aggregated_llm_response`` and ``aggregated_llm_response_with_tool_call`` checks, if BOTH are None AND a finish_reason was observed, yield ONE LlmResponse with ``error_code`` set to the mapped finish_reason, ``error_message`` describing the failure mode, and the provider's ``model_version`` preserved. ``usage_metadata`` + ``grounding_metadata`` (if any) attach to that response so callers do not lose them.
- Minimum-surface change: the guard only fires when the stream produced no aggregated response AND a finish_reason was observed. Streams that genuinely yield nothing (test doubles, empty iterators) stay byte-identical.

Tests

- tests/unittests/models/test_litellm.py adds 4 cases:
  * content_filter-empty → surfaces with SAFETY error_code
  * stop-empty → surfaces with STOP finish_reason + error_message
  * normal text stream → empty-guard does NOT fire (regression)
  * literally-empty stream (no chunks, no finish_reason) → byte-identical zero responses

281 lite_llm tests pass + 1 skip; 0 regressions.
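
The guard described above can be sketched in isolation. The following is a minimal illustration under stated assumptions, not the code in `lite_llm.py`: `Chunk`, `Response`, `aggregate`, and `collect` are hypothetical stand-ins, tool calls are left out, and the raw finish_reason is used as `error_code` where the real change first passes it through a mapping helper.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterable, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed provider chunk."""
    text: str = ""
    finish_reason: Optional[str] = None
    model: Optional[str] = None


@dataclass
class Response:
    """Hypothetical stand-in for LlmResponse (only the fields used here)."""
    text: str = ""
    finish_reason: Optional[str] = None
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    model_version: Optional[str] = None


async def aggregate(chunks: Iterable[Chunk]) -> AsyncIterator[Response]:
    text = ""
    last_finish_reason: Optional[str] = None
    last_model_version: Optional[str] = None
    for chunk in chunks:
        # Remember the latest model and finish_reason seen on the stream.
        if chunk.model:
            last_model_version = chunk.model
        if chunk.finish_reason:
            last_finish_reason = chunk.finish_reason
        text += chunk.text

    if text:
        # Normal path: content was produced, so no error is synthesized.
        yield Response(
            text=text,
            finish_reason=last_finish_reason,
            model_version=last_model_version,
        )
    elif last_finish_reason:
        # Empty-completion guard: nothing aggregated, but the provider
        # reported why it stopped, so emit exactly one error response.
        yield Response(
            finish_reason=last_finish_reason,
            error_code=last_finish_reason,  # assumption: unmapped value
            error_message=(
                "Model returned no content "
                f"(finish_reason={last_finish_reason})."
            ),
            model_version=last_model_version,
        )
    # No chunks and no finish_reason: yield nothing, as before the fix.


def collect(chunks: Iterable[Chunk]) -> List[Response]:
    """Drain the async generator into a list for inspection."""
    async def _run() -> List[Response]:
        return [r async for r in aggregate(chunks)]
    return asyncio.run(_run())
```

Calling `collect([Chunk(finish_reason="content_filter", model="m")])` yields one response with `error_code` set, while `collect([])` still yields an empty list, which mirrors the first and last of the four test cases.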
1 parent 8c9fff8 commit 7240cdb

2 files changed

Lines changed: 193 additions & 0 deletions

src/google/adk/models/lite_llm.py

Lines changed: 49 additions & 0 deletions
@@ -2413,6 +2413,16 @@ async def generate_content_async(
       usage_metadata = None
       grounding_metadata = None
       fallback_index = 0
+      # Track the latest finish_reason + model so we can surface an
+      # explicit error event when the stream completes with neither text
+      # nor tool calls (silent-empty pattern). Without this guard the
+      # aggregator simply yields nothing and the downstream Runner sees
+      # zero events - the user-visible "empty turn that succeeded" bug
+      # tracked across #5394 / #5006 / #3618 / user reports against
+      # anthropic, gemini, and other providers under content_filter,
+      # safety, model_not_found, and 0-token-completion conditions.
+      last_finish_reason: str | None = None
+      last_model_version: str | None = None
 
       def _finalize_tool_call_response(
           *, model_version: str, finish_reason: str
@@ -2502,7 +2512,11 @@ def _reset_stream_buffers() -> None:
         part_grounding = _extract_grounding_metadata(part)
         if part_grounding:
           grounding_metadata = part_grounding
+        if getattr(part, "model", None):
+          last_model_version = part.model
         for chunk, finish_reason in _model_response_to_chunk(part):
+          if finish_reason:
+            last_finish_reason = finish_reason
           if isinstance(chunk, FunctionChunk):
             index = chunk.index or fallback_index
             if index not in function_calls:
@@ -2616,6 +2630,41 @@ def _reset_stream_buffers() -> None:
         )
         yield aggregated_llm_response_with_tool_call
 
+      # If we observed a finish_reason but produced no aggregated response
+      # (no text, no tool calls), surface it as an explicit error event so
+      # downstream consumers see actionable signal instead of a silent
+      # zero-yield stream. This is the "empty completion" pattern:
+      # provider returns 200 OK with a finish_reason (content_filter,
+      # safety, length-without-content, model_not_found, 0-token
+      # response, ...) but no text or tool deltas. Tracked across:
+      # #5394 - AnthropicLlm never populates finish_reason
+      # #5006 - retry with resume message when model returns empty
+      # #5636 - surface error when model returns STOP with empty content
+      # #3618 - Handle empty message in LiteLLM response
+      # Multiple stalled PRs (#5512, #5636, #3699) attempted variants of
+      # this fix. This guard keeps the change minimal: ONLY fires when
+      # the entire stream produced nothing meaningful AND the provider
+      # told us why via finish_reason. Mapped error_code lets the
+      # downstream agent loop react (retry, surface, escalate) without
+      # losing the provider's actual signal.
+      if (
+          last_finish_reason
+          and aggregated_llm_response is None
+          and aggregated_llm_response_with_tool_call is None
+      ):
+        mapped_empty_finish = _map_finish_reason(last_finish_reason)
+        empty_response = LlmResponse(
+            error_code=mapped_empty_finish,
+            error_message=_finish_reason_to_error_message(mapped_empty_finish),
+            finish_reason=mapped_empty_finish,
+            model_version=last_model_version,
+        )
+        if usage_metadata:
+          empty_response.usage_metadata = usage_metadata
+        if grounding_metadata:
+          empty_response.grounding_metadata = grounding_metadata
+        yield empty_response
+
     else:
       response = await self.llm_client.acompletion(**completion_args)
       yield _model_response_to_generate_content_response(response)
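
The comment in the guard notes that a mapped `error_code` lets the downstream loop retry, surface, or escalate. One way a caller might act on that signal is sketched below. Everything here is an assumption for illustration: `Action`, `Response`, and `decide` are hypothetical names, the string codes `"STOP"` and `"SAFETY"` stand in for the mapped finish reasons, and the retry policy is invented rather than taken from the commit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    DELIVER = "deliver"   # pass the response on to the user
    RETRY = "retry"       # re-issue the request
    SURFACE = "surface"   # show the error to the user or operator


@dataclass
class Response:
    """Hypothetical stand-in exposing only the fields the guard sets."""
    text: str = ""
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def decide(response: Response, attempt: int, max_retries: int = 1) -> Action:
    """Pick a reaction to one response (assumed policy, not ADK behavior)."""
    if response.error_code is None:
        return Action.DELIVER
    # Assumed policy: an empty STOP may be transient, so retry it a bounded
    # number of times; any other code (for example SAFETY) is surfaced.
    if response.error_code == "STOP" and attempt < max_retries:
        return Action.RETRY
    return Action.SURFACE
```

With this policy, `decide(Response(error_code="STOP"), attempt=0)` asks for a retry, and the same response on the next attempt is surfaced instead of looping.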

tests/unittests/models/test_litellm.py

Lines changed: 144 additions & 0 deletions
@@ -5275,3 +5275,147 @@ def test_redact_file_uri_for_log_http_url_keeps_scheme_and_tail():
       _redact_file_uri_for_log("https://example.com/path/file.pdf")
       == "https://<redacted>/file.pdf"
   )
+
+
+# ---------------------------------------------------------------------------
+# Empty-completion guard (fixes #5394 / #5006 / #5636 / #3618 family)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_streaming_empty_completion_with_content_filter_surfaces_error(
+    mock_completion, lite_llm_instance
+):
+  """A stream that ends with finish_reason but no text/tool deltas must
+  surface as an explicit error event, not a silent zero-yield stream.
+
+  Reproduces the "empty completion" pattern reported across
+  #5394 / #5006 / #5636 / #3618: provider returns 200 OK with a
+  finish_reason (content_filter / safety / model_not_found / etc.)
+  but no content. Pre-fix the aggregator simply yielded nothing and
+  the downstream Runner saw zero events - silent successful empty
+  turn from the user's perspective.
+  """
+  stream_chunks = [
+      ModelResponseStream(
+          model="anthropic/claude-opus-4-8",
+          choices=[
+              StreamingChoices(finish_reason="content_filter", delta=Delta())
+          ],
+      ),
+  ]
+  mock_completion.return_value = iter(stream_chunks)
+
+  responses = [
+      response
+      async for response in lite_llm_instance.generate_content_async(
+          LLM_REQUEST_WITH_FUNCTION_DECLARATION, stream=True
+      )
+  ]
+
+  assert len(responses) == 1, (
+      "empty completion with finish_reason must surface exactly one error "
+      f"LlmResponse, got {len(responses)}: {responses}"
+  )
+  empty_response = responses[0]
+  assert empty_response.error_code == types.FinishReason.SAFETY
+  assert empty_response.finish_reason == types.FinishReason.SAFETY
+  assert empty_response.error_message
+  # The model name must propagate so downstream consumers know what failed.
+  assert empty_response.model_version == "anthropic/claude-opus-4-8"
+
+
+@pytest.mark.asyncio
+async def test_streaming_empty_completion_with_stop_finish_reason_still_surfaces(
+    mock_completion, lite_llm_instance
+):
+  """A stream that ends with finish_reason='stop' but no content also
+  surfaces - the model said "I'm done" but produced nothing. This is the
+  most common shape of the silent-empty bug (#5636 specifically called
+  out gemini-2.5-flash-lite returning STOP with empty content after a
+  tool call). Pre-fix, the (text or reasoning_parts) guard skipped
+  finalization for this case and the aggregator yielded nothing.
+  """
+  stream_chunks = [
+      ModelResponseStream(
+          model="google/gemini-3.5-flash",
+          choices=[StreamingChoices(finish_reason="stop", delta=Delta())],
+      ),
+  ]
+  mock_completion.return_value = iter(stream_chunks)
+
+  responses = [
+      response
+      async for response in lite_llm_instance.generate_content_async(
+          LLM_REQUEST_WITH_FUNCTION_DECLARATION, stream=True
+      )
+  ]
+
+  assert len(responses) == 1, (
+      "STOP-with-empty stream must surface exactly one response, got "
+      f"{len(responses)}: {responses}"
+  )
+  empty_response = responses[0]
+  # The finish_reason is preserved so callers see the model's signal
+  # ("I stopped cleanly") + the empty body is also visible (no text
+  # content). error_message names the failure mode so the operator
+  # does not have to dig.
+  assert empty_response.finish_reason == types.FinishReason.STOP
+  assert empty_response.error_message
+  assert empty_response.model_version == "google/gemini-3.5-flash"
+
+
+@pytest.mark.asyncio
+async def test_streaming_text_response_does_not_trigger_empty_guard(
+    mock_completion, lite_llm_instance
+):
+  """Regression guard: a normal text completion must NOT trigger the
+  empty-guard - the existing aggregated_llm_response handles it and
+  exactly one response is yielded with the actual text."""
+  stream_chunks = [
+      ModelResponseStream(
+          choices=[
+              StreamingChoices(
+                  finish_reason=None,
+                  delta=Delta(role="assistant", content="hello"),
+              )
+          ]
+      ),
+      ModelResponseStream(
+          choices=[StreamingChoices(finish_reason="stop", delta=Delta())]
+      ),
+  ]
+  mock_completion.return_value = iter(stream_chunks)
+
+  responses = [
+      response
+      async for response in lite_llm_instance.generate_content_async(
+          LLM_REQUEST_WITH_FUNCTION_DECLARATION, stream=True
+      )
+  ]
+
+  # The text deltas yield partial responses + one final aggregated
+  # response; the empty-guard MUST NOT fire because text was produced.
+  assert len(responses) >= 2
+  final = responses[-1]
+  assert final.error_code is None
+  assert final.finish_reason == types.FinishReason.STOP
+
+
+@pytest.mark.asyncio
+async def test_streaming_no_chunks_and_no_finish_reason_is_byte_identical(
+    mock_completion, lite_llm_instance
+):
+  """The empty-guard ONLY fires when there's a finish_reason. A stream
+  that yields literally nothing (test doubles, etc.) must stay
+  byte-identical to pre-fix - zero responses, no synthesized error."""
+  mock_completion.return_value = iter([])
+
+  responses = [
+      response
+      async for response in lite_llm_instance.generate_content_async(
+          LLM_REQUEST_WITH_FUNCTION_DECLARATION, stream=True
+      )
+  ]
+
+  assert responses == []
