fix(llm): add fallback to non-streaming mode when content extraction fails #2605
base: feat-emit-stream-tool
Conversation
When streaming responses don't contain extractable content, the code now falls back to non-streaming mode instead of raising an exception. This creates a more resilient system that can handle a wider variety of response formats from different LLM providers, preventing crashes in the agent flow.

The key changes:
- Add fallback to non-streaming call when content extraction fails
- Simplify error handling logic
- Keep stream_options for usage metrics collection

A rough sketch of the resulting control flow is shown below.
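As a rough illustration of that flow (not the exact diff: helper names such as `_extract_chunk_content` and `_handle_non_streaming_response` are placeholders for whatever the existing methods in `llm.py` are called, and the `crewai_event_bus.emit(...)` call abbreviates the existing event emission):

```python
def _handle_streaming_response(self, params, callbacks, available_functions):
    full_response = ""

    # Keep stream_options so the provider can report token usage on the stream.
    params["stream_options"] = {"include_usage": True}

    for chunk in litellm.completion(**params, stream=True):
        # Existing extraction logic: pull text out of the chunk when present.
        chunk_content = self._extract_chunk_content(chunk)
        if chunk_content:
            full_response += chunk_content
            crewai_event_bus.emit(
                self, event=LLMStreamChunkEvent(chunk=chunk_content)
            )

    # New behaviour: if nothing was extracted, retry without streaming
    # instead of raising "No content received from streaming response".
    if not full_response.strip():
        logging.warning(
            "No content extracted from streaming response; "
            "falling back to non-streaming mode."
        )
        params.pop("stream_options", None)
        return self._handle_non_streaming_response(
            params, callbacks, available_functions
        )

    self._handle_emit_call_events(full_response, LLMCallType.LLM_CALL)
    return full_response
```

The key property is that the already working non-streaming implementation becomes the safety net when streaming-format extraction fails, rather than an exception terminating the agent flow.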
@@ -467,70 +443,16 @@ def _handle_streaming_response(
                event=LLMStreamChunkEvent(chunk=chunk_content),
            )

            # --- 4) Fallback to non-streaming if no content received
LLM Streaming Response Error Handling Improvement
Purpose
This PR addresses critical errors that occur when LLM streaming responses don't match the expected format, causing agent flows to crash. We've replaced complex multi-stage extraction logic with a simpler, more reliable fallback mechanism.
Issue Observed
When using certain LLM providers or models (specifically encountered with "openai/gpt-4o"), streaming responses sometimes don't contain extractable content in the expected format, resulting in errors like:
[Agent] ERROR:root:Error in streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] ╭───────────────────────────────── LLM Error ──────────────────────────────────╮
[Agent] │ ❌ LLM Call Failed │
[Agent] │ Error: No content received from streaming response. Received empty chunks │
[Agent] │ or failed to extract content. │
[Agent] ╰──────────────────────────────────────────────────────────────────────────────╯
This exception terminates the agent flow, causing a poor user experience:
[Agent] Exception: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] [Flow._execute_single_listener] Error in method chat: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
Solution
This solution maintains the original content extraction logic but removes the multiple nested fallback attempts that were still failing in some cases. Instead, we:
- Keep the initial streaming request with usage metrics
- Try to extract content using the existing logic
- If extraction fails (when full_response is empty), fall back to non-streaming mode instead of raising an exception
This approach is more robust because it leverages the already working non-streaming implementation when streaming format extraction fails.
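On the usage-metrics point: for OpenAI-style providers, `stream_options={"include_usage": True}` makes the stream end with an extra chunk whose `usage` field carries the token counts (its `choices` list is empty), so metrics can still be collected while streaming. A minimal standalone sketch, assuming litellm passes the option through to the provider:

```python
import litellm

usage = None
full_response = ""

for chunk in litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},
):
    # Normal content chunks: choices[0].delta.content holds the next text piece.
    if chunk.choices and chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content

    # Final bookkeeping chunk: empty choices, populated usage.
    if getattr(chunk, "usage", None):
        usage = chunk.usage  # prompt_tokens, completion_tokens, total_tokens

print(full_response, usage)
```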
Benefits
- Prevents agent flows from crashing when streaming responses don't match expected formats
- Maintains all existing functionality (including usage metrics collection)
- Works reliably across various LLM providers without requiring provider-specific handling
- Simplifies code maintenance by reducing complexity in error handling
This change significantly improves reliability when working with different models and streaming response formats.
I have a few points:
1. I’d like to fully understand why the streaming extraction is failing. Could it be due to a tool misconfiguration?
2. It seems the capability to send an event for each tool_call was removed.
Overall, it looks like the stream feature with tools might have been dropped, and that’s what I’m concerned about.
Thanks for the review! Let me address each point:
1. Regarding why streaming extraction is failing:
   - The issue is not a tool misconfiguration but response format compatibility.
   - When using certain models (like gpt-4o via litellm), the streaming chunks returned don't match the structure the extraction code expects.
   - Specifically, the code expects chunks to expose text via a `delta` -> `content` path, but some models/providers return content in different shapes (a defensive extraction sketch follows after this list).
   - This is a common issue when working with multiple LLM providers through a unified interface.
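To illustrate the format mismatch (this is not code from the PR; the helper and the handled shapes are assumptions about what providers can return), a defensive extraction routine might look like:

```python
def extract_chunk_text(chunk) -> str:
    """Hypothetical helper: pull text out of a streaming chunk regardless of shape.

    OpenAI-style chunks expose choices[0].delta.content; some providers or
    wrappers may instead return plain dicts, a `message` object, or a bare
    `text` field. Returning "" keeps the caller's full_response empty so the
    non-streaming fallback can take over.
    """
    try:
        choice = chunk.choices[0] if getattr(chunk, "choices", None) else None
        if choice is None and isinstance(chunk, dict):
            choice = (chunk.get("choices") or [None])[0]
        if choice is None:
            return ""

        # Attribute-style access (OpenAI / litellm response objects).
        delta = getattr(choice, "delta", None)
        if delta is not None and getattr(delta, "content", None):
            return delta.content

        # Dict-style access.
        if isinstance(choice, dict):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                return delta["content"]
            message = choice.get("message") or {}
            if message.get("content"):
                return message["content"]
            if choice.get("text"):
                return choice["text"]
    except (AttributeError, IndexError, KeyError, TypeError):
        pass
    return ""
```

Even with defensive extraction, some streams yield no text at all, for example chunks that only carry tool-call deltas, and that empty-content case is exactly what the non-streaming fallback now covers.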
2. About the event emission for tool calls:
   - Tool call detection and handling is still fully present. The main tool call handling happens in the `_handle_tool_call` method, which remains unchanged.
   - The emission of events for tool calls is preserved in both remaining code sections:
     - Line ~550: `self._handle_emit_call_events(full_response, LLMCallType.LLM_CALL)`
     - Line ~596: `self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)`
   - What we removed was the redundant extraction attempts that were still resulting in errors.
3. Regarding streaming with tools:
   - Streaming with tools is not dropped; it still works exactly as before when the response format matches expectations.
   - We've simply added a fallback to non-streaming mode when content extraction fails, rather than crashing the flow with an exception.
   - The non-streaming mode still fully supports tool calls, so functionality is preserved even in fallback mode (see the sketch below).
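For completeness, a hypothetical sketch of why tool calls survive the fallback; the method body and signatures here are illustrative, but `_handle_tool_call` and `_handle_emit_call_events` are the existing handlers referenced above:

```python
def _handle_non_streaming_response(self, params, callbacks, available_functions):
    """Sketch only: the real method differs, but the relevant point is that
    tool calls are still detected and dispatched on this path."""
    response = litellm.completion(**params)
    message = response.choices[0].message

    # If the model asked for a tool, route it through the unchanged handler.
    if getattr(message, "tool_calls", None) and available_functions:
        result = self._handle_tool_call(message.tool_calls, available_functions)
        self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)
        return result

    # Otherwise emit the normal LLM-call event and return the text.
    self._handle_emit_call_events(message.content, LLMCallType.LLM_CALL)
    return message.content
```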
Let's discuss in more detail on our call.
Branch updated: 11b15e6 to ce4f36b (compare)