fix(llm): add fallback to non-streaming mode when content extraction fails #2605

Open
wants to merge 2 commits into base: feat-emit-stream-tool

Conversation

suhasdeshpande

When streaming responses don't contain extractable content, the code now falls back to non-streaming mode instead of raising an exception. This creates a more resilient system that can handle a wider variety of response formats from different LLM providers, preventing crashes in the agent flow.

The key changes:

  • Add fallback to non-streaming call when content extraction fails
  • Simplify error handling logic
  • Keep stream_options for usage metrics collection
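
In rough terms, the new behaviour looks like this (a simplified sketch with placeholder names, not the actual crewAI implementation):

```python
import logging
from typing import Callable, Iterable, Optional

# Simplified sketch of the fallback described above; names are placeholders,
# not the real crewAI internals.
def streaming_call_with_fallback(
    stream_call: Callable[[], Iterable[object]],         # issues the streaming request
    non_streaming_call: Callable[[], str],               # issues the plain request
    extract_content: Callable[[object], Optional[str]],  # existing chunk-extraction logic
) -> str:
    full_response = ""
    for chunk in stream_call():
        content = extract_content(chunk)
        if content:
            full_response += content  # an LLMStreamChunkEvent is emitted per chunk here

    if full_response.strip():
        return full_response

    # Previously this case raised "No content received from streaming response";
    # now we retry once without streaming instead of crashing the agent flow.
    logging.warning(
        "No extractable content in streaming response; falling back to a non-streaming call."
    )
    return non_streaming_call()
```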

lucasgomide and others added 2 commits April 10, 2025 17:43
…fails

@@ -467,70 +443,16 @@ def _handle_streaming_response(
event=LLMStreamChunkEvent(chunk=chunk_content),
)

# --- 4) Fallback to non-streaming if no content received
Author


LLM Streaming Response Error Handling Improvement

Purpose

This PR addresses critical errors that occur when LLM streaming responses don't match the expected format, causing agent flows to crash. We've replaced complex multi-stage extraction logic with a simpler, more reliable fallback mechanism.

Issue Observed

When using certain LLM providers or models (specifically encountered with "openai/gpt-4o"), streaming responses sometimes don't contain extractable content in the expected format, resulting in errors like:

[Agent] ERROR:root:Error in streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] ╭───────────────────────────────── LLM Error ──────────────────────────────────╮
[Agent] │  ❌ LLM Call Failed                                                          │
[Agent] │  Error: No content received from streaming response. Received empty chunks   │
[Agent] │  or failed to extract content.                                               │
[Agent] ╰──────────────────────────────────────────────────────────────────────────────╯

This exception terminates the agent flow, causing a poor user experience:

[Agent] Exception: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] [Flow._execute_single_listener] Error in method chat: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.

Solution

This solution maintains the original content extraction logic but removes the multiple nested fallback attempts that were still failing in some cases. Instead, we:

  1. Keep the initial streaming request with usage metrics
  2. Try to extract content using the existing logic
  3. If extraction fails (when full_response is empty), fall back to non-streaming mode instead of raising an exception

This approach is more robust because it leverages the already working non-streaming implementation when streaming format extraction fails.
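
In practice, the fallback amounts to re-issuing the same request without streaming, roughly as sketched below (this uses litellm directly for illustration; the actual code reuses the class's existing non-streaming path, and the streaming attempt still passes stream_options for usage metrics):

```python
from typing import Optional

import litellm

# Hedged sketch of what the non-streaming fallback boils down to.
# Parameter values are examples only.
def non_streaming_fallback(model: str, messages: list, tools: Optional[list] = None) -> str:
    response = litellm.completion(
        model=model,      # e.g. "openai/gpt-4o", where the issue was observed
        messages=messages,
        tools=tools,      # tool definitions are still honoured without streaming
        stream=False,
    )
    # Non-streaming responses expose the whole message directly,
    # so no chunk extraction is needed here.
    return response.choices[0].message.content or ""
```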

Benefits

  • Prevents agent flows from crashing when streaming responses don't match expected formats
  • Maintains all existing functionality (including usage metrics collection)
  • Works reliably across various LLM providers without requiring provider-specific handling
  • Simplifies code maintenance by reducing complexity in error handling

This change significantly improves reliability when working with different models and streaming response formats.

Contributor


I have a few points:
1. I’d like to fully understand why the streaming extraction is failing. Could it be due to a tool misconfiguration?
2. It seems the capability to send an event for each tool_call was removed.

Overall, it looks like the stream feature with tools might have been dropped, and that’s what I’m concerned about.

Author


Thanks for the review! Let me address each point:

  1. Regarding why streaming extraction is failing:

    • The issue is not with tool misconfiguration, but with response format compatibility.
    • When using certain models (like gpt-4o via litellm), the streaming chunks returned don't match the expected structure that the extraction code is looking for.
    • Specifically, the code expects chunks to have a clear "delta" -> "content" path (see the sketch after this list), but some models/providers return content in different formats.
    • This is a common issue when working with multiple LLM providers through a unified interface.
  2. About the event emission for tool calls:

    • The tool call detection and handling is still fully present. The main tool call handling happens in the _handle_tool_call method which remains unchanged.
    • The emission of events for tool calls is preserved in both the remaining code sections:
      • Line ~550: self._handle_emit_call_events(full_response, LLMCallType.LLM_CALL)
      • Line ~596: self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)
    • What we removed was the redundant extraction attempts that were still resulting in errors.
  3. Regarding streaming with tools:

    • Streaming with tools is absolutely not dropped; it still works exactly as before when the response format matches expectations.
    • We've simply added a fallback to non-streaming mode when content extraction fails, rather than crashing the flow with an exception.
    • The non-streaming mode still fully supports tool calls, so functionality is preserved even in fallback mode.
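
For reference, this is roughly the chunk shape the extraction relies on (a simplified sketch of OpenAI-style streaming chunks as normalised by litellm; providers that deviate from this shape are what trigger the fallback):

```python
from typing import Optional

def extract_chunk_content(chunk) -> Optional[str]:
    # OpenAI-style streaming chunks carry incremental text under choices[0].delta.content.
    choices = getattr(chunk, "choices", None)
    if not choices:
        return None  # e.g. the final usage-only chunk when include_usage is enabled
    delta = choices[0].delta
    # delta.content is None on role-only and tool_call chunks, so an empty
    # accumulated response does not by itself mean the provider misbehaved.
    return getattr(delta, "content", None)
```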

Let's discuss in more detail on our call.

@lucasgomide force-pushed the feat-emit-stream-tool branch 15 times, most recently from 11b15e6 to ce4f36b on April 16, 2025 at 22:49