fix(llm): add fallback to non-streaming mode when content extraction fails #2605
base: feat-emit-stream-tool
Conversation
When streaming responses don't contain extractable content, the code now falls back to non-streaming mode instead of raising an exception. This creates a more resilient system that can handle a wider variety of response formats from different LLM providers, preventing crashes in the agent flow.

The key changes:
- Add fallback to non-streaming call when content extraction fails
- Simplify error handling logic
- Keep stream_options for usage metrics collection

A rough sketch of the resulting control flow is shown below.
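As a rough illustration of that flow (not the exact diff: helper names such as `_extract_chunk_content` and `_handle_non_streaming_response` are placeholders for whatever the existing methods in `llm.py` are called, and the `crewai_event_bus.emit(...)` call abbreviates the existing event emission):

```python
def _handle_streaming_response(self, params, callbacks, available_functions):
    full_response = ""

    # Keep stream_options so the provider can report token usage on the stream.
    params["stream_options"] = {"include_usage": True}

    for chunk in litellm.completion(**params, stream=True):
        # Existing extraction logic: pull text out of the chunk when present.
        chunk_content = self._extract_chunk_content(chunk)
        if chunk_content:
            full_response += chunk_content
            crewai_event_bus.emit(
                self, event=LLMStreamChunkEvent(chunk=chunk_content)
            )

    # New behaviour: if nothing was extracted, retry without streaming
    # instead of raising "No content received from streaming response".
    if not full_response.strip():
        logging.warning(
            "No content extracted from streaming response; "
            "falling back to non-streaming mode."
        )
        params.pop("stream_options", None)
        return self._handle_non_streaming_response(
            params, callbacks, available_functions
        )

    self._handle_emit_call_events(full_response, LLMCallType.LLM_CALL)
    return full_response
```

The key property is that the already working non-streaming implementation becomes the safety net when streaming-format extraction fails, rather than an exception terminating the agent flow.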
@@ -467,70 +443,16 @@ def _handle_streaming_response(
                event=LLMStreamChunkEvent(chunk=chunk_content),
            )

            # --- 4) Fallback to non-streaming if no content received
LLM Streaming Response Error Handling Improvement
Purpose
This PR addresses critical errors that occur when LLM streaming responses don't match the expected format, causing agent flows to crash. We've replaced complex multi-stage extraction logic with a simpler, more reliable fallback mechanism.
Issue Observed
When using certain LLM providers or models (specifically encountered with "openai/gpt-4o"), streaming responses sometimes don't contain extractable content in the expected format, resulting in errors like:
[Agent] ERROR:root:Error in streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] ╭───────────────────────────────── LLM Error ──────────────────────────────────╮
[Agent] │ ❌ LLM Call Failed │
[Agent] │ Error: No content received from streaming response. Received empty chunks │
[Agent] │ or failed to extract content. │
[Agent] ╰──────────────────────────────────────────────────────────────────────────────╯
This exception terminates the agent flow, causing a poor user experience:
[Agent] Exception: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
[Agent] [Flow._execute_single_listener] Error in method chat: Failed to get streaming response: No content received from streaming response. Received empty chunks or failed to extract content.
Solution
This solution maintains the original content extraction logic but removes the multiple nested fallback attempts that were still failing in some cases. Instead, we:
- Keep the initial streaming request with usage metrics
- Try to extract content using the existing logic
- If extraction fails (when full_response is empty), fall back to non-streaming mode instead of raising an exception
This approach is more robust because it leverages the already working non-streaming implementation when streaming format extraction fails.
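On the usage-metrics point: for OpenAI-style providers, `stream_options={"include_usage": True}` makes the stream end with an extra chunk whose `usage` field carries the token counts (its `choices` list is empty), so metrics can still be collected while streaming. A minimal standalone sketch, assuming litellm passes the option through to the provider:

```python
import litellm

usage = None
full_response = ""

for chunk in litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},
):
    # Normal content chunks: choices[0].delta.content holds the next text piece.
    if chunk.choices and chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content

    # Final bookkeeping chunk: empty choices, populated usage.
    if getattr(chunk, "usage", None):
        usage = chunk.usage  # prompt_tokens, completion_tokens, total_tokens

print(full_response, usage)
```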
Benefits
- Prevents agent flows from crashing when streaming responses don't match expected formats
- Maintains all existing functionality (including usage metrics collection)
- Works reliably across various LLM providers without requiring provider-specific handling
- Simplifies code maintenance by reducing complexity in error handling
This change significantly improves reliability when working with different models and streaming response formats.
I have a few points:
1. I’d like to fully understand why the streaming extraction is failing. Could it be due to a tool misconfiguration?
2. It seems the capability to send an event for each tool_call was removed.
Overall, it looks like the stream feature with tools might have been dropped, and that’s what I’m concerned about.
Thanks for the review! Let me address each point:
1. Regarding why streaming extraction is failing:
   - The issue is not a tool misconfiguration but response format compatibility.
   - When using certain models (like gpt-4o via litellm), the streaming chunks returned don't match the structure the extraction code expects.
   - Specifically, the code expects chunks to expose text via a `delta` -> `content` path, but some models/providers return content in different shapes (a defensive extraction sketch follows after this list).
   - This is a common issue when working with multiple LLM providers through a unified interface.
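To illustrate the format mismatch (this is not code from the PR; the helper and the handled shapes are assumptions about what providers can return), a defensive extraction routine might look like:

```python
def extract_chunk_text(chunk) -> str:
    """Hypothetical helper: pull text out of a streaming chunk regardless of shape.

    OpenAI-style chunks expose choices[0].delta.content; some providers or
    wrappers may instead return plain dicts, a `message` object, or a bare
    `text` field. Returning "" keeps the caller's full_response empty so the
    non-streaming fallback can take over.
    """
    try:
        choice = chunk.choices[0] if getattr(chunk, "choices", None) else None
        if choice is None and isinstance(chunk, dict):
            choice = (chunk.get("choices") or [None])[0]
        if choice is None:
            return ""

        # Attribute-style access (OpenAI / litellm response objects).
        delta = getattr(choice, "delta", None)
        if delta is not None and getattr(delta, "content", None):
            return delta.content

        # Dict-style access.
        if isinstance(choice, dict):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                return delta["content"]
            message = choice.get("message") or {}
            if message.get("content"):
                return message["content"]
            if choice.get("text"):
                return choice["text"]
    except (AttributeError, IndexError, KeyError, TypeError):
        pass
    return ""
```

Even with defensive extraction, some streams yield no text at all, for example chunks that only carry tool-call deltas, and that empty-content case is exactly what the non-streaming fallback now covers.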
2. About the event emission for tool calls:
   - Tool call detection and handling is still fully present. The main tool call handling happens in the `_handle_tool_call` method, which remains unchanged.
   - The emission of events for tool calls is preserved in both remaining code sections:
     - Line ~550: `self._handle_emit_call_events(full_response, LLMCallType.LLM_CALL)`
     - Line ~596: `self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)`
   - What we removed was the redundant extraction attempts that were still resulting in errors.
3. Regarding streaming with tools:
   - Streaming with tools is not dropped; it still works exactly as before when the response format matches expectations.
   - We've simply added a fallback to non-streaming mode when content extraction fails, rather than crashing the flow with an exception.
   - The non-streaming mode still fully supports tool calls, so functionality is preserved even in fallback mode (see the sketch below).
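For completeness, a hypothetical sketch of why tool calls survive the fallback; the method body and signatures here are illustrative, but `_handle_tool_call` and `_handle_emit_call_events` are the existing handlers referenced above:

```python
def _handle_non_streaming_response(self, params, callbacks, available_functions):
    """Sketch only: the real method differs, but the relevant point is that
    tool calls are still detected and dispatched on this path."""
    response = litellm.completion(**params)
    message = response.choices[0].message

    # If the model asked for a tool, route it through the unchanged handler.
    if getattr(message, "tool_calls", None) and available_functions:
        result = self._handle_tool_call(message.tool_calls, available_functions)
        self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)
        return result

    # Otherwise emit the normal LLM-call event and return the text.
    self._handle_emit_call_events(message.content, LLMCallType.LLM_CALL)
    return message.content
```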
Let's discuss in more detail on our call.
Branch updated: 11b15e6 to ce4f36b (compare)