Failover support #344

Open · wants to merge 7 commits into `main`
1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -20,7 +20,6 @@ RubyLLM does one thing well: **LLM communication in Ruby**.
- **RAG support** → Use dedicated libraries
- **Prompt templates** → Use ERB/Mustache in your app
- **Model data fixes** → File with [Parsera](https://github.com/parsera-labs/api-llm-specs/issues)
-- **Auto-failover** → Use `.with_model()` (works mid-conversation, even across providers)
- **Tool interface changes** → Handle in your tool's initializer
- **Testing helpers** → Use dependency injection

106 changes: 105 additions & 1 deletion docs/_advanced/error-handling.md
@@ -29,6 +29,7 @@ After reading this guide, you will know:
* How errors are handled during streaming.
* Best practices for handling errors within Tools.
* RubyLLM's automatic retry behavior.
+* How to configure automatic failover to backup providers.
* How to enable debug logging.

## RubyLLM Error Hierarchy
@@ -222,12 +223,115 @@ export RUBYLLM_DEBUG=true

This will cause RubyLLM to log detailed information about API requests and responses, including headers and bodies (with sensitive data like API keys filtered), which can be invaluable for troubleshooting.

## Automatic Failover

RubyLLM provides built-in failover that automatically switches to backup providers when rate limits or transient server errors occur. This is especially useful for keeping production applications reliable.

### Basic Failover Configuration

Configure failover providers using the `with_failover` method:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  'gpt-4o-mini' # String shorthand for OpenAI models
)

# This will automatically fail over if the primary provider hits rate limits
response = chat.ask "What is the capital of France?"
```
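
Because `with_failover` returns the chat object itself (see the `lib/ruby_llm/chat.rb` diff below), it chains with RubyLLM's other builder methods. A minimal sketch, assuming the usual builder API such as `with_temperature`:

```ruby
# with_failover returns self, so it composes with other builder calls.
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
              .with_temperature(0.2)
              .with_failover('gpt-4o-mini')
```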

### Failover Configuration Options

The `with_failover` method accepts multiple configuration formats:

#### Hash Configuration (Full Control)
```ruby
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  { provider: :openai, model: 'gpt-4o-mini' }
)
```

#### String Configuration (Model Name Only)
```ruby
# Uses the default provider for the model
chat.with_failover('gpt-4o-mini', 'gemini-2.0-flash')
```
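
Under the hood, this PR normalizes a string by looking the model up in the registry and inferring its provider (see `with_failover` in the diff below). Roughly:

```ruby
# Rough sketch of how a string shorthand is normalized (per this PR's code):
model_info = RubyLLM::Models.find('gpt-4o-mini')
config = { model: 'gpt-4o-mini', provider: model_info.provider.to_sym }
# => { model: 'gpt-4o-mini', provider: :openai }
```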

#### Mixed Configuration
```ruby
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  'gpt-4o-mini', # String shorthand
  { provider: :gemini, model: 'gemini-2.0-flash' }
)
```

### Failover with Different Contexts

You can use different API keys or configurations for failover providers:

```ruby
# Create a backup context with different credentials
backup_context = RubyLLM.context do |config|
  config.anthropic_api_key = 'backup-api-key'
end

chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :anthropic, model: 'claude-3-5-haiku-20241022', context: backup_context }
)
```
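
The same pattern works for any provider; for example, a hypothetical backup OpenAI account (the environment variable name is illustrative):

```ruby
openai_backup = RubyLLM.context do |config|
  config.openai_api_key = ENV['OPENAI_BACKUP_KEY']
end

chat.with_failover(
  { provider: :openai, model: 'gpt-4o-mini', context: openai_backup }
)
```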

### When Failover Occurs

Failover is triggered by:

* **Rate Limit Errors (`RubyLLM::RateLimitError`)** - The primary use case
* **Service Unavailable Errors (`RubyLLM::ServiceUnavailableError`)** - When the provider is temporarily down
* **Overloaded Errors (`RubyLLM::OverloadedError`)** - When the provider is overloaded
* **Server Errors (`RubyLLM::ServerError`)** - When the provider returns 5xx errors

**Note:** Failover does **not** occur for client errors like `BadRequestError` (400), `UnauthorizedError` (401), or `ForbiddenError` (403), as these typically indicate configuration issues that would affect all providers.
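
For example, an invalid API key raises immediately rather than walking the backup chain. A short sketch:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

begin
  chat.ask "Hello"
rescue RubyLLM::UnauthorizedError => e
  # A 401 signals a configuration problem, not a transient failure,
  # so no backup provider is attempted.
  warn "Check your credentials: #{e.message}"
end
```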

### Failover Behavior

1. RubyLLM attempts the request with the primary provider/model
2. If a failover-eligible error occurs, it tries the first backup configuration
3. If that also fails with a failover-eligible error, it tries the next backup
4. This continues until either a request succeeds or all configurations are exhausted
5. If all configurations fail, the last error is raised
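
Note that, as currently implemented in this PR, a successful failover leaves the chat on the backup configuration: `attempt_failover` switches via `with_model` and only restores the original when every configuration fails. A sketch, assuming the chat exposes a `model` reader:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

response = chat.ask "Hello" # suppose Anthropic was rate limited here
chat.model.id               #=> "gpt-4o-mini"; subsequent asks stay on the backup
```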

### Example: Production-Ready Setup

```ruby
# Primary: fast, cheap model
# Backup: more capable (and pricier) model from the same provider
# Final: different providers entirely
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :anthropic, model: 'claude-3-5-sonnet-20241022' }, # More capable model
  { provider: :openai, model: 'gpt-4o' },                        # Different provider
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' }     # AWS backup
)

begin
  response = chat.ask "Analyze this data..."
  puts response.content
rescue RubyLLM::Error => e
  # Only reached if all providers fail
  puts "All providers failed: #{e.message}"
end
```

## Best Practices

* **Be Specific:** Rescue specific error classes whenever possible for tailored recovery logic.
* **Log Errors:** Always log errors, including relevant context (model used, input data if safe) for debugging. Consider using the `response` attribute on `RubyLLM::Error` for more details.
* **User Feedback:** Provide clear, user-friendly feedback when an AI operation fails. Avoid exposing raw API error messages directly.
-* **Fallbacks:** Consider fallback mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function.
+* **Fallbacks:** Consider failover mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function. Use `with_failover` for automatic provider-level failover.
* **Monitor:** Track the frequency of different error types in production to identify recurring issues with providers or your implementation.
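
A sketch tying several of these practices together (the `Logger` setup is illustrative, not part of RubyLLM):

```ruby
require 'logger'

logger = Logger.new($stdout)

chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

begin
  response = chat.ask "Summarize this report..."
  puts response.content
rescue RubyLLM::RateLimitError => e
  # Specific rescue first: every configured provider was rate limited.
  logger.warn("All providers rate limited: #{e.message}")
  # Fall back to cached data or a default response here.
rescue RubyLLM::Error => e
  # Log with context for debugging, but show users a friendly message,
  # not the raw API error.
  logger.error("LLM request failed (#{e.class}): #{e.message}")
end
```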

## Next Steps
82 changes: 71 additions & 11 deletions lib/ruby_llm/chat.rb
@@ -28,6 +28,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
       @params = {}
       @headers = {}
       @schema = nil
+      @failover_configurations = []
       @on = {
         new_message: nil,
         end_message: nil,
@@ -111,6 +112,21 @@ def with_schema(schema, force: false)
       self
     end
 
+    def with_failover(*configurations)
+      @failover_configurations = configurations.map do |config|
+        case config
+        when Hash
+          config
+        when String
+          model_info = Models.find(config)
+          { model: config, provider: model_info.provider.to_sym }
+        else
+          raise ArgumentError, "Invalid failover configuration: #{config}"
+        end
+      end
+      self
+    end

     def on_new_message(&block)
       @on[:new_message] = block
       self
@@ -136,16 +152,24 @@ def each(&)
     end
 
     def complete(&) # rubocop:disable Metrics/PerceivedComplexity
-      response = @provider.complete(
-        messages,
-        tools: @tools,
-        temperature: @temperature,
-        model: @model.id,
-        params: @params,
-        headers: @headers,
-        schema: @schema,
-        &wrap_streaming_block(&)
-      )
+      original_provider = @provider
+      original_model = @model
+
+      begin
+        response = @provider.complete(
+          messages,
+          tools: @tools,
+          temperature: @temperature,
+          model: @model.id,
+          params: @params,
+          headers: @headers,
+          schema: @schema,
+          &wrap_streaming_block(&)
+        )
+      rescue RubyLLM::RateLimitError, RubyLLM::ServiceUnavailableError,
+             RubyLLM::OverloadedError, RubyLLM::ServerError => e
+        response = attempt_failover(original_provider, original_model, e, &)
+      end
 
       @on[:new_message]&.call unless block_given?

@@ -220,8 +244,44 @@ def execute_tool(tool_call)
       tool.call(args)
     end
 
+    def attempt_failover(original_provider, original_model, original_error, &)
+      raise original_error unless @failover_configurations.any?
+
+      failover_index = 0
+      response = nil
+
+      @failover_configurations.each do |config|
+        with_context(config[:context]) if config[:context]
+        with_model(config[:model], provider: config[:provider])
+        response = @provider.complete(
+          messages,
+          tools: @tools,
+          temperature: @temperature,
+          model: @model.id,
+          params: @params,
+          headers: @headers,
+          schema: @schema,
+          &wrap_streaming_block(&)
+        )
+        break
+      rescue RateLimitError, ServiceUnavailableError, OverloadedError, ServerError => e
+        raise e if failover_index == @failover_configurations.size - 1
+
+        failover_index += 1
+        next
+      end
+
+      unless response
+        @provider = original_provider
> **Review comment (Contributor Author):** Need to restore context and config also. Probably should just make methods for save original and restore original.
+        @model = original_model
+        raise original_error
+      end
+
+      response
+    end

     def instance_variables
-      super - %i[@connection @config]
+      super - %i[@connection @config @failover_configurations]
     end
   end
 end
