Failover support #344

Open · wants to merge 7 commits into `main`
1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -20,7 +20,6 @@ RubyLLM does one thing well: **LLM communication in Ruby**.
- **RAG support** → Use dedicated libraries
- **Prompt templates** → Use ERB/Mustache in your app
- **Model data fixes** → File with [Parsera](https://github.com/parsera-labs/api-llm-specs/issues)
-- **Auto-failover** → Use `.with_model()` (works mid-conversation, even across providers)
- **Tool interface changes** → Handle in your tool's initializer
- **Testing helpers** → Use dependency injection

106 changes: 105 additions & 1 deletion docs/_advanced/error-handling.md
@@ -29,6 +29,7 @@ After reading this guide, you will know:
* How errors are handled during streaming.
* Best practices for handling errors within Tools.
* RubyLLM's automatic retry behavior.
+* How to configure automatic failover to backup providers.
* How to enable debug logging.

## RubyLLM Error Hierarchy
@@ -222,12 +223,115 @@ export RUBYLLM_DEBUG=true

This will cause RubyLLM to log detailed information about API requests and responses, including headers and bodies (with sensitive data like API keys filtered), which can be invaluable for troubleshooting.

## Automatic Failover

RubyLLM provides built-in failover that automatically switches to backup providers when rate limits or transient server errors occur. This is especially useful for keeping production applications reliable.

### Basic Failover Configuration

Configure failover providers using the `with_failover` method:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  'gpt-4o-mini' # String shorthand for OpenAI models
)

# This will automatically fail over if the primary provider hits rate limits
response = chat.ask "What is the capital of France?"
```
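
Because `with_failover` returns the chat object itself (see the `lib/ruby_llm/chat.rb` diff below), it chains with RubyLLM's other builder methods. A minimal sketch, assuming the usual builder API such as `with_temperature`:

```ruby
# with_failover returns self, so it composes with other builder calls.
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
              .with_temperature(0.2)
              .with_failover('gpt-4o-mini')
```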

### Failover Configuration Options

The `with_failover` method accepts multiple configuration formats:

#### Hash Configuration (Full Control)
```ruby
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  { provider: :openai, model: 'gpt-4o-mini' }
)
```

#### String Configuration (Model Name Only)
```ruby
# Uses the default provider for the model
chat.with_failover('gpt-4o-mini', 'gemini-2.0-flash')
```
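
Under the hood, this PR normalizes a string by looking the model up in the registry and inferring its provider (see `with_failover` in the diff below). Roughly:

```ruby
# Rough sketch of how a string shorthand is normalized (per this PR's code):
model_info = RubyLLM::Models.find('gpt-4o-mini')
config = { model: 'gpt-4o-mini', provider: model_info.provider.to_sym }
# => { model: 'gpt-4o-mini', provider: :openai }
```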

#### Mixed Configuration
```ruby
chat.with_failover(
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
  'gpt-4o-mini', # String shorthand
  { provider: :gemini, model: 'gemini-2.0-flash' }
)
```

### Failover with Different Contexts

You can use different API keys or configurations for failover providers:

```ruby
# Create a backup context with different credentials
backup_context = RubyLLM.context do |config|
  config.anthropic_api_key = 'backup-api-key'
end

chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :anthropic, model: 'claude-3-5-haiku-20241022', context: backup_context }
)
```
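
The same pattern works for any provider; for example, a hypothetical backup OpenAI account (the environment variable name is illustrative):

```ruby
openai_backup = RubyLLM.context do |config|
  config.openai_api_key = ENV['OPENAI_BACKUP_KEY']
end

chat.with_failover(
  { provider: :openai, model: 'gpt-4o-mini', context: openai_backup }
)
```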

### When Failover Occurs

Failover is triggered by:

* **Rate Limit Errors (`RubyLLM::RateLimitError`)** - The primary use case
* **Service Unavailable Errors (`RubyLLM::ServiceUnavailableError`)** - When the provider is temporarily down
* **Overloaded Errors (`RubyLLM::OverloadedError`)** - When the provider is overloaded
* **Server Errors (`RubyLLM::ServerError`)** - When the provider returns 5xx errors

**Note:** Failover does **not** occur for client errors like `BadRequestError` (400), `UnauthorizedError` (401), or `ForbiddenError` (403), as these typically indicate configuration issues that would affect all providers.
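
For example, an invalid API key raises immediately rather than walking the backup chain. A short sketch:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

begin
  chat.ask "Hello"
rescue RubyLLM::UnauthorizedError => e
  # A 401 signals a configuration problem, not a transient failure,
  # so no backup provider is attempted.
  warn "Check your credentials: #{e.message}"
end
```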

### Failover Behavior

1. RubyLLM attempts the request with the primary provider/model
2. If a failover-eligible error occurs, it tries the first backup configuration
3. If that also fails with a failover-eligible error, it tries the next backup
4. This continues until either a request succeeds or all configurations are exhausted
5. If all configurations fail, the last error is raised
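
Note that, as currently implemented in this PR, a successful failover leaves the chat on the backup configuration: `attempt_failover` switches via `with_model` and only restores the original when every configuration fails. A sketch, assuming the chat exposes a `model` reader:

```ruby
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

response = chat.ask "Hello" # suppose Anthropic was rate limited here
chat.model.id               #=> "gpt-4o-mini"; subsequent asks stay on the backup
```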

### Example: Production-Ready Setup

```ruby
# Primary: fast, cheap model
# Backup: more capable (and pricier) model from the same provider
# Final: different providers entirely
chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover(
  { provider: :anthropic, model: 'claude-3-5-sonnet-20241022' }, # More capable model
  { provider: :openai, model: 'gpt-4o' },                        # Different provider
  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' }     # AWS backup
)

begin
  response = chat.ask "Analyze this data..."
  puts response.content
rescue RubyLLM::Error => e
  # Only reached if all providers fail
  puts "All providers failed: #{e.message}"
end
```

## Best Practices

* **Be Specific:** Rescue specific error classes whenever possible for tailored recovery logic.
* **Log Errors:** Always log errors, including relevant context (model used, input data if safe) for debugging. Consider using the `response` attribute on `RubyLLM::Error` for more details.
* **User Feedback:** Provide clear, user-friendly feedback when an AI operation fails. Avoid exposing raw API error messages directly.
-* **Fallbacks:** Consider fallback mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function.
+* **Fallbacks:** Consider failover mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function. Use `with_failover` for automatic provider-level failover.
* **Monitor:** Track the frequency of different error types in production to identify recurring issues with providers or your implementation.
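
A sketch tying several of these practices together (the `Logger` setup is illustrative, not part of RubyLLM):

```ruby
require 'logger'

logger = Logger.new($stdout)

chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
chat.with_failover('gpt-4o-mini')

begin
  response = chat.ask "Summarize this report..."
  puts response.content
rescue RubyLLM::RateLimitError => e
  # Specific rescue first: every configured provider was rate limited.
  logger.warn("All providers rate limited: #{e.message}")
  # Fall back to cached data or a default response here.
rescue RubyLLM::Error => e
  # Log with context for debugging, but show users a friendly message,
  # not the raw API error.
  logger.error("LLM request failed (#{e.class}): #{e.message}")
end
```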

## Next Steps
82 changes: 71 additions & 11 deletions lib/ruby_llm/chat.rb
@@ -28,6 +28,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
       @params = {}
       @headers = {}
       @schema = nil
+      @failover_configurations = []
       @on = {
         new_message: nil,
         end_message: nil,
@@ -111,6 +112,21 @@ def with_schema(schema, force: false)
       self
     end
 
+    def with_failover(*configurations)
+      @failover_configurations = configurations.map do |config|
+        case config
+        when Hash
+          config
+        when String
+          model_info = Models.find(config)
+          { model: config, provider: model_info.provider.to_sym }
+        else
+          raise ArgumentError, "Invalid failover configuration: #{config}"
+        end
+      end
+      self
+    end

     def on_new_message(&block)
       @on[:new_message] = block
       self
@@ -136,16 +152,24 @@ def each(&)
     end
 
     def complete(&) # rubocop:disable Metrics/PerceivedComplexity
-      response = @provider.complete(
-        messages,
-        tools: @tools,
-        temperature: @temperature,
-        model: @model.id,
-        params: @params,
-        headers: @headers,
-        schema: @schema,
-        &wrap_streaming_block(&)
-      )
+      original_provider = @provider
+      original_model = @model
+
+      begin
+        response = @provider.complete(
+          messages,
+          tools: @tools,
+          temperature: @temperature,
+          model: @model.id,
+          params: @params,
+          headers: @headers,
+          schema: @schema,
+          &wrap_streaming_block(&)
+        )
+      rescue RubyLLM::RateLimitError, RubyLLM::ServiceUnavailableError,
+             RubyLLM::OverloadedError, RubyLLM::ServerError => e
+        response = attempt_failover(original_provider, original_model, e, &)
+      end
 
       @on[:new_message]&.call unless block_given?

@@ -220,8 +244,44 @@ def execute_tool(tool_call)
       tool.call(args)
     end
 
+    def attempt_failover(original_provider, original_model, original_error, &)
+      raise original_error unless @failover_configurations.any?
+
+      failover_index = 0
+      response = nil
+
+      @failover_configurations.each do |config|
+        with_context(config[:context]) if config[:context]
+        with_model(config[:model], provider: config[:provider])
+        response = @provider.complete(
+          messages,
+          tools: @tools,
+          temperature: @temperature,
+          model: @model.id,
+          params: @params,
+          headers: @headers,
+          schema: @schema,
+          &wrap_streaming_block(&)
+        )
+        break
+      rescue RateLimitError, ServiceUnavailableError, OverloadedError, ServerError => e
+        raise e if failover_index == @failover_configurations.size - 1
+
+        failover_index += 1
+        next
+      end
+
+      unless response
+        @provider = original_provider
> **Review comment (Contributor Author):** Need to restore context and config also. Probably should just make methods for save original and restore original.
+        @model = original_model
+        raise original_error
+      end
+
+      response
+    end

     def instance_variables
-      super - %i[@connection @config]
+      super - %i[@connection @config @failover_configurations]
     end
   end
 end
