diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 196aff76..065e98d9 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -20,7 +20,6 @@ RubyLLM does one thing well: **LLM communication in Ruby**.
 - **RAG support** → Use dedicated libraries
 - **Prompt templates** → Use ERB/Mustache in your app
 - **Model data fixes** → File with [Parsera](https://github.com/parsera-labs/api-llm-specs/issues)
-- **Auto-failover** → Use `.with_model()` (works mid-conversation, even across providers)
 - **Tool interface changes** → Handle in your tool's initializer
 - **Testing helpers** → Use dependency injection
diff --git a/docs/_advanced/error-handling.md b/docs/_advanced/error-handling.md
index 1128e837..53f54f69 100644
--- a/docs/_advanced/error-handling.md
+++ b/docs/_advanced/error-handling.md
@@ -29,6 +29,7 @@ After reading this guide, you will know:
 * How errors are handled during streaming.
 * Best practices for handling errors within Tools.
 * RubyLLM's automatic retry behavior.
+* How to configure automatic failover to backup providers.
 * How to enable debug logging.
 
 ## RubyLLM Error Hierarchy
@@ -222,12 +223,115 @@ export RUBYLLM_DEBUG=true
 
 This will cause RubyLLM to log detailed information about API requests and responses, including headers and bodies (with sensitive data like API keys filtered), which can be invaluable for troubleshooting.
 
+## Automatic Failover
+
+RubyLLM provides built-in failover that automatically switches to backup providers when rate limits or other transient errors occur. This is especially useful for keeping production applications reliable.
+
+### Basic Failover Configuration
+
+Configure failover providers using the `with_failover` method:
+
+```ruby
+chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
+chat.with_failover(
+  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
+  'gpt-4o-mini' # String shorthand for OpenAI models
+)
+
+# This will automatically fail over if the primary provider hits rate limits
+response = chat.ask "What is the capital of France?"
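+
+# If a backup answers, the chat keeps using that configuration,
+# so follow-up messages go to the provider that succeeded
+chat.ask "And what is the capital of Germany?"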
+```
+
+### Failover Configuration Options
+
+The `with_failover` method accepts multiple configuration formats:
+
+#### Hash Configuration (Full Control)
+```ruby
+chat.with_failover(
+  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
+  { provider: :openai, model: 'gpt-4o-mini' }
+)
+```
+
+#### String Configuration (Model Name Only)
+```ruby
+# A bare model name resolves to that model's default provider
+chat.with_failover('gpt-4o-mini', 'gemini-2.0-flash')
+```
+
+#### Mixed Configuration
+```ruby
+chat.with_failover(
+  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' },
+  'gpt-4o-mini', # String shorthand
+  { provider: :gemini, model: 'gemini-2.0-flash' }
+)
+```
+
+### Failover with Different Contexts
+
+You can use different API keys or configurations for failover providers:
+
+```ruby
+# Create a backup context with different credentials
+backup_context = RubyLLM.context do |config|
+  config.anthropic_api_key = 'backup-api-key'
+end
+
+chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
+chat.with_failover(
+  { provider: :anthropic, model: 'claude-3-5-haiku-20241022', context: backup_context }
+)
+```
+
+### When Failover Occurs
+
+Failover is triggered by:
+
+* **Rate Limit Errors (`RubyLLM::RateLimitError`)** - The primary use case
+* **Service Unavailable Errors (`RubyLLM::ServiceUnavailableError`)** - When the provider is temporarily down
+* **Overloaded Errors (`RubyLLM::OverloadedError`)** - When the provider reports it is overloaded
+* **Server Errors (`RubyLLM::ServerError`)** - When the provider returns 5xx errors
+
+**Note:** Failover does **not** occur for client errors like `BadRequestError` (400), `UnauthorizedError` (401), or `ForbiddenError` (403), as these typically indicate configuration issues that would affect all providers.
+
+### Failover Behavior
+
+1. RubyLLM attempts the request with the primary provider/model
+2. If a failover-eligible error occurs, it tries the first backup configuration
+3. If that also fails with a failover-eligible error, it tries the next backup
+4. This continues until either a request succeeds or all configurations are exhausted
+5. If all configurations fail, the last error is raised
+
+After a successful failover, the chat keeps using the backup configuration for subsequent messages.
+
+### Example: Production-Ready Setup
+
+```ruby
+# Primary: a fast, cheap model
+# Backups: a more capable model, a different provider, and an AWS fallback
+chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-5-haiku-20241022')
+chat.with_failover(
+  { provider: :anthropic, model: 'claude-3-5-sonnet-20241022' }, # More capable model
+  { provider: :openai, model: 'gpt-4o' },                        # Different provider
+  { provider: :bedrock, model: 'claude-3-5-haiku-20241022' }     # AWS backup
+)
+
+begin
+  response = chat.ask "Analyze this data..."
+  puts response.content
+rescue RubyLLM::Error => e
+  # Only reached if all providers fail
+  puts "All providers failed: #{e.message}"
+end
+```
+
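+Because failover switches the chat's model and provider in place, you can check
+which configuration ultimately served a request. A minimal sketch, assuming the
+`chat` from the example above:
+
+```ruby
+response = chat.ask "Analyze this data..."
+puts chat.model.id # reflects the backup model if a failover occurred
+```
+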
 ## Best Practices
 
 * **Be Specific:** Rescue specific error classes whenever possible for tailored recovery logic.
 * **Log Errors:** Always log errors, including relevant context (model used, input data if safe) for debugging. Consider using the `response` attribute on `RubyLLM::Error` for more details.
 * **User Feedback:** Provide clear, user-friendly feedback when an AI operation fails. Avoid exposing raw API error messages directly.
-* **Fallbacks:** Consider fallback mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function.
+* **Fallbacks:** Consider fallback mechanisms (e.g., trying a different model, using cached data, providing a default response) if the AI service is critical to your application's function. Use `with_failover` for automatic provider-level failover.
 * **Monitor:** Track the frequency of different error types in production to identify recurring issues with providers or your implementation.
 
 ## Next Steps
diff --git a/lib/ruby_llm/chat.rb b/lib/ruby_llm/chat.rb
index a5f70117..2e3249d6 100644
--- a/lib/ruby_llm/chat.rb
+++ b/lib/ruby_llm/chat.rb
@@ -28,6 +28,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
     @params = {}
     @headers = {}
     @schema = nil
+    @failover_configurations = []
     @on = {
       new_message: nil,
       end_message: nil,
@@ -111,6 +112,21 @@ def with_schema(schema, force: false)
     self
   end
 
+    # Registers backup configurations to try, in order, when a request fails
+    # with a rate-limit or server-side error. Accepts { provider:, model:,
+    # context: } hashes and bare model-name strings, which resolve to the
+    # model's default provider.
+    def with_failover(*configurations)
+      @failover_configurations = configurations.map do |config|
+        case config
+        when Hash
+          config
+        when String
+          model_info = Models.find(config)
+          { model: config, provider: model_info.provider.to_sym }
+        else
+          raise ArgumentError, "Invalid failover configuration: #{config}"
+        end
+      end
+      self
+    end
+
   def on_new_message(&block)
     @on[:new_message] = block
     self
@@ -136,16 +152,24 @@ def each(&)
   end
 
   def complete(&) # rubocop:disable Metrics/PerceivedComplexity
-    response = @provider.complete(
-      messages,
-      tools: @tools,
-      temperature: @temperature,
-      model: @model.id,
-      params: @params,
-      headers: @headers,
-      schema: @schema,
-      &wrap_streaming_block(&)
-    )
+    original_provider = @provider
+    original_model = @model
+
+    begin
+      response = @provider.complete(
+        messages,
+        tools: @tools,
+        temperature: @temperature,
+        model: @model.id,
+        params: @params,
+        headers: @headers,
+        schema: @schema,
+        &wrap_streaming_block(&)
+      )
+    rescue RubyLLM::RateLimitError, RubyLLM::ServiceUnavailableError,
+           RubyLLM::OverloadedError, RubyLLM::ServerError => e
+      response = attempt_failover(original_provider, original_model, e, &)
+    end
 
     @on[:new_message]&.call unless block_given?
@@ -220,8 +244,44 @@ def execute_tool(tool_call)
     tool.call(args)
   end
 
+    def attempt_failover(original_provider, original_model, original_error, &)
+      raise original_error unless @failover_configurations.any?
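+      # Try each backup in order: switch the chat to that configuration
+      # (including its context, if given) and retry the request. If a backup
+      # also fails with a failover-eligible error, move on; the last error is
+      # re-raised once every backup is exhausted.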
+
+      response = nil
+
+      @failover_configurations.each_with_index do |config, index|
+        with_context(config[:context]) if config[:context]
+        with_model(config[:model], provider: config[:provider])
+        response = @provider.complete(
+          messages,
+          tools: @tools,
+          temperature: @temperature,
+          model: @model.id,
+          params: @params,
+          headers: @headers,
+          schema: @schema,
+          &wrap_streaming_block(&)
+        )
+        break
+      rescue RateLimitError, ServiceUnavailableError, OverloadedError, ServerError => e
+        raise e if index == @failover_configurations.size - 1
+      end
+
+      unless response
+        @provider = original_provider
+        @model = original_model
+        raise original_error
+      end
+
+      response
+    end
+
     def instance_variables
-      super - %i[@connection @config]
+      super - %i[@connection @config @failover_configurations]
     end
   end
 end
diff --git a/spec/fixtures/vcr_cassettes/chat_with_failover_does_not_fail_over_when_bad_request_errors_are_raised.yml b/spec/fixtures/vcr_cassettes/chat_with_failover_does_not_fail_over_when_bad_request_errors_are_raised.yml
new file mode 100644
index 00000000..5299eb42
--- /dev/null
+++ b/spec/fixtures/vcr_cassettes/chat_with_failover_does_not_fail_over_when_bad_request_errors_are_raised.yml
@@ -0,0 +1,58 @@
+---
+http_interactions:
+- request:
+    method: post
+    uri: https://api.anthropic.com/v1/messages
+    body:
+      encoding: UTF-8
+      string: '{"model":"claude-3-7-sonnet-20250219","messages":[{"role":"user","content":[{"type":"text","text":""}]}],"stream":false,"max_tokens":64000,"temperature":0.7}'
+    headers:
+      User-Agent:
+      - Faraday v2.13.1
+      X-Api-Key:
+      - ""
+      Anthropic-Version:
+      - '2023-06-01'
+      Content-Type:
+      - application/json
+      Accept-Encoding:
+      - gzip;q=1.0,deflate;q=0.6,identity;q=0.3
+      Accept:
+      - "*/*"
+  response:
+    status:
+      code: 400
+      message: Bad Request
+    headers:
+      Date:
+      - Tue, 12 Aug 2025 22:49:39 GMT
+      Content-Type:
+      - application/json
+      Content-Length:
+      - '120'
+      Connection:
+      - keep-alive
+      Cf-Ray:
+      - ""
+      X-Should-Retry:
+      - 'false'
+      Request-Id:
+      - ""
+      Strict-Transport-Security:
+      - max-age=31536000; includeSubDomains; preload
+      Anthropic-Organization-Id:
+      - ""
+      Via:
+      - 1.1 google
+      Cf-Cache-Status:
+      - DYNAMIC
+      X-Robots-Tag:
+      - none
+      Server:
+      - cloudflare
+    body:
+      encoding: UTF-8
+      string: '{"type":"error","error":{"type":"invalid_request_error","message":"prompt
+        is too long: 200021 tokens > 200000 maximum"}}'
+  recorded_at: Tue, 12 Aug 2025 22:49:39 GMT
+recorded_with: VCR 6.3.1
diff --git a/spec/fixtures/vcr_cassettes/chat_with_failover_fails_over_to_the_next_provider_when_the_first_one_fails.yml b/spec/fixtures/vcr_cassettes/chat_with_failover_fails_over_to_the_next_provider_when_the_first_one_fails.yml
new file mode 100644
index 00000000..1fba4f93
--- /dev/null
+++ b/spec/fixtures/vcr_cassettes/chat_with_failover_fails_over_to_the_next_provider_when_the_first_one_fails.yml
@@ -0,0 +1,396 @@
+---
+http_interactions:
+- request:
+    method: post
+    uri: https://api.anthropic.com/v1/messages
+    body:
+      encoding: UTF-8
+      string: '{"model":"claude-3-7-sonnet-20250219","messages":[{"role":"user","content":[{"type":"text","text":"
+        What is the capital of France?"}]}],"stream":false,"max_tokens":64000,"temperature":0.7}'
+    headers:
+      User-Agent:
+      - Faraday v2.13.4
+      X-Api-Key:
+      - ""
+      Anthropic-Version:
+      - '2023-06-01'
+      Content-Type:
+      - application/json
+      Accept-Encoding:
+      - gzip;q=1.0,deflate;q=0.6,identity;q=0.3
+      Accept:
+      - "*/*"
+  response:
+    status:
+      code: 200
+      message: OK
+    headers:
+      Date:
+      - Mon, 11 Aug 
2025 23:45:24 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Anthropic-Ratelimit-Input-Tokens-Limit: + - '200000' + Anthropic-Ratelimit-Input-Tokens-Remaining: + - '112000' + Anthropic-Ratelimit-Input-Tokens-Reset: + - '2025-08-11T23:45:50Z' + Anthropic-Ratelimit-Output-Tokens-Limit: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Reset: + - '2025-08-11T23:45:24Z' + Anthropic-Ratelimit-Requests-Limit: + - '4000' + Anthropic-Ratelimit-Requests-Remaining: + - '3999' + Anthropic-Ratelimit-Requests-Reset: + - '2025-08-11T23:45:09Z' + Anthropic-Ratelimit-Tokens-Limit: + - '280000' + Anthropic-Ratelimit-Tokens-Remaining: + - '192000' + Anthropic-Ratelimit-Tokens-Reset: + - '2025-08-11T23:45:24Z' + Request-Id: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + Anthropic-Organization-Id: + - "" + Via: + - 1.1 google + Cf-Cache-Status: + - DYNAMIC + X-Robots-Tag: + - none + Server: + - cloudflare + Cf-Ray: + - "" + body: + encoding: ASCII-8BIT + string: '{"id":"msg_01TT4JjBFwG4xNknVkVvmknJ","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[{"type":"text","text":"The + capital of France is Paris."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":136015,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":10,"service_tier":"standard"}}' + recorded_at: Mon, 11 Aug 2025 23:45:24 GMT +- request: + method: post + uri: https://api.anthropic.com/v1/messages + body: + encoding: UTF-8 + string: '{"model":"claude-3-7-sonnet-20250219","messages":[{"role":"user","content":[{"type":"text","text":" + What is the capital of France?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of France is Paris."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Germany?"}]}],"stream":false,"max_tokens":64000,"temperature":0.7}' + headers: + User-Agent: + - Faraday v2.13.4 + X-Api-Key: + - "" + Anthropic-Version: + - '2023-06-01' + Content-Type: + - application/json + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 200 + message: OK + headers: + Date: + - Mon, 11 Aug 2025 23:45:41 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Anthropic-Ratelimit-Input-Tokens-Limit: + - '200000' + Anthropic-Ratelimit-Input-Tokens-Remaining: + - '35000' + Anthropic-Ratelimit-Input-Tokens-Reset: + - '2025-08-11T23:46:31Z' + Anthropic-Ratelimit-Output-Tokens-Limit: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Reset: + - '2025-08-11T23:45:41Z' + Anthropic-Ratelimit-Requests-Limit: + - '4000' + Anthropic-Ratelimit-Requests-Remaining: + - '3999' + Anthropic-Ratelimit-Requests-Reset: + - '2025-08-11T23:45:25Z' + Anthropic-Ratelimit-Tokens-Limit: + - '280000' + Anthropic-Ratelimit-Tokens-Remaining: + - '115000' + Anthropic-Ratelimit-Tokens-Reset: + - '2025-08-11T23:45:41Z' + Request-Id: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + Anthropic-Organization-Id: + - "" + Via: + - 1.1 google + Cf-Cache-Status: + - DYNAMIC + X-Robots-Tag: + - none + Server: + - cloudflare + Cf-Ray: + - "" + body: + encoding: ASCII-8BIT + string: 
'{"id":"msg_01CGKftupKeG9XamZvU18Stw","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[{"type":"text","text":"The + capital of Germany is Berlin."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":136035,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":10,"service_tier":"standard"}}' + recorded_at: Mon, 11 Aug 2025 23:45:41 GMT +- request: + method: post + uri: https://api.anthropic.com/v1/messages + body: + encoding: UTF-8 + string: '{"model":"claude-3-7-sonnet-20250219","messages":[{"role":"user","content":[{"type":"text","text":" + What is the capital of France?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of France is Paris."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Germany?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of Germany is Berlin."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Italy?"}]}],"stream":false,"max_tokens":64000,"temperature":0.7}' + headers: + User-Agent: + - Faraday v2.13.4 + X-Api-Key: + - "" + Anthropic-Version: + - '2023-06-01' + Content-Type: + - application/json + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 200 + message: OK + headers: + Date: + - Mon, 11 Aug 2025 23:45:58 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Anthropic-Ratelimit-Input-Tokens-Limit: + - '200000' + Anthropic-Ratelimit-Input-Tokens-Remaining: + - '0' + Anthropic-Ratelimit-Input-Tokens-Reset: + - '2025-08-11T23:47:12Z' + Anthropic-Ratelimit-Output-Tokens-Limit: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Reset: + - '2025-08-11T23:45:58Z' + Anthropic-Ratelimit-Requests-Limit: + - '4000' + Anthropic-Ratelimit-Requests-Remaining: + - '3999' + Anthropic-Ratelimit-Requests-Reset: + - '2025-08-11T23:45:43Z' + Anthropic-Ratelimit-Tokens-Limit: + - '280000' + Anthropic-Ratelimit-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Tokens-Reset: + - '2025-08-11T23:45:58Z' + Request-Id: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + Anthropic-Organization-Id: + - "" + Via: + - 1.1 google + Cf-Cache-Status: + - DYNAMIC + X-Robots-Tag: + - none + Server: + - cloudflare + Cf-Ray: + - "" + body: + encoding: ASCII-8BIT + string: '{"id":"msg_01Euf7NDSa3Xx8DfupuQBBcj","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[{"type":"text","text":"The + capital of Italy is Rome."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":136055,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":10,"service_tier":"standard"}}' + recorded_at: Mon, 11 Aug 2025 23:45:58 GMT +- request: + method: post + uri: https://api.anthropic.com/v1/messages + body: + encoding: UTF-8 + string: '{"model":"claude-3-7-sonnet-20250219","messages":[{"role":"user","content":[{"type":"text","text":" + What is the capital of France?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of France is Paris."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Germany?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of Germany is Berlin."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of 
Italy?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of Italy is Rome."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of England?"}]}],"stream":false,"max_tokens":64000,"temperature":0.7}' + headers: + User-Agent: + - Faraday v2.13.4 + X-Api-Key: + - "" + Anthropic-Version: + - '2023-06-01' + Content-Type: + - application/json + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 429 + message: Too Many Requests + headers: + Date: + - Mon, 11 Aug 2025 23:46:00 GMT + Content-Type: + - application/json + Content-Length: + - '530' + Connection: + - keep-alive + Cf-Ray: + - "" + X-Should-Retry: + - 'true' + Anthropic-Ratelimit-Input-Tokens-Limit: + - '200000' + Anthropic-Ratelimit-Input-Tokens-Remaining: + - '0' + Anthropic-Ratelimit-Input-Tokens-Reset: + - '2025-08-11T23:47:12Z' + Anthropic-Ratelimit-Output-Tokens-Limit: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Output-Tokens-Reset: + - '2025-08-11T23:46:00Z' + Anthropic-Ratelimit-Requests-Limit: + - '4000' + Anthropic-Ratelimit-Requests-Remaining: + - '4000' + Anthropic-Ratelimit-Requests-Reset: + - '2025-08-11T23:46:00Z' + Retry-After: + - '65' + Anthropic-Ratelimit-Tokens-Limit: + - '280000' + Anthropic-Ratelimit-Tokens-Remaining: + - '80000' + Anthropic-Ratelimit-Tokens-Reset: + - '2025-08-11T23:46:00Z' + Request-Id: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + Anthropic-Organization-Id: + - "" + Via: + - 1.1 google + Cf-Cache-Status: + - DYNAMIC + X-Robots-Tag: + - none + Server: + - cloudflare + body: + encoding: UTF-8 + string: '{"type":"error","error":{"type":"rate_limit_error","message":"This + request would exceed the rate limit for your organization (ce669383-80a8-43c4-bbb1-fbfcc2e806e9) + of 200,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits. + You can see the response headers for current usage. Please reduce the prompt + length or the maximum tokens requested, or try again later. 
You may also contact + sales at https://www.anthropic.com/contact-sales to discuss your options for + a rate limit increase."}}' + recorded_at: Mon, 11 Aug 2025 23:45:59 GMT +- request: + method: post + uri: https://bedrock-runtime..amazonaws.com/model/us.anthropic.claude-3-7-sonnet-20250219-v1:0/invoke + body: + encoding: UTF-8 + string: '{"anthropic_version":"bedrock-2023-05-31","messages":[{"role":"user","content":[{"type":"text","text":" + What is the capital of France?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of France is Paris."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Germany?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of Germany is Berlin."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of Italy?"}]},{"role":"assistant","content":[{"type":"text","text":"The + capital of Italy is Rome."}]},{"role":"user","content":[{"type":"text","text":"What + is the capital of England?"}]}],"max_tokens":4096,"temperature":0.7}' + headers: + User-Agent: + - Faraday v2.13.4 + Host: + - bedrock-runtime..amazonaws.com + X-Amz-Date: + - 20250812T001549Z + X-Amz-Security-Token: + - "" + X-Amz-Content-Sha256: + - a4c4579d1f2fe24b5df17ec052660059f7f983e7ae877ec03720cae9104f6eee + Authorization: + - AWS4-HMAC-SHA256 Credential=/20250812//bedrock/aws4_request, + SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=b9773539cd6600de9a61bf730048759f16c2a17b3da174ef2adbbba92580a8a0 + Content-Type: + - application/json + Accept: + - application/json + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + response: + status: + code: 200 + message: OK + headers: + Date: + - Tue, 12 Aug 2025 00:16:04 GMT + Content-Type: + - application/json + Content-Length: + - '429' + Connection: + - keep-alive + X-Amzn-Requestid: + - bd6b2cb0-b5f7-4817-b060-d93c32866681 + X-Amzn-Bedrock-Invocation-Latency: + - '13318' + X-Amzn-Bedrock-Output-Token-Count: + - '26' + X-Amzn-Bedrock-Input-Token-Count: + - '136083' + body: + encoding: UTF-8 + string: '{"id":"msg_bdrk_01FK7LhdR4y7N9DRT5HMwbEb","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[{"type":"text","text":"The + capital of England is London. 
More specifically, London is the capital city + of both England and the United Kingdom."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":136083,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":26}}' + recorded_at: Tue, 12 Aug 2025 00:16:04 GMT +recorded_with: VCR 6.3.1 diff --git a/spec/ruby_llm/chat_error_spec.rb b/spec/ruby_llm/chat_error_spec.rb index afe58c73..cd34250b 100644 --- a/spec/ruby_llm/chat_error_spec.rb +++ b/spec/ruby_llm/chat_error_spec.rb @@ -76,7 +76,7 @@ Psych::Parser.code_point_limit = 20_000_000 if Psych::Parser.respond_to?(:code_point_limit=) # Create a huge conversation (matching in spec_helper) - massive_text = 'a' * 1_000_000 + massive_text = MASSIVE_TEXT # Create a few copies in the conversation 5.times do diff --git a/spec/ruby_llm/chat_failover_spec.rb b/spec/ruby_llm/chat_failover_spec.rb new file mode 100644 index 00000000..4c73d539 --- /dev/null +++ b/spec/ruby_llm/chat_failover_spec.rb @@ -0,0 +1,141 @@ +# frozen_string_literal: true + +RSpec.describe RubyLLM::Chat do + context 'with failover' do + include_context 'with configured RubyLLM' + + it 'fails over to the next provider when the first one fails' do + chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-7-sonnet') + chat.with_failover({ provider: :bedrock, model: 'claude-3-7-sonnet' }) + + chat.ask "#{MASSIVE_TEXT_FOR_RATE_LIMIT_TEST} What is the capital of France?" + + chat.ask 'What is the capital of Germany?' + chat.ask 'What is the capital of Italy?' + response = chat.ask 'What is the capital of England?' + + expect(response.content).to include('London') + end + + it 'has a list of models to failover to' do + chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-7-sonnet') + chat.with_failover({ provider: :bedrock, model: 'claude-3-7-sonnet' }, 'gpt-5') + + expected_failover_configurations = [ + { provider: :bedrock, model: 'claude-3-7-sonnet' }, + { provider: :openai, model: 'gpt-5' } + ] + expect(chat.instance_variable_get(:@failover_configurations)).to eq(expected_failover_configurations) + end + + it 'does not fail over when bad request errors are raised' do + allow(RubyLLM::Models).to receive(:resolve).and_call_original + + chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-7-sonnet') + chat.with_failover({ provider: :bedrock, model: 'claude-3-7-sonnet' }) + + prompt = MASSIVE_TEXT_FOR_RATE_LIMIT_TEST * 3 + + expect { chat.ask prompt }.to raise_error(RubyLLM::BadRequestError) + + expect(RubyLLM::Models).to have_received(:resolve).once + end + + it 'can failover with an alternate context' do + response_message = RubyLLM::Message.new(content: 'Paris', role: :assistant) + model = RubyLLM::Model::Info.new(id: 'claude-3-7-sonnet', provider: :anthropic) + + # Mock the primary provider (will fail with rate limit) + primary_provider = instance_double(RubyLLM::Providers::Anthropic) + primary_connection = instance_double(RubyLLM::Connection) + allow(primary_provider).to receive_messages( + connection: primary_connection, + complete: RubyLLM::RateLimitError.new, + slug: :anthropic + ) + allow(primary_provider).to receive(:complete).and_raise(RubyLLM::RateLimitError) + + # Mock the backup provider (will succeed) + backup_provider = instance_double(RubyLLM::Providers::Anthropic) + backup_connection = instance_double(RubyLLM::Connection) + allow(backup_provider).to receive_messages( + connection: backup_connection, + complete: response_message, + slug: :anthropic + ) + + # Configure Models.resolve to return different providers based on 
context + allow(RubyLLM::Models).to receive(:resolve) do |_model_id, provider:, assume_exists:, config:| + _ = provider + _ = assume_exists + if config.anthropic_api_key == 'backup-key' + [model, backup_provider] + else + [model, primary_provider] + end + end + + chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-7-sonnet') + backup_context = RubyLLM.context do |config| + config.anthropic_api_key = 'backup-key' + end + chat.with_failover({ provider: :anthropic, model: 'claude-3-7-sonnet', context: backup_context }) + + response = chat.ask "#{MASSIVE_TEXT_FOR_RATE_LIMIT_TEST} What is the capital of France?" + + expect(response.content).to include('Paris') + + # Verify that the backup provider with the backup-key was actually called + expect(backup_provider).to have_received(:complete).once + end + + [ + RubyLLM::RateLimitError, + RubyLLM::ServiceUnavailableError, + RubyLLM::OverloadedError, + RubyLLM::ServerError + ].each do |error_class| + it "fails over when #{error_class.name.split('::').last} is raised" do + response_message = RubyLLM::Message.new(content: 'Success', role: :assistant) + model = RubyLLM::Model::Info.new(id: 'claude-3-7-sonnet', provider: :anthropic) + + # Mock the primary provider (will fail with the specified error) + primary_provider = instance_double(RubyLLM::Providers::Anthropic) + primary_connection = instance_double(RubyLLM::Connection) + allow(primary_provider).to receive_messages( + connection: primary_connection, + complete: error_class.new, + slug: :anthropic + ) + allow(primary_provider).to receive(:complete).and_raise(error_class) + + # Mock the backup provider (will succeed) + backup_provider = instance_double(RubyLLM::Providers::Bedrock) + backup_connection = instance_double(RubyLLM::Connection) + allow(backup_provider).to receive_messages( + connection: backup_connection, + complete: response_message, + slug: :bedrock + ) + + # Configure Models.resolve to return different providers + allow(RubyLLM::Models).to receive(:resolve) do |_model_id, provider:, **_kwargs| + case provider + when :anthropic + [model, primary_provider] + when :bedrock + [model, backup_provider] + end + end + + chat = RubyLLM.chat(provider: :anthropic, model: 'claude-3-7-sonnet') + chat.with_failover({ provider: :bedrock, model: 'claude-3-7-sonnet' }) + + response = chat.ask 'Test question' + + expect(response.content).to eq('Success') + expect(backup_provider).to have_received(:complete).once + end + end + end +end diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 3ce35494..0136c764 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -44,6 +44,9 @@ require 'webmock/rspec' require_relative 'support/streaming_error_helpers' +MASSIVE_TEXT = 'a' * 1_000_000 +MASSIVE_TEXT_FOR_RATE_LIMIT_TEST = ('word ' * (200_000 - 64_000)).freeze + # VCR Configuration VCR.configure do |config| config.cassette_library_dir = 'spec/fixtures/vcr_cassettes' @@ -90,7 +93,8 @@ config.filter_sensitive_data('') { |interaction| interaction.response.headers['Cf-Ray']&.first } # Filter large strings used to test "context length exceeded" error handling - config.filter_sensitive_data('') { 'a' * 1_000_000 } + config.filter_sensitive_data('') { MASSIVE_TEXT } + config.filter_sensitive_data('') { MASSIVE_TEXT_FOR_RATE_LIMIT_TEST } # Filter cookies config.before_record do |interaction|