Skip to content

fix(bedrock): only retry transient botocore errors#83

Merged
dpomian merged 1 commit into
AmadeusITGroup:mainfrom
MarouaneBenabdelkader:fix/bedrock-retry-scope
May 18, 2026
Merged

fix(bedrock): only retry transient botocore errors#83
dpomian merged 1 commit into
AmadeusITGroup:mainfrom
MarouaneBenabdelkader:fix/bedrock-retry-scope

Conversation

@MarouaneBenabdelkader

Copy link
Copy Markdown
Contributor

Addresses the Copilot review comment on #69.

Replaces except Exception in the Bedrock retry loop with except (ClientError, BotoCoreError) plus _is_retryable(exc). Only retries throttling, 5xx, transient model
errors, and transport-level timeouts. Permanent errors (ValidationException, AccessDeniedException, malformed-response KeyError, etc.) now surface immediately
instead of being hidden under up to 6s of backoff.

The previous 'except Exception' caught and retried KeyError from
malformed responses, missing-credential errors, and other programmer or
config mistakes -- wasting up to 6s of backoff per failed call and
hiding the real cause.

Narrow the catch to (ClientError, BotoCoreError) and gate retries with
_is_retryable, which only retries throttling, 5xx, transient model
errors, and transport-level BotoCoreError. Permanent ClientError codes
(ValidationException, AccessDeniedException, ResourceNotFoundException)
and any non-botocore exception surface immediately.

Addresses the Copilot review comment on PR AmadeusITGroup#69.
Copilot AI review requested due to automatic review settings May 18, 2026 12:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts the AWS Bedrock Titan embedding retry loop to only retry transient boto3/botocore failures, allowing permanent errors to surface immediately instead of being masked by backoff.

Changes:

  • Replace broad except Exception retry handling with except (ClientError, BotoCoreError) and a retryability gate.
  • Introduce _RETRYABLE_BEDROCK_CODES and _is_retryable() to classify retryable Bedrock error codes / HTTP 5xx responses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +43 to +49
def _is_retryable(self, exc: Exception) -> bool:
if isinstance(exc, ClientError):
code = exc.response.get("Error", {}).get("Code", "")
status = exc.response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
return code in _RETRYABLE_BEDROCK_CODES or 500 <= status < 600
# BotoCoreError covers connection/read timeouts and other transport-level issues
return isinstance(exc, BotoCoreError)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your contribution!

@dpomian dpomian merged commit f9f3137 into AmadeusITGroup:main May 18, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants