WIP: Add retry logic for WellKnownAvailable endpoint checks #779

sdodson · 2025-08-11T19:02:00Z

The operator now retries the well-known endpoint check every second for up to 15 seconds before setting itself unavailable. This prevents premature "unavailable" status during temporary network issues or API server startup scenarios.

Changes:

- Add retry loop with 15-second timeout and 1-second intervals
- Preserve existing ControllerProgressingError handling logic
- Set WellKnownAvailable=False with reason "NotReady" after retries exhausted
- Maintain proper progressing status during retry attempts

This change improves operator reliability by giving well-known endpoints time to become ready while maintaining the same final error handling behavior.

Assisted-by: Cursor, Claude Sonnet 4

Related to https://issues.redhat.com/browse/OCPBUGS-20056. Since this only addresses one of the paths, I'm not linking the bug until I can show whether or not this helps.


sdodson · 2025-08-11T19:02:35Z

/test ?

openshift-ci · 2025-08-11T19:02:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign liouk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2025-08-11T19:02:39Z

@sdodson: The following commands are available to trigger required jobs:

/test e2e-agnostic

/test e2e-agnostic-upgrade

/test e2e-console-login

/test e2e-gcp-operator-encryption-perf

/test e2e-gcp-operator-encryption-rotation

/test e2e-oidc-techpreview

/test e2e-operator

/test e2e-operator-encryption

/test images

/test okd-scos-images

/test unit

/test verify

/test verify-bindata

/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-ipv6

/test e2e-aws-external-oidc

/test e2e-aws-single-node

/test e2e-azure-external-oidc

/test e2e-gcp-external-oidc

/test okd-scos-e2e-aws-ovn

/test test-operator-integration

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic

pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6

pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade

pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-single-node

pull-ci-openshift-cluster-authentication-operator-master-e2e-console-login

pull-ci-openshift-cluster-authentication-operator-master-e2e-operator

pull-ci-openshift-cluster-authentication-operator-master-images

pull-ci-openshift-cluster-authentication-operator-master-okd-scos-e2e-aws-ovn

pull-ci-openshift-cluster-authentication-operator-master-okd-scos-images

pull-ci-openshift-cluster-authentication-operator-master-test-operator-integration

pull-ci-openshift-cluster-authentication-operator-master-unit

pull-ci-openshift-cluster-authentication-operator-master-verify

pull-ci-openshift-cluster-authentication-operator-master-verify-bindata

pull-ci-openshift-cluster-authentication-operator-master-verify-deps

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sdodson · 2025-08-11T19:06:59Z

/payload-test periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

openshift-ci · 2025-08-11T19:07:01Z

@sdodson: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

sdodson · 2025-08-11T19:07:30Z

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

openshift-ci · 2025-08-11T19:07:33Z

@sdodson: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6e4f93f0-76e6-11f0-86f8-a1b16a4176a6-0

openshift-ci · 2025-10-15T12:15:36Z

@sdodson: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2025

openshift-ci bot requested review from ibihim and liouk August 11, 2025 19:02

sdodson marked this pull request as draft August 11, 2025 19:11
