

@sdodson
Member

@sdodson sdodson commented Aug 11, 2025

The operator now retries the well-known endpoint check every second for up to 15 seconds before reporting itself unavailable. This prevents a premature "unavailable" status during transient network issues or while the API server is starting up.

Changes:

  • Add retry loop with 15-second timeout and 1-second intervals
  • Preserve existing ControllerProgressingError handling logic
  • Set WellKnownAvailable=False with reason "NotReady" after retries exhausted
  • Maintain proper progressing status during retry attempts

This change improves operator reliability by giving well-known endpoints time to become ready while maintaining the same final error handling behavior.

Assisted-by: Cursor, Claude Sonnet 4

Related to https://issues.redhat.com/browse/OCPBUGS-20056, but since this addresses only one of the paths, I'm not linking it until I can show whether or not it helps.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2025
@openshift-ci openshift-ci bot requested review from ibihim and liouk August 11, 2025 19:02
@sdodson
Member Author

sdodson commented Aug 11, 2025

/test ?

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign liouk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2025

@sdodson: The following commands are available to trigger required jobs:

/test e2e-agnostic
/test e2e-agnostic-upgrade
/test e2e-console-login
/test e2e-gcp-operator-encryption-perf
/test e2e-gcp-operator-encryption-rotation
/test e2e-oidc-techpreview
/test e2e-operator
/test e2e-operator-encryption
/test images
/test okd-scos-images
/test unit
/test verify
/test verify-bindata
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-ipv6
/test e2e-aws-external-oidc
/test e2e-aws-single-node
/test e2e-azure-external-oidc
/test e2e-gcp-external-oidc
/test okd-scos-e2e-aws-ovn
/test test-operator-integration

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic
pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6
pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade
pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-single-node
pull-ci-openshift-cluster-authentication-operator-master-e2e-console-login
pull-ci-openshift-cluster-authentication-operator-master-e2e-operator
pull-ci-openshift-cluster-authentication-operator-master-images
pull-ci-openshift-cluster-authentication-operator-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-cluster-authentication-operator-master-okd-scos-images
pull-ci-openshift-cluster-authentication-operator-master-test-operator-integration
pull-ci-openshift-cluster-authentication-operator-master-unit
pull-ci-openshift-cluster-authentication-operator-master-verify
pull-ci-openshift-cluster-authentication-operator-master-verify-bindata
pull-ci-openshift-cluster-authentication-operator-master-verify-deps

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sdodson
Member Author

sdodson commented Aug 11, 2025

/payload-test periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2025

@sdodson: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@sdodson
Member Author

sdodson commented Aug 11, 2025

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2025

@sdodson: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6e4f93f0-76e6-11f0-86f8-a1b16a4176a6-0

@sdodson sdodson marked this pull request as draft August 11, 2025 19:11
@openshift-ci
Contributor

openshift-ci bot commented Oct 15, 2025

@sdodson: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

