ACM-20933: Don't hot loop if the console route is missing #2782
base: master
After a CD is Installed, we try to set the API and console URLs in the status. The latter is gleaned from the `console` Route on the remote cluster. It has recently come to my attention that a cluster may not have that Route at all. But the code path that looked for it did an immediate requeue, so we would end up hot looping and ultimately going into backoff. Where the absence is transient, this would resolve itself; but if that Route was never expected to exist, it caused unnecessary churn in the controller, especially at scale.

With this change, we:

- Split the error path when we fail to retrieve the `console` Route. When the error is a 404, we requeue with a static delay of 10m. Otherwise, we requeue "immediately", as is the usual pattern for such errors.
- Carry on updating the API URL even if the console URL can't be determined (previously it was both or neither). Extra logic was needed here to update only if something changed.
- Use error returns rather than info logs and non-error returns. I'm not sure why the code did the latter previously; my guess is that it dates from a bygone era where that made more sense for some reason. This modernization should make the error paths stand out in the logs more readily, which would have helped in diagnosing the scaling performance issue in the referenced card.
@2uasimojo: This pull request references ACM-20933, which is a valid Jira issue.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: 2uasimojo
@2uasimojo: all tests passed!
Codecov Report ❌ Patch coverage is

Additional details and impacted files (coverage diff):

| | master | #2782 | +/- |
|---|---|---|---|
| Coverage | 50.34% | 50.35% | +0.01% |
| Files | 279 | 279 | |
| Lines | 34167 | 34178 | +11 |
| Hits | 17201 | 17210 | +9 |
| Misses | 15612 | 15615 | +3 |
| Partials | 1354 | 1353 | -1 |