ACM-20933: Don't hot loop if the console route is missing #2782

2uasimojo · 2025-11-14T21:14:12Z

After a CD is Installed, we try to set the API and console URIs in the status. The latter is gleaned from the console Route on the remote cluster. It has recently come to my attention that it is possible for a cluster not to have that Route at all. But the code path that was looking for it was doing an immediate requeue, so we would end up hot looping, ultimately going into backoff. In scenarios where the absence is transient, this would resolve itself; but if that Route was never expected to exist, this was causing unnecessary churn in the controller, especially at scale.

With this change, we:

Split the error path when we fail to retrieve the console Route. When the error is a 404, we requeue with a static delay of 10m. Otherwise, we requeue "immediately" as is the usual pattern for such errors.
Carry on updating the API URL even if the console URL can't be determined (previously it was both or neither). (Extra logic was needed here to update only if something changed.)
Use error returns rather than info logs & non-error returns. Not sure why it was doing the latter previously; I'm only guessing it's because that code was from a bygone era where that made more sense for some reason. This modernization should make the error paths pop out of the logs more readily, which would have helped in diagnosing the scaling performance issue in the referenced card.

After a CD is Installed, we try to set the API and console URIs in the status. The latter is gleaned from the `console` Route on the remote cluster. It has recently come to my attention that it is possible for a cluster not to have that Route at all. But the code path that was looking for it was doing an immediate requeue, so we would end up hot looping, ultimately going into backoff. In scenarios where the absence is transient, this would resolve itself; but if that Route was never expected to exist, this was causing unnecessary churn in the controller, especially at scale. With this change, we: - Split the error path when we fail to retrieve the `console` Route. When the error is a 404, we requeue with a static delay of 10m. Otherwise, we requeue "immediately" as is the usual pattern for such errors. - Carry on updating the API URL even if the console URL can't be determined (previously it was both or neither). (Extra logic was needed here to update only if something changed.) - Use error returns rather than info logs & non-error returns. Not sure why it was doing the latter previously; I'm only guessing it's because that code was from a bygone era where that made more sense for some reason. This modernization should make the error paths pop out of the logs more readily, which would have helped in diagnosing the scaling performance issue in the referenced card.

openshift-ci-robot · 2025-11-14T21:14:17Z

@2uasimojo: This pull request references ACM-20933 which is a valid jira issue.

In response to this:

After a CD is Installed, we try to set the API and console URIs in the status. The latter is gleaned from the console Route on the remote cluster. It has recently come to my attention that it is possible for a cluster not to have that Route at all. But the code path that was looking for it was doing an immediate requeue, so we would end up hot looping, ultimately going into backoff. In scenarios where the absence is transient, this would resolve itself; but if that Route was never expected to exist, this was causing unnecessary churn in the controller, especially at scale.

With this change, we:

Split the error path when we fail to retrieve the console Route. When the error is a 404, we requeue with a static delay of 10m. Otherwise, we requeue "immediately" as is the usual pattern for such errors.

Carry on updating the API URL even if the console URL can't be determined (previously it was both or neither). (Extra logic was needed here to update only if something changed.)

Use error returns rather than info logs & non-error returns. Not sure why it was doing the latter previously; I'm only guessing it's because that code was from a bygone era where that made more sense for some reason. This modernization should make the error paths pop out of the logs more readily, which would have helped in diagnosing the scaling performance issue in the referenced card.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-11-14T21:14:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [2uasimojo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

2uasimojo · 2025-11-14T21:17:34Z

/assign @dlom
/cc @akrzos

openshift-ci · 2025-11-14T23:21:34Z

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

codecov · 2025-11-14T23:22:11Z

Codecov Report

❌ Patch coverage is 61.90476% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.35%. Comparing base (fe9c66b) to head (9d33c41).

Files with missing lines	Patch %	Lines
.../clusterdeployment/clusterdeployment_controller.go	61.90%	7 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2782      +/-   ##
==========================================
+ Coverage   50.34%   50.35%   +0.01%     
==========================================
  Files         279      279              
  Lines       34167    34178      +11     
==========================================
+ Hits        17201    17210       +9     
- Misses      15612    15615       +3     
+ Partials     1354     1353       -1

Files with missing lines	Coverage Δ
.../clusterdeployment/clusterdeployment_controller.go	`62.10% <61.90%> (+0.11%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 14, 2025

openshift-ci bot requested review from dlom and suhanime November 14, 2025 21:14

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2025

openshift-ci bot assigned dlom Nov 14, 2025

openshift-ci bot requested a review from akrzos November 14, 2025 21:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ACM-20933: Don't hot loop if the console route is missing #2782

ACM-20933: Don't hot loop if the console route is missing #2782

Uh oh!

2uasimojo commented Nov 14, 2025

Uh oh!

openshift-ci-robot commented Nov 14, 2025 •

edited by openshift-ci bot

Loading

Uh oh!

openshift-ci bot commented Nov 14, 2025

Uh oh!

2uasimojo commented Nov 14, 2025

Uh oh!

openshift-ci bot commented Nov 14, 2025

Uh oh!

codecov bot commented Nov 14, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ACM-20933: Don't hot loop if the console route is missing #2782

Are you sure you want to change the base?

ACM-20933: Don't hot loop if the console route is missing #2782

Uh oh!

Conversation

2uasimojo commented Nov 14, 2025

Uh oh!

openshift-ci-robot commented Nov 14, 2025 • edited by openshift-ci bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci bot commented Nov 14, 2025

Uh oh!

2uasimojo commented Nov 14, 2025

Uh oh!

openshift-ci bot commented Nov 14, 2025

Uh oh!

codecov bot commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

openshift-ci-robot commented Nov 14, 2025 •

edited by openshift-ci bot

Loading

codecov bot commented Nov 14, 2025 •

edited

Loading