@kattz-kawa
Contributor

@kattz-kawa kattz-kawa commented Nov 18, 2025

Why we need this PR

  • After the new placement rule introduced in "Run manager with 2 replicas" (#180), the manager pods can only be scheduled on two specific nodes. Under these conditions, the default strategy may cause rollout deadlocks during node recovery.
  • Certificate rotation managed by OLM can trigger Deployment rollouts by changing the olmcahash annotation. With the current strategy, such a rollout can deadlock and fail if one of those nodes is down or still recovering. To keep updates reliable under these conditions, we need a more controlled rollout strategy.

Changes made

  • Added RollingUpdate strategy with maxSurge: 0 and maxUnavailable: 1 for safer deployment updates.

  • Before (default): When a rollout starts, the Deployment keeps the old pods until a new one is ready (with 2 replicas, the default 25% settings round to maxSurge: 1 and maxUnavailable: 0). When only 2 nodes can host the manager pods, this can fail because there is no room to schedule the extra pod, leading to a stuck rollout.

  • After: Pods are updated one at a time. The controller deletes one pod first, then creates its replacement, so the rollout can proceed safely even when only 2 nodes are available.
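
For reference, a minimal sketch of what such a strategy block looks like in an apps/v1 Deployment. The `strategy` fields are the standard Kubernetes ones named above; the metadata name, the replica count of 2 (taken from the title of #180), and the omission of every other field are assumptions for illustration, not a copy of the actual manifests in this PR.

```yaml
# Illustrative fragment only: name and surrounding fields are assumed,
# selector and pod template are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager   # hypothetical name
spec:
  replicas: 2                # assumed, per "Run manager with 2 replicas"
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0            # never create a pod beyond the desired replica count
      maxUnavailable: 1      # allow one pod to be taken down at a time
```

With these values the Deployment controller has to remove an old pod before it can create the replacement, so a rollout does not depend on spare capacity for an extra pod.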

Which issue(s) this PR fixes

RHWA-366

Summary by CodeRabbit

  • Chores
    • Updated deployment strategy configuration for controller-manager components to implement RollingUpdate strategy with specified constraints on pod surges and allowed unavailability. These operational improvements ensure more controlled and stable application updates while maintaining better service availability throughout upgrade processes.

…vailable=1) to mitigate rollout deadlocks during automatic certificate rotation by OLM

Signed-off-by: Katsuya Kawakami <[email protected]>
@openshift-ci
Contributor

openshift-ci bot commented Nov 18, 2025

Hi @kattz-kawa. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai bot commented Nov 18, 2025

📝 Walkthrough

Two YAML manifest files are updated to configure the controller-manager deployment strategy from default to explicit RollingUpdate with maxSurge: 0 and maxUnavailable: 1, controlling pod surge and availability during rollouts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Deployment Strategy Configuration: bundle/manifests/self-node-remediation.clusterserviceversion.yaml, config/manager/manager.yaml | Adds deployment strategy block with RollingUpdate type, maxSurge: 0, and maxUnavailable: 1 to control rollout behavior during updates |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Simple, repetitive configuration changes applied consistently across two files
  • No behavioral logic, error handling, or conditional changes
  • Straightforward verification of strategy parameters

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title 'Change Deployment UpdateStrategy' directly describes the primary change: modifying the Deployment update strategy from default to RollingUpdate with specific parameters. |

@openshift-ci
Contributor

openshift-ci bot commented Dec 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kattz-kawa, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 2, 2025
@slintes
Member

slintes commented Dec 2, 2025

/ok-to-test

@slintes
Member

slintes commented Dec 2, 2025

/retest

@slintes slintes marked this pull request as ready for review December 2, 2025 14:34
@openshift-ci openshift-ci bot requested review from mshitrit and razo7 December 2, 2025 14:34
@slintes slintes changed the title [WIP] Change Deployment UpdateStrategy Change Deployment UpdateStrategy Dec 2, 2025
@slintes
Member

slintes commented Dec 2, 2025

only 4.19 wasn't green yet

/override ci/prow/4.20-openshift-e2e ci/prow/4.18-openshift-e2e ci/prow/4.17-openshift-e2e ci/prow/4.16-openshift-e2e

@openshift-ci
Contributor

openshift-ci bot commented Dec 2, 2025

@slintes: Overrode contexts on behalf of slintes: ci/prow/4.16-openshift-e2e, ci/prow/4.17-openshift-e2e, ci/prow/4.18-openshift-e2e, ci/prow/4.20-openshift-e2e

In response to this:

only 4.19 wasn't green yet

/override ci/prow/4.20-openshift-e2e ci/prow/4.18-openshift-e2e ci/prow/4.17-openshift-e2e ci/prow/4.16-openshift-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@slintes
Member

slintes commented Dec 3, 2025

/retest

@openshift-ci
Contributor

openshift-ci bot commented Dec 3, 2025

@kattz-kawa: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.19-openshift-e2e | a7f0d7a | link | true | /test 4.19-openshift-e2e |
| ci/prow/4.20-openshift-e2e | a7f0d7a | link | true | /test 4.20-openshift-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@slintes
Member

slintes commented Dec 3, 2025

Probably not related to this PR, but we need to investigate the test failures...
