controller: skip MCP use when MachineConfigPool API is absent on HCP #2291
paulczar wants to merge 1 commit into
Walkthrough
This PR adds graceful handling for clusters where the MachineConfigPool API is unavailable. It introduces a new isMachineConfigPoolAvailable() helper that probes the MCP API by attempting a dummy GET request, then uses this helper in two places:
- SetupWithManager() registers the MachineConfigPool watch only when the API is available, while always wiring the Node and ConfigMap watches.
- checkConvergedCluster() checks availability upfront and returns non-converged when MCP is unavailable.
The changes include unit tests for the detection logic and for the integrated convergence behavior.
Sequence Diagram
sequenceDiagram
participant SetupWithManager
participant isMachineConfigPoolAvailable
participant KubeAPI as Kubernetes API
participant ControllerBuilder
participant checkConvergedCluster
SetupWithManager->>isMachineConfigPoolAvailable: Probe MCP API availability
isMachineConfigPoolAvailable->>KubeAPI: GET MachineConfigPool
alt MCP API Available
KubeAPI-->>isMachineConfigPoolAvailable: Success or NotFound
isMachineConfigPoolAvailable-->>SetupWithManager: true
SetupWithManager->>ControllerBuilder: Register MachineConfigPool watch
else MCP API Unavailable
KubeAPI-->>isMachineConfigPoolAvailable: NoKindMatchError
isMachineConfigPoolAvailable-->>SetupWithManager: false
SetupWithManager->>ControllerBuilder: Skip MachineConfigPool watch, log info
end
ControllerBuilder->>ControllerBuilder: Always register Node and ConfigMap watches
rect rgba(200, 200, 255, 0.5)
Note over checkConvergedCluster: During reconciliation
checkConvergedCluster->>isMachineConfigPoolAvailable: Check MCP availability
alt MCP Available
isMachineConfigPoolAvailable-->>checkConvergedCluster: true
checkConvergedCluster->>checkConvergedCluster: Proceed with MCP convergence check
else MCP Unavailable
isMachineConfigPoolAvailable-->>checkConvergedCluster: false
checkConvergedCluster-->>checkConvergedCluster: Return non-converged
end
end
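The probe branch of the diagram above can be sketched in plain Go. This is a minimal illustration, not the PR's code: `getFunc`, `errNotFound`, and `errNoKindMatch` are hypothetical stand-ins for the controller's client call and for the typed API errors it would actually receive, and surfacing unexpected errors to the caller is an assumption about how the remaining cases might be handled.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this sketch. A real controller
// would inspect typed Kubernetes API errors instead.
var (
	errNotFound    = errors.New("object not found")    // the API is served, the dummy object is missing
	errNoKindMatch = errors.New("no matches for kind") // the API itself is not served
)

// getFunc is a hypothetical stand-in for a GET on a dummy MachineConfigPool name.
type getFunc func(name string) error

// mcpAvailable classifies the probe result as in the diagram:
// success or NotFound means the API is served, a no-kind-match error means it is not.
// Any other error is returned to the caller (an assumption, not taken from the PR).
func mcpAvailable(get getFunc) (bool, error) {
	err := get("probe-does-not-exist")
	switch {
	case err == nil, errors.Is(err, errNotFound):
		return true, nil
	case errors.Is(err, errNoKindMatch):
		return false, nil
	default:
		return false, err
	}
}

func main() {
	cases := []struct {
		name string
		err  error
	}{
		{"object exists", nil},
		{"object missing, API served", errNotFound},
		{"API not served", errNoKindMatch},
	}
	for _, c := range cases {
		probeErr := c.err
		ok, err := mcpAvailable(func(string) error { return probeErr })
		fmt.Printf("%s: available=%v err=%v\n", c.name, ok, err)
	}
}
```

Treating NotFound as "available" is the key point: a 404 on the dummy object still proves that the API server recognizes the resource type.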
Estimated code review effort: 3 (Moderate), ~20 minutes
Pre-merge checks: 13 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controllers/deployment_mode_handler.go`:
- Around line 127-136: Replace the unbounded context usage in
isMachineConfigPoolAvailable (and similarly in isMachineConfigAvailable) with a
context that has a timeout: create ctx, cancel :=
context.WithTimeout(context.Background(), <reasonableDuration>) and defer
cancel() before calling r.Client.Get, add the context import if missing, and
ensure you handle/return the context deadline exceeded error appropriately
(preserve existing error logic such as k8serrors.IsNotFound).
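The timeout pattern requested in this comment might look roughly like the sketch below. It uses only the standard library; `probeFunc`, `slowProbe`, `fastProbe`, and the 50 ms `probeTimeout` are hypothetical names and values chosen for illustration, with `probeFunc` standing in for the `r.Client.Get` call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// probeTimeout is an illustrative value; the review leaves the duration open.
const probeTimeout = 50 * time.Millisecond

// probeFunc is a hypothetical stand-in for a client GET that honors its context.
type probeFunc func(ctx context.Context) error

// probeWithTimeout bounds one probe call with a deadline and always releases
// the timer through the deferred cancel.
func probeWithTimeout(timeout time.Duration, probe probeFunc) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return probe(ctx)
}

// isTimeout reports whether the probe failed because the deadline expired.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// slowProbe simulates an API server that does not answer before the deadline.
func slowProbe(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// fastProbe simulates an API server that answers immediately.
func fastProbe(ctx context.Context) error {
	return ctx.Err()
}

func main() {
	fmt.Println("slow probe timed out:", isTimeout(probeWithTimeout(probeTimeout, slowProbe)))
	fmt.Println("fast probe error:", probeWithTimeout(probeTimeout, fastProbe))
}
```

A caller can check `isTimeout` separately from the NotFound and no-match cases, so a slow API server is not mistaken for an absent API.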
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2eb91f3f-33c8-411b-90b1-6ae861869d7e
📒 Files selected for processing (3)
controllers/deployment_mode_handler.go
controllers/deployment_mode_handler_test.go
controllers/openshift_controller.go
… HCP

On ROSA HCP guest clusters, the MachineConfigPool API may be absent while other MCO types exist. OSC 1.12 unconditionally watches MCP at manager startup and lists MCPs during reconcile, causing crash loops or repeated ERROR logs before DaemonSet-based Kata install can complete (KATA-4840).

- Probe for MachineConfigPool API availability (same pattern as MachineConfig)
- Register MCP watch only when the API is present
- Short-circuit checkConvergedCluster() when MCP is unavailable
- Use context with timeout for API probe calls
- Unit tests for both paths

Fixes: rhjira#KATA-4840
Related: KATA-4233, KATA-5177, KATA-4597
89d829d to 7c140b2
@paulczar: all tests passed!
Description of the problem
On ROSA HCP guest clusters, the MachineConfigPool API may be absent while other
MCO types exist. OSC 1.12 unconditionally watches MCP at manager startup and
lists MCPs during reconcile, causing crash loops or repeated ERROR logs before
DaemonSet-based Kata install can complete (KATA-4840).
What I did
- Short-circuit `checkConvergedCluster()` when MCP is unavailable

How to verify
- … `osc-feature-gates` … `deploymentMode: DaemonSet`
- `controller-manager` stays up (no MCP cache sync failure)
- … `kata` when install completes)

Lab validation (author)
Validated on ROSA HCP with `c5n.metal` workers: DaemonSet mode, compute pool `auto_repair: false` (ROSA ops; not part of this PR), manual reboot after `osc-rpm-install`, then `RuntimeClass/kata` and a pod with `runtimeClassName: kata` running in a KVM sandbox.

Fixes: rhjira#KATA-4840
Related: KATA-4233 (DaemonSet reboot), KATA-5177 (live-apply), KATA-4597 (DS image)
Changelog
HCP: skip MachineConfigPool integration when MCP API is absent so DaemonSet
deployment mode can run on guest clusters without full MCO.
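How the availability result could gate both the watch registration and the convergence check is sketched below. This is a simplified model under stated assumptions, not the controller's code: `watchPlan` and `checkConverged` are hypothetical helpers that replace the real manager wiring and MCP status inspection with plain values.

```go
package main

import "fmt"

// watchPlan returns the kinds a controller would watch. Node and ConfigMap are
// always included; MachineConfigPool is added only when its API is served.
// Hypothetical helper for illustration.
func watchPlan(mcpAvailable bool) []string {
	kinds := []string{"Node", "ConfigMap"}
	if mcpAvailable {
		kinds = append(kinds, "MachineConfigPool")
	}
	return kinds
}

// checkConverged short-circuits to non-converged when the MCP API is absent,
// and otherwise defers to a pool status check supplied by the caller.
// Hypothetical helper for illustration.
func checkConverged(mcpAvailable bool, poolsUpdated func() bool) bool {
	if !mcpAvailable {
		return false
	}
	return poolsUpdated()
}

func main() {
	allUpdated := func() bool { return true }

	fmt.Println("HCP guest cluster watches:", watchPlan(false))
	fmt.Println("HCP guest cluster converged:", checkConverged(false, allUpdated))

	fmt.Println("full MCO cluster watches:", watchPlan(true))
	fmt.Println("full MCO cluster converged:", checkConverged(true, allUpdated))
}
```

Because the pool check is never invoked when the API is absent, the reconcile path avoids listing a resource type the API server does not know.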