OCPBUGS-91650: disable consolidation in karpenter upgrade test #8874
maxcao13 wants to merge 1 commit into
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@maxcao13: This pull request references Jira Issue OCPBUGS-91650, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
/test e2e-aws-autonode |
|
/jira refresh |
|
/jira refresh |
|
@maxcao13: This pull request references Jira Issue OCPBUGS-91650, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: maxcao13 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8874      +/-   ##
==========================================
+ Coverage   43.19%   43.28%   +0.08%
==========================================
  Files         767      771       +4
  Lines       94910    95503     +593
==========================================
+ Hits        40997    41335     +338
- Misses      51050    51284     +234
- Partials     2863     2884      +21
see 16 files with indirect coverage changes
Flags with carried forward coverage won't be shown.
|
|
/test e2e-aws |
|
/test e2e-aws-autonode |
Test Results
e2e-aws
|
|
Now I have all the information I need. Let me consolidate the analysis: 12 test failures, 3 distinct root causes:
These are all pre-existing flakes unrelated to the PR changes (which only modify
Test Failure Analysis Complete
Job Information
Test Failure Analysis
Error
Summary
All 12 test failures are pre-existing flakes unrelated to PR #8874. The PR only modifies
Root Cause
There are three independent failure modes, none related to this PR:
1. Affects: The
2. HostedCluster condition validation race condition
   Affects: The test framework (
3. Namespace deletion timeout during teardown
   Affects: After the HostedCluster
Recommendations
Evidence
|
|
/pipeline-required |
jparrill left a comment
Dropped some comments. Thanks!
// Erroneous consolidation can cause the test to fail where the new Node is consolidated due to Empty or
// Underutilized before the old node's pods get scheduled to it. We should discuss this in the upstream.
// Ref: https://redhat.atlassian.net/browse/OCPBUGS-91966
karpenterNodePool.Spec.Disruption.ConsolidateAfter = karpenterv1.MustParseNillableDuration("Never")
Just confirming: "Never" is the right call here over a longer duration (e.g. "300s"). The race is timing-dependent and any finite value would still flake eventually. Since this test validates drift-based upgrade, not consolidation behavior, fully disabling it is the clean workaround. 👍
We disable consolidation as a hack to prevent flakiness in this blocking test. Erroneous consolidation can cause the test to fail where the new Node is consolidated due to Empty or Underutilized before the old node's pods get scheduled to it. We should discuss the actual problem/fix in the upstream.
Ref: https://redhat.atlassian.net/browse/OCPBUGS-91966
Signed-off-by: Max Cao <macao@redhat.com>
6b690a8 to 2d72d2f
|
Thanks! Addressed @jparrill |
|
/lgtm |
|
Scheduling tests matching the |
|
@maxcao13: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
We disable consolidation as a hack to prevent flakiness in the karpenter_control_plane_upgrade suite and test. Erroneous consolidation can cause the test to fail when Karpenter consolidates the new Node as Empty or Underutilized before the old node's pods are scheduled onto it.
We should discuss the actual problem/fix separately in the upstream. Ref: https://redhat.atlassian.net/browse/OCPBUGS-91966. For now this should prevent flaky test failures blocking PRs from merging.
Which issue(s) this PR fixes:
Fixes https://redhat.atlassian.net/browse/OCPBUGS-91650