-
Notifications
You must be signed in to change notification settings - Fork 233
Add Windows VMs vCPU overcommit prevention VAP #2535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Can one of the admins verify this patch? |
|
Hi @jcanocan. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| - virtualmachineinstances | ||
| validations: | ||
| - expression: has(object.spec.domain.cpu.dedicatedCpuPlacement) && object.spec.domain.cpu.dedicatedCpuPlacement == true | ||
| message: Windows VMIs require dedicated CPU placement. Set spec.domain.cpu.dedicatedCpuPlacement to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec.domain.cpu.dedicatedCpuPlacement is quite low level. Can we point the user to an instancetype to use? If there is no instancetype available yet, let's create one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can also follow that path. Great idea. In such case, the drawbacks will still apply. However, we will have a 100% supported way of creating Windows VMs, which is a nice advantage compared with my orginal proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about this twice, if we want to follow this idea, we should force the user to pick up specific preferences for the supported Windows versions. We can include them in the common-instancetypes repo or here, whatever we find more appropriated. This will allow the user to have different windows sizes by using the different instance types we currently provide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds worth a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an approximation of what it could look like, it is still required to add further preferences for all supported windows versions.
0ada4dc to
4954ab2
Compare
deploy/srep-vap/vcpu-overcommit/101-windows-10-vcpu-restrict.yaml
Outdated
Show resolved
Hide resolved
deploy/srep-vap/vcpu-overcommit/101-windows-10-vcpu-restrict.yaml
Outdated
Show resolved
Hide resolved
deploy/srep-vap/vcpu-overcommit/101-windows-10-vcpu-restrict.yaml
Outdated
Show resolved
Hide resolved
9ac0225 to
704e950
Compare
704e950 to
611a5f8
Compare
|
@ksimon1 many thanks for your reviews! Much appreciated :) |
611a5f8 to
1241d17
Compare
|
/ok-to-test |
| - "4.14" | ||
| - "4.15" | ||
| - "4.16" | ||
| - "4.17" | ||
| - "4.18" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about 4.19 and 4.20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this means not applying to those version, or say only apply to 4.19+. I will let @jcanocan to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the idea. Technically speaking, we can deploy the VAP in versions starting from 4.17 (included), that version is based on Kubernetes 1.30, which is the version VAP were GA'd: https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html/release_notes/ocp-4-17-release-notes
Therefore, I will exclude just 4.14/15/16. Let me know if you find another limitation or reason for excluding other versions.
Thanks for raising this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BU has set the minimum version for WinLI to 4.19 (slack post), so there's no need to roll this out on 4.18 and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, so let's stick with 4.19+ then.
| deploymentMode: Policy | ||
| clusterSelectors: | ||
| matchExpressions: | ||
| - key: hypershift.open-cluster-management.io/hosted-cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this on every hosted cluster? Do the CRDs exist on every cluster even if kubevirt is unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologizes for my ignorance here, I'm sure we wanted in hosted-cluster, but I'm not sure if we want it in the rest too. Could @BraeTroutman please clarify?
Do you mean kubevirt CRDs? We need kubevirt to be installed in order to have virtualmachinesinstances resources available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yeah we have no selectors as far as I'm aware to match only on hosted clusters that have kubevirt enabled, which means that this just needs to be deployed to every hosted cluster. If kubevirt is unused though, this VAP shouldn't affect KAS performance because no KubeVirt means no use of KubeVirt resources
however, I don't know how the VAP creation will function if the relevant custom resource Kind doesn't exist on the cluster in question... I'll see if I can do a quick verification on another cluster to see how the creation behaves, if it fails upon validation of the VAP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating Admission Policies can still be created successfully even if the target CR isn't defined on the cluster.
I do wonder how ACM will behave expecting to create the VirtualMachineClusterPreference. If the Hosted Cluster does not have KubeVirt installed, that CR will fail to create
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Hosted Cluster does not have KubeVirt installed, that CR will fail to create
I presume this means the policy fails to be compliant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this means the policy fails to be compliant.
@joshbranham do you know if this is acceptable (failing to be compliant)? Given there's no selector at the moment, any suggestion on the next steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert on ACM policies, so not sure. I think this could be easily validated by the submitter by testing creating the policy for a cluster with the CRDs and one without, and observe the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without KubeVirt installed, VirtualMachineClusterPreference will fail to be created.
| - expression: (('kubevirt.io/preference-name' in object.metadata.annotations) && | ||
| (object.metadata.annotations['kubevirt.io/preference-name'].lowerAscii().contains('windows'))) || | ||
| (('kubevirt.io/cluster-preference-name' in object.metadata.annotations) && | ||
| (object.metadata.annotations['kubevirt.io/preference-name'].lowerAscii().contains('windows'))) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean kubevirt.io/cluster-preference-name instead of kubevirt.io/preference-name? I am seeing it is checking if it has kubevirt.io/cluster-preference-name so I think you are intent to check the value in kubevirt.io/cluster-preference-name in this condition?
I am not familiar with Virt, just reviewing from the CEL point of review. Appreciate if you can clarify, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right! Nice catch! Adjusted.
1241d17 to
673d612
Compare
|
Note that this is re. epic https://issues.redhat.com/browse/CNV-56734 |
|
/approve Please seek a final |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abyrne55, jcanocan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get at least a short blurb here for context? Please include a link to the epic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| - "4.14" | ||
| - "4.15" | ||
| - "4.16" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment on 4.19 minimum version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| - expression: (('kubevirt.io/preference-name' in object.metadata.annotations) | ||
| && (object.metadata.annotations['kubevirt.io/preference-name'].lowerAscii().contains('windows'))) | ||
| || (('kubevirt.io/cluster-preference-name' in object.metadata.annotations) | ||
| && (object.metadata.annotations['kubevirt.io/cluster-preference-name'].lowerAscii().contains('windows'))) | ||
| || (('vm.kubevirt.io/os' in object.metadata.annotations) && | ||
| (object.metadata.annotations['vm.kubevirt.io/os'].lowerAscii().contains('windows'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misunderstancing KubeVirt preferences, but just in case an existing customer is using kubevirt for something that has "windows" in the name but isn't Windows OS, can we make sure some public-facing documentation warns the customer not to put "windows" in their preference names? I want to make sure our customer service agents have something to point to if a customer complains about their unfortunately-named non-windows VMs not scheduling anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! @jcanocan Could you follow up on this action, maybe have another card for it? This doesn't need to be a blocker of this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abyrne55 which documentation might be suitable?
Co-authored-by: Cursor AI Assistant <[email protected]> Signed-off-by: Javier Cano Cano <[email protected]>
673d612 to
3579719
Compare
|
@jcanocan: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What type of PR is this?
Feature
What this PR does / why we need it?
It is required to enable some sort of protection in Windows based VMs to prevent them to overcommit vCPUs. To achieve this, it is assumed that CPUmanager is enabled on, and only on, Windows Licensed nodes. This is due to the fact that, during the scheduling process, any VM with
spec.domain.cpu.dedicatedCpuPlacementwill be assigned to nodes withcpumanagerfeature enabled. Therefore, the scheduler will effectively assign those VMs to Windows Licensed nodes. Moreover, if the VM does not overcommit, vCPUs will be allowed to run.This approach has the following drawbacks as it is:
Which Jira/Github issue(s) this PR fixes?
Fixes #
Special notes for your reviewer:
Pre-checks (if applicable):
Tested latest changes against a cluster
Included documentation changes with PR
If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with: