-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPCLOUD-2775: add cluster api autoscaler integration enhancement #1736
base: master
Are you sure you want to change the base?
Conversation
@elmiko: This pull request references OCPCLOUD-2775 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1c80b56
to
4554fb1
Compare
i'm not sure why it's barfing on the metadata |
figured it out, needed quoting on the github handles |
4554fb1
to
73bfea9
Compare
@elmiko: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
version and would allow us to drop some patches we are carrying. The Cluster | ||
API MachineSet sync controller will be updated to recognize when the | ||
Cluster Autoscaler has made a change to a Cluster API resource and then sync | ||
the change to the corresponding Machine API resource, regardless of which resource | ||
is authoritative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to clarify exactly what kind of writes the CAS would be making, am I right in thinking that it's just the scale subresource?
locate the resource. The Cluster API MachineSet sync controller will be updated | ||
to ensure that when the Cluster Autoscaler Operator adds the autoscaling | ||
annotations that they are copied to any related resources, regardless of which | ||
is authoritative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case, since we own the CAO, we don't necessarily need an exception within the CAPI sync controller, and could handle this in CAO. I would expect CAO to look at a MAPI MachineSet, and check if it's authoritative, and then apply the annotations correctly
Will it still be annotations on the CAPI side?
The Cluster Autoscaler Operator will update the Machine API MachineSet resource, and | ||
then the MachineSet sync controller will sync the change to the Cluster API MachineSet | ||
resource. The sync controller will use the managed fields (i.e. `.metadata.managedFields`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it an error for there to be a MachineAutoscaler that points to both MAPI and CAPI versions of the same MachineSet?
And then CAO can just update the correct, authoritative resource with whatever information it needs to, without having to worry about conflicting resources
I think if we do this, we need less special logic within the sync controllers
In theory we could validate this using a validating admission policy, but I don't think it's fool-proof
* The Cluster Autoscaler Operator has added the minimum and maximum size annotations, and ownership | ||
annotation to a record. If the sync controller sees an update to these annotations on a | ||
non-authoritative resource originating from the Cluster Autoscaler Operator, it will copy | ||
that change to the authoritative resource if no MachineAutoscaler is referencing the | ||
authoritative resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we own the CAO, I think implementing this within CAO is likely the easier option.
The sync logic already syncs annotations across between machinesets, we just need to add special logic to handle well known annotations if they somehow end up converting from annotations to structured status
* A provider MachineSet controller has added the scale from zero annotations to a | ||
non-authoritative record. This occurs when the Cluster API resource is marked as | ||
authoritative but the Machine API resource is updated by the provider MachineSet controller. | ||
In these cases the scale from zero annotations will be copied to the non-authoritative | ||
Cluster API resource. The data from the MachineSet controller is only applied to | ||
Machine API resources currently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an equivalent of this controller in CAPI? Or, if not, is it on the roadmap? If it is on the roadmap, we will want to ensure these controllers following the same pausing as the rest of the controllers
* `.spec.replicas` changes will be synced from the Cluster API MachineSet to the Machine | ||
API MachineSet regardless of which is authoritative when the change originates from the | ||
Cluster Autoscaler. As the Cluster Autoscaler will be configured to operate against Cluster | ||
API resources only, there will be a need to identify when the Cluster Autoscaler has updated | ||
a non-authoritative Cluster API resource so that the authoritative resource can be updated. | ||
This will only occur when the sync controller observes and update to the replicas field from | ||
the Cluster Autoscaler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We intend to use a VAP to block updates to a non-authoritative resource, have you investigated how we let the CAS through the VAP (what exception do we need in place) to make just this replicas field change?
could create a race condition where updating the minimum and maximum size | ||
values will lead to an inaccurate update to both MachineSets. | ||
|
||
To address the risk of possible race conditions on MachineAutoscalers we have a few |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third option, CAS will take no action when multiple MachineAutoscalers refer to the same MachineSet (MAPI and CAPI mirrors) and instead will update the status to show that one of the two should be removed
Fourth option, use some sort of admission time validation to try and prevent this
Three and four should be able to be done in conjunction with one another
2. Will we want to remove MachineAutoscalers that reference Cluster API MachineSets | ||
during a downgrade? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support downgrades, so we probably don't need to be concerned about this.
We won't automatically create these during an upgrade, so downgrading a failed upgrade shouldn't produce an issue
this enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.