OCPCLOUD-2775: add cluster api autoscaler integration enhancement #1736
@elmiko: This pull request references OCPCLOUD-2775 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED
1c80b56 to 4554fb1
i'm not sure why it's barfing on the metadata
figured it out, needed quoting on the github handles
4554fb1 to 73bfea9
version and would allow us to drop some patches we are carrying. The Cluster
API MachineSet sync controller will be updated to recognize when the
Cluster Autoscaler has made a change to a Cluster API resource and then sync
the change to the corresponding Machine API resource, regardless of which resource
is authoritative.
Would be good to clarify exactly what kind of writes the CAS would be making. Am I right in thinking that it's just the scale subresource?
yes, only the scale subresource.
So we could possibly gate scale subresource updates in a different way to other writes 🤔
going through this again, i think the capi provider can also write an annotation when it expects to delete a node. it will add an annotation to the machine as well so that capi knows which machine to remove.
reviewing the example role we have in the upstream, i'm going to need to review the code a little more to see which resources we expect to update.
```
rules:
  - apiGroups:
      - cluster.x-k8s.io
    resources:
      - machinedeployments
      - machinedeployments/scale
      - machines
      - machinesets
      - machinepools
    verbs:
      - get
      - list
      - update
      - watch
```
locate the resource. The Cluster API MachineSet sync controller will be updated
to ensure that when the Cluster Autoscaler Operator adds the autoscaling
annotations, they are copied to any related resources, regardless of which
is authoritative.
I think in this case, since we own the CAO, we don't necessarily need an exception within the CAPI sync controller, and could handle this in CAO. I would expect CAO to look at a MAPI MachineSet, and check if it's authoritative, and then apply the annotations correctly
Will it still be annotations on the CAPI side?
the annotations are available on the CAPI side, we will need to migrate a few of them. eventually we will want the infrastructure templates to carry the capacity info in their status field.
Do we have an estimated timeline on having the scale information directly in the status?
no timeline has ever been proposed. this feature was added as "opt-in", providers are not required to make these changes.
enhancements/cluster-api/cluster-autoscaler-integration-with-openshift-cluster-api.md
* A provider MachineSet controller has added the scale from zero annotations to a
non-authoritative record. This occurs when the Cluster API resource is marked as
authoritative but the Machine API resource is updated by the provider MachineSet controller.
In these cases the scale from zero annotations will be copied to the non-authoritative
Cluster API resource. The data from the MachineSet controller is only applied to
Machine API resources currently.
Is there an equivalent of this controller in CAPI? Or, if not, is it on the roadmap? If it is on the roadmap, we will want to ensure these controllers follow the same pausing as the rest of the controllers
the last time i looked, most if not all of the providers also package MachineSet actuators, but we actually want to promote a different behavior in the upstream. we want upstream providers to implement infrastructure template controllers to add the capacity information to the status on the infrastructure template, not as annotations on the MachineSet or MachineDeployment.
Do we have a timeline for seeing something like this directly in the upstream?
same as previous reply, a timeline was never proposed for this as it is "opt-in".
correct me if i'm wrong @JoelSpeed , but my understanding is that the MAPI machineset actuators will not be running when CAPI is enabled for a platform. so, there should be no need to reference these annotations?
There will be some period where both sets of controllers are running, but one side will be paused. Eventually we will want to remove the downstream implementations so having an upstream plan will allow us to plan when these could be removed
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR.
guess we lost track of this. /reopen
@elmiko: Reopened this PR.
@elmiko: This pull request references OCPCLOUD-2775 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
i'm coming back to review this again, will post an update in the near future.
73bfea9 to 32cf6bb
32cf6bb to 667b2f7
@elmiko: all tests passed!
i've updated the text in response to the comments here.
Given we are going to have a staggered approach where some platforms GA before others, do you think this will add significant complexity? We will need the new behaviour depending on whether a feature gate is enabled or not?
annotations that they are copied to any related resources, regardless of which
is authoritative.

Update the Cluster API MachineSet sync controller to recognize the
Perhaps instead the sync controller needs an update to convert the MAPI keyed annotations to CAPI keyed annotations?
That way we wouldn't have the MAPI keys on the CAPI resources
an interesting thought. we currently have the CAO doing the conversions for us, but it might make sense to have the sync controller also be able to do this work when it is syncing the resources.
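A rough sketch of what such a key conversion could look like during a sync is shown here. It is only an illustration of the idea in this thread, not the sync controller's or the CAO's code: the function name `to_capi_annotations`, both prefixes, and the list of annotation suffixes are assumptions and would need to be checked against the Cluster Autoscaler Operator and the upstream clusterapi provider.

```python
# Hypothetical sketch: rewrite MAPI-keyed autoscaler annotations to
# CAPI-keyed ones while copying annotations between mirrored resources.
# The prefixes and suffixes below are assumed for illustration.

MAPI_PREFIX = "machine.openshift.io/"
CAPI_PREFIX = "cluster.x-k8s.io/"

# Assumed autoscaler annotation suffixes that should be re-keyed.
AUTOSCALER_SUFFIXES = (
    "cluster-api-autoscaler-node-group-min-size",
    "cluster-api-autoscaler-node-group-max-size",
)


def to_capi_annotations(annotations: dict) -> dict:
    """Return a copy of annotations with known MAPI keys re-keyed for CAPI.

    Keys that are not recognized autoscaler annotations pass through
    unchanged, so unrelated annotations are never rewritten.
    """
    converted = {}
    for key, value in annotations.items():
        suffix = key[len(MAPI_PREFIX):] if key.startswith(MAPI_PREFIX) else None
        if suffix in AUTOSCALER_SUFFIXES:
            key = CAPI_PREFIX + suffix
        converted[key] = value
    return converted
```

Converting only an explicit allow-list of keys keeps other `machine.openshift.io/` annotations from leaking onto the CAPI resource under a new name.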
The Cluster Autoscaler Operator will be changed to include logic that can detect
the API group for any MachineSet that is referenced in the `scaleTargetRef` field
of MachineAutoscaler resources. The change will instruct the Operator to search
for records in the `openshift-cluster-api` namespace for resources with the
`cluster.x-k8s.io` group, and to search in the `openshift-machine-api` namespace
for resources with the `machine.openshift.io` group.
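The namespace lookup described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the Operator's actual code: the function name `namespace_for_scale_target` and the lookup table are hypothetical, and only the two API groups and two namespaces come from the text above.

```python
# Hypothetical sketch: choose the namespace to search from the API group
# of a MachineAutoscaler's scaleTargetRef. Names are illustrative only.

GROUP_TO_NAMESPACE = {
    "machine.openshift.io": "openshift-machine-api",
    "cluster.x-k8s.io": "openshift-cluster-api",
}


def namespace_for_scale_target(api_version: str) -> str:
    """Return the namespace to search for the referenced MachineSet.

    api_version is the scaleTargetRef apiVersion in "group/version"
    form, for example "machine.openshift.io/v1beta1".
    """
    group, sep, _version = api_version.partition("/")
    if not sep or group not in GROUP_TO_NAMESPACE:
        raise ValueError(f"unsupported scaleTargetRef apiVersion: {api_version!r}")
    return GROUP_TO_NAMESPACE[group]
```

Rejecting unknown groups, rather than falling back to a default namespace, keeps a mistyped `apiVersion` from silently resolving to the wrong MachineSet.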
Every resource in the `openshift-machine-api` namespace will have a mirror in the `openshift-cluster-api` namespace, as such, I'd expect searching the `openshift-cluster-api` namespace to be enough. There's no need to look at the MAPI objects is there?
this is basically saying that a user can select either the MAPI or CAPI resource in their MachineAutoscaler and the CAO will search the appropriate namespace.
so, if the user specifies a MAPI MachineSet, then the CAO would look in that namespace.
Previously, only a Machine API MachineSet (i.e. a `MachineSet` kind in the
`machine.openshift.io` API group) would be a valid target of the `.spec.scaleTargetRef`
field. After this enhancement is implemented, users may specify either a Machine
API MachineSet or a Cluster API MachineSet in the `.spec.scaleTargetRef` field.
Given every MAPI machineset will have a CAPI mirror, when CAPI mirroring is enabled, do we actually need to care about the MAPI side?
i think we care to the extent that we want to allow users to continue using MAPI resources as the `scaleTargetRef`. basically, not breaking existing MachineAutoscalers.
Note that the MachineAutoscaler named "worker-somezone-1" is targeting a Machine API
MachineSet while "worker-somezone-2" is targeting a Cluster API MachineSet. The
Cluster Autoscaler Operator will know by the `apiVersion` field whether to look
for the resource in the `openshift-machine-api` or `openshift-cluster-api` namespace
respectively.
Or it could always look at the ClusterAPI side?
it could. i think if we want to go down that route i'll need to rewrite portions of this enhancement to conform with the notion that we only look for the CAPI resources.
resource. The sync controller will use the managed fields (i.e. `.metadata.managedFields`)
of the specified MachineSet to determine if the Cluster Autoscaler Operator made
changes to the annotations, and then replicate those appropriately. In this manner,
If we always write to the authoritative API, I don't think managedfields is required here?
yes, i'll revisit this part.
The Cluster Autoscaler Operator will always update the authoritative MachineSet resource.
If a user specifies a non-authoritative MachineSet as the `scaleTargetRef` of a
MachineAutoscaler, the Cluster Autoscaler Operator will use the information on the
MachineSet to determine which resource is authoritative and then update that resource.
Through the MachineSet sync controller, the non-authoritative resource will be updated
with the new information.
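One way to picture the "always update the authoritative resource" rule is the small sketch below. It rests on assumptions that are not stated in this excerpt: that the MachineSet exposes an authority value with the hypothetical spellings `"MachineAPI"` and `"ClusterAPI"`, that the mirrored resource shares the same name, and that any other value means the write should be deferred. The function name `authoritative_target` is also hypothetical.

```python
# Hypothetical sketch: given the authority recorded on a MachineSet,
# decide which mirrored resource should receive the write.
# The authority values and same-name mirroring are assumptions.

AUTHORITY_TO_TARGET = {
    "MachineAPI": ("machine.openshift.io", "openshift-machine-api"),
    "ClusterAPI": ("cluster.x-k8s.io", "openshift-cluster-api"),
}


def authoritative_target(machine_set_name, authoritative_api):
    """Return the resource to update, or None if authority is unsettled.

    The result does not depend on which resource the user named in
    scaleTargetRef; only the recorded authority matters.
    """
    target = AUTHORITY_TO_TARGET.get(authoritative_api)
    if target is None:
        # Unknown or in-between state: do not write, retry later.
        return None
    group, namespace = target
    return {"group": group, "namespace": namespace, "name": machine_set_name}
```

Returning `None` for an unrecognized authority value leaves the retry policy to the caller instead of guessing a side to write to.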
Maybe I'm getting confused, but we have two components here.
One is CAO, which we could make smart enough to understand the authoritative API?
And the other is KAS itself, which will always write to CAPI resources, right?
correct
```
matchConstraints:
  resourceRules:
  # Resource names in admission rules are the lowercase plural form.
  - apiGroups: ["machine.openshift.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["machinesets"]
matchConditions:
# Only check requests coming from the cluster autoscaler service account.
- name: "check-only-cluster-autoscaler-service-account-requests"
  expression: '(request.userInfo.username in [
    "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
  ])'
validations:
- expression: 'object.spec.replicas != oldObject.spec.replicas'
  # A static string belongs in message; messageExpression must be valid CEL.
  message: "Requested replica change is the same as current value"
```
I think in particular we need to mention that this will be an exception to an existing VAP which prevents writes to non-authoritative resources. Have you spoken to @theobarberbany about this at all? He may be able to help test/write something
i will add some language about this being an exception to the existing VAP.
i did talk with Theo, these examples are directly inspired by his work.
```
matchConstraints:
  resourceRules:
  # Resource names in admission rules are the lowercase plural form.
  - apiGroups: ["cluster.x-k8s.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["machinesets"]
matchConditions:
# Only check requests coming from the cluster autoscaler service account.
- name: "check-only-cluster-autoscaler-service-account-requests"
  expression: '(request.userInfo.username in [
    "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
  ])'
validations:
- expression: 'object.spec.replicas != oldObject.spec.replicas'
  # A static string belongs in message; messageExpression must be valid CEL.
  message: "Requested replica change is the same as current value"
```
Is this just a duplicate of the above block?
mostly. the difference is targeting CAPI instead of MAPI.
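To make the intent of the two policy excerpts easier to follow, here is a small Python model of how their match condition and validation would combine. It is a sketch of the evaluation order only, assuming the policy is bound with a deny action; it is not how the API server implements CEL, and the function name `evaluate_policy` is hypothetical.

```python
# Hypothetical model of the policy excerpts: a match condition that
# selects only cluster autoscaler requests, then a validation that
# requires the replica count to actually change.

AUTOSCALER_USER = "system:serviceaccount:openshift-machine-api:cluster-autoscaler"
DENY_MESSAGE = "Requested replica change is the same as current value"


def evaluate_policy(username, old_replicas, new_replicas):
    """Return (allowed, message) for an UPDATE to a MachineSet."""
    # matchConditions: requests from any other user are skipped by this
    # policy, so from this policy's point of view they are allowed.
    if username != AUTOSCALER_USER:
        return True, ""
    # validations: the expression must evaluate to true to admit.
    if new_replicas != old_replicas:
        return True, ""
    return False, DENY_MESSAGE
```

For example, `evaluate_policy(AUTOSCALER_USER, 3, 3)` is denied with the message from the excerpt, while the same update from another user is not evaluated by this policy at all.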
To address this possible risk, the Cluster Autoscaler Operator will only write to the
authoritative MachineSet resource. A user may create a MachineAutoscaler that references
either the authoritative or non-authoritative resource in its `scaleTargetRef` field,
but the Cluster Autoscaler Operator will only update the authoritative resource.
This doesn't align with the error condition above? Are we going to take first created as the correct object? Can we include a VAP that prevents multiple objects using the same Name in the ref?
which part doesn't align? (i thought i had caught the change)
> Are we going to take first created as the correct object?

yes, this is essentially how it will work.
> Can we include a VAP that prevents multiple objects using the same Name in the ref?

that sounds like a good upgrade, i'll add something about it.
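The "first created wins" behavior agreed on here can be sketched as follows. This is an illustration under stated assumptions, not the Operator's logic: each MachineAutoscaler is reduced to a plain dict with hypothetical keys `name`, `target_name`, and `created` (an RFC 3339 UTC timestamp string, which sorts correctly as text), and ties are broken by name.

```python
# Hypothetical sketch: when several MachineAutoscalers reference the
# same MachineSet name, keep the earliest-created one and report the
# rest as conflicts. The dict keys are illustrative only.


def split_by_target(autoscalers):
    """Return (winners, conflicts).

    winners maps each target MachineSet name to the MachineAutoscaler
    that was created first; conflicts lists the later ones by name.
    """
    winners = {}
    conflicts = []
    ordered = sorted(autoscalers, key=lambda ma: (ma["created"], ma["name"]))
    for ma in ordered:
        target = ma["target_name"]
        if target in winners:
            conflicts.append(ma["name"])
        else:
            winners[target] = ma["name"]
    return winners, conflicts
```

A policy that rejects the duplicate at admission time, as suggested in the question, would make the conflict list empty in practice; this sketch covers the case where duplicates already exist.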
Another approach to reducing confusion would be to allow only a single type of
MachineSet resource (Machine API or Cluster API) to be specified as a target for
autoscaling. This approach could work if the Cluster API resources are chosen as
the target, but would represent a hard shift in the current MachineAutoscaler
behavior and would require a conversion migration for all upgrades where cluster
autoscaler is in use.
Would it? If the CAO and KAS always wrote to CAPI, that would be predictable at least?
i think the difference is between what the CAO is writing to versus what it is reading from.
afaict, we don't want to force a conversion for all MachineAutoscaler to use `scaleTargetRef` that points at the CAPI MachineSet. instead, we update the CAO to be smart enough to find the CAPI MachineSet when the user has specified a MAPI MachineSet in the `scaleTargetRef`.
so, i think we are converging on the idea that the CAO should always write to the CAPI resource (regardless of authority), but we need to be able to accept either the MAPI or CAPI resource from the user on the MachineAutoscaler resource.
the paragraph here is about taking the approach where we only allow the user to specify a CAPI MachineSet for the `scaleTargetRef`, which would require a conversion on upgrade.
these are great questions, i need to spend some time thinking about this a little more.
this enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.