OCPCLOUD-2775: add cluster api autoscaler integration enhancement #1736

Open
wants to merge 1 commit into
base: master
from

Conversation

Contributor

@elmiko elmiko commented Jan 15, 2025

this enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 15, 2025
@openshift-ci-robot

openshift-ci-robot commented Jan 15, 2025

@elmiko: This pull request references OCPCLOUD-2775 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

this enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ashcrow for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elmiko elmiko force-pushed the add-cas-cao-capi-integration branch 2 times, most recently from 1c80b56 to 4554fb1 Compare January 16, 2025 16:39
@elmiko
Contributor Author

elmiko commented Jan 16, 2025

i'm not sure why it's barfing on the metadata

@elmiko
Contributor Author

elmiko commented Jan 16, 2025

figured it out, needed quoting on the github handles

@elmiko elmiko force-pushed the add-cas-cao-capi-integration branch from 4554fb1 to 73bfea9 Compare January 16, 2025 18:51
Comment on lines 85 to 89
version and would allow us to drop some patches we are carrying. The Cluster
API MachineSet sync controller will be updated to recognize when the
Cluster Autoscaler has made a change to a Cluster API resource and then sync
the change to the corresponding Machine API resource, regardless of which resource
is authoritative.
Contributor

Would be good to clarify exactly what kind of writes the CAS would be making, am I right in thinking that it's just the scale subresource?

Contributor Author

yes, only the scale subresource.

Contributor

So we could possibly gate scale subresource updates in a different way to other writes 🤔

Contributor Author

going through this again, i think the capi provider can also write an annotation when it expects to delete a node. it will add an annotation to the machine as well so that capi knows which machine to remove.

Contributor Author

reviewing the example role we have in the upstream, i'm going to need to review the code a little more to see which resources we expect to update.

rules:
  - apiGroups:
    - cluster.x-k8s.io
    resources:
    - machinedeployments
    - machinedeployments/scale
    - machines
    - machinesets
    - machinepools
    verbs:
    - get
    - list
    - update
    - watch

Comment on lines +94 to +98
locate the resource. The Cluster API MachineSet sync controller will be updated
to ensure that when the Cluster Autoscaler Operator adds the autoscaling
annotations that they are copied to any related resources, regardless of which
is authoritative.
Contributor

I think in this case, since we own the CAO, we don't necessarily need an exception within the CAPI sync controller, and could handle this in CAO. I would expect CAO to look at a MAPI MachineSet, and check if it's authoritative, and then apply the annotations correctly

Will it still be annotations on the CAPI side?

Contributor Author

the annotations are available on the CAPI side, we will need to migrate a few of them. eventually we will want the infrastructure templates to carry the capacity info in their status field.

Contributor

Do we have an estimated timeline on having the scale information directly in the status?

Contributor Author

no timeline has ever been proposed. this feature was added as "opt-in", providers are not required to make these changes.

Comment on lines 363 to 368
* A provider MachineSet controller has added the scale from zero annotations to a
non-authoritative record. This occurs when the Cluster API resource is marked as
authoritative but the Machine API resource is updated by the provider MachineSet controller.
In these cases the scale from zero annotations will be copied to the non-authoritative
Cluster API resource. The data from the MachineSet controller is only applied to
Machine API resources currently.
Contributor

Is there an equivalent of this controller in CAPI? Or, if not, is it on the roadmap? If it is on the roadmap, we will want to ensure these controllers follow the same pausing behaviour as the rest of the controllers.

Contributor Author

the last time i looked, most if not all of the providers also package MachineSet actuators, but we actually want to promote a different behavior in the upstream. we want upstream providers to implement infrastructure template controllers to add the capacity information to the status on the infrastructure template, not as annotations on the MachineSet or MachineDeployment.

Contributor

Do we have a timeline for seeing something like this directly in the upstream?

Contributor Author

@elmiko elmiko Feb 13, 2025

same as previous reply, a timeline was never proposed for this as it is "opt-in".

Contributor Author

correct me if i'm wrong @JoelSpeed , but my understanding is that the MAPI machineset actuators will not be running when CAPI is enabled for a platform. so, there should be no need to reference these annotations?

Contributor

There will be some period where both sets of controllers are running, but one side will be paused. Eventually we will want to remove the downstream implementations so having an upstream plan will allow us to plan when these could be removed

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 28, 2025
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2025
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Apr 12, 2025
Contributor

openshift-ci bot commented Apr 12, 2025

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@elmiko
Contributor Author

elmiko commented May 22, 2025

guess we lost track of this.

/reopen
/remove-lifecycle rotten

@openshift-ci openshift-ci bot reopened this May 22, 2025
Contributor

openshift-ci bot commented May 22, 2025

@elmiko: Reopened this PR.

In response to this:

guess we lost track of this.

/reopen
/remove-lifecycle rotten


@openshift-ci-robot

openshift-ci-robot commented May 22, 2025

@elmiko: This pull request references OCPCLOUD-2775 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

this enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.


@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 22, 2025
@elmiko
Contributor Author

elmiko commented May 29, 2025

i'm coming back to review this again, will post an update in the near future.

@elmiko elmiko force-pushed the add-cas-cao-capi-integration branch from 73bfea9 to 32cf6bb Compare June 11, 2025 20:29
@elmiko elmiko force-pushed the add-cas-cao-capi-integration branch from 32cf6bb to 667b2f7 Compare June 11, 2025 21:12
Contributor

openshift-ci bot commented Jun 11, 2025

@elmiko: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@elmiko
Contributor Author

elmiko commented Jun 12, 2025

i've updated the text in response to the comments here.

Contributor

@JoelSpeed JoelSpeed left a comment

Given we are going to have a staggered approach where some platforms GA before others, do you think this will add significant complexity? We will need the new behaviour depending on whether a feature gate is enabled or not?

annotations that they are copied to any related resources, regardless of which
is authoritative.

Update the Cluster API MachineSet sync controller to recognize the
Contributor

Perhaps instead the sync controller needs an update to convert the MAPI keyed annotations to CAPI keyed annotations?

That way we wouldn't have the MAPI keys on the CAPI resources

Contributor Author

an interesting thought. we currently have the CAO doing the conversions for us, but it might make sense to have the sync controller also be able to do this work when it is syncing the resources.
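
For illustration, the key conversion discussed here could look roughly like the sketch below. It assumes the autoscaler node-group limit annotations share a suffix and differ only in prefix (`machine.openshift.io/` on the MAPI side, `cluster.x-k8s.io/` on the CAPI side). The function name `to_capi_annotations` and the suffix list are hypothetical, and the scale from zero capacity annotations would need their own mapping.

```python
# Assumed prefixes for the two APIs; only the node-group limit annotations
# are handled in this sketch.
MAPI_PREFIX = "machine.openshift.io/"
CAPI_PREFIX = "cluster.x-k8s.io/"
AUTOSCALER_SUFFIXES = (
    "cluster-api-autoscaler-node-group-min-size",
    "cluster-api-autoscaler-node-group-max-size",
)


def to_capi_annotations(annotations: dict[str, str]) -> dict[str, str]:
    """Return a copy with MAPI-keyed autoscaler annotations re-keyed for CAPI.

    Annotations that are not autoscaler limits are copied through unchanged.
    The input dictionary is never modified.
    """
    converted: dict[str, str] = {}
    for key, value in annotations.items():
        suffix = key[len(MAPI_PREFIX):] if key.startswith(MAPI_PREFIX) else None
        if suffix in AUTOSCALER_SUFFIXES:
            converted[CAPI_PREFIX + suffix] = value
        else:
            converted[key] = value
    return converted
```

Doing the conversion in one place, whether in the CAO or in the sync controller, would keep MAPI keys off the CAPI resources.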

Comment on lines +134 to +139
The Cluster Autoscaler Operator will be changed to include logic that can detect
the API group for any MachineSet that is referenced in the `scaleTargetRef` field
of MachineAutoscaler resources. The change will instruct the Operator to search
for records in the `openshift-cluster-api` namespace for resources with the
`cluster.x-k8s.io` group, and to search in the `openshift-machine-api` namespace
for resources with the `machine.openshift.io` group.
Contributor

Every resource in the openshift-machine-api namespace will have a mirror in the openshift-cluster-api namespace, as such, I'd expect searching the openshift-cluster-api namespace to be enough. There's no need to look at the MAPI objects is there?

Contributor Author

this is basically saying that a user can select either the MAPI or CAPI resource in their MachineAutoscaler and the CAO will search the appropriate namespace.

so, if the user specifies a MAPI MachineSet, then the CAO would look in that namespace.
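
A minimal sketch of that lookup, assuming the CAO derives the namespace purely from the API group of the `scaleTargetRef` as the quoted paragraph describes. The helper name `namespace_for` is hypothetical.

```python
# API group -> namespace, taken from the enhancement text quoted above.
NAMESPACE_BY_GROUP = {
    "machine.openshift.io": "openshift-machine-api",
    "cluster.x-k8s.io": "openshift-cluster-api",
}


def namespace_for(scale_target_ref: dict) -> str:
    """Return the namespace to search for the referenced MachineSet."""
    # apiVersion has the form "<group>/<version>".
    group = scale_target_ref["apiVersion"].split("/", 1)[0]
    if group not in NAMESPACE_BY_GROUP:
        raise ValueError(f"unsupported API group for scaleTargetRef: {group!r}")
    return NAMESPACE_BY_GROUP[group]
```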

Comment on lines +204 to +207
Previously, only a Machine API MachineSet (i.e. a `MachineSet` kind in the
`machine.openshift.io` API group) would be a valid target of the `.spec.scaleTargetRef`
field. After this enhancement is implemented, users may specify either a Machine
API MachineSet or a Cluster API MachineSet in the `.spec.scaleTargetRef` field.
Contributor

Given every MAPI machineset will have a CAPI mirror, when CAPI mirroring is enabled, do we actually need to care about the MAPI side?

Contributor Author

i think we care to the extent that we want to allow users to continue using MAPI resources as the scaleTargetRef. basically, not breaking existing MachineAutoscalers.

Comment on lines +271 to +275
Note that the MachineAutoscaler named "worker-somezone-1" is targeting a Machine API
MachineSet while "worker-somezone-2" is targeting a Cluster API MachineSet. The
Cluster Autoscaler Operator will know by the `apiVersion` field whether to look
for the resource in the `openshift-machine-api` or `openshift-cluster-api` namespace
respectively.
Contributor

Or it could always look at the ClusterAPI side?

Contributor Author

it could. i think if we want to go down that route i'll need to rewrite portions of this enhancement to conform with the notion that we only look for the CAPI resources.

Comment on lines +297 to +299
resource. The sync controller will use the managed fields (i.e. `.metadata.managedFields`)
of the specified MachineSet to determine if the Cluster Autoscaler Operator made
changes to the annotations, and then replicate those appropriately. In this manner,
Contributor

If we always write to the authoritative API, I don't think managedfields is required here?

Contributor Author

yes, i'll revisit this part.

Comment on lines +310 to +315
The Cluster Autoscaler Operator will always update the authoritative MachineSet resource.
If a user specifies a non-authoritative MachineSet as the `scaleTargetRef` of a
MachineAutoscaler, the Cluster Autoscaler Operator will use the information on the
MachineSet to determine which resource is authoritative and then update that resource.
Through the MachineSet sync controller, the non-authoritative resource will be updated
with the new information.
Contributor

Maybe I'm getting confused, but we have two components here.

One is CAO, which we could make smart enough to understand the authoritative API?

And the other is KAS itself, which will always write to CAPI resources, right?

Contributor Author

correct
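
As a rough sketch of what "smart enough to understand the authoritative API" could mean for the CAO: the code below assumes authority is recorded on the Machine API MachineSet in `.status.authoritativeAPI` with the values `MachineAPI` and `ClusterAPI`, and that the Cluster API mirror has the same name. Both are assumptions made for illustration, and `authoritative_target` is a hypothetical helper.

```python
def authoritative_target(mapi_machine_set: dict) -> tuple[str, str, str]:
    """Return (api_group, namespace, name) of the MachineSet the CAO should update."""
    name = mapi_machine_set["metadata"]["name"]
    # Assumed field; any other value (for example a migration in progress)
    # is treated as "do not write yet".
    authority = mapi_machine_set.get("status", {}).get("authoritativeAPI")
    if authority == "MachineAPI":
        return ("machine.openshift.io", "openshift-machine-api", name)
    if authority == "ClusterAPI":
        return ("cluster.x-k8s.io", "openshift-cluster-api", name)
    raise ValueError(f"cannot determine authoritative API: {authority!r}")
```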

Comment on lines +391 to +406
```
matchConstraints:
  resourceRules:
  - apiGroups: ["machine.openshift.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["MachineSet"]
matchConditions:
  # Only check requests coming from the cluster autoscaler service account.
  - name: "check-only-cluster-autoscaler-service-account-requests"
    expression: '(request.userInfo.username in [
      "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
    ])'
validations:
  - expression: 'object.spec.replicas != oldObject.spec.replicas'
    messageExpression: "Requested replica change is the same as current value"
```
Contributor

I think in particular we need to mention that this will be an exception to an existing VAP which prevents writes to non-authoritative resources. Have you spoken to @theobarberbany about this at all? He may be able to help test/write something

Contributor Author

i will add some language about this being an exception to the existing VAP.

i did talk with Theo, these examples are directly inspired by his work.

Comment on lines +408 to +424
```
matchConstraints:
  resourceRules:
  - apiGroups: ["cluster.x-k8s.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["MachineSet"]
matchConditions:
  # Only check requests coming from the cluster autoscaler service account.
  - name: "check-only-cluster-autoscaler-service-account-requests"
    expression: '(request.userInfo.username in [
      "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
    ])'
validations:
  - expression: 'object.spec.replicas != oldObject.spec.replicas'
    messageExpression: "Requested replica change is the same as current value"
```
Contributor

Is this just a duplicate of the above block?

Contributor Author

mostly. the difference is targeting CAPI instead of MAPI.
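
Since the two fragments differ only in API group, their shared decision logic can be modelled in a few lines. This is a simplified model of how a ValidatingAdmissionPolicy evaluates, assuming a deny action on failure: a request that fails `matchConditions` is not evaluated by the policy, and a request that is evaluated is admitted only if the validation expression is true. The function name `admit` is hypothetical.

```python
CAS_USER = "system:serviceaccount:openshift-machine-api:cluster-autoscaler"
MESSAGE = "Requested replica change is the same as current value"


def admit(username: str, old_replicas: int, new_replicas: int) -> tuple[bool, str]:
    """Model the policy outcome for an UPDATE to a MachineSet."""
    # matchConditions: only cluster autoscaler requests are checked.
    if username != CAS_USER:
        return True, ""
    # validations: 'object.spec.replicas != oldObject.spec.replicas' must hold.
    if new_replicas != old_replicas:
        return True, ""
    return False, MESSAGE
```

Under this model, an autoscaler update that leaves `spec.replicas` unchanged is rejected, while requests from any other user pass through this particular policy untouched.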

Comment on lines +434 to +437
To address this possible risk, the Cluster Autoscaler Operator will only write to the
authoritative MachineSet resource. A user may create a MachineAutoscaler that references
either the authoritative or non-authoritative resource in its `scaleTargetRef` field,
but the Cluster Autoscaler Operator will only update the authoritative resource.
Contributor

This doesn't align with the error condition above? Are we going to take first created as the correct object? Can we include a VAP that prevents multiple objects using the same Name in the ref?

Contributor Author

which part doesn't align? (i thought i had caught the change)

Are we going to take first created as the correct object?

yes, this is essentially how it will work.

Can we include a VAP that prevents multiple objects using the same Name in the ref?

that sounds like a good upgrade, i'll add something about it.
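
The "first created wins" rule can be sketched as follows, assuming MachineAutoscalers are compared by `.metadata.creationTimestamp` and that two autoscalers conflict when their `scaleTargetRef` names match, whichever API group they point at. The function name `pick_winners` and the name-based tie-break are hypothetical.

```python
from datetime import datetime


def pick_winners(machine_autoscalers: list[dict]) -> tuple[dict[str, str], list[str]]:
    """Map each target MachineSet name to the oldest MachineAutoscaler.

    Returns (winners, conflicts), where winners maps target name to the
    winning MachineAutoscaler name and conflicts lists the losing ones.
    """
    def created(ma: dict) -> datetime:
        # Kubernetes timestamps look like "2025-06-11T20:29:00Z".
        return datetime.strptime(ma["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ")

    winners: dict[str, str] = {}
    conflicts: list[str] = []
    ordered = sorted(machine_autoscalers, key=lambda m: (created(m), m["metadata"]["name"]))
    for ma in ordered:
        target = ma["spec"]["scaleTargetRef"]["name"]
        if target in winners:
            conflicts.append(ma["metadata"]["name"])
        else:
            winners[target] = ma["metadata"]["name"]
    return winners, conflicts
```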

Comment on lines +485 to +490
Another approach to reducing confusion would be to allow only a single type of
MachineSet resource (Machine API or Cluster API) to be specified as a target for
autoscaling. This approach could work if the Cluster API resources are chosen as
the target, but would represent a hard shift in the current MachineAutoscaler
behavior and would require a conversion migration for all upgrades where cluster
autoscaler is in use.
Contributor

Would it? If the CAO and KAS always wrote to CAPI, that would be predictable at least?

Contributor Author

i think the difference is between what the CAO is writing to versus what it is reading from.

afaict, we don't want to force a conversion for all MachineAutoscalers to use a scaleTargetRef that points at the CAPI MachineSet. instead, we update the CAO to be smart enough to find the CAPI MachineSet when the user has specified a MAPI MachineSet in the scaleTargetRef.

so, i think we are converging on the idea that the CAO should always write to the CAPI resource (regardless of authority), but we need to be able to accept either the MAPI or CAPI resource from the user on the MachineAutoscaler resource.

the paragraph here is about taking the approach where we only allow the user to specify a CAPI MachineSet for the scaleTargetRef, which would require a conversion on upgrade.
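
A minimal sketch of the converged idea, where the CAO accepts either kind of reference but always resolves it to the CAPI MachineSet. It assumes the mirror in `openshift-cluster-api` has the same name as the MAPI MachineSet and is served at `cluster.x-k8s.io/v1beta1`; the helper name `capi_mirror_ref` is hypothetical.

```python
ACCEPTED_GROUPS = ("machine.openshift.io", "cluster.x-k8s.io")


def capi_mirror_ref(scale_target_ref: dict) -> dict:
    """Resolve a user-supplied scaleTargetRef to the CAPI MachineSet to write to."""
    group = scale_target_ref["apiVersion"].split("/", 1)[0]
    if group not in ACCEPTED_GROUPS or scale_target_ref["kind"] != "MachineSet":
        raise ValueError(f"unsupported scaleTargetRef: {scale_target_ref!r}")
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineSet",
        # Assumption: the mirror keeps the MAPI MachineSet's name.
        "name": scale_target_ref["name"],
        "namespace": "openshift-cluster-api",
    }
```

With this shape, existing MachineAutoscalers that reference MAPI MachineSets keep working without a conversion on upgrade.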

@elmiko
Copy link
Contributor Author

elmiko commented Jun 23, 2025

Given we are going to have a staggered approach where some platforms GA before others, do you think this will add significant complexity? We will need the new behaviour depending on whether a feature gate is enabled or not?

these are great questions, i need to spend some time thinking about this a little more.

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
5 participants