Allow to prefix provisioningClassName to filter provisioning requests #7676

macsko · 2025-01-08T11:40:08Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds the ability to set a provisioningClassName prefix; the CA will then process only provisioning requests whose class name has a matching prefix. This makes it simple to run multiple CA instances and route specific provisioning requests to each of them, while remaining backward compatible.
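The description does not spell out the exact matching rule, so the following is only a minimal sketch under one assumption: a request matches when its class name equals the configured prefix followed by a class the instance supports. Under that assumption an empty prefix keeps today's behaviour, which would explain the backward compatibility. `matchesInstance` and the `team-a-` prefix are hypothetical names used for illustration, not code from this PR.

```go
package main

import "fmt"

// checkCapacityClass is the check-capacity provisioning class name.
const checkCapacityClass = "check-capacity.autoscaling.x-k8s.io"

// matchesInstance reports whether a request with the given class name would be
// handled by a CA instance configured with prefix.
// ASSUMPTION: a match means className == prefix + <supported class>; the real
// rule in the PR may differ.
func matchesInstance(className, prefix string, supported []string) bool {
	for _, base := range supported {
		if className == prefix+base {
			return true
		}
	}
	return false
}

func main() {
	supported := []string{checkCapacityClass}

	// Instance without a prefix: only the plain class name matches.
	fmt.Println(matchesInstance(checkCapacityClass, "", supported))
	fmt.Println(matchesInstance("team-a-"+checkCapacityClass, "", supported))

	// Instance with the hypothetical prefix "team-a-": only prefixed names match.
	fmt.Println(matchesInstance("team-a-"+checkCapacityClass, "team-a-", supported))
	fmt.Println(matchesInstance(checkCapacityClass, "team-a-", supported))
}
```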

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added provisioningClassPrefix option that filters ProvisioningRequests by provisioningClassName, so that only requests with a matching prefix are processed by a specific Cluster Autoscaler instance.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2025-01-08T11:40:15Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: macsko
Once this PR has been reviewed and has the lgtm label, please assign aleksandra-malinowska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

cluster-autoscaler/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

macsko · 2025-01-08T12:20:12Z

/cc @aleksandra-malinowska

cluster-autoscaler/processors/provreq/injector_test.go

cluster-autoscaler/processors/provreq/injector.go

gabesaba

lgtm after addressing last 2 nits

cluster-autoscaler/processors/provreq/injector_test.go

gabesaba

/lgtm

x13n · 2025-01-23T16:10:00Z

Running multiple CA instances takes more than adding a prefix for provisioning requests. For instance, regular pending pods will trigger scale up in every CA and different provisioning requests may trigger scaling in the same node group. Is this a part of some broader feature you're trying to build?

macsko · 2025-01-27T11:49:26Z

> Running multiple CA instances takes more than adding a prefix for provisioning requests. For instance, regular pending pods will trigger scale up in every CA and different provisioning requests may trigger scaling in the same node group. Is this a part of some broader feature you're trying to build?

If we want one CA that handles scale-up, (basic) provreq processing, etc., and a second CA that only processes check-capacity provreqs (with a prefixed class name), then this should be enough to work correctly. Check capacity Provision doesn't take node pools into consideration (ref, nodeInfos are not used). It bases its assumptions only on the global cluster state, i.e. all the nodes and pods within the cluster. Given that, if we run the second CA without node pools configured, there won't be any scale-up/down activity in that CA, only provreq processing.

If the above assumptions are not enough, we could add yet another flag to CA that disables scale-up and scale-down activities, leaving only provreq processing.

aleksandra-malinowska · 2025-01-27T12:37:24Z

> Running multiple CA instances takes more than adding a prefix for provisioning requests. For instance, regular pending pods will trigger scale up in every CA and different provisioning requests may trigger scaling in the same node group. Is this a part of some broader feature you're trying to build?

It's already possible to 'shard' the cluster and run multiple CA instances by passing a different set of node groups (or node group prefixes) to each of them. In practice this works only if each workload's requirements fit node groups from a single shard; otherwise multiple instances can trigger scale-up.

For regular pods that fulfill these requirements, the only consequence of sharding would be spamming fake NotTriggerScaleUp events from instances that can't request scale-up. Not perfect, but possible to ignore in absence of a solution.

For ProvisioningRequest, the instances that can't do anything for them won't just spam events though - they'll actually modify the ProvisioningRequest object, updating the condition. This actually needs to be fixed for any multi-CA setup to work.

x13n · 2025-01-27T19:45:45Z

Sharding can work only if you can split workloads and node groups at the same time.

However, is this specific scenario intended to work only with check-capacity Provisioning Requests? That sounds safe, as there's no responsibility overlap once prefixes are configured correctly on each instance.
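To make "no responsibility overlap" concrete, here is a rough sketch that lists which instances would claim a given class name in the two-instance setup discussed above. It assumes, as an illustration only, that an instance claims a request when the class name equals its prefix followed by one of the classes it processes. The `instance` type, the instance names, and the `team-a-` and `team-b-` prefixes are hypothetical.

```go
package main

import "fmt"

// instance is a hypothetical description of one Cluster Autoscaler deployment.
type instance struct {
	Name    string
	Prefix  string   // class-name prefix the instance is configured with
	Classes []string // base provisioning classes the instance processes
}

// claims reports whether this instance would process className.
// ASSUMPTION: matching means className == Prefix + <base class>.
func (i instance) claims(className string) bool {
	for _, c := range i.Classes {
		if className == i.Prefix+c {
			return true
		}
	}
	return false
}

// owners returns the names of all instances that would claim className.
// Exactly one owner is the safe case; zero means the request is ignored,
// and more than one means responsibilities overlap.
func owners(instances []instance, className string) []string {
	var out []string
	for _, in := range instances {
		if in.claims(className) {
			out = append(out, in.Name)
		}
	}
	return out
}

func main() {
	const checkCapacity = "check-capacity.autoscaling.x-k8s.io"
	const atomic = "best-effort-atomic-scale-up.autoscaling.x-k8s.io"

	instances := []instance{
		{Name: "main", Prefix: "", Classes: []string{checkCapacity, atomic}},
		{Name: "capacity-only", Prefix: "team-a-", Classes: []string{checkCapacity}},
	}

	for _, cn := range []string{checkCapacity, atomic, "team-a-" + checkCapacity, "team-b-" + checkCapacity} {
		fmt.Printf("%s -> %v\n", cn, owners(instances, cn))
	}
}
```

A class name with no owner, such as the `team-b-` one here, is left untouched by every instance, which is the behaviour a multi-CA setup needs so that unrelated instances do not update the request's conditions.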

k8s-ci-robot added the kind/feature, cncf-cla: yes, and area/cluster-autoscaler labels Jan 8, 2025

k8s-ci-robot requested review from BigDarkClown and x13n January 8, 2025 11:40

k8s-ci-robot added the size/L label (a PR that changes 100-499 lines, ignoring generated files) Jan 8, 2025

k8s-ci-robot requested a review from aleksandra-malinowska January 8, 2025 12:20

gabesaba reviewed Jan 9, 2025

cluster-autoscaler/processors/provreq/injector_test.go (outdated)

cluster-autoscaler/processors/provreq/injector_test.go (outdated)

cluster-autoscaler/processors/provreq/injector_test.go (outdated)

gabesaba reviewed Jan 9, 2025

cluster-autoscaler/processors/provreq/injector.go (outdated)

macsko force-pushed the allow_to_prefix_provisioning_class_name_to_filter_prs branch from 001b922 to 4fa718f Compare January 9, 2025 14:38

macsko requested a review from gabesaba January 9, 2025 14:40

gabesaba reviewed Jan 9, 2025

cluster-autoscaler/processors/provreq/injector_test.go (outdated)

cluster-autoscaler/processors/provreq/injector_test.go (outdated)

Allow to prefix provisioningClassName to filter provisioning requests

43513b7

macsko force-pushed the allow_to_prefix_provisioning_class_name_to_filter_prs branch from 4fa718f to 43513b7 Compare January 10, 2025 10:19

macsko requested a review from gabesaba January 10, 2025 10:20

gabesaba reviewed Jan 10, 2025

k8s-ci-robot assigned gabesaba Jan 10, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jan 10, 2025
