✨ Surface aggregated machine versions in status #13341

Open
miltalex wants to merge 1 commit into kubernetes-sigs:main from miltalex:feat/expose-version

@miltalex
Contributor

What this PR does / why we need it:

  • Introduces status.versions (aggregated Kubernetes versions with replica counts) across core APIs: Cluster control plane/workers, MachineSet, MachineDeployment, MachinePool, and KubeadmControlPlane.
  • Adds a shared version aggregation helper and uses it in controllers to report versions derived from machine kubelet versions.
  • Updates topology upgrade checks to use status.versions when available, reducing the need to list machines/nodes for simple version drift detection.
  • Adds tests covering aggregation, controller status updates, and upgrade-check behavior.
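
As a rough illustration of the aggregation described above (a minimal sketch, not the helper added by this PR), the following counts machines per reported kubelet version. The `statusVersion` type and the `aggregateKubeletVersions` function are hypothetical stand-ins, and plain string ordering is an assumption made to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// statusVersion is a hypothetical stand-in for the StatusVersion API type.
type statusVersion struct {
	Version  string
	Replicas int32
}

// aggregateKubeletVersions counts machines per reported kubelet version.
// Empty strings model machines without node info yet; they are skipped.
func aggregateKubeletVersions(kubeletVersions []string) []statusVersion {
	counts := map[string]int32{}
	for _, v := range kubeletVersions {
		if v == "" {
			continue
		}
		counts[v]++
	}

	versions := make([]statusVersion, 0, len(counts))
	for v, n := range counts {
		versions = append(versions, statusVersion{Version: v, Replicas: n})
	}
	// Assumption: plain string ordering, only to make the output deterministic.
	sort.Slice(versions, func(i, j int) bool {
		return versions[i].Version < versions[j].Version
	})
	return versions
}

func main() {
	machines := []string{"v1.34.3", "v1.34.3", "", "v1.35.1", "v1.34.3"}
	for _, v := range aggregateKubeletVersions(machines) {
		fmt.Printf("%s: %d\n", v.Version, v.Replicas)
	}
}
```

A machine without a reported kubelet version contributes to no entry, so the replica counts in the list can add up to less than the total number of machines.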

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #13303

P.S.: I tested the above changes locally using the Docker provider. For example, during an upgrade:

  Control Plane:
    Available Replicas:   0
    Desired Replicas:     3
    Ready Replicas:       0
    Replicas:             4
    Up To Date Replicas:  1
    Versions:
      Replicas:  3
      Version:   v1.34.3
      Replicas:  1
      Version:   v1.35.1

  Control Plane:
    Available Replicas:   0
    Desired Replicas:     3
    Ready Replicas:       0
    Replicas:             4
    Up To Date Replicas:  2
    Versions:
      Replicas:  2
      Version:   v1.34.3
      Replicas:  1
      Version:   v1.35.1

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 16, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chrischdi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 16, 2026
@k8s-ci-robot
Contributor

Hi @miltalex. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@miltalex miltalex force-pushed the feat/expose-version branch from 0288f90 to 3f39a80 Compare March 10, 2026 19:36
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2026
})

return versions
}

thought: This function looks very similar to internal/util/version.VersionsFromMachines, except that it operates on collections.Machines rather than []*clusterv1.Machine.

Would it make sense to call internalversion.VersionsFromMachines(machines.UnsortedList()) here instead, to avoid maintaining two copies of the same logic?

Is there some nuanced difference between the functions? (If so, that'd be useful to note in a comment.)

Contributor Author

Good point. I switched this to use internal/util/version.VersionsFromMachines(machines.UnsortedList()) so we rely on one shared aggregation/sorting implementation.

Comment thread internal/topology/check/upgrade.go Outdated
mdVersion := md.Spec.Template.Spec.Version
if len(md.Status.Versions) > 0 {
	for _, statusVersion := range md.Status.Versions {
		if statusVersion.Version != mdVersion {

question: The version comparison here uses raw string equality while the existing fallback path below parses versions with semver.ParseTolerant. If a provider or kubelet ever reports the version in a slightly different form (e.g. "1.32.0" vs "v1.32.0"), this could give a different answer than the fallback path. Is raw string comparison intentional here for simplicity, or would it be worth using semver-tolerant comparison for consistency?

Contributor Author

I updated the comparison to be semver-tolerant for both status.versions and fallback machine/node paths, so equivalent forms like v1.32.0 and 1.32.0 are treated consistently.
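
To illustrate the difference between raw string equality and a tolerant comparison, here is a minimal sketch using only the standard library. `parseTolerant` is a simplified, hypothetical stand-in for `semver.ParseTolerant` (it ignores pre-release and build suffixes, which a real semver comparison would not), and `sameVersion` is likewise an illustrative name rather than code from this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTolerant is a simplified, hypothetical stand-in for semver.ParseTolerant:
// it accepts an optional "v" prefix, drops pre-release/build suffixes, and
// pads missing minor/patch components with zero.
func parseTolerant(s string) ([3]int, error) {
	var out [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) > 3 {
		return out, fmt.Errorf("too many components in %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid component %q in %q", p, s)
		}
		out[i] = n
	}
	return out, nil
}

// sameVersion reports whether a and b denote the same major.minor.patch.
// An unparsable input yields an error instead of a silent "not equal".
func sameVersion(a, b string) (bool, error) {
	va, err := parseTolerant(a)
	if err != nil {
		return false, err
	}
	vb, err := parseTolerant(b)
	if err != nil {
		return false, err
	}
	return va == vb, nil
}

func main() {
	eq, err := sameVersion("v1.32.0", "1.32.0")
	fmt.Println(eq, err) // prints "true <nil>"; raw string equality would say false
}
```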

Following fields MUST be implemented in the ControlPlane `status`.
Following fields SHOULD be implemented in the ControlPlane `status`.

question: Would it make sense to require that at least one of these is implemented? (It seems like the weaker language unintentionally allows providers to drop version reporting entirely, but I don't think that's the intent.)

Contributor Author

Indeed, that change was a mistake; I was aiming to suggest that providers move to `versions`.

Comment thread api/core/v1beta2/common_types.go Outdated
}

// StatusVersion groups version-related status information.
// +kubebuilder:validation:MinProperties=1

question: Given that Version is +required, does this validation provide value? I don't think it's incorrect, but may be noise in the CRD schema.

Contributor Author

Removed it; that makes sense.

Signed-off-by: Miltiadis Alexis <[email protected]>
@miltalex miltalex force-pushed the feat/expose-version branch from 3f39a80 to 3336482 Compare March 31, 2026 06:58
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2026
@sbueringer
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 31, 2026
@k8s-ci-robot
Contributor

@miltalex: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | 3336482 | link | false | `/test pull-cluster-api-apidiff-main` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Member

@fabriziopandini fabriziopandini left a comment

Thanks for this PR, I really would like to try to have this merged at the beginning of the next cycle.

IMO the two main points to be addressed are:

  • spec.version vs kubelet.version
  • how to deal with the fact that in some cases versions differ but are not sortable

Comment on lines +409 to +443
ControlPlane providers MUST report version information in the ControlPlane `status` by implementing
at least one of the following fields.

```go
type FooControlPlaneStatus struct {
	// versions is the aggregated Kubernetes versions in this control plane.
	// +optional
	// +listType=map
	// +listMapKey=version
	// +kubebuilder:validation:MaxItems=100
	Versions []clusterv1.StatusVersion `json:"versions,omitempty"`

	// version represents the minimum Kubernetes version for the control plane machines
	// in the cluster.
	//
	// Deprecated: This field is deprecated and is going to be removed in a future API version.
	// Please use status.versions instead.
	// +optional
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=256
	Version string `json:"version,omitempty"`

	// See other rules for more details about mandatory/optional fields in ControlPlane status.
	// Other fields SHOULD be added based on the needs of your provider.
}
```

`status.versions` is the preferred source of truth for surfacing control plane versions.
`status.version` is still read as fallback for backward compatibility.
Providers SHOULD implement `status.versions`, and MAY additionally implement the deprecated `status.version`
for compatibility during the transition period.

NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be
of type `string` (it was `*string` before). Both are compatible with the v1beta2 contract though.
NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine
when a control plane is fully upgraded (spec.version == status.version) and for enforcing Kubernetes version skew
policies when a Cluster derived from a ClusterClass is managed by the Topology controller.
NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine
when a control plane is fully upgraded and for enforcing Kubernetes version skew policies when a Cluster derived
from a ClusterClass is managed by the Topology controller.

Member

What about

Suggested change
ControlPlane providers MUST report version information in the ControlPlane `status` by implementing
at least one of the following fields.
`status.versions` is the preferred source of truth for surfacing control plane versions.
... type with only versions + others
`status.version` can be used as an alternative (or as a fallback mechanism), but support
for this field will be removed in the next Cluster API contract version.
... type with only version + others
NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be
of type `string` (it was `*string` before). Both are compatible with the v1beta2 contract though.
NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine
when a control plane is fully upgraded and for enforcing Kubernetes version skew policies when a Cluster derived
from a ClusterClass is managed by the Topology controller.

So we give a cleaner guidance on the target state (without

Comment on lines +374 to +377
case lowestErr != nil && vErr == nil:
	lowest = v
case lowestErr != nil && vErr != nil && v < lowest:
	lowest = v

Member

I think we should fail in case of error (changing lowest in arbitrary ways due to errors does not seem correct).
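
One possible shape for this, as a sketch under simplifying assumptions rather than a proposed implementation: return an error as soon as any entry cannot be parsed, instead of falling back to string comparison. `parseVersion`, `less`, and `lowestVersion` are hypothetical names, and the parser only handles plain major.minor.patch with an optional "v" prefix; real code would rely on a semver library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical, simplified parser: optional "v" prefix and
// exactly major.minor.patch. Real code would use a semver library instead.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q is not major.minor.patch", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a sorts before b: major first, then minor, then patch.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// lowestVersion returns the lowest version in the list, or an error as soon
// as one entry cannot be parsed.
func lowestVersion(versions []string) (string, error) {
	if len(versions) == 0 {
		return "", fmt.Errorf("no versions reported")
	}
	lowest := versions[0]
	lowestParsed, err := parseVersion(lowest)
	if err != nil {
		return "", err
	}
	for _, v := range versions[1:] {
		parsed, err := parseVersion(v)
		if err != nil {
			return "", err
		}
		if less(parsed, lowestParsed) {
			lowest, lowestParsed = v, parsed
		}
	}
	return lowest, nil
}

func main() {
	lowest, err := lowestVersion([]string{"v1.10.0", "v1.9.0"})
	fmt.Println(lowest, err) // prints "v1.9.0 <nil>"; string comparison would pick v1.10.0

	_, err = lowestVersion([]string{"v1.35.1", "garbage"})
	fmt.Println(err != nil) // prints "true"
}
```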

vSemver, vErr := semver.ParseTolerant(v)
switch {
case lowestErr == nil && vErr == nil:
	if version.Compare(vSemver, lowestSemver, version.WithBuildTags()) < 0 {

Member

By using version.WithBuildTags() we are introducing a notion of order, because when the order of two versions cannot be determined, the first one will be used.

I'm wondering whether the contract should state that the list of versions must be ordered from the oldest to the newest version.


// Keep status.version as a deprecated fallback by reporting the lowest version.
if len(controlPlane.KCP.Status.Versions) > 0 {
	controlPlane.KCP.Status.Version = controlPlane.KCP.Status.Versions[0].Version //nolint:staticcheck // status.version is intentionally backfilled for backward compatibility until the deprecated field is removed.

Member

Let's add a note that [0] relies on the assumption that the first version is the lowest in the list.

mp.Status.ReadyReplicas = mp.Status.Replicas
mp.Status.AvailableReplicas = mp.Status.Replicas
mp.Status.UpToDateReplicas = mp.Spec.Replicas
mp.Status.Versions = nil

Member

I'm wondering if we should move here the logic currently implemented in IsMachinePoolUpgrading, which gets the version from nodes.

Comment on lines +37 to +40
if machine.Status.NodeInfo == nil || machine.Status.NodeInfo.KubeletVersion == "" {
	continue
}
versionCounts[machine.Status.NodeInfo.KubeletVersion]++

Member

I have a few concerns about the idea of reading the version from node info.

The main concern is that we have no guarantee that the version reported by the kubelet matches spec.version (it usually does, but we never made this a formal contract for image builder or for bootstrap providers).

If a mismatch does happen, all comparisons between spec.version and the status version might fail in unexpected ways.

A secondary concern is that we are now using this func for KCP, whereas KCP previously read from spec.version.

if errj == nil {
	return false
}
return versions[i].Version < versions[j].Version

Member

Should we also add name as a tie-breaker in case versions are equal?

Comment on lines +91 to +93
versionCounts := map[string]int32{}
AddMachineKubeletVersions(versionCounts, machines)
return StatusVersionsFromCountMap(versionCounts)

Member

I'm wondering if we can use this func to get a stronger notion of version order.

If we assume that machines are created in order, and that we usually upgrade, sorting machines by creation timestamp might help improve version ordering.

However, the current implementation uses a map[string]int32 as an intermediate struct, so we lose this notion of ordering; this needs some thinking.
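
To make the idea concrete, here is a rough sketch (assumptions throughout, not a proposal for the final code): sort machines from oldest to newest, then aggregate while remembering the position at which each version was first seen, so the map is only an index and the result keeps creation order. The `machine` and `statusVersion` types and the `versionsByCreation` function are hypothetical, trimmed-down stand-ins for the real API types.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// machine and statusVersion are hypothetical, trimmed-down stand-ins.
type machine struct {
	Name           string
	Created        time.Time
	KubeletVersion string
}

type statusVersion struct {
	Version  string
	Replicas int32
}

// versionsByCreation aggregates versions in the order they first appear when
// machines are sorted from oldest to newest (name breaks timestamp ties).
func versionsByCreation(machines []machine) []statusVersion {
	sorted := append([]machine(nil), machines...) // copy, leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if !sorted[i].Created.Equal(sorted[j].Created) {
			return sorted[i].Created.Before(sorted[j].Created)
		}
		return sorted[i].Name < sorted[j].Name
	})

	index := map[string]int{} // version -> position in result
	var result []statusVersion
	for _, m := range sorted {
		if m.KubeletVersion == "" {
			continue
		}
		pos, seen := index[m.KubeletVersion]
		if !seen {
			pos = len(result)
			index[m.KubeletVersion] = pos
			result = append(result, statusVersion{Version: m.KubeletVersion})
		}
		result[pos].Replicas++
	}
	return result
}

func main() {
	t0 := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	machines := []machine{
		{Name: "m-new", Created: t0.Add(2 * time.Hour), KubeletVersion: "v1.35.1"},
		{Name: "m-old-a", Created: t0, KubeletVersion: "v1.34.3"},
		{Name: "m-old-b", Created: t0.Add(time.Hour), KubeletVersion: "v1.34.3"},
	}
	for _, v := range versionsByCreation(machines) {
		fmt.Printf("%s: %d\n", v.Version, v.Replicas)
	}
}
```

This ordering only reflects upgrade direction under the stated assumption that older machines run older versions; a downgrade or a re-created machine would break it.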

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

More visibility on current versions

5 participants