✨ Surface aggregated machine versions in status #13341
miltalex wants to merge 1 commit into kubernetes-sigs:main
Force-pushed from 0288f90 to 3f39a80.
    })

    return versions
}
thought: This function looks very similar to internal/util/version.VersionsFromMachines, except that it operates on collections.Machines rather than []*clusterv1.Machine.
Would it make sense to call internalversion.VersionsFromMachines(machines.UnsortedList()) here instead, to avoid maintaining two copies of the same logic?
Is there some nuanced difference between the functions? (If so, that'd be useful to note in a comment.)
Good point. I switched this to use internal/util/version.VersionsFromMachines(machines.UnsortedList()) so we rely on one shared aggregation/sorting implementation.
    mdVersion := md.Spec.Template.Spec.Version
    if len(md.Status.Versions) > 0 {
        for _, statusVersion := range md.Status.Versions {
            if statusVersion.Version != mdVersion {
question: The version comparison here uses raw string equality while the existing fallback path below parses versions with semver.ParseTolerant. If a provider or kubelet ever reports the version in a slightly different form (e.g. "1.32.0" vs "v1.32.0"), this could give a different answer than the fallback path. Is raw string comparison intentional here for simplicity, or would it be worth using semver-tolerant comparison for consistency?
I updated the comparison to be semver-tolerant for both status.versions and fallback machine/node paths, so equivalent forms like v1.32.0 and 1.32.0 are treated consistently.
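To illustrate the difference discussed in this thread, here is a minimal sketch of a tolerant comparison. It is not the PR's code: `normalize` and `sameVersion` are hypothetical names, and `normalize` is a simplified, stdlib-only stand-in for `semver.ParseTolerant` (it only trims whitespace, drops a leading `v`, and pads missing minor/patch components).

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is a simplified, hypothetical stand-in for semver.ParseTolerant:
// it trims whitespace and a leading "v", and pads a missing minor or patch
// component with "0". Pre-release and build suffixes are kept untouched.
func normalize(v string) string {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	core, rest := v, ""
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		core, rest = v[:i], v[i:]
	}
	parts := strings.Split(core, ".")
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts, ".") + rest
}

// sameVersion reports whether two version strings are equal after normalization.
func sameVersion(a, b string) bool {
	return normalize(a) == normalize(b)
}

func main() {
	a, b := "v1.32.0", "1.32.0"
	fmt.Println(a == b)            // raw string equality
	fmt.Println(sameVersion(a, b)) // tolerant comparison
}
```

With raw string equality the two forms count as different versions, which is the inconsistency with the fallback path raised in the question above.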
| ``` | ||
|
|
||
| Following fields MUST be implemented in the ControlPlane `status`. | ||
| Following fields SHOULD be implemented in the ControlPlane `status`. |
question: Would it make sense to require that at least one of these is implemented? (It seems like the weaker language unintentionally allows providers to drop version reporting entirely, but I don't think that's the intent.)
Indeed, that change was a mistake; I was aiming to suggest that providers move to versions.
}

// StatusVersion groups version-related status information.
// +kubebuilder:validation:MinProperties=1
question: Given that Version is +required, does this validation provide value? I don't think it's incorrect, but may be noise in the CRD schema.
Removed it. That makes sense.
Signed-off-by: Miltiadis Alexis <[email protected]>
Force-pushed from 3f39a80 to 3336482.
|
/ok-to-test |
fabriziopandini left a comment:
Thanks for this PR. I would really like to try to have this merged at the beginning of the next cycle.
IMO the two main points to be addressed are:
- spec.version vs kubelet.version
- how to deal with the fact that in some cases versions are different but not sortable
| ControlPlane providers MUST report version information in the ControlPlane `status` by implementing | ||
| at least one of the following fields. | ||
|
|
||
| ```go | ||
| type FooControlPlaneStatus struct { | ||
| // versions is the aggregated Kubernetes versions in this control plane. | ||
| // +optional | ||
| // +listType=map | ||
| // +listMapKey=version | ||
| // +kubebuilder:validation:MaxItems=100 | ||
| Versions []clusterv1.StatusVersion `json:"versions,omitempty"` | ||
|
|
||
| // version represents the minimum Kubernetes version for the control plane machines | ||
| // in the cluster. | ||
| // | ||
| // Deprecated: This field is deprecated and is going to be removed in a future API version. | ||
| // Please use status.versions instead. | ||
| // +optional | ||
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=256 | ||
| Version string `json:"version,omitempty"` | ||
|
|
||
| // See other rules for more details about mandatory/optional fields in ControlPlane status. | ||
| // Other fields SHOULD be added based on the needs of your provider. | ||
| } | ||
| ``` | ||
|
|
||
| NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be | ||
| `status.versions` is the preferred source of truth for surfacing control plane versions. | ||
| `status.version` is still read as fallback for backward compatibility. | ||
| Providers SHOULD implement `status.versions`, and MAY additionally implement the deprecated `status.version` | ||
| for compatibility during the transition period. | ||
|
|
||
| NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be | ||
| of type `string` (it was `*string` before). Both are compatible with the v1beta2 contract though. | ||
| NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine | ||
| when a control plane is fully upgraded (spec.version == status.version) and for enforcing Kubernetes version skew | ||
| policies when a Cluster derived from a ClusterClass is managed by the Topology controller. | ||
| NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine | ||
| when a control plane is fully upgraded and for enforcing Kubernetes version skew policies when a Cluster derived | ||
| from a ClusterClass is managed by the Topology controller. |
What about
| ControlPlane providers MUST report version information in the ControlPlane `status` by implementing | |
| at least one of the following fields. | |
| ```go | |
| type FooControlPlaneStatus struct { | |
| // versions is the aggregated Kubernetes versions in this control plane. | |
| // +optional | |
| // +listType=map | |
| // +listMapKey=version | |
| // +kubebuilder:validation:MaxItems=100 | |
| Versions []clusterv1.StatusVersion `json:"versions,omitempty"` | |
| // version represents the minimum Kubernetes version for the control plane machines | |
| // in the cluster. | |
| // | |
| // Deprecated: This field is deprecated and is going to be removed in a future API version. | |
| // Please use status.versions instead. | |
| // +optional | |
| // +kubebuilder:validation:MinLength=1 | |
| // +kubebuilder:validation:MaxLength=256 | |
| Version string `json:"version,omitempty"` | |
| // See other rules for more details about mandatory/optional fields in ControlPlane status. | |
| // Other fields SHOULD be added based on the needs of your provider. | |
| } | |
| ``` | |
| NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be | |
| `status.versions` is the preferred source of truth for surfacing control plane versions. | |
| `status.version` is still read as fallback for backward compatibility. | |
| Providers SHOULD implement `status.versions`, and MAY additionally implement the deprecated `status.version` | |
| for compatibility during the transition period. | |
| NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be | |
| of type `string` (it was `*string` before). Both are compatible with the v1beta2 contract though. | |
| NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine | |
| when a control plane is fully upgraded (spec.version == status.version) and for enforcing Kubernetes version skew | |
| policies when a Cluster derived from a ClusterClass is managed by the Topology controller. | |
| NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine | |
| when a control plane is fully upgraded and for enforcing Kubernetes version skew policies when a Cluster derived | |
| from a ClusterClass is managed by the Topology controller. | |
| ControlPlane providers MUST report version information in the ControlPlane `status` by implementing | |
| at least one of the following fields. | |
| `status.versions` is the preferred source of truth for surfacing control plane versions. | |
| ... type with only versions + others | |
| `status.version` can be used as alternative (or as a fallback mechanism), but the support | |
| for this field will be removed in the next Cluster API contract version | |
| ... type with only version + others | |
| NOTE: To align with API conventions, we recommend since the v1beta2 contract that the `Version` field should be | |
| of type `string` (it was `*string` before). Both are compatible with the v1beta2 contract though. | |
| NOTE: The minimum Kubernetes version, and more specifically the API server version, will be used to determine | |
| when a control plane is fully upgraded and for enforcing Kubernetes version skew policies when a Cluster derived | |
| from a ClusterClass is managed by the Topology controller. |
So we give a cleaner guidance on the target state (without
    case lowestErr != nil && vErr == nil:
        lowest = v
    case lowestErr != nil && vErr != nil && v < lowest:
        lowest = v
I think we should fail in case of error (changing lowest in arbitrary ways due to errors does not seem correct).
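A minimal sketch of the fail-fast behavior suggested here, assuming plain `vMAJOR.MINOR.PATCH` strings. `parse`, `less`, and `lowestVersion` are hypothetical helpers written for illustration; the PR itself uses `semver.ParseTolerant` and `version.Compare`, which this sketch does not reproduce.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse is a simplified, hypothetical parser for "vMAJOR.MINOR.PATCH" strings.
// Pre-release and build metadata are deliberately not supported here.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a sorts strictly before b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// lowestVersion returns the lowest version, or an error as soon as any entry
// fails to parse, instead of falling back to string comparison.
func lowestVersion(versions []string) (string, error) {
	if len(versions) == 0 {
		return "", fmt.Errorf("no versions")
	}
	lowest := versions[0]
	lowestParsed, err := parse(lowest)
	if err != nil {
		return "", err
	}
	for _, v := range versions[1:] {
		p, err := parse(v)
		if err != nil {
			return "", err
		}
		if less(p, lowestParsed) {
			lowest, lowestParsed = v, p
		}
	}
	return lowest, nil
}

func main() {
	v, err := lowestVersion([]string{"v1.10.0", "v1.9.0"})
	fmt.Println(v, err)
	v, err = lowestVersion([]string{"v1.10.0", "not-a-version"})
	fmt.Println(v, err)
}
```

The trade-off is that a single unparsable entry blocks the result, so the caller has to decide how to surface that error.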
    vSemver, vErr := semver.ParseTolerant(v)
    switch {
    case lowestErr == nil && vErr == nil:
        if version.Compare(vSemver, lowestSemver, version.WithBuildTags()) < 0 {
By using version.WithBuildTags() we are introducing a notion of order: when two versions cannot be ordered, the first one is kept.
I'm wondering whether the contract should state that the list of versions must be ordered from the oldest to the newest version.
    // Keep status.version as a deprecated fallback by reporting the lowest version.
    if len(controlPlane.KCP.Status.Versions) > 0 {
        controlPlane.KCP.Status.Version = controlPlane.KCP.Status.Versions[0].Version //nolint:staticcheck // status.version is intentionally backfilled for backward compatibility until the deprecated field is removed.
Let's add a note that [0] relies on the assumption that the first version is the lowest in the list.
    mp.Status.ReadyReplicas = mp.Status.Replicas
    mp.Status.AvailableReplicas = mp.Status.Replicas
    mp.Status.UpToDateReplicas = mp.Spec.Replicas
    mp.Status.Versions = nil
Wondering if we should move the logic currently implemented in IsMachinePoolUpgrading, which gets the version from nodes, here.
    if machine.Status.NodeInfo == nil || machine.Status.NodeInfo.KubeletVersion == "" {
        continue
    }
    versionCounts[machine.Status.NodeInfo.KubeletVersion]++
I have a few concerns about the idea of reading the version from node info.
The main concern is that we don't have any guarantee that the version reported by kubelet matches spec.version (it is usually the case, but we never made this a formal contract for image builder or for bootstrap providers).
If a mismatch happens by chance, then all the comparisons between spec.version and the status version might fail in unexpected ways.
A secondary concern is that we are now using this func for KCP, while KCP was previously reading from spec.version.
    if errj == nil {
        return false
    }
    return versions[i].Version < versions[j].Version
Should we also add name as a tie breaker in case versions are equal?
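A rough sketch of a sort with an explicit tie breaker. The comment does not say which name is meant, so this sketch assumes the raw version string is used to break ties; `key` and `sortVersions` are hypothetical helpers, and the parsing is a simplified stand-in for the semver handling in the PR. It also assumes parseable versions should sort before unparsable ones.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// key parses "MAJOR.MINOR.PATCH" (with an optional leading "v") into a
// comparable triple; ok is false when the string does not have that shape.
// Simplified and hypothetical.
func key(v string) (k [3]int, ok bool) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return k, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return k, false
		}
		k[i] = n
	}
	return k, true
}

// sortVersions orders parseable versions ascending and before unparsable
// ones; the raw string breaks every remaining tie, so the result does not
// depend on the input order.
func sortVersions(versions []string) {
	sort.Slice(versions, func(i, j int) bool {
		ki, oki := key(versions[i])
		kj, okj := key(versions[j])
		if oki != okj {
			return oki // parseable entries sort first
		}
		if oki {
			for n := range ki {
				if ki[n] != kj[n] {
					return ki[n] < kj[n]
				}
			}
		}
		return versions[i] < versions[j] // tie breaker: raw string
	})
}

func main() {
	versions := []string{"bogus", "v1.10.0", "v1.9.0", "1.9.0"}
	sortVersions(versions)
	fmt.Println(versions)
}
```

Because every pair of distinct strings is ordered, the output is deterministic even when two entries parse to the same version.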
    versionCounts := map[string]int32{}
    AddMachineKubeletVersions(versionCounts, machines)
    return StatusVersionsFromCountMap(versionCounts)
I'm wondering if we can use this func to get a stronger notion of version order.
If we assume that machines are created in order, and that we usually upgrade, then sorting machines by creation timestamp might help improve version ordering.
However, the current implementation uses a map[string]int32 as an intermediate struct, so we lose this notion of ordering; this needs some thinking.
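One possible shape for an order-preserving aggregation, as a sketch only. `machine` and `statusVersion` are hypothetical, trimmed-down stand-ins for `clusterv1.Machine` and `clusterv1.StatusVersion` (the `Replicas` field name and the Unix-seconds timestamp are assumptions), and `versionsByCreation` is not a function from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// machine is a hypothetical stand-in for clusterv1.Machine.
type machine struct {
	CreatedUnix    int64 // creation timestamp in Unix seconds (assumption)
	KubeletVersion string
}

// statusVersion is a hypothetical stand-in for clusterv1.StatusVersion.
type statusVersion struct {
	Version  string
	Replicas int32
}

// versionsByCreation counts machines per version and orders the result by
// the creation time of the oldest machine reporting each version.
func versionsByCreation(machines []machine) []statusVersion {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]machine(nil), machines...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreatedUnix < sorted[j].CreatedUnix
	})

	index := map[string]int{} // version -> position in out
	var out []statusVersion
	for _, m := range sorted {
		if m.KubeletVersion == "" {
			continue // skip machines that do not report a version yet
		}
		i, ok := index[m.KubeletVersion]
		if !ok {
			i = len(out)
			index[m.KubeletVersion] = i
			out = append(out, statusVersion{Version: m.KubeletVersion})
		}
		out[i].Replicas++
	}
	return out
}

func main() {
	fmt.Println(versionsByCreation([]machine{
		{CreatedUnix: 300, KubeletVersion: "v1.33.0"},
		{CreatedUnix: 100, KubeletVersion: "v1.32.0"},
		{CreatedUnix: 200, KubeletVersion: "v1.32.0"},
	}))
}
```

The map is kept only as an index into the ordered slice, so the first-seen order survives the counting step. This ordering only approximates version order under the assumptions stated in the comment (machines created in order, upgrades only).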
What this PR does / why we need it:
KubeadmControlPlane.
Which issue(s) this PR fixes: Fixes #13303
P.S: I have tested the above changes locally using the docker provider. For example during upgrade