
✨ Write kubeadm control plane version file for workers to use to fetch the matching kubeadm binary #13433

Open
AcidLeroy wants to merge 10 commits into kubernetes-sigs:main from AcidLeroy:kubeadm-version

Conversation

@AcidLeroy

@AcidLeroy AcidLeroy commented Mar 10, 2026

What this PR does / why we need it

Kubernetes allows some skew between the control plane and kubelets, but kubeadm's own skew policy requires the kubeadm binary used for kubeadm join to match the kubeadm used when the cluster was created or last upgraded on that path—so you cannot rely on an older kubeadm on the worker when the control plane is newer.

That conflicts with real Cluster API flows (e.g. scaling or remediating workers still on an older Kubernetes while the control plane has moved ahead), as discussed in #13315.

This PR:

  • Resolves the control plane Kubernetes version for join (from KubeadmControlPlane when available) and uses it when generating join bootstrap data so join config matches the cluster the node is joining. If the control plane object cannot be read while a controlPlaneRef is set, reconciliation fails and status conditions surface the error (no silent fallback to the Machine version in that case). When there is no control plane ref or the referenced object does not expose a version, the controller falls back to the Machine's Kubernetes version as before.
  • Surfaces how join version was chosen on KubeadmConfig: the ControlPlaneKubernetesVersionAvailable condition stays True for both success paths, but Reason (and Message) distinguish version read from the control plane reference vs version taken from the Machine because the reference is unset or has no version—so operators can see at a glance whether the skew contract is being driven by the cluster control plane or the worker.
  • Adds a new contentFormat field to spec.files with two values: "raw" (default, content used verbatim as today) and "template" (content rendered as Go text/template). Template data includes {{ .controlPlane.version }}, so operators can wire their own steps (scripts, package installs, downloads) to install a kubeadm binary that matches the control plane before kubeadm join runs—without CAPI prescribing a single install mechanism.
  • Adds RBAC so the kubeadm bootstrap controller can read KubeadmControlPlane where needed for version resolution.
  • E2E template for cluster-template-topology-kubeadm-version moved into its own kustomize sub-folder (consistent with other topology templates) and wired into the generate-e2e-templates-main Makefile target.
  • Tests: unit coverage for template parse vs execute failures in spec.files, controller tests for the new condition reasons (including scheme fix for WorkerJoinWithControlPlaneRef), and E2E coverage (KubeadmVersionOnJoin + clusterclass-quick-start-kubeadm-version) demonstrating the pattern end-to-end.
sequenceDiagram
    participant CP as Control plane (newer K8s)
    participant BC as Kubeadm bootstrap controller
    participant W as Worker (older image / kubelet)

    BC->>CP: Read KubeadmControlPlane.spec.version
    CP-->>BC: e.g. v1.35.0
    BC->>BC: Build join data + render spec.files templates
    BC->>W: Render spec.files (contentFormat: template) e.g. fetch script with {{ .controlPlane.version }}
    Note over W: preKubeadmCommands (operator-defined)
    W->>W: Install/fetch kubeadm matching CP version
    W->>CP: kubeadm join (binary matches policy)
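
The rendering step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `renderFileContent` is a hypothetical helper name, and the use of `missingkey=error` is an assumption about how a missing variable might be made to fail. It also shows the two failure classes the unit tests distinguish: a template that does not parse versus one that parses but fails at execution.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderFileContent is a hypothetical helper (not the PR's actual function).
// It renders file content as a Go text/template against the given data and
// reports parse and execute failures with distinct error prefixes.
func renderFileContent(content string, data map[string]interface{}) (string, error) {
	// missingkey=error is an assumption: it makes a lookup of an unknown map
	// key fail at execution time instead of rendering "<no value>".
	tpl, err := template.New("file").Option("missingkey=error").Parse(content)
	if err != nil {
		return "", fmt.Errorf("failed to parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("failed to execute template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	data := map[string]interface{}{
		"controlPlane": map[string]interface{}{"version": "v1.35.0"},
	}

	// Success: the control plane version is substituted into the content.
	out, err := renderFileContent("KUBEADM_VERSION={{ .controlPlane.version }}\n", data)
	fmt.Printf("%q %v\n", out, err)

	// Parse failure: the action is never closed.
	_, err = renderFileContent("{{ .controlPlane.version", data)
	fmt.Println(err)

	// Execute failure: the template is valid, but the key does not exist.
	_, err = renderFileContent("{{ .controlPlane.image }}", data)
	fmt.Println(err)
}
```

Files with contentFormat "raw" would bypass a helper like this entirely and be written verbatim.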

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Related to #13315

/area bootstrap
/area test

@k8s-ci-robot k8s-ci-robot added the area/bootstrap Issues or PRs related to bootstrap providers label Mar 10, 2026
@k8s-ci-robot
Contributor

@AcidLeroy: The label(s) area/test cannot be applied, because the repository doesn't have them.

Details

In response to this:

What this PR does / why we need it:

When a worker node joins a cluster where the control plane has already been upgraded to a newer Kubernetes version, the kubeadm version skew policy can be violated: the joining node's kubeadm binary (matching its older OS image) is older than the control plane version. This becomes a real problem in scenarios like scaling up a MachineDeployment during an upgrade or when supporting workers pinned to older Kubernetes versions long-term.

This PR introduces a kubeadm version contract for worker nodes joining a cluster:

  1. Control plane version resolution for join: The kubeadm bootstrap controller now resolves the control plane version (from KubeadmControlPlane.spec.version) and uses it—instead of the joining machine's own version—when generating the kubeadm join configuration. For example, a v1.34 worker joining a v1.35 control plane will get join data generated for v1.35.

  2. Version file written to the node: A file is written at /run/cluster-api/kubeadm-version/version on every joining worker node (via both cloud-init and ignition). This file contains the control plane's Kubernetes version and acts as a contract: operators can provide a custom preKubeadmCommands script that reads this file and fetches/installs the matching kubeadm binary before kubeadm join runs.

  3. RBAC: The kubeadm bootstrap controller now has read access to KubeadmControlPlane resources to look up the control plane version.

  4. E2E tests: A new KubeadmVersionOnJoin e2e test validates the flow end-to-end using CAPD with a dedicated ClusterClass (clusterclass-quick-start-kubeadm-version) that embeds a fetch script reading the version file.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related to #13315

/area bootstrap
/area test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 10, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chrischdi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 10, 2026
@k8s-ci-robot
Contributor

Hi @AcidLeroy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Comment thread Tiltfile Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2026
@AcidLeroy AcidLeroy changed the title Write kubeadm control plane version file for workers to use to fetch the matching kubeadm ✨ Write kubeadm control plane version file for workers to use to fetch the matching kubeadm Mar 10, 2026
@AcidLeroy AcidLeroy changed the title ✨ Write kubeadm control plane version file for workers to use to fetch the matching kubeadm ✨ Write kubeadm control plane version file for workers to use to fetch the matching kubeadm binary Mar 10, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2026
@AcidLeroy AcidLeroy marked this pull request as draft March 11, 2026 17:18
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2026
@zarcen zarcen left a comment

Thanks for putting all this together @AcidLeroy. Have some suggestions

Comment thread bootstrap/kubeadm/internal/cloudinit/node.go Outdated
Comment thread bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller.go Outdated
Comment thread bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller_test.go Outdated
Comment thread bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller_test.go Outdated
Comment thread bootstrap/kubeadm/internal/ignition/clc/clc.go
Comment thread bootstrap/kubeadm/internal/ignition/clc/clc_test.go
Comment thread hack/kind-install.sh Outdated
Comment thread test/e2e/kubeadm_version_on_join_test.go Outdated
Comment thread bootstrap/kubeadm/internal/cloudinit/cloudinit_test.go Outdated
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 11, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 11, 2026
Member

@neolit123 neolit123 left a comment

i posted some comments on slack:
https://kubernetes.slack.com/archives/C8TSNPY4T/p1773251442281699

i think CAPI can just write the ClusterConfiguration on disk too.

@AcidLeroy
Author

i posted some comments on slack: https://kubernetes.slack.com/archives/C8TSNPY4T/p1773251442281699

i think CAPI can just write the ClusterConfiguration on disk too.

@neolit123 I will provide an alternative solution by writing the ClusterConfiguration in a separate branch for now. If we find that we prefer that one, I'll merge it into this PR.

@AcidLeroy
Author

@neolit123 Is this sort of what you are thinking: https://github.com/AcidLeroy/cluster-api/pull/3/changes

@neolit123
Member

neolit123 commented Mar 12, 2026

@neolit123 Is this sort of what you are thinking: https://github.com/AcidLeroy/cluster-api/pull/3/changes

yes, sgtm, but up to maintainers to decide.

EDIT: in the Slack thread we figured out that the CAPI v1beta2 ClusterConfiguration doesn't have the kubernetesVersion field, so my proposal is not useful.

@AcidLeroy
Author

Rather than hard-coding a file, we should look into providing a Go template file (kubeadmconfig); then, in the kube bootstrap config controller, we could render the version directly into that file.

Look into resolveFiles in kubeadm and templating the version into the "fetch kubeadm version" script.

Author

This file might be in the gitignore file. Should double check.

Member

It definitely is on gitignore. If I create a file with a similar name it's ignored and I don't see why it should be different for this file.

Let's do the following

  • Let's move the file into a sub-folder called cluster-template-topology-kubeadm
  • Let's adjust the generate-e2e-templates-main Makefile target to copy the file up one level (Please make sure it is at the right place in the Makefile target based on alphabetic ordering of the target file name)

That way it fits into our regular folder structure and is easier to find

Comment thread test/e2e/kubeadm_version_on_join.go Outdated
Comment thread test/e2e/kubeadm_version_on_join.go Outdated
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 20, 2026
Comment thread bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller.go Outdated
Comment on lines +843 to +867
// getControlPlaneVersionForJoin returns the control plane (cluster) version from the cluster's ControlPlaneRef,
// e.g. KubeadmControlPlane.spec.version. Returns empty string if the cluster has no ControlPlaneRef or the version
// cannot be read (e.g. control plane not found or does not support version). Used for worker join so that
// a 1.34 node uses kubeadm 1.35 when the control plane is at 1.35, for example.
func (r *KubeadmConfigReconciler) getControlPlaneVersionForJoin(ctx context.Context, scope *Scope) string {
	if !scope.Cluster.Spec.ControlPlaneRef.IsDefined() {
		return ""
	}
	controlPlane, err := external.GetObjectFromContractVersionedRef(ctx, r.Client, scope.Cluster.Spec.ControlPlaneRef, scope.Cluster.Namespace)
	if err != nil {
		scope.V(4).Info("Could not get control plane for version, falling back to machine version", "error", err)
		return ""
	}
	cpVersion, err := contract.ControlPlane().Version().Get(controlPlane)
	if err != nil {
		if !errors.Is(err, contract.ErrFieldNotFound) {
			scope.V(4).Info("Could not get control plane version, falling back to machine version", "error", err)
		}
		return ""
	}
	if cpVersion == nil {
		return ""
	}
	return *cpVersion
}

question: I notice this falls back to the machine version for any error (not found, permission denied, network issue, etc.). Is that intentional for all error types, or would it be worth distinguishing "control plane not found / field not present" (expected) from unexpected failures?

Just wondering whether masking unexpected errors here could make debugging harder down the road.

Author

Yeah, I think there are some error states here that we can requeue for and only fall back to machine version as an absolute last resort. I'll push some changes up shortly with an alternative to what I have here.

Author

@zjs, I introduced some changes to include another condition so that we can surface any issues with getting the CP version. In this case, we only fall back to the machine version if we absolutely have to, and we'll be able to see what the issue is via the status conditions. LMK what you think! Thanks!


Curious to see how others feel, but personally, I like it!

Member

@sbueringer sbueringer Apr 2, 2026

Adding an entire condition just to surface this error seems too much to me.

We should be very careful with introducing new conditions

Based on some of my other comments we might not need this anymore anyway

cc @fabriziopandini

Comment thread api/bootstrap/kubeadm/v1beta2/kubeadmconfig_types.go Outdated
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 24, 2026
Comment thread api/bootstrap/kubeadm/v1beta1/kubeadmconfig_types.go
Comment thread api/bootstrap/kubeadm/v1beta2/kubeadmconfig_types.go
Comment thread bootstrap/kubeadm/internal/controllers/template_test.go
Comment thread bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller.go
@sbueringer sbueringer marked this pull request as ready for review April 2, 2026 08:10
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2026
@k8s-ci-robot k8s-ci-robot requested a review from g-gaston April 2, 2026 08:10
@sbueringer
Member

Let's use hold instead of draft so we can run CI

/ok-to-test
/test pull-cluster-api-e2e-main-gke

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 2, 2026
type Encoding string

// FileContentFormat specifies how file content is interpreted after resolving content/contentFrom and before writing bootstrap data.
// +kubebuilder:validation:Enum="";go-template
Member

@sbueringer sbueringer Apr 2, 2026

I think I would use "raw" and "template" as enum values.

I would not make "" a valid value. This is already covered because the field is optional.

I would use template instead of go-template for consistency with other parts of our API: JSONPatchValue.Template & Naming.Template.

(I think after my suggestions are implemented the linter should also be happy)

),
)
}
if file.ContentFormat != "" && file.ContentFormat != FileContentFormatGoTemplate {
Member

This "manual" validation is not necessary. This is already covered via the enum kubebuilder markers.

See CRD schema:

                      enum:
                      - ""
                      - go-template


Addressed in commit 2d9a51

// (semver.String(), no "v" prefix). For a worker joining a cluster, that version is the control plane
// Kubernetes version when the controller can read it; otherwise the Machine's version.
// +optional
ContentFormat FileContentFormat `json:"contentFormat,omitempty"`
Member

Please align the template part of the godoc to what we did here:

// template defines the template to use for generating the names of the
// Machine objects.
// If not defined, it will fallback to `{{ .machineSet.name }}-{{ .random }}`.
// If the generated name string exceeds 63 characters, it will be trimmed to
// 58 characters and will
// get concatenated with a random suffix of length 5.
// Length of the template string must not exceed 256 characters.
// The template allows the following variables `.cluster.name`,
// `.machineSet.name` and `.random`.
// The variable `.cluster.name` retrieves the name of the cluster object
// that owns the Machines being created.
// The variable `.machineSet.name` retrieves the name of the MachineSet
// object that owns the Machines being created.
// The variable `.random` is substituted with random alphanumeric string,
// without vowels, of length 5. This variable is required part of the
// template. If not provided, validation will fail.
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
Template string `json:"template,omitempty"`

Member

nit: let's move this field after content / contentFrom so all the options about file content are grouped together

// KubernetesVersion is the effective Kubernetes version for bootstrap data (semver.String(), no "v" prefix).
// For a worker Machine joining a cluster, this is the control plane Kubernetes version when the controller
// can read it; otherwise the Machine's version.
KubernetesVersion string
Member

@sbueringer sbueringer Apr 2, 2026

I think we should:

  1. provide it as controlPlane.version similar to builtin variables, e.g. via
  	map[string]interface{}{
  		"controlPlane": map[string]interface{}{
  			"version": version,
  		},
  	})

Accordingly we should not fallback to the Machine version

  2. Provide the version with a v prefix; otherwise it's inconsistent with the version field in the Cluster object and also with builtin variables

Member

+1 to having a well-defined and stable semantic for controlPlane.version (no fallback logic); if this semantic is not consistent with the one in builtin variables, we should consider adding a suffix, but we can iterate on this.

Notably, a well-defined and stable semantic also ensures consistent behaviour for file templates across init/join control plane and join workers (fewer magic knobs/hidden behaviours, easier for users to understand and for us to maintain).


refactored to have controlPlane.version struct
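
One way to read the outcome of this thread is sketched below. It is not the code in this PR: `templateData` is a hypothetical helper name, and both returning an error on an empty version and normalizing a missing "v" prefix are assumptions drawn from the review comments above (no fallback to the Machine version, v-prefixed value consistent with the Cluster version field).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// templateData is a hypothetical helper. It builds the nested map that backs
// {{ .controlPlane.version }} in file templates.
func templateData(controlPlaneVersion string) (map[string]interface{}, error) {
	if controlPlaneVersion == "" {
		// Assumption: no fallback to the Machine version; fail instead.
		return nil, errors.New("control plane version is not set")
	}
	// Assumption: normalize to a "v" prefix so the value is consistent with
	// the version field on the Cluster object.
	if !strings.HasPrefix(controlPlaneVersion, "v") {
		controlPlaneVersion = "v" + controlPlaneVersion
	}
	return map[string]interface{}{
		"controlPlane": map[string]interface{}{
			"version": controlPlaneVersion,
		},
	}, nil
}

func main() {
	for _, in := range []string{"1.35.0", "v1.35.0", ""} {
		data, err := templateData(in)
		fmt.Printf("input=%q data=%v err=%v\n", in, data, err)
	}
}
```

Using the same data shape for init, control plane join, and worker join would keep the variable's meaning identical in every file template.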

*/

// Package bootstrapfiles contains helpers for KubeadmConfig spec.files processing.
package bootstrapfiles
Member

Looks like this is only used in the KubeadmConfig controller. If there is no need for a separate package I wouldn't create one just for this


moved to bootstrap/kubeadm/internal/controllers/template.go to avoid unnecessary new package

Comment thread test/e2e/config/docker.yaml Outdated
- sourcePath: "../data/infrastructure-docker/main/cluster-template-topology-autoscaler.yaml"
- sourcePath: "../data/infrastructure-docker/main/cluster-template-topology.yaml"
- sourcePath: "../data/infrastructure-docker/main/cluster-template-topology-taints.yaml"
- sourcePath: "../data/infrastructure-docker/main/cluster-template-topology-kubeadm-version.yaml"
Member

This list is (or at least should be) alphabetically ordered, please adjust accordingly

(same for the ClusterClass file below)


made it alphabetically ordered

arch=$$(uname -m)
case "$$arch" in
x86_64) arch="amd64" ;;
aarch64|arm64) arch="arm64" ;;
Member

Probably a typo "aarch64"

echo "Fetching kubeadm $${version} ($${arch}) from $${url}"
echo "fetch-kubeadm.sh: curl -fLsS -o <tmpfile> $$url"
tmp=$$(mktemp -p /tmp kubeadm.XXXXXX)
if curl -fLsS -o "$$tmp" "$$url"; then
Member

Should we use something like this? --retry 5 --retry-all-errors (would like to avoid flakes in CI)


Added. Good suggestion

preKubeadmCommands:
- |
command -v curl >/dev/null 2>&1 || (apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && rm -rf /var/lib/apt/lists/*)
- 'sh /run/cluster-api/fetch-kubeadm.sh 2>&1 | tee /var/log/fetch-kubeadm.log || true'
Member

@sbueringer sbueringer Apr 2, 2026

Looks like the bootstrap will continue even if the script fails (but not sure)

Would be good to fail bootstrap if the script fails if that's doable with reasonable effort

"k8s.io/utils/ptr"
)

var _ = Describe("When a worker joins during a control plane upgrade [ClusterClass]", Label("ClusterClass"), func() {
Member

Maybe rephrase to "When a worker joins with kubeadm with an older version than the control plane [ClusterClass]"


Rephrased as suggested

// +kubebuilder:validation:Enum=base64;gzip;gzip+base64
type Encoding string

// FileContentFormat specifies how file content is interpreted after resolving content/contentFrom and before writing bootstrap data.
Member

nit: This enum is currently defined between Encoding and the encoding constants. Let's please move this to either above or below Encoding


Moved as suggested

@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 2, 2026

@AcidLeroy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-main-gke 27988ea link true /test pull-cluster-api-e2e-main-gke

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


// FileContentFormatRaw means content is used verbatim.
FileContentFormatRaw FileContentFormat = "raw"
// FileContentFormatTemplate means content is rendered as a Go text/template.
// Available template variables are documented by the kubeadm bootstrap provider.
Member

Nit

Suggested change
// Available template variables are documented by the kubeadm bootstrap provider.

Available template variables are documented down below in this file where FileContentFormat is used
(same in v1beta2)

clusterv1beta1 "sigs.k8s.io/cluster-api/api/core/v1beta1"
)
import clusterv1beta1 "sigs.k8s.io/cluster-api/api/core/v1beta1"

Member

nit

Suggested change

(no need of additional empty lines)


import clusterv1 "sigs.k8s.io/cluster-api/api/core/v1beta2"


Member

nit

Suggested change

(no need of additional empty lines)

Member

Same below

Comment on lines +867 to +871
// getControlPlaneVersionForJoin returns the control plane (cluster) version from the cluster's ControlPlaneRef,
// e.g. KubeadmControlPlane.spec.version. Returns ("", nil) if the cluster has no ControlPlaneRef or the referenced
// control plane does not expose spec.version (ErrFieldNotFound or unset); callers should fall back to the machine's
// Kubernetes version only in those cases. Returns an error if the control plane object cannot be fetched or if
// the version field cannot be read for any other reason.
Member

I'm still a little bit confused by the semantic of this func and its intended usage.

If the goal of this func is to return the value for the controlPlane.version variable, then there should not be any fallback to the machine's Kubernetes version (and the usage should be the same in init, join workers and join control plane).

Pushing this a little bit further, the entire template resolution should become an internal implementation detail of resolveFile.

I would expect something similar to

Suggested change
// getControlPlaneVersionForJoin returns the control plane (cluster) version from the cluster's ControlPlaneRef,
// e.g. KubeadmControlPlane.spec.version. Returns ("", nil) if the cluster has no ControlPlaneRef or the referenced
// control plane does not expose spec.version (ErrFieldNotFound or unset); callers should fall back to the machine's
// Kubernetes version only in those cases. Returns an error if the control plane object cannot be fetched or if
// the version field cannot be read for any other reason.
// getControlPlaneVersionForJoin returns the control plane (cluster) version from the cluster's ControlPlaneRef,
// e.g. KubeadmControlPlane.spec.version. Returns ("", nil) if the cluster has no ControlPlaneRef or the referenced
// control plane does not expose spec.version (ErrFieldNotFound or unset).
// Returns an error if the control plane object cannot be fetched or if
// the version field cannot be read for any other reason.


Also, thinking more about it, it makes sense to just error out instead of falling back to the machine's version. It'd be serious trouble for a joining worker if it cannot fetch controlPlane.version. Will address this in the next commit.

Member

Should this be something like kubeadm_join_old_nodes (so we focus on outcomes)?
The same comment applies to the func/type name.


Labels

area/bootstrap Issues or PRs related to bootstrap providers cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


7 participants