Platform
Linux (observed on workers running k0s in HA)
Version
v1.35.1+k0s.1
Sysinfo
`k0s sysinfo`
What happened?
The applyingUpdate reconciler in pkg/autopilot/controller/signal/k0s/apply.go is not idempotent. If the worker successfully renames /usr/local/bin/k0s.tmp → /usr/local/bin/k0s but the subsequent client.Update(ctx, node) fails (e.g. with a resourceVersion conflict — "the object has been modified"), controller-runtime retries the reconciler. On retry, os.Stat("/usr/local/bin/k0s.tmp") fails because the file was already moved, and the reconciler enters an infinite error loop:
Applying update
Reconciler error error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
The Node's autopilot signal annotation stays stuck at ApplyingUpdate forever, even though the target binary is already correctly in place at /usr/local/bin/k0s. Restarting the worker (or the k0s service) is not sufficient — on restart, the reconciler reads the same stuck annotation and re-enters the same failing loop. The only recovery is operator intervention: delete the autopilot Plan and remove the k0sproject.io/autopilot-signal-data annotation from each affected worker Node before a new Plan can make progress.
Reference source on main:
https://github.com/k0sproject/k0s/blob/main/pkg/autopilot/controller/signal/k0s/apply.go
The reconciler does:
1. client.Get(ctx, node): read the Node
2. os.Stat("/usr/local/bin/k0s.tmp"): check that the downloaded binary exists
3. os.Rename("/usr/local/bin/k0s.tmp", "/usr/local/bin/k0s"): replace the binary
4. client.Update(ctx, node): write the Restart status to the Node annotation
If step 4 fails after step 3 has run, the controller-runtime retry hits step 2 and fails forever.
Note: PR #6994 ("Requeue Autopilot signal node updates on conflict", merged 2026-01-27) fixes an analogous conflict-on-update bug in pkg/autopilot/controller/plans/cmdprovider/{k0supdate,airgapupdate}/schedulable.go, but does not touch signal/k0s/apply.go. The worker-side applyingUpdate reconciler remains non-idempotent.
Steps to reproduce
- Create a 2-node cluster (1 controller + 1 worker) running k0s v1.35.1+k0s.1 with autopilot enabled.
On the controller node:
curl --proto '=https' --tlsv1.2 -sSf https://get.k0s.sh | sudo K0S_VERSION=v1.35.1+k0s.1 sh
sudo k0s install controller --enable-worker
sudo k0s start
On the controller, generate a worker join token:
sudo k0s token create --role=worker > /tmp/worker-token
Copy /tmp/worker-token over to the worker node.
On the worker node:
curl --proto '=https' --tlsv1.2 -sSf https://get.k0s.sh | sudo K0S_VERSION=v1.35.1+k0s.1 sh
sudo k0s install worker --token-file /tmp/worker-token
sudo k0s start
Wait for the cluster to be ready (run on the controller):
sudo k0s kubectl wait --for=condition=Ready node --all --timeout=120s
- On the worker, start a tight loop that moves k0s.tmp into place as soon as it appears:
while true; do
  { [ -f /usr/local/bin/k0s.tmp ] && mv /usr/local/bin/k0s.tmp /usr/local/bin/k0s; } 2>/dev/null || true
  sleep 0.05 2>/dev/null || true
done
- From the controller, create an autopilot Plan that upgrades to a newer version (e.g. v1.35.4+k0s.0) with selector: {} discovery so all nodes are targeted.
- Observe the worker's autopilot logs:
journalctl -u k0sworker -f | grep k0s.tmp
The applyingUpdate reconciler loops forever with:
unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory
- Verify the binary was moved into place and is the correct target version:
/usr/local/bin/k0s version
# v1.35.4+k0s.0
The Node annotation k0sproject.io/autopilot-signal-data stays stuck at status.status: ApplyingUpdate indefinitely.
Note: This reproducer does not simulate the original client.Update API-conflict trigger. It instead deterministically produces the same on-disk state the race leaves behind — k0s.tmp already moved into place before the reconciler's os.Stat runs. The reconciler then enters the same infinite error loop, which proves the underlying bug is the reconciler's non-idempotency, independent of what triggers the second pass.
Expected behavior
The applyingUpdate reconciler should be idempotent: if k0s.tmp is missing, it should check whether /usr/local/bin/k0s is already the requested target version and, if so, proceed to write the Restart status to the Node annotation instead of erroring out. This would make the reconciler resilient to:
- The client.Update conflict / retry path.
- Any other scenario in which the binary is already in place from a prior partial run.
Actual behavior
The reconciler enters an infinite error loop. The Node annotation k0sproject.io/autopilot-signal-data is permanently stuck at status.status: ApplyingUpdate, even though the upgrade binary is correctly installed and k0s version reports the new version. The autopilot Plan never progresses to Restart on the affected worker.
Screenshots and logs
# journalctl -u k0sworker -f | grep autopilot
...
applying-update Applying update
applying-update Reconciler error error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
applying-update Applying update
applying-update Reconciler error error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
... (repeats every few seconds indefinitely)
When the trigger is the client.Update conflict, the preceding log line is:
failed to update signal node to status 'Restart':
Operation cannot be fulfilled on nodes "<node>": the object has been modified;
please apply your changes to the latest version and try again
Additional context
No response