Autopilot `applyingUpdate` reconciler is non-idempotent — workers wedge forever after `client.Update` conflict

### Before creating an issue, make sure you've checked the following:

- [x] You are running the latest released version of k0s
- [x] Make sure you've searched for existing issues, both open and closed
- [x] Make sure you've searched for PRs too, a fix might've been merged already
- [x] You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

### Platform

```shell
Linux (observed on workers running k0s in HA)
```

### Version

v1.35.1+k0s.1

### Sysinfo

<details><summary>`k0s sysinfo`</summary>
<pre>
➡️ Please replace this text with the output of `k0s sysinfo`. ⬅️
</pre>
</details>


### What happened?

The `applyingUpdate` reconciler in `pkg/autopilot/controller/signal/k0s/apply.go` is **not idempotent**. If the worker successfully renames `/usr/local/bin/k0s.tmp` → `/usr/local/bin/k0s` but the subsequent `client.Update(ctx, node)` fails (e.g. with a `resourceVersion` conflict — "the object has been modified"), controller-runtime retries the reconciler. On retry, `os.Stat("/usr/local/bin/k0s.tmp")` fails because the file was already moved, and the reconciler enters an infinite error loop:

```
Applying update
Reconciler error  error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
```

The Node's autopilot signal annotation stays stuck at `ApplyingUpdate` forever, even though the target binary is already correctly in place at `/usr/local/bin/k0s`. Restarting the worker (or the k0s service) is **not** sufficient — on restart, the reconciler reads the same stuck annotation and re-enters the same failing loop. The only recovery is operator intervention: delete the autopilot `Plan` and remove the `k0sproject.io/autopilot-signal-data` annotation from each affected worker Node before a new Plan can make progress.

Reference source on `main`:
https://github.com/k0sproject/k0s/blob/main/pkg/autopilot/controller/signal/k0s/apply.go

The reconciler does:

1. `client.Get(ctx, node)` — read Node
2. `os.Stat("/usr/local/bin/k0s.tmp")` — check downloaded binary
3. `os.Rename("/usr/local/bin/k0s.tmp", "/usr/local/bin/k0s")` — replace binary
4. `client.Update(ctx, node)` — write `Restart` status to Node annotation

If step 4 fails after step 3 has run, the controller-runtime retry hits step 2 and fails forever.
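
The four steps above can be reduced to a small standalone simulation. This is only a sketch of the failure mode, not the code in `apply.go`: `applyOnce`, `simulate`, and `errConflict` are hypothetical names, the `update` callback stands in for `client.Update`, and a temporary directory replaces `/usr/local/bin`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errConflict is a stand-in for the API server's resourceVersion conflict.
var errConflict = errors.New("the object has been modified")

// applyOnce mirrors steps 2-4 in simplified form. The update callback is a
// hypothetical replacement for client.Update.
func applyOnce(dir string, update func() error) error {
	tmp := filepath.Join(dir, "k0s.tmp")
	if _, err := os.Stat(tmp); err != nil {
		return fmt.Errorf("unable to find update file 'k0s.tmp': %w", err)
	}
	if err := os.Rename(tmp, filepath.Join(dir, "k0s")); err != nil {
		return fmt.Errorf("unable to replace binary: %w", err)
	}
	return update()
}

// simulate runs two passes over the same directory: the first update
// conflicts, the second would succeed if the reconciler ever reached it.
func simulate() (first, second error) {
	dir, err := os.MkdirTemp("", "autopilot-sim")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "k0s.tmp"), []byte("new"), 0o755); err != nil {
		return err, err
	}
	first = applyOnce(dir, func() error { return errConflict })
	second = applyOnce(dir, func() error { return nil })
	return first, second
}

func main() {
	first, second := simulate()
	fmt.Println("first pass: ", first)
	fmt.Println("second pass:", second)
}
```

In this simulation the first pass returns the conflict error after the rename has already happened, and the second pass fails at the `os.Stat` check, so the update callback is never reached again.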

Note: PR #6994 ("Requeue Autopilot signal node updates on conflict", merged 2026-01-27) fixes an analogous conflict-on-update bug in `pkg/autopilot/controller/plans/cmdprovider/{k0supdate,airgapupdate}/schedulable.go`, but does **not** touch `signal/k0s/apply.go`. The worker-side `applyingUpdate` reconciler remains non-idempotent.

### Steps to reproduce

1. Create a 2-node cluster (1 controller + 1 worker) running k0s **v1.35.1+k0s.1** with autopilot enabled.

   **On the controller node:**

   ```bash
   curl --proto '=https' --tlsv1.2 -sSf https://get.k0s.sh | sudo K0S_VERSION=v1.35.1+k0s.1 sh
   sudo k0s install controller --enable-worker
   sudo k0s start
   ```

   **On the controller, generate a worker join token:**

   ```bash
   sudo k0s token create --role=worker > /tmp/worker-token
   ```

   Copy `/tmp/worker-token` over to the worker node.

   **On the worker node:**

   ```bash
   curl --proto '=https' --tlsv1.2 -sSf https://get.k0s.sh | sudo K0S_VERSION=v1.35.1+k0s.1 sh
   sudo k0s install worker --token-file /tmp/worker-token
   sudo k0s start
   ```

   **Wait for the cluster to be ready** (run on the controller):

   ```bash
   sudo k0s kubectl wait --for=condition=Ready node --all --timeout=120s
   ```

2. On the worker, start a tight loop that moves `k0s.tmp` into place as soon as it appears:

   ```bash
   while true; do
     { [ -f /usr/local/bin/k0s.tmp ] && mv /usr/local/bin/k0s.tmp /usr/local/bin/k0s; } 2>/dev/null || true
     sleep 0.05 2>/dev/null || true
   done
   ```

3. From the controller, create an autopilot `Plan` that upgrades to a newer version (e.g. v1.35.4+k0s.0) with `selector: {}` discovery so all nodes are targeted.
4. Observe the worker's autopilot logs:

   ```bash
   journalctl -u k0sworker -f | grep k0s.tmp
   ```

   The `applyingUpdate` reconciler loops forever with:

   ```
   unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory
   ```

5. Verify the binary was moved into place and is the correct target version:

   ```bash
   /usr/local/bin/k0s version
   # v1.35.4+k0s.0
   ```

   The Node annotation `k0sproject.io/autopilot-signal-data` stays stuck at `status.status: ApplyingUpdate` indefinitely.

**Note:** This reproducer does **not** simulate the original `client.Update` API-conflict trigger. Instead, it deterministically produces the same on-disk state that the race leaves behind: `k0s.tmp` has already been moved into place before the reconciler's `os.Stat` runs. The reconciler then enters the same infinite error loop, which shows that the underlying bug is the reconciler's non-idempotency, independent of what triggers the second pass.

### Expected behavior

The `applyingUpdate` reconciler should be idempotent: if `k0s.tmp` is missing, it should check whether `/usr/local/bin/k0s` is already the requested target version and, if so, proceed to write the `Restart` status to the Node annotation instead of erroring out. This would make the reconciler resilient to:

1. The `client.Update` conflict / retry path.
2. Any other scenario in which the binary is already in place from a prior partial run.
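
One possible shape for such an idempotent step is sketched below. It is an illustration under stated assumptions, not a proposed patch: `applyIdempotent` and `retryAfterConflict` are hypothetical names, the `isTarget` callback stands in for whatever check the maintainers prefer (for example comparing a version or checksum), `update` stands in for `client.Update`, and a temporary directory replaces `/usr/local/bin`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// errConflict is a stand-in for the API server's resourceVersion conflict.
var errConflict = errors.New("the object has been modified")

// applyIdempotent renames k0s.tmp when it exists. When it is missing, it
// falls back to verifying the installed binary instead of failing outright.
// isTarget and update are hypothetical callbacks (assumptions of this sketch).
func applyIdempotent(dir string, isTarget func(path string) (bool, error), update func() error) error {
	tmp := filepath.Join(dir, "k0s.tmp")
	bin := filepath.Join(dir, "k0s")

	_, err := os.Stat(tmp)
	switch {
	case err == nil:
		// Normal path: the downloaded binary is still waiting to be moved.
		if err := os.Rename(tmp, bin); err != nil {
			return fmt.Errorf("unable to replace binary: %w", err)
		}
	case errors.Is(err, fs.ErrNotExist):
		// Retry path: a previous pass may already have moved the binary.
		ok, verr := isTarget(bin)
		if verr != nil {
			return fmt.Errorf("unable to verify installed binary: %w", verr)
		}
		if !ok {
			return fmt.Errorf("update file 'k0s.tmp' is missing and %s is not the target version", bin)
		}
	default:
		return fmt.Errorf("unable to stat update file: %w", err)
	}
	return update()
}

// retryAfterConflict replays the conflict scenario: the first update fails,
// the second pass finds no k0s.tmp but still completes.
func retryAfterConflict() (first, second error) {
	dir, err := os.MkdirTemp("", "autopilot-sim")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "k0s.tmp"), []byte("target"), 0o755); err != nil {
		return err, err
	}
	// Hypothetical check: compares file content, not a real version or checksum.
	isTarget := func(path string) (bool, error) {
		data, err := os.ReadFile(path)
		return string(data) == "target", err
	}
	first = applyIdempotent(dir, isTarget, func() error { return errConflict })
	second = applyIdempotent(dir, isTarget, func() error { return nil })
	return first, second
}

func main() {
	first, second := retryAfterConflict()
	fmt.Println("first pass: ", first)
	fmt.Println("second pass:", second)
}
```

The sketch still fails when `k0s.tmp` is missing and the installed binary does not match the target, so a genuinely missing download is not silently treated as success.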

### Actual behavior

The reconciler enters an infinite error loop. The Node annotation `k0sproject.io/autopilot-signal-data` is permanently stuck at `status.status: ApplyingUpdate`, even though the upgrade binary is correctly installed and `k0s version` reports the new version. The autopilot Plan never progresses to `Restart` on the affected worker.

### Screenshots and logs

```
# journalctl -u k0sworker -f | grep autopilot
...
applying-update  Applying update
applying-update  Reconciler error  error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
applying-update  Applying update
applying-update  Reconciler error  error="unable to find update file 'k0s.tmp': stat /usr/local/bin/k0s.tmp: no such file or directory"
... (repeats every few seconds indefinitely)
```

When the trigger is the `client.Update` conflict, the preceding log line is:

```
failed to update signal node to status 'Restart':
Operation cannot be fulfilled on nodes "<node>": the object has been modified;
please apply your changes to the latest version and try again
```


### Additional context

_No response_
