Failure in partitions_test - "Multiple commit values" during force_become_primary #7687

@eddyashton

Description

Instance here:
https://github.com/microsoft/CCF/actions/runs/22304875267/job/64521474040

We're doing this inside run_ledger_chunk_bytes_check:

        for node, chunk_size in nodes_and_sizes[1:]:
            force_become_primary(network, args, node)
            with node.client("user0") as c:
                written = 0
                # Stop just before this node completes the chunk
                target_chunk_size = chunk_size - unit_size
                while written < target_chunk_size:
                    r = c.post(
                        "/app/log/public",
                        {"id": chunk_size, "msg": "X" * unit_size},
                    )
                    assert r.status_code == http.HTTPStatus.OK, r
                    written += unit_size
                c.wait_for_commit(r)

            force_become_primary(network, args, smallest_node)   # <<< Error in here

The first thing that force_become_primary() does is call network.wait_for_node_commit_sync(), to ensure we start from an agreed state, but in this instance we never got there. Node 0 persistently reports a commit at 6.56, while the other two know it has advanced to 7.66.
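The wait in question amounts to something like the following - a simplified sketch, not the real infra.network implementation; `get_commit_tx_id` is a hypothetical stand-in for the GET /node/commit request:

```python
import time


def wait_for_commit_sync(nodes, get_commit_tx_id, timeout=3.0):
    """Poll every node until they all report the same commit TxID.

    Simplified sketch of what wait_for_node_commit_sync is doing here;
    `get_commit_tx_id` stands in for the GET /node/commit call.
    """
    deadline = time.time() + timeout
    while True:
        tx_ids = [get_commit_tx_id(node) for node in nodes]
        if len(set(tx_ids)) == 1:
            # All nodes agree - this is the "agreed state" we start from
            return tx_ids[0]
        if time.time() > deadline:
            raise TimeoutError(f"Nodes never converged: {tx_ids}")
        time.sleep(0.1)
```

In the failing run, Node 0 never catches up, so this loop spins until the test harness gives up.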

12:17:18.814 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [0] GET /node/commit
12:17:18.814 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"6.56"}
12:17:18.830 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [1] GET /node/commit
12:17:18.830 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"7.66"}
12:17:18.846 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [2] GET /node/commit
12:17:18.846 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"7.66"}
12:17:18.963 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [0] GET /node/commit
12:17:18.963 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"6.56"}
12:17:18.979 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [1] GET /node/commit
12:17:18.979 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"7.66"}
12:17:18.995 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [2] GET /node/commit
12:17:18.995 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"7.66"}
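For reference, these transaction IDs are view.seqno pairs, so Node 0 is stuck a whole view (and 10 seqnos) behind its peers. A tiny hypothetical helper (`parse_tx_id` is not part of the test infra) makes the comparison explicit:

```python
def parse_tx_id(tx_id: str) -> tuple[int, int]:
    # Hypothetical helper, not part of the CCF test infra: split a
    # "view.seqno" transaction ID into a comparable (view, seqno) tuple.
    view, seqno = tx_id.split(".")
    return int(view), int(seqno)
```

Tuple comparison orders by view first, then seqno, so `parse_tx_id("7.66") > parse_tx_id("6.56")` holds.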

During this, there is one worrying line from CTest interspersed:
https://github.com/microsoft/CCF/actions/runs/22304875267/job/64521474040#step:10:3375

12:17:19.263 | INFO     | infra.network:wait_for_node_commit_sync:1680 - [0] GET /node/commit
Errors while running CTest
12:17:19.263 | INFO     | infra.network:wait_for_node_commit_sync:1680 - 200 {"transaction_id":"6.56"}

Is that just a print/flush ordering mixup, or has something been corrupted under our feet?

Anyway, the node logs don't show much more. There's a lot of election churn beforehand, but that's expected. We see the big writes arriving on Node 2. But then Node 0 seems not to hear from the primary for a while? It becomes a pre-vote candidate? And then becomes a follower again just before we time out?

2026-02-23T12:17:15.532867Z 0   [info ] CCF/src/consensus/aft/raft.h:2249    | Becoming follower n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648]: 7.56
2026-02-23T12:17:15.543530Z 0   [info ] CCF/src/consensus/aft/raft.h:1899    | Send request_pre_vote_response from n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648] to n[833f127625454ddf92072473da9851f938f4765b112e97a83ba1bdca792a3c5b]: false
2026-02-23T12:17:15.543613Z 0   [info ] CCF/src/consensus/aft/raft.h:1819    | Recv request_vote to n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648] from n[833f127625454ddf92072473da9851f938f4765b112e97a83ba1bdca792a3c5b]: granted vote to candidate at 6.56 with local state at 6.56
2026-02-23T12:17:15.543643Z 0   [info ] CCF/src/consensus/aft/raft.h:1884    | Send request_vote_response from n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648] to n[833f127625454ddf92072473da9851f938f4765b112e97a83ba1bdca792a3c5b]: true
2026-02-23T12:17:19.496238Z 0   [info ] CCF/src/consensus/aft/raft.h:2098    | Becoming pre-vote candidate n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648]: 7
2026-02-23T12:17:19.497479Z 0   [info ] CCF/src/host/socket.h:54             | TCP Node Outgoing connected
2026-02-23T12:17:19.498583Z 0   [info ] CCF/src/consensus/aft/raft.h:2010    | Recv request vote response to n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648] from n[a0d8d254d76dd21f17b6ff28fcbe62e26ef9c22bbc54a0f907d8eba8b6344c60]: they voted no
2026-02-23T12:17:19.498787Z 0   [info ] CCF/src/consensus/aft/raft.h:2010    | Recv request vote response to n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648] from n[833f127625454ddf92072473da9851f938f4765b112e97a83ba1bdca792a3c5b]: they voted no
2026-02-23T12:17:19.548420Z 0   [info ] CCF/src/consensus/aft/raft.h:2249    | Becoming follower n[4a5a5afd3706f6c601a15bf8cbe612b0b69103fa45c1b282d00c16aa399ac648]: 7.56
2026-02-23T12:17:23.099462Z 0   [info ] CCF/src/host/sig_term.h:39           | SIGTERM: Shutting down enclave gracefully...

This may be a sign that Node 2 is essentially stalled - spinning while writing these huge entries to disk, leaving a lull in consensus traffic that lasts longer than we're prepared to wait?

A plausible explanation, but also only a single repro so far - let's wait and see whether it occurs again.
