Goal
Replace the heterogeneous shutdown patterns across zaino-state, zaino-serve, and zainod with a single primitive: a root CancellationToken owned by the indexer, with child_token() clones distributed to each long-running task. SIGINT/SIGTERM cancels the root; every leaf observes cancellation immediately, no polling.
Motivation
After #1033 / #1049 landed, tokio_util::sync::CancellationToken is in use at one site — DbLifecycle::shutdown. The remaining shutdown sites still use a status-polling pattern with 100 ms intervals plus JoinHandle::abort(). The polling is correct but introduces a latency floor (50–500 ms typical) and forfeits graceful drain when abort() is the only effective interrupt.
Target architecture
zainod root CancellationToken
├── child: jsonrpc server token (Site 2 sub-issue)
├── child: grpc server token (Site 2 sub-issue)
├── child: mempool token (Site 1 sub-issue)
├── child: state service token (no-op — task is foreign, only abort()able)
├── child: fetch service token (no-op — shutdown delegates downward)
└── child: chain_index / finalised_state token (already CT, via #1049)
Cancelling the root propagates to every leaf. child_token().cancel() does NOT propagate up — contains worker-bug blast radius per service.
Sub-issues
- zainod/indexer.rs holds the root token; SIGINT/SIGTERM handlers call root.cancel(); the supervisor loop selects on root.cancelled().
Explicit non-goals
- backends/state.rs — the long-running task is owned by zebra_state upstream. We can only abort(); no CT plumbing is possible without upstream changes.
- backends/fetch.rs — shutdown delegates to the underlying gRPC client; no local task to cancel.
Acceptance criteria
- Supervisor loop in zainod/indexer.rs selects on root.cancelled() instead of polling.
- Every long-running task we own accepts a CancellationToken (directly or via trait getter) and selects against .cancelled().
- Notify / status-polling shutdown patterns are absent from sources after this lands. JoinHandle::abort() survives only as the Drop-time backstop.
- cargo nextest run passes; zainod shutdown latency under SIGINT is measurably lower (target: <50 ms from signal to last task exit).
Related