Batch cross-chain updates and confirmations. #5947
afck wants to merge 9 commits into linera-io:main from
Conversation
Instead of acquiring the write lock individually for each cross-chain update or confirmation, requests are now enqueued in a per-chain `mpsc` channel. A cooperatively polled shared future drains the channel, acquires the write lock once, and processes all pending requests in a single batch with one `save()` call. This reduces lock contention and storage overhead during periods of high cross-chain traffic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deduplicate the identical loop in `process_cross_chain_update` and `confirm_updated_recipient` into a generic `enqueue_and_drive<R>` helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a dedicated `WorkerError::BatchRolledBack` variant so that callers whose requests were rolled back (due to another request in the batch failing, or a `save` failure) receive a clear retriable error instead of a misleading success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`BatchFuture` only requires `Send` on non-web targets. Replace `tokio::select!` (unavailable on wasm) with `futures::future::select`. Suppress the `arc_with_non_send_sync` lint at the construction site, since the `Arc` is intentional for `WorkerState` sharing and wasm is single-threaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```rust
/// periodically removes dead entries.
chain_workers: ChainWorkerMap<StorageClient>,
/// Per-chain batch processing state for cross-chain requests.
chain_batches: ChainBatchMap,
```
What prevented chain batches from being managed by their respective chain workers?
We collect the requests while we're waiting to acquire the write lock for the chain worker state, so at least the sender needs to be outside the worker state.
And after putting the request in the channel, we wait for the return value and at the same time drive the shared future that is trying to acquire the write lock and process the batch; so that shared future needs to be accessible without already having the lock, too.
```rust
#[cfg(not(web))]
type BatchFuture = pin::Pin<Box<dyn Future<Output = ()> + Send>>;
#[cfg(web)]
type BatchFuture = pin::Pin<Box<dyn Future<Output = ()>>>;
```
@Twey: I think you mentioned a better way to do this at some point? Sorry I forgot the details!
Also below: I'm not sure I can get away without `#[allow(clippy::arc_with_non_send_sync)]`?
```rust
while let Ok(req) = receiver.try_recv() {
    requests.push(req);
}
```
I wonder if it would be useful to add a metric here, so that we can see whether we actually get batches of more than one request at a time?
Motivation
In principle, cross-chain updates from different sender chains and cross-chain confirmations from different receivers could all be processed in parallel.
Proposal
As a first step, batch them and make only one `save()` call at the end of each batch. This should reduce the number of sequential write locks and sequential DB write batches. Later we could further optimize this and also do the read operations concurrently.
Test Plan
CI should catch regressions; ideally we should measure whether this improves performance.
Release Plan
testnet_conway.
Links