Batch cross-chain updates and confirmations.#5947

Open
afck wants to merge 9 commits into linera-io:main from afck:xchain-batch2

Conversation

@afck
Contributor

@afck commented Apr 8, 2026

Motivation

In principle, cross-chain updates from different sender chains and cross-chain confirmations from different receivers could all be processed in parallel.

Proposal

As a first step, batch them and make only one save() call at the end of each batch. This should reduce the number of sequential write locks and sequential DB write batches.

Later we could further optimize this and also do the read operations concurrently.
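A minimal synchronous sketch of the proposal (names are illustrative; the real worker is async): drain everything queued, take the write lock once, and issue a single save() for the whole batch.

```rust
use std::sync::{mpsc, Mutex};

// Toy stand-in for the chain worker state; `save_calls` counts how many
// times save() would hit storage.
struct ChainState {
    inbox: Vec<u64>,
    save_calls: usize,
}

// Drain all currently queued requests, then lock and persist once.
fn process_batch(state: &Mutex<ChainState>, receiver: &mpsc::Receiver<u64>) {
    let requests: Vec<u64> = receiver.try_iter().collect();
    if requests.is_empty() {
        return;
    }
    let mut guard = state.lock().unwrap(); // one write lock per batch
    guard.inbox.extend(requests);
    guard.save_calls += 1; // one save() per batch instead of per request
}

fn main() {
    let (sender, receiver) = mpsc::channel();
    let state = Mutex::new(ChainState { inbox: Vec::new(), save_calls: 0 });
    for i in 0..4u64 {
        sender.send(i).unwrap();
    }
    process_batch(&state, &receiver);
    let guard = state.lock().unwrap();
    assert_eq!(guard.inbox, vec![0, 1, 2, 3]);
    assert_eq!(guard.save_calls, 1); // would have been 4 without batching
}
```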

Test Plan

CI should catch regressions; ideally we should measure whether this improves performance.

Release Plan

  • Backport to testnet_conway.
  • Release SDK.
  • Hotfix validators.

Links

afck and others added 8 commits April 8, 2026 11:11
Instead of acquiring the write lock individually for each cross-chain
update or confirmation, requests are now enqueued in a per-chain mpsc
channel. A cooperatively-polled shared future drains the channel,
acquires the write lock once, and processes all pending requests in a
single batch with one save() call. This reduces lock contention and
storage overhead during periods of high cross-chain traffic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
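A sketch of the per-chain queueing described in the commit message. The type name `ChainBatchMap` follows the PR, but this synchronous version is illustrative only; the real map also holds the shared future that drains each queue under one write lock.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

type ChainId = u32;

/// Illustrative per-chain request queues.
struct ChainBatchMap {
    queues: HashMap<ChainId, (Sender<String>, Receiver<String>)>,
}

impl ChainBatchMap {
    fn new() -> Self {
        ChainBatchMap { queues: HashMap::new() }
    }

    /// Enqueue a request for `chain`, creating its channel on first use.
    fn enqueue(&mut self, chain: ChainId, request: String) {
        let (sender, _) = self.queues.entry(chain).or_insert_with(mpsc::channel);
        sender.send(request).expect("receiver is stored in the map");
    }

    /// Drain everything queued for `chain` into one batch.
    fn drain(&self, chain: ChainId) -> Vec<String> {
        match self.queues.get(&chain) {
            Some((_, receiver)) => receiver.try_iter().collect(),
            None => Vec::new(),
        }
    }
}

fn main() {
    let mut map = ChainBatchMap::new();
    map.enqueue(1, "update-a".into());
    map.enqueue(1, "update-b".into());
    map.enqueue(2, "confirm-c".into());
    // Batches never mix requests for different chains.
    assert_eq!(map.drain(1), vec!["update-a", "update-b"]);
    assert_eq!(map.drain(2), vec!["confirm-c"]);
    assert!(map.drain(3).is_empty());
}
```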
Deduplicate the identical loop in `process_cross_chain_update` and
`confirm_updated_recipient` into a generic `enqueue_and_drive<R>` helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
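A hypothetical synchronous analogue of the deduplicated helper: one generic function, parameterized over the request type `R`, serves both request kinds. (The real helper additionally drives the shared batch future while awaiting the result.)

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-in for `enqueue_and_drive<R>`: the same generic loop
// serves cross-chain updates and confirmations alike.
fn enqueue_and_drive<R>(
    sender: &Sender<R>,
    receiver: &Receiver<R>,
    request: R,
    process_batch: impl FnOnce(Vec<R>) -> usize,
) -> usize {
    sender.send(request).expect("receiver is alive");
    // In the real code this point awaits the shared batch future; here we
    // simply drain the queue and process whatever is pending.
    process_batch(receiver.try_iter().collect())
}

fn main() {
    // One instantiation for "updates"...
    let (s1, r1) = channel();
    assert_eq!(enqueue_and_drive(&s1, &r1, 42u64, |batch| batch.len()), 1);
    // ...and another for "confirmations", with a different request type.
    let (s2, r2) = channel();
    assert_eq!(enqueue_and_drive(&s2, &r2, "confirm", |batch| batch.len()), 1);
}
```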
Add a dedicated WorkerError::BatchRolledBack variant so that callers
whose requests were rolled back (due to another request in the batch
failing or a save failure) receive a clear retriable error instead of
a misleading success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
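A toy sketch of the rollback semantics from the commit above. The real `WorkerError` has many more variants, and `apply_batch` is invented for illustration: if any request in a batch (or the final save) fails, every caller in that batch gets a retriable error rather than a misleading success.

```rust
// Hypothetical minimal error type; the real enum is much larger.
#[derive(Debug, PartialEq)]
enum WorkerError {
    BatchRolledBack,
}

// If any request in the batch fails, roll back the whole batch and
// report `BatchRolledBack` to all callers so they can retry.
fn apply_batch(requests: &[Result<(), ()>]) -> Result<usize, WorkerError> {
    if requests.iter().any(|r| r.is_err()) {
        return Err(WorkerError::BatchRolledBack);
    }
    Ok(requests.len())
}

fn main() {
    assert_eq!(apply_batch(&[Ok(()), Ok(())]), Ok(2));
    // One failure poisons the batch for everyone in it.
    assert_eq!(
        apply_batch(&[Ok(()), Err(())]),
        Err(WorkerError::BatchRolledBack)
    );
}
```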
@afck afck requested review from bart-linera, deuszx and ma2bd April 8, 2026 15:17
BatchFuture only requires Send on non-web targets. Replace
tokio::select! (unavailable on wasm) with futures::future::select.
Suppress arc_with_non_send_sync lint at the construction site since
Arc is intentional for WorkerState sharing and wasm is single-threaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
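A sketch of the conditional `Send` bound this commit describes. Here `web` is approximated by `target_arch = "wasm32"` (the crate presumably defines its own cfg alias): native targets box a `Send` future so it can cross threads, while single-threaded wasm drops the bound.

```rust
use std::future::Future;
use std::pin::Pin;

// Native targets require Send on the boxed batch future...
#[cfg(not(target_arch = "wasm32"))]
type BatchFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
// ...while wasm, being single-threaded, does not.
#[cfg(target_arch = "wasm32")]
type BatchFuture = Pin<Box<dyn Future<Output = ()>>>;

fn make_batch_future() -> BatchFuture {
    // The real future would drain the queue, lock once, and save once.
    Box::pin(async {})
}

#[cfg(not(target_arch = "wasm32"))]
fn assert_send<T: Send>(_: &T) {}

fn main() {
    let fut = make_batch_future();
    // Compiles only because the native alias includes `+ Send`.
    #[cfg(not(target_arch = "wasm32"))]
    assert_send(&fut);
    drop(fut);
}
```

On polling, `futures::future::select` takes two futures by value and resolves when the first completes, which works on wasm where the `tokio::select!` macro is unavailable.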
@afck afck requested a review from Twey April 8, 2026 16:25
Comment thread linera-core/src/worker.rs
/// periodically removes dead entries.
chain_workers: ChainWorkerMap<StorageClient>,
/// Per-chain batch processing state for cross-chain requests.
chain_batches: ChainBatchMap,
Contributor

@ma2bd ma2bd Apr 8, 2026

What prevented chain batches from being managed by their respective chain workers?

Contributor Author

We collect the requests while we're waiting to acquire the write lock for the chain worker state, so at least the sender needs to be outside the worker state.

And after putting the request in the channel, we wait for the return value and at the same time drive the shared future that is trying to acquire the write lock and process the batch; so that shared future needs to be accessible without already having the lock, too.

Comment thread linera-core/src/worker.rs
Comment on lines +551 to +554
#[cfg(not(web))]
type BatchFuture = pin::Pin<Box<dyn Future<Output = ()> + Send>>;
#[cfg(web)]
type BatchFuture = pin::Pin<Box<dyn Future<Output = ()>>>;
Contributor Author

@Twey: I think you mentioned a better way to do this at some point? Sorry I forgot the details!

Also below: Not sure if I can get away without #[allow(clippy::arc_with_non_send_sync)]?

Contributor

@bart-linera left a comment

LGTM

Comment thread linera-core/src/worker.rs
Comment on lines +576 to +578
while let Ok(req) = receiver.try_recv() {
requests.push(req);
}
Contributor

I wonder if it would be useful to add a metric here, so that we can see whether we actually get batches of more than one request at a time?
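The suggested metric could be as simple as counting drained batches by size. This is a hypothetical stdlib-only sketch; a real implementation would use the project's existing metrics machinery (e.g. a histogram of batch sizes) instead of these toy counters.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy counters standing in for real metrics: how often the queue was
// drained, and how often a drain yielded more than one request.
static BATCHES_TOTAL: AtomicUsize = AtomicUsize::new(0);
static MULTI_REQUEST_BATCHES: AtomicUsize = AtomicUsize::new(0);

// Called once per drained batch with the number of requests it held.
fn record_batch_size(size: usize) {
    BATCHES_TOTAL.fetch_add(1, Ordering::Relaxed);
    if size > 1 {
        MULTI_REQUEST_BATCHES.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    record_batch_size(1);
    record_batch_size(3);
    assert_eq!(BATCHES_TOTAL.load(Ordering::Relaxed), 2);
    assert_eq!(MULTI_REQUEST_BATCHES.load(Ordering::Relaxed), 1);
}
```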
