
feat(ethexe-consensus): two-phase compute for instant injected TX promises#5352

Closed
ukint-vs wants to merge 12 commits into master from vs/two-phase-compute

Conversation

@ukint-vs
Member

@ukint-vs ukint-vs commented Apr 16, 2026

Problem

When someone sends a transaction to a program that was just created in the same Ethereum block, the validator rejects it: "unknown destination." The program exists on-chain, but the validator validated the TX against the previous block's state, where the program didn't exist yet. The user has to wait for the next block (~12s) before retrying.

Solution: Two-Phase Compute

Instead of validating TXs against stale state, the producer now computes canonical events first to get fresh state, then selects TXs against that fresh state.

BEFORE (single phase):
  Block arrives → select TXs (stale state) → build announce → compute
                  ↑ TXs for new programs rejected here

AFTER (two phases):
  Block arrives → compute canonical events only (phase 1)
               → get fresh ProgramStates
               → select TXs against fresh state (phase 2)
               → build announce → compute full announce
                 ↑ TXs for new programs now included
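
The difference between the two flows can be sketched in a few lines of Rust. This is a minimal illustration under simplifying assumptions, not the ethexe implementation: `ActorId`, `ProgramStates`, `Event`, `Tx`, `apply_canonical_events`, and `select_txs` are stand-ins invented for this example, and validity is reduced to "the destination exists" (initialization and balances are ignored).

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for the real ethexe types.
type ActorId = u32;
type ProgramStates = BTreeMap<ActorId, u64>; // program id -> placeholder state hash

#[derive(Debug, Clone, PartialEq)]
enum Event {
    ProgramCreated(ActorId),
}

#[derive(Debug, Clone, PartialEq)]
struct Tx {
    destination: ActorId,
}

/// Phase 1: apply canonical block events on top of the parent states and
/// return fresh, ephemeral states. Nothing is persisted in this sketch.
fn apply_canonical_events(parent: &ProgramStates, events: &[Event]) -> ProgramStates {
    let mut fresh = parent.clone();
    for event in events {
        match event {
            Event::ProgramCreated(id) => {
                fresh.entry(*id).or_insert(0);
            }
        }
    }
    fresh
}

/// Phase 2: keep only the TXs whose destination exists in the given states.
fn select_txs(states: &ProgramStates, pool: &[Tx]) -> Vec<Tx> {
    pool.iter()
        .filter(|tx| states.contains_key(&tx.destination))
        .cloned()
        .collect()
}
```

Calling `select_txs` with the parent states models the single-phase flow; calling it with the result of `apply_canonical_events` models the two-phase flow, where a TX for a program created by a `ProgramCreated` event in the same block is no longer filtered out.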

Producer State Machine

         Delay (producer_delay timer)
           │
           ▼
  ┌─ WaitingCanonicalComputed ─────────┐
  │  (run canonical block events       │
  │   without TXs, ~100ms)             │
  └────────────┬───────────────────────┘
               ▼
  ┌─ ReadyForTxCollection ─────────────┐
  │  (poll timer: collect TXs from     │
  │   pool using fresh ProgramStates)  │
  └────────────┬───────────────────────┘
               ▼
  ┌─ WaitingAnnounceComputed ──────────┐
  │  (full announce with canonical     │
  │   events + TXs, ~400ms)            │
  │  Promises streamed to users here   │
  └────────────┬───────────────────────┘
               ▼
     AggregateBatchCommitment
               │
               ▼
          Coordinator

If a new Ethereum block arrives during any waiting state, the producer drops to Initial. TXs stay in the pool for the next block.
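
The transitions above can be modelled as a small pure function. This is only a sketch: the state names follow the commit message for this PR (`Delay → WaitingCanonicalComputed → ReadyForTxCollection → WaitingAnnounceComputed`), while the `Input` enum, the `step` function, and the `BatchAggregated` input are hypothetical names chosen for the example, and the real producer carries data and timers in each state.

```rust
// Hypothetical, data-free model of the producer states described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProducerState {
    Initial,
    Delay,
    WaitingCanonicalComputed,
    ReadyForTxCollection,
    WaitingAnnounceComputed,
    AggregateBatchCommitment,
    Coordinator,
}

// Hypothetical inputs that drive the state machine in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    TimerFired,
    CanonicalEventsComputed,
    AnnounceComputed,
    BatchAggregated,
    NewHead,
}

fn step(state: ProducerState, input: Input) -> ProducerState {
    use Input::*;
    use ProducerState::*;
    match (state, input) {
        // A new Ethereum block during any waiting state resets the producer.
        (
            Delay | WaitingCanonicalComputed | ReadyForTxCollection | WaitingAnnounceComputed,
            NewHead,
        ) => Initial,
        // The delay timer starts phase 1 (canonical events only).
        (Delay, TimerFired) => WaitingCanonicalComputed,
        // Fresh states are available; start polling the TX pool.
        (WaitingCanonicalComputed, CanonicalEventsComputed) => ReadyForTxCollection,
        // The poll timer ends TX collection and starts the full announce.
        (ReadyForTxCollection, TimerFired) => WaitingAnnounceComputed,
        (WaitingAnnounceComputed, AnnounceComputed) => AggregateBatchCommitment,
        (AggregateBatchCommitment, BatchAggregated) => Coordinator,
        // Any other combination is ignored in this sketch.
        (other, _) => other,
    }
}
```

Keeping the reset rule as the first match arm makes it explicit that a new head wins over any in-flight work, which is why TXs can simply stay in the pool for the next block.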

Subordinate Side

Subordinates receive the announce and validate TXs in accept_announce. But they validate against parent-block state (they haven't run canonical events yet). TXs targeting same-block programs would fail as "unknown destination."

Fix: accept_announce is now lenient for state-dependent conditions (UnknownDestination, UninitializedDestination, InsufficientBalance). These resolve after the subordinate computes the announce. Only structural violations still reject (Outdated, NonZeroValue, NotOnCurrentBranch). TX blobs are persisted to DB only after all acceptance checks pass (touched-programs limit, duplicate inclusion), preventing a malicious producer from forcing peers to store junk TXs.
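
A rough sketch of the lenient/strict split, assuming a flat validity enum. The variant names are the ones listed above; the enum shape, `is_state_dependent`, and `accept_announce_txs` are hypothetical and do not mirror the real `accept_announce` signature.

```rust
// Variant names come from the description above; everything else is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxValidity {
    Valid,
    // State-dependent: may resolve once the subordinate computes the announce.
    UnknownDestination,
    UninitializedDestination,
    InsufficientBalance,
    // Structural: can never become valid for this announce.
    Outdated,
    NonZeroValue,
    NotOnCurrentBranch,
}

/// True for conditions that depend on program state the subordinate
/// has not computed yet.
fn is_state_dependent(validity: TxValidity) -> bool {
    matches!(
        validity,
        TxValidity::UnknownDestination
            | TxValidity::UninitializedDestination
            | TxValidity::InsufficientBalance
    )
}

/// Accepts the announce unless some TX has a structural violation,
/// in which case the first such violation is returned.
fn accept_announce_txs(validities: &[TxValidity]) -> Result<(), TxValidity> {
    for &validity in validities {
        if validity != TxValidity::Valid && !is_state_dependent(validity) {
            return Err(validity);
        }
    }
    Ok(())
}
```

In this sketch an announce containing an `UnknownDestination` TX passes, while one containing a `NonZeroValue` TX is rejected with that reason.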

The subordinate's process_announce now calls accept_announce directly instead of pre-checking parent inclusion. On UnknownParent rejection (gossip reordering), the announce is deferred to pending for replay on the next block.
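
The deferral behaviour can be illustrated with a toy model. All types here (`Announce`, `Outcome`, `Subordinate`) and the `replay_pending` method are hypothetical simplifications: the real code goes through `accept_announce` and a database, whereas this sketch only tracks which announce hashes are already included.

```rust
use std::collections::{HashSet, VecDeque};

type AnnounceHash = u64; // placeholder for the real hash type

#[derive(Debug, Clone, PartialEq, Eq)]
struct Announce {
    hash: AnnounceHash,
    parent: AnnounceHash,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Accepted,
    Deferred,
}

struct Subordinate {
    included: HashSet<AnnounceHash>,
    pending: VecDeque<Announce>,
}

impl Subordinate {
    /// An announce whose parent is unknown is parked instead of rejected.
    fn process_announce(&mut self, announce: Announce) -> Outcome {
        if !self.included.contains(&announce.parent) {
            self.pending.push_back(announce);
            return Outcome::Deferred;
        }
        self.included.insert(announce.hash);
        Outcome::Accepted
    }

    /// Called on the next block: retry everything that was parked.
    /// Announces whose parent is still unknown are parked again.
    fn replay_pending(&mut self) -> usize {
        let queued: Vec<Announce> = self.pending.drain(..).collect();
        let mut accepted = 0;
        for announce in queued {
            if self.process_announce(announce) == Outcome::Accepted {
                accepted += 1;
            }
        }
        accepted
    }
}
```

If gossip delivers a child announce before its parent, the child is deferred, the parent is accepted when it arrives, and the child is accepted on the next replay.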

Processor Fix

The processor previously handled all events in a flat loop (injected TXs first, then canonical events). This meant TXs targeting same-block programs would panic because the program wasn't registered yet. But simply swapping the entire order would deprioritize injected TXs over canonical messages.

Fix: split into three phases preserving injected TX priority:

// 1. Router events first: ProgramCreated, CodeValidated, ExecutableBalanceTopUp
//    These register programs and set up state — no messages enqueued.
for event in router_events { handler.handle_router_event(event)?; }

// 2. Injected TXs second: these have execution priority over canonical messages.
for tx in injected_transactions { handler.handle_injected_transaction(source, tx)?; }

// 3. Mirror events last: canonical messages enqueued after injected.
for (actor_id, event) in mirror_events { handler.handle_mirror_event(actor_id, event)?; }
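
To make the ordering argument concrete, here is a self-contained toy version of the three phases. It is a sketch only: `Handler`, `Queued`, and `process` are hypothetical, programs are plain `u32` ids, and an unknown destination returns an error here instead of panicking.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Queued {
    Injected(u32),
    Canonical(u32),
}

#[derive(Default)]
struct Handler {
    programs: BTreeSet<u32>, // registered program ids
    queue: Vec<Queued>,      // execution order of enqueued messages
}

impl Handler {
    /// Router event: registers a program, enqueues nothing.
    fn handle_router_event(&mut self, created: u32) {
        self.programs.insert(created);
    }

    fn handle_injected_transaction(&mut self, dest: u32) -> Result<(), String> {
        if !self.programs.contains(&dest) {
            return Err(format!("unknown destination {}", dest));
        }
        self.queue.push(Queued::Injected(dest));
        Ok(())
    }

    fn handle_mirror_event(&mut self, dest: u32) -> Result<(), String> {
        if !self.programs.contains(&dest) {
            return Err(format!("unknown destination {}", dest));
        }
        self.queue.push(Queued::Canonical(dest));
        Ok(())
    }
}

/// Router events, then injected TXs, then mirror events.
fn process(
    handler: &mut Handler,
    router: &[u32],
    injected: &[u32],
    mirror: &[u32],
) -> Result<(), String> {
    for &program in router {
        handler.handle_router_event(program);
    }
    for &dest in injected {
        handler.handle_injected_transaction(dest)?;
    }
    for &dest in mirror {
        handler.handle_mirror_event(dest)?;
    }
    Ok(())
}
```

With program 7 created in the same block, the injected message for 7 is enqueued ahead of the canonical one; without the router phase running first, the injected TX would fail.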

Performance

TxValidityChecker uses Cow<ProgramStates> so the new_for_announce path owns states from DB while new_with_states borrows them from canonical compute, avoiding a BTreeMap clone during TX selection.
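
A minimal sketch of the `Cow` pattern described here, with simplified types. Only the names `TxValidityChecker`, `new_for_announce`, `new_with_states`, and `ProgramStates` come from the PR; the field layout, the signatures, and the helper methods `knows` and `is_borrowed` are assumptions made for the example.

```rust
use std::borrow::Cow;
use std::collections::BTreeMap;

// Simplified stand-ins for the real types.
type ActorId = u32;
type ProgramStates = BTreeMap<ActorId, u64>;

struct TxValidityChecker<'a> {
    states: Cow<'a, ProgramStates>,
}

impl<'a> TxValidityChecker<'a> {
    /// Owns the states, e.g. after loading them from the database.
    fn new_for_announce(states_from_db: ProgramStates) -> Self {
        TxValidityChecker {
            states: Cow::Owned(states_from_db),
        }
    }

    /// Borrows states that the caller already holds (for example the
    /// result of canonical compute), so the map is not cloned.
    fn new_with_states(states: &'a ProgramStates) -> Self {
        TxValidityChecker {
            states: Cow::Borrowed(states),
        }
    }

    /// Both construction paths share the same read-only lookups.
    fn knows(&self, id: ActorId) -> bool {
        self.states.contains_key(&id)
    }

    /// Hypothetical helper, only here to make the ownership visible.
    fn is_borrowed(&self) -> bool {
        matches!(self.states, Cow::Borrowed(_))
    }
}
```

Because `Cow` dereferences to `ProgramStates`, the validation logic does not need to care which constructor was used; only the allocation behaviour differs.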

What This Replaces

PR #5321 (mini-announces) solved the same problem with depth-2 announce chains: base announce + mini-announce. That required ~1000 lines of changes across CDL counting patches, subordinate ReadyForMoreAnnounces state, gossip reorder guards, and coordinator buffering.

Two-phase compute achieves the same result with ~600 lines, no CDL patches, no subordinate state changes, and no coordinator changes. One announce per block, always.

Test plan

  • 77 ethexe-consensus tests pass (7 new)
  • 22 ethexe-compute tests pass (1 new)
  • 19 ethexe-processor tests pass (including injected_prioritized_over_canonical)
  • 0 clippy errors
  • Reviewed by /autoplan (CEO + Eng with dual Codex/Claude voices)
  • Reviewed by /review (pre-landing adversarial)
  • Codex adversarial: caught processor event ordering bug + accept_announce consensus split
  • Manual: submit injected TX targeting same-block program, verify ~400ms promise

🤖 Generated with Claude Code

ukint-vs and others added 3 commits April 16, 2026 12:38
…mises

Split announce production into two phases so TXs targeting programs created
in the current block are validated against post-canonical ProgramStates
instead of stale parent-block states. This eliminates the ~12s wait for
same-block program TXs.

Phase 1: canonical-only compute (no announce metadata writes) returns
ephemeral ProgramStates. Phase 2: TX selection against those states,
then build and gossip a single announce.

Key changes:
- New ConsensusEvent::ComputeCanonicalEvents / ComputeEvent::CanonicalEventsComputed
- ComputeSubService::compute_canonical_only (assert parent computed, skip DB metadata)
- TxValidityChecker::new_with_states + InjectedTxPool::select_for_announce_with_states
- Producer: Delay → WaitingCanonicalComputed → ReadyForTxCollection → WaitingAnnounceComputed
- accept_announce lenient for state-dependent TX validations (UnknownDestination,
  UninitializedDestination, InsufficientBalance) to prevent consensus split
- Subordinate gossip-reordering fix (defer announces with unknown parent)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ew findings

- Add accept_announce leniency tests (UnknownDestination accepted, NonZeroValue rejected)
- Add compute_canonical_events test (verifies ProgramStates returned, no DB metadata writes)
- Add new_head tests for WaitingCanonicalComputed and ReadyForTxCollection states
- Fix poll_next to immediately poll newly created canonical computation future
- Document concurrent compute slots and mem::replace placeholder in producer

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The processor's handle_injected_and_events processed injected TXs before
canonical block events. This meant TXs targeting programs created in the
same block would panic at update_state ("failed to find program in known
states") because ProgramCreated hadn't been handled yet.

Swap the order: canonical events first (establishes program state), then
injected TXs (can now reference newly created programs). Required for
two-phase compute where the producer includes TXs for same-block programs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@semanticdiff-com

semanticdiff-com Bot commented Apr 16, 2026

Review changes with  SemanticDiff

Changed Files
File Status
  ethexe/consensus/src/tx_validation.rs  50% smaller
  ethexe/consensus/src/validator/subordinate.rs  26% smaller
  ethexe/consensus/src/validator/producer.rs  6% smaller
  ethexe/consensus/src/announces.rs  4% smaller
  ethexe/compute/src/lib.rs  1% smaller
  ethexe/consensus/src/lib.rs  1% smaller
  ethexe/consensus/src/connect/mod.rs  1% smaller
  ethexe/compute/src/service.rs  1% smaller
  ethexe/consensus/src/validator/mod.rs  1% smaller
  ethexe/compute/src/compute.rs  1% smaller
  ethexe/consensus/src/validator/tx_pool.rs  0% smaller
  ethexe/service/src/lib.rs Unsupported file format

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a two-phase computation architecture to improve the responsiveness of injected transaction promise delivery. By decoupling canonical event computation from transaction selection, the system can now validate transactions against post-canonical states within the same block. The changes also include critical fixes for event ordering in the processor, improved leniency in announce validation, and robust handling of out-of-order gossip messages.

Highlights

  • Two-Phase Compute: Implemented a two-phase computation process where canonical events are computed first to generate ephemeral ProgramStates, followed by transaction selection against these updated states, significantly reducing promise delivery latency for injected transactions.
  • Processor Event Ordering: Fixed a critical bug in handle_injected_and_events by ensuring canonical events are processed before injected transactions, preventing panics when transactions target programs created in the same block.
  • Accept Announce Leniency: Updated accept_announce to be more lenient with state-dependent conditions (e.g., UnknownDestination), allowing transactions to be accepted if they resolve after canonical execution.
  • Gossip Reordering Fix: Modified subordinate gossip handling to defer announces with unknown parents to a pending queue instead of rejecting them, preventing permanent announce loss during network reordering.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a two-phase announce production flow in ethexe. It adds support for computing canonical events to generate ephemeral ProgramStates, which are subsequently used to validate and select injected transactions. The Producer state machine is updated to manage this asynchronous process, and transaction validation now accounts for state-dependent conditions that resolve post-canonical execution. Review feedback suggests refactoring duplicated polling logic in the compute service, clarifying test helper documentation, and optimizing performance by passing program states by reference to avoid expensive clones.

Comment thread ethexe/compute/src/compute.rs Outdated
Comment on lines +611 to +613
/// Skip the delay timer and complete two-phase flow:
/// 1. Timer fires → ComputeCanonicalEvents
/// 2. process_canonical_events_computed → PublishMessage + ComputeAnnounce

medium

The comment describing the two-phase flow for the skip_timer test helper is a bit misleading. It oversimplifies the flow and omits the ReadyForTxCollection state and its poll timer. A more accurate description would improve clarity for future maintenance of this test helper.

Suggested change
- /// Skip the delay timer and complete two-phase flow:
- /// 1. Timer fires → ComputeCanonicalEvents
- /// 2. process_canonical_events_computed → PublishMessage + ComputeAnnounce
+ /// Skip the initial producer delay and complete the full two-phase announce production flow:
+ /// 1. produce_announce is triggered, emitting ComputeCanonicalEvents.
+ /// 2. process_canonical_events_computed is called, transitioning to ReadyForTxCollection with a poll timer.
+ /// 3. The poll timer fires, triggering build_announce_with_states, which emits PublishMessage and ComputeAnnounce.

Comment thread ethexe/consensus/src/validator/tx_pool.rs Outdated
@ukint-vs added the A0-pleasereview (PR is ready to be reviewed by the team) and C1-feature (Feature request) labels Apr 16, 2026
@ukint-vs
Member Author

@claude opus review

@claude

claude Bot commented Apr 16, 2026

Claude finished @ukint-vs's task in 4m 22s


Review: Two-Phase Compute for Instant Injected TX Promises

  • Read review guidelines (.gemini/styleguide.md)
  • Read changed files and understand architecture
  • Analyze correctness of two-phase compute flow
  • Analyze subordinate leniency changes
  • Analyze processor event ordering fix
  • Check test coverage
  • Post review findings

Summary

This PR replaces the proposed depth-2 mini-announce approach (PR #5321) with a simpler two-phase compute model. Phase 1 runs canonical block events through the processor to get fresh ProgramStates, then phase 2 validates and selects injected TXs against those states before building the announce. This lets a TX that targets a program created in the same Ethereum block be included in that block's announce, instead of being rejected as "unknown destination" and retried one block (~12s) later. The processor event ordering fix (canonical events before injected TXs) is the critical correctness change that makes the whole thing work at runtime.

The design is clean, well-tested, and significantly simpler than the alternative. The PR adds 571 lines across 13 files with 6 new tests. I found one medium-severity concern and two low-severity observations.


Findings

severity: medium — Subordinate gossip deferral has no replay mechanism within the same block

File: ethexe/consensus/src/validator/subordinate.rs lines 97-107

When the subordinate receives the producer's announce but the parent isn't included yet, it defers to pending_events. However, there's no active retry — pending events are only consumed during state creation (Subordinate::create, Participant::create). If the subordinate stays in WaitingForAnnounce for the same block and the parent gets included later (e.g., via db-sync), the deferred announce won't be retried until the next block forces a state transition.

if !self.ctx.core.db.is_announce_included(verified_announce.data().parent) {
    tracing::trace!("Announce parent not yet included, deferring to pending");
    self.ctx.pending(verified_announce);
    return Ok(self.into());  // stays in WaitingForAnnounce, no retry
}

In practice this is still better than the previous behavior (permanent rejection), and the producer will likely re-gossip or the next block will trigger a state transition. But it means the subordinate can miss participating in validation for this block — the window depends on gossip timing. Consider whether an AnnouncesRequest should be emitted to actively fetch the missing parent, similar to how ConnectService handles missing announces.



severity: low - ProgramStates cloned in select_for_announce_with_states

File: ethexe/consensus/src/validator/tx_pool.rs line 100

program_states.clone() copies the entire BTreeMap<ActorId, StateHashWithQueueSize> to construct TxValidityChecker. The checker is used immediately and discarded — borrowing would avoid the allocation. This matches Gemini's earlier observation. Not a correctness issue, but on blocks with many programs this clone adds unnecessary overhead during time-sensitive block production.

let tx_checker = TxValidityChecker::new_with_states(
    self.db.clone(),
    block,
    parent_announce,
    program_states.clone(),  // could borrow instead
)?;

severity: low — TX collection poll timer reuses producer_delay duration

File: ethexe/consensus/src/validator/producer.rs lines 160-161

The ReadyForTxCollection poll timer uses self.ctx.core.producer_delay — the same duration as the initial delay timer. This means phase 2 (TX collection window) has the same duration as the initial delay, which may not be optimal. If the initial delay is already tuned for block arrival latency, the TX collection window might benefit from a shorter timeout since canonical events are already computed at that point.

let mut poll_timer = Timer::new("tx-collection poll", self.ctx.core.producer_delay);

This is fine for now if producer_delay is short (the PR description mentions ~400ms total flow), but it's worth noting as something to tune separately if the delays diverge in the future.
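
One way to make the two durations independently tunable is an optional override that falls back to `producer_delay`. This is purely a sketch of the idea: `ProducerTimings`, the `tx_collection_window` field, and `collection_window` are hypothetical and not part of the PR.

```rust
use std::time::Duration;

// Hypothetical configuration; only `producer_delay` exists in the PR.
struct ProducerTimings {
    producer_delay: Duration,
    // Optional override for the phase 2 poll timer.
    tx_collection_window: Option<Duration>,
}

impl ProducerTimings {
    /// Duration for the TX-collection poll timer: the override if set,
    /// otherwise the same value as the initial delay (current behaviour).
    fn collection_window(&self) -> Duration {
        self.tx_collection_window.unwrap_or(self.producer_delay)
    }
}
```

Leaving the override unset keeps the behaviour described in this finding, so such a change would be backwards compatible with existing configurations.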


What looks good

  • Processor event ordering fix (ethexe/processor/src/lib.rs): Processing canonical events before injected TXs is the correct and necessary fix. Without this, handle_injected_transaction would panic on programs created in the same block. Clean and minimal change.

  • compute_canonical_only design (ethexe/compute/src/compute.rs): Building a synthetic announce with empty TXs, running it through the same prepare_executable_for_announce pipeline, and returning ephemeral ProgramStates without DB metadata writes is elegant. CAS writes being idempotent means the full announce computation later is safe.

  • Accept-announce leniency (ethexe/consensus/src/announces.rs): The classification of TX validity into state-dependent (lenient) vs. structural (strict) is correct. UnknownDestination, UninitializedDestination, and InsufficientBalance all resolve after canonical execution. NonZeroValue, Outdated, NotOnCurrentBranch, and Duplicate are structural and correctly remain rejection causes.

  • State machine transitions: The new WaitingCanonicalComputed and ReadyForTxCollection states integrate cleanly into the existing producer FSM. New-head during any of these states correctly drops to Initial via DefaultProcessing::new_head. The canonical compute result arriving for a stale block is handled via DefaultProcessing::canonical_events_computed (warning log).

  • Test coverage: 5 new consensus tests + 1 compute test cover the key paths: two-phase happy path (skip_timer), new-head interruption during both new phases, leniency acceptance, structural rejection, and canonical-only compute. Good coverage of the new state transitions.


ukint-vs and others added 6 commits April 16, 2026 13:29
- compute_canonical_only: replace hard assertion on parent computed with
  predecessor computation loop. On fast chains, the parent announce may
  still be computing when the producer starts. Errors would crash the
  service via the `?` propagation in the event loop.

- accept_announce: defer set_injected_transaction until after all acceptance
  checks pass (touched-programs limit, duplicate inclusion). Prevents a
  malicious producer from forcing peers to persist junk TX blobs via
  announces that will be rejected by later checks.

- subordinate: inline accept_announce call in process_announce instead of
  pre-checking is_announce_included. On UnknownParent rejection, defer to
  pending (gossip reordering). Other rejections handled normally. Cleaner
  separation: the defer guard now uses the same acceptance path as the
  happy path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…essages

The previous fix swapped the entire event ordering (canonical before
injected), which changed execution priority. Injected TXs should be
processed before canonical messages.

New ordering: Router events first (ProgramCreated, CodeValidated —
registers programs), then injected TXs (preserving priority), then
Mirror events (canonical messages). This fixes UnknownDestination
panics for same-block programs without deprioritizing injected TXs.

Also refactors duplicated canonical poll logic in compute.rs per
Gemini review feedback (loop instead of duplicated if-blocks).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use Cow<ProgramStates> so new_for_announce owns the states (from DB)
while new_with_states borrows them (from canonical compute). Avoids
cloning the entire BTreeMap during TX selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Method is used by tests and accept_announce but not by the producer
directly (producer uses select_for_announce_with_states instead).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@ukint-vs added and removed the A0-pleasereview (PR is ready to be reviewed by the team) label Apr 16, 2026
ukint-vs and others added 3 commits April 16, 2026 14:21
`anthropics/claude-code-action@v1` doesn't have a `model` input.
Pass `--model claude-opus-4-6` through `claude_args` instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Reverts the processor event ordering changes (commits 35a66d6, 49d58e1).

Programs only initialize via Ethereum canonical events. Injected TXs
targeting same-block programs will correctly fail regardless of
registration order because the program isn't initialized until
process_queues executes the init message (after handle_injected_and_events).

The two-phase compute is still useful for TXs targeting programs that
existed before this block but whose state changed (balance top-up, etc.).
The processor doesn't need to change for that case.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@ukint-vs
Member Author

Closing: two-phase compute doesn't reduce promise latency (the actual goal). The real win comes from mini-announces (#5321) which poll the TX pool after base announce computes, reducing the 0-12s wait to ~200ms. Reopening #5321.

@ukint-vs ukint-vs closed this Apr 16, 2026

Labels

A0-pleasereview PR is ready to be reviewed by the team C1-feature Feature request
