feat(ethexe): Producer provides only promises hashes, instead of its full body #5377

Open
ecol-master wants to merge 11 commits into master from kd/reopen/producer-distribute-promise-hashes

Conversation

@ecol-master
Member

@gear-tech/dev

@ecol-master self-assigned this on Apr 23, 2026
@ecol-master added labels on Apr 23, 2026: A1-inprogress (Issue is in progress or PR draft is not ready to be reviewed), B2-breaking-apis (A breaking change of which all stakeholders must be warned), D7-performance (Increase our node/runtime/programs execution work performance), D8-ethexe (ethexe-related PR)
@gear-tech deleted a comment from the semanticdiff-com bot on Apr 23, 2026
@ecol-master force-pushed the kd/reopen/producer-distribute-promise-hashes branch from bbd6434 to 85cfa8a on April 24, 2026 12:21
@ecol-master marked this pull request as ready for review on April 24, 2026 16:04
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the promise handling mechanism within the ethexe node by introducing a compact representation for promises. By transferring only hashes instead of full promise bodies, the network load is significantly reduced. Additionally, the RPC API for injected transactions has been refactored to support this new flow, ensuring efficient promise subscription and relaying. These changes also include updates to the node configuration to support different promise emission modes.

Highlights

  • Promise Optimization: Introduced a compact promise representation to reduce network overhead by transferring only hashes instead of full promise bodies.
  • RPC API Refactoring: Refactored the injected transaction RPC API to improve promise handling, including a new subscription manager and relay mechanism.
  • Configuration Updates: Added a new promise emission mode configuration to allow flexible promise handling based on node roles.
  • Dependency Management: Updated several dependencies and added 'bon' for builder pattern support.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the promise handling logic in ethexe by introducing a compact promise system to optimize network data transfer. It adds SignedCompactPromise, a PromiseSubscriptionManager for RPC subscriptions, and a BoundPromiseSink to link promises with their originating announces. The changes span the compute, consensus, network, and service layers, including the addition of a PromiseEmissionMode configuration. Feedback highlights a need for immediate error propagation when promise restoration fails to prevent client timeouts and suggests standardizing on sha3 hashing to align with repository conventions.

Comment thread ethexe/rpc/src/apis/injected/promise_manager.rs
Comment thread core/src/rpc.rs
@ecol-master added A0-pleasereview (PR is ready to be reviewed by the team) and removed A1-inprogress (Issue is in progress or PR draft is not ready to be reviewed) on Apr 26, 2026
@ecol-master
Member Author

@claude review

@claude

claude Bot commented Apr 27, 2026

Claude finished @ecol-master's task in 4m 17s


Code Review

This PR reduces network bandwidth by having the producer gossip only hashes of promises (SignedCompactPromise) rather than full promise bodies, and reconstructs the full SignedPromise locally once computation finishes. It also restructures the injected-transaction RPC into distinct server, relay, spawner, and promise_manager modules. The core idea is sound.
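
As a rough illustration of the compact/restore idea described above, the sketch below models a promise, its compact form, and a restore step that rejects a mismatched body. The type shapes, the `to_compact` helper, the `RestoreError` variants, and the use of `DefaultHasher` in place of the real hash are all assumptions made for this example; signatures are left out entirely, so this is not the PR's actual `SignedCompactPromise` code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash for illustration only; the real code hashes differently.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical full promise: the transaction hash plus the full reply body.
#[derive(Debug, Clone, PartialEq)]
struct Promise {
    tx_hash: u64,
    reply: Vec<u8>,
}

/// Hypothetical compact form: only hashes travel over the network.
#[derive(Debug, Clone, PartialEq)]
struct CompactPromise {
    tx_hash: u64,
    reply_hash: u64,
}

#[derive(Debug, PartialEq)]
enum RestoreError {
    TxHashMismatch,
    ReplyHashMismatch,
}

impl Promise {
    /// Producer side: shrink the promise to its hashes before gossiping.
    fn to_compact(&self) -> CompactPromise {
        CompactPromise {
            tx_hash: self.tx_hash,
            reply_hash: toy_hash(&self.reply),
        }
    }
}

impl CompactPromise {
    /// Receiver side: accept the locally computed promise only if it
    /// matches the hashes that the producer announced.
    fn restore(&self, promise: Promise) -> Result<Promise, RestoreError> {
        if promise.tx_hash != self.tx_hash {
            return Err(RestoreError::TxHashMismatch);
        }
        if toy_hash(&promise.reply) != self.reply_hash {
            return Err(RestoreError::ReplyHashMismatch);
        }
        Ok(promise)
    }
}
```

A receiver that computed a different reply for the same transaction gets `ReplyHashMismatch`, which is the failure path the findings below focus on.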


severity: high — Compact promise with wrong reply_hash silently leaves subscriber in permanent limbo

File: ethexe/rpc/src/apis/injected/promise_manager.rs lines 112–135

In on_compact_promise, when compact.restore(promise) fails (line 117), the code logs a warning and then inserts the compact promise into waiting_for_compute (line 127). But the promise is already in the DB, so on_computed_promise will never be called again for this tx_hash. The subscriber is left waiting until it times out (20 × slot_duration, ~40 s for 2 s slots) with no error notification.

The only way restore fails here is if the reply_hash in the compact promise doesn't match what was locally computed. That mismatch is permanent for a given tx_hash — deterministic computation will produce the same reply every time. Inserting into waiting_for_compute is useless; the stale entry will only be cleaned up when a new (also-broken) compact arrives for the same tx, or never.

// promise_manager.rs line 117
match compact.restore(promise) {
    Ok(signed_promise) => {}
    Err(err) => {
        warn!();
        self.waiting_for_compute.insert(tx_hash, compact); // ← never useful here
    }
}

The subscriber's oneshot sender should be removed from the map and dropped here (which closes the receiver), so the spawner task wakes up and exits cleanly rather than timing out:

Err(err) => {
    warn!();
    self.cancel_registration(tx_hash); // signal timeout to subscriber now
}
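
To show why dropping the sender is enough to wake the waiting task, here is a minimal sketch that uses `std::sync::mpsc` as a stand-in for the oneshot channel and a plain `HashMap` in place of the concurrent map. The `Subscriptions` type, the `register` method, and the shape of `cancel_registration` are assumptions for illustration, not the PR's real signatures.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type TxHash = u64;
type SignedPromise = String; // placeholder payload for the sketch

#[derive(Default)]
struct Subscriptions {
    subscribers: HashMap<TxHash, Sender<SignedPromise>>,
}

impl Subscriptions {
    /// Registers a subscriber; returns `None` if one already exists.
    fn register(&mut self, tx_hash: TxHash) -> Option<Receiver<SignedPromise>> {
        if self.subscribers.contains_key(&tx_hash) {
            return None;
        }
        let (sender, receiver) = channel();
        self.subscribers.insert(tx_hash, sender);
        Some(receiver)
    }

    /// Removes the sender from the map and drops it, which closes the
    /// channel. Returns whether a subscriber was registered.
    fn cancel_registration(&mut self, tx_hash: TxHash) -> bool {
        self.subscribers.remove(&tx_hash).is_some()
    }
}
```

Once the sender is dropped, a blocked `recv()` on the receiver returns an error immediately instead of waiting for the timeout; a oneshot receiver behaves analogously when its sender is dropped.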

severity: high — Compact promise signature is never cryptographically verified before reaching PromiseSubscriptionManager

File: ethexe/network/src/validator/topic.rs lines 290–304

inner_verify_promise only checks that the sender address is in the validator snapshot — it does not verify the ECDSA signature of the compact promise. This means any node that knows a validator's address can craft a SignedCompactPromise with an arbitrary reply_hash and a garbage signature; the network layer will accept it and forward it to on_compact_promise. The mismatch is only detected at restore time (see issue above), at which point the subscriber has already been registered.

The TODO in promise_manager.rs:32-33 acknowledges that pre-validation is missing, but defers it to the network layer — which currently doesn't do it. Signature verification of compact promises at the gossip level should be added to inner_verify_promise (analogous to how VerifiedValidatorMessage carries a verified signature).


severity: medium — on_computed_promise silently discards mismatched compact, leaves subscriber to time out

File: ethexe/rpc/src/apis/injected/promise_manager.rs lines 141–151

When a compact promise arrived first (stored in waiting_for_compute) and computation now delivers a Promise with a different reply_hash, restore fails at line 142 and is only traced. The subscriber registered for this tx_hash will wait the full timeout with no signal:

Err(_err) => {
    trace!(); // ← subscriber silently hangs
}

Same fix as issue 1 — drop the sender so the subscriber task can exit cleanly.


severity: medium — waiting_for_compute can accumulate stale entries indefinitely

File: ethexe/rpc/src/apis/injected/promise_manager.rs lines 36, 44

waiting_for_compute: DashMap<_, SignedCompactPromise> has no TTL or eviction policy. Compact promises for transactions that are never computed (e.g., from a superseded announce, or for transactions in abandoned eras) will remain in the map forever. The TODO at line 34 notes moka::sync::Cache as a potential fix. As-is, this is a memory leak that grows with the number of compact promises received for un-computed transactions over the node's lifetime.
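
One possible shape for such an eviction policy, sketched with only the standard library rather than the cache crate mentioned in the TODO: entries carry their insertion time, expired entries are purged on every access, and a hard capacity rejects new keys when the map is full. The `ExpiringMap` name, the `u64` key, and the explicit `now` parameter (used to keep the example deterministic) are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical bounded map with a time-to-live per entry.
struct ExpiringMap<V> {
    ttl: Duration,
    max_entries: usize,
    entries: HashMap<u64, (Instant, V)>,
}

impl<V> ExpiringMap<V> {
    fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { ttl, max_entries, entries: HashMap::new() }
    }

    /// Drops every entry whose age has reached `ttl`.
    fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, (inserted_at, _)| now.duration_since(*inserted_at) < ttl);
    }

    /// Inserts after purging expired entries; returns `false` when the map
    /// is full and the key is new, so callers can drop the message.
    fn insert(&mut self, key: u64, value: V, now: Instant) -> bool {
        self.evict_expired(now);
        if self.entries.len() >= self.max_entries && !self.entries.contains_key(&key) {
            return false;
        }
        self.entries.insert(key, (now, value));
        true
    }

    /// Removes and returns a live entry, if any.
    fn remove(&mut self, key: u64, now: Instant) -> Option<V> {
        self.evict_expired(now);
        self.entries.remove(&key).map(|(_, value)| value)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A TTL close to the subscriber timeout would be a natural starting point, since a compact promise that outlives every possible subscriber has no remaining use; the capacity limit additionally caps memory under a burst of bogus messages.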


severity: medium — verify_promise never penalises peers for invalid compact promises

File: ethexe/network/src/validator/topic.rs lines 306–319

verify_promise returns MessageAcceptance::Ignore (not Reject) for all error cases, including unknown validators. This means peers that gossip invalid compact promises are never penalised via peer_score.invalid_data(source). The existing FIXME comment acknowledges the era-transition case, but the pattern allows any validator-impersonation attempt to go unscored.
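
A sketch of how the acceptance decision could distinguish the tolerated era-transition case from clearly invalid data. The local `MessageAcceptance` enum mirrors the three outcomes named in this review; the `PromiseVerifyError` variants and the `verdict` function are hypothetical and only illustrate the mapping, with the boolean standing in for a call such as peer_score.invalid_data(source).

```rust
/// Local stand-in for the gossip acceptance outcomes.
#[derive(Debug, PartialEq)]
enum MessageAcceptance {
    Accept,
    Reject,
    Ignore,
}

/// Hypothetical error kinds for compact promise verification.
#[derive(Debug)]
enum PromiseVerifyError {
    /// The sender is missing from the snapshot, but the validator set may
    /// legitimately differ around an era boundary.
    UnknownValidatorNearEraBoundary,
    /// The sender is not a validator and no era change explains it.
    UnknownValidator,
    /// The signature does not match the claimed sender.
    InvalidSignature,
}

/// Returns the acceptance decision and whether the peer should be penalised.
fn verdict(result: Result<(), PromiseVerifyError>) -> (MessageAcceptance, bool) {
    match result {
        Ok(()) => (MessageAcceptance::Accept, false),
        Err(PromiseVerifyError::UnknownValidatorNearEraBoundary) => {
            (MessageAcceptance::Ignore, false)
        }
        Err(PromiseVerifyError::UnknownValidator)
        | Err(PromiseVerifyError::InvalidSignature) => (MessageAcceptance::Reject, true),
    }
}
```

Whether the era-boundary case can be told apart reliably is the open question behind the existing FIXME; the sketch only shows that the remaining cases need not share its lenient treatment.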


severity: medium — Panic rather than graceful shutdown when PublishPromise emitted with neither network nor RPC

File: ethexe/service/src/lib.rs lines 744–745

if rpc.is_none() && network.is_none() {
    panic!("Promise without network or rpc");
}

A misconfigured node hits an unrecoverable panic in the main event loop. bail! (or returning an Err) would give a cleaner, logged shutdown consistent with how other service errors are handled.
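
A minimal sketch of the non-panicking variant, using only the standard library. The `Rpc` and `Network` unit structs, the `ServiceError` type, and the `promise_targets` helper are placeholders invented for this example; in the real service the error would more likely be produced with bail! and propagated from the event loop.

```rust
use std::fmt;

// Placeholder handles standing in for the real RPC and network services.
struct Rpc;
struct Network;

#[derive(Debug, PartialEq)]
enum ServiceError {
    PromiseWithoutSink,
}

impl fmt::Display for ServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ServiceError::PromiseWithoutSink => {
                write!(f, "promise emitted without network or rpc")
            }
        }
    }
}

impl std::error::Error for ServiceError {}

#[derive(Debug, PartialEq)]
enum PromiseTarget {
    Rpc,
    Network,
}

/// Decides where a promise should go, returning an error instead of
/// panicking when the node has neither an RPC server nor a network.
fn promise_targets(
    rpc: Option<&Rpc>,
    network: Option<&Network>,
) -> Result<Vec<PromiseTarget>, ServiceError> {
    let mut targets = Vec::new();
    if rpc.is_some() {
        targets.push(PromiseTarget::Rpc);
    }
    if network.is_some() {
        targets.push(PromiseTarget::Network);
    }
    if targets.is_empty() {
        return Err(ServiceError::PromiseWithoutSink);
    }
    Ok(targets)
}
```

The caller can then use `?` to propagate the error, which lets the service log it and shut down through its normal error path.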


severity: low — get_transaction_promise silently swallows a data integrity failure

File: ethexe/rpc/src/apis/injected/server.rs lines 169–179

When compact.restore(promise) fails for a tx that has both a stored Promise and a stored CompactPromise, the RPC returns Ok(None). This hides a DB inconsistency (the stored compact and stored promise don't agree on reply_hash). At minimum this deserves a warn!; the caller has no way to distinguish "not yet promised" from "stored data is corrupted".
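
One way to make the two cases distinguishable is to return an error for inconsistent stored data and keep `Ok(None)` for "not yet promised". The sketch below assumes simplified `Promise` and `CompactPromise` shapes with precomputed hashes, and a hypothetical `PromiseLookupError`; it takes the stored values as arguments instead of reading a database.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Promise {
    tx_hash: u64,
    reply_hash: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct CompactPromise {
    tx_hash: u64,
    reply_hash: u64,
}

#[derive(Debug, PartialEq)]
enum PromiseLookupError {
    /// Both records exist but disagree, which points at corrupted data.
    InconsistentStoredData { tx_hash: u64 },
}

/// `Ok(None)` means "not promised yet"; `Err` means the stored promise
/// and the stored compact promise do not describe the same reply.
fn get_transaction_promise(
    stored_promise: Option<Promise>,
    stored_compact: Option<CompactPromise>,
) -> Result<Option<Promise>, PromiseLookupError> {
    match (stored_promise, stored_compact) {
        (Some(promise), Some(compact)) => {
            if promise.tx_hash == compact.tx_hash
                && promise.reply_hash == compact.reply_hash
            {
                Ok(Some(promise))
            } else {
                Err(PromiseLookupError::InconsistentStoredData {
                    tx_hash: compact.tx_hash,
                })
            }
        }
        // Either part is still missing: nothing to return yet.
        _ => Ok(None),
    }
}
```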


severity: low — ToDigest and to_hash on InjectedTransaction use different algorithms

File: ethexe/common/src/injected.rs lines 92–123

InjectedTransaction::to_hash() (used as the DB key and Promise::tx_hash) uses blake2b (gear_core::utils::hash), while InjectedTransaction::update_hasher (called when signing) uses Keccak256. Both hash the same fields. The doc comment on the struct correctly explains this ("IMPORTANT: message id == tx hash == blake2b256 hash"), and Promise::reply_hash doc says "blake2b hash" — so the design is intentional and consistent. However, a new reader of on_compact_promise who sees compact.data().tx_hash used as a DB key may not immediately realise the key is blake2b while the signing digest is Keccak256. A cross-reference comment between to_hash and update_hasher would prevent confusion.


Summary

The protocol change itself is solid — the CompactPromise / SignedCompactPromise split correctly relies on the invariant that Promise and CompactPromise share the same signing digest (enforced by the test promise_hashes_digest_equal_to_promise_digest). The main risks are around what happens when that invariant is violated in the network: there is currently no ECDSA signature check at the gossip layer for compact promises, and both on_compact_promise and on_computed_promise leave subscribers silently hanging when a mismatch is detected rather than failing fast.

false,
"Connect node received the promise for signing, this should never happen"
);
// Nothing to do.
Member

Add a trace!() here.

Comment on lines +265 to +275
impl Default for ProtocolTimelines {
    fn default() -> Self {
        Self {
            genesis_ts: 0,
            era: NonZeroU64::new(10_000).unwrap(),
            election: 200,
            slot: NonZeroU64::new(2).unwrap(),
        }
    }
}

Member

Remove this; tests can use ProtocolTimelines::mock() instead.

Comment on lines +121 to +124
warn!(
    ?compact, %tx_hash, error=?err, "failed to create signed promise from parts, producer send invalid signature: compact_promise={compact:?}"
);
self.waiting_for_compute.insert(tx_hash, compact);
Member

It looks like any validator can create an unlimited number of signed random hashes and exhaust the RPC heap memory.

Comment on lines +88 to +100
pub fn try_register_subscriber(
    &self,
    tx_hash: HashOf<InjectedTransaction>,
) -> Result<PendingSubscriber, RegisterSubscriberError> {
    match self.subscribers.entry(tx_hash) {
        Entry::Occupied(_) => Err(RegisterSubscriberError::AlreadyRegistered(tx_hash)),
        Entry::Vacant(entry) => {
            let (sender, receiver) = oneshot::channel();
            entry.insert(sender);
            Ok(PendingSubscriber::new(&self.db, tx_hash, receiver))
        }
    }
}
Member

Not a bug now, but it could lead to bugs in the future: the tx may already be computed with its promise stored, in which case the RPC would never reply to the user. It would be better to check for that here.
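
A sketch of what such a check might look like: look up the stored promise first and only register a waiting subscriber when nothing is stored yet. The `Manager` type, the `Registration` enum, the in-memory `db` map, and the use of `std::sync::mpsc` in place of the oneshot channel are all assumptions for illustration, not the PR's actual types.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type TxHash = u64;
type SignedPromise = String; // placeholder payload for the sketch

enum Registration {
    /// The promise was already stored; reply to the user right away.
    AlreadyPromised(SignedPromise),
    /// Nothing stored yet; wait on the receiver.
    Pending(Receiver<SignedPromise>),
}

#[derive(Debug, PartialEq)]
enum RegisterError {
    AlreadyRegistered(TxHash),
}

#[derive(Default)]
struct Manager {
    db: HashMap<TxHash, SignedPromise>,
    subscribers: HashMap<TxHash, Sender<SignedPromise>>,
}

impl Manager {
    fn try_register_subscriber(
        &mut self,
        tx_hash: TxHash,
    ) -> Result<Registration, RegisterError> {
        // Check stored promises first so a late subscriber never waits
        // for an event that has already happened.
        if let Some(promise) = self.db.get(&tx_hash) {
            return Ok(Registration::AlreadyPromised(promise.clone()));
        }
        match self.subscribers.entry(tx_hash) {
            Entry::Occupied(_) => Err(RegisterError::AlreadyRegistered(tx_hash)),
            Entry::Vacant(entry) => {
                let (sender, receiver) = channel();
                entry.insert(sender);
                Ok(Registration::Pending(receiver))
            }
        }
    }
}
```

In concurrent code the lookup and the registration would need to be ordered carefully against the path that stores a promise, otherwise a promise stored between the two steps could still be missed.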

Comment on lines +138 to +142
if let Some((_, compact_promise)) = self.waiting_for_compute.remove(&promise.tx_hash) {
    match compact_promise.restore(promise) {
        Ok(signed_promise) => {
            self.db.set_compact_promise(&compact_promise);
            self.dispatch_promise(signed_promise);
Member

This should be discussed with @breathx: is it OK for any other validator to overwrite the promise signature? It also looks like we are not checking that the signer is a validator.

pub enum PromiseEmissionMode {
    /// Node should always emit promises during announces execution.
    /// Always set [`PromisePolicy::Enabled`].
    AlwaysEmit,
Member

It would be better to add at least one test for this mode in the compute service.

Labels

A0-pleasereview (PR is ready to be reviewed by the team), B2-breaking-apis (A breaking change of which all stakeholders must be warned), D7-performance (Increase our node/runtime/programs execution work performance), D8-ethexe (ethexe-related PR)
