feat: replication 2.0 by xDarksome · Pull Request #215 · WalletConnect/wcn

xDarksome · 2025-06-10T10:58:27Z

Description

WCN 2.0 replication machinery

Are primary/secondary replica sets a new concept?

It's not a new concept, but we haven't used it explicitly in replication before. Previously, if a migration was in progress, both replica sets were simply merged into one, which made us overly restrictive about the required number of successful responses.
https://github.com/WalletConnectFoundation/wcn/blob/14e1e6e961678df845a5912b84751c85b0b619ed/crates/core/src/cluster.rs#L363
Here, for each additional node in the second replica set (the one within the new keyspace, i.e. the migration one), we were adding +1 to the required number of successful responses.
Imagine we have the primary replica set {1, 2, 3, 4, 5} and we've removed/added 5+ node operators at the same time. Then the secondary replica set can be completely different (e.g. {6, 7, 8, 9, 10}).
For the operation to be successful we would require (5/2 + 1) + 5 = 8 successful responses (integer division). But realistically we only need 3 from each replica set.
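
The arithmetic above can be sketched roughly as follows. This is only an illustration of the two counting rules as described in this comment, not the actual code from `cluster.rs`; the function names `majority`, `merged_requirement` and `per_set_requirement`, and the use of `u8` as a node id, are assumptions made for the example.

```rust
/// Majority quorum for a replica set of `n` nodes (integer division).
fn majority(n: usize) -> usize {
    n / 2 + 1
}

/// Old rule as described above (hypothetical reconstruction): majority of the
/// primary set, plus one extra required response for every node that appears
/// only in the secondary set.
fn merged_requirement(primary: &[u8], secondary: &[u8]) -> usize {
    let extra = secondary
        .iter()
        .filter(|n| !primary.contains(*n))
        .count();
    majority(primary.len()) + extra
}

/// New rule: each replica set has to reach its own majority independently.
fn per_set_requirement(primary: &[u8], secondary: &[u8]) -> (usize, usize) {
    (majority(primary.len()), majority(secondary.len()))
}

fn main() {
    let primary: [u8; 5] = [1, 2, 3, 4, 5];
    let secondary: [u8; 5] = [6, 7, 8, 9, 10];

    // (5/2 + 1) + 5 = 8 responses under the merged rule.
    println!("merged: {}", merged_requirement(&primary, &secondary));
    // 3 responses from each set under the per-set rule.
    println!("per set: {:?}", per_set_requirement(&primary, &secondary));
}
```

With fully disjoint sets, the merged rule needs 8 of the 10 nodes to respond successfully, while the per-set rule needs 3 from each set, which is the point made above.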

How Has This Been Tested?

Unit

Due Diligence

Breaking change
Requires a documentation update
Requires an e2e/integration test update

nopestack · 2025-07-22T16:33:41Z

+    std::net::SocketAddrV4,
+};
+
+pub fn node_keypair(operator_id: u8, node_id: u8) -> Keypair {


consider prefixing all of these with fake so that they don't show up in symbol search in some IDEs when looking for similarly named attributes and methods

They are already within the module fake and intended to be used as fake::node_keypair etc.
What use-case do you have in mind? I never had such issues with rust-analyzer.

if intended to be called this way, please add doc-comments pointing that out

This is the general code style of preferring foo::Bar over FooBar. That way, if your context is narrow, it's fine to import foo::Bar directly and use it as Bar.

If you don't like the naming or it looks unintuitive to you, we can consider other options.
Another alternative I considered was testing::fake_node_keypair, which to me looked more verbose.

that proposal, although verbose, seems perfect

@heilhead please pick between these two, or propose your own

I don't particularly like the module name fake, and would prefer using testing instead. As for prefixing/renaming the function, I don't think it's necessary, as this module would only exist in #[cfg(test)] code or behind the testing feature, I assume.

Ok, I'll rename all fake modules to testing. fake::SmartContract will be testing::FakeSmartContract tho, as it's an actual "fake" by definition

heilhead · 2025-07-24T05:37:20Z

+
+                // If both replica sets reached quorum, but the responses are different, then we
+                // return `None` indicating that quorum hasn't been reached.
+                (Some(a), Some(b)) => break Some(a).filter(|_| a == b),


Why not if a == b { Some(a) } else { None }?

There was probably a more complicated chain at some point.
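
For reference, a minimal sketch of why the two spellings are interchangeable. The generic helpers `agree_filter` and `agree_if` are hypothetical names used only here; the `Copy` bound stands in for whatever cheap-to-copy type (for example a reference) `a` and `b` have in the real code.

```rust
/// Spelling used in the diff: wrap `a`, then keep it only if it equals `b`.
fn agree_filter<T: PartialEq + Copy>(a: T, b: T) -> Option<T> {
    Some(a).filter(|_| a == b)
}

/// Spelling suggested in review: plain `if`/`else`.
fn agree_if<T: PartialEq>(a: T, b: T) -> Option<T> {
    if a == b {
        Some(a)
    } else {
        None
    }
}

fn main() {
    // Both replica sets agree: the shared response is returned.
    println!("{:?} {:?}", agree_filter("ok", "ok"), agree_if("ok", "ok"));
    // The replica sets disagree: no quorum.
    println!("{:?} {:?}", agree_filter("x", "y"), agree_if("x", "y"));
}
```

Both return `Some(a)` when the responses match and `None` otherwise, so the choice is purely stylistic.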

heilhead · 2025-07-24T05:41:20Z

+
+            // If we didn't get the quorum and this is a read operation, then try to reconcile
+            // the responses.
+            None => reconciliation::reconcile(operation, &responses[..RF])


Does this apply only to 'collection' responses or to all of them?

Same as before: only map page and cardinality. The match on the operation type is within the fn.

heilhead · 2025-07-24T05:42:50Z

+            this.execute_callback(operation.into(), ResponseChannel(tx))
+                .await
+        });
+        drop(handle);


What's the point of explicitly dropping the handle here?

Clippy complains otherwise

So doing simply tokio::spawn(...); triggers a warning? If that's the case, I'd still prefer let _ = tokio::spawn(...);.

I think I was getting a warning on let _ = as well, will check

error: non-binding `let` on a future
   --> crates/replication2/src/coordinator/mod.rs:183:9
    |
183 | /         let _ = tokio::spawn(async move {
184 | |             this.execute_callback(operation.into(), ResponseChannel(tx))
185 | |                 .await
186 | |         });
    | |___________^

Not binding it at all works. I thought I had issues with that as well, weird.

heilhead · 2025-07-24T05:48:03Z

+    operator: &NodeOperator<N>,
+    operation: &Operation<'_>,
+) -> storage_api::Result<operation::Output> {
+    let mut retries_left = operator.nodes.len();


Is it possible to get an empty set of nodes here under any conditions? I suppose it's not, but since you're already returning a result here, maybe make sure that it's never empty? Otherwise it may panic with an overflow on subtraction.

This is an invariant of NodeOperator and it's being checked in the respective constructor. NodeOperator::next_node can also panic, but this would be a bug.

It's worth a comment tho, I'll add one.

Also, the field needs to be private, which it isn't rn. Thanks for pointing that out.
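
A minimal sketch of the kind of invariant being discussed: a constructor that rejects an empty node list, plus a private field so the check cannot be bypassed. This is not the actual `NodeOperator` from the PR; the module name `operator`, the `EmptyNodes` error type and the `node_count` accessor are assumptions for illustration.

```rust
mod operator {
    /// Error returned when an operator is constructed without any nodes
    /// (hypothetical type, not from the PR).
    #[derive(Debug, PartialEq)]
    pub struct EmptyNodes;

    /// Simplified stand-in for `NodeOperator<N>`.
    #[derive(Debug)]
    pub struct NodeOperator<N> {
        // Private on purpose: code outside this module cannot build a
        // `NodeOperator` with an empty `nodes` list.
        nodes: Vec<N>,
    }

    impl<N> NodeOperator<N> {
        /// The only way to construct the type; enforces `!nodes.is_empty()`.
        pub fn new(nodes: Vec<N>) -> Result<Self, EmptyNodes> {
            if nodes.is_empty() {
                return Err(EmptyNodes);
            }
            Ok(Self { nodes })
        }

        /// Always at least 1, thanks to the check in `new`.
        pub fn node_count(&self) -> usize {
            self.nodes.len()
        }
    }
}

use operator::NodeOperator;

fn main() {
    let op = NodeOperator::new(vec!["node-a", "node-b"]).expect("non-empty");

    // Because `node_count() >= 1`, decrementing once cannot underflow.
    let mut retries_left = op.node_count();
    retries_left -= 1;
    println!("retries left after first attempt: {retries_left}");

    // An empty list is rejected up front instead of panicking later.
    println!("{:?}", NodeOperator::<&str>::new(vec![]).is_err());
}
```

With the field private, the "never empty" assumption lives in one place (the constructor), and callers such as the retry loop can rely on it without re-checking.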

heilhead · 2025-07-24T05:57:06Z

        operation: Operation<'_>,
-    ) -> impl Future<Output = Result<operation::Output>> + Send;
+    ) -> impl Future<Output = Result<operation::Output>> + Send {
+        async move { self.execute_ref(&operation).await }


So the default implementation of the trait would result in a stack overflow due to recursion? Not sure it's a good idea.

This is a neat trick I didn't know about either. A stack overflow is not possible when dealing with statically typed futures. The compiler is smart enough to throw an error, forcing you to implement one of the functions.

nopestack · 2025-07-28T17:33:28Z

+        match response {
+            Ok(output) if &output != self.output => {}
+
+            // Errors are not being repaired.


is it worth logging those regardless?

We won't see anything useful there, only the ErrorKind which will be available as a metric from wcn_rpc.

nopestack · 2025-07-28T17:36:12Z

+pub(super) fn is_repairable(operation: &Operation<'_>) -> bool {
+    use operation::{Borrowed, Owned};
+
+    match operation {


maybe simplify by calling matches!

matches! won't give the same variant exhaustiveness guarantees
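
To illustrate the difference, here is a small sketch using a made-up `Operation` enum. The variants and which of them count as repairable are assumptions for the example, not the real `operation::{Borrowed, Owned}` types.

```rust
/// Hypothetical, simplified operation type.
enum Operation {
    Get,
    Set,
    Del,
    MapPage,
}

/// Exhaustive `match` without a wildcard arm: adding a new variant to
/// `Operation` makes this function fail to compile until the new variant
/// is classified explicitly.
fn is_repairable_match(op: &Operation) -> bool {
    match op {
        Operation::Get | Operation::MapPage => true,
        Operation::Set | Operation::Del => false,
    }
}

/// `matches!` expands to a `match` with a `_ => false` arm, so a newly added
/// variant silently falls into the `false` case.
fn is_repairable_macro(op: &Operation) -> bool {
    matches!(op, Operation::Get | Operation::MapPage)
}

fn main() {
    let ops = [
        Operation::Get,
        Operation::Set,
        Operation::Del,
        Operation::MapPage,
    ];
    for op in &ops {
        // Today both versions agree; they only diverge in how they react
        // to a future change of the enum.
        assert_eq!(is_repairable_match(op), is_repairable_macro(op));
    }
    println!("both versions agree on all current variants");
}
```

The two functions behave identically for the current variants; the explicit `match` simply turns "someone added a variant and forgot this function" into a compile error.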

nopestack · 2025-07-28T17:36:44Z

+) -> Option<operation::Output> {
+    use operation::{Borrowed, Owned};
+
+    match operation {


same as before here

WIP

da6afed

xDarksome force-pushed the feat/replication-2.0 branch from c5d36c6 to da6afed Compare July 17, 2025 16:07

WalletConnectBot and others added 2 commits July 17, 2025 16:07

Bump VERSION to 250717.1

50fdc73

Merge branch 'main' into feat/replication-2.0

46de705

xDarksome changed the title ~~feat: replication::DriverV2~~ feat: replication 2.0 Jul 17, 2025

xDarksome and others added 12 commits July 18, 2025 15:56

pre test

05cb524

merge remote

a600f28

Bump VERSION to 250718.0

529cbf3

remove NodeOperator::idx

132d696

Merge branch 'feat/replication-2.0' of github.com:WalletConnectFounda…

00bd264

…tion/wcn into feat/replication-2.0

validate NodeOperatorIdx invariant

1c69f81

fix fmt

7ced327

tests

58862f3

Bump VERSION to 250722.0

5401ce5

add replica

044c9d6

remove specific StorageApi ops

85c6efe

Merge branch 'feat/replication-2.0' of github.com:WalletConnectFounda…

9da9512

…tion/wcn into feat/replication-2.0

xDarksome marked this pull request as ready for review July 22, 2025 15:31

xDarksome requested review from heilhead and nopestack July 22, 2025 15:31

nopestack reviewed Jul 22, 2025

heilhead reviewed Jul 24, 2025

xDarksome and others added 3 commits July 24, 2025 10:17

add NodeOperator::node len comment

e5fc278

fix: make NodeOperator::nodes private

f700124

Bump VERSION to 250724.0

0850032

heilhead approved these changes Jul 24, 2025

xDarksome added 3 commits July 24, 2025 16:29

fake -> testing

9ec173f

merge main

9ca3915

fix: not bind tokio::spawn handle

a8029de

nopestack assigned xDarksome Jul 28, 2025

nopestack reviewed Jul 28, 2025

nopestack approved these changes Jul 28, 2025

xDarksome merged commit f48efef into main Jul 28, 2025
12 checks passed

xDarksome deleted the feat/replication-2.0 branch July 28, 2025 18:04

4 participants