feat: replication 2.0#215
Conversation
c5d36c6 to
da6afed
Compare
…tion/wcn into feat/replication-2.0
…tion/wcn into feat/replication-2.0
| std::net::SocketAddrV4, | ||
| }; | ||
|
|
||
| pub fn node_keypair(operator_id: u8, node_id: u8) -> Keypair { |
There was a problem hiding this comment.
consider prefixing all of these with fake such that they dont show up on symbol search in some IDEs when looking for similarly named attributes and methods
There was a problem hiding this comment.
They are already within the module fake and intended to be used as fake::node_keypair etc.
What use-case do you have in mind? I never had such issues with rust-analyzer.
There was a problem hiding this comment.
if intended to be called this way, please add doc-comments pointing that out
There was a problem hiding this comment.
This is general code-style of preferring foo::Bar over FooBar. That way if your context is narrow it's fine to import foo::Bar directly and use it as Bar.
If you don't like the naming or it looks un-intuitive to you, we can consider other options.
Another alternative I considered was testing::fake_node_keypair, which to me looked more verbose.
There was a problem hiding this comment.
that proposal although verbose, seems perfect
There was a problem hiding this comment.
@heilhead please pick between this 2, or propose yours
There was a problem hiding this comment.
I don't particularly like the module name fake, and would prefer using testing instead. As for the prefixing/renaming the function, I don't think it's necessary, as this module would only exist in #[cfg(test)] code or behind the testing feature I assume.
There was a problem hiding this comment.
Ok, I'll rename all fake modules to testing. fake::SmartContract will be testing::FakeSmartContract tho, as it's an actual "fake" by definition
|
|
||
| // If both replica sets reached quorum, but the responses are different, then we | ||
| // return `None` indicating that quorum hasn't been reached. | ||
| (Some(a), Some(b)) => break Some(a).filter(|_| a == b), |
There was a problem hiding this comment.
Why not if a == b { Some(a) } else { None }?
There was a problem hiding this comment.
There was probably a more complicated chain at some point.
|
|
||
| // If we didn't get the quorum and this is a read operation, then try to reconcile | ||
| // the responses. | ||
| None => reconciliation::reconcile(operation, &responses[..RF]) |
There was a problem hiding this comment.
Does this apply only to 'collection' responses or to all of them?
There was a problem hiding this comment.
Same as before, only map page and cardinality, the match on the operation type is withing the fn
| this.execute_callback(operation.into(), ResponseChannel(tx)) | ||
| .await | ||
| }); | ||
| drop(handle); |
There was a problem hiding this comment.
What's the point of explicitly dropping the handle here?
There was a problem hiding this comment.
Clippy complains otherwise
There was a problem hiding this comment.
So doing simply tokio::spawn(...); triggers a warning? If that's the case, I'd still prefer let _ = tokio::spawn(...);.
There was a problem hiding this comment.
I think I was getting a warning on let _ = as well, will check
There was a problem hiding this comment.
error: non-binding `let` on a future
--> crates/replication2/src/coordinator/mod.rs:183:9
|
183 | / let _ = tokio::spawn(async move {
184 | | this.execute_callback(operation.into(), ResponseChannel(tx))
185 | | .await
186 | | });
| |___________^
There was a problem hiding this comment.
not binding it at all works, I thought I had issues with it as well, weird
| operator: &NodeOperator<N>, | ||
| operation: &Operation<'_>, | ||
| ) -> storage_api::Result<operation::Output> { | ||
| let mut retries_left = operator.nodes.len(); |
There was a problem hiding this comment.
Is it possible to get an empty set of nodes here under any conditions? I suppose it's not, but since you're already returning a result here, maybe make sure that it's never empty? Otherwise it may panic with an overflow on subtraction.
There was a problem hiding this comment.
This is an invariant of NodeOperator and it's being checked in the respective constructor. NodeOperator::next_node can also panic, but this would be a bug.
It's worth a comment tho, I'll add one.
There was a problem hiding this comment.
Also the field needs to be private, which it isn't rn. Thanks for pointing out.
| operation: Operation<'_>, | ||
| ) -> impl Future<Output = Result<operation::Output>> + Send; | ||
| ) -> impl Future<Output = Result<operation::Output>> + Send { | ||
| async move { self.execute_ref(&operation).await } |
There was a problem hiding this comment.
So the default implementation of the trait would result in a stack overflow due to recursion? Not sure it's a good idea.
There was a problem hiding this comment.
This is a neat trick I didn't know about either. Stack overflow is not possible when dealing with statically typed futures. Compiler is smart enough to throw an error forcing you to implement one of the functions.
| match response { | ||
| Ok(output) if &output != self.output => {} | ||
|
|
||
| // Errors are not being repaired. |
There was a problem hiding this comment.
is it worth logging those regardless?
There was a problem hiding this comment.
We won't see anything useful there, only the ErrorKind which will be available as a metric from wcn_rpc.
| pub(super) fn is_repairable(operation: &Operation<'_>) -> bool { | ||
| use operation::{Borrowed, Owned}; | ||
|
|
||
| match operation { |
There was a problem hiding this comment.
matches! won't give the same variant exhaustion guarantees
| ) -> Option<operation::Output> { | ||
| use operation::{Borrowed, Owned}; | ||
|
|
||
| match operation { |
Description
WCN 2.0 replication machinery
New concept - no, but we haven't used it explicitly in replication before, previously if there's a migration then both replica sets were just merged into one, which resulted in us being overly restrictive on the required amount of successful responses.
https://github.com/WalletConnectFoundation/wcn/blob/14e1e6e961678df845a5912b84751c85b0b619ed/crates/core/src/cluster.rs#L363
Here for each additional node in the second replica set (the one within new keyspace, migration), we were adding +1 required successful response.
Imagine we have primary replica set {1, 2, 3, 4, 5} and we've removed/added 5+ node operators at the same time. Then the secondary replica set can be completely different (eg. {6, 7, 8, 9, 10}.)
For the operation to be successful we would require (5/2+1) + 5 = 8 successful responses. But realistically we only need 3 from each replica set.
How Has This Been Tested?
Unit
Due Diligence