-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(peer-store): Remove addresses from peer store on dial failure #5926
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Daniel! Overall LGTM, left a comment.
misc/peer-store/src/memory_store.rs
Outdated
// Remove all attempted addresses. | ||
let mut is_record_updated = false; | ||
for (addr, _) in errors { | ||
is_record_updated |= self.remove_address_silent(&peer, addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should remove the addresses here,
a Transport
error may be for io reasons, which are not deterministic and may not be related with the address itself.
Therefore the addresses may still be correct right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah. But it would also be unfortunate to have perpetually undialable peers in here, especially as we have no TTL mechanism any more, only one LRU cache per peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good point.
But in identify we also remove all cached addresses upon a transport error, and as @dknopik already mentioned there is otherwise no way to remove an actually unreachable address.
Maybe we could differentiate between the explicitly added addresses and the ones that we learned through other behaviors with NewExternalAddrOfPeer
? The former may be ones that are likely to be known and trusted, and thus could be persistent even after an TransportError, while the latter can contain unreachable addresses that are removed on a dial failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. But then I would allow the user to add addresses that are also removable. I'll push a draft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I am not sure if I like it, it makes the API less clean IMO.
Is it really a problem if we remove a temporarily undialable address? For example, in my use-case (Anchor), such an address would be re-added by discovery at some point. So these addresses are not permanently lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could send those through an event, but that would make my code more complex (but now that I think about it, combined with having more granular event variants as you suggested, it might make sense to switch to that anyway, making my point irrelevant 🤔).
I am not sure I understand that. The granular event types are just for the output events, how would that change anything about adding addresses?
But still, there might be some conceivable cases, where users want to add "non-trusted" addresses manually.
Well technically one could still do that through the normal Swarm::add_address
, that maps to NewExternalAddrOfPeer
😄.
Which makes me think: we currently have two different ways of adding new addresses: Swarm::add_address
and Behavior::update_peer
.
In RequestResponse
we deprecated the behavior-level methods for adding and removing addresses, in favor of doing that through the Swarm
. If we want to be consistent, shouldn't we also drop the public methods on the peer_store::Behavior
? Is there any reason why one would not want to just use Swarm::add_address
?
For removing the address again we could add an analogous method Swarm::remove_address
(sorry for opening yet another orthogonal discussion - can be a separate PR as well). Of course then we'd not have the "explicit/ permanent" addresses anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was not aware of Swarm::add_address
at all, good to know :) In that case my first paragraph makes no sense at all, so please disregard it.
Yeah, your proposal of making update_address
non-pub in favour of events sounds good.
And about the question of removing undialable addresses - what do you think about making it configurable? Just adding a bool to the MemoryStore
's Config
? That does not add too much complexity, IMO. Alternatively, we could just close this and just add Swarm::remove_address
, then the user can at least implement the logic for removing undialable addresses themselves. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jxs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some mechanism for removing undiable addresses, otherwise they could fill up the LRU cache and cause potentially correct addresses to be evicted (although, when we successfully established a connection the address is promoted to the head of the cache, so it would only concern addresses that we haven't dialed yet).
what do you think about making it configurable? Just adding a bool to the
MemoryStore
'sConfig
?
Making it opt-out config sounds good to me.
add
Swarm::remove_address
, then the user can at least implement the logic for removing undialable addresses themselves
I would be in favor of that independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about allowing users to "pin" specific addresses so they won't be removed by the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dknopik!
misc/peer-store/src/memory_store.rs
Outdated
// Remove all attempted addresses. | ||
let mut is_record_updated = false; | ||
for (addr, _) in errors { | ||
is_record_updated |= self.remove_address_silent(&peer, addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good point.
But in identify we also remove all cached addresses upon a transport error, and as @dknopik already mentioned there is otherwise no way to remove an actually unreachable address.
Maybe we could differentiate between the explicitly added addresses and the ones that we learned through other behaviors with NewExternalAddrOfPeer
? The former may be ones that are likely to be known and trusted, and thus could be persistent even after an TransportError, while the latter can contain unreachable addresses that are removed on a dial failure.
is_record_updated |= self.remove_address_silent(&peer, addr); | ||
} | ||
if is_record_updated { | ||
self.push_event_and_wake(crate::store::Event::RecordUpdated(peer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Slightly orthogonal to this PR, but noticed it again while reviewing. Can be solved in a separate PR).
I find this even variant not very informative.
I've already raised it in #5724 (comment), but forgot about it again in later reviews. IMO the RecordUpdated
variant should include further info about the info that was added.
For the memory store I think it would make sense to have different variants AddressAdded
AddressRemove
etc. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
fn remove_address_silent(&mut self, peer: &PeerId, address: &Multiaddr, force: bool) -> bool { | ||
self.records | ||
.get_mut(peer) | ||
.is_some_and(|r| r.remove_address(address)) | ||
.is_some_and(|r| r.remove_address(address, force)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove a peer from the hashmap if there are no more known addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if and only if there is no custom data, or regardless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you're right. Only if there is not custom data.
Description
When the
MemoryStore
receives aFromSwarm::DialFailure
event, it will now modify the store appropriately:LocalPeerId
: Remove the peer from the store.WrongPeerId
: Remove the address for the dialed peer ID and readd it for the received peer ID.Transport
: Remove the failed addresses from the store.Furthermore, we group some repeated event logic in a new function
push_event_and_wake
for internal use.Follow-up to #5724
Notes & open questions
None.
Change checklist