feat(peer-store): Remove addresses from peer store on dial failure #5926

dknopik · 2025-03-10T15:58:23Z

Description

When the MemoryStore receives a FromSwarm::DialFailure event, it will now modify the store appropriately:

LocalPeerId: Remove the peer from the store.
WrongPeerId: Remove the address for the dialed peer ID and re-add it for the received peer ID.
Transport: Remove the failed addresses from the store.
other: no action.
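The per-variant handling above can be sketched as follows. This is a minimal sketch, not the PR's code: `DialFailure` and `Store` are hypothetical, simplified stand-ins for libp2p's `DialError` and the `MemoryStore`, and peer IDs and addresses are plain `String`s instead of `PeerId` and `Multiaddr`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the dial-error variants discussed here.
pub enum DialFailure {
    /// We attempted to dial ourselves.
    LocalPeerId,
    /// The remote turned out to be `obtained` instead of the peer we dialed.
    WrongPeerId { obtained: String, addr: String },
    /// These addresses failed at the transport layer.
    Transport(Vec<String>),
    /// Any other cause: no store update.
    Other,
}

/// Hypothetical store: peer ID -> known addresses.
#[derive(Default)]
pub struct Store {
    pub records: HashMap<String, Vec<String>>,
}

impl Store {
    pub fn add_address(&mut self, peer: &str, addr: &str) {
        let addrs = self.records.entry(peer.to_string()).or_default();
        if !addrs.iter().any(|a| a == addr) {
            addrs.push(addr.to_string());
        }
    }

    /// Returns `true` if the address was present and has been removed.
    pub fn remove_address(&mut self, peer: &str, addr: &str) -> bool {
        match self.records.get_mut(peer) {
            Some(addrs) => {
                let before = addrs.len();
                addrs.retain(|a| a != addr);
                addrs.len() != before
            }
            None => false,
        }
    }

    /// Apply the store update for a failed dial to `peer`.
    pub fn on_dial_failure(&mut self, peer: &str, error: DialFailure) {
        match error {
            // Dialing ourselves: drop the whole record.
            DialFailure::LocalPeerId => {
                self.records.remove(peer);
            }
            // Move the address from the dialed peer to the peer we actually reached.
            DialFailure::WrongPeerId { obtained, addr } => {
                self.remove_address(peer, &addr);
                self.add_address(&obtained, &addr);
            }
            // Remove every address that failed at the transport layer.
            DialFailure::Transport(failed) => {
                for addr in &failed {
                    self.remove_address(peer, addr);
                }
            }
            DialFailure::Other => {}
        }
    }
}
```

Note that this sketch keeps a peer's record even when it ends up with no addresses.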

Furthermore, we group some repeated event logic in a new function push_event_and_wake for internal use.
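A helper like push_event_and_wake typically bundles two steps: queue the event, then wake the task that last polled the store. A minimal sketch, assuming a `VecDeque` event queue and an optional stored `Waker`; the `EventQueue` type, its field names and `poll_next` are hypothetical and not taken from the PR:

```rust
use std::collections::VecDeque;
use std::task::Waker;

/// Hypothetical event type emitted by the store.
#[derive(Debug, PartialEq)]
pub enum Event {
    RecordUpdated(String),
}

#[derive(Default)]
pub struct EventQueue {
    pending_events: VecDeque<Event>,
    waker: Option<Waker>,
}

impl EventQueue {
    /// Queue an event and wake the task that last polled us, if any.
    pub fn push_event_and_wake(&mut self, event: Event) {
        self.pending_events.push_back(event);
        if let Some(waker) = self.waker.take() {
            waker.wake();
        }
    }

    /// Called from a `poll` method: return the next event, or remember
    /// the waker so a later push can wake this task.
    pub fn poll_next(&mut self, waker: &Waker) -> Option<Event> {
        match self.pending_events.pop_front() {
            Some(event) => Some(event),
            None => {
                self.waker = Some(waker.clone());
                None
            }
        }
    }
}
```

Taking the waker out of the `Option` ensures each registered waker is woken at most once per registration.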

Follow-up to #5724

Notes & open questions

None.

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

jxs

Thanks Daniel! Overall LGTM, left a comment.

jxs · 2025-03-11T16:07:16Z

misc/peer-store/src/memory_store.rs

+                        // Remove all attempted addresses.
+                        let mut is_record_updated = false;
+                        for (addr, _) in errors {
+                            is_record_updated |= self.remove_address_silent(&peer, addr);


I am not sure we should remove the addresses here:
a Transport error may be caused by I/O issues, which are not deterministic and may be unrelated to the address itself.
Therefore the addresses may still be correct, right?

Hmm, yeah. But it would also be unfortunate to have perpetually undialable peers in here, especially as we no longer have a TTL mechanism, only one LRU cache per peer.

Hmm, good point.
But in identify we also remove all cached addresses upon a transport error, and, as @dknopik already mentioned, there is otherwise no way to remove an actually unreachable address.

Maybe we could differentiate between the explicitly added addresses and the ones that we learned through other behaviors with NewExternalAddrOfPeer? The former are likely to be known and trusted, and thus could persist even after a TransportError, while the latter can contain unreachable addresses that are removed on a dial failure.
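One way to express this distinction is to tag each stored address with whether events such as a dial failure may remove it, and to let an explicit removal override the tag. A minimal sketch, assuming a hypothetical `AddressEntry` with a `permanent` flag; the diff later in this PR passes a similar `force` argument to `remove_address`, but the types here are illustrative only:

```rust
/// Hypothetical per-address entry in a peer record.
pub struct AddressEntry {
    pub addr: String,
    /// `true` for addresses the user added explicitly and trusts;
    /// `false` for addresses learned from other behaviours.
    pub permanent: bool,
}

#[derive(Default)]
pub struct PeerRecord {
    pub addresses: Vec<AddressEntry>,
}

impl PeerRecord {
    /// Add an address; re-adding a known address as permanent marks it permanent.
    pub fn add_address(&mut self, addr: &str, permanent: bool) {
        match self.addresses.iter_mut().find(|e| e.addr == addr) {
            Some(entry) => entry.permanent |= permanent,
            None => self.addresses.push(AddressEntry { addr: addr.to_string(), permanent }),
        }
    }

    /// Remove `addr` unless it is permanent; `force` removes it regardless.
    /// Returns `true` if the address was removed.
    pub fn remove_address(&mut self, addr: &str, force: bool) -> bool {
        match self.addresses.iter().position(|e| e.addr == addr) {
            Some(i) if force || !self.addresses[i].permanent => {
                self.addresses.remove(i);
                true
            }
            _ => false,
        }
    }
}
```

A dial-failure handler would call `remove_address(addr, false)`, while an explicit user removal would pass `true`.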

Sounds good. But then I would also allow the user to explicitly add addresses that remain removable. I'll push a draft.

Done. I am not sure if I like it, it makes the API less clean IMO.

Is it really a problem if we remove a temporarily undialable address? For example, in my use-case (Anchor), such an address would be re-added by discovery at some point. So these addresses are not permanently lost.

I could send those through an event, but that would make my code more complex (but now that I think about it, combined with having more granular event variants as you suggested, it might make sense to switch to that anyway, making my point irrelevant 🤔).

I am not sure I understand that. The granular event types are just for the output events, how would that change anything about adding addresses?

But still, there might be conceivable cases where users want to add "non-trusted" addresses manually.

Well technically one could still do that through the normal Swarm::add_address, that maps to NewExternalAddrOfPeer 😄.

Which makes me think: we currently have two different ways of adding new addresses: Swarm::add_address and Behavior::update_peer.
In RequestResponse we deprecated the behavior-level methods for adding and removing addresses, in favor of doing that through the Swarm. If we want to be consistent, shouldn't we also drop the public methods on the peer_store::Behavior? Is there any reason why one would not want to just use Swarm::add_address?
For removing the address again, we could add an analogous method Swarm::remove_address (sorry for opening yet another orthogonal discussion; this can be a separate PR as well). Of course, then we'd no longer have the "explicit/permanent" addresses.

Ah, I was not aware of Swarm::add_address at all, good to know :) In that case my first paragraph makes no sense at all, so please disregard it.

Yeah, your proposal of making update_address non-pub in favour of events sounds good.

And about the question of removing undialable addresses - what do you think about making it configurable? Just adding a bool to the MemoryStore's Config? That does not add too much complexity, IMO. Alternatively, we could close this and just add Swarm::remove_address, then the user can at least implement the logic for removing undialable addresses themselves. What do you think?

I think we should have some mechanism for removing undialable addresses; otherwise they could fill up the LRU cache and cause potentially correct addresses to be evicted (although when we successfully establish a connection, the address is promoted to the head of the cache, so this would only concern addresses that we haven't dialed yet).
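The eviction concern can be illustrated with a small per-peer LRU list: new addresses go to the front, a successful connection moves the address back to the front, and the entry at the back is evicted once capacity is exceeded. A minimal sketch, assuming a hypothetical `AddrLru` backed by a `VecDeque`; it models the behaviour described above and is not the store's actual LRU cache:

```rust
use std::collections::VecDeque;

/// Hypothetical per-peer LRU of addresses; front = most recently used.
pub struct AddrLru {
    capacity: usize,
    addrs: VecDeque<String>,
}

impl AddrLru {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, addrs: VecDeque::new() }
    }

    /// Insert `addr` at the front, or promote it there if already known.
    /// Returns the evicted (least recently used) address, if any.
    pub fn touch(&mut self, addr: &str) -> Option<String> {
        if let Some(i) = self.addrs.iter().position(|a| a == addr) {
            let existing = self.addrs.remove(i).expect("index is in bounds");
            self.addrs.push_front(existing);
            return None;
        }
        self.addrs.push_front(addr.to_string());
        if self.addrs.len() > self.capacity {
            self.addrs.pop_back()
        } else {
            None
        }
    }

    pub fn contains(&self, addr: &str) -> bool {
        self.addrs.iter().any(|a| a == addr)
    }
}
```

With a capacity of 2, learning two never-dialed addresses after a correct one evicts the correct one, unless a successful connection promoted it in between.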

what do you think about making it configurable? Just adding a bool to the MemoryStore's Config?

Making it an opt-out config option sounds good to me.

add Swarm::remove_address, then the user can at least implement the logic for removing undialable addresses themselves

I would be in favor of that independently.
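The opt-out configuration discussed here could be a single boolean on the store's Config that defaults to removing failed addresses. A minimal sketch; the field name `remove_addr_on_dial_error`, the builder-style setter and the `addresses_to_remove` helper are hypothetical, not the API that was merged:

```rust
/// Hypothetical subset of the MemoryStore configuration.
#[derive(Debug, Clone)]
pub struct Config {
    /// Remove addresses that failed with a transport error (enabled by default).
    remove_addr_on_dial_error: bool,
}

impl Default for Config {
    fn default() -> Self {
        Self { remove_addr_on_dial_error: true }
    }
}

impl Config {
    /// Builder-style setter; pass `false` to opt out of automatic removal.
    pub fn set_remove_addr_on_dial_error(mut self, value: bool) -> Self {
        self.remove_addr_on_dial_error = value;
        self
    }

    pub fn remove_addr_on_dial_error(&self) -> bool {
        self.remove_addr_on_dial_error
    }
}

/// Select which failed addresses to drop, honouring the config.
pub fn addresses_to_remove<'a>(config: &Config, failed: &'a [String]) -> &'a [String] {
    if config.remove_addr_on_dial_error {
        failed
    } else {
        // Empty slice: keep every address.
        &failed[..0]
    }
}
```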

What about allowing users to "pin" specific addresses so they won't be removed by the cache?

elenaf9

Thanks @dknopik!

misc/peer-store/src/memory_store.rs

elenaf9 · 2025-03-12T03:53:02Z (reply in the misc/peer-store/src/memory_store.rs thread above, starting "Hmm good point.")

elenaf9 · 2025-03-12T04:02:02Z

misc/peer-store/src/memory_store.rs

+                            is_record_updated |= self.remove_address_silent(&peer, addr);
+                        }
+                        if is_record_updated {
+                            self.push_event_and_wake(crate::store::Event::RecordUpdated(peer));


(Slightly orthogonal to this PR, but noticed it again while reviewing. Can be solved in a separate PR).

I find this event variant not very informative.
I've already raised it in #5724 (comment), but forgot about it again in later reviews. IMO the RecordUpdated variant should include more information about what was changed.
For the memory store, I think it would make sense to have different variants, e.g. AddressAdded, AddressRemoved, etc. Wdyt?

Sounds good to me.
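A more informative event type along these lines might replace the single RecordUpdated(peer) variant with one variant per kind of change. A minimal sketch; the variant names loosely follow the suggestion above, while `PeerRemoved`, the fields, and the `String` stand-ins for `PeerId`/`Multiaddr` are assumptions:

```rust
/// Hypothetical, more granular store events.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// A new address was recorded for a peer.
    AddressAdded { peer: String, address: String },
    /// An address was removed from a peer's record.
    AddressRemoved { peer: String, address: String },
    /// The peer's record was removed entirely.
    PeerRemoved { peer: String },
}

impl Event {
    /// The peer this event refers to.
    pub fn peer(&self) -> &str {
        match self {
            Event::AddressAdded { peer, .. }
            | Event::AddressRemoved { peer, .. }
            | Event::PeerRemoved { peer } => peer,
        }
    }
}
```

Consumers that only care about "something changed for this peer" can still call `peer()`, while others can match on the specific change.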

elenaf9 · 2025-03-18T02:10:44Z

misc/peer-store/src/memory_store.rs

+    fn remove_address_silent(&mut self, peer: &PeerId, address: &Multiaddr, force: bool) -> bool {
        self.records
            .get_mut(peer)
-            .is_some_and(|r| r.remove_address(address))
+            .is_some_and(|r| r.remove_address(address, force))
    }


We should remove a peer from the hashmap if there are no more known addresses.

If and only if there is no custom data, or regardless?

Ah yes, you're right. Only if there is no custom data.
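The rule agreed on here, dropping a peer's entry only once it has neither addresses nor custom data, can be sketched as follows. The `PeerRecord<T>` shape, with an `Option<T>` for custom data, and the `Store<T>` type are hypothetical simplifications of the store's record type:

```rust
use std::collections::HashMap;

/// Hypothetical peer record: known addresses plus optional user data.
pub struct PeerRecord<T> {
    pub addresses: Vec<String>,
    pub custom_data: Option<T>,
}

pub struct Store<T> {
    pub records: HashMap<String, PeerRecord<T>>,
}

impl<T> Store<T> {
    /// Remove `address` from `peer`'s record. If the record is then empty
    /// (no addresses and no custom data), remove the peer entirely.
    /// Returns `true` if the address was removed.
    pub fn remove_address(&mut self, peer: &str, address: &str) -> bool {
        let Some(record) = self.records.get_mut(peer) else {
            return false;
        };
        let before = record.addresses.len();
        record.addresses.retain(|a| a != address);
        let removed = record.addresses.len() != before;
        if record.addresses.is_empty() && record.custom_data.is_none() {
            self.records.remove(peer);
        }
        removed
    }
}
```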

dknopik and others added 3 commits March 10, 2025 16:31

352f1aa · feat(peer-store): Remove addresses from peer store on dial failure
6b9453f · clippy
0514e70 · Merge branch 'master' into peer-store-remove-undialable

jxs reviewed Mar 11, 2025

jxs requested a review from elenaf9 March 11, 2025 16:14

elenaf9 reviewed Mar 12, 2025

dknopik added 3 commits March 12, 2025 12:03

0654830 · make *_silent fns non-pub
cc6180b · Merge branch 'master' into peer-store-remove-undialable
0b243cc · allow making addresses unremovable by events

This comment was marked as outdated.

2c8f8f9 · Merge branch 'master' into peer-store-remove-undialable

elenaf9 reviewed Mar 18, 2025


jxs left a comment: jxs Mar 11, 2025 · dknopik Mar 11, 2025 · elenaf9 Mar 12, 2025 · dknopik Mar 14, 2025 · dknopik Mar 14, 2025 · elenaf9 Mar 17, 2025 (edited) · dknopik Mar 17, 2025 · dknopik Mar 17, 2025 · elenaf9 Mar 18, 2025 · drHuangMHT Mar 19, 2025

elenaf9 left a comment: elenaf9 Mar 12, 2025 · elenaf9 Mar 12, 2025 · dknopik Mar 12, 2025 · (one comment marked as outdated) · elenaf9 Mar 18, 2025 · dknopik Mar 18, 2025 · elenaf9 Mar 18, 2025
