Do not track HTLC IDs as separate MPP parts which need claiming #3680
Conversation
TheBlueMatt commented on Mar 22, 2025
👋 Thanks for assigning @wpaulino as a reviewer!
Looks great on first pass! Will take another look at the test to make sure I understand everything correctly.
🔔 1st Reminder: Hey @wpaulino! This PR has been waiting for your review.
Great find!
Would an alternative way of fixing this be by only queueing a single monitor update for all the HTLC parts belonging to the same channel?
I'd also like to, if possible, prevent future cases like this where the channel suddenly becomes blocked resulting in a force close. Can/should we trigger a panic if a queued monitor update that we need to process to unblock a channel becomes blocked?
Diff looks good to me overall.
Yes, that would also fix this. I thought about trying to do that first, but it's not as nice for a backport in that it's a fairly nontrivial difference. The advantage that approach would have, of course, is that it'd be a nice performance win on top (MPP claims in one commitment update and one `ChannelMonitorUpdate`).
Hmm, we definitely should, but it's not entirely clear to me how. We don't track explicit dependencies in the form of which specific monitor updates we're waiting on to release a monitor update, so I don't really see a clean assertion path here. I guess we could assert, for this specific bug, that a channel with no pending monitor updates is never blocked on an MPP claim from itself (with a monitor update in its own pending queue), but that's pretty tailored to exactly this problem, which doesn't seem all that useful.
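For illustration only, a minimal sketch of what such a tailored assertion could look like. The struct, field, and variant names below are hypothetical and heavily simplified; they do not match the actual `ChannelManager`/`Channel` internals, they just show the shape of the "channel blocked on itself" check discussed above:

```rust
// Hypothetical, simplified channel state - not the real rust-lightning types.
struct ChannelState {
	/// Monitor updates queued behind an in-flight (blocked) update.
	blocked_monitor_updates: Vec<PendingUpdate>,
	/// Updates currently being persisted.
	in_flight_monitor_updates: Vec<PendingUpdate>,
	/// Actions (e.g. MPP claims) that must complete before new updates release.
	blocking_actions: Vec<BlockingAction>,
}

struct PendingUpdate { update_id: u64 }

enum BlockingAction {
	/// The channel is blocked until an MPP claim lands in some channel's monitor.
	MppClaim { source_channel_id: [u8; 32] },
}

impl ChannelState {
	fn assert_not_self_blocked(&self, our_channel_id: [u8; 32]) {
		// The specific bad state from this PR: nothing is in flight, yet we are
		// blocked on an MPP claim that can only be released by an update sitting
		// in our own blocked queue - i.e. the channel is waiting on itself.
		let waiting_on_self = self.blocking_actions.iter().any(|act| match act {
			BlockingAction::MppClaim { source_channel_id } => *source_channel_id == our_channel_id,
		});
		debug_assert!(
			!(self.in_flight_monitor_updates.is_empty()
				&& waiting_on_self
				&& !self.blocked_monitor_updates.is_empty()),
			"Channel is blocked on an MPP claim from itself with no update in flight"
		);
	}
}
```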
Force-pushed from 3938d04 to 1d8f34a.
CI is still failing.
Force-pushed from c3340a8 to 90acd18.
Yea, it's hard to repro locally, so I'm kinda fighting CI directly :/
Force-pushed from acad2b4 to 9c0be2e.
Okay, it passed the last time modulo rustfmt, so it should again this time.
Nope, still hanging :(
Force-pushed from 6b5e1f6 to 26e7e96.
Codecov Report — Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3680      +/-   ##
==========================================
+ Coverage   89.25%   90.43%   +1.18%
==========================================
  Files         155      155
  Lines      119708   129654    +9946
  Branches   119708   129654    +9946
==========================================
+ Hits       106845   117255   +10410
+ Misses      10250     9864     -386
+ Partials     2613     2535      -78
Force-pushed from 8a370bd to 1367c88.
Okay, this is finally ready for review. The test is quite stable (though it doesn't work at all on Windows).
Nice. LGTM.
Curious if it's some specific Windows primitive that just won't play nice?
Yea, Windows is not a real operating system - it's still, somehow, doing cooperative multitasking in 2025 (only within the same process, but still).
In a coming commit we'll need to hold references to `TestChannelManager` in threads, requiring that it be `Sync`.
In a coming commit we'll add a test that relies heavily on lock fairness, which is not provided by the default Rust `Mutex`. Luckily, `parking_lot` provides an `unlock_fair` method, which we use here, though it means we have to manually implement lock poisoning.
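As a rough sketch of the approach that commit message describes - assuming `parking_lot` as a dependency, and using hypothetical wrapper names (`FairMutex`, `FairGuard`) rather than the actual test-util types - wrapping `parking_lot::Mutex` to get fair unlocking while re-adding poisoning could look like this:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub struct FairMutex<T> {
	inner: parking_lot::Mutex<T>,
	poisoned: AtomicBool,
}

pub struct FairGuard<'a, T> {
	// Held in an Option so Drop can take it and release it fairly.
	guard: Option<parking_lot::MutexGuard<'a, T>>,
	poisoned: &'a AtomicBool,
}

impl<T> FairMutex<T> {
	pub fn new(val: T) -> Self {
		Self { inner: parking_lot::Mutex::new(val), poisoned: AtomicBool::new(false) }
	}

	pub fn lock(&self) -> Result<FairGuard<'_, T>, ()> {
		let guard = self.inner.lock();
		if self.poisoned.load(Ordering::Acquire) {
			// A previous holder panicked; surface that like std's PoisonError would.
			return Err(());
		}
		Ok(FairGuard { guard: Some(guard), poisoned: &self.poisoned })
	}
}

impl<'a, T> Drop for FairGuard<'a, T> {
	fn drop(&mut self) {
		if std::thread::panicking() {
			// Emulate std::sync::Mutex poisoning when dropped during a panic.
			self.poisoned.store(true, Ordering::Release);
		}
		// `unlock_fair` hands the lock to the longest-waiting thread instead of
		// letting the releasing thread immediately re-acquire it.
		parking_lot::MutexGuard::unlock_fair(self.guard.take().unwrap());
	}
}

impl<'a, T> std::ops::Deref for FairGuard<'a, T> {
	type Target = T;
	fn deref(&self) -> &T { self.guard.as_ref().unwrap() }
}

impl<'a, T> std::ops::DerefMut for FairGuard<'a, T> {
	fn deref_mut(&mut self) -> &mut T { self.guard.as_mut().unwrap() }
}
```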
Squashed.
Force-pushed from 1367c88 to c83280a.
Force-pushed from c83280a to d5e4829.
Hmm, I think the patch fails on rebase:

$ git diff-tree -U1 c83280ab1 93b4479e4
diff --git a/lightning/src/ln/chanmon_update_fail_tests.rs b/lightning/src/ln/chanmon_update_fail_tests.rs
index b386f6fee..e1c95def4 100644
--- a/lightning/src/ln/chanmon_update_fail_tests.rs
+++ b/lightning/src/ln/chanmon_update_fail_tests.rs
@@ -3877,3 +3877,4 @@ fn test_single_channel_multiple_mpp() {
let node_cfgs = create_node_cfgs(9, &chanmon_cfgs);
- let node_chanmgrs = create_node_chanmgrs(9, &node_cfgs, &[None; 9]);
+ let configs = [None, None, None, None, None, None, None, None, None];
+ let node_chanmgrs = create_node_chanmgrs(9, &node_cfgs, &configs);
let mut nodes = create_network(9, &node_cfgs, &node_chanmgrs);
When we claim an MPP payment, we need to track which channels have had the preimage durably added to their `ChannelMonitor` to ensure we don't remove the preimage from any `ChannelMonitor` until all `ChannelMonitor`s have the preimage.

Previously, we tracked each MPP part, down to the HTLC ID, as a part for which we needed to get the preimage on disk. However, this is not necessary - once a `ChannelMonitor` has a preimage, it applies it to all inbound HTLCs with the same payment hash.

Further, this can cause a channel to wait on itself in cases of high-latency synchronous persistence -
 * If we receive an MPP payment for which multiple parts came to us over the same channel,
 * and claim the MPP payment, creating a `ChannelMonitorUpdate` for the first part but enqueueing the remaining HTLC claim(s) in the channel's holding cell,
 * and we receive a `revoke_and_ack` for the same channel before the `ChannelManager::claim_payment` method completes (as each claim waits for the `ChannelMonitorUpdate` persistence),
 * we will cause the `ChannelMonitorUpdate` for that `revoke_and_ack` to go into the blocked set, waiting on the MPP parts to be fully claimed,
 * but when `claim_payment` goes to add the next `ChannelMonitorUpdate` for the MPP claim, it will be placed in the blocked set, since the blocked set is non-empty.

Thus, we'll end up with a `ChannelMonitorUpdate` in the blocked set which is needed to unblock the channel, since it is a part of the MPP set which blocked the channel.
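To illustrate the shape of the change described above - with hypothetical, heavily simplified type names rather than the real `ChannelManager` internals - the tracking key effectively moves from per-HTLC to per-channel:

```rust
use std::collections::HashSet;

// Simplified stand-ins for the real identifiers.
type ChannelId = [u8; 32];
type CounterpartyNodeId = [u8; 33];

// Before: each HTLC part was tracked individually, so two parts received over
// the same channel produced two entries which each had to be marked complete.
struct PendingMppClaimPerHtlc {
	pending_parts: HashSet<(CounterpartyNodeId, ChannelId, /* htlc_id */ u64)>,
}

// After: once a ChannelMonitor has the preimage it applies it to every inbound
// HTLC with that payment hash, so tracking the channel alone is sufficient and
// duplicate parts over one channel collapse into a single entry.
struct PendingMppClaimPerChannel {
	pending_channels: HashSet<(CounterpartyNodeId, ChannelId)>,
}

impl PendingMppClaimPerChannel {
	/// Called when a `ChannelMonitorUpdate` carrying the preimage completes for
	/// the given channel; the claim is fully durable once the set drains.
	fn monitor_update_completed(&mut self, node: CounterpartyNodeId, chan: ChannelId) -> bool {
		self.pending_channels.remove(&(node, chan));
		self.pending_channels.is_empty()
	}
}
```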
Force-pushed from d5e4829 to 93b4479.
Lgtm!
Backported in #3697
v0.1.2 - Apr 02, 2025 - "Foolishly Edgy Cases"

API Updates
===========
 * `lightning-invoice` is now re-exported as `lightning::bolt11_invoice` (lightningdevkit#3671).

Performance Improvements
========================
 * `rapid-gossip-sync` graph parsing is substantially faster, resolving a regression in 0.1 (lightningdevkit#3581).
 * `NetworkGraph` loading is now substantially faster and does fewer allocations, resulting in a 20% further improvement in `rapid-gossip-sync` loading when initializing from scratch (lightningdevkit#3581).
 * `ChannelMonitor`s for closed channels are no longer always re-persisted immediately after startup, reducing on-startup I/O burden (lightningdevkit#3619).

Bug Fixes
=========
 * BOLT 11 invoices longer than 1023 bytes (and up to 7089 bytes) now parse properly (lightningdevkit#3665).
 * In some cases, when using synchronous persistence with higher latency than the latency to communicate with peers, when receiving an MPP payment with multiple parts received over the same channel, a channel could hang and not make progress, eventually leading to a force-closure due to timed-out HTLCs. This has now been fixed (lightningdevkit#3680).
 * Some rare cases where multi-hop BOLT 11 route hints or multiple redundant blinded paths could have led to the router creating invalid `Route`s were fixed (lightningdevkit#3586).
 * Corrected the decay logic in `ProbabilisticScorer`'s historical buckets model. Note that by default historical buckets are only decayed if no new datapoints have been added for a channel for two weeks (lightningdevkit#3562).
 * `{Channel,Onion}MessageHandler::peer_disconnected` will now be called if a different message handler refused connection by returning an `Err` from its `peer_connected` method (lightningdevkit#3580).
 * If the counterparty broadcasts a revoked state with pending HTLCs, those will now be claimed with other outputs which we consider to not be vulnerable to pinning attacks if they are not yet claimable by our counterparty, potentially reducing our exposure to pinning attacks (lightningdevkit#3564).