Fix spurious MPP pathfinding failure #3707

Open · wants to merge 1 commit into main from 2025-03-lexe-router-bugfix

Conversation

@valentinewallace (Contributor) commented Apr 3, 2025

This bug was recently surfaced to us by a user who wrote a test where the sender is attempting to route an MPP payment split across the two channels that it has with its LSP, where the LSP has a single large channel to the recipient.

Previously this led to a pathfinding failure because our router was not sending the maximum possible value over the first MPP path it found, due to overestimating the fees needed to cover a later hop in the route. Because the path that had just been found was not maxed out, our router assumed that we had already found enough paths to cover the full payment amount and that we were finding additional paths for the purpose of redundant path selection. This caused the router to mark the recipient's only channel as exhausted, with the intention of choosing more unique paths in future iterations. In reality, the recipient's only channel ended up disabled, and the router subsequently failed to find a route entirely.

Update the router to fully utilize the capacity of any paths it finds in this situation, preventing the "redundant path selection" behavior from kicking in.
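
To make the fee overestimate concrete, here is a small, hypothetical model of the aggregation idea (the function names and exact arithmetic are illustrative, not LDK's actual `compute_max_final_value_contribution`): the router inverts the aggregated fee formula to find the largest amount the payee can receive through a hop, and integer division makes that result undershoot the true maximum by a few msat.

```rust
/// Standard Lightning per-hop fee: base plus a proportional part in
/// millionths, with integer (floor) division.
fn hop_fee_msat(value_msat: u64, base_msat: u64, prop_millionths: u64) -> u64 {
    base_msat + value_msat * prop_millionths / 1_000_000
}

/// Largest value the payee can receive if at most `hop_max_msat` can be sent
/// into the remaining hops, which together charge `agg_base`/`agg_prop`.
/// Solves value + fee(value) <= hop_max_msat for value, rounding down, so
/// the result can undershoot the true maximum by a few msat.
fn max_final_value_contribution(hop_max_msat: u64, agg_base: u64, agg_prop: u64) -> u64 {
    hop_max_msat.saturating_sub(agg_base).saturating_mul(1_000_000) / (1_000_000 + agg_prop)
}
```

With a 1,000,000 msat hop, a 1,000 msat base fee, and 100 ppm proportional fee, the inversion yields 998,900 msat even though 998,901 msat plus its 1,099 msat fee fits the hop exactly; that one-msat shortfall is the kind of rounding gap discussed in review below.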

Closes #3685

@ldk-reviews-bot commented Apr 3, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.


// Aggregate the fees of the hops that come after this one, and use those fees to compute the
// maximum amount that this hop can contribute to the final value received by the payee.
let (next_hops_aggregated_base, next_hops_aggregated_prop) =
Collaborator

Doesn't using the aggregate version have rounding issues? Sadly if we're even one msat off I think the bug will persist :/

Contributor Author

Yeah, I'm not sure there's an easy way to get rid of the rounding issues :(

Will move forward with adding a fudge factor when checking whether we've hit prevent_redundant_path_selection. It might also be good to prevent that code from exhausting last-hop channels.

@valentinewallace force-pushed the 2025-03-lexe-router-bugfix branch from ce7b12b to 9f0eb8f on April 4, 2025 at 22:47
@valentinewallace (Contributor Author) commented:

Updated some docs and added a fudge factor when deciding if we've fully utilized a path.

diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 1d529b9b9..73cc6d687 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -2111,9 +2111,10 @@ impl<'a> PaymentPath<'a> {
                                        cur_hop.hop_use_fee_msat = new_fee;
                                        total_fee_paid_msat += new_fee;
                                } else {
-                                       // It should not be possible because this function is called only to reduce the
-                                       // value. In that case, compute_fee was already called with the same fees for
-                                       // larger amount and there was no overflow.
+                                       // It should not be possible because this function is only called either to reduce the
+                                       // value or with a larger amount that was already checked for overflow in
+                                       // `compute_max_final_value_contribution`. In the former case, compute_fee was already
+                                       // called with the same fees for larger amount and there was no overflow.
                                        unreachable!();
                                }
                        }
@@ -2121,6 +2122,9 @@ impl<'a> PaymentPath<'a> {
                value_msat + extra_contribution_msat
        }

+       // Returns the maximum contribution that this path can make to the final value of the payment. May
+       // be slightly lower than the actual max due to rounding errors when aggregating fees along the
+       // path.
        fn compute_max_final_value_contribution(
                &self, used_liquidities: &HashMap<CandidateHopId, u64>, channel_saturation_pow_half: u8
        ) -> u64 {
@@ -3331,9 +3335,15 @@ where L::Target: Logger {
                                                .or_insert(spent_on_hop_msat);
                                        let hop_capacity = hop.candidate.effective_capacity();
                                        let hop_max_msat = max_htlc_from_capacity(hop_capacity, channel_saturation_pow_half);
-                                       if *used_liquidity_msat == hop_max_msat {
-                                               // If this path used all of this channel's available liquidity, we know
-                                               // this path will not be selected again in the next loop iteration.
+
+                                       // If this path used all of this channel's available liquidity, we know this path will not
+                                       // be selected again in the next loop iteration.
+                                       //
+                                       // Allow the used amount to be slightly below the max when deciding whether a channel is
+                                       // fully utilized, to account for rounding errors in
+                                       // `PaymentPath::compute_max_final_value_contribution`.
+                                       let maxed_out_hop_liquidity_range = hop_max_msat.saturating_sub(1000)..(hop_max_msat.saturating_add(1));
+                                       if maxed_out_hop_liquidity_range.contains(used_liquidity_msat) {
                                                prevented_redundant_path_selection = true;
                                        }
                                        debug_assert!(*used_liquidity_msat <= hop_max_msat);
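
Pulled out as a standalone helper (a hypothetical refactor, not part of the PR), the new fudge-factor check behaves like this:

```rust
// Hypothetical extraction of the fudge-factor check from the hunk above:
// treat a hop as fully utilized if the used liquidity is within 1000 msat
// below its max. The range is half-open, hence `saturating_add(1)` so the
// exact maximum itself is still included.
fn hop_is_maxed_out(used_liquidity_msat: u64, hop_max_msat: u64) -> bool {
    let maxed_out_hop_liquidity_range =
        hop_max_msat.saturating_sub(1000)..hop_max_msat.saturating_add(1);
    maxed_out_hop_liquidity_range.contains(&used_liquidity_msat)
}
```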

@valentinewallace marked this pull request as ready for review on April 4, 2025 at 22:49
@ldk-reviews-bot commented
🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Comment on lines +3314 to +3317
let max_path_contribution_msat = payment_path.compute_max_final_value_contribution(
&used_liquidities, channel_saturation_pow_half
);
let desired_value_contribution = cmp::min(max_path_contribution_msat, final_value_msat);
Collaborator

Rather than being fuzzy, can we not have these determine whether we need to limit at a random hop?

Contributor Author

That does simplify things a bit. Pushed with the following diff:

diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 73cc6d687..bc4210b5c 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -3326,7 +3326,6 @@ where L::Target: Logger {
                                // might have been computed considering a larger value.
                                // Remember that we used these channels so that we don't rely
                                // on the same liquidity in future paths.
-                               let mut prevented_redundant_path_selection = false;
                                for (hop, _) in payment_path.hops.iter() {
                                        let spent_on_hop_msat = value_contribution_msat + hop.next_hops_fee_msat;
                                        let used_liquidity_msat = used_liquidities
@@ -3335,20 +3334,9 @@ where L::Target: Logger {
                                                .or_insert(spent_on_hop_msat);
                                        let hop_capacity = hop.candidate.effective_capacity();
                                        let hop_max_msat = max_htlc_from_capacity(hop_capacity, channel_saturation_pow_half);
-
-                                       // If this path used all of this channel's available liquidity, we know this path will not
-                                       // be selected again in the next loop iteration.
-                                       //
-                                       // Allow the used amount to be slightly below the max when deciding whether a channel is
-                                       // fully utilized, to account for rounding errors in
-                                       // `PaymentPath::compute_max_final_value_contribution`.
-                                       let maxed_out_hop_liquidity_range = hop_max_msat.saturating_sub(1000)..(hop_max_msat.saturating_add(1));
-                                       if maxed_out_hop_liquidity_range.contains(used_liquidity_msat) {
-                                               prevented_redundant_path_selection = true;
-                                       }
                                        debug_assert!(*used_liquidity_msat <= hop_max_msat);
                                }
-                               if !prevented_redundant_path_selection {
+                               if max_path_contribution_msat > value_contribution_msat {
                                        // If we weren't capped by hitting a liquidity limit on a channel in the path,
                                        // we'll probably end up picking the same path again on the next iteration.
                                        // Decrease the available liquidity of a hop in the middle of the path.

@valentinewallace force-pushed the 2025-03-lexe-router-bugfix branch from 9f0eb8f to 209cb2a on April 7, 2025 at 18:06
@MaxFangX (Contributor) commented Apr 8, 2025

Nice to see a fix in the works, thanks @valentinewallace!

> Because the path that had just been found was not maxed out

A question that's somewhat related - will this PR also have the effect of generally reducing the # of shards over which MPPs are sent, since found paths are maxed out? We're currently dealing with issues in prod where sending an MPP payment which requires only two of the sender's outbound channels (i.e. two MPP shards) is actually producing an MPP route with seven shards. With a per-path success rate of 85% (our observed probing success rate), the overall payment success rate drops to a measly 0.85^7 ≈ 32%.
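
Assuming shard failures are independent, the success-rate arithmetic here can be sanity-checked with a one-liner:

```rust
// Overall MPP success rate when each of `n` shards independently succeeds
// with probability `p`: the payment completes only if every shard does.
fn mpp_success_rate(p: f64, n: u32) -> f64 {
    p.powi(n as i32)
}
```

At p = 0.85, two shards succeed about 72% of the time while seven shards succeed only about 32% of the time.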

And the computed maximum per-path value contribution accounts for the scorer's liquidity bounds, correct?

@valentinewallace (Contributor Author) commented:

> Nice to see a fix in the works, thanks @valentinewallace!
>
> > Because the path that had just been found was not maxed out
>
> A question that's somewhat related - will this PR also have the effect of generally reducing the # of shards over which MPPs are sent, since found paths are maxed out? We're currently dealing with issues in prod where sending an MPP payment which requires only two of the sender's outbound channels (i.e. two MPP shards) is actually producing an MPP route with seven shards. With a per-path success rate of 85% (our observed probing success rate), the overall payment success rate drops to a measly 0.85^7 ≈ 32%.

Hey @MaxFangX, this PR should reduce the number of shards in some cases but I'm not sure that it fully accounts for 7 shards vs 2... Let me know if you have logs for this case or any way to reproduce 👀

> And the computed maximum per-path value contribution accounts for the scorer's liquidity bounds, correct?

Actually no, the liquidity bounds are used when sorting paths to pick the paths that are the most likely to succeed. Interesting point though, maybe something to consider.

@MaxFangX (Contributor) commented Apr 8, 2025

> Let me know if you have logs for this case or any way to reproduce 👀

I opened an issue to continue the discussion at #3717

> > And the computed maximum per-path value contribution accounts for the scorer's liquidity bounds, correct?
>
> Actually no, the liquidity bounds are used when sorting paths to pick the paths that are the most likely to succeed. Interesting point though, maybe something to consider.

I think for us it's not imperative that the liquidity bounds are incorporated while computing the max per-path value contribution since the limiting factor is usually the channel that the user has with our LSP, rather than a public channel in the network.

Just speculating here, but I wonder if it would be a good strategy to:

  • Compute a number of possible MPP paths to the target
  • Sort the paths according to their score according to liquidity bounds
  • Iteratively increase the value contributed in the highest probability path until it runs out of capacity it can contribute, or is no longer the highest probability path
  • Continue doing this until there is sufficient value contribution across all paths to meet the payment amount.
  • Remove any MPP paths which were not necessary to meet the payment amount.

This would have the effect of simultaneously sending the highest value along the highest probability paths, as well as reducing the number of paths overall.
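
As a rough sketch (hypothetical code, not LDK's router, with the simplification that each path's success probability stays fixed rather than being re-scored as liquidity is consumed), the strategy might look like:

```rust
#[derive(Clone, Debug)]
struct CandidatePath {
    success_prob: f64, // scorer-derived probability this path succeeds
    max_msat: u64,     // capacity this path can contribute
    used_msat: u64,    // value assigned so far
}

/// Assign `amount_msat` across `paths`, always topping up the currently most
/// probable path that still has room, in fixed-size increments. Paths that
/// end up contributing nothing are dropped. Returns None if the combined
/// capacity is insufficient.
fn assign_greedy(
    mut paths: Vec<CandidatePath>, mut amount_msat: u64,
) -> Option<Vec<CandidatePath>> {
    const STEP_MSAT: u64 = 10_000; // iterate in chunks, per the sketch above
    while amount_msat > 0 {
        // Pick the highest-probability path with remaining capacity; if no
        // path has room left, the payment amount cannot be met.
        let best = paths
            .iter_mut()
            .filter(|p| p.used_msat < p.max_msat)
            .max_by(|a, b| a.success_prob.partial_cmp(&b.success_prob).unwrap())?;
        let add = STEP_MSAT.min(best.max_msat - best.used_msat).min(amount_msat);
        best.used_msat += add;
        amount_msat -= add;
    }
    // Remove any MPP shards that were not needed to meet the amount.
    paths.retain(|p| p.used_msat > 0);
    Some(paths)
}
```

For example, with a 50,000 msat path at probability 0.9 and a 100,000 msat path at probability 0.8, a 60,000 msat payment fills the first path completely and puts only the 10,000 msat remainder on the second.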

@ldk-reviews-bot commented
✅ Added second reviewer: @wpaulino

Development

Successfully merging this pull request may close these issues.

Router disables central node within route even if there is only one path
4 participants