
Conversation

@valentinewallace (Contributor) commented Sep 2, 2025

Implements the sender side of sending payments from an often-offline sender to an often-offline recipient.

  • Fail back hold HTLCs if we are an often-offline node
  • Finalize design
  • Finish tests

Partially addresses #2298
Based on #4044, #4045


codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 93.61702% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.28%. Comparing base (560150d) to head (6b2f95c).

Files with missing lines Patch % Lines
lightning/src/ln/async_payments_tests.rs 96.23% 17 Missing and 6 partials ⚠️
lightning/src/ln/channelmanager.rs 80.90% 17 Missing and 4 partials ⚠️
lightning/src/ln/functional_test_utils.rs 78.57% 2 Missing and 1 partial ⚠️
lightning/src/ln/channel.rs 90.00% 2 Missing ⚠️
lightning/src/ln/outbound_payment.rs 96.29% 1 Missing ⚠️
lightning/src/offers/flow.rs 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4046      +/-   ##
==========================================
- Coverage   88.34%   88.28%   -0.07%     
==========================================
  Files         177      176       -1     
  Lines      131472   132075     +603     
  Branches   131472   132075     +603     
==========================================
+ Hits       116153   116600     +447     
- Misses      12662    12897     +235     
+ Partials     2657     2578      -79     
Flag Coverage Δ
fuzzing ?
tests 88.28% <93.61%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown.


@joostjager (Contributor) left a comment

Just a few initial comments.

Awesome to see how you bring everything that has been planned and discussed upfront together in this PR. Great work Val.

// If we expect the HTLCs for this payment to be held at our next-hop counterparty, don't
// retry the payment. In future iterations of this feature, we will send this payment via
// trampoline and the counterparty will retry on our behalf.
if hold_htlcs_at_next_hop {
Contributor

And the case where our trampoline-counterparty didn't want to retry, or has some other kind of problem. I think then we ourselves still want to retry with a different peer?

Although it is probably mostly theoretical. In a client<->LSP setup, there isn't any other peer to retry with.

Contributor Author

And the case where our trampoline-counterparty didn't want to retry, or has some other kind of problem. I think then we ourselves still want to retry with a different peer?

Yeah agreed, I think we can revisit this when we get trampoline support

/// Parameters for the reply path to a [`HeldHtlcAvailable`] onion message.
pub enum HeldHtlcReplyPath {
/// The reply path to the [`HeldHtlcAvailable`] message should terminate at our node.
ToUs {
Contributor

It's too late now, but I think it would have been okay to just not implement the always-online sender to mostly-offline receiver case yet and focus purely on LSP clients.
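
For reference, a rough sketch of the full enum shape implied by this and later snippets in the thread (field types and module paths are assumptions, not the PR's actual definition):

use lightning::blinded_path::message::{BlindedMessagePath, MessageForwardNode};
use lightning::ln::channelmanager::PaymentId;

/// Parameters for the reply path to a [`HeldHtlcAvailable`] onion message.
pub enum HeldHtlcReplyPath {
	/// The reply path to the [`HeldHtlcAvailable`] message should terminate at our node.
	ToUs {
		/// Identifies the pending outbound payment when `ReleaseHeldHtlc` comes back to us.
		payment_id: PaymentId,
		/// Peers available for constructing the blinded reply path to ourselves.
		peers: Vec<MessageForwardNode>,
	},
	/// The reply path should terminate at the always-online counterparty holding the HTLC.
	ToCounterparty {
		/// A reply path created by the counterparty for us to include in the message.
		path: BlindedMessagePath,
	},
}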

let static_invoice = match htlc.source.static_invoice() {
Some(inv) => inv,
None => {
// This is reachable but it means the counterparty is buggy and included a release
Contributor

Log?

@@ -10997,6 +11007,10 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
}
};
self.fail_holding_cell_htlcs(htlcs_to_fail, msg.channel_id, counterparty_node_id);
for (static_invoice, reply_path) in static_invoices {
let res = self.flow.enqueue_held_htlc_available(&static_invoice, HeldHtlcReplyPath::ToCounterparty { path: reply_path });
debug_assert!(res.is_ok(), "enqueue_held_htlc_available can only fail for non-async senders");
Contributor

Log in release mode?
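
One possible shape for that, as a sketch (the log_error! call and self.logger access are assumptions about the surrounding ChannelManager code, and the error type is not shown here):

for (static_invoice, reply_path) in static_invoices {
	let res = self.flow.enqueue_held_htlc_available(
		&static_invoice, HeldHtlcReplyPath::ToCounterparty { path: reply_path },
	);
	if let Err(e) = res {
		// Keep the debug assertion for tests, but also surface the failure in release builds.
		debug_assert!(false, "enqueue_held_htlc_available can only fail for non-async senders");
		log_error!(self.logger, "Failed to enqueue held_htlc_available onion message: {:?}", e);
	}
}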

// If we are configured to be an announced node, we are expected to be always-online and can
// advertise the htlc_hold feature.
if config.enable_htlc_hold {
features.set_htlc_hold_optional();
Contributor

I do wonder whether the feature and advertising it are necessary. Another more LSP-oriented approach would be to not advertise anything and configure clients to assume that their channel peer has the feature available. I understand that we want to be generic, but I'm not sure how likely it is to see any setup other than LSP<->client using this.

Contributor Author

I think there was some spec discussion about how advertising the feature could help attract new channel peers

Contributor

Ok, maybe useful in a future where a node is looking for an LSP...

Comment on lines +5452 to +5509
// If we have an announced channel, we are a node that is expected to be always-online and
// shouldn't be relying on channel counterparties to hold onto our HTLCs for us while
// waiting for the payment recipient to come online.
if channel.context().should_announce() {
any_announced_channels.store(true, Ordering::Relaxed);
}
if any_announced_channels.load(Ordering::Relaxed) {
return false;
}
Contributor

wondering if this check is necessary since above in should_send_sync we are strictly checking that this is a node that only sets private channels.

@valentinewallace (Contributor Author) Sep 9, 2025

Ah, so the UserConfig in ChannelManager is just the default for new channels; it can be overridden on a per-channel basis, so by itself it doesn't ensure we aren't an announced node. It can also change at runtime or on restart.

Comment on lines 5439 to 5467
let reply_path = HeldHtlcReplyPath::ToUs {
	payment_id,
	peers: self.get_peers_for_blinded_path(),
};
let enqueue_held_htlc_available_res =
	self.flow.enqueue_held_htlc_available(invoice, reply_path);
Contributor

Not sure if this is too rare a case, but if we are an often-offline sender with an LSP that doesn't support htlc_hold, we will end up in this path and send a held_htlc_available with a reply path for the release_held_htlc that comes back to us. Feels like this is a case where it will most likely fail, given that it would require both the async sender and receiver to be online at the same time. Wondering if it should instead fail the payment?

Contributor Author

We can't know for sure whether we're an often-offline sender, unfortunately. It's possible we're an always-online private node, like Lexe wallet's clients. IIUC, in the outlined situation if we were often-offline we'd miss the ReleaseHeldHtlc response and eventually time out the payment.

Contributor

We can't know for sure whether we're an often-offline sender

Yeah not for sure but I thought setting hold_outbound_htlcs_at_next_hop would be an indication that a node intends to be often-offline?

It's possible we're an always-online private node, like Lexe wallet's clients

in a setup like this, is the intent to also htlc_hold it at the next hop even though the node is always online?

IIUC, in the outlined situation if we were often-offline we'd miss the ReleaseHeldHtlc response and eventually time out the payment.

yeah, that's how I was seeing it. I guess letting it reach the timeout to fail the payment is fine.

Contributor Author

Yeah not for sure but I thought setting hold_outbound_htlcs_at_next_hop would be an indication that a node intends to be often-offline?

hold_outbound_htlcs_at_next_hop = true is currently the default setting, so because most people generally don't change defaults, we especially can't rely on it to be accurate, if that makes sense.

in a setup like this, is the intent to also htlc_hold it at the next hop even though the node is always online?

Yup, the idea is that the config option exists so a node like that can proactively turn off the option if they want to.
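
For such an always-online private node, the opt-out would presumably look something like this (assuming hold_outbound_htlcs_at_next_hop ends up as a plain UserConfig field, which this thread suggests but does not confirm):

use lightning::util::config::UserConfig;

let mut config = UserConfig::default();
// The default discussed above is `true`; an always-online private node can opt out.
config.hold_outbound_htlcs_at_next_hop = false;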

@@ -1131,14 +1151,19 @@ where
/// [`ReleaseHeldHtlc`]: crate::onion_message::async_payments::ReleaseHeldHtlc
/// [`supports_onion_messages`]: crate::types::features::Features::supports_onion_messages
pub fn enqueue_held_htlc_available(
&self, invoice: &StaticInvoice, payment_id: PaymentId, peers: Vec<MessageForwardNode>,
&self, invoice: &StaticInvoice, reply_path_params: HeldHtlcReplyPath,
Contributor

nit: needs doc update - no longer takes peers param directly

/// senders.
///
/// Errors if the peer does not support onion messages or we don't have a channel with them.
pub fn peer_connected(
Contributor

Rebase problem that made this come back?

return Err(());
}

let any_announced_channels = AtomicBool::new(false);
Contributor

Perhaps this is better using Cell?
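
For illustration, the Cell-based variant of the same check might look like this (Cell's interior mutability is enough here as long as the closure only needs shared access):

use core::cell::Cell;

let any_announced_channels = Cell::new(false);
// ...inside the per-channel closure:
if channel.context().should_announce() {
	any_announced_channels.set(true);
}
if any_announced_channels.get() {
	return false;
}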

fn hold_htlc_channels(&self) -> Result<Vec<ChannelDetails>, ()> {
let should_send_async = {
let cfg = self.config.read().unwrap();
cfg.hold_outbound_htlcs_at_next_hop
Contributor

Isn't just hold_outbound_htlcs_at_next_hop enough as a condition?

@@ -14844,7 +14879,9 @@ where
fn handle_release_held_htlc(&self, _message: ReleaseHeldHtlc, context: AsyncPaymentsContext) {
match context {
AsyncPaymentsContext::OutboundPayment { payment_id } => {
if let Err(e) = self.send_payment_for_static_invoice(payment_id) {
if let Err(e) =
self.send_payment_for_static_invoice(payment_id, self.list_usable_channels())
Contributor

Refactor could be extracted to a separate commit?

let (invreq, hold_htlcs_at_next_hop) =
if let PendingOutboundPayment::StaticInvoiceReceived {
invoice_request, hold_htlcs_at_next_hop, ..
} = entry.remove() {
Contributor

Not sure if it's better, but you could avoid the double pattern matching by always removing in the match (entry.remove() instead of entry.get()), and re-inserting rather than mutating for InvoiceReceived.
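
A simplified standalone sketch of that pattern, with hypothetical types standing in for the real PendingOutboundPayment machinery:

use std::collections::HashMap;

enum Payment {
	StaticInvoiceReceived { hold_htlcs_at_next_hop: bool },
	InvoiceReceived { retry_count: u32 },
	Fulfilled,
}

fn on_invoice_event(payments: &mut HashMap<u64, Payment>, id: u64) {
	// Remove once up front, then re-insert the variants that should stay,
	// instead of pattern matching the same entry twice.
	match payments.remove(&id) {
		Some(Payment::StaticInvoiceReceived { hold_htlcs_at_next_hop }) => {
			let _ = hold_htlcs_at_next_hop; // proceed with sending; the entry stays removed
		},
		Some(Payment::InvoiceReceived { retry_count }) => {
			// Mutate-and-reinsert rather than mutating in place.
			payments.insert(id, Payment::InvoiceReceived { retry_count: retry_count + 1 });
		},
		Some(other) => { payments.insert(id, other); },
		None => {},
	}
}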


@@ -2647,11 +2647,12 @@ pub fn get_route(send_node: &Node, route_params: &RouteParameters) -> Result<Rou
let scorer = TestScorer::new();
let keys_manager = TestKeysInterface::new(&[0u8; 32], Network::Testnet);
let random_seed_bytes = keys_manager.get_secure_random_bytes();
let first_hops = send_node.node.list_usable_channels();
Contributor

Would this be discoverable through #4045 (comment) ?

Contributor Author

Discussed with @TheBlueMatt offline, this is not something we should be concerned about apparently: "An ldk user can always create a lock inversion by calling ldk things in the same expression 🤷‍♂️ nothing we can really do to stop them"

Contributor

Yes okay, but I meant just to auto-discover the issue in this test.

// Create this network topology:
// LSP1
// / | \
// sender | recipient
Contributor

Pretty advanced scenario. It seems more likely that a multi-part payment will be sent through one LSP, with that LSP holding both HTLCs? Or maybe not likely at all with splicing.

payment_hash,
PaymentFailureReason::RetriesExhausted,
);
}
Contributor

In general, there seems to be quite a bit of duplication between these tests. Elsewhere it is quite common to set up a simple test framework to extract code and make the tests themselves more readable. But in my experience, Rust doesn't always make it easy to do these things, pushing invisibly towards copy/paste.

Contributor Author

Definitely agreed. I can look into it... I'll stop just short of adding any macros :p (which I think could probably de-dup quite a bit)

@joostjager (Contributor) Sep 11, 2025

Maybe that's Rust's mitigation to bypass borrow problems?

Useful to filter for channel peers that support a specific feature, in this
case the hold_htlc feature, in upcoming commits.

As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the field
in the HTLCSource::OutboundRoute enum variant.

See-also <lightning/bolts#989>

As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the
parameter in the pay_route method.

See-also <lightning/bolts#989>

As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the field
in the outbound payment variant for static invoices.

We also add a helper method to gather channels for nodes that advertise support
for the hold_htlc feature, which will be used in the next commit.

See-also <lightning/bolts#989>
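
A sketch of what gathering those channels could look like from the caller's side (supports_htlc_hold is an assumed accessor name for the new feature bit, not necessarily what this PR adds):

// Keep only usable channels whose counterparty advertises support for holding HTLCs.
let hold_htlc_channels: Vec<ChannelDetails> = channel_manager
	.list_usable_channels()
	.into_iter()
	.filter(|chan| chan.counterparty.features.supports_htlc_hold())
	.collect();
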
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

The prior commits laid groundwork to finally set the flag here in this commit.

See-also <lightning/bolts#989>

As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

Here we lay some groundwork for using a counterparty-created reply path when
sending held_htlc_available as an async sender in the next commit.

As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

After laying groundwork over some past commits, here we as an async sender send
held_htlc_available messages using reply paths created by our always-online
channel counterparty.

Now that we support the feature of sending payments as an often-offline sender
to an often-offline recipient, including the sender's LSP-side, we can start
conditionally advertising the feature bit to other nodes on the network.

Previously we were taking the NetworkGraph::{channels,nodes} locks before
ChannelManager::per_peer_state's peer_state locks in some tests, which violated
a lock order requirement we have in ChannelManager to take netgraph locks
*after* peer_state locks. See OffersMessageFlow::path_for_release_htlc which is
called while a peer_state lock is held and takes netgraph locks while creating
blinded paths.

Add testing for sending payments from an often-offline sender to an
often-offline recipient.

@valentinewallace (Contributor Author)

Rebased since #4045, will address feedback/CI issues when I next push.
