-
Notifications
You must be signed in to change notification settings - Fork 2.2k
switchrpc: add idempotent external HTLC dispatch via SendOnion #10473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: elle-base-branch-payment-service
Are you sure you want to change the base?
switchrpc: add idempotent external HTLC dispatch via SendOnion #10473
Conversation
Summary of ChangesHello @calvinrzachman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the reliability of the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an important idempotency mechanism to the SendOnion RPC, which is a significant reliability improvement for clients. The implementation follows a clean "Register -> Validate -> Act -> Roll-back" pattern, which is well-documented and appears robust. The addition of a comprehensive integration test for concurrent requests provides strong confidence in the correctness of the solution. My feedback is focused on improving the new test code for better maintainability and robustness.
bitromortac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice 🎉, concept ACK. It looks basically good to go.
|
|
||
| // SendOnion handles the incoming request to send a payment using a | ||
| // preconstructed onion blob provided by the caller. | ||
| // SendOnion provides an idempotent API for dispatching a pre-formed onion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great docs! We should put some of that info into switchrpc.proto as well (only the info that concerns the api user, they don't need to know implementation details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docs in switchrpc.proto. Let me know if it looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are still some deviations, see other comments
307e665 to
3e0967d
Compare
4b1ac93 to
8d3cb70
Compare
This permits the Switch RPC server to manage the state of attempts.
8d3cb70 to
ad45a62
Compare
bitromortac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @calvinrzachman. I think we should still clarify how we transport errors. IMO we should send back gRPC error codes when hitting errors within the scope of SendOnion and return one_of structured CleartextFailure or Success, when it comes from SendHTLC.
|
|
||
| // SendOnion handles the incoming request to send a payment using a | ||
| // preconstructed onion blob provided by the caller. | ||
| // SendOnion provides an idempotent API for dispatching a pre-formed onion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are still some deviations, see other comments
ad45a62 to
ade744a
Compare
Update SendOnion rpc implementation to make use of our new Switch attempt store primitives for idempotence. This will allow SendOnion to reject processing the same htlc attempt id more than once - providing strong assurances to callers that the endpoint is safe to retry when they encounter grpc client and network related failures (eg: timeouts, service unavaiable errors, etc.) Additionally, to guarantee a definitive outcome and prevent orphaned attempts, the handler synchronously transitions an initialized (PENDING) attempt to FAILED if subsequent validation or dispatch fails. This prevents indefinite hangs for TrackOnion queries.
Clarify useage of API for rpc clients.
We can now assert that making multiple calls to SendOnion for the same attempt ID is prevented.
ade744a to
23dbe7f
Compare
bitromortac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work 🎉
| been successfully processed. A retrying client should interpret this | ||
| as a success and proceed to tracking the payment's result. | ||
| 3. AMBIGUOUS FAILURE (gRPC code Unavailable or DeadlineExceeded): An |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should add that context cancelled is also such an error and that this should be the fallback if an error can't be classified in terms of the other cases (we could make this the last point), since it's always safe to retry this RPC.
Change Description
This PR builds on the underlying storage primitives provided by #10049 to make the
SendOnionRPC fully idempotent, allowing clients to safely retry requests after network failures.To achieve this, the
SendOnionhandler now follows a "Register -> Validate -> Act -> Roll-back" pattern:Register: The handler's first action is to call
InitAttempt. This "write ahead" style approach creates a durable record of intent to dispatch that serves as the idempotence anchor, ensuring any retries for the sameattempt_idare correctly identified as duplicates. NOTE: This is analogous to theControlTower.InitPaymentmethod used by routerrpc which ensures the router doesn't launch two parallel payment lifecycle managers for the same payment hash.Validate & Act: Only after the intent to dispatched is durably written, does the handler proceed with request validation and the actual HTLC dispatch via the familiar Switch's
SendHTLCmethod.Transition on Failure (rollback): The process of registering and then acting is not naturally atomic. To close this "atomicity gap," we perform cleanup synchronously as part of error handling to prevent attempts from being "orphaned" in an initialized but not actually dispatched state. If any validation or dispatch error occurs after
InitAttempt, we transition the attempt's state fromPENDINGtoFAILEDby callingFailPendingAttempt. This ensures the operation is atomic from the client's perspective and helps prevent indefinite hangs on subsequentTrackOnioncalls.Client Error Contract
This new implementation establishes a error contract for clients, who must handle four distinct categories of outcomes:
TrackOnion.ErrorCode_DUPLICATE_HTLCin the response body.attempt_idwas already successfully processed.codes.Unavailable.InitAttemptwrite responsible for detecting duplicates.attempt_idrisks a duplicate payment.codes.InvalidArgument,codes.FailedPrecondition, validation failures).attempt_id.Overall, the updated implementation provides several critical benefits for
SendOnionclients:InitAttemptgate ensures that the server will never process the sameattempt_idmore than once. If a request for anattempt_idis received while that ID is already in-flight or completed, it will be rejected as a duplicate. This provides strong assurance against accidental double-spending due to server-side logic failures.SendOnionrequest was processed after network failures (e.g., timeouts, disconnections, server unavailability). They can always resolve the ambiguity by retrying the request until definitive acknowledgement is received from the switchrpc server.InitAttemptgate, its outcome is effectively "frozen" from the perspective of a retrying client. By placing the durable write before any dynamic validation (e.g., checking peer connectivity or liquidity), the API guarantees that a requestR1accepted at timet1cannot have its retryR2rejected for a new, transient reason at timet2. The retry will always receive aDUPLICATE_HTLCacknowledgment, which is important for preventing client-side misinterpretations that can lead to leaked attempts.Steps to Test
The integration test suite (itest) has been updated to verify the new idempotent behavior. We now explicitly assert that concurrent or repeated calls to
SendOnionwith the sameattempt_idare correctly rejected as duplicates, both for the entire life-cycle of the attempt and during a client retry-storm scenario.make itest icase=send_onion_concurrencymake itest icase=send_oniongo test -v -timeout 30s -tags switchrpc github.com/lightningnetwork/lnd/lnrpc/switchrpc