Description
This PR is a proof-of-concept optimization for path payments that significantly decreases the duration of exceptional "slow ledgers" and modestly improves average ledger close times.
Observation
Sometimes, stellar-core takes a very long time to apply transactions, on the scale of 5-10 seconds. Initially, I analyzed 2 such slow ledgers, 53514768 and 53514801. These ledgers take 3.01 seconds and 2.05 seconds to apply on my laptop, and even longer on validator hardware.
Like most transaction sets, these ledgers contained a large number of arbitrage path payments, most of which were very similar or identical and most of which failed. I initially suspected an improper offer cache causing extra disk IO, but the offer table caches work as expected. Upon profiling, I found that these ledgers spent over 1 second just constructing and committing non-root LedgerTxn objects. Despite not going to disk, the churn of constantly creating, committing, and eventually rolling back LedgerTxn objects for repetitive path payments was very expensive.
Solution
The issue has to do with our current "exit early" strategy. To avoid recomputing the same failed offer pairs, we use the `worstBestOfferCache`. The problem with this cache is that it reasons about individual asset pairs instead of the payment path as a whole. For path payments, many asset pairs along the path often succeed, and an asset pair very deep in the order book eventually fails. The `worstBestOfferCache` is useless for preventing us from traversing this failed path, since we do not hit the cached asset pair until we have already done most of the work.

The solution is dynamic programming memoization at the asset path level. For each failed path payment, I cache the path hash together with the source sell amount and destination amount that failed. Before walking the path of another payment op, we check that op's path hash against the cache. The cache is conservative such that we fail early iff:

- the cached failed attempt was willing to give away at least as much of the source asset as the current op, and
- the cached failed attempt was trying to receive no more of the destination asset than the current op.

Intuitively, the reasoning is "if a previous failed transaction gave away more and received less than me, I must also fail."
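To make the idea concrete, below is a minimal, self-contained sketch of such a path-level failure cache. This is not the actual implementation: the names (`FailedPathCache`, `recordFailure`, `mustFail`) are hypothetical, `Asset` is a placeholder for the XDR asset type, and the real cache keys on a path hash rather than the path itself.

```cpp
// Hypothetical sketch of a path-level failure cache (not stellar-core code).
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Asset = std::string;          // placeholder for the XDR Asset type
using PathKey = std::vector<Asset>; // source asset, intermediates, destination asset

struct FailedAttempt
{
    int64_t srcSellAmount; // source amount the failed op was willing to give away
    int64_t destAmount;    // destination amount the failed op tried to receive
};

class FailedPathCache
{
    std::map<PathKey, FailedAttempt> mFailed;

  public:
    // Record a path payment that walked the whole path and still failed.
    void
    recordFailure(PathKey const& path, int64_t srcSellAmount, int64_t destAmount)
    {
        mFailed[path] = FailedAttempt{srcSellAmount, destAmount};
    }

    // Conservative early exit: if a cached op offered at least as much of the
    // source asset and demanded no more of the destination asset than this op,
    // and still failed, this op must fail too (assuming unchanged offers).
    bool
    mustFail(PathKey const& path, int64_t srcSellAmount, int64_t destAmount) const
    {
        auto it = mFailed.find(path);
        if (it == mFailed.end())
        {
            return false;
        }
        return it->second.srcSellAmount >= srcSellAmount &&
               it->second.destAmount <= destAmount;
    }

    // Invalidation helpers, defined in the sketch after the invalidation
    // discussion below.
    void invalidateHop(Asset const& from, Asset const& to);
    void onPathPaymentSuccess(PathKey const& path);
};
```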
These conditions assume that no offers along the path have been modified since the failed path payment was cached. To achieve this, we invalidate the cache as follows:

Reasoning for point 3: consider two path payments with the same path. The first path payment fails, so we cache it. The second path payment succeeds. Because the two payments target the same path, the success of op 2 has made the market strictly more competitive for path payments over the same path. More formally, for some path N, if a path payment over path N is executed, no op with path N that previously failed could now succeed, so we do not need to invalidate the cache. However, we do need to invalidate the counterparty direction: if the path assetA -> assetB -> assetC executes, we must invalidate the cache for the counterparty pairs {assetC -> assetB} and {assetB -> assetA}.
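Continuing the hypothetical `FailedPathCache` sketch above, the invalidation rules could look roughly like this; the exact hooks into offer creation, modification, and deletion in stellar-core are not shown.

```cpp
// Continuation of the hypothetical FailedPathCache sketch above.

// Erase every cached failed path that traverses the directed hop from -> to.
// Intended to be called whenever an offer on that side of the book is
// created, modified, or removed.
void
FailedPathCache::invalidateHop(Asset const& from, Asset const& to)
{
    for (auto it = mFailed.begin(); it != mFailed.end();)
    {
        auto const& path = it->first;
        bool touchesHop = false;
        for (std::size_t i = 0; i + 1 < path.size(); ++i)
        {
            if (path[i] == from && path[i + 1] == to)
            {
                touchesHop = true;
                break;
            }
        }
        if (touchesHop)
        {
            it = mFailed.erase(it);
        }
        else
        {
            ++it;
        }
    }
}

// A successful payment over `path` makes that direction strictly more
// competitive (no previously failed op on the same path could now succeed),
// so cached failures for the same direction stay valid; only the counterparty
// (reverse) hops need to be invalidated. E.g. if assetA -> assetB -> assetC
// executes, invalidate assetC -> assetB and assetB -> assetA.
void
FailedPathCache::onPathPaymentSuccess(PathKey const& path)
{
    for (std::size_t i = 0; i + 1 < path.size(); ++i)
    {
        invalidateHop(path[i + 1], path[i]);
    }
}
```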
Results
Disclosure: I performed all these tests on my laptop. Due to the memory consumption of tracy, I could only record in 500-ledger chunks. These are rough, preliminary estimates; I need to follow up with more ranges and use dev boxes to test. Disclosure over, you've been warned.
Replaying range 53514477 -> 53514977
Most importantly, "slow ledger" spikes were significantly reduced. In particular:
On more recent ledgers, I get the following results. From 55633887 -> 55634887 (1000 ledgers replayed from today)
TLDR
Modest improvement in the average case. Significant improvement in mitigating path payments that cross many offers. No apparent downside, except that this optimization is technically a protocol change (error codes returned may differ from before) and the general risk of adding additional complexity to the order book (though in my opinion this cache is fairly straightforward, all things considered).
Checklist
clang-format v8.0.0 (via `make format` or the Visual Studio extension)