
Async background persistence #3905


Merged
merged 3 commits into lightningdevkit:main from joostjager:async-persister on Jul 23, 2025

Conversation

joostjager
Contributor

@joostjager joostjager commented Jul 2, 2025

Stripped-down version of #3778. It allows background persistence to be async, but channel monitor persistence remains sync. This means that for the time being, users wanting async background persistence would be required to implement both the sync and the async KVStore traits. This model is available through process_events_full_async.

process_events_async still takes a synchronous kv store to remain backwards compatible.

Usage in ldk-node: lightningdevkit/ldk-node@main...joostjager:ldk-node:upgrade-to-async-kvstore
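
To make the split concrete, here is a rough sketch of what a user-provided store could look like under this model. The trait and method names below are simplified placeholders rather than the exact LDK signatures; see the PR diff for the real traits.

use std::future::Future;
use std::io;
use std::pin::Pin;

// Simplified stand-ins for the two store flavors described above; the real LDK
// traits have more methods (read, remove, list) and namespace validation.
pub trait KVStoreSync {
    fn write(
        &self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
    ) -> Result<(), io::Error>;
}

pub trait KVStore {
    fn write(
        &self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>>;
}

// For now, a store used with process_events_full_async would implement both:
// the async flavor for background persistence and the sync flavor for channel
// monitor persistence, which stays synchronous in this PR.
struct MyStore;

impl KVStoreSync for MyStore {
    fn write(
        &self, _primary: &str, _secondary: &str, _key: &str, _buf: &[u8],
    ) -> Result<(), io::Error> {
        // e.g. a blocking filesystem write
        Ok(())
    }
}

impl KVStore for MyStore {
    fn write(
        &self, _primary: &str, _secondary: &str, _key: &str, buf: &[u8],
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
        // Copy the bytes so the future can satisfy the 'static bound, then hand
        // them to whatever async runtime or write queue backs the store.
        let owned = buf.to_vec();
        Box::pin(async move {
            let _bytes = owned;
            Ok::<(), io::Error>(())
        })
    }
}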

@ldk-reviews-bot

ldk-reviews-bot commented Jul 2, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager joostjager force-pushed the async-persister branch 7 times, most recently from 3fb7d6b to 1847e8d on July 3, 2025 09:57
@joostjager joostjager self-assigned this Jul 3, 2025
@joostjager joostjager force-pushed the async-persister branch 2 times, most recently from 1f59bbe to 723a5a6 on July 3, 2025 11:52
@joostjager joostjager mentioned this pull request May 12, 2025
@TheBlueMatt TheBlueMatt linked an issue Jul 7, 2025 that may be closed by this pull request
@joostjager joostjager force-pushed the async-persister branch 10 times, most recently from bc9c29a to 90ab1ba on July 9, 2025 09:52
@joostjager joostjager marked this pull request as ready for review July 9, 2025 09:52
@joostjager joostjager requested a review from tnull July 9, 2025 09:52
Comment on lines 631 to 663
fn persist_state<'a>(
    &self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    let encoded = &sweeper_state.encode();

    self.kv_store.write(
        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_KEY,
        encoded,
    )
Contributor


The encoded variable is captured by reference in the returned future, but it's a local variable that will be dropped when the function returns. This creates a potential use-after-free issue. Consider moving ownership of encoded into the future instead:

fn persist_state<'a>(
    &self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    let encoded = sweeper_state.encode();

    self.kv_store.write(
        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_KEY,
        &encoded,
    )
}

This ensures the data remains valid for the lifetime of the future.


Spotted by Diamond


Contributor Author


Is this real?

Contributor


I don't think so, as the compiler would likely optimize that away, given that encoded will be an owned value (the Vec returned by encode()). Still, the change it suggests looks cleaner.

In general it will be super confusing that we encode at the time of creating the future, but only actually persist once we've dropped the lock. From now on we'll need to be super cautious about the side effects of interleaving persist calls.

Contributor Author


The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

Can you show a specific scenario where we have to be super cautious even if we have that queue?
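
For illustration, a queue-backed async store along these lines might look roughly like this (hypothetical names, not this PR's code):

use std::collections::VecDeque;
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::{Arc, Mutex};

// The payload is captured and enqueued synchronously when the future is created,
// so writes stay ordered by call order even if the returned futures are awaited
// later, or out of order.
struct QueuedWriter {
    // Shared with a background task (not shown) that drains entries in FIFO order.
    queue: Arc<Mutex<VecDeque<(String, Vec<u8>)>>>,
}

impl QueuedWriter {
    fn write(
        &self, key: &str, buf: &[u8],
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
        // Enqueue immediately, while the caller may still be holding its state lock.
        self.queue.lock().unwrap().push_back((key.to_string(), buf.to_vec()));
        Box::pin(async move {
            // A real store would resolve this future once the background task has
            // flushed this particular entry to disk.
            Ok::<(), io::Error>(())
        })
    }
}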

Contributor Author

@joostjager joostjager Jul 14, 2025


Moved &

Contributor


The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

If that is the assumption we start relying on in this PR, we should probably also start documenting it on KVStore in this PR already.

Contributor Author


Added this requirement to the async KVStore trait doc

@joostjager joostjager requested review from tnull and TheBlueMatt July 18, 2025 09:52
Collaborator

@TheBlueMatt TheBlueMatt left a comment


Almost LGTM, just one real comment and a doc nit.

}

output_info.status.broadcast(cur_hash, cur_height, spending_tx.clone());
self.broadcaster.broadcast_transactions(&[&spending_tx]);
Collaborator


Hmm, it used to be the case that we'd first persist, wait for that to finish, then broadcast. I don't think it's critical, but it does seem like we should retain that behavior.

Contributor Author


Changed to first await the persist future, and then broadcast.

@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

TheBlueMatt
TheBlueMatt previously approved these changes Jul 21, 2025
Collaborator

@TheBlueMatt TheBlueMatt left a comment


One question about the requirements we want, but figuring out the answer doesn't have to block landing this PR as-is.

) -> Result<Vec<u8>, io::Error>;
/// Persists the given data under the given `key`.
) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;
/// Persists the given data under the given `key`. Note that the order of multiple writes calls needs to be retained
Collaborator


Oh actually, do we want this to be the restriction, or do we want "the order of multiple writes to the same key needs to be retained"? I imagine the second; we don't currently have a need inside LDK to require a strict total order, and it could definitely substantially slow down async persist. cc @tnull

Contributor Author


One related thing I've been thinking about is whether it is okay to skip a stale write. If two consecutive same-key writes are executed out of order, is it fine to simply drop the first write? Or could it be that we do need to read that first written data at some point?

Collaborator


I don't see how it could not be okay - writes overwrite, so if there are two writes to the same key we're required to eventually end up with the second one on disk. The only question, I guess, is whether we're allowed to complete the second future first, then the first future later, and still end up with the second future's write. I think that's something we should accept (and document?), but that's the only caller-observable question, I think.
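
For illustration only, one way a store could implement these semantics is to keep just the latest pending payload per key, so a superseded (stale) write never needs to hit disk regardless of the order in which the futures complete (hypothetical sketch, not this PR's code):

use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Default)]
struct Inner {
    next_seq: u64,
    // key -> (sequence number of the staging call, payload)
    latest: HashMap<String, (u64, Vec<u8>)>,
}

#[derive(Default)]
struct PendingWrites {
    inner: Mutex<Inner>,
}

impl PendingWrites {
    // Called synchronously when a write is issued; a newer write for the same key
    // simply replaces the pending payload.
    fn stage(&self, key: &str, buf: Vec<u8>) -> u64 {
        let mut inner = self.inner.lock().unwrap();
        inner.next_seq += 1;
        let seq = inner.next_seq;
        inner.latest.insert(key.to_string(), (seq, buf));
        seq
    }

    // Called when a write future is driven to completion; only a payload that is
    // still the latest one staged for its key actually needs to be written out.
    fn take_if_latest(&self, key: &str, seq: u64) -> Option<Vec<u8>> {
        let mut inner = self.inner.lock().unwrap();
        let is_latest = inner.latest.get(key).map_or(false, |(s, _)| *s == seq);
        if is_latest {
            inner.latest.remove(key).map(|(_, buf)| buf)
        } else {
            // A newer same-key write superseded this one; drop the stale payload.
            None
        }
    }
}

With this shape, completing the two futures in either order still leaves the second payload as the one that reaches disk.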

Contributor Author


I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.

Contributor

@tnull tnull Jul 21, 2025


I was thinking of a write -> read -> write pattern,

Hmm, that's indeed a good question, i.e., whether we'd also need to deal with interleaving reads; otherwise we may actually end up reading data that was written later?

but I believe we already established that that isn't happening in LDK.

I'm not sure where we established that, but for LDK that definitely won't be the case for much longer, as we'll want to migrate to stores that are not completely held in memory, and we'll read data on demand on cache misses.

Collaborator

@TheBlueMatt TheBlueMatt Jul 21, 2025


I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.
Hmm, that's indeed a good question, i.e., whether we'd need to deal with interleaving reads also, otherwise we may end up reading data that was written later, actually?

I don't see an issue here - after the storer calls write, the data may be in place (i.e., returned by a call to read), and after write's future completes it will be in place. That is implicit in the API, and is in fact required by any similar-looking API - you cannot know what is happening after you start the write call, so relying on anything other than the above would obviously be racy. The same holds for multiple calls to write to the same key.

Contributor

@tnull tnull left a comment


Changes look mostly good to me, some minor comments.

@joostjager joostjager force-pushed the async-persister branch 3 times, most recently from be6eaa8 to 2a00b9b on July 22, 2025 15:29
@joostjager joostjager requested a review from tnull July 22, 2025 15:31
@joostjager joostjager removed the request for review from tnull July 23, 2025 08:27
In preparation for the addition of an async KVStore, we here remove the
Persister pseudo-wrapper. The wrapper is thin, would need to be
duplicated for async, and KVStore isn't fully abstracted anyway anymore
because the sweeper takes it directly.

@joostjager joostjager requested a review from tnull July 23, 2025 08:52
Contributor

@tnull tnull left a comment


Fixups look mostly good to me, two nits/comments. Feel free to squash from my side.

let (fut, res) = {
    let mut state_lock = self.sweeper_state.lock().unwrap();

    let (res, persist_if_dirty) = callback(&mut state_lock)?;
Contributor


nit: Maybe calling this skip_persist might be more intuitive? I also wonder if adding another update_state_skipping_persist method would be cleaner than having the secondary return value on the callback that is only used in one place. But no hard blocker.

Contributor Author

@joostjager joostjager Jul 23, 2025


skip_persist is indeed nicer, made the change.

Another method update_state_skipping_persist doesn't work, because the callback may or may not want to skip the persist. It would also mean duplication, or an extra abstraction...
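
For reference, the pattern under discussion looks roughly like this, with the names simplified and the actual persistence elided (hypothetical sketch, not the code in this PR):

use std::io;
use std::sync::Mutex;

struct SweeperState {
    dirty: bool,
}

struct Sweeper {
    sweeper_state: Mutex<SweeperState>,
}

impl Sweeper {
    fn update_state<R>(
        &self, callback: impl FnOnce(&mut SweeperState) -> Result<(R, bool), io::Error>,
    ) -> Result<R, io::Error> {
        let mut state_lock = self.sweeper_state.lock().unwrap();
        // The callback itself decides whether this particular update may skip
        // persistence, which is why a separate "skipping" method would not work.
        let (res, skip_persist) = callback(&mut state_lock)?;
        if !skip_persist {
            // In the real code, this is where the persist future is created while
            // the lock is still held, to be awaited after the lock is dropped.
            state_lock.dirty = false;
        }
        Ok(res)
    }
}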


@joostjager joostjager requested a review from tnull July 23, 2025 10:41
@TheBlueMatt TheBlueMatt merged commit ebe571a into lightningdevkit:main Jul 23, 2025
27 of 28 checks passed
Development

Successfully merging this pull request may close these issues.

Async KV Store Persister
4 participants