ln: reduce size_of Event from 1680 B -> 576 B #3723


Open · wants to merge 2 commits into main

Conversation

phlip9
Contributor

@phlip9 phlip9 commented Apr 9, 2025

It looks like the Event enum has gotten pretty large @ 1680 B, which forces even small variants like Event::OnionMessagePeerConnected { peer_node_id: PublicKey } to waste a bunch of memory. It also blows up the size of our handler Future(s), since we move Events into them.

Fortunately we can clean up some of the low-hanging fruit pretty easily. Here are two diffs that reduce size_of::<Event>() from 1680 B -> 576 B.

  1. Box InvoiceContents in Bolt12Invoice and StaticInvoice. This shrinks Event from 1680 B -> 1072 B.
  • InvoiceContents is private, so this shouldn't be a breaking change.
  2. Box AnchorDescriptor in BumpTransactionEvent. This shrinks Event from 1072 B -> 576 B.
  • AnchorDescriptor is technically public and boxing it is a semver-breaking change, but the struct's pretty deep in there... Guess I'll leave that to your discretion.

We could go even further to 320 B by boxing PaymentPurpose, but that feels like a much more invasive / semver breaking change.
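For illustration, here's a minimal toy sketch (made-up types, not LDK's actual Event) of why boxing the large payload helps: a Rust enum is as big as its largest variant, so one oversized variant taxes every value of the type.

use std::mem::size_of;

// Stand-in for a large payload like InvoiceContents or AnchorDescriptor.
#[allow(dead_code)]
struct BigPayload([u8; 1024]);

#[allow(dead_code)]
enum Unboxed {
    Small([u8; 33]), // e.g. a serialized public key
    Large(BigPayload),
}

#[allow(dead_code)]
enum Boxed {
    Small([u8; 33]),
    Large(Box<BigPayload>), // payload moved behind a pointer
}

fn main() {
    // Every Unboxed value pays for the largest variant...
    assert!(size_of::<Unboxed>() > 1024);
    // ...while Boxed only stores a pointer for the large case.
    assert!(size_of::<Boxed>() < 64);
    println!("{} B vs {} B", size_of::<Unboxed>(), size_of::<Boxed>());
}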

phlip9 added 2 commits April 8, 2025 20:52
Reduces `mem::size_of::<Event>()` from 1680 B -> 1072 B
Reduces `mem::size_of::<Event>()` from 1072 B -> 576 B
@ldk-reviews-bot

ldk-reviews-bot commented Apr 9, 2025

I've assigned @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from wpaulino April 9, 2025 04:18

codecov bot commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.09%. Comparing base (7b45811) to head (ddf2296).

Files with missing lines          Patch %   Lines
lightning/src/offers/invoice.rs   93.75%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3723      +/-   ##
==========================================
- Coverage   89.10%   89.09%   -0.02%     
==========================================
  Files         156      156              
  Lines      123431   123431              
  Branches   123431   123431              
==========================================
- Hits       109985   109967      -18     
- Misses      10760    10774      +14     
- Partials     2686     2690       +4     


Contributor

@tnull tnull left a comment


To be honest, I'm not sure if it's preferable to trade the size against an increased risk of heap fragmentation? Did you consider that, and how did you decide a smaller Event type was worth it?

Contributor

@vincenzopalazzo vincenzopalazzo left a comment


I’m not sure I understand the reason for this change. Did you encounter any issues, or is it just for size reduction?

Contributor

@wpaulino wpaulino left a comment


I can see how the large size can be a constraint for environments with smaller stack sizes/limited memory. Events are tracked in LDK within a Vec, so the memory of each one already lives on the heap. The issue comes from moving each event into the event handler, so the 1680B will be put on the stack every time regardless of the event variant size. Perhaps we should modify the event handler to take events by reference instead?
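(Purely as an illustration of the shape of that change, with made-up trait names rather than LDK's real definitions:)

// Made-up trait names, not LDK's actual definitions; just to show the
// difference in what crosses the call boundary.
#[allow(dead_code)]
enum Event { /* ...large variants... */ }

// By value: the full ~1.6 KB enum is moved onto the stack per call.
trait EventHandlerByValue {
    fn handle_event(&self, event: Event);
}

// By reference: only a pointer crosses the boundary; the handler clones
// whatever pieces it actually needs to keep.
trait EventHandlerByRef {
    fn handle_event(&self, event: &Event);
}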

@tnull
Contributor

tnull commented Apr 9, 2025

I can see how the large size can be a constraint for environments with smaller stack sizes/limited memory.

Sure, not denying it can become an issue in certain circumstances.

Events are tracked in LDK within a Vec, so the memory of each one already lives on the heap.

The issue is not that they live on the heap, the issue of heap fragmentation is that you have a bazillion tiny allocations that make huge parts of memory unusable after some time, as it becomes harder and harder to find 'fitting gaps' for larger allocations.

The issue comes from moving each event into the event handler, so the 1680B will be put on the stack every time regardless of the event variant size. Perhaps we should modify the event handler to take events by reference instead?

I guess, although it would be a pity to give up on move semantics? FWIW, it might be worth exploring if the event queue could be an Arc<Mutex<Vec<Event>>> to avoid the reallocations of a Vec per invocation?
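(One way that could avoid the per-invocation allocation, sketched with made-up names rather than the actual LDK queue code, would be to keep a long-lived buffer and swap rather than reallocate:)

use std::sync::{Arc, Mutex};

struct Event; // stand-in for the real, much larger enum

// Keep the pending queue alive across invocations and swap it with a
// reusable scratch buffer, so neither Vec is reallocated from scratch
// on every call.
struct EventQueue {
    pending: Arc<Mutex<Vec<Event>>>,
    scratch: Mutex<Vec<Event>>,
}

impl EventQueue {
    fn process_pending_events<F: FnMut(Event)>(&self, mut handler: F) {
        let mut scratch = self.scratch.lock().unwrap();
        {
            // Swap the whole queue out; both Vecs keep their capacity.
            let mut pending = self.pending.lock().unwrap();
            std::mem::swap(&mut *pending, &mut *scratch);
        }
        // Handle events without holding the queue lock.
        for event in scratch.drain(..) {
            handler(event);
        }
    }
}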

@wpaulino
Contributor

wpaulino commented Apr 9, 2025

The issue is not that they live on the heap, the issue of heap fragmentation is that you have a bazillion tiny allocations that make huge parts of memory unusable after some time, as it becomes harder and harder to find 'fitting gaps' for larger allocations.

Yeah I definitely don't think we should Box here, was just noting that the events already live on the heap before handling them.

I guess although would be a pity to give up on move semantics?

Not sure how else we can address this otherwise. FWIW, we don't have move semantics on wire messages even though we could, possibly for the same reason as we'd want to avoid them here.

FWIW, it might be worth exploring if the event queue could be an Arc<Mutex<Vec<Event>>> to avoid the reallocations of a Vec per invocation?

That extra allocation is not ideal, but it would also go away if we gave the EventHandler references.

@phlip9
Contributor Author

phlip9 commented Apr 9, 2025

To be honest, I'm not sure if it's preferable to trade the size against an increased risk of heap fragmentation? Did you consider that, and how did you decide a smaller Event type was worth it?

Right, there is slight heap fragmentation for ~1.5 of the 25 variants in exchange for reduced heap+stack usage, fewer cache misses, etc. for all variants.

I'm not super familiar with the bench setup in LDK, but I tried running channelmanager::bench_sends before and after this PR on an M1 mac:

dev/ldk/bench$ RUSTFLAGS="--cfg=ldk_bench" c bench
   Compiling lightning v0.2.0+git (dev/ldk/lightning)
   Compiling lightning-rapid-gossip-sync v0.2.0+git (dev/ldk/lightning-rapid-gossip-sync)
   Compiling lightning-persister v0.2.0+git (dev/ldk/lightning-persister)
   Compiling lightning-bench v0.0.1 (dev/ldk/bench)
    Finished `bench` profile [optimized + debuginfo] target(s) in 3m 18s
     Running benches/bench.rs (target/release/deps/bench-126259c02b665b62)
bench_sends             time:   [4.3944 ms 4.4082 ms 4.4229 ms]
                        change: [-1.5040% -0.7683% -0.1187%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Within the noise threshold, but not nothing. An OnionMessenger process_events bench would probably make this more apparent.

Ultimately the most performant approach IMO would be to just hand a batch of events to the consumer instead of the current "event-by-event" handling. That would complicate the error handling, but would let us persist batches of events that we need to handle asynchronously.
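(Hypothetical shape only, not an existing LDK trait, just to illustrate what batch handling could look like:)

struct Event; // stand-in for the real enum

// The consumer gets the whole pending batch at once, so it can persist
// and process events together; on failure it reports how many events it
// fully handled so the rest can be replayed.
trait BatchEventHandler {
    fn handle_events(&self, events: Vec<Event>) -> Result<(), usize>;
}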

@phlip9
Contributor Author

phlip9 commented Apr 9, 2025

Sorry, to add on to this, we currently have some layers of async handlers for each Event that pass by value. Like:

async fn event_handler.get_ldk_handler_future(event) // 14624 B (!)
async fn event_handler.handle_inline(event) // 12928 B
async fn event_handler.handle_event(id, event) // 11136 B
async fn event_handler.persist_and_spawn_handler(event) // 1872 B
async fn event::handle_payment_claimable(id, {..}) // ...
...

And each layer gets blown up by + size_of Event. So reducing size_of Event would have a multiplicative effect in reducing the size of our futures.
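(A toy demonstration of the effect, not our actual handlers: any value held across an .await is stored inside the generated future, so each nested layer that takes the event by value adds roughly another size_of Event to its state machine.)

use std::mem::size_of_val;

#[allow(dead_code)]
struct BigEvent([u8; 1600]); // roughly the size under discussion

async fn inner(_ev: BigEvent) {}

async fn outer(ev: BigEvent) {
    // `ev` is moved into `inner`'s future, which itself lives inside
    // `outer`'s future across this await point.
    inner(ev).await;
}

fn main() {
    let fut = outer(BigEvent([0u8; 1600]));
    // The printed size includes the moved-in event (plus state-machine
    // bookkeeping); it is never smaller than the event itself.
    println!("outer future: {} bytes", size_of_val(&fut));
}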

@tnull
Contributor

tnull commented Apr 10, 2025

I'm not super familiar with the bench setup in LDK, but I tried running channelmanager::bench_sends before and after this PR on an M1 mac:

Did you adjust the sample size to something significant? Otherwise executing a single bench run is far from representative either way.
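(For reference, assuming the harness is criterion, the sample count can be raised in the group config; placeholder bench body below, not the real one:)

use criterion::{criterion_group, criterion_main, Criterion};

fn bench_sends(c: &mut Criterion) {
    c.bench_function("bench_sends", |b| {
        b.iter(|| {
            // ...the actual send-payment round trip goes here...
        })
    });
}

// Criterion defaults to 100 samples; raising it makes sub-percent
// changes less likely to drown in noise.
criterion_group! {
    name = benches;
    config = Criterion::default().sample_size(1_000);
    targets = bench_sends
}
criterion_main!(benches);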

Ultimately the most performant approach IMO would be to just hand a batch of events to the consumer instead of the current "event-by-event" handling. That would complicate the error handling, but would let us persist batches of events that we need to handle asynchronously.

Hmm, well, I think we started discussing this in https://github.com/orgs/lightningdevkit/discussions/2381 / #2491, where we established why it's not trivial to "just enable" concurrent event handling for certain variants at least. But I agree that would be the proper/longer term fix for the issue at hand.

Sorry, to add on to this, we currently have some layers of async handlers for each Event that pass by value.

Without more context I'm not quite understanding what these layers do or why you chose to go that way. But it seems the issue you are trying to solve with this PR is partially self-inflicted by your architecture, IIUC? Just out of curiosity, would you prefer the alternative solution proposed above, i.e., giving out Events by reference rather than by value?

@TheBlueMatt
Collaborator

Within the noise threshold, but not nothing.

FWIW I'm incredibly skeptical that changing a few boxes would result in a change that is more than 100 nanoseconds across the entire send pipeline, which is definitely unmeasurable in the bench_sends.

As for whether to reduce the size of Events at all, indeed, there's a tradeoff between heap fragmentation and object size. Generally in LDK we try to be cautious about heap allocations as much as possible, and in the case of Event, it shows 😅.

Box AnchorDescriptor in BumpTransactionEvent. This shrinks Event from 1072 B -> 576 B.

If it were just this, I'd say go for it! This event is relatively rare (as it only happens on a timer when we have stuff pending on chain), so the impact is capped.

Box InvoiceContents in Bolt12Invoice and StaticInvoice. This shrinks Event from 1680 B -> 1072 B

But this case is a bit trickier. #3730 may reduce allocations in InvoiceContents by a few (as features are actually added for the offers/invoice-request/invoice/blinded path contexts), but really there are already a lot of small allocations in BOLT 12s, which we kinda need to cut down on.

One thing we could do here is take #3730 a few steps further and have similar pre-allocated variable-length storage for some of the stuff in BOLT 12 structs - e.g. pre-allocate the blinded paths, pre-allocate the issuer, description, and payer note, and pre-allocate in HumanReadableName. Of course, in most of those cases we wouldn't be able to do the pre-allocation without any additional memory usage (unlike #3730), but allocating a bit bigger is a reasonable tradeoff if we're then going to actually store the contents on the heap. A string on the heap is going to have at least 6 pointers of overhead anyway - 3 pointers for malloc on the heap and 3 pointers for the String itself, plus you can generally multiply by two for fragmentation costs - so pre-allocating 32 or even 64 bytes isn't really crazy.
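(Very rough sketch of that inline-storage idea, illustrative only and not a proposed type: short values like a description or payer note stay in a fixed inline buffer, longer ones fall back to a heap String:)

#[allow(dead_code)]
enum InlineString {
    // Length plus a fixed 64-byte buffer stored inline, no allocation.
    Inline(u8, [u8; 64]),
    // Fallback for longer strings.
    Heap(String),
}

#[allow(dead_code)]
impl InlineString {
    fn new(s: &str) -> Self {
        if s.len() <= 64 {
            let mut buf = [0u8; 64];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            InlineString::Inline(s.len() as u8, buf)
        } else {
            InlineString::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str; re-checking UTF-8
            // here keeps the sketch simple and safe.
            InlineString::Inline(len, buf) => {
                std::str::from_utf8(&buf[..*len as usize]).expect("valid utf8")
            }
            InlineString::Heap(s) => s.as_str(),
        }
    }
}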
