perf: Improve performance of snapshot using a reverse lookup from block -> external hash #3370
Walkthrough
Introduces a reverse-lookup table in RadixTree for block-to-external-hash mapping, and updates construction, event application, cleanup, and snapshot dumping to use it. Adjusts subscriber snapshot initialization logic and adds timing/logs to purge-then-snapshot. Adds a download-size debug log in NATS transport after copying object data, before deserialization.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Sub as Subscriber
participant RT as RadixTree (Indexer)
note over Sub: Startup / Operation
Sub->>Sub: Check router_snapshot_threshold
alt threshold is Some
Sub->>Sub: purge_then_snapshot() start_time=now
Sub->>RT: dump_tree_as_events()
RT->>RT: Use reverse_block_hash_lookup for external hashes
RT-->>Sub: Events snapshot
Sub->>Sub: Log success with elapsed ms
else threshold is None
Sub->>Sub: Log radix state init skipped
end
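The branch in the diagram above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `init_radix_state`, its closure parameter, and the log strings are hypothetical names introduced here, and the real `purge_then_snapshot` presumably does more than call a single closure.

```rust
use std::time::Instant;

/// Hypothetical helper: runs the snapshot only when a threshold is configured.
/// Returns Some(elapsed_ms) when the snapshot ran and None when it was skipped.
fn init_radix_state<F>(router_snapshot_threshold: Option<u32>, snapshot: F) -> Option<u128>
where
    F: FnOnce() -> usize, // assumed to return the number of events dumped
{
    match router_snapshot_threshold {
        Some(_) => {
            // Time the whole purge-then-snapshot step, as in the diagram.
            let start_time = Instant::now();
            let num_events = snapshot();
            let elapsed_ms = start_time.elapsed().as_millis();
            println!("snapshot succeeded: {num_events} events in {elapsed_ms}ms");
            Some(elapsed_ms)
        }
        None => {
            println!("radix state initialization skipped");
            None
        }
    }
}
```

Passing the snapshot step as a closure keeps the sketch self-contained; in the actual subscriber the call would go through the indexer rather than a closure.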
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
@blarson-b10 Maybe an easier and safer way to do this is to directly make the field |
done. |
@blarson-b10 Looks like the DCO check still needs to be resolved — you may need to sign your commits in this PR. Hopefully it’s a painless fix! |
Signed-off-by: Brian Larson <[email protected]>
2c80846
to
16d6cf6
Compare
hopefully fixed now |
/ok to test 16d6cf6 |
Signed-off-by: PeaBrane <[email protected]>
/ok to test 7b4bd02 |
…ck -> external hash (#3370) Signed-off-by: Brian Larson <[email protected]> Signed-off-by: PeaBrane <[email protected]> Co-authored-by: PeaBrane <[email protected]> Signed-off-by: Piotr Tarasiewicz <[email protected]>
Overview:
In order to snapshot the radix tree, we must convert the nodes back into RouterEvent structs, which are the same structs that we process to build the tree.
To do this we must find the external hash used in the lookup table for each block. Currently this is done with a linear scan over all of the worker's blocks, so the snapshot is worst-case quadratic in the number of blocks.
To fix this, I have changed how workers are stored in the radix block: the type is now a map from worker id to external sequence hash, so the hash can be looked up directly.
When testing with 2 workers on real traffic loads, snapshot times dropped from 16s to 600ms.
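To make the complexity argument concrete, here is a minimal sketch of the two lookup strategies. The type aliases, the `Block` struct, and both function names are simplified stand-ins invented for illustration; they are not the actual definitions in indexer.rs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types in indexer.rs.
type WorkerId = i64;
type ExternalHash = u64;
type BlockId = usize; // assumed handle identifying a radix block

/// Before: each worker only has external hash -> block, so recovering the
/// external hash of one block scans every entry for that worker. Doing this
/// for every block makes the snapshot worst-case quadratic.
fn external_hash_by_scan(
    lookup: &HashMap<WorkerId, HashMap<ExternalHash, BlockId>>,
    worker: WorkerId,
    block: BlockId,
) -> Option<ExternalHash> {
    lookup
        .get(&worker)?
        .iter()
        .find(|(_, b)| **b == block)
        .map(|(hash, _)| *hash)
}

/// After: each block stores worker id -> external hash, so the reverse
/// lookup is a single expected-constant-time map access.
struct Block {
    workers: HashMap<WorkerId, ExternalHash>,
}

fn external_hash_direct(block: &Block, worker: WorkerId) -> Option<ExternalHash> {
    block.workers.get(&worker).copied()
}
```

With the per-block map, dumping the tree does one map access per (block, worker) pair instead of one scan per pair, which is consistent with the large drop in snapshot time reported above.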
Details:
Adds new fields, tests, and some additional logging related to snapshotting.
Where should the reviewer start?
indexer.rs has most of the changes.