Performance: Optimize sorted_set union/symmetric_difference and make bit extraction branchless #2978
- Fix inefficient size counting in SortedSet::union() by computing size during tree construction
- Optimize symmetric_difference() to avoid creating intermediate sets
- Make bit extraction operations branchless for better performance

Co-authored-by: bobzhang <[email protected]>

Review issues raised:
- Potential integer overflow in signed bit extraction
- Symmetric difference still has O(n²) complexity due to repeated contains() calls
- Missing helper function documentation and unclear bit manipulation logic
Remove the conditional check when blitting the second array, since unsafe_blit with length 0 is safe and efficient
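To illustrate why the guard can go, here is a minimal sketch in Rust (not the PR's MoonBit code). The function name `split_at_copy` and the use of `copy_from_slice` as a stand-in for `unsafe_blit` are assumptions made for illustration only. The point is that a copy of length 0 is a no-op, so both copies can run unconditionally.

```rust
/// Hypothetical stand-in for a split_at that copies both halves.
/// `copy_from_slice` plays the role of the blit here (an assumption).
fn split_at_copy(src: &[i32], index: usize) -> (Vec<i32>, Vec<i32>) {
    assert!(index <= src.len(), "index out of bounds");
    let mut left = vec![0; index];
    let mut right = vec![0; src.len() - index];
    // Both copies run unconditionally. When `index == src.len()` the
    // second copy has length 0 and does nothing, so no `if` is needed.
    left.copy_from_slice(&src[..index]);
    right.copy_from_slice(&src[index..]);
    (left, right)
}
```

Splitting at either end of the input yields one empty half without any special casing.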
Pull Request Test Coverage Report for Build 1943 (Coveralls)
Pull Request Overview
This PR addresses 6 performance TODOs with optimizations to sorted set operations, bitstring bit extraction, and array utilities. The main changes include computing set union size incrementally, replacing symmetric_difference's intermediate allocations with direct iteration, making bit extraction branchless, and removing a conditional from array split_at.
- Optimized union() to compute size during tree construction instead of post-construction traversal
- Replaced symmetric_difference() implementation to avoid creating two intermediate sets
- Made 4 bitstring extraction functions branchless for better CPU pipelining
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sorted_set/set.mbt | Modified union() to track size during merge; rewrote symmetric_difference() to avoid intermediate allocations; added count_nodes() helper |
| bytes/bitstring.mbt | Converted unsafe_extract_bit and unsafe_extract_bit_signed to branchless implementations |
| array/bitstring.mbt | Converted unsafe_extract_bit and unsafe_extract_bit_signed to branchless implementations |
| builtin/array.mbt | Removed unnecessary conditional from split_at when blitting empty arrays |
(Some(_), None) => (a, count_nodes(a))
(None, Some(_)) => (b, count_nodes(b))
Copilot AI, Nov 20, 2025
The count_nodes() calls defeat the purpose of the optimization. When one subtree is exhausted during union, count_nodes() still traverses the entire remaining tree in O(n) time, which is the cost the original TODO aimed to eliminate. Consider storing size information in each node, or accumulating sizes from split() operations, to avoid this traversal.
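One way to read the "store size in each node" suggestion is sketched below in Rust (not the PR's MoonBit code). The `Tree` type, its `size` field, and the `node` constructor are hypothetical; the actual node layout in sorted_set/set.mbt is not shown in this thread. Each node caches its subtree size, so the size of a leftover subtree is read in O(1) instead of being recounted.

```rust
/// Hypothetical size-augmented binary tree (illustration only).
#[allow(dead_code)]
enum Tree {
    Empty,
    Node { left: Box<Tree>, value: i32, right: Box<Tree>, size: usize },
}

impl Tree {
    /// O(1): reads the cached size instead of traversing the subtree.
    fn size(&self) -> usize {
        match self {
            Tree::Empty => 0,
            Tree::Node { size, .. } => *size,
        }
    }

    /// Builds a node and derives its size from its children:
    /// left_size + 1 + right_size.
    fn node(left: Tree, value: i32, right: Tree) -> Tree {
        let size = left.size() + 1 + right.size();
        Tree::Node { left: Box::new(left), value, right: Box::new(right), size }
    }
}
```

The trade-off is one extra word per node and the need to keep the cached value correct in every operation that rebuilds nodes, such as rebalancing.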
let ret = new()
self.each(x => if !other.contains(x) { ret.add(x) })
other.each(x => if !self.contains(x) { ret.add(x) })
ret
Copilot AI, Nov 20, 2025
The new implementation may be slower than the original for large sets. Each add() call rebalances the tree in O(log n), so building the result costs O(n log n + m log m), on top of O(n log m + m log n) for the contains() checks. The original approach, which calls union() on pre-built difference sets, relies on more efficient tree-merge operations. Consider benchmarking both approaches or using a bulk-build strategy.
Suggested change
- let ret = new()
- self.each(x => if !other.contains(x) { ret.add(x) })
- other.each(x => if !self.contains(x) { ret.add(x) })
- ret
+ self.difference(other).union(other.difference(self))
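As a rough illustration of the bulk-build idea raised in the review, the sketch below (Rust, not the PR's MoonBit code) computes a symmetric difference with a single linear merge over two sorted sequences. It assumes both inputs are strictly increasing, as an in-order traversal of a set would be; the function name is hypothetical, and turning the sorted output back into a balanced tree is left out.

```rust
use std::cmp::Ordering;

/// Symmetric difference of two strictly increasing slices in O(n + m).
/// Hypothetical helper; assumes neither input contains duplicates.
fn symmetric_difference_sorted(a: &[i32], b: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            // Element only in `a` so far: keep it.
            Ordering::Less => {
                out.push(a[i]);
                i += 1;
            }
            // Element only in `b` so far: keep it.
            Ordering::Greater => {
                out.push(b[j]);
                j += 1;
            }
            // Element in both sets: drop it from the result.
            Ordering::Equal => {
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains in either input has no counterpart in the other.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}
```

Because the output is already sorted, a balanced tree could in principle be built from it in linear time, which avoids both the per-element contains() lookups and the per-element add() rebalancing. Whether that beats the tree-merge approach in practice would still need the benchmark the review asks for.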
The bit extraction part looks good to me.
Addresses 6 performance TODOs across sorted sets, bitstring operations, and array utilities.
Changes
SortedSet operations (sorted_set/set.mbt)
union() - Eliminate O(n) post-construction counting
- Size is computed during tree construction as (left_size + 1 + right_size)
symmetric_difference() - Remove intermediate allocations
- Before: self.difference(other).union(other.difference(self)) (3 set allocations)
Bit extraction (bytes/bitstring.mbt, array/bitstring.mbt)
Made 4 functions branchless for better CPU pipelining: unsafe_extract_bit and unsafe_extract_bit_signed, in each of the two files.
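For readers unfamiliar with the technique, here is a minimal Rust sketch of branching versus branchless single-bit extraction (not the PR's MoonBit code). Two things are assumptions: bits are numbered most-significant-first within each byte, and the signed variant treats a 1-bit field as two's complement, so a set bit reads as -1. The function names are hypothetical.

```rust
/// Branching version: tests the bit, then selects 0 or 1.
fn extract_bit_branching(bytes: &[u8], offset: usize) -> u32 {
    let mask = 0x80u8 >> (offset & 7);
    if bytes[offset >> 3] & mask != 0 { 1 } else { 0 }
}

/// Branchless version: shifts the wanted bit down to position 0
/// and masks it, with no conditional on the bit's value.
fn extract_bit(bytes: &[u8], offset: usize) -> u32 {
    let byte = bytes[offset >> 3] as u32;
    (byte >> (7 - (offset & 7))) & 1
}

/// Branchless signed variant (assumed semantics): negating the
/// extracted bit maps 1 to -1 and leaves 0 unchanged.
fn extract_bit_signed(bytes: &[u8], offset: usize) -> i32 {
    -(extract_bit(bytes, offset) as i32)
}
```

Note that Rust's slice indexing still performs a bounds check, and an optimizing compiler may already turn the branching form into a branchless one, so any gain is worth confirming with a benchmark.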
Array::split_at (builtin/array.mbt)
Removed unnecessary conditional: unsafe_blit with length 0 is safe.
Split PR Structure
Changes are split into 2 atomic commits for independent review.
Cherry-pick individually or merge together as needed.