
Conversation

@prudhvigodithi
Contributor

@prudhvigodithi prudhvigodithi commented Oct 30, 2025

Description

Coming from #14485 and #13745 (Initial implementation of intra-segment search concurrency #13542): when splitting a segment into partitions for intra-segment search, each partition creates a DocIdSetBuilder that allocates memory based on the entire segment size, even though it only collects documents within a small partition range. This PR adds partition-aware support to DocIdSetBuilder, creating bitsets and buffers scoped to the partition's doc ID range instead of the entire segment, which improves memory efficiency during intra-segment search.

For example, for a segment with 1M documents split into 4 partitions of 250K docs each, today each partition creates a FixedBitSet(1M) (~125 KB each, ~500 KB total), even though a FixedBitSet(250K) (~31 KB each, ~125 KB total) would suffice.

PartitionAwareBufferAdder (sketched below):

  • Filters documents to only accept those within the [minDocId, maxDocId) range.
  • Stores absolute doc IDs in buffers (used for sparse results below the threshold) and rejects documents that are not part of the partition range.
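
A minimal, self-contained sketch of this filtering behavior (illustrative only, not the PR's actual code; a plain list stands in for the builder's internal buffers):

import java.util.ArrayList;
import java.util.List;

class PartitionAwareBufferAdderSketch {
  private final int minDocId, maxDocId;                   // partition range, [min, max)
  private final List<Integer> buffer = new ArrayList<>(); // sparse storage

  PartitionAwareBufferAdderSketch(int minDocId, int maxDocId) {
    this.minDocId = minDocId;
    this.maxDocId = maxDocId;
  }

  void add(int doc) {
    if (doc < minDocId || doc >= maxDocId) {
      return; // outside this partition; another partition collects it
    }
    buffer.add(doc); // absolute doc IDs are stored as-is in the sparse case
  }
}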

PartitionAwareFixedBitSetAdder (sketched below):

  • Filters documents to only accept those within the partition range.
  • Uses a partition-sized bitset instead of a segment-sized one.
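
A matching sketch for the dense case (again illustrative; names and structure are assumptions, not the PR's code):

import org.apache.lucene.util.FixedBitSet;

class PartitionAwareFixedBitSetAdderSketch {
  private final int minDocId, maxDocId; // partition range, [min, max)
  private final FixedBitSet bits;       // sized to the partition, not the segment

  PartitionAwareFixedBitSetAdderSketch(int minDocId, int maxDocId) {
    this.minDocId = minDocId;
    this.maxDocId = maxDocId;
    this.bits = new FixedBitSet(maxDocId - minDocId); // partition-sized
  }

  void add(int doc) {
    if (doc < minDocId || doc >= maxDocId) {
      return; // outside this partition
    }
    bits.set(doc - minDocId); // store the partition-relative index
  }
}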

OffsetBitDocIdSet & OffsetDocIdSetIterator:

  • FixedBitSet uses the doc ID parameter directly as an array index. When we create partition-sized bitsets to save memory, we store documents using relative indices (0 to partitionSize-1) internally, but the Lucene API requires iterators to return absolute doc IDs. These wrapper classes handle the conversion automatically.
  • The wrappers add the offset back during iteration (when PartitionAwareFixedBitSetAdder is used), converting partition-relative indices to absolute doc IDs (see the iterator sketch after the worked example below).
  • Callers should always receive absolute doc IDs.
Worked example:

Segment: 100,000 documents
Partition: [50,000, 60,000) - only 10,000 docs

Without Optimization (Old Way):

Create bitset for ENTIRE segment:
FixedBitSet(100,000 bits)

Bit position:  0     1     2  ... 50000 ... 50500 ... 55000 ... 59999 ... 99999
                ↓     ↓     ↓       ↓        ↓         ↓         ↓         ↓
Bit value:      0     0     0       1        1         1         1         0
                               

With Optimization (New Way):

Create bitset with ONLY partition size:
FixedBitSet(10,000 bits)

Bit position:  0    1    2    ... 500  ... 5000 ... 9000 ... 9999
               ↓    ↓    ↓        ↓        ↓        ↓        ↓
Bit value:     1    0    0        1        1        1        0
               └───────────────────────────────────────────────┘
                All bits used efficiently!
                
Storage mapping (with offset):
  Doc 50,000 → Bit[0]     (50,000 - 50,000 = 0)
  Doc 50,500 → Bit[500]   (50,500 - 50,000 = 500)
  Doc 55,000 → Bit[5,000] (55,000 - 50,000 = 5,000)
  Doc 59,999 → Bit[9,999] (59,999 - 50,000 = 9,999)
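
A minimal sketch of the offset-mapping iterator described above (illustrative; not the PR's actual class):

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

// Wraps an iterator over partition-relative indices (0..partitionSize-1)
// and adds the partition offset back, so callers only ever see absolute
// doc IDs.
class OffsetDocIdSetIteratorSketch extends DocIdSetIterator {
  private final DocIdSetIterator in; // iterates partition-relative indices
  private final int offset;          // the partition's minDocId

  OffsetDocIdSetIteratorSketch(DocIdSetIterator in, int offset) {
    this.in = in;
    this.offset = offset;
  }

  @Override
  public int docID() {
    int doc = in.docID();
    // -1 (unpositioned) and NO_MORE_DOCS pass through untranslated
    return (doc == -1 || doc == NO_MORE_DOCS) ? doc : doc + offset;
  }

  @Override
  public int nextDoc() throws IOException {
    int doc = in.nextDoc();
    return doc == NO_MORE_DOCS ? doc : doc + offset;
  }

  @Override
  public int advance(int target) throws IOException {
    // translate the absolute target into the wrapped iterator's space
    int doc = in.advance(Math.max(0, target - offset));
    return doc == NO_MORE_DOCS ? doc : doc + offset;
  }

  @Override
  public long cost() {
    return in.cost();
  }
}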

@prudhvigodithi
Contributor Author

prudhvigodithi commented Oct 31, 2025

Hey all, I still need to add some tests/validations and clean up the code on my end, but before that I would like to get some early feedback on the approach, to see if the idea makes sense.

@prudhvigodithi prudhvigodithi marked this pull request as ready for review October 31, 2025 15:34
@prudhvigodithi
Contributor Author

Adding @jainankitk @getsaurabh02 to the conversation.

Signed-off-by: Prudhvi Godithi <[email protected]>
@github-actions
Contributor

github-actions bot commented Nov 3, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

Signed-off-by: Prudhvi Godithi <[email protected]>

Signed-off-by: Prudhvi Godithi <[email protected]>

Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: Prudhvi Godithi <[email protected]>

Signed-off-by: Prudhvi Godithi <[email protected]>

Signed-off-by: Prudhvi Godithi <[email protected]>

Signed-off-by: Prudhvi Godithi <[email protected]>

@prudhvigodithi
Contributor Author

Ok, the existing checks and tests are now green; let me add some tests in TestDocIdSetBuilder.

Comment on lines 44 to 48
public sealed interface BulkAdder
    permits FixedBitSetAdder,
        BufferAdder,
        PartitionAwareFixedBitSetAdder,
        PartitionAwareBufferAdder {
Member

@benwtrent benwtrent Nov 3, 2025


This is now megamorphic :(

Contributor


That's a good point. We should run the benchmark to quantify the impact of the virtual calls and megamorphism. Also, assuming the impact is significant, I am wondering if we can directly use PartitionAwareFixedBitSetAdder instead of FixedBitSetAdder?

Contributor Author


Yes, good point, we can unify them. For the non-partitioned case, minDocId = 0, maxDocId = maxDoc, and offset = 0 (see the sketch below).
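
A sketch of that degenerate case, reusing the illustrative adder from the PR description above (names are assumptions, not the final API):

int maxDoc = 100_000;

// Non-partitioned: the "partition" is the whole segment, so the bitset is
// segment-sized and the offset is 0 - identical behavior to today.
var wholeSegment = new PartitionAwareFixedBitSetAdderSketch(0, maxDoc);

// Partitioned: a 10K-doc slice whose offset is 50_000.
var onePartition = new PartitionAwareFixedBitSetAdderSketch(50_000, 60_000);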

Contributor Author


Let me try to implement this and run the tests in TestDocIdSetBuilder.java.

Signed-off-by: Prudhvi Godithi <[email protected]>

@prudhvigodithi
Contributor Author

I have added some decent tests in TestDocIdSetBuilder. Please let me know what could be the next steps here.

@jainankitk
Contributor

I have added some decent tests in TestDocIdSetBuilder. Please let me know what could be the next steps here.

Thanks @prudhvigodithi for adding the tests. It will be good to see the performance benchmark numbers and ensure there isn't any regression due to the offset logic.

Signed-off-by: Prudhvi Godithi <[email protected]>
@msfroh
Contributor

msfroh commented Nov 13, 2025

Thinking through the logic here, the only benefit is in terms of the size of the arrays allocated. We're still doing just as many allocations in total, and the individual partitions will each traverse the same range of the point tree (just collecting different doc IDs, while the others get excluded by the partition filter). I'm skeptical that there is a measurable benefit (unless you have a lot of slices over a big segment).

I find the change to add a scorerSupplier(LeafReaderContextPartition) method much more interesting (*). I'm imagining that an implementation in PointRangeQuery's anonymous Weight could create a synchronized scorerSupplier per segment, where each partition would wrap it with something that filters over its own doc IDs. That way, you'd go back to only creating one FixedBitSet per segment, regardless of how many slices there are (though the other threads would block until the winning thread finishes collecting). A rough sketch of the idea follows below.

(*) Of course, it's also a very significant change. @javanna -- I'd be curious to get your opinion on it. I feel like it could be a way of addressing #13745 incrementally. The default behavior could be to get a ScorerSupplier for the whole segment, but query-specific implementations might be able to do less work per partition (or share work across partitions). I'm not 100% convinced that it's the best solution, but I think it may work.
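
A very rough, hypothetical sketch of the shared-per-segment idea (all names are illustrative; this is not a proposed API): the winning thread builds the segment-wide match set once, and losing threads block until it is ready:

import java.util.function.Supplier;
import org.apache.lucene.util.FixedBitSet;

// One instance per segment, shared by all of that segment's partitions.
class SharedSegmentMatches {
  private final Supplier<FixedBitSet> segmentCollector; // one tree traversal
  private FixedBitSet bits;

  SharedSegmentMatches(Supplier<FixedBitSet> segmentCollector) {
    this.segmentCollector = segmentCollector;
  }

  // Losing threads block here until the winning thread finishes collecting.
  synchronized FixedBitSet matches() {
    if (bits == null) {
      bits = segmentCollector.get();
    }
    return bits;
  }
}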

@jainankitk
Contributor

I'm skeptical that there is a measurable benefit (unless you have a lot of slices over a big segment).

As per my understanding, the slice generation logic is fairly aggressive. So even in the case of 4 or 8 slices for a segment, this change should reduce the operating memory by 4x or 8x for that segment.

The suggestion to create a synchronized scorerSupplier per segment is interesting. I was initially concerned about the synchronization overhead, but that happens just once per segment. Although, I feel a partitioned FixedBitSet would add even more value in that case: we can have the winner thread populate a partitioned FixedBitSet for each segment partition, and after it is done, those partitioned FixedBitSets can be processed concurrently by the collector for each partition, without worrying about synchronization with other threads. The primary additional overhead I can think of is for the winner thread to place each matching document into the correct FixedBitSet.

@msfroh
Contributor

msfroh commented Nov 14, 2025

those partitioned FixedBitSets can be processed concurrently by the collector for each partition, without worrying about synchronization with other threads.

Reading from a single FixedBitSet can be done by multiple threads with no synchronization. It's just reading from a long[].
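
For instance (a hedged sketch; iteration state lives in the BitSetIterator, not in the shared FixedBitSet, so each thread can have its own):

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

class PartitionReads {
  // Each partition creates its own iterator over the shared, read-only bits.
  static DocIdSetIterator forPartition(FixedBitSet shared, int minDocId) throws IOException {
    BitSetIterator it = new BitSetIterator(shared, shared.approximateCardinality());
    if (minDocId > 0) {
      it.advance(minDocId); // position at this partition's first set bit
    }
    return it;
  }
}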

@jainankitk
Contributor

Reading from a single FixedBitSet can be done by multiple threads with no synchronization. It's just reading from a long[].

I guess we can do that, but I am still not sure if it will seamlessly integrate into the existing abstractions on top of it. I was initially thinking about, say, the cost function of this iterator, but there seems to be an implementation for a specific docId range, cardinality(int from, int to). I am still concerned that a few other unknowns might pop up.

Also, from the performance perspective, even simple iteration over this long[] will be randomized, due to different threads accessing different parts of the array. So it might be more efficient to partition it into a long[][] where each row is accessed sequentially by one thread.
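
For the cost-function point above, a minimal sketch (assuming the cardinality(int from, int to) range overload on FixedBitSet mentioned in this comment):

import org.apache.lucene.util.FixedBitSet;

class PartitionCost {
  // A partition iterator's cost() could be the set-bit count of just its
  // own [minDocId, maxDocId) range of the shared segment-wide bitset.
  static long cost(FixedBitSet segmentBits, int minDocId, int maxDocId) {
    return segmentBits.cardinality(minDocId, maxDocId);
  }
}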

@prudhvigodithi
Contributor Author

Yes, #13745 is the issue for handling the duplicate work per segment. The main target of this PR is to reduce the size of the allocation per partition: without it, each partition thread today allocates a full segment-sized structure, so the number of allocations stays the same but the sizes are vastly different.

IMO this change should still be useful once we come up with a strategy to stop the duplicate work per segment (#13745).

Before I run the full benchmarks, I guess I can quickly test the final DocIdSet's ramBytesUsed? That should show the reduction with the partition-aware DocIdSetBuilder.

@msfroh
Contributor

msfroh commented Nov 14, 2025

Before I run the full benchmarks, I guess I can quickly test the final DocIdSet's ramBytesUsed? That should show the reduction with the partition-aware DocIdSetBuilder.

This doesn't need to be demonstrated by a test. It's obvious that if you have a segment with N docs and you split it into two partitions, allocating two arrays of N/2 bits each will use half the memory of two arrays of N bits each. Nobody is disputing the reduction in heap usage.

The question is whether the reduction in heap usage will have a measurable impact, which we can only see from benchmarks. Also, if we can reduce the number of tree traversals (i.e. only do one tree traversal per segment instead of per partition), then we would expect to see a performance benefit, since we're doing less work.

@prudhvigodithi
Contributor Author

if we can reduce the number of tree traversals (i.e. only do one tree traversal per segment instead of per partition), then we would expect to see a performance benefit, since we're doing less work.

Thanks @msfroh. True, we have to do that eventually for intra-segment search; my point is that this change just makes DocIdSetBuilder partition-aware so that it can be leveraged in PointRangeQuery.

Follow-up: similar to public DocIdSetBuilder(int maxDoc, PointValues values, int minDocId, int maxDocId), we should also have a partition-aware overload of public DocIdSetBuilder(int maxDoc, Terms terms).

@prudhvigodithi
Contributor Author

The question is whether the reduction in heap usage will have a measurable impact, which we can only see from benchmarks

Yes I'm playing with https://github.com/mikemccand/luceneutil/ (dealing with some setup issues) and will post the results.
