
Conversation

@k0ushal k0ushal commented May 30, 2025

  • Fixed the tier based candidate selection
  • Default tiers are powers of 4 with the first tier being 0-4M followed by 4-16M, 16-64M and so on.
  • Fixed consolidation window of size 4

Note

Replaces the tiered consolidation algorithm with a cleanup-first, skew-aware, templated selection engine and updates tests accordingly.

  • Index utils (core):
    • Introduce tier::ConsolidationConfig, SegmentAttributes, and templated ConsolidationCandidate with sliding-window, skew-based scoring.
    • Add findBestCleanupCandidate (prefers low live-doc% segments) and findBestConsolidationCandidate (size-based, skew-thresholded) helpers.
    • Wire new flow in ConsolidateTier policy: filter, early-exit on small sets, try cleanup candidates first, then consolidation; copy candidates via iterator range.
    • Move/factor helpers (FillFactor, SizeWithoutRemovals) and define tier::SegmentStats in header; add getSegmentDimensions accessor.
    • Note TODO on "too large segments" threshold formula.
  • API/Test adjustments:
    • Extend AssertCandidates to accept error message.
    • Add and rewrite tests for cleanup vs consolidation preference, singleton/threshold behavior, skew handling (including over-threshold no-merge), window pop/push, and combined live-percentage cases.
  • Misc:
    • Add default and test constructors to SegmentInfo for convenience/testing.

Written by Cursor Bugbot for commit ae1f202.

@k0ushal k0ushal self-assigned this May 30, 2025
@k0ushal k0ushal (Author) commented May 30, 2025

Documentation:
https://github.com/arangodb/documents/pull/145

@k0ushal k0ushal requested a review from alexbakharew May 30, 2025 08:09
@k0ushal k0ushal marked this pull request as draft July 9, 2025 08:18
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch from 57714c9 to c1e6ebb Compare July 11, 2025 12:34
@k0ushal k0ushal changed the base branch from master to bugfix/iresearch-address-table-tests July 14, 2025 09:03
@k0ushal k0ushal changed the base branch from bugfix/iresearch-address-table-tests to master July 14, 2025 09:04
@k0ushal k0ushal changed the base branch from master to bugfix/iresearch-address-table-tests July 14, 2025 09:05
@goedderz goedderz (Member) left a comment

Comments as we talked about. Looks good to me!

Comment on lines 51 to 57
mergeBytes += itrMeta->byte_size;
skew = static_cast<double>(itrMeta->byte_size) / mergeBytes;
delCount += (itrMeta->docs_count - itrMeta->live_docs_count);
mergeScore = skew + (1.0 / (1 + delCount));
cost = mergeBytes * mergeScore;

size_t size_before_consolidation = 0;
size_t size_after_consolidation = 0;
size_t size_after_consolidation_floored = 0;
for (auto& segment_stat : consolidation) {
  size_before_consolidation += segment_stat.meta->byte_size;
  size_after_consolidation += segment_stat.size;
  size_after_consolidation_floored +=
      std::max(segment_stat.size, floor_segment_bytes);
} while (itr++ != end);
Member:

Probably inconsequential, but it would suffice to calculate skew, mergeScore and cost once after the loop for the last element.

Comment on lines 90 to 92
size_t nextTier = ConsolidationConfig::tier1;
while (nextTier < num)
nextTier = nextTier << 2;
Member:

Minor: You could probably use std::countl_zero and get rid of the loop.

mergeBytes = mergeBytes - removeMeta->byte_size + addMeta->byte_size;
skew = static_cast<double>(addMeta->byte_size) / mergeBytes;
delCount = delCount - getDelCount(removeMeta) + getDelCount(addMeta);
mergeScore = skew + (1 / (1 + delCount));
Member:

As already discussed:

We should think about whether calculating the mergeScore this way is sensible. What seems strange is that while the skew is a ratio (of byte-sizes), the second summand is an inverse count. This seems off: intuitively I'd expect e.g. a ratio of live and total documents to be considered alongside the skew.

@goedderz goedderz (Member) commented Jul 15, 2025:

This is actually quite bad the way it is, worse than we noticed yesterday @k0ushal.

Note that $\mathrm{skew} \in (0, 1)$. With $\mathrm{delCount} = 1$, we get

$$\mathrm{mergeScore} = \mathrm{skew} + \frac{1}{1 + \mathrm{delCount}} = \mathrm{skew} + \frac{1}{2} \leq \frac{3}{2} = \mathrm{maxMergeScore}.$$

So this way we are always allowed to consolidate if only one document has been deleted, no matter the size of the files or number of documents therein.

Let us at least do

    mergeScore = skew + live_docs_count / total_docs_count;

instead, as discussed - this has more reasonable properties.

And as a second observation @neunhoef made today while discussing this: Adding these two values is probably not right, either. They should be multiplied instead; the maxMergeScore will need to be adjusted to 0.5 to get a similar effect.

So we should actually do

    mergeScore = skew * live_docs_count / total_docs_count;

(and adapt maxMergeScore).

To understand this better, we should still do some formal worst-case analysis and some tests (specifically unit tests of the consolidation algorithm that play out certain usage scenarios).

Comment on lines 162 to 241
for (auto idx = start; idx != sorted_segments.end();) {
  if (getSize(*idx) <= currentTier) {
    idx++;
    continue;
  }

  tiers.emplace_back(start, idx - 1);

  // The next tier may not necessarily be in the
  // next power of 4.
  // Consider this example,
  //   [2, 4, 6, 8, 900]
  // While the 2, 4 fall in the 0-4 tier and 6, 8 fall
  // in the 4-16 tier, the last segment falls in
  // the [256-1024] tier.

  currentTier = getConsolidationTier(getSize(*idx));
  start = idx++;
}
Member:

As discussed: finding the tier-boundaries could be done by binary search, possibly utilizing std::lower_bound / std::upper_bound.

@k0ushal k0ushal force-pushed the bugfix/iresearch-address-table-tests branch from 07286d8 to 872d553 Compare July 16, 2025 19:35
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch 2 times, most recently from fb73fcd to f6305e3 Compare July 17, 2025 07:34
@k0ushal k0ushal deleted the branch master July 18, 2025 07:56
@k0ushal k0ushal closed this Jul 18, 2025
@goedderz goedderz reopened this Jul 23, 2025
@goedderz goedderz changed the base branch from bugfix/iresearch-address-table-tests to master July 23, 2025 13:02
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch from f6305e3 to 21a2f95 Compare July 23, 2025 13:05
k0ushal added 2 commits July 24, 2025 15:42
- Fixed the tier based candidate selection
- Default tiers are powers of 4 with the first tier
being 0-4M followed by 4-16M, 16-64M and so on.
- Fixed consolidation window of size 4
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch from 21a2f95 to d91b909 Compare July 24, 2025 15:43
@k0ushal k0ushal requested a review from goedderz August 25, 2025 07:49
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch from 79070ae to 5165f01 Compare August 26, 2025 07:41
@k0ushal k0ushal force-pushed the bugfix/consolidation-issues branch from 5165f01 to 9cfc1fc Compare August 26, 2025 07:56
@k0ushal k0ushal marked this pull request as ready for review August 26, 2025 11:36
uint64_t& docs_count,
uint64_t& live_docs_count) {

auto itrMeta = itr->meta;
Member:

Personally, I have a slight preference towards the following, but feel free to keep it as is if you prefer it that way:

Suggested change:
-auto itrMeta = itr->meta;
+auto* itrMeta = itr->meta;

Author:

auto* is definitely more appropriate, since we expect itr->meta to always be a pointer.
Changed it.

Comment on lines 96 to 100
void getSegmentDimensions(
    std::vector<tier::SegmentStats>::const_iterator itr,
    uint64_t& byte_size,
    uint64_t& docs_count,
    uint64_t& live_docs_count);
Member:

Just curious, why did you choose return-parameters instead of a product type (tuple or struct)? Due to existing code style?

Author:

Changed this to struct.

Comment on lines 142 to 145
const auto removeSegment = first();
const auto lastSegment = last();

std::advance(segments.first, 1);
Member:

Should this method get a check or assertion that segments is a non-empty range?

Author:

ConsolidationCandidate only receives the first and last iterators. Previously it didn't play much of a role in deciding the best candidate; it only represented a candidate, and all the decision making was done outside this class.
That is why I left it as the caller's responsibility to ensure that the std::advance operation won't trigger an assertion failure.
I've added a note to the function header.

Comment on lines 168 to 170
const auto addSegment = segments.second + 1;

std::advance(segments.second, 1);
Member:

Should this method get an assertion, checking we have enough space?

Author:

I've left it to be the caller's responsibility to check that before calling push_back() or pop_front().
The reason being that ConsolidationCandidate was designed to only receive the first and last segment iterators by the predecessors. It doesn't get the full sorted_segments vector.
I'll add some documentation to the function.

Comment on lines 197 to 198
template<typename Segment>
bool findBestCleanupCandidate(
Member:

This is only ever used with Segment = tier::SegmentStats if I'm not mistaken; does it need to be a template?

Author:

TBH, I'm conflicted about this myself. I templatized the function to make writing tests easier. For instance, findBestCleanupCandidate() is only concerned with the docs_count and live_docs_count attributes of the segment; we shouldn't have to initialize and pass the entire SegmentStats struct, which comprises a nested hierarchy of structs. So I templatized this function and added an accessor method argument to make it easier to use and to achieve decoupling.

But on the other hand, there is an AddSegment() method in the tests that sets up the complex SegmentStats structure.
Perhaps you can make this decision for me. I can see tradeoffs on both sides.

Comment on lines 308 to 309
template<typename Segment>
bool findBestConsolidationCandidate1(
Member:

Is this a leftover that should be deleted?

Author:

Yes, it was. Sorry about that.
Removed it.

Comment on lines 268 to 269
// sort segments in increasing order of the segment byte size
std::sort(sorted_segments.begin(), sorted_segments.end());
Member:

I don't have a final (nor a strong) opinion on this one; but now that we're using different segment orders in different functions, should we still keep the size-order as the default one via operator<, or should we rather pass an explicit comparison function here as well and remove < from SegmentStats? WDYT? I'm also fine with just leaving it as it is regardless, it's not a real issue either way.

Author:

You're right. It made sense to have the operator < in the past. I've removed it now from SegmentStats.

Comment on lines 288 to 291
continue;
}

if (candidate.mergeScore > prev_score ||
Member:

Nit-pick, for consistency:

Suggested change:
-    continue;
-  }
-
-  if (candidate.mergeScore > prev_score ||
+  } else if (candidate.mergeScore > prev_score ||


while ((candidate.first() + 1) < sorted_segments.end()) {

  if (!best.initialized || (best.mergeScore > candidate.mergeScore && candidate.mergeBytes <= max_segments_bytes))
Member:

best will possibly be initialized with an invalid candidate (that violates the size limit). This will later prevent valid candidates from being selected if they have a worse score.

So I suggest

Suggested change:
-if (!best.initialized || (best.mergeScore > candidate.mergeScore && candidate.mergeBytes <= max_segments_bytes))
+if (candidate.mergeBytes <= max_segments_bytes && (!best.initialized || best.mergeScore > candidate.mergeScore))

Member:

And I think the candidates checked here can also be below the min window size, though I haven't checked whether this can cause a problem or not.

Comment on lines 281 to 302
while ((candidate.first() + 1) < sorted_segments.end()) {
  if (!best.initialized || (best.mergeScore > candidate.mergeScore &&
                            candidate.mergeBytes <= max_segments_bytes))
    best = candidate;

  if (std::distance(candidate.first(), candidate.last()) < (minWindowSize - 1)) {
    candidate.push_back();
    continue;
  }

  if (candidate.mergeScore > prev_score ||
      candidate.mergeBytes > max_segments_bytes ||
      candidate.last() == (sorted_segments.end() - 1)) {
    prev_score = candidate.mergeScore;
    candidate.pop_front();
  } else if (candidate.mergeScore <= prev_score &&
             candidate.last() < (sorted_segments.end() - 1) &&
             candidate.mergeBytes <= max_segments_bytes) {
    prev_score = candidate.mergeScore;
    candidate.push_back();
  }
}
Member:

I don't quite understand the implementation (which may just be me). I've tried to consolidate it with my own picture of the same algorithm, which goes very roughly like this:

auto left = sorted_segments.begin();
auto best = nullopt;

for(auto right = sorted_segments.begin() + 1; right < sorted_segments.end(); ++right) {
  // shrink candidate set from the left until the size limit is undercut,
  // or until there are only two segments left
  while(estimatedSize(left, right) > maxSize && left + 1 < right) {
    ++left;
  }
  if (estimatedSize(left, right) > maxSize) {
    assert(left + 1 == right);
    // no more valid candidates possible due to size
    break;
  }
  if (!best || skew(best) > skew(left, right)) {
    best = (left, right);
  }
}

I'm uncertain what you're using prev_score for, and relatedly haven't quite grasped when the front or back of the candidate are moved.

Member:

As discussed and to document it, the condition for the best candidate selection in the above algorithm is incorrect. It rather needs to be

if (skew(left, right) <= skew_threshhold && (!best || estimatedSize(best) < estimatedSize(left, right))) {
  best = (left, right);
}

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot



struct ConsolidationConfig {
  static constexpr size_t candidate_size { 2 }; // candidate selection window size: 4
  static constexpr double maxMergeScore { 0.4 }; // max score allowed for candidates consolidation.

Bug: Candidate Size Mismatch Causes Suboptimal Consolidation

The candidate_size constant is set to 2, but its comment and the PR description indicate an intended value of 4. This means the consolidation algorithm uses a minimum candidate selection window of 2 instead of the intended 4, which may lead to suboptimal consolidation decisions.


uint64_t minWindowSize { tier::ConsolidationConfig::candidate_size };
auto front = segments.begin();
auto rear = front + minWindowSize - 1;
tier::ConsolidationCandidate<Segment> candidate(front, rear, getSegmentAttributes);

Bug: Template Function Fails Vector Size Validation

The findBestConsolidationCandidate template function doesn't validate that the segments vector has at least minWindowSize (2) elements. If the vector is smaller, an invalid iterator is passed to the ConsolidationCandidate constructor, causing undefined behavior when its loop dereferences it. This affects direct calls, such as in tests, that may lack external size checks.

