Optimize simple time ranged search queries #5759

tontinton · 2025-04-19T17:56:28Z

Previously, when the search request contained a time range, we skipped the optimization that converts unneeded split searches into count queries. This PR applies that optimization to simple time-ranged queries as well.

Removed the TODO that was in the code for this.

Split the PR in two; the other half is #5758.

rdettai

Thanks for your PR. The way the logic is split here between optimize_split_order and optimize makes it very hard to proofread. It's already a bit confusing, because the optimize() logic for each variant depends on the sort order, which is different for each variant. It would be good if we could avoid tightening the coupling between the two match statements even further by making the sort orders more complex.

rdettai · 2025-04-22T09:40:07Z

quickwit/quickwit-search/src/leaf.rs

        let min_required_splits = splits
            .iter()
+            // splits are sorted by whether they are contained in the request time range
+            .filter(|split| Self::is_contained(split, &request))
            .map(|split| split.num_docs)
            // computing the partial sum
            .scan(0u64, |partial_sum: &mut u64, num_docs_in_split: u64| {


Can you explain why this works? Say you have 5 splits:

1 is contained in the request

4 are only overlapping the request

In that case min_required_splits would be 1 here, but there might be a biggest_end_timestamp or smallest_start_timestamp in any of the other splits.

My thinking was that there's a

.take_while(|partial_sum| *partial_sum < num_requested_docs)

so min_required_splits would not end up being 1. But I now see that I should check that we actually reached this condition, and only then set min_required_splits and optimize; otherwise we should return early.
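To illustrate why the check is needed, here is a minimal sketch using a simplified stand-in for the iterator chain. The function name `splits_below_threshold` and the plain `&[u64]` input are assumptions made for this example, not the code in leaf.rs. It shows that the count produced by `scan` + `take_while` is the same whether the running total eventually reaches `num_requested_docs` or the iterator simply runs out of splits.

```rust
/// Hypothetical stand-in for the chain under review (not the leaf.rs code):
/// counts how many running totals stay strictly below `num_requested_docs`.
fn splits_below_threshold(num_docs_per_split: &[u64], num_requested_docs: u64) -> usize {
    num_docs_per_split
        .iter()
        .copied()
        // Running total of the documents seen so far.
        .scan(0u64, |partial_sum, num_docs| {
            *partial_sum += num_docs;
            Some(*partial_sum)
        })
        // Stops at the first running total that reaches the threshold,
        // but also stops silently when there are no more splits.
        .take_while(|partial_sum| *partial_sum < num_requested_docs)
        .count()
}
```

With 100 requested documents, the inputs `[10]` (threshold never reached) and `[10, 200]` (threshold reached by the second split) both produce a count of 1, so the count alone cannot tell the two cases apart.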

I pushed a quick fix. I still need to test it, but I think it addresses the problem.

I didn't know about std::ops::ControlFlow 😄. Nevertheless, this reaches the point where iterators stop being practical. A for loop is much more readable here.
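As a rough sketch of what a for-loop version might look like: the `Split` struct, its `contained_in_time_range` flag, and the `Option<usize>` return type below are all assumptions made for illustration, not the types used in leaf.rs. It also assumes, as the diff comment states, that splits contained in the request time range are sorted first.

```rust
/// Hypothetical, simplified split metadata; the real type has more fields.
struct Split {
    num_docs: u64,
    contained_in_time_range: bool,
}

/// Returns `Some(n)` when the first `n` contained splits hold at least
/// `num_requested_docs` documents, and `None` when the threshold is never
/// reached, so the caller can return early without optimizing.
fn min_required_splits(splits: &[Split], num_requested_docs: u64) -> Option<usize> {
    let mut partial_sum = 0u64;
    for (idx, split) in splits.iter().enumerate() {
        if !split.contained_in_time_range {
            // Contained splits are assumed to be sorted first, so no later
            // split is fully inside the request time range.
            break;
        }
        partial_sum += split.num_docs;
        if partial_sum >= num_requested_docs {
            return Some(idx + 1);
        }
    }
    None
}
```

In the five-split scenario from the review (one contained split, four overlapping ones), this returns `None` unless the single contained split alone holds enough documents.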

tontinton · 2025-04-22T20:32:33Z

Thanks for your PR. The way the logic is split here between optimize_split_order and optimize makes it very hard to proofread. It's already a bit confusing, because the optimize() logic for each variant depends on the sort order, which is different for each variant. It would be good if we could avoid tightening the coupling between the two match statements even further by making the sort orders more complex.

It is a bit confusing. I'll try to think of a better way to do this when I have some more time.

rdettai · 2025-04-23T10:13:22Z

I would try refactoring this into:

fn optimize()
  match self {
    CanSplitDoBetter::SplitIdHigher(_) => optimize_split_id_higher(),
    CanSplitDoBetter::SplitTimestampLower(_) => optimize_split_timestamp_lower(),
    ...
    CanSplitDoBetter::Uninformative => {}
  }

where each optimize_split_xxx:

sorts the splits
returns early if !is_simple_all_query
applies its variant-specific optimization

This would regroup each optimization with the sort it depends on to be correct. (To help the review, ideally re-implement the current logic in one commit, and add your logic extension in a separate commit.)
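A minimal sketch of that shape, under stated assumptions: the `Split` struct, the variant payload types, the function signatures, and the sort orders below are simplified stand-ins chosen for illustration, and the variant-specific optimization is left as a placeholder comment. Only the structure (one function per variant that sorts, returns early, then optimizes) follows the proposal above.

```rust
/// Hypothetical, simplified split metadata for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Split {
    split_id: String,
    timestamp_start: i64,
}

/// Simplified stand-in; payload types are assumptions and only two of the
/// variants named above are modeled.
enum CanSplitDoBetter {
    SplitIdHigher(Option<String>),
    SplitTimestampLower(Option<i64>),
    Uninformative,
}

impl CanSplitDoBetter {
    /// Single dispatch point: each variant owns both its sort and its optimization.
    fn optimize(&self, is_simple_all_query: bool, splits: &mut Vec<Split>) {
        match self {
            CanSplitDoBetter::SplitIdHigher(_) => {
                optimize_split_id_higher(is_simple_all_query, splits)
            }
            CanSplitDoBetter::SplitTimestampLower(_) => {
                optimize_split_timestamp_lower(is_simple_all_query, splits)
            }
            CanSplitDoBetter::Uninformative => {}
        }
    }
}

fn optimize_split_id_higher(is_simple_all_query: bool, splits: &mut Vec<Split>) {
    // 1. Sort: highest split id first (assumed order for this variant).
    splits.sort_unstable_by(|a, b| b.split_id.cmp(&a.split_id));
    // 2. Return early when the query is not a simple "match all" query.
    if !is_simple_all_query {
        return;
    }
    // 3. Variant-specific optimization would go here (placeholder).
}

fn optimize_split_timestamp_lower(is_simple_all_query: bool, splits: &mut Vec<Split>) {
    // 1. Sort: lowest start timestamp first (assumed order for this variant).
    splits.sort_unstable_by_key(|split| split.timestamp_start);
    // 2. Return early when the query is not a simple "match all" query.
    if !is_simple_all_query {
        return;
    }
    // 3. Variant-specific optimization would go here (placeholder).
}
```

Keeping the sort and the optimization in the same function means a reviewer can check the ordering assumption right next to the code that relies on it.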

Thanks!

tontinton mentioned this pull request Apr 19, 2025

Get count from split metadata on simple time range query #5758

Open

tontinton changed the title ~~Optimize time ranged search queries~~ Optimize simple time ranged search queries Apr 19, 2025

tontinton force-pushed the optimize-timestamp-range-simple-search branch 2 times, most recently from 804be29 to 61c3b74 Compare April 19, 2025 19:51

rdettai reviewed Apr 22, 2025

tontinton force-pushed the optimize-timestamp-range-simple-search branch from 61c3b74 to 851c15c Compare April 22, 2025 20:31

tontinton force-pushed the optimize-timestamp-range-simple-search branch from 851c15c to 103b8d9 Compare April 22, 2025 20:44

Optimize time ranged search queries

69e85ff

When the search request contains a time range, we aborted the optimization of converting unneeded split searches into count queries.

tontinton force-pushed the optimize-timestamp-range-simple-search branch from 103b8d9 to 69e85ff Compare April 22, 2025 20:44
