Simplify predicates in PushDownFilter optimizer rule #16362

Open
wants to merge 4 commits into
base: main
from

xudong963
Member

@xudong963 xudong963 commented Jun 11, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

This PR adds predicate simplification capabilities to the PushDownFilter optimizer rule. Currently, when filters contain redundant predicates (like x > 5 AND x > 6), these are not simplified, leading to unnecessary work during query execution.

The goal is to optimize filter predicates by removing redundant conditions and keeping only the most restrictive ones, which can improve query performance.

What changes are included in this PR?

  • New predicate simplification module (simplify_predicates.rs):

    • Groups predicates by column to identify potential simplifications
    • Handles comparison operators (>, >=, <, <=, =) on single columns
    • Keeps the most restrictive predicate for each column when multiple predicates exist
    • Handles equality conflicts (e.g., x = 5 AND x = 6 becomes false)
    • Supports predicates with literals on either side of the comparison
  • Enhanced push_down_filter.rs:

    • Integrates predicate simplification into the filter pushdown logic
    • Calls simplify_predicates before applying other optimizations
    • Includes error handling for edge cases
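The rules listed above can be sketched on a toy predicate type. This is only an illustration under stated assumptions, not the code in simplify_predicates.rs: `Pred`, `Op`, `pred`, and `simplify_conjunction` are hypothetical names, literals are plain `i64`, the literal is always on the right-hand side, and an equality is not checked against range bounds on the same column.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Gt, GtEq, Lt, LtEq, Eq }

/// Hypothetical stand-in for a `column <op> literal` predicate.
#[derive(Debug, Clone, PartialEq)]
struct Pred { col: String, op: Op, lit: i64 }

fn pred(col: &str, op: Op, lit: i64) -> Pred {
    Pred { col: col.to_string(), op, lit }
}

/// Simplifies an AND-ed list of predicates.
/// Returns `None` when two equalities on one column conflict (the filter is false).
fn simplify_conjunction(preds: Vec<Pred>) -> Option<Vec<Pred>> {
    // Group predicates by column name.
    let mut by_col: BTreeMap<String, Vec<Pred>> = BTreeMap::new();
    for p in preds {
        by_col.entry(p.col.clone()).or_default().push(p);
    }

    let mut out = Vec::new();
    for (_, group) in by_col {
        let (mut lower, mut upper, mut eq) = (None::<Pred>, None::<Pred>, None::<Pred>);
        for p in group {
            match p.op {
                // Tightest lower bound: larger literal wins; on a tie, strict `>` wins.
                Op::Gt | Op::GtEq => {
                    let tighter = match &lower {
                        None => true,
                        Some(cur) => p.lit > cur.lit || (p.lit == cur.lit && p.op == Op::Gt),
                    };
                    if tighter { lower = Some(p); }
                }
                // Tightest upper bound: smaller literal wins; on a tie, strict `<` wins.
                Op::Lt | Op::LtEq => {
                    let tighter = match &upper {
                        None => true,
                        Some(cur) => p.lit < cur.lit || (p.lit == cur.lit && p.op == Op::Lt),
                    };
                    if tighter { upper = Some(p); }
                }
                // Two different equality literals can never both hold.
                Op::Eq => {
                    if let Some(cur) = &eq {
                        if cur.lit != p.lit { return None; }
                    }
                    eq = Some(p);
                }
            }
        }
        out.extend(eq.into_iter().chain(lower).chain(upper));
    }
    Some(out)
}
```

With this sketch, `x > 5 AND x > 6` reduces to the single predicate `x > 6`, and `x = 5 AND x = 6` yields `None`, which a caller could map to a literal false.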

Are these changes tested?

20+ test cases in simplify_predicates.slt

Are there any user-facing changes?

@xudong963 xudong963 marked this pull request as draft June 11, 2025 02:02
@github-actions github-actions bot added the logical-expr (Logical plan and expressions), optimizer (Optimizer rules), and sqllogictest (SQL Logic Tests (.slt)) labels Jun 11, 2025
@xudong963 xudong963 force-pushed the simplify_predicates branch from f12788d to fee2358 Compare June 11, 2025 02:51
@xudong963 xudong963 added the enhancement (New feature or request) label Jun 11, 2025
let new_predicates = simplify_predicates(predicate)?;
if old_predicate_len != new_predicates.len() {
    let Some(new_predicate) = conjunction(new_predicates) else {
        return plan_err!("at least one expression exists");
Member

Can new_predicates be empty?

Member Author

No, as long as the original filter's predicate is not empty, simplify_predicates will keep at least one predicate.

Member

The caller of that function doesn't need to hard-code this assumption.
What if simplify_predicates realizes that the only predicate is trivially true and can be removed (for example, x = x where we know x is not null)?

Member Author

Currently simplify_predicates doesn't have that ability, but it should. Makes sense, I'll update it.

Member Author

Updated in 883f562
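One way a caller can avoid hard-coding the "at least one predicate" assumption is to treat an empty result as "the filter is always true" and drop the filter node. The sketch below is a toy model of that idea and does not claim to reproduce what the commit above changed: `Plan` and `rebuild_filter` are hypothetical, predicates are plain strings, and `conjunction` here is a local stand-in that only mirrors the Option-returning shape seen in the reviewed snippet.

```rust
/// Hypothetical, heavily reduced logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
}

/// Local stand-in: joins predicates with AND, or returns `None` for an empty list.
fn conjunction(preds: Vec<String>) -> Option<String> {
    preds.into_iter().reduce(|a, b| format!("{a} AND {b}"))
}

/// Rebuilds the filter from simplified predicates.
/// An empty list means nothing is left to check, so the input plan is returned as-is
/// instead of raising a planning error.
fn rebuild_filter(simplified: Vec<String>, input: Plan) -> Plan {
    match conjunction(simplified) {
        Some(predicate) => Plan::Filter { predicate, input: Box::new(input) },
        None => input,
    }
}
```

The design choice is that the empty case becomes an ordinary outcome of simplification rather than an invariant the caller has to trust.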

Comment on lines +782 to +785
let predicate = split_conjunction_owned(filter.predicate.clone());
let old_predicate_len = predicate.len();
let new_predicates = simplify_predicates(predicate)?;
Member

simplify_predicates operates on predicates of the form col comparison_operator literal.
Sometimes these are tied together in conjuncts (a > -5 AND a < 5) or in disjuncts (a < -5 OR a > 5).
Would it make sense to be able to process per-column predicates in both forms?

To do that, we could have a function Expr -> captured per-column predicates, and then be able to combine the results of such functions on AND and on OR.
This might be related: https://github.com/trinodb/trino/blob/232916b75d415a5eb643cf922492eb8513d99aae/core/trino-main/src/main/java/io/trino/sql/planner/DomainTranslator.java#L365
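A rough sketch of that idea, assuming an integer column so that every comparison maps to inclusive ranges: each predicate becomes a small "domain", AND intersects domains, and OR unions them. The names `Domain`, `gt`, `lt`, `and`, and `or` are made up for illustration; this is neither DataFusion's nor Trino's API, and it ignores NULL handling.

```rust
/// Sorted, non-overlapping inclusive ranges `(lo, hi)` of one integer column.
type Domain = Vec<(i64, i64)>;

/// Domain of `col > v`.
fn gt(v: i64) -> Domain {
    if v == i64::MAX { vec![] } else { vec![(v + 1, i64::MAX)] }
}

/// Domain of `col < v`.
fn lt(v: i64) -> Domain {
    if v == i64::MIN { vec![] } else { vec![(i64::MIN, v - 1)] }
}

/// AND: keep only the values present in both domains.
fn and(a: &[(i64, i64)], b: &[(i64, i64)]) -> Domain {
    let mut out = Vec::new();
    for &(alo, ahi) in a {
        for &(blo, bhi) in b {
            let (lo, hi) = (alo.max(blo), ahi.min(bhi));
            if lo <= hi {
                out.push((lo, hi));
            }
        }
    }
    out
}

/// OR: merge all ranges, fusing those that overlap or are adjacent.
fn or(a: &[(i64, i64)], b: &[(i64, i64)]) -> Domain {
    let mut all: Vec<(i64, i64)> = a.iter().chain(b.iter()).copied().collect();
    all.sort();
    let mut out: Domain = Vec::new();
    for (lo, hi) in all {
        if let Some(last) = out.last_mut() {
            if lo <= last.1.saturating_add(1) {
                last.1 = last.1.max(hi);
                continue;
            }
        }
        out.push((lo, hi));
    }
    out
}
```

For example, `and(&gt(-5), &lt(5))` gives the single range `(-4, 4)`, while `or(&lt(-5), &gt(5))` keeps two separate ranges. An empty domain corresponds to a predicate that is never true.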

Member Author

@xudong963 xudong963 Jun 11, 2025

a > -5 AND a < 5 will be split into [a > -5, a < 5]. It seems that the two split predicates can't be simplified further, that is, we should keep both.
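The splitting step can be pictured with a toy expression tree. This is a sketch under assumptions: `Expr` and `split_and` are hypothetical and only imitate what a helper like split_conjunction_owned is described as doing here. Leaf predicates are kept as strings, and an OR node is returned as a single opaque conjunct.

```rust
/// Hypothetical, heavily reduced expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Pred(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Flattens nested ANDs into a list of conjuncts.
/// Anything that is not an AND (including an OR subtree) stays in one piece.
fn split_and(expr: Expr) -> Vec<Expr> {
    match expr {
        Expr::And(left, right) => {
            let mut parts = split_and(*left);
            parts.extend(split_and(*right));
            parts
        }
        other => vec![other],
    }
}
```

Under this model, `a > -5 AND a < 5` yields two entries, one lower bound and one upper bound, so there is nothing redundant to drop.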

Member

Sure, but if I can have a < -5 OR a > 5 then maybe I can have a < -5 OR a > 5 OR a < -10 OR a > 10. This is simplifiable.
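To make that concrete: within an OR over a single column, the least restrictive bound on each side subsumes the others, which is the mirror image of the AND case. The sketch below assumes strict integer bounds only; `Bound` and `simplify_disjunction` are hypothetical names, and it does not detect the case where the two remaining sides overlap and cover every value.

```rust
/// Hypothetical one-sided bound on a single column.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bound {
    Lt(i64),
    Gt(i64),
}

/// Simplifies an OR-ed list of bounds on one column by keeping the loosest bound per side.
fn simplify_disjunction(bounds: &[Bound]) -> Vec<Bound> {
    let mut lt: Option<i64> = None;
    let mut gt: Option<i64> = None;
    for b in bounds {
        match *b {
            // `a < 3 OR a < 7` is just `a < 7`: the larger literal wins.
            Bound::Lt(v) => lt = Some(lt.map_or(v, |cur| cur.max(v))),
            // `a > 3 OR a > 7` is just `a > 3`: the smaller literal wins.
            Bound::Gt(v) => gt = Some(gt.map_or(v, |cur| cur.min(v))),
        }
    }
    lt.map(Bound::Lt).into_iter().chain(gt.map(Bound::Gt)).collect()
}
```

Applied to `a < -5 OR a > 5 OR a < -10 OR a > 10`, this keeps only `a < -5` and `a > 5`.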

Member Author

Okay, good point. We can do this in a follow-up PR.

@xudong963 xudong963 marked this pull request as ready for review June 20, 2025 02:55
@xudong963 xudong963 force-pushed the simplify_predicates branch from fee2358 to 883f562 Compare June 20, 2025 02:56
@xudong963 xudong963 requested a review from findepi June 20, 2025 02:57
@xudong963 xudong963 force-pushed the simplify_predicates branch from 883f562 to 92d9658 Compare June 20, 2025 05:18
@alamb alamb changed the title from "Simplify predicates in filter" to "Simplify predicates in PushDownFilter optimizer rule" Jun 21, 2025
@alamb
Contributor

alamb commented Jun 21, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing simplify_predicates (92d9658) to accd225 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=simplify_predicates
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 21, 2025

🤖: Benchmark completed

Details

group                                         main                                   simplify_predicates
-----                                         ----                                   -------------------
logical_aggregate_with_join                   1.00    618.2±2.68µs        ? ?/sec    1.00    619.8±7.85µs        ? ?/sec
logical_select_all_from_1000                  1.00     11.3±0.02ms        ? ?/sec    1.01     11.3±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00    405.9±4.62µs        ? ?/sec    1.00    404.5±1.34µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    364.7±1.15µs        ? ?/sec    1.00    364.4±1.86µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01    352.0±2.36µs        ? ?/sec    1.00    350.2±1.06µs        ? ?/sec
physical_intersection                         1.00    819.4±7.01µs        ? ?/sec    1.00    819.6±4.50µs        ? ?/sec
physical_join_consider_sort                   1.00  1351.1±10.79µs        ? ?/sec    1.00   1354.8±6.31µs        ? ?/sec
physical_join_distinct                        1.00    341.3±1.73µs        ? ?/sec    1.01    343.4±1.94µs        ? ?/sec
physical_many_self_joins                      1.00      9.9±0.03ms        ? ?/sec    1.00      9.9±0.03ms        ? ?/sec
physical_plan_clickbench_all                  1.00    137.3±1.28ms        ? ?/sec    1.00    136.7±1.88ms        ? ?/sec
physical_plan_clickbench_q1                   1.00  1661.9±26.75µs        ? ?/sec    1.00  1669.5±18.88µs        ? ?/sec
physical_plan_clickbench_q10                  1.00      2.4±0.03ms        ? ?/sec    1.01      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      2.5±0.02ms        ? ?/sec    1.01      2.5±0.04ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      2.7±0.04ms        ? ?/sec    1.01      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      2.3±0.04ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      2.5±0.02ms        ? ?/sec    1.02      2.5±0.12ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      2.3±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q17                  1.01      2.4±0.04ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q18                  1.00  1923.1±20.52µs        ? ?/sec    1.00  1925.9±20.20µs        ? ?/sec
physical_plan_clickbench_q19                  1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q2                   1.00  1888.4±18.61µs        ? ?/sec    1.00  1897.5±19.28µs        ? ?/sec
physical_plan_clickbench_q20                  1.00  1653.4±22.79µs        ? ?/sec    1.00  1656.0±15.88µs        ? ?/sec
physical_plan_clickbench_q21                  1.00  1930.0±14.67µs        ? ?/sec    1.01  1941.7±20.54µs        ? ?/sec
physical_plan_clickbench_q22                  1.00      2.5±0.04ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      3.2±0.03ms        ? ?/sec    1.01      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      2.0±0.02ms        ? ?/sec    1.01      2.0±0.03ms        ? ?/sec
physical_plan_clickbench_q26                  1.01  1903.6±24.56µs        ? ?/sec    1.00  1882.9±31.59µs        ? ?/sec
physical_plan_clickbench_q27                  1.01      2.1±0.03ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_clickbench_q28                  1.02      2.8±0.03ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q29                  1.01      3.4±0.06ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q3                   1.00  1866.6±26.77µs        ? ?/sec    1.01  1879.0±27.19µs        ? ?/sec
physical_plan_clickbench_q30                  1.00     11.8±0.11ms        ? ?/sec    1.02     12.0±0.29ms        ? ?/sec
physical_plan_clickbench_q31                  1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      2.8±0.03ms        ? ?/sec    1.01      2.8±0.05ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q34                  1.01      2.1±0.03ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
physical_plan_clickbench_q35                  1.02      2.2±0.03ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q36                  1.00      2.9±0.03ms        ? ?/sec    1.01      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      2.8±0.01ms        ? ?/sec    1.02      2.9±0.05ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q4                   1.00  1613.1±15.17µs        ? ?/sec    1.02  1643.5±22.44µs        ? ?/sec
physical_plan_clickbench_q40                  1.01      3.3±0.05ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      2.8±0.03ms        ? ?/sec    1.01      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      2.8±0.04ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q43                  1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
physical_plan_clickbench_q44                  1.00  1762.7±14.84µs        ? ?/sec    1.00  1762.1±22.21µs        ? ?/sec
physical_plan_clickbench_q45                  1.00  1763.3±19.24µs        ? ?/sec    1.00  1769.1±18.83µs        ? ?/sec
physical_plan_clickbench_q46                  1.02      2.2±0.04ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_clickbench_q48                  1.01      3.3±0.02ms        ? ?/sec    1.00      3.3±0.06ms        ? ?/sec
physical_plan_clickbench_q49                  1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q5                   1.00  1838.0±19.96µs        ? ?/sec    1.00  1839.2±20.46µs        ? ?/sec
physical_plan_clickbench_q50                  1.00      3.1±0.03ms        ? ?/sec    1.01      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q51                  1.01      2.3±0.02ms        ? ?/sec    1.00      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q52                  1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
physical_plan_clickbench_q6                   1.00  1858.1±16.46µs        ? ?/sec    1.01  1869.0±27.71µs        ? ?/sec
physical_plan_clickbench_q7                   1.00  1686.0±21.88µs        ? ?/sec    1.00  1687.5±21.32µs        ? ?/sec
physical_plan_clickbench_q8                   1.00      2.4±0.03ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
physical_plan_clickbench_q9                   1.00      2.3±0.02ms        ? ?/sec    1.00      2.3±0.03ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1020.5±5.50ms        ? ?/sec    1.01   1028.8±6.16ms        ? ?/sec
physical_plan_tpch_all                        1.00     61.9±0.24ms        ? ?/sec    1.01     62.8±1.22ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.0±0.01ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.8±0.02ms        ? ?/sec    1.01      3.8±0.06ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.2±0.01ms        ? ?/sec    1.01      3.3±0.04ms        ? ?/sec
physical_plan_tpch_q12                        1.00   1791.0±8.70µs        ? ?/sec    1.01  1817.7±27.27µs        ? ?/sec
physical_plan_tpch_q13                        1.00  1455.0±22.49µs        ? ?/sec    1.00  1453.7±27.60µs        ? ?/sec
physical_plan_tpch_q14                        1.00   1923.0±9.62µs        ? ?/sec    1.01   1938.0±9.50µs        ? ?/sec
physical_plan_tpch_q16                        1.00      2.4±0.01ms        ? ?/sec    1.01      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.4±0.01ms        ? ?/sec    1.02      2.4±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.00      2.7±0.01ms        ? ?/sec    1.01      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.00      3.2±0.05ms        ? ?/sec    1.03      3.3±0.03ms        ? ?/sec
physical_plan_tpch_q2                         1.00      5.4±0.06ms        ? ?/sec    1.01      5.5±0.03ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.1±0.01ms        ? ?/sec    1.01      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      4.0±0.03ms        ? ?/sec    1.01      4.1±0.04ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.7±0.03ms        ? ?/sec    1.01      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.5±0.01ms        ? ?/sec    1.01      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.00  1517.1±13.22µs        ? ?/sec    1.01  1536.1±19.13µs        ? ?/sec
physical_plan_tpch_q5                         1.00      3.1±0.03ms        ? ?/sec    1.01      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.00   861.0±18.71µs        ? ?/sec    1.01    869.5±4.94µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.2±0.02ms        ? ?/sec    1.01      4.2±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.1±0.02ms        ? ?/sec    1.01      5.2±0.05ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.1±0.05ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
physical_select_aggregates_from_200           1.01     17.7±0.09ms        ? ?/sec    1.00     17.5±0.04ms        ? ?/sec
physical_select_all_from_1000                 1.00     24.7±0.08ms        ? ?/sec    1.01     24.9±0.09ms        ? ?/sec
physical_select_one_from_700                  1.01   1034.0±5.41µs        ? ?/sec    1.00   1028.5±8.41µs        ? ?/sec
physical_sorted_union_orderby                 1.00     41.9±0.25ms        ? ?/sec    1.00     42.0±0.21ms        ? ?/sec
physical_theta_join_consider_sort             1.00  1715.1±10.64µs        ? ?/sec    1.01  1729.3±12.80µs        ? ?/sec
physical_unnest_to_join                       1.00   1275.1±4.14µs        ? ?/sec    1.01  1287.7±17.00µs        ? ?/sec
with_param_values_many_columns                1.00    129.8±1.04µs        ? ?/sec    1.00    129.7±1.12µs        ? ?/sec

Labels
enhancement (New feature or request), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt))
3 participants