Skip to content

feat: Use NestedLoopJoin instead of HashJoin/SortMergeJoin for small tables #16450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jonathanc-n
Copy link
Contributor

@jonathanc-n jonathanc-n commented Jun 18, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

We want to support equijoins in NestedLoopJoin in the case where one of the tables in the join is very small.

What changes are included in this PR?

I have added a nested_loop_equijoin_threshold to the OptimizerOptions which has a default value of 5 (same as DuckDB, here). This is the threshold for the number of rows that can be in either table so that the physical planner will choose a NestedLoopJoinExec over SortMergeJoin and HashJoin.

If either table has less than 5 rows then we will pass the join_on expressions to the join_filter.

Are these changes tested?

By existing tests

@github-actions github-actions bot added core Core DataFusion crate common Related to common crate labels Jun 18, 2025
@jonathanc-n
Copy link
Contributor Author

I will try to run a benchmark on a table with smaller rows and return the result when finished.

@jonathanc-n jonathanc-n changed the title feat: Support Equijoin Expressions in NestedLoopJoin feat: Support equijoins in NestedLoopJoin Jun 18, 2025
@jonathanc-n jonathanc-n marked this pull request as draft June 18, 2025 20:03
@jonathanc-n jonathanc-n marked this pull request as ready for review June 18, 2025 22:18
// If we can use nested loop join then we will combine the expressions in `join_on`
// and pass it into the join filter; create your join filters normally otherwise.
let join_filter: Option<JoinFilter> = if use_nested_loop_join_equijoin {
let combined_join_on_expression: Expr = filter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that when we combine the JoinOn expressions here it will cause an error when both sides in the expression have the same unqualified name leading to duplicate unqualified fields. Is there a function that is able to qualify it with the schema, I can't seem to find one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, this is originally not a problem when the expression is in JoinOn as each unqualified column is referring to their own table before being combined into one expression.

@jonathanc-n
Copy link
Contributor Author

The upside is that it performs well when both tables are extremely small < 50 rows 😆

@xudong963
Copy link
Member

Maybe it'll be clear to change the title as "Use NestedLoopJoin instead of HashJoin/SortMergeJoin for small tables", I'm confused when I saw the PR title first.

@xudong963
Copy link
Member

The upside is that it performs well when both tables are extremely small < 50 rows 😆

Do we have some benchmark results?

@jonathanc-n jonathanc-n changed the title feat: Support equijoins in NestedLoopJoin feat: Use NestedLoopJoin instead of HashJoin/SortMergeJoin for small tables Jun 19, 2025
@jonathanc-n
Copy link
Contributor Author

jonathanc-n commented Jun 19, 2025

@xudong963 These were tests run with one of the sides having 5 rows:

Click to expand

joins/HashJoin/l=16_r=5 time:   [9.5541 µs 9.6068 µs 9.6640 µs]
                       change: [-1.2426% -0.2737% +0.6768%] (p = 0.57 > 0.05)
                       No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
 5 (5.00%) low mild
 8 (8.00%) high mild
 2 (2.00%) high severe
joins/NestedLoopJoin/l=16_r=5
                       time:   [8.3347 µs 8.4427 µs 8.5472 µs]
                       change: [+16.951% +17.961% +19.019%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=64_r=5 time:   [9.7575 µs 9.8982 µs 10.029 µs]
                       change: [-6.8033% -2.8109% -0.1691%] (p = 0.11 > 0.05)
                       No change in performance detected.
Found 30 outliers among 100 measurements (30.00%)
 8 (8.00%) low severe
 5 (5.00%) low mild
 2 (2.00%) high mild
 15 (15.00%) high severe
joins/NestedLoopJoin/l=64_r=5
                       time:   [10.104 µs 10.157 µs 10.228 µs]
                       change: [+12.067% +12.951% +13.830%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=256_r=5
                       time:   [10.351 µs 10.460 µs 10.576 µs]
                       change: [-0.9628% +0.0307% +1.0802%] (p = 0.95 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=256_r=5
                       time:   [20.469 µs 20.519 µs 20.577 µs]
                       change: [+8.3494% +9.3946% +10.285%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
 3 (3.00%) low mild
 3 (3.00%) high severe
joins/HashJoin/l=1024_r=5
                       time:   [24.460 µs 24.713 µs 24.901 µs]
                       change: [+21.363% +31.250% +41.778%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
 16 (16.00%) low severe
 3 (3.00%) low mild
 4 (4.00%) high mild
joins/NestedLoopJoin/l=1024_r=5
                       time:   [67.322 µs 67.751 µs 68.268 µs]
                       change: [+6.0403% +8.0079% +9.5998%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
 1 (1.00%) high mild
 2 (2.00%) high severe
joins/HashJoin/l=4096_r=5
                       time:   [22.298 µs 22.814 µs 23.411 µs]
                       change: [-4.7502% -0.0368% +4.6566%] (p = 0.99 > 0.05)
                       No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
 5 (5.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=4096_r=5
                       time:   [258.64 µs 259.56 µs 260.40 µs]
                       change: [+5.1991% +8.4282% +10.949%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
 13 (13.00%) low severe
 3 (3.00%) low mild
 5 (5.00%) high mild
 1 (1.00%) high severe
joins/HashJoin/l=32768_r=5
                       time:   [118.18 µs 120.61 µs 124.07 µs]
                       change: [-0.5396% +0.8988% +2.6146%] (p = 0.26 > 0.05)
                       No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
 3 (3.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=32768_r=5
                       time:   [2.3731 ms 2.4175 ms 2.4715 ms]
                       change: [-2.4822% +2.7216% +7.0548%] (p = 0.29 > 0.05)
                       No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
 6 (6.00%) high mild
 5 (5.00%) high severe
joins/HashJoin/l=5_r=16 time:   [9.3953 µs 9.4891 µs 9.5818 µs]
                       change: [+0.2660% +2.3151% +3.8663%] (p = 0.01 < 0.05)
                       Change within noise threshold.
joins/NestedLoopJoin/l=5_r=16
                       time:   [8.4094 µs 8.4620 µs 8.5189 µs]
                       change: [+7.3597% +13.271% +16.947%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
 2 (2.00%) low mild
joins/HashJoin/l=5_r=64 time:   [10.281 µs 10.304 µs 10.322 µs]
                       change: [+5.5270% +7.6174% +9.6222%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) low mild
joins/NestedLoopJoin/l=5_r=64
                       time:   [10.511 µs 10.644 µs 10.845 µs]
                       change: [+7.7402% +11.531% +14.867%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
 1 (1.00%) low mild
 2 (2.00%) high mild
 1 (1.00%) high severe
joins/HashJoin/l=5_r=256
                       time:   [9.8597 µs 9.9384 µs 10.007 µs]
                       change: [-1.8462% -0.6659% +0.4999%] (p = 0.29 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=5_r=256
                       time:   [21.108 µs 21.195 µs 21.280 µs]
                       change: [+1.5413% +6.7273% +10.204%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=5_r=1024
                       time:   [11.088 µs 11.182 µs 11.279 µs]
                       change: [-0.8211% +0.3052% +1.3850%] (p = 0.59 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=5_r=1024
                       time:   [69.937 µs 76.254 µs 83.638 µs]
                       change: [+5.7682% +9.4840% +14.122%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
 4 (4.00%) high severe
joins/HashJoin/l=5_r=4096
                       time:   [16.550 µs 16.633 µs 16.713 µs]
                       change: [-1.4548% -0.7755% -0.0668%] (p = 0.03 < 0.05)
                       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/NestedLoopJoin/l=5_r=4096
                       time:   [271.97 µs 272.96 µs 273.65 µs]
                       change: [+13.243% +14.024% +14.795%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
 7 (7.00%) low severe
 1 (1.00%) low mild
 5 (5.00%) high mild
 2 (2.00%) high severe
joins/HashJoin/l=5_r=32768
                       time:   [79.925 µs 81.041 µs 82.978 µs]
                       change: [+2.3887% +3.6331% +5.0589%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
 1 (1.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=5_r=32768
                       time:   [2.6531 ms 2.7439 ms 2.8609 ms]
                       change: [+9.0545% +13.275% +18.591%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
 4 (4.00%) high mild
 4 (4.00%) high severe

Interesting I noticed I wasn't using a filter for the NestedLoopJoin and it was hardly performing faster. The benchmarks I just sent were the benchmarks using the filter for NLJ, which is why there are some regressions. I'll see if I can take some time to see how we can speed up NestedLoopJoin.

The reason why NestedLoopJoin is so slow is due to the cartesian product which is being calculated + running the filter through all the indices (which is also memory expensive). It would probably be faster to just keep one side in memory if possible. and have the other side run a block nested loop join on it. In that case, equijoins tend to be much faster on a smaller table.

@jonathanc-n
Copy link
Contributor Author

I'll try to open a pull request later for creating a performance bench file for specifically benchmarking joins.

@jonathanc-n jonathanc-n marked this pull request as draft June 20, 2025 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants