Improve performance of correlated NOT EXISTS queries

Correlated `NOT EXISTS` queries like
```
SELECT * FROM table_a a WHERE NOT EXISTS (SELECT * FROM table_b b WHERE a.x = b.y AND a.i < b.j)
```
are rewritten to:
```
Filter(not exists)
|
Projection(exists := COALESCE(subquery_or, false))
|
Aggregation([a.*], subquery_or := boolean_or(subquery_true))
|
LeftJoin[a.x = b.y AND a.i < b.j]
|
---- Probe
|
---- Project[b.*, subquery_true := TRUE] -- Build
```

The `LeftJoin` in the plan above will produce a row for every match. However, only a single match is required to determine that `exists == true` for a particular probe row.
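To make the cost concrete, here is a toy Python model of the plan above. It is only a sketch and not Trino code: the function name `not_exists_via_left_join`, the dict-based rows, and grouping by probe row position (instead of `a.*`) are assumptions made for illustration. It returns the rows that survive the filter together with the number of rows the join produced.

```python
def not_exists_via_left_join(table_a, table_b):
    """Toy model of LeftJoin -> Aggregation -> Projection -> Filter (hypothetical helper)."""
    # Build side: group b rows by the equi-join key b.y.
    build = {}
    for b in table_b:
        build.setdefault(b["y"], []).append(b)

    # LeftJoin[a.x = b.y AND a.i < b.j]: one output row per match, or one
    # row with subquery_true = None (NULL) when the probe row has no match.
    joined = []
    for idx, a in enumerate(table_a):
        matches = [b for b in build.get(a["x"], []) if a["i"] < b["j"]]
        if matches:
            joined.extend((idx, True) for _ in matches)
        else:
            joined.append((idx, None))

    # Aggregation: subquery_or := boolean_or(subquery_true) per probe row.
    # Grouping by row position is a simplification of grouping by a.*.
    subquery_or = {}
    for idx, subquery_true in joined:
        if subquery_true:
            subquery_or[idx] = True
        else:
            subquery_or.setdefault(idx, None)

    # Projection: exists := COALESCE(subquery_or, false); Filter: NOT exists.
    result = [
        table_a[idx]
        for idx, flag in subquery_or.items()
        if not (flag if flag is not None else False)
    ]
    return result, len(joined)
```

In this model, a probe row with `k` matching build rows contributes `k` rows to the join output, even though the aggregation collapses them back into a single boolean.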

To improve this, we could add a new `JoinNode` flag, e.g. `boolean singleMatch`, which would make the join stop enumerating matches for a probe row after the first hit.
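A minimal sketch of the intended semantics, written in Python rather than against the actual join operator. The function `probe_left_join`, its `single_match` parameter, and the dict-based build index are all hypothetical and only illustrate the early exit; they do not correspond to existing Trino APIs.

```python
def probe_left_join(probe_rows, build_index, single_match=False):
    """Probe a build index keyed by b.y; return per-row match flags and rows emitted."""
    emitted = 0
    exists = []
    for a in probe_rows:
        matched = False
        # Candidates share the equi-join key (a.x = b.y); the remaining
        # condition a.i < b.j is evaluated per candidate.
        for b in build_index.get(a["x"], ()):
            if a["i"] < b["j"]:
                matched = True
                emitted += 1
                if single_match:
                    # Hypothetical flag: one hit is enough to decide EXISTS.
                    break
        if not matched:
            # A left join still emits the unmatched probe row once.
            emitted += 1
        exists.append(matched)
    return exists, emitted
```

With the flag set, each probe row produces exactly one output row, so the downstream aggregation sees at most one row per group while the match flags stay the same.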

Perhaps we could refactor or remove `JoinNode#maySkipOutputDuplicates`. It doesn't help here: the join operator cannot stop enumerating matches when `maySkipOutputDuplicates==true`, because `subquery_true` is not part of the equi-clauses (see https://github.com/trinodb/trino/blob/dac669f1b917b8940751923e9e9e1fedda7b1ac6/core/trino-main/src/main/java/io/trino/sql/planner/LocalExecutionPlanner.java#L2781).

Alternatively, we could determine that `subquery_true` is a constant expression, which would allow us to skip output duplicates when `maySkipOutputDuplicates==true`. That might require having "traits" as part of `Reference` (cc @martint).

cc @Dith3r 

Improve performance of correlated NOT EXISTS queries #21859
