Description
Correlated NOT EXISTS
queries like
SELECT * FROM table_a a WHERE NOT EXISTS (SELECT * FROM table_b b WHERE a.x = b.y AND a.i < b.j)
are rewritten to:
Filter(not exists)
|
Projection(exists := COALESCE(subquery_or, false))
|
Aggregation([a.*], subquery_or := boolean_or(subquery_true))
|
LeftJoin[a.x = b.y AND a.i < b.j]
|
---- Probe
|
---- Project[b.*, subquery_true := TRUE] -- Build
The LeftJoin
in the plan above will produce row for every match. However, only single match is required to determine that exists == true
for particular probe row.
To improve this we could add a new JoinNode
flag, e.g: boolean singleMatch
which would stop enumerating matches after a single hit.
Perhaps we could refactor or remove JoinNode#maySkipOutputDuplicates
, which doesn't help here because join operator cannot stop enumerating matches when maySkipOutputDuplicates==true
since subquery_true
is not part of equi-clauses (see
Alternatively, we could determine that subquery_true
is constant expression, which would allow us to skip output duplicates when maySkipOutputDuplicates==true
. That might require having "traits" as part of Reference
(cc @martint )
cc @Dith3r