-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Fix ReplaceMissingFieldsWithNull #125764
ESQL: Fix ReplaceMissingFieldsWithNull #125764
Conversation
The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that.
When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754.
Hi @alex-spies, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alex-spies, I added a couple of things that I can think of.
.../org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
Show resolved
Hide resolved
.../org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
Show resolved
Hide resolved
Thanks for the remarks @fang-xing-esql . I'll add a couple of comments to the code so that we don't have to scratch our heads with the same questions in the future. |
This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows.
Hi @alex-spies, I've updated the changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @alex-spies !
…-fields-with-null
…-fields-with-null
Last CI run was green except for elasticsearch-ci/part-1 timing out:
Restarted CI as there was a merge conflict now. |
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * Update docs/changelog/125764.yaml
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <[email protected]>
Ah yeah, publishing the build scan failed:
|
…lastic#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…lastic#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
… (#126186) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
* [8.18] ESQL: ESQL: Fix ReplaceMissingFieldsWithNull (#125764) (#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java * Re-instate fix for LOOKUP JOIN, update tests
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update docs/changelog/125764.yaml * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Update docs/changelog/125764.yaml * Add comments
Fix #126030
Fix #126036
Fix #121754
There are multiple problems with
ReplaceMissingFieldsWithNull
:When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. But:
There are multiple ways of dealing with this:
InsertFieldExtraction
. This will be correct and simple, but we will miss out on important logical optimizations, namely turning theEsRelation
into aLocalRelation
in case all fields are missing, or propagating the literal nulls created in place of the missing field (enabling more constant folding etc.).InsertFieldExtraction
, but in the local logical optimizer. In addition to the duplicated logic, this has a risk of accidentally messing up the ordering of the output fields (Evals place their attributes always on the right hand side), which could cause serious problems unless there's a Project/Stats somewhere downstream (which is probably always given, but still).This PR uses approach 1.ii. (Approach 1.i would also be possible and is arguably more natural. It would require double checking that we don't miss out on some optimizations, though.)