ESQL: Fix ReplaceMissingFieldsWithNull #125764

Merged

@alex-spies alex-spies commented Mar 27, 2025

Fix #126030
Fix #126036
Fix #121754

There are multiple problems with ReplaceMissingFieldsWithNull:

When it encounters a projection, the rule tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute of the missing field. But:

  1. We only insert an Eval if a Project relies on the missing attribute. Other plan nodes may rely on the missing attribute, too.
  2. Even for Projects, we only insert an Eval if we project the field directly; for aliases (e.g. from RENAME), we do nothing.
  3. If multiple Projects use this attribute, we create multiple attributes with the original field attribute's id, which results in a wrong Layout. This triggered ES|QL: NullPointerException planning EVAL #121754. This can even happen with a single Project if a command upstream of it triggers a field extraction for the missing field.
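
To make problem 3 concrete, here is a minimal Java sketch of how a reused name id can break a layout. It assumes that the Layout invariant in question is "every name id maps to exactly one channel"; `ToyLayout`, `append`, and the id value 7 are hypothetical stand-ins for illustration and are not the real Layout API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration; NOT the real ES|QL Layout class.
final class ToyLayout {
    // Assumed invariant: every name id is mapped to exactly one channel.
    private final Map<Integer, Integer> channelByNameId = new HashMap<>();
    private int nextChannel = 0;

    // Assigns the next free channel to the attribute with the given name id.
    // Throws if the name id is already mapped, i.e. if the invariant would break.
    int append(int nameId) {
        Integer existing = channelByNameId.get(nameId);
        if (existing != null) {
            throw new IllegalStateException("name id " + nameId + " is already mapped to channel " + existing);
        }
        int channel = nextChannel++;
        channelByNameId.put(nameId, channel);
        return channel;
    }

    public static void main(String[] args) {
        ToyLayout layout = new ToyLayout();
        int missingFieldId = 7; // hypothetical name id of the missing field
        layout.append(missingFieldId); // first Eval creates a null with the field's id
        try {
            layout.append(missingFieldId); // second Eval reuses the same id
        } catch (IllegalStateException e) {
            System.out.println("invariant violated: " + e.getMessage());
        }
    }
}
```

In this toy model the second `append` is rejected outright. If a layout silently accepted the duplicate instead, two attributes would compete for one id, which is the kind of inconsistency described above.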

There are multiple ways of dealing with this:

  1. simple solutions:
    i. We could move the creation of the Evals into InsertFieldExtraction. This would be correct and simple, but we would miss out on important logical optimizations, namely turning the EsRelation into a LocalRelation if all fields are missing, and propagating the literal nulls created in place of the missing fields (which enables more constant folding etc.).
    ii. We could place an Eval directly on top of any EsRelation that has missing fields, introducing literal nulls in place of the missing attributes early on. This risks emitting null blocks unnecessarily early in the execution, but the logical optimizer should be able to deal with this.
  2. more complex solutions:
    i. We could avoid replacing the same field attribute twice. This does not address the problem that upstream query plan nodes might trigger a field extraction, which would lead to duplicate name ids again.
    ii. We could keep the original logic but use new attribute ids. This is what my first attempt (ESQL: Unique ids when replacing missing fields #125656) does. It seems to work fine, but downstream nodes need updating to use the Eval's newly created ReferenceAttributes instead of the original FieldAttributes, at least in terms of the name id of the ReferenceAttribute. However, we cannot simply replace field attributes with reference attributes in downstream nodes, as that can lead to class cast exceptions.
    iii. Instead of injecting Evals just before Projects, we could inject them just before the first query plan node that needs them. This would replicate the logic of InsertFieldExtraction, but in the local logical optimizer. In addition to duplicating logic, this risks accidentally messing up the ordering of the output fields (Evals always place their attributes on the right-hand side), which could cause serious problems unless there's a Project/Stats somewhere downstream (which is probably always the case, but still).
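
The ordering risk mentioned for the last option can be illustrated with a small Java sketch. It models plan output as a list of column names and assumes that a column redefined by an Eval is dropped from its original position before being appended on the right; `EvalOutputSketch` and `evalOutput` are hypothetical names, not part of the ES|QL code base.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration; NOT the real ES|QL Eval class.
final class EvalOutputSketch {
    // Output of an Eval: the child's columns, minus any that the Eval redefines,
    // followed by the Eval's own columns on the right-hand side.
    static List<String> evalOutput(List<String> childOutput, List<String> evalFields) {
        List<String> output = new ArrayList<>(childOutput);
        output.removeAll(evalFields);
        output.addAll(evalFields);
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical columns: "a" is missing and gets replaced by a literal null.
        System.out.println(evalOutput(List.of("a", "b", "c"), List.of("a"))); // prints [b, c, a]
    }
}
```

Under these assumptions, replacing the missing column `a` moves it from the first to the last position, so a node further up the plan that relies on the original order would see different output unless something restores that order.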

This PR uses approach 1.ii. (Approach 1.i would also be possible and is arguably more natural. It would require double checking that we don't miss out on some optimizations, though.)
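
As a rough illustration of approach 1.ii, the following Java sketch wraps a relation in a single Eval that assigns null to every field missing from the local shard. All classes and names here (`NullEvalSketch`, `Relation`, `Eval`, `replaceMissingFields`, the example field names) are toy stand-ins chosen for this example; the actual rule in the PR operates on the real ES|QL plan nodes and may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy stand-ins for illustration; these are NOT the real ES|QL plan classes.
final class NullEvalSketch {
    interface Plan {}

    // Source node: the fields the query expects from the index.
    static final class Relation implements Plan {
        final List<String> fields;

        Relation(List<String> fields) {
            this.fields = fields;
        }
    }

    // Eval node: assigns a literal null to each listed field name.
    static final class Eval implements Plan {
        final Plan child;
        final List<String> nullFields;

        Eval(Plan child, List<String> nullFields) {
            this.child = child;
            this.nullFields = nullFields;
        }
    }

    // Wraps the relation in one Eval that nulls out every field the local shard
    // does not have; returns the relation unchanged if nothing is missing.
    static Plan replaceMissingFields(Relation relation, Set<String> fieldsInShard) {
        List<String> missing = new ArrayList<>();
        for (String field : relation.fields) {
            if (!fieldsInShard.contains(field)) {
                missing.add(field);
            }
        }
        return missing.isEmpty() ? relation : new Eval(relation, missing);
    }

    public static void main(String[] args) {
        Relation relation = new Relation(List.of("emp_no", "salary", "nickname"));
        Plan plan = replaceMissingFields(relation, Set.of("emp_no", "salary"));
        System.out.println(plan instanceof Eval ? "Eval nulls: " + ((Eval) plan).nullFields : "unchanged");
    }
}
```

Because the nulls are introduced once, directly above the source, every downstream node sees the same replacement regardless of whether it is a Project, which is the property that the per-Project approach lacked.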

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.
@alex-spies alex-spies changed the title from "ESQL: Fix replace missing fields with null" to "ESQL: Fix ReplaceMissingFieldsWithNull" Mar 27, 2025
@elasticsearchmachine
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

Member

@fang-xing-esql fang-xing-esql left a comment

Thank you @alex-spies, I added a couple of things that I can think of.

@alex-spies
Contributor Author

Thanks for the remarks, @fang-xing-esql. I'll add a couple of comments to the code so that we don't have to scratch our heads over the same questions in the future.

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.
@elasticsearchmachine
Collaborator

Hi @alex-spies, I've updated the changelog YAML for you.

@alex-spies alex-spies marked this pull request as ready for review April 2, 2025 17:15
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team: ESQL/Aggs/Geo) Apr 2, 2025
Member

@costin costin left a comment

LGTM

@alex-spies alex-spies added the auto-backport (Automatically create backport pull requests when merged), v9.0.0, and v8.18.0 labels Apr 2, 2025
Member

@fang-xing-esql fang-xing-esql left a comment

LGTM, thank you @alex-spies !

@alex-spies alex-spies added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) and removed the auto-backport label (Automatically create backport pull requests when merged) Apr 2, 2025
@alex-spies
Contributor Author

Last CI run was green except for elasticsearch-ci/part-1 timing out:

2025-04-02 20:08:37 CEST	> Task :build-tools:reaper:spotlessJava
2025-04-03 01:07:06 CEST	# Received cancellation signal, interrupting
2025-04-03 01:07:06 CEST	🚨 Error: The command was interrupted by a signal: signal: terminated

Restarted CI as there was a merge conflict now.

alex-spies added a commit that referenced this pull request Apr 3, 2025
* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* Update docs/changelog/125764.yaml
alex-spies added a commit that referenced this pull request Apr 3, 2025
* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
@alex-spies
Contributor Author

The checks here complain that they're still waiting on elasticsearch-ci/part-2, but in Buildkite, this pipeline has already passed.

@alex-spies
Contributor Author

Seems like the reporting of the status of elasticsearch-ci/part-2 may be stuck.

But since the pipeline passed in Buildkite, this is safe to merge.

@alex-spies
Contributor Author

Ah yeah, publishing the build scan failed:

BUILD SUCCESSFUL in 36m 35s
3025 actionable tasks: 1247 executed, 1652 from cache, 126 up-to-date
Publishing build scan...
Publishing build scan failed due to network error 'java.net.SocketTimeoutException: Read timed out' (2 retries remaining)...
Publishing build scan failed due to network error 'java.net.SocketTimeoutException: Read timed out' (1 retry remaining)...
A network error occurred.
If you require assistance with this problem, please report it to your Develocity administrator and include the following information via copy/paste.
----------
Gradle version: 8.13
Plugin version: 3.19.2
Request URL: https://gradle-enterprise.elastic.co/scans/publish/gradle/3.19.2/token
Request ID: dea4cbad-e35c-4b39-af24-a35bcbaf4292
Exception: java.net.SocketTimeoutException: Read timed out
----------

@alex-spies alex-spies merged commit 28a544e into elastic:main Apr 3, 2025
16 of 17 checks passed
@alex-spies alex-spies deleted the fix-replace-missing-fields-with-null branch April 3, 2025 07:26
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 3, 2025
…lastic#126166)

(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 3, 2025
…lastic#126166)

(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 3, 2025
… (#126186)

(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 3, 2025
* [8.18] ESQL: ESQL: Fix ReplaceMissingFieldsWithNull (#125764) (#126166)

(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

* Re-instate fix for LOOKUP JOIN, update tests
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Apr 9, 2025
Labels
:Analytics/ES|QL (AKA ESQL), auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), >bug, Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo), v8.17.5, v8.18.0, v8.19.0, v9.0.0, v9.1.0
4 participants