Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES|QL: NullPointerException planning EVAL #121754

Closed
luigidellaquila opened this issue Feb 5, 2025 · 6 comments · Fixed by #125636 or #125764
Closed

ES|QL: NullPointerException planning EVAL #121754

luigidellaquila opened this issue Feb 5, 2025 · 6 comments · Fixed by #125636 or #125764
Assignees
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@luigidellaquila
Copy link
Contributor

With csv-spec test dataset

from clientips,* 
| dissect street "%{height_range} %{MNyXV}" 
| rename env AS kxpCK, pod AS etUHW, language_code AS city.country.continent.planet.galaxy
| rename city.country.continent.planet.name as message 
| lookup join message_types_lookup on message
| rename languages.int as language_code 
| lookup join languages_lookup on language_code
| rename language_code as language_code 
| lookup join languages_lookup on language_code 
| sort decade DESC NULLS FIRST, year ASC NULLS FIRST, country.keyword ASC, num ASC NULLS LAST, network.bytes_in ASC, author.keyword ASC, languages.byte DESC NULLS LAST, type ASC NULLS LAST, zip_code NULLS FIRST, height_range, number ASC, language_code ASC NULLS LAST, host_group NULLS LAST, MNyXV ASC NULLS FIRST, kxpCK ASC, language.id ASC, languages.long DESC NULLS FIRST, host.name ASC, city.country.continent.name ASC NULLS FIRST, author DESC NULLS LAST, ip0, event_duration ASC NULLS LAST, scalerank ASC, ip1 DESC, client.ip NULLS FIRST, message NULLS FIRST, city.country.name ASC NULLS FIRST, version ASC NULLS FIRST, airport ASC NULLS FIRST, language.code ASC NULLS LAST, language.name.keyword NULLS FIRST, city.name DESC NULLS LAST, city.country.continent.planet.galaxy NULLS FIRST, publisher ASC NULLS FIRST, abbrev DESC NULLS FIRST, card ASC, salary_change.keyword NULLS FIRST, lk ASC NULLS FIRST	

TODO: simplify the query, it's extreme because it's randomly generated

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "Cannot read field \"nameIds\" because \"set\" is null"
            }
        ],
        "type": "null_pointer_exception",
        "reason": "Cannot read field \"nameIds\" because \"set\" is null"
    },
    "status": 500
}
ESQL request failed with status [INTERNAL_SERVER_ERROR]: java.lang.NullPointerException: Cannot read field "nameIds" because "set" is null
        at org.elasticsearch.xpack.esql.planner.Layout$Builder.build(Layout.java:111)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planEval(LocalExecutionPlanner.java:412)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:211)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planProject(LocalExecutionPlanner.java:640)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:217)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planLookupJoin(LocalExecutionPlanner.java:562)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:243)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planEval(LocalExecutionPlanner.java:406)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:211)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planProject(LocalExecutionPlanner.java:640)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:217)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.planLookupJoin(LocalExecutionPlanner.java:562)
        at org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:243)
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@kanoshiou
Copy link
Contributor

I believe I've found a simpler way to reproduce this issue.

curl --request PUT --url http://localhost:9200/index1
curl --request PUT --url http://localhost:9200/index2
curl --request PUT --url http://localhost:9200/index3
curl --request PUT --url http://localhost:9200/index4
curl --request PUT --url http://localhost:9200/index5
curl --request PUT --url http://localhost:9200/index6
curl --request PUT --url http://localhost:9200/index7
curl --request PUT --url http://localhost:9200/index8
curl --request PUT --url http://localhost:9200/index9
curl --request PUT \
  --url http://localhost:9200/index10 \
  --header 'content-type: application/json' \
  --data '{
  "mappings": {
    "properties": {
      "ip0": {
        "type": "ip"
      }
    }
  }
}'

curl --request PUT \
  --url http://localhost:9200/index_lookup \
  --header 'content-type: application/json' \
  --data '{
  "settings": {
    "index": {
      "mode": "lookup"
    }
  },
  "mappings": {
    "properties": {
      "a": {
        "type": "integer"
      }
    }
  }
}'

curl --request POST \
  --url 'http://localhost:9200/_query?format=txt' \
  --header 'content-type: application/json' \
  --data '{
  "query": "from * | lookup join index_lookup on a | sort a"
}'

@alex-spies
Copy link
Contributor

Thanks a lot @kanoshiou , this is excellent.

This also reproduces for me using your simplified instructions. For whatever reason, the empty indices index1, ..., index9 are relevant. When creating index10 and index_lookup first and running the query, it works fine. Creating the indices index1, ..., index9 suddently makes the query cause an NPE. When I deleted index9, suddenly the query worked again. Creating it again reprodcued the NPE, again.

The fact that this can happen with a simple query like from * | lookup join index_lookup on a | sort a means this is not a low-priority bug and we need to get to the bottom of this.

@kanoshiou
Copy link
Contributor

Thanks @alex-spies , for your detailed additions.

The failed query appears to execute an additional EVAL physical operation compared to the successful query. This discrepancy causes the layout builder to append an existing attribute, leading to a mismatch between the layout map's size and numberOfChannels.

I attempted to resolve this issue, but, as you mentioned, it might not be a low-priority one, so I believe it's best to leave it to you maintainers. However, I am still very willing to assist if needed.

@fang-xing-esql
Copy link
Member

I'm able to reproduce the errors with both queries, and a simplified Q1 looks like below, I'll need a sort to make it error out. The two queries stack traces are slightly different, however it is likely that they have the same root case. I'm looking into them further.

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from * 
             | rename city.country.continent.planet.name as message 
             | lookup join message_types_lookup on message
             | sort language_code"
}
'
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot read field \"nameIds\" because \"set\" is null"}],"type":"null_pointer_exception","reason":"Cannot read field \"nameIds\" because \"set\" is null","suppressed":[{"type":"null_pointer_exception","reason":"Cannot read field \"nameIds\" because \"set\" is null"}]},"status":500}%   

@alex-spies
Copy link
Contributor

While #125636 provided a quick fix, I'd like to keep this issue open until we have a long-term fix in place if that's okay.

@alex-spies alex-spies reopened this Mar 27, 2025
@alex-spies alex-spies self-assigned this Mar 27, 2025
alex-spies added a commit to alex-spies/elasticsearch that referenced this issue Mar 27, 2025
When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.
alex-spies added a commit to alex-spies/elasticsearch that referenced this issue Apr 2, 2025
When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.
alex-spies added a commit to alex-spies/elasticsearch that referenced this issue Apr 2, 2025
When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.
alex-spies added a commit that referenced this issue Apr 3, 2025
* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* Update docs/changelog/125764.yaml
alex-spies added a commit that referenced this issue Apr 3, 2025
* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
alex-spies added a commit that referenced this issue Apr 3, 2025
* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update docs/changelog/125764.yaml

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Update docs/changelog/125764.yaml

* Add comments
alex-spies added a commit to alex-spies/elasticsearch that referenced this issue Apr 3, 2025
…lastic#126166)

* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this issue Apr 3, 2025
…lastic#126166)

* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
elasticsearchmachine pushed a commit that referenced this issue Apr 3, 2025
… (#126186)

* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
elasticsearchmachine pushed a commit that referenced this issue Apr 3, 2025
* [8.18] ESQL: ESQL: Fix ReplaceMissingFieldsWithNull (#125764) (#126166)

* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   #121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Add comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/125764.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 96ca13a)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

* Re-instate fix for LOOKUP JOIN, update tests
andreidan pushed a commit to andreidan/elasticsearch that referenced this issue Apr 9, 2025
* Revert changes to Layout.java

The change in 80125a4 is a quick fix
and allows breaking an invariant of Layout. Revert that.

* Simplify ReplaceMissingFieldWithNull

When encountering projections, it tries to do the job of field
extraction for missing fields by injecting an Eval that creates a
literal null with the same name id as the field attribute for the
missing field. This is wrong:
1. We only insert an Eval in case that a Project relies on the missing
   attribute. There could be other plan nodes that rely on the missing
   attribute.
2. Even for Projects, we only insert an Eval in case we squarely project
   for the field - in case of aliases (e.g. from RENAME), we do nothing.
3. In case of multiple Projects that use this attribute, we create
   multiple attributes with the original field attribute's id, causing
   a wrong Layout. This triggered
   elastic#121754.

* Revive logic for EsRelation instead of Project

* Update LocalLogicalPlanOptimizerTests

* Update docs/changelog/125764.yaml

* Update test expectations

* Do not prune attributes from EsRelation

This can lead to empty output, which leads to the EsRelation being
replaced by a LocalRelation with 0 rows.

* Add tests + capability

* Update docs/changelog/125764.yaml

* Add comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment