-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Assign new id to alias created by ReplaceMissingFieldWithNull when there is lookup join #125462
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
Alias alias = joinAttributes.isEmpty() | ||
? new Alias(f.source(), f.name(), Literal.of(f, null), f.id()) | ||
: new Alias(f.source(), f.name(), Literal.of(f, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- please add an explanation on why this occurs
- use the ternary operator for the id to pick either f.id or null and call the constructor only once
new Alias(f.source(), f.name(), Literal.of(), joinAttributes.isEmpty() ? f.id(): null))
As a follow-up the fact that the rule has to be aware of the join is fishy.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed a problem with the approach. I have a suggestion for a more complete fix.
I think we can merge this as a quick fix as it should enable more queries to run, though.
// fields, creating a new alias for null with the same id as the field id can potentially cause planEval to add a | ||
// duplicated ChannelSet to a layout, and Layout.builder().build() could throw a NullPointerException. | ||
// As a workaround, assign a new alias id to the null alias when join exists and SearchStats is not available. | ||
Alias alias = new Alias(f.source(), f.name(), Literal.of(f, null), joinAttributes.isEmpty() ? f.id() : null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @fang-xing-esql for getting to the bottom of this!
I think this approach unfortunately still has problems. There could be downstream commands that still reference the old name id, but now the attribute with the old id is gone for good (projected away). For instance, there might be another LOOKUP JOIN
command after the projection, or an MV_EXPAND
, and I think this one will just break due to the name id being completely absent from the previous commands' layouts. The only subsequent commands that are OK to use downstream are those taken care of in the next else if
branch below, namely Eval, Filter, OrderBy, RegexExtract and TopN.
I guess there are 2 approaches of eliminating missing fields: either, define a new attribute pointing to a null literal with an own id - and update all downstream references to point to the new attribute; OR update the attribute in place by defining a null literal with the same name id + make sure we never extract the original attribute in the first place; the latter is sketchy because, as we can see here, it can lead to re-defining the same name id multiple times.
Now, I think a part of the correct long-term solution is that we avoid the lookup join when the left hand side join key is missing. But that's not sufficient: I can see that we're introducing multiple Projects for the same missing attribute even if it is not a join key, like so (taken from your csv test):
Project[[birth_date{f}#15, language_code{f}#9]]
\_TopN[[Order[language_code{f}#9,ASC,LAST], Order[birth_date{f}#15,ASC,LAST]],1[INTEGER]]
\_Join[LEFT,[message{r}#3],[message{r}#3],[message{f}#17]]
|_EsqlProject[[birth_date{f}#15, city.country.continent.planet.name{f}#14 AS message, language_code{f}#9]]
| \_EsRelation[*][birth_date{f}#15, city.country.continent.planet.nam..]
\_EsRelation[message_types_lookup][LOOKUP][message{f}#17]
The problem is that there are 2 Projects for birth_date
, which is locally missing, and ReplaceMissingFieldWithNull
will lead to the addition of the null attribute birth_date
with the id 15 twice. Your fix gives that a new id each time, but if birth_date{f}#15
is used anywhere after the second Project
(in a more complex query), it will be missing.
Therefore, I think the second, more crucial, part of a correct solution is that we avoid replacing attributes by other attributes with the same name id, at all. Instead, I think we should update ReplaceMissingFieldWithNull
so that it always assigns new ids - and updates any downstream references to the missing field attribute with a reference attribute to the newly defined null attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the following, and it gives "no child with doc id found" in InsertFieldExtraction
:
from *
| rename city.country.continent.planet.name as message
| lookup join message_types_lookup on message
| keep birth_date, language_code
| mv_expand birth_date
| sort language_code, birth_date
| limit 1
;
That's the diff from ReplaceMissingFieldWithNull
:
Project[[birth_date{r}#868, language_code{f}#859]] = Project[[birth_date{r}#868, language_code{f}#859]]
\_Limit[1[INTEGER],false] = \_Limit[1[INTEGER],false]
\_OrderBy[[Order[language_code{f}#859,ASC,LAST], Order[birth_date{r}#868,ASC,LAST]]] = \_OrderBy[[Order[language_code{f}#859,ASC,LAST], Order[birth_date{r}#868,ASC,LAST]]]
\_MvExpand[birth_date{f}#865,birth_date{r}#868] = \_MvExpand[birth_date{f}#865,birth_date{r}#868]
\_EsqlProject[[birth_date{f}#865, language_code{f}#859]] ! \_Project[[birth_date{r}#881, language_code{f}#859]]
\_Join[LEFT,[message{r}#851],[message{r}#851],[message{f}#867]] ! \_Eval[[null[DATETIME] AS birth_date]]
|_EsqlProject[[birth_date{f}#865, city.country.continent.planet.name{f}#864 AS message, language_code{f}#859]] ! \_Join[LEFT,[message{r}#851],[message{r}#851],[message{f}#867]]
| \_EsRelation[*][birth_date{f}#865, city.country.continent.planet.na..] ! |_Project[[birth_date{r}#865, city.country.continent.planet.name{f}#864 AS message, language_code{f}#859]]
\_EsRelation[message_types_lookup][LOOKUP][message{f}#867] ! | \_Eval[[null[DATETIME] AS birth_date]]
! | \_EsRelation[*][birth_date{f}#865, city.country.continent.planet.na..]
! \_EsRelation[message_types_lookup][LOOKUP][message{f}#867]
Notice how MvExpand
now references a missing attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing @alex-spies!
Today, ReplaceMissingFieldWithNull
only replace missing fields in Project
, Eval
, Filter
, OrderBy
, RegexExtract
and TopN
, it doesn't replace missing fields in the other commands like MvExpand
. One option I can think of is whether covering more commands in this rule can solve this problem, I'll give it a try, if birth_date can be identified as a missing field for MvExpand
, and if replacing it with a null alias works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this issue with MvExpand
@alex-spies ! I added MvExpand
in ReplaceMissingFieldWithNull
, so that the missing field(target) referenced by MvExpand
is replaced by null, expanded's id is kept unchanged, and also added some tests for MvExpand
and Aggregation
after lookup joins.
ReplaceMissingFieldWithNull
does not replace missing fields in aggregations, however I haven't seen queries failed because of it yet, perhaps the downstream processes aggregations in a smarter way.
Another option to address #121754 is to remove duplicated entries(ChannelSet
) in the layout, in case duplicated entries are expected or not avoidable, so that the DefaultLayout.inverse()
does not create a null ChannelSet
in the list.
@costin , I opened 2 issues from the findings of this PR: |
@@ -113,14 +120,40 @@ else if (plan instanceof Project project) { | |||
? f | |||
: Literal.of(f, null) | |||
); | |||
} else if (plan instanceof MvExpand m) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can also be other downstream commands that will have the same problem, though, like CHANGE_POINT.
// Replace missing target field with null. | ||
Alias alias = new Alias(f.source(), f.name(), Literal.of(f, null)); | ||
NamedExpression nullTarget = alias.toAttribute(); | ||
plan = new Eval(m.source(), m.child(), List.of(alias)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will change the position of the expanded field in the output because EVAL always places new attributes on the right. This may be fine when there's a projection some time later, but it may also create another bug.
Resolves: #121754
This is one option to resolve this bug.
This
NullPointerException
happens inLayout.Builder.build()
, when there is nullChannelSet
inLayout.Builder.channels
.A null
ChannelSet
could be added inLayout.Builder.channels
by Layout.inverse(), if the numberOfChannels > the # of items in the layout map. `When there are multiple batches of data node executions with lookup join and sort in the query,
LocalExecutionPlanner.planEval
could potentially append duplicatedChannelSet
- duplicated NameId and data type, to the list ofChannelSet
inLayout.Builder
, which causes numberOfChannels > the # of items in the layout map when build() is called and aDefaultLayout
is created.The duplicated NameId is created by
ReplaceMissingFieldWithNull
for null(alias) -a{r}#4
,a{f}#4
, later on both of them are appended to the channels by planEval, that causes the duplicated name ids.This particular issue happens when there are 10+ indices referenced in the FROM clause which triggers the batch execution on data nodes, and there is a lookup join with sort, this PR creates an alias for null with a new id under this particular situation to solve this problem.