-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-28280: SemanticException when querying VIEW with DISTINCT clause #6103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
51b5bc8 to
d399943
Compare
| case HiveProject hiveProject -> hiveProject; | ||
| case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject; | ||
| default -> throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode + ", rather than a HiveProject or a SingleRel with HiveProject as its child."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make the error message clearer? Maybe something like "Could not obtain a HiveProject from " + relNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I have changed the message in the new commit
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = switch (Objects.requireNonNull(relNode)) { | ||
| case HiveProject hiveProject -> hiveProject; | ||
| case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to descend into SingleRels recursively, until a HiveProject can be found? Maybe the view definition could have, e.g., a Sort(Filter(Project(...))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually earlier I tried to produce a plan where the second node is not a Project but couldn't. But just to be safe I have updated the code to recursively check for the first Project through the SingleRels.
| throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = switch (Objects.requireNonNull(relNode)) { | ||
| case HiveProject hiveProject -> hiveProject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Sonar warns about wrong indentation for the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I noticed that but there's no good solution to it. In the latest commit it shouldn't complain as the code is:
private Optional<HiveProject> extractFirstProject(RelNode rel) {
return switch (rel) {
case HiveProject hiveProject -> Optional.of(hiveProject);
case SingleRel sr -> extractFirstProject(sr.getInput());
case null, default -> Optional.empty();
};
}
However, I would have liked to see the cases indented to:
private Optional<HiveProject> extractFirstProject(RelNode rel) {
return switch (rel) {
case HiveProject hiveProject -> Optional.of(hiveProject);
case SingleRel sr -> extractFirstProject(sr.getInput());
case null, default -> Optional.empty();
};
}
Actually in checkstyle/checkstyle.xml we have defined the indentation for cases to be 0:
<module name="Indentation">
<property name="basicOffset" value="2" />
<property name="caseIndent" value="0" />
<property name="throwsIndent" value="4" />
</module>
which is fine when you have the older switch-case statement, as it doesn't look too bad:
switch(...) {
case a: ...
case b: ...
}
But when you have a switch expression, like the example shown at the beginning of this comment (i.e., with a return), it makes more sense to treat it as a basicOffset and not a caseIndent in my opinion. But we cannot simply change the value of caseIndent as there are older style switch statements in the repo.
Ideally I would like to have a value of 2 for case indents. Also by default the cases should be indented as seen here: https://checkstyle.sourceforge.io/checks/misc/indentation.html#caseIndent
Sorry for the long write-up, but we shouldn't see checkstyle warnings in the latest commit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the long comment! I would support the motion to change caseIndent to 2. I guess that would be out-of-scope for this ticket. Until then I would prefer avoiding checkstyle warnings.
| } else { | ||
| throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = extractFirstProject(relNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a project and why the first project is important?
How do we use the project afterwards?
Does the code remain correct if we use a project that is not at the top of the plan?
Do we generally support views with LIMIT, ORDER BY, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the projects for authorization in HiveRelFieldTrimmer.trimFields here, and that's the only place where we use it. I believe that is why we need only the first project, that gives us the final list of fields.
I don't think we can use other projects in place of the top one, as the fields could be different.
I am not really sure if we officially support views with limits or order by, but right now it looks like we can support them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The viewToProjectTableSchema has been introduced in HIVE-13095 to determine whether a user has the permission to access fields of a view. There's a comment in https://reviews.apache.org/r/43834/.
It could work if we expand the logic to all kinds of RelNodes. The code in trimFields(Project, ...) does not really use the project. The code could be applied to any kind of RelNode. The code would then work even if no project could be found (e.g., if the top level node is a join or union). The cost: adding a call to each of the trimFields methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the approach so that it would work with any RelNode. I will look into the failed tests tomorrow, but let me know what your thoughts are regarding the new approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update, @soumyakanti3578! I think we can simplify it a bit, see my review comment.
| List<RexNode> projects = relNodeToTableAndProjects.get(rel).right; | ||
| List<FieldSchema> tableAllCols = table.getAllCols(); | ||
| for (Ord<RexNode> ord : Ord.zip(projects)) { | ||
| if (fieldsUsed.get(ord.i)) { | ||
| columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(ord.i).getName()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the projects. We could just iterate over the fields from the row type of the RelNode. That would also simplify the map, i.e., just a Map<RelNode, Table>, instead of a Map<RelNode, Pair<Table, List<RexNode>>>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I have simplified this in the latest version.
| RelNode input, | ||
| final ImmutableBitSet fieldsUsed, | ||
| Set<RelDataTypeField> extraFields) { | ||
| setColumnAccessInfoForViews(rel, fieldsUsed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add a call for the root node somewhere as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually although we were in the trimChild method, I was using passing rel which is the parent and not input which is actually the child.
But I have had to change the code anyway because of test failures. I realized that we cannot call setColumnAccessInfoForViews in trimChild as it's too late because we may mutate fieldsUsed in trimFields before calling trimChild.
In the new version I have moved the call to setColumnAccessInfoForViews just before we call RelFieldTrimmer.trimFields which is the right place to call it in my opinion. Let me know if you have any thoughts about this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new way to call setColumnAccessInfoForViews is better. It's the first method called in dispatchTrimFields, and dispatchTrimFields is called for all nodes including the root. My concern has been solved. Thanks for the improvement!
f68ad6a to
4a1659c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion to make the code more efficient. Otherwise LGTM.
| rel.getRowType().getFieldList().stream() | ||
| .map(RelDataTypeField::getIndex) | ||
| .filter(fieldsUsed::get) | ||
| .forEach(i -> columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(i).getName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following is more efficient:
int idx = -1;
while((idx = fieldsUsed.nextSetBit(idx+1)) >= 0) {
columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(idx).getName())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
Replaced with
for (int i = fieldsUsed.nextSetBit(0); i >= 0; i = fieldsUsed.nextSetBit(i + 1)) {
columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(i).getName());
}
which is slightly cleaner and more readable in my opinion.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM



What changes were proposed in this pull request?
Earlier we expected a view's logical plan to always have a
HiveProjectas its top node. But we can haveHiveSortLimittoo if the original view definition has a limit.We should support all
RelNodes and not justHiveProject.Why are the changes needed?
We get an error as described in https://issues.apache.org/jira/browse/HIVE-28280, https://issues.apache.org/jira/browse/HIVE-21163
Does this PR introduce any user-facing change?
No
How was this patch tested?