HIVE-28280: SemanticException when querying VIEW with DISTINCT clause #6103

soumyakanti3578 · 2025-09-25T00:00:04Z

What changes were proposed in this pull request?

Earlier we expected a view's logical plan to always have a HiveProject as its top node. But we can have HiveSortLimit too if the original view definition has a limit.

We should support all RelNodes and not just HiveProject.

Why are the changes needed?

We get an error as described in https://issues.apache.org/jira/browse/HIVE-28280, https://issues.apache.org/jira/browse/HIVE-21163

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapCliDriver -Dtest.output.overwrite=true -Dqfile=view_top_relnode_not_project_authorization.q

thomasrebele · 2025-10-09T09:47:49Z

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java

+            case HiveProject hiveProject -> hiveProject;
+            case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject;
+            default -> throw new SemanticException("View " + subqAlias + " is corresponding to "
+                + relNode + ", rather than a HiveProject or a SingleRel with HiveProject as its child.");


Could you make the error message clearer? Maybe something like "Could not obtain a HiveProject from " + relNode.

Good idea! I have changed the message in the new commit

thomasrebele · 2025-10-09T09:48:09Z

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java

-                + relNode.toString() + ", rather than a HiveProject.");
+          HiveProject project = switch (Objects.requireNonNull(relNode)) {
+            case HiveProject hiveProject -> hiveProject;
+            case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject;


Would it make sense to descend into SingleRels recursively until a HiveProject is found? Maybe the view definition could have, e.g., a Sort(Filter(Project(...)))?

Actually earlier I tried to produce a plan where the second node is not a Project but couldn't. But just to be safe I have updated the code to recursively check for the first Project through the SingleRels.
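For readers following the thread, the descent discussed above can be sketched in isolation. This is only an illustration, not the code from the PR: `Node`, `Project`, `SingleInput` and `Join` are hypothetical stand-ins for Calcite's `RelNode`, `HiveProject`, `SingleRel` and a multi-input node, and the walk is written as a loop with plain `instanceof` checks so that it does not depend on pattern matching for switch.

```java
import java.util.Optional;

// Sketch only: all nested types are hypothetical stand-ins, not Hive or Calcite classes.
final class FirstProjectSketch {
  interface Node {}

  // Stand-in for HiveProject.
  static final class Project implements Node {
    final String name;
    Project(String name) { this.name = name; }
  }

  // Stand-in for SingleRel: a node with exactly one input.
  static final class SingleInput implements Node {
    final Node input;
    SingleInput(Node input) { this.input = input; }
  }

  // Stand-in for a node with several inputs (join, union, ...).
  static final class Join implements Node {}

  // Walks down through single-input nodes and returns the first Project found.
  // Returns empty for null or as soon as a node that is neither kind is reached.
  static Optional<Project> firstProject(Node node) {
    Node current = node;
    while (current != null) {
      if (current instanceof Project) {
        return Optional.of((Project) current);
      }
      if (!(current instanceof SingleInput)) {
        return Optional.empty();
      }
      current = ((SingleInput) current).input;
    }
    return Optional.empty();
  }
}
```

With these stand-ins, a plan shaped like Sort(Filter(Project(...))) yields the inner project, while a plan whose top node is a join yields an empty result, which is the case the later comments in this review address.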

thomasrebele · 2025-10-09T09:48:52Z

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java

-            throw new SemanticException("View " + subqAlias + " is corresponding to "
-                + relNode.toString() + ", rather than a HiveProject.");
+          HiveProject project = switch (Objects.requireNonNull(relNode)) {
+            case HiveProject hiveProject -> hiveProject;


Minor: Sonar warns about wrong indentation for the cases.

Yeah I noticed that but there's no good solution to it. In the latest commit it shouldn't complain as the code is:

private Optional<HiveProject> extractFirstProject(RelNode rel) {
  return switch (rel) {
  case HiveProject hiveProject -> Optional.of(hiveProject);
  case SingleRel sr -> extractFirstProject(sr.getInput());
  case null, default -> Optional.empty();
  };
}

However, I would have liked to see the cases indented to:

private Optional<HiveProject> extractFirstProject(RelNode rel) {
  return switch (rel) {
    case HiveProject hiveProject -> Optional.of(hiveProject);
    case SingleRel sr -> extractFirstProject(sr.getInput());
    case null, default -> Optional.empty();
  };
}

Actually in checkstyle/checkstyle.xml we have defined the indentation for cases to be 0:

<module name="Indentation">
  <property name="basicOffset" value="2" />
  <property name="caseIndent" value="0" />
  <property name="throwsIndent" value="4" />
</module>

which is fine when you have the older switch-case statement, as it doesn't look too bad:

switch(...) {
case a:
  ...
case b:
  ...
}

But when you have a switch expression, like the example shown at the beginning of this comment (i.e., with a return), it makes more sense to treat it as a basicOffset and not a caseIndent in my opinion. But we cannot simply change the value of caseIndent as there are older style switch statements in the repo.

Ideally I would like to have a value of 2 for case indents. Also by default the cases should be indented as seen here: https://checkstyle.sourceforge.io/checks/misc/indentation.html#caseIndent

Sorry for the long write-up, but we shouldn't see checkstyle warnings in the latest commit :)

Thanks for the long comment! I would support the motion to change caseIndent to 2. I guess that would be out-of-scope for this ticket. Until then I would prefer avoiding checkstyle warnings.

zabetak · 2025-10-10T08:07:44Z

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java

-          } else {
-            throw new SemanticException("View " + subqAlias + " is corresponding to "
-                + relNode.toString() + ", rather than a HiveProject.");
+          HiveProject project = extractFirstProject(relNode)


Why do we need a project, and why is the first project important?
How do we use the project afterwards?
Does the code remain correct if we use a project that is not at the top of the plan?
Do we generally support views with LIMIT, ORDER BY, etc?

We use the projects for authorization in HiveRelFieldTrimmer.trimFields, and that is the only place where they are used. I believe that is why we need only the first project: it gives us the final list of fields.
I don't think we can use other projects in place of the top one, as the fields could be different.
I am not really sure if we officially support views with limits or order by, but right now it looks like we can support them.

The viewToProjectTableSchema has been introduced in HIVE-13095 to determine whether a user has the permission to access fields of a view. There's a comment in https://reviews.apache.org/r/43834/.

It could work if we expand the logic to all kinds of RelNodes. The code in trimFields(Project, ...) does not really use the project. The code could be applied to any kind of RelNode. The code would then work even if no project could be found (e.g., if the top level node is a join or union). The cost: adding a call to each of the trimFields methods.

Changed the approach so that it would work with any RelNode. I will look into the failed tests tomorrow, but let me know what your thoughts are regarding the new approach.

Thank you for the update, @soumyakanti3578! I think we can simplify it a bit, see my review comment.

thomasrebele · 2025-10-24T09:17:35Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java

+      List<RexNode> projects = relNodeToTableAndProjects.get(rel).right;
+      List<FieldSchema> tableAllCols = table.getAllCols();
+      for (Ord<RexNode> ord : Ord.zip(projects)) {
+        if (fieldsUsed.get(ord.i)) {
+          columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(ord.i).getName());
+        }
+      }


I don't think we need the projects. We could just iterate over the fields from the row type of the RelNode. That would also simplify the map, i.e., just a Map<RelNode, Table>, instead of a Map<RelNode, Pair<Table, List<RexNode>>>.

Thanks for the suggestion! I have simplified this in the latest version.
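A minimal sketch of the simplified bookkeeping suggested here, assuming (as the diff above does with `tableAllCols.get(ord.i)`) that field `i` of the node's row type lines up with column `i` of the view. `Table`, `relToTable`, `columnAccess` and `recordAccess` are hypothetical stand-ins for illustration; `java.util.BitSet` stands in for Calcite's `ImmutableBitSet`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: none of these names are Hive APIs.
final class ViewAccessSketch {
  // Hypothetical stand-in for Hive's Table: a complete name plus its column names.
  static final class Table {
    final String name;
    final List<String> cols;
    Table(String name, List<String> cols) { this.name = name; this.cols = cols; }
  }

  // One entry per plan node that is the root of an expanded view (keyed by identity).
  final Map<Object, Table> relToTable = new IdentityHashMap<>();
  // Accessed columns per table name, in insertion order.
  final Map<String, List<String>> columnAccess = new LinkedHashMap<>();

  // Records the view columns selected by fieldsUsed; does nothing for non-view nodes.
  void recordAccess(Object rel, BitSet fieldsUsed) {
    Table table = relToTable.get(rel);
    if (table == null) {
      return;
    }
    List<String> accessed = columnAccess.computeIfAbsent(table.name, k -> new ArrayList<>());
    for (int i = 0; i < table.cols.size(); i++) {
      if (fieldsUsed.get(i)) {
        accessed.add(table.cols.get(i));
      }
    }
  }
}
```

The point of the sketch is that nothing in it needs the project expressions: the node identity selects the table, and the bit positions select the columns.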

thomasrebele · 2025-10-24T09:19:39Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java

      RelNode input,
      final ImmutableBitSet fieldsUsed,
      Set<RelDataTypeField> extraFields) {
+    setColumnAccessInfoForViews(rel, fieldsUsed);


I think we need to add a call for the root node somewhere as well.

Actually, although we were in the trimChild method, I was passing rel, which is the parent, and not input, which is the actual child.

But I have had to change the code anyway because of test failures. I realized that we cannot call setColumnAccessInfoForViews in trimChild as it's too late because we may mutate fieldsUsed in trimFields before calling trimChild.

In the new version I have moved the call to setColumnAccessInfoForViews just before we call RelFieldTrimmer.trimFields which is the right place to call it in my opinion. Let me know if you have any thoughts about this :)

The new way to call setColumnAccessInfoForViews is better. It's the first method called in dispatchTrimFields, and dispatchTrimFields is called for all nodes including the root. My concern has been solved. Thanks for the improvement!
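The ordering argument in this thread can be illustrated with a small sketch. It is an assumption-laden toy, not the trimmer's real control flow: `record`, `trim` and `dispatch` are hypothetical stand-ins for `setColumnAccessInfoForViews`, a `trimFields` implementation and `dispatchTrimFields`, and the way `trim` widens the set (adding bit 3) is invented purely to show a change happening between dispatch and the child call.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch only: hypothetical stand-ins, not the HiveRelFieldTrimmer implementation.
final class DispatchOrderSketch {
  final List<Integer> recorded = new ArrayList<>();

  // Stand-in for setColumnAccessInfoForViews: records the bits as they are right now.
  void record(BitSet fieldsUsed) {
    for (int i = fieldsUsed.nextSetBit(0); i >= 0; i = fieldsUsed.nextSetBit(i + 1)) {
      recorded.add(i);
    }
  }

  // Stand-in for a trimFields method that changes the set before descending
  // (assumed here: it adds a field the node itself needs).
  BitSet trim(BitSet fieldsUsed) {
    BitSet widened = (BitSet) fieldsUsed.clone();
    widened.set(3);
    return widened;
  }

  // Stand-in for dispatchTrimFields: record first, then trim.
  BitSet dispatch(BitSet fieldsUsed) {
    record(fieldsUsed);
    return trim(fieldsUsed);
  }
}
```

Because `record` runs before `trim`, it sees only the fields requested from above. Recording on the set returned by `trim` would also pick up the extra bit, which mirrors the "too late" problem described for trimChild.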

thomasrebele

Minor suggestion to make the code more efficient. Otherwise LGTM.

thomasrebele · 2025-10-29T13:32:36Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java

+      rel.getRowType().getFieldList().stream()
+          .map(RelDataTypeField::getIndex)
+          .filter(fieldsUsed::get)
+          .forEach(i -> columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(i).getName()));


I think the following is more efficient:

int idx = -1;
while ((idx = fieldsUsed.nextSetBit(idx + 1)) >= 0) {
  columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(idx).getName());
}

Agreed!

Replaced with

for (int i = fieldsUsed.nextSetBit(0); i >= 0; i = fieldsUsed.nextSetBit(i + 1)) {
  columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(i).getName());
}

which is slightly cleaner and more readable in my opinion.
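As a standalone illustration of the loop adopted above, here is a sketch that uses `java.util.BitSet` in place of Calcite's `ImmutableBitSet` and a plain list of names in place of `tableAllCols`; the class and method names are hypothetical. The loop visits only the set bits, so it never touches unused field indexes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch only: java.util.BitSet stands in for ImmutableBitSet.
final class UsedColumnsSketch {
  // Returns the names of the columns whose index is set in fieldsUsed, in index order.
  static List<String> usedColumns(BitSet fieldsUsed, List<String> allCols) {
    List<String> used = new ArrayList<>();
    // nextSetBit returns -1 once there are no more set bits at or after the given index.
    for (int i = fieldsUsed.nextSetBit(0); i >= 0; i = fieldsUsed.nextSetBit(i + 1)) {
      used.add(allCols.get(i));
    }
    return used;
  }
}
```

Note that, like the code in the PR, the sketch assumes every set bit is a valid index into the column list.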

sonarqubecloud · 2025-10-29T21:39:05Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


thomasrebele

LGTM

asf-ci-hive added tests pending tests failed and removed tests pending tests failed labels Sep 25, 2025

soumyakanti3578 force-pushed the HIVE-28280 branch from 51b5bc8 to d399943 Compare October 8, 2025 18:46

asf-ci-hive added tests pending tests passed and removed tests failed tests pending labels Oct 8, 2025

thomasrebele reviewed Oct 9, 2025


asf-ci-hive added tests pending tests unstable tests passed and removed tests passed tests pending tests unstable labels Oct 9, 2025

zabetak reviewed Oct 10, 2025


asf-ci-hive added tests unstable and removed tests pending labels Oct 22, 2025

soumyakanti3578 added 5 commits October 23, 2025 14:55

HIVE-28280: SemanticException when querying VIEW with DISTINCT clause

91847c5

fix failing test

41f206c

address review comments

da11bd7

remove unused import

8fe519e

get projects for any node

bdf8ee6

thomasrebele reviewed Oct 24, 2025


simplify logic - use rowType to get fieldsList instead of using projects

4a1659c

soumyakanti3578 force-pushed the HIVE-28280 branch from f68ad6a to 4a1659c Compare October 27, 2025 19:50

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Oct 27, 2025

Addressing sonar issues

f7609ed

asf-ci-hive added tests pending tests passed and removed tests passed tests pending labels Oct 28, 2025

thomasrebele approved these changes Oct 29, 2025


Improve columnAccessInfo setting logic

623cc6d

asf-ci-hive added tests pending and removed tests passed labels Oct 29, 2025

asf-ci-hive added tests passed and removed tests pending labels Oct 29, 2025

soumyakanti3578 requested review from thomasrebele and zabetak October 30, 2025 00:01

thomasrebele approved these changes Oct 31, 2025

