-
Notifications
You must be signed in to change notification settings - Fork 90
feat: support DDL statements in SqlToSubstrait #432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
daf7120
to
8f00f9e
Compare
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
Outdated
Show resolved
Hide resolved
8f00f9e
to
a586031
Compare
isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
Outdated
Show resolved
Hide resolved
a586031
to
32459c4
Compare
46bc270
to
108bb43
Compare
@vbarua can you also take a look at this. I want to ensure the further refactoring to isthmus is somewhat aligned with these changes. |
import org.apache.calcite.sql.util.SqlBasicVisitor; | ||
import org.apache.calcite.sql2rel.SqlToRelConverter; | ||
|
||
public class DdlRelBuilder extends SqlBasicVisitor<Plan.Root> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one thing that might be at odds with some of the work I've been doing on the API is that this returns a Plan.Root
instead of a RelRoot
or RelNode
.
What I've been trying to do is modularize the API along two lines, roughly:
SQL -> Calcite RelRoot
Calcite RelRoot -> Isthmus Plan
The aim of this is to enable the workflows of users who are starting from SQL, and users who are starting directly with Calcite RelNodse, but share that majority of code for them. With the API modularized like this, SqlToSubstrait can effectively become the composition of these 2 conversion APIs. However, because what you're proposing here is effectively going from SQL -> Isthmus POJO
directly, it makes that a bit trickier.
Aside from that goal though, a better argument for changing this is that this approach only works when going from SQL to POJO. If we assume that we want to be able to read in Substrait plans that create tables, then we'll need to have a first class entity in Calcite to model this. The built-in LogicalTableModify doesn't work for this, but we can add our own TableCreate relation, along with a LogicalTableCreate. That can be used when going from Sql to Calcit, from Calcite to Isthmus, and also from Isthmus back to Calcite.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the overall approach makes sense. The question is how to design this properly on the calcite's side.
I see the following options:
-
Use separate class for DDL relations.
It means we can- add the DdlRel class
- implement required methods for visitors (not sure what's better here - introduce a new abstract visitor class for ddl or extend the RelVisitor)
- add the DdlPlanRoot class (with the semantic similar to the RelRoot)
Probably the DdlPlanRoot class should have the same base class with the RelRoot class for uniform handling in Sql->Calcite conversion.
-
Extend the RelNode class to support DDL. This simplifies the required changes, but I guess it diverges from Calcite conventions and add unnecessary complexity to RelNode
Please let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 makes the most sense to me. I think CREATE is the only operation that we have in Substrait that's not supported as a Calcite relation. I would re-use Calcite's LogicalModify relation for INSERT, DELETE and UPDATE like we are already doing, and the create a relation specifically for CREATE.
implement required methods for visitors
We should be able to add a visit method for our new relation in the Isthmu sRelNodeVisitor utility. The Calcite RelShuttle also has a
RelNode visit(RelNode other);
method for cases like this if we need that.
Would we need a custom RelRoot class, or could we just use the existing RelRoot with the SqlKind set to CREATE_TABLE or OTHER_DDL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To further elucidate, this conversion would slot into here roughly: https://github.com/substrait-io/substrait-java/pull/474/files#diff-9977fe9108f68631bd554929d8e12242611c49b22534be2b701494ac133aa874R117-R140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess making changes in calcite will take time, what should we do with this PR?
- Merge now as temporary solution
- Hold it off until those changes will be made approved and released
- Wait until feat(isthmus): introduce SubstraitSqlToCalcite and SubstraitStatementParser #474 will be merged and adjust these changes accordingly as a temporary solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess making changes in calcite will take time,
Ah, I was not suggesting we make the changes in Calcite. We should be able to define our own RelNode in Isthmus with something like:
package io.substrait.isthmus.calcite.relations;
import org.apache.calcite.rel.AbstractRelNode;
public abstract class TableCreate extends AbstractRelNode {
// ...
}
public class LogicalTableCreate extends TableCreate {
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I thought we need to extend a hierarchy that's a part of calcite. If we can do that in the isthmus code it will be much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah we can do something much simpler than extending Calcite. I was pointing out the
RelNode visit(RelNode other);
because that's the mechanism by which Calcite let's you visit nodes that you define outside of Calcite core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems with this approach we still have to make DdlRel (Create) a child of RelNode, but I have a couple of concerns with it:
- RelNode represents a relational expression, that processes data (records), while DdlRel does something else (it modifies the database catalog with a new entry for table, view or mqt)
- RelNode has the
getRowType
method which represents the type of the rows returned by the relational expression, which is not applicable for DdlRel
So I'm not sure if it is correct to say that DdlRel is RelNode. In my opinion, it would be better if they have the same parent class. But this probably requires changes in calcite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbarua I introduced a new class CalciteOperation
to represent relational algebra and ddl operations. Please take a look when you get a chance. If the suggested approach makes sense I will update SubstraitToCalcite
accordingly.
9e9dce2
to
0847d7d
Compare
c3139c4
to
f8041dd
Compare
# Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
d0e1cca
to
6c7280d
Compare
Makes sense. I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration. If this approach is acceptable, we can move the validation logic for non-DDL and non-DML relations there as well (probably in a separate PR). |
6c7280d
to
e31d8d3
Compare
e31d8d3
to
08cd5c5
Compare
08cd5c5
to
0dc6eb3
Compare
@ZorinAnton could you avoid rebasing your PR every time you update it. It makes it harder to review because I can't just look at the new commits, and it messes with Github PR view a bit. The end result will be squash merged as well, so we don't need to keep the branch fresh with main. |
sure, will do, I just didn't want to have a separate commit per every singe change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just didn't want to have a separate commit per every singe change.
Because it gets squashed at the end this won't matter. When I merge as well, I summarize the individual commits into a coherent message.
I've made some suggestions for this a separate PR targetting your PR
ZorinAnton#1
I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration.
I don't think we need that level of machinary for this work. For one thing, in the long-term we shouldn't need those validations at all because we should be able to convert ALL valid Substrait messages to Calcite and back. In my suggestions, I've inlined very basic restrictions.
I'm not opposed to validations in general, but I don't want to make them part of this PR because if we start fiddling with a public API for validations it will extend how long this PR remains open.
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateTable.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateTable.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateView.java
Outdated
Show resolved
Hide resolved
8151c4f
to
be2e6b2
Compare
Thanks, merged it.
before it happens, a dedicated validator would give a quick view of what is supported and what's not, but I agree that it needs to be addressed in a separate PR. |
As discussed, splitting the PR #428 into dml and ddl parts. This PR is for ddl support.
For the ddl sql statements the following relations are created:
NamedWrite
for create table asNamedDdl
for create viewAll other types of sql statements are handled as before.
Since DDL statements have no representation in Calcite's relational algebra, these changes conflict with the proposed split of SqlToSubstrait into SqlToCalcite and CalciteToSubstrait: #362. This is a key architectural decision that requires careful design.